Site Reliability Engineer

Runloop
San Francisco, CA

About Runloop

Runloop is building the foundational infrastructure for the next generation of AI development. We provide AI engineers and data scientists with lightning-fast, secure, and reproducible code sandboxes. Our platform enables teams to experiment, iterate, and deploy their projects without the friction of environment setup and dependencies. We are a small but mighty team dedicated to building a rock-solid platform that empowers innovation.

The Role

We're looking for a skilled and passionate Site Reliability Engineer to join our team. As an SRE, you'll be responsible for the reliability, observability, performance, and security of our core platform—the very foundation on which our users build their futures. You'll work closely with our engineering team to develop and maintain the systems that power our code sandboxes, ensuring a seamless and stable experience for our customers. This is a critical role that blends a deep understanding of operations with a software engineering mindset.

Responsibilities

  • Design and maintain our production infrastructure on cloud platforms like AWS, GCP, or Azure.

  • Monitor and respond to system alerts and incidents using Grafana and Prometheus, ensuring high availability and a secure environment for our users' code.

  • Collaborate with developers to ensure new features and services are designed with scalability and reliability in mind.

  • Troubleshoot and resolve complex issues related to our infrastructure, networking, and the sandbox environment.

  • Participate in an on-call rotation to support our production systems.

  • Define and track SLIs/SLOs, manage error budgets, and proactively monitor distributed systems with logging and tracing.

  • Automate deployments, scaling, provisioning, and recovery tasks to reduce toil and build self-healing systems.

  • Lead incident response, conduct root-cause analysis, and facilitate blameless post-mortems to drive continual improvement.

  • Collaborate cross-functionally with product, engineering, and developer relations to ensure reliable releases and an outstanding developer experience.

  • Plan for capacity growth, forecast system usage, and contribute to safe release and change management processes.

  • Mentor and support front-end developers in building reliable distributed front-end systems (CDNs, caching, client-side observability).

Qualifications

  • Proven experience as an SRE, DevOps Engineer, or similar role.

  • Strong programming skills in languages like Python or Go.

  • Deep expertise in containerization technologies such as Docker and Kubernetes.

  • Experience with cloud infrastructure and tools like Terraform and/or Pulumi.

  • Familiarity with monitoring and alerting tools like Prometheus, Grafana, or Datadog.

  • A solid understanding of networking, security, and Linux systems administration.

  • Experience designing, scaling, and maintaining distributed systems (backend platforms, APIs, or front-end infrastructure).

  • Proficiency in implementing observability frameworks (metrics, logging, tracing) and aligning reliability goals with developer velocity.

  • Hands-on experience managing incidents, running on-call operations, and producing actionable post-mortems.

  • Ability to mentor engineers and influence reliability practices across teams, especially for front-end infrastructure and performance.

Bonus Points

  • Experience with chaos engineering techniques, front-end observability tools (e.g., Sentry, RUM, synthetic monitoring), or building CI/CD pipelines for front-end delivery.

Benefits

  • Competitive salary and equity.

  • Comprehensive health, dental, and vision insurance for you and your dependents.

  • Opportunity to work on cutting-edge AI technology and make a real impact on the future of software engineering.

  • Free lunch and snacks.

Location

  • In office 4 days a week in San Francisco, with 1 optional WFH day per week.

Join Us

If you're excited about shaping the future of AI-driven software engineering and empowering developers to build the next generation of coding tools, we want to hear from you. Join Runloop and be at the forefront of the AI revolution in software development.

Runloop is an Equal Opportunity Employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, national origin, disability status, protected veteran status, sexual orientation, gender identity or any other characteristic protected by law.

Posted 2025-09-22
