Senior Site Reliability Engineer
Manage and deliver large-scale reliability improvement projects, ensuring systems are performant, available, and resilient.
Drive the identification of performance bottlenecks and lead initiatives to optimize and scale critical systems and services.
Architect and implement scalable infrastructure solutions that support growing user demand while maintaining system reliability.
Lead the design and enhancement of monitoring frameworks so that systems are highly observable, and support the response to production incidents.
Take ownership of improving system resilience by designing fault-tolerant architectures and implementing disaster recovery strategies.
Lead capacity planning initiatives so that system resources are managed proactively, preventing downtime or performance degradation under high load.
Work closely with development, operations, and other technical teams to ensure seamless system integration and to align on reliability best practices.
Act as a technical mentor within the organization, guiding teams through complex reliability challenges and promoting a culture of excellence.
Help define and execute long-term reliability engineering strategies and standards that ensure the scalability and performance of core services.
Develop and enforce best practices for operational excellence across engineering teams, including automation, incident management, and system monitoring.

5+ years of relevant experience and a Bachelor's degree, or an equivalent combination of education and experience.
Proven experience in Site Reliability Engineering, software development, or systems engineering, with a focus on end-to-end system reliability and performance.
Strong understanding of backend architectures, including APIs, data flows, and cross-system dependencies.
Hands-on experience developing monitoring, observability, and alerting solutions using tools such as Datadog, Firebase Crashlytics, or Sentry.
Skilled in automation and tooling development using Python, Go, or similar languages to reduce manual processes and improve efficiency.
Experience implementing SLIs/SLOs and using metrics to drive measurable improvements in reliability and availability.
Solid foundation in distributed systems, cloud infrastructure (AWS, GCP, or Azure), and CI/CD pipelines for reliable software delivery.
Strong debugging and problem-solving skills, capable of diagnosing and resolving complex issues across mobile, API, and backend systems.
Effective collaborator and communicator, skilled at partnering across mobile, backend, and SRE teams to deliver cohesive reliability outcomes.
Demonstrated ability to mentor engineers and foster a culture of observability, automation, and operational excellence.
Understanding of mobile (iOS and Android) applications.
Experience improving incident response workflows, postmortem processes, and on-call models.
Background in performance optimization, fault tolerance, and disaster recovery for large-scale systems.
Experience collaborating within distributed or global engineering teams.