Senior Site Reliability Engineer
Take ownership of system performance monitoring, identify inefficiencies, and lead initiatives to improve the overall availability and reliability of digital platforms and applications.
Lead and manage the response to complex, high-priority incidents, ensuring prompt resolution and a thorough root cause analysis to prevent recurrence.
Design and implement advanced automation frameworks to improve operational efficiency, streamline processes, and reduce human error.
Lead reliability-focused initiatives, ensuring systems are highly available, resilient, and scalable, and promote best practices across engineering teams.
Enhance the monitoring infrastructure by identifying key metrics, optimizing alerting, and improving system observability to support the reliability of large-scale systems.
Forecast resource requirements and lead capacity planning activities so that systems can scale effectively to meet growing user demand.
Maintain robust disaster recovery strategies and test them regularly to confirm that systems can recover quickly from failures.
Partner with engineering and product teams to identify opportunities for improving system architecture, focusing on scalability, reliability, and fault tolerance.
Provide mentorship and technical guidance to junior site reliability engineers, fostering skill development and knowledge sharing.
Drive continuous improvement across operational workflows, identifying areas for optimization, cost reduction, and performance enhancement.

3+ years of relevant experience and a Bachelor's degree, or an equivalent combination of education and experience.
Proven experience in Site Reliability Engineering, software development, or systems engineering, with a focus on end-to-end system reliability and performance.
Strong understanding of backend architectures, including APIs, data flows, and cross-system dependencies.
Hands-on experience developing monitoring, observability, and alerting solutions using tools such as Datadog, Firebase Crashlytics, or Sentry.
Skilled in automation and tooling development using Python, Go, or similar languages to reduce manual processes and improve efficiency.
Experience implementing SLIs/SLOs and leveraging metrics to drive measurable improvements in reliability and availability.
Solid foundation in distributed systems, cloud infrastructure (AWS, GCP, or Azure), and CI/CD pipelines for reliable software delivery.
Strong debugging and problem-solving skills, capable of diagnosing and resolving complex issues across mobile, API, and backend systems.
Effective collaborator and communicator, skilled at partnering across mobile, backend, and SRE teams to deliver cohesive reliability outcomes.
Demonstrated ability to mentor engineers and foster a culture of observability, automation, and operational excellence.
Understanding of mobile (iOS and Android) applications.
Experience improving incident response workflows, postmortem processes, and on-call models.
Background in performance optimization, fault tolerance, and disaster recovery for large-scale systems.
Experience collaborating within distributed or global engineering teams.