Senior Software Engineer, Reliability
We design high-performance, low-latency, high-throughput services, promote best practices, and engage in architectural design to embed reliability into every layer of our products. We seek your expertise in distributed systems, resilience engineering, and large-scale production operations to identify gaps, design and build solutions, and guide product teams toward building highly available and resilient services. Your work will directly strengthen our SRE strategy, operational excellence, system performance, and reliability culture.

We are seeking innovative problem-solvers who are passionate about large-scale distributed systems and eager to grow their skills in modern SRE practices. As a small team tackling complex challenges at scale, we offer the opportunity to make significant technical contributions while driving observability culture across the organization.

5+ years of experience designing, developing, and operating large-scale, customer-facing products or services
Experience coding in higher-level languages (e.g., Java, Scala, Go, Python) is preferred
A strong interest in solving challenging problems using innovative, data-driven approaches
An SRE-centric mindset: you build and manage systems with reliability, scalability, availability, and security as core principles
Experience designing complex systems and frameworks using proven system design principles, such as NALSD (Non-Abstract Large System Design)
Experience troubleshooting issues across distributed Linux environments, with comfort tracing problems across applications, systems, and networks
Proficiency with modern cloud technologies such as GCP, AWS, and Kubernetes
Experience with service observability practices and tools (e.g., Prometheus, OpenTelemetry, SignalFx, or similar)
Comfort learning new software, frameworks, and APIs quickly and effectively
A natural collaborator who inspires others, mentors junior engineers, and drives technical excellence
Bonus: familiarity with PHP, JavaScript, or Node.js

You will continually develop automations, frameworks, and tools that improve platform reliability, resilience, and availability
You will collaborate with engineers on the team and cross-functionally to foster solid software engineering principles and represent our engineering values
You will participate in proofs of concept (POCs) for new projects and frameworks being evaluated for our products and platforms
You will improve our observability both as a developer and maintainer of systems and frameworks and as a mentor to our product development teams
You will work with modern cloud-native technologies, including containers and container orchestration (Docker, Kubernetes), service mesh solutions (Istio, Linkerd), and cloud platforms (AWS, GCP)
You will participate in product design reviews and architectural discussions to ensure reliability is considered early in the development lifecycle of products and services
You will participate in a team on-call rotation