Senior Software Engineer, Reliability
We design high-performance, low-latency, high-throughput services, promote best practices, and engage in architectural design to embed reliability into every layer of our products. We seek your expertise in distributed systems, resilience engineering, and large-scale production operations to identify gaps, design and build solutions, and guide product teams toward building highly available and resilient services. Your work will directly strengthen our SRE strategy, operational excellence, system performance, and reliability culture.

We are seeking innovative problem-solvers who are passionate about large-scale distributed systems and eager to grow their skills in modern SRE practices. As a small team tackling complex challenges at scale, we offer the opportunity to make significant technical contributions while driving observability culture across the organization.

5+ years of experience designing, developing, and operating large-scale, customer-facing products or services
Experience coding in higher-level languages (e.g., Java, Scala, Go, Python) is preferred
A strong interest in solving challenging problems using innovative, data-driven approaches
An SRE-centric mindset: you build and manage systems with reliability, scalability, availability, and security as core principles
Experience designing complex systems and frameworks using proven system design principles, such as NALSD (Non-Abstract Large System Design)
Experience troubleshooting issues across distributed Linux environments, with comfort tracing problems across applications, systems, and networks
Proficiency with modern cloud technologies such as GCP, AWS, and Kubernetes
Experience with service observability practices and tools (e.g., Prometheus, OpenTelemetry, SignalFx, or similar)
Comfort learning new software, frameworks, and APIs quickly and effectively
A natural collaborator who inspires others, mentors junior engineers, and drives technical excellence
Bonus: Familiarity with PHP, JavaScript, or Node.js

You will continually develop automation, frameworks, and tools that improve platform reliability, resilience, and availability
You will collaborate with other engineers on the team, as well as cross-functionally, to foster sound software engineering principles and represent our engineering values
You will participate in proofs of concept (POCs) for new projects and frameworks being evaluated for our products and platforms
You will improve our observability both as a developer and maintainer of systems and frameworks and as a mentor to our product development teams
You will work with modern cloud-native technologies, including container orchestration (Kubernetes, Docker), service mesh solutions (Istio, Linkerd), and cloud platforms (AWS, GCP)
You will participate in product design reviews and architectural discussions to ensure reliability is considered early in the development lifecycle of products and services
You will participate in a team on-call rotation