Senior Software Engineer, Reliability

Box
Redwood City, CA

We design high-performance, low-latency, high-throughput services, promote best practices, and engage in architectural design to embed reliability into every layer of our products. We seek your expertise in distributed systems, resilience engineering, and large-scale production operations to identify gaps, design and build solutions, and guide product teams towards building highly available and resilient services. Your work will directly strengthen our SRE strategy, operational excellence, system performance, and reliability culture.

We are seeking innovative problem-solvers passionate about large-scale distributed systems and eager to grow their skills in modern SRE practices. As a small team tackling complex challenges at scale, we offer the opportunity to make significant technical contributions while driving observability culture across the organization.

- 5+ years of working experience designing, developing, and operating large-scale, customer-facing products or services
- Experience coding in higher-level languages (e.g., Java, Scala, Go, Python) is preferred
- A strong interest in solving challenging problems using innovative and data-driven approaches
- An SRE-centric mindset: you build and manage systems with reliability, scalability, availability, and security as core principles
- Experience designing complex systems and frameworks using proven system design principles, such as NALSD (Non-Abstract Large System Design) methodologies
- Experience troubleshooting issues across distributed Linux environments, with comfort tracing problems across applications, systems, and networks
- Proficient with modern cloud technologies such as GCP, AWS, and Kubernetes
- Experienced in service observability practices and tools (e.g., Prometheus, OpenTelemetry, SignalFx, or similar)
- Comfortable learning new software, frameworks, and APIs quickly and effectively
- Natural collaborator who inspires others, mentors junior engineers, and drives technical excellence
- Bonus: Familiarity with PHP/JavaScript/NodeJS

- You will be constantly developing automations, frameworks, and tools for better platform reliability, resilience, and availability
- You will collaborate with other engineers on the team as well as cross-functionally to foster solid software engineering principles and represent our engineering values
- You will participate in various POCs on new projects and frameworks being evaluated for the product and platforms
- You will improve our observability as both a developer and maintainer of systems and frameworks, and a mentor to our product development teams
- You will work with modern cloud-native technologies including container orchestration (Kubernetes, Docker), service mesh solutions (Istio, Linkerd), and cloud platforms (AWS, GCP)
- You will participate in product design reviews and architectural discussions to ensure reliability is considered early in the development lifecycle of products and services
- You will participate in a team on-call rotation

Posted 2025-10-31
