Senior Site Reliability Engineer

Northwoodspace
Torrance, CA

Role:

Northwood is looking for a Senior Site Reliability Engineer to architect and lead the monitoring and reliability systems that keep satellites connected to Earth. As we rapidly scale our ground station network across multiple continents, you'll design and build the observability infrastructure that ensures our space communications systems operate 24/7 for customers ranging from commercial satellite operators to national security missions.

This is a high-impact leadership role: you'll build global-scale reliability platforms, mentor junior engineers, and establish SRE practices across the organization. You'll work directly with our founding engineering team and department heads to define the monitoring, alerting, and deployment strategies that will scale with us from startup to enterprise. If you're excited about space technology and want to build infrastructure that directly supports mission-critical satellite operations while leading technical teams, this role offers that opportunity.

Responsibilities:

  • Architect and maintain the enterprise observability stack (Grafana, Prometheus, Loki, Vector, VictoriaMetrics) that monitors ground stations, satellite communications, and multi-region AWS infrastructure

  • Design SRE practices, error budgets, and SLO/SLI frameworks for mission-critical satellite systems with 99.9%+ uptime requirements

  • Build advanced AWS infrastructure with Terraform, implementing multi-region reliability, automated scaling, and disaster recovery for ground station operations

  • Lead CI/CD pipeline architecture using GitLab and ArgoCD with advanced deployment strategies for mission-critical software releases

  • Mentor junior engineers and establish reliability standards across the growing engineering organization

  • Design comprehensive Kubernetes deployments with Helm, focusing on high availability and zero-downtime operations

  • Lead incident response, conduct post-mortems, and drive systematic reliability improvements

Basic Qualifications:

  • 5-8 years of production infrastructure and SRE experience with demonstrated leadership in reliability improvements and team mentorship

  • Expert-level experience with Kubernetes, Docker, and container orchestration in large-scale production environments

  • Strong background in infrastructure as code (Terraform) and advanced CI/CD practices with experience mentoring others on these technologies

  • Advanced AWS experience including multi-region architectures, networking, security, and cost optimization, with demonstrated ability to architect complex cloud solutions

  • Proven track record of leading technical projects from conception to production in fast-moving, high-growth environments

  • Deep understanding of SRE principles, error budgets, SLOs/SLIs, and experience implementing reliability frameworks across engineering organizations

Preferred Qualifications:

  • Production experience architecting and scaling observability tools (Vector, Loki, Grafana, Prometheus, VictoriaMetrics) in high-throughput environments

  • Advanced experience with HashiCorp Vault, Okta, and enterprise identity/secrets management systems including policy design and implementation

  • Previous experience scaling infrastructure and leading technical teams at high-growth companies (startup to 500+ employees)

  • AWS Professional certification or equivalent demonstrated expertise with advanced cloud networking, security, and compliance frameworks

  • Strong Linux system administration and networking expertise with experience troubleshooting complex distributed systems

  • Background in aerospace, telecommunications, defense contracting, or other mission-critical, highly regulated industries

  • Experience with ITAR, NIST 800-171, or other defense/aerospace compliance requirements

Posted 2025-11-28
