Principal Site Reliability Engineer
About Gruve
Gruve is an innovative software services startup dedicated to transforming enterprises into AI powerhouses. We specialize in cybersecurity, customer experience, cloud infrastructure, and advanced technologies such as Large Language Models (LLMs). Our mission is to support our customers' business strategies by helping them use their data to make more intelligent decisions. As a well-funded early-stage startup, Gruve offers a dynamic environment with strong customer and partner networks.
About the Role
This role defines organization-wide reliability strategy, architecture, and culture. You will guide large-scale automation programs, lead executive-level incident reviews, manage high-severity incidents, drive SLO governance, and mentor senior leaders and ICs. You will also work with other SREs to set up, maintain, and troubleshoot the stack from bare metal through the application layer.
Key Responsibilities
- Own the reliability strategy across product and platform teams.
- Architect global infrastructure spanning Kubernetes, GPU platforms, MLOps, and observability.
- Lead chaos/performance engineering and fleet-wide automation initiatives.
- Partner with executives and engineering leadership; influence roadmaps and resourcing.
- Establish SLO/error-budget governance and drive reliability best practices across the organization.
- Engage with customers during key operational events.
Basic Qualifications
- 10+ years in SRE, distributed systems, or large-scale infrastructure.
- Deep mastery of Kubernetes, GPU compute, observability, and public cloud.
- Proven leadership shipping mission-critical, high-availability systems.
- Expertise with DGX/HGX, NIMs, Nemotron, and GPU operators and exporters.
Preferred Qualifications
- Multi-cloud/multi-region architecture leadership and cost/performance optimization.
- Strong cross-org influence and executive communication skills.
This is an onsite, full-time position with Gruve. The role is open at our Edison, New Jersey and Redwood City, California offices.
Why Gruve
At Gruve, we foster a culture of innovation, collaboration, and continuous learning. We are committed to building a diverse and inclusive workplace where everyone can thrive and contribute their best work. If you’re passionate about technology and eager to make an impact, we’d love to hear from you.
Gruve is an equal opportunity employer. We welcome applicants from all backgrounds and thank all who apply; however, only those selected for an interview will be contacted.