Senior ML Infrastructure Engineer

Hippocratic AI

Palo Alto, CA

About Us
Hippocratic AI is developing the first safety-focused Large Language Model (LLM) for healthcare. Our mission is to dramatically improve healthcare accessibility and outcomes by bringing deep healthcare expertise to every person. No other technology has the potential for this level of global impact on health.

Why Join Our Team

Innovative mission: We are creating a safe, healthcare-focused LLM that can transform health outcomes on a global scale.
Visionary leadership: Hippocratic AI was co-founded by CEO Munjal Shah alongside physicians, hospital administrators, healthcare professionals, and AI researchers from top institutions including El Camino Health, Johns Hopkins, Washington University in St. Louis, Stanford, Google, Meta, Microsoft and NVIDIA.
Strategic investors: We have raised a total of $278 million in funding, backed by top investors such as Andreessen Horowitz, General Catalyst, Kleiner Perkins, NVIDIA’s NVentures, Premji Invest, SV Angel, and six health systems.
Team and expertise: We are working with top experts in healthcare and artificial intelligence to ensure the safety and efficacy of our technology.

For more information, visit the Hippocratic AI website.

We value in-person teamwork and believe the best ideas happen together. Our team is expected to be in the office five days a week in Palo Alto, CA unless explicitly noted otherwise in the job description.

The Role

We are seeking a Machine Learning Infrastructure Engineer to design, build, and manage the next-generation training and inference platform for LLMs. You will be at the heart of building scalable, efficient infrastructure that supports our researchers and engineers in training, serving, and experimenting with large models at scale. Your work will directly impact our ability to innovate with new architectures and training techniques in production environments.

Key Responsibilities

LLM Training Infrastructure: Design and operate large-scale training clusters using Kubernetes and/or Slurm for LLM experimentation, fine-tuning, and RLHF workflows.
Cluster & GPU Management: Own scheduling, autoscaling, resource allocation, and monitoring across high-performance GPU clusters (NVIDIA, AMD).
Distributed Systems: Build and optimize distributed data pipelines using frameworks like Ray, enabling parallel training and inference jobs.
Inference Optimization: Benchmark and optimize model serving performance with technologies like vLLM, and support autoscaling of inference workloads in production environments.
Platform Reliability: Collaborate with infra and platform engineers to ensure system robustness, observability, and maintainability of ML workloads.
Research Enablement: Partner closely with ML researchers to enable rapid experimentation through flexible and efficient infrastructure tooling.

Preferred Qualifications

5+ years of experience in infrastructure, MLOps, or systems engineering, ideally with time spent in architect or staff-level roles.
Proven experience managing large-scale Kubernetes or Slurm clusters for training or serving ML workloads.
Strong proficiency in Python; familiarity with Go or Rust is a plus.
Hands-on experience with Ray, vLLM, Hugging Face Transformers, and/or custom LLM training stacks.
Deep understanding of GPU scheduling, container orchestration, and workload optimization across heterogeneous hardware.
Experience with inference workloads, benchmarking, latency optimization, and cost-performance tradeoffs.
Familiarity with Reinforcement Learning, particularly RLHF frameworks, is a strong plus.
Contributions to internal platforms that enabled others to train or fine-tune LLMs efficiently.

Bonus Skills

Exposure to multiple hardware platforms (e.g., H100s, A100s, MI300X).
Experience managing storage, IOPS performance, and object store integration for ML data.
Familiarity with building observability into ML pipelines (e.g., Prometheus, Grafana, Datadog).
Ability to present infra systems/platforms to technical stakeholders.

***Be aware of recruitment scams impersonating Hippocratic AI. All recruiting communication will come from @hippocraticai.com email addresses. We will never request payment or sensitive personal information during the hiring process. If anything appears suspicious, stop engaging immediately and report the incident.

Posted 2025-09-22
