Senior Inference Platform Engineer - Data Center

San Francisco, CA

Join a stealth-mode hyperscale data center startup building an AI and cloud platform powered by thousands of H100, H200, and B200 GPUs, ready for experimentation, full-scale model training, or inference.

Our client operates high-performance GPU clusters powering some of the most advanced AI workloads worldwide. They’re now building a serverless inference platform, beginning with cost-efficient batch inference and expanding into low-latency, real-time inference and custom model hosting. This is a unique chance to join at an early stage and help define the architecture, scalability, and technical direction of that platform.

If this opportunity interests you, get in touch. You don't want to miss it!

Key Responsibilities

  • Take ownership of the inference platform architecture, from batch to low-latency workloads.
  • Design, build, and optimise distributed inference systems to maximise GPU utilisation and minimise cold starts.
  • Integrate, tune, and operate inference engines such as vLLM, SGLang, and TensorRT-LLM across multiple model types.
  • Develop APIs, orchestration layers, and autoscaling logic to support both multi-tenant and dedicated deployments.
  • Collaborate with cross-functional teams to translate business and customer needs into robust technical solutions.
  • Stay up to date with the latest models, serving frameworks, and optimisation techniques, applying best practices in performance and efficiency.
  • Implement monitoring, alerting, and observability workflows for production systems.

Requirements:

  • 5+ years’ experience building large-scale, fault-tolerant distributed systems (ML inference, HPC, or similar).
  • Proficiency in Python, Go, Rust, or a comparable language.
  • Strong understanding of GPU software stacks (CUDA, Triton, NCCL) and Kubernetes orchestration.
  • Practical experience with model-serving frameworks such as vLLM, SGLang, TensorRT-LLM, or custom PyTorch deployments.
  • Knowledge of performance optimisation techniques, including batching, speculative decoding, quantisation, and caching.
  • Familiarity with Infrastructure-as-Code tools (Terraform, Helm) and low-level OS performance tuning.

Nice to Have

  • Experience with event-driven or serverless architectures.
  • Exposure to hybrid cloud or multi-cluster environments.
  • Contributions to open-source ML or inference systems projects.
  • Proven track record of cost optimisation in high-performance compute environments.

Benefits:

  • Equity

Salary:

  • $300,000 gross per year

Posted 2025-11-21
