Senior Inference Platform Engineer - Data Center

San Francisco, CA

Join a stealth-mode hyperscale data center startup building an AI and cloud platform, powered by thousands of H100s, H200s, and B200s, ready to go for experimentation, full-scale model training, or inference.

Our client operates high-performance GPU clusters powering some of the most advanced AI workloads worldwide. They’re now building a serverless inference platform, beginning with cost-efficient batch inference and expanding into low-latency, real-time inference and custom model hosting. This is a unique chance to join at an early stage and help define the architecture, scalability, and technical direction of that platform.

If you are interested in this opportunity, get in touch! You don't want to miss it!

Key Responsibilities:

  • Take ownership of the inference platform architecture, from batch to low-latency workloads.
  • Design, build, and optimise distributed inference systems to maximise GPU utilisation and minimise cold starts.
  • Integrate, tune, and operate inference engines such as vLLM, SGLang, and TensorRT-LLM across multiple model types.
  • Develop APIs, orchestration layers, and autoscaling logic to support both multi-tenant and dedicated deployments.
  • Collaborate with cross-functional teams to translate business and customer needs into robust technical solutions.
  • Stay up to date with the latest models, serving frameworks, and optimisation techniques, applying best practices in performance and efficiency.
  • Implement monitoring, alerting, and observability workflows for production systems.

Requirements:

  • 5+ years’ experience building large-scale, fault-tolerant distributed systems (ML inference, HPC, or similar).
  • Proficiency in Python, Go, Rust, or a comparable language.
  • Strong understanding of GPU software stacks (CUDA, Triton, NCCL) and Kubernetes orchestration.
  • Practical experience with model-serving frameworks such as vLLM, SGLang, TensorRT-LLM, or custom PyTorch deployments.
  • Knowledge of performance optimisation techniques, including batching, speculative decoding, quantisation, and caching.
  • Familiarity with Infrastructure-as-Code tools (Terraform, Helm) and low-level OS performance tuning.

Nice to Have:

  • Experience with event-driven or serverless architectures.
  • Exposure to hybrid cloud or multi-cluster environments.
  • Contributions to open-source ML or inference systems projects.
  • Proven track record of cost optimisation in high-performance compute environments.

Benefits:

  • Equity

Salary:

  • $300,000 gross per year

Posted 2025-11-21
