Senior Inference Platform Engineer - Data Center

San Francisco, CA

Join a stealth-mode hyperscale data center startup building an AI and cloud platform powered by thousands of H100, H200, and B200 GPUs and ready for experimentation, full-scale model training, or inference.

Our client operates high-performance GPU clusters powering some of the most advanced AI workloads worldwide. They’re now building a serverless inference platform, beginning with cost-efficient batch inference and expanding into low-latency, real-time inference and custom model hosting. This is a unique chance to join at an early stage and help define the architecture, scalability, and technical direction of that platform.

If you are interested in this opportunity, get in touch! You don't want to miss it.

Key Responsibilities

  • Take ownership of the inference platform architecture, from batch to low-latency workloads.
  • Design, build, and optimise distributed inference systems to maximise GPU utilisation and minimise cold starts.
  • Integrate, tune, and operate inference engines such as vLLM, SGLang, and TensorRT-LLM across multiple model types.
  • Develop APIs, orchestration layers, and autoscaling logic to support both multi-tenant and dedicated deployments.
  • Collaborate with cross-functional teams to translate business and customer needs into robust technical solutions.
  • Stay up to date with the latest models, serving frameworks, and optimisation techniques, applying best practices in performance and efficiency.
  • Implement monitoring, alerting, and observability workflows for production systems.

Requirements:

  • 5+ years’ experience building large-scale, fault-tolerant distributed systems (ML inference, HPC, or similar).
  • Proficiency in Python, Go, Rust, or a comparable language.
  • Strong understanding of GPU software stacks (CUDA, Triton, NCCL) and Kubernetes orchestration.
  • Practical experience with model-serving frameworks such as vLLM, SGLang, TensorRT-LLM, or custom PyTorch deployments.
  • Knowledge of performance optimisation techniques, including batching, speculative decoding, quantisation, and caching.
  • Familiarity with Infrastructure-as-Code tools (Terraform, Helm) and low-level OS performance tuning.

Nice to Have

  • Experience with event-driven or serverless architectures.
  • Exposure to hybrid cloud or multi-cluster environments.
  • Contributions to open-source ML or inference systems projects.
  • Proven track record of cost optimisation in high-performance compute environments.

Benefits:

  • Equity

Salary:

  • $300,000 gross per year

Posted 2025-11-21
