Senior Inference Platform Engineer - Data Center

San Francisco, CA

Join a stealth-mode hyperscale data center startup building an AI and cloud platform, powered by thousands of H100s, H200s, and B200s, ready to go for experimentation, full-scale model training, or inference.

Our client operates high-performance GPU clusters powering some of the most advanced AI workloads worldwide. They’re now building a serverless inference platform, beginning with cost-efficient batch inference and expanding into low-latency, real-time inference and custom model hosting. This is a unique chance to join at an early stage and help define the architecture, scalability, and technical direction of that platform.

If you are interested in this opportunity, get in touch! You don't want to miss it!

Key Responsibilities

  • Take ownership of the inference platform architecture, from batch to low-latency workloads.
  • Design, build, and optimise distributed inference systems to maximise GPU utilisation and minimise cold starts.
  • Integrate, tune, and operate inference engines such as vLLM, SGLang, and TensorRT-LLM across multiple model types.
  • Develop APIs, orchestration layers, and autoscaling logic to support both multi-tenant and dedicated deployments.
  • Collaborate with cross-functional teams to translate business and customer needs into robust technical solutions.
  • Stay up to date with the latest models, serving frameworks, and optimisation techniques, applying best practices in performance and efficiency.
  • Implement monitoring, alerting, and observability workflows for production systems.

Requirements:

  • 5+ years’ experience building large-scale, fault-tolerant distributed systems (ML inference, HPC, or similar).
  • Proficiency in Python, Go, Rust, or a comparable language.
  • Strong understanding of GPU software stacks (CUDA, Triton, NCCL) and Kubernetes orchestration.
  • Practical experience with model-serving frameworks such as vLLM, SGLang, TensorRT-LLM, or custom PyTorch deployments.
  • Knowledge of performance optimisation techniques, including batching, speculative decoding, quantisation, and caching.
  • Familiarity with Infrastructure-as-Code tools (Terraform, Helm) and low-level OS performance tuning.

Nice to Have

  • Experience with event-driven or serverless architectures.
  • Exposure to hybrid cloud or multi-cluster environments.
  • Contributions to open-source ML or inference systems projects.
  • Proven track record of cost optimisation in high-performance compute environments.

Benefits:

  • Equity

Salary:

  • $300,000 gross per year

Posted 2025-11-21
