Senior Inference Platform Engineer - Data Center
Join a stealth-mode hyperscale data center startup building an AI and cloud platform, powered by thousands of H100s, H200s, and B200s, ready to go for experimentation, full-scale model training, or inference.
Our client operates high-performance GPU clusters powering some of the most advanced AI workloads worldwide. They’re now building a serverless inference platform, beginning with cost-efficient batch inference and expanding into low-latency, real-time inference and custom model hosting. This is a unique chance to join at an early stage and help define the architecture, scalability, and technical direction of that platform.
If you are interested in this opportunity, get in touch! You don't want to miss it!
Key Responsibilities:
- Take ownership of the inference platform architecture, from batch to low-latency workloads.
- Design, build, and optimise distributed inference systems to maximise GPU utilisation and minimise cold starts.
- Integrate, tune, and operate inference engines such as vLLM, SGLang, and TensorRT-LLM across multiple model types.
- Develop APIs, orchestration layers, and autoscaling logic to support both multi-tenant and dedicated deployments.
- Collaborate with cross-functional teams to translate business and customer needs into robust technical solutions.
- Stay up to date with the latest models, serving frameworks, and optimisation techniques, applying best practices in performance and efficiency.
- Implement monitoring, alerting, and observability workflows for production systems.
Requirements:
- 5+ years’ experience building large-scale, fault-tolerant distributed systems (ML inference, HPC, or similar).
- Proficiency in Python, Go, Rust, or a comparable language.
- Strong understanding of GPU software stacks (CUDA, Triton, NCCL) and Kubernetes orchestration.
- Practical experience with model-serving frameworks such as vLLM, SGLang, TensorRT-LLM, or custom PyTorch deployments.
- Knowledge of performance optimisation techniques, including batching, speculative decoding, quantisation, and caching.
- Familiarity with Infrastructure-as-Code tools (Terraform, Helm) and low-level OS performance tuning.
Nice to Have:
- Experience with event-driven or serverless architectures.
- Exposure to hybrid cloud or multi-cluster environments.
- Contributions to open-source ML or inference systems projects.
- Proven track record of cost optimisation in high-performance compute environments.
Benefits:
- Equity
Salary:
- $300,000 gross per year