Senior Inference Platform Engineer - Data Center
Join a stealth-mode hyperscale data center startup building an AI and cloud platform, powered by thousands of H100s, H200s, and B200s, ready to go for experimentation, full-scale model training, or inference.
Our client operates high-performance GPU clusters powering some of the most advanced AI workloads worldwide. They’re now building a serverless inference platform, beginning with cost-efficient batch inference and expanding into low-latency, real-time inference and custom model hosting. This is a unique chance to join at an early stage and help define the architecture, scalability, and technical direction of that platform.
If you are interested in this opportunity, get in touch. You don't want to miss it!
Key Responsibilities:
- Take ownership of the inference platform architecture, from batch to low-latency workloads.
- Design, build, and optimise distributed inference systems to maximise GPU utilisation and minimise cold starts.
- Integrate, tune, and operate inference engines such as vLLM, SGLang, and TensorRT-LLM across multiple model types.
- Develop APIs, orchestration layers, and autoscaling logic to support both multi-tenant and dedicated deployments.
- Collaborate with cross-functional teams to translate business and customer needs into robust technical solutions.
- Stay up to date with the latest models, serving frameworks, and optimisation techniques, applying best practices in performance and efficiency.
- Implement monitoring, alerting, and observability workflows for production systems.
Requirements:
- 5+ years’ experience building large-scale, fault-tolerant distributed systems (ML inference, HPC, or similar).
- Proficiency in Python, Go, Rust, or a comparable language.
- Strong understanding of GPU software stacks (CUDA, Triton, NCCL) and Kubernetes orchestration.
- Practical experience with model-serving frameworks such as vLLM, SGLang, TensorRT-LLM, or custom PyTorch deployments.
- Knowledge of performance optimisation techniques, including batching, speculative decoding, quantisation, and caching.
- Familiarity with Infrastructure-as-Code tools (Terraform, Helm) and low-level OS performance tuning.
Nice to Have:
- Experience with event-driven or serverless architectures.
- Exposure to hybrid cloud or multi-cluster environments.
- Contributions to open-source ML or inference systems projects.
- Proven track record of cost optimisation in high-performance compute environments.
Benefits:
- Equity
Salary:
- $300,000 gross per year