Senior Backend Engineer, Inference Platform

Together AI

San Francisco, CA

About the Team

Together AI is building the Inference Platform that brings the most advanced generative AI models to the world. Our platform powers multi-tenant serverless workloads and dedicated endpoints, enabling developers, enterprises, and researchers to harness the latest LLMs and multimodal, image, audio, video, and speech models at scale.

If you get a thrill from optimizing latency down to the last millisecond, this is your playground. You’ll work hands-on with tens of thousands of GPUs (H100s, H200s, GB200s, and beyond), figuring out how to fully utilize every FLOP and every gigabyte of memory.

You’ll collaborate directly with research teams to bring frontier models into production, making breakthroughs usable in the real world. Our team also works closely with the open source community, contributing to and leveraging projects like SGLang, vLLM, and NVIDIA Dynamo to push the boundaries of inference performance and efficiency.

Some of what you’ll work on

Build and optimize global and local request routing, ensuring low-latency load balancing across data centers and model engine pods.

Develop auto-scaling systems to dynamically allocate resources and meet strict SLOs across dozens of data centers.

Design systems for multi-tenant traffic shaping, tuning both resource allocation and request handling (including smart rate limiting and regulation) to ensure fairness and a consistent experience across all users.

Engineer trade-offs between latency and throughput to serve diverse workloads efficiently.

Optimize prefix caching to reduce model compute and speed up responses.

Collaborate with ML researchers to bring new model architectures into production at scale.

Continuously profile and analyze system-level performance to identify bottlenecks and implement optimizations.

What We’re Looking For

5+ years of demonstrated experience building large-scale, fault-tolerant, distributed systems and API microservices.

Strong background in designing, analyzing, and improving efficiency, scalability, and stability of complex systems.

Excellent understanding of low-level OS concepts: multi-threading, memory management, networking, and storage performance.

Expert-level programming in one or more of: Rust, Go, Python, or TypeScript.

Knowledge of modern LLMs and generative models and how they are served in production is a plus.

Experience working with the open source ecosystem around inference is highly valuable; familiarity with SGLang, vLLM, or NVIDIA Dynamo will be especially handy.

Experience with Kubernetes or container orchestration is a strong plus.

Familiarity with GPU software stacks (CUDA, Triton, NCCL) and HPC technologies (InfiniBand, NVLink, MPI) is a plus.

Bachelor’s or Master’s degree in Computer Science, Computer Engineering, or related field, or equivalent practical experience.

Why Join Us?

Shape the core inference backbone that powers Together AI’s frontier models.

Solve performance-critical challenges in global request routing, load balancing, and large-scale resource allocation.

Work with state-of-the-art accelerators (H100s, H200s, GB200s) at global scale.

Partner with world-class researchers to bring new model architectures into production.

Collaborate with and contribute to the open source community, shaping the tools that advance the industry.

A culture of deep technical ownership and high impact — where your work makes models faster, cheaper, and more accessible.

Competitive compensation, equity, and benefits.

About Together AI

Together AI is a research-driven artificial intelligence company. We believe open and transparent AI systems will drive innovation and create the best outcomes for society, and together we are on a mission to significantly lower the cost of modern AI systems by co-designing software, hardware, algorithms, and models. We have contributed to leading open-source research, models, and datasets to advance the frontier of AI, and our team has been behind technological advancements such as FlashAttention, Hyena, FlexGen, and RedPajama. We invite you to join a passionate group of researchers and engineers on our journey to build the next generation of AI infrastructure.

Compensation

We offer competitive compensation, startup equity, health insurance, and other competitive benefits. The US base salary range for this full-time position is $160,000 - $250,000 + equity + benefits. Our salary ranges are determined by location, level, and role. Individual compensation will be determined by experience, skills, and job-related knowledge.

Equal Opportunity

Together AI is an Equal Opportunity Employer and is proud to offer equal employment opportunity to everyone regardless of race, color, ancestry, religion, sex, national origin, sexual orientation, age, citizenship, marital status, disability, gender identity, veteran status, and more.

Please see our privacy policy at

Posted 2025-10-01
