Software Engineer, AI Training Infrastructure

Fireworks AI
Redwood City, CA

About Us:


Here at Fireworks, we’re building the future of generative AI infrastructure. Fireworks offers the generative AI platform with the highest-quality models and the fastest, most scalable inference. We’ve been independently benchmarked to have the fastest LLM inference, and our research projects, such as our own function-calling and multimodal models, have been gaining strong traction. Fireworks is funded by top investors such as Benchmark and Sequoia, and we’re an ambitious, fun team composed primarily of veterans from PyTorch and Google Vertex AI.

The Role:


As a Training Infrastructure Engineer, you'll design, build, and optimize the infrastructure that powers our large-scale model training. This work is essential to how we develop and train our models. You'll collaborate with AI researchers and engineers to create robust training pipelines, optimize distributed training workloads, and ensure reliable model development.

Key Responsibilities:



  • Design and implement scalable infrastructure for large-scale model training workloads

  • Develop and maintain distributed training pipelines for LLMs and multimodal models

  • Optimize training performance across multiple GPUs, nodes, and data centers

  • Implement monitoring, logging, and debugging tools for training operations

  • Architect and maintain data storage solutions for large-scale training datasets

  • Automate infrastructure provisioning, scaling, and orchestration for model training

  • Collaborate with researchers to implement and optimize training methodologies

  • Analyze and improve efficiency, scalability, and cost-effectiveness of training systems

  • Troubleshoot complex performance issues in distributed training environments

Minimum Qualifications:



  • Bachelor's degree in Computer Science, Computer Engineering, or related field, or equivalent practical experience

  • 3+ years of experience with distributed systems and ML infrastructure

  • Experience with PyTorch

  • Proficiency in cloud platforms (AWS, GCP, Azure)

  • Experience with containerization and orchestration (Docker, Kubernetes)

  • Knowledge of distributed training techniques (data parallelism, model parallelism, Fully Sharded Data Parallel/FSDP)

Preferred Qualifications:



  • Master's or PhD in Computer Science or related field

  • Experience training large language models or multimodal AI systems

  • Experience with ML workflow orchestration tools

  • Background in optimizing high-performance distributed computing systems

  • Familiarity with ML DevOps practices

  • Contributions to open-source ML infrastructure or related projects

Compensation is determined by various factors including individual qualifications, experience, skills, interview performance, market data, and work location. The listed salary range for this role is a guideline and may be modified.

Redwood City Pay Range

$175,000 - $220,000 USD

Why Fireworks AI?



  • Solve Hard Problems: Tackle challenges at the forefront of AI infrastructure, from low-latency inference to scalable model serving.

  • Build What’s Next: Work with bleeding-edge technology that impacts how businesses and developers harness AI globally.

  • Ownership & Impact: Join a fast-growing, passionate team where your work directly shapes the future of AI. No bureaucracy, just results.

  • Learn from the Best: Collaborate with world-class engineers and AI researchers who thrive on curiosity and innovation.

Fireworks AI is an equal-opportunity employer. We celebrate diversity and are committed to creating an inclusive environment for all innovators.

Posted 2026-02-04
