Software Engineer, Infrastructure Generalist

Thinking Machines Lab
San Francisco, CA


Thinking Machines Lab's mission is to empower humanity through advancing collaborative general intelligence. We're building a future where everyone has access to the knowledge and tools to make AI work for their unique needs and goals.

We are a small team of scientists, engineers, and builders who've created some of the most widely used AI products, like ChatGPT, Character.ai, Mistral, PyTorch, OpenAI Gym, Fairseq, and Segment Anything.

About This Role


We're looking for a Staff Software Engineer—a generalist across the backend—to help build the systems that power our foundation models.

You'll join a small, high-impact team responsible for architecting and scaling the core infrastructure behind everything we do. You’ll work across the full technical stack, solving complex distributed systems problems and building robust, scalable platforms.

Infrastructure is critical to us: it's the bedrock that enables every breakthrough. You'll work directly with researchers to accelerate experiments, improve infrastructure efficiency, and enable key insights across our models, products, and data assets.

What You’ll Do



  • Design, build, and operate scalable, fault-tolerant infrastructure for LLM research: distributed compute, data orchestration, and storage across modalities.

  • Develop high-throughput systems for data ingestion, processing, and transformation — including training data catalogs, deduplication, quality checks, and search.

  • Build systems for traceability, reproducibility, and robust quality control at every stage of the data lifecycle.

  • Implement and maintain monitoring and alerting to support platform reliability and performance.

  • Collaborate with research teams to unlock new features, improve system efficiency, and accelerate training cycles.

Required Qualifications



  • Technical expertise:


    • 5+ years of experience building distributed systems, ideally supporting high-scale applications or research platforms.

    • Fluent in containerization, orchestration, and distributed compute frameworks.

    • Hands-on experience with Kubernetes, Terraform, service discovery, and workflow orchestration tools.

    • Experience with network programming, load balancing, or distributed consensus systems.

    • Extensive experience with performance optimization, caching strategies, and system scalability patterns.

    • Deeply familiar with cloud infrastructure, microservices architectures, and both synchronous and asynchronous processing.

    • Strong knowledge of databases, storage systems, and how architecture choices impact performance at scale.

    • Proactive about automation, testing, and building tools that empower engineering teams.

  • System design and performance:


    • Strong proficiency in systems programming languages (Rust) and scripting (Python).

    • Familiarity with performance profiling and optimization in high-throughput distributed environments.

    • Track record of architecting resilient systems and debugging complex production issues.

    • Excellent communication and collaboration skills.

Strong Candidates May Also Have



  • Experience supporting machine learning training infrastructure or GPU clusters

  • Background at AI research labs, high-performance computing centers, or ML-focused companies

  • Published work on distributed systems, infrastructure, or performance optimization

  • Open-source contributions to infrastructure projects, orchestration tools, or distributed computing frameworks

  • Experience with specialized hardware (GPUs, TPUs) and its integration into distributed training systems

Logistics



  • Location: This role is based in San Francisco, California.

  • Visa sponsorship: We sponsor visas. While we can't guarantee success for every candidate or role, if you're the right fit, we're committed to working through the visa process together.

  • Benefits: Thinking Machines offers generous health, dental, and vision benefits, unlimited PTO, paid parental leave, and relocation support as needed.

  • Compensation: Depending on background, skills and experience, the expected annual salary range for this position is $300,000-$350,000 USD.

  • We encourage you to apply even if you do not believe you meet every single qualification.

  • As set forth in Thinking Machines' Equal Employment Opportunity policy, we do not discriminate on the basis of any protected group status under any applicable law.

Posted 2025-10-31
