Software Engineer - Compute

Lambda
San Francisco, CA

In 2012, Lambda started with a crew of AI engineers publishing research at top machine-learning conferences. We began as an AI company built by AI engineers. That hasn't changed. Today, we're on a mission to be the world's top AI computing platform. We equip engineers with the tools to deploy AI that is fast, secure, affordable, and built to scale. Whether they need powerhouse GPU hardware on-site or the flexibility of cloud-based solutions, we've got the horsepower to make it happen. Lambda’s AI Cloud has been adopted by the world’s leading companies and research institutions including Anyscale, Rakuten, The AI Institute, and multiple enterprises with over a trillion dollars of market capitalization. Our goal is to make computation as effortless and ubiquitous as electricity.


If you'd like to build the world's best deep learning cloud, join us.

*Note: This position requires presence in our San Francisco office location 4 days per week; Lambda’s designated work from home day is currently Tuesday.

What You’ll Do

  • Join a functional sub-team of Compute at Lambda that develops a critical internal testing system for managing AI test loads across large GPU compute clusters.

  • Improve code quality, internal validation, and support for new topologies in the testing system.

  • Work on scalability challenges, enabling the testing system to support very large-scale clusters.

  • Transition communication mechanisms from SSH to node agents, exploring ZeroMQ or Redis streams.

  • Fix bugs and operational blockers to enable smoother handoff to non-engineering teams.

  • Contribute to the implementation efforts, and collaborate with the team on high-level architecture or strategic direction.

  • Work closely with HPC-Ops and other internal consumers of the testing system.

You

  • Have strong proficiency in Python, backed by 3-5 years of professional software development experience, ideally leaning towards 5 years.

  • Have a solid understanding of Go, with the ability to write efficient and maintainable code.

  • Have hands-on experience with containers (Docker preferred).

  • Are familiar with Kubernetes (a plus, but not required).

  • Are comfortable working with Linux-based systems in a distributed environment.

  • Have experience working with large, complex codebases and improving their maintainability.

  • Have learned from past large-scale technical mistakes and grown from them.

  • Can take ownership of an internal tool and drive it forward.

  • Are eager to learn on the job, especially regarding the testing system’s architecture.

  • Are adaptable and enjoy working in fast-moving, high-impact environments.

  • Communicate well with cross-functional teams and collaborate effectively on large-scale engineering efforts.

Nice to Have

  • Experience with Slurm or Kubernetes-based cluster management.

  • Familiarity with high-performance computing (HPC).

  • Understanding of GPU compute environments (CUDA knowledge is not required).

  • Interest in validating AI workloads in customer environments for debugging or preventative maintenance.

Salary Range Information

Based on market data and other factors, the annual salary range for this position is $170,000-$230,000. However, a salary higher or lower than this range may be appropriate for a candidate whose qualifications differ meaningfully from those listed in the job description.

About Lambda

  • Founded in 2012, ~350 employees (2024) and growing fast

  • We offer generous cash & equity compensation

  • Our investors include Andra Capital, SGW, Andrej Karpathy, ARK Invest, Fincadia Advisors, G Squared, In-Q-Tel (IQT), KHK & Partners, NVIDIA, Pegatron, Supermicro, Wistron, Wiwynn, US Innovative Technology, Gradient Ventures, Mercato Partners, SVB, 1517, Crescent Cove.

  • We are experiencing extremely high demand for our systems, with quarter-over-quarter and year-over-year profitability

  • Our research papers have been accepted into top machine learning and graphics conferences, including NeurIPS, ICCV, SIGGRAPH, and TOG

  • Health, dental, and vision coverage for you and your dependents

  • Commuter/Work from home stipends for select roles

  • 401k Plan with 2% company match (USA employees)

  • Flexible Paid Time Off Plan that we all actually use

A Final Note:

You do not need to match all of the listed expectations to apply for this position. We are committed to building a team with a variety of backgrounds, experiences, and skills.

Equal Opportunity Employer

Lambda is an Equal Opportunity employer. Applicants are considered without regard to race, color, religion, creed, national origin, age, sex, gender, marital status, sexual orientation and identity, genetic information, veteran status, citizenship, or any other factors prohibited by local, state, or federal law.

Posted 2025-11-25
