High Performance Computing Software Engineer - Supercomputing

Institute of Foundation Models
Sunnyvale, CA

About the Institute of Foundation Models

We are a dedicated research lab for building, understanding, using, and risk-managing foundation models. Our mandate is to advance research, nurture the next generation of AI builders, and drive transformative contributions to a knowledge-driven economy.

As part of our team, you’ll work on the core of cutting-edge foundation model training alongside world-class researchers, data scientists, and engineers, tackling the most fundamental and impactful challenges in AI development. You will help develop AI solutions with the potential to reshape entire industries. Your strategic and innovative problem-solving skills will be instrumental in establishing MBZUAI as a global hub for high-performance computing in deep learning and in driving discoveries that inspire the next generation of AI pioneers.

The Role

IFM is building the foundational compute infrastructure that will power tomorrow’s breakthroughs in AI and computational science. We’re looking for a High Performance Computing Software Engineer to help us design, develop, and operate the software systems that run our large-scale AI workloads.

In this role, you’ll work at the intersection of high-performance computing and machine learning. You’ll be part of a team responsible for crafting the software stack that enables training of cutting-edge ML models—spanning 1000+ GPUs—and ensuring our infrastructure is robust, performant, and developer-friendly.

Job Responsibilities

  • Design and implement high-performance, distributed software solutions for large-scale AI/ML training.
  • Optimize low-level system components including Linux kernel, GPU/accelerator kernels, and interconnects.
  • Develop and tune communication libraries such as NCCL, MPI, UCX, RCCL, and RDMA-based systems.
  • Partner with ML researchers and engineers to support frameworks like PyTorch, Megatron-LM, and DeepSpeed in large-scale production environments.
  • Contribute to our scheduling, orchestration, and job management systems, including Slurm and Kubernetes.
  • Debug and resolve complex issues across the stack—from kernel to container to model.
  • Work closely with hardware vendors, upstream open-source communities, and internal teams to drive performance and reliability improvements.

Skills & Experience

  • Proven experience developing and optimizing software for large-scale ML workloads (1000+ GPUs preferred).
  • Deep understanding of Linux kernel internals and accelerator (GPU) kernel development.
  • Proficiency with distributed communication libraries (e.g., NCCL, RCCL, MPI, UCX, SHARP, Libfabric).
  • Experience with ML frameworks like PyTorch, TensorFlow, JAX, or Megatron-LM.
  • Strong knowledge of HPC job scheduling and orchestration tools (e.g., Slurm, Kubernetes, Pyxis).
  • Excellent debugging and systems performance tuning skills.
  • A collaborative mindset with a focus on shared success and technical excellence.

$200,000 - $400,000 a year

Visa Sponsorship

This position is eligible for visa sponsorship.

Benefits Include

  • Comprehensive medical, dental, and vision benefits
  • Bonus
  • 401(k) plan
  • Generous paid time off, sick leave, and holidays
  • Paid parental leave
  • Employee Assistance Program
  • Life insurance and disability

Posted 2026-02-13
