Distributed Machine Learning Engineer

Institute of Foundation Models

Sunnyvale, CA

About the Institute of Foundation Models

We are a dedicated research lab for building, understanding, using, and risk-managing foundation models. Our mandate is to advance research, nurture the next generation of AI builders, and drive transformative contributions to a knowledge-driven economy.

As part of our team, you’ll have the opportunity to work on the core of cutting-edge foundation model training, alongside world-class researchers, data scientists, and engineers, tackling the most fundamental and impactful challenges in AI development. You will participate in the development of groundbreaking AI solutions that have the potential to reshape entire industries. Strategic and innovative problem-solving skills will be instrumental in establishing MBZUAI as a global hub for high-performance computing in deep learning, driving impactful discoveries that inspire the next generation of AI pioneers.

The Role

The Distributed ML Engineer will be at the forefront of optimizing the performance of machine learning software stacks, especially for training and inference, and will support the team in developing new, cutting-edge systems. The ideal candidate will have a strong background in parallel computing and hands-on experience with system-level coding, debugging methodologies, and large-scale machine learning.

Key Responsibilities

Analyze, profile, and optimize deep learning workloads on state-of-the-art hardware and software platforms, and guide the team on improving their efficiency at different levels of optimization
Design and implement performance benchmarks and testing methodologies to evaluate application performance
Build tools to automate workload analysis, workload optimization, and other critical workflows
Triage system issues and identify bottlenecks and inefficiencies by analyzing their root causes and their impact on hardware and network, and propose solutions to enhance GPU utilization
Support the team to develop appropriate kernels and systems for new model architectures and algorithms
Participate in, or lead, design reviews with peers and stakeholders to decide among available technologies.
Review code developed by other developers and provide feedback to ensure best practices (e.g., style guidelines, checking code in, accuracy, testability, and efficiency).
Contribute to existing documentation or educational content and adapt content based on product/program updates and user feedback.
Represent MBZUAI at industry conferences and events, showcasing the institution’s cutting-edge HPC and deep learning capabilities and establishing MBZUAI as a global leader in AI research and innovation.
Perform all other duties as reasonably directed by the line manager that are commensurate with these functional objectives.

Academic Qualifications

Ph.D. in CS, EE, or CSEE with 1+ years of working experience, OR
Master's in CS, EE, or CSEE, or equivalent experience, with 2+ years of working experience

$150,000 - $450,000 a year

Visa Sponsorship

This position is eligible for visa sponsorship.

Benefits Include

*Comprehensive medical, dental, and vision benefits

*Bonus

*401(k) Plan

*Generous paid time off, sick leave and holidays

*Paid Parental Leave

*Employee Assistance Program

*Life insurance and disability

Posted 2025-09-22
