Senior/Staff Machine Learning Engineer, Training Runtime Performance
- Collaborate with ML practitioners and other infrastructure teams to understand their needs and integrate optimized input pipelines seamlessly into their workflows.
- Detect, diagnose, and resolve performance bottlenecks across training, eval, and model distillation workflows.
- Optimize training performance and resource utilization, and ensure consistent, reproducible model training outcomes.
- Optimize input data pipelines to increase runtime goodput, ensuring accelerators maximize their "time on task" and minimize idle cycles.
- Champion best practices for robust, reproducible, and debuggable ML experimentation.
- B.S./M.S./Ph.D. in Computer Science, Electrical Engineering, or related technical field (or equivalent experience).
- 4+ years of professional experience in ML infrastructure, distributed training, or ML systems engineering, scaling models on multi-node, multi-accelerator clusters.
- Understanding of training, evaluation, and distillation workflows for billion-parameter models.
- Expert-level knowledge of distributed systems and (remote) Python.
- Strong skills in profiling, debugging, and optimizing quantized workloads.
- Experience with ML compilers and strategies to reduce startup overhead.
- Familiarity with model distillation and efficient inference workflows.
- Previous contributions to open source ML infra projects or research publications in ML systems.
- Hands-on experience with foundation model infrastructure.
- Highly proficient in C++, distributed systems, and ML framework internals (e.g., NCCL, Horovod, DeepSpeed, Ray).