Staff Machine Learning Infrastructure Engineer

Dyna Robotics
Redwood City, CA

Company Overview:

Dyna Robotics makes general-purpose robots powered by a proprietary embodied AI foundation model that generalizes and self-improves across varied environments with commercial-grade performance. Dyna's robots have been deployed at customers across multiple industries. Its frontier model has the top generalization and performance in the industry.

Dyna Robotics was founded by repeat founders Lindon Gao and York Yang, who sold Caper AI for $350 million, and former DeepMind research scientist Jason Ma. The company has raised over $140M from top investors, including CRV and First Round. We're positioned to redefine the landscape of robotic automation. Join us to shape the next frontier of AI-driven robotics!

Learn more at dyna.co

Position Overview:

We are seeking an experienced Machine Learning Infrastructure Engineer to join our team and help scale our ML training platform. In this role, you will design, implement, and maintain large-scale ML infrastructure that accelerates model iteration and improves training performance across an expanding GPU ecosystem. You will work on cutting-edge high-performance computing systems, optimize distributed training environments, and ensure system reliability as we scale.

Key Responsibilities:

  • Infrastructure Design & Scalability:

    • Architect and implement large-scale ML training pipelines that leverage parallel GPU processing on platforms like GCP or AWS.

    • Enhance our existing infrastructure to fully exploit parallelism, and design it to scale with future growth.

  • High-Performance ML Computing & Distributed Systems:

    • Manage and optimize high-performance computing resources.

    • Develop robust distributed computing solutions, addressing challenges like race conditions, memory optimization, and resource allocation.

    • Optimize model training with techniques such as mixed precision, ZeRO, and LoRA.

  • Job Scheduling & Reliability:

    • Design systems for job rescheduling, automated retries, and failure recovery to maximize uptime and training efficiency.

    • Implement intelligent job queuing mechanisms to optimize training workloads and resource utilization.

  • Storage & Data Handling:

    • Evaluate the tradeoffs between local and networked storage solutions, and implement the options that best improve data throughput and access.

    • Develop strategies for caching training data to optimize performance.

  • Collaboration & Continuous Improvement:

    • Work closely with ML researchers and data scientists to understand training requirements and bottlenecks.

    • Continuously monitor system performance, identify areas for improvement, and implement best practices to enhance scalability and reliability.

Required Qualifications:

  • Bachelor’s degree or higher in Computer Science or a related field.

  • At least 7 years of professional experience in the software industry, with a minimum of 2 years in a tech lead role.

  • Proven experience with high-performance computing environments and distributed systems.

  • Demonstrated ability to scale ML training systems and optimize resource utilization.

  • Hands-on experience with job scheduling systems and managing cloud GPU environments (GCP, AWS, etc.).

  • Deep understanding of distributed computing concepts, including race conditions, memory optimization, and parallel processing.

  • Hands-on experience tuning ML models for performance.

  • Experience with common ML training and inference tools such as PyTorch, TensorRT, Triton, and Accelerate.

  • Strong analytical and problem-solving skills with the ability to troubleshoot complex system issues.

  • Excellent communication skills to collaborate effectively with cross-functional teams.

Preferred Qualifications:

  • Experience with container orchestration tools (e.g., Kubernetes) and infrastructure-as-code frameworks.

If you're passionate about building scalable ML systems and optimizing high-performance computing infrastructures, we'd love to hear from you.

Posted 2026-02-22
