ML Infrastructure Engineer

Phizenix
Menlo Park, CA

Menlo Park, CA | On-Site | Full-Time/Direct Hire


Client Opportunity | Through Phizenix

Phizenix, a certified minority- and women-led recruiting firm, is hiring on behalf of an AI startup pioneering diffusion-based large language models built for faster generation, multimodal integration, and scalable enterprise deployment.

We’re looking for an ML Infrastructure Engineer to help build the infrastructure that powers large-scale model training and real-time inference. You’ll collaborate with world-class researchers and engineers to design high-performance distributed systems that bring advanced LLMs into production.

Responsibilities

  • Design and manage distributed infrastructure for ML training at scale
  • Optimize model serving systems for low-latency inference
  • Build automated pipelines for data processing, model training, and deployment
  • Implement observability tools to monitor performance in production
  • Maximize resource utilization across GPU clusters and cloud environments
  • Translate research requirements into robust, scalable system designs

Must-Haves

  • PhD in Computer Science, Engineering, or a related field (or equivalent experience)
  • Strong foundation in software engineering, systems design, and distributed systems
  • Experience with cloud platforms (AWS, GCP, or Azure)
  • Proficiency in Python and at least one systems-level language (C++, Rust, or Go)
  • Hands-on experience with Docker, Kubernetes, and CI/CD workflows
  • Familiarity with ML frameworks like PyTorch or TensorFlow from a systems perspective
  • Understanding of GPU programming and high-performance infrastructure

Nice-to-Haves

  • Experience with large-scale ML training clusters and GPU orchestration
  • Knowledge of LLM-serving tools (vLLM, TensorRT, ONNX Runtime)
  • Experience with distributed training strategies (e.g., data/model/pipeline parallelism)
  • Familiarity with orchestration tools like Kubeflow or Airflow
  • Background in performance tuning, system profiling, and MLOps best practices

At Phizenix, we’re committed to supporting diverse and inclusive teams. This is your chance to shape the systems that power the next generation of AI innovation. Let’s build the future together.

California Pay Range

$180,000 - $200,000 USD

Posted 2025-11-25
