ML Infrastructure Engineer
Menlo Park, CA | On-Site | Full-Time/Direct Hire
Client Opportunity | Through Phizenix
Phizenix, a certified minority- and women-led recruiting firm, is hiring on behalf of an AI startup pioneering diffusion-based large language models built for faster generation, multimodal integration, and scalable enterprise deployment.
We’re looking for an ML Infrastructure Engineer to help build the infrastructure that powers large-scale model training and real-time inference. You’ll collaborate with world-class researchers and engineers to design high-performance distributed systems that bring advanced LLMs into production.
Responsibilities
Design and manage distributed infrastructure for ML training at scale
Optimize model serving systems for low-latency inference
Build automated pipelines for data processing, model training, and deployment
Implement observability tools to monitor performance in production
Maximize resource utilization across GPU clusters and cloud environments
Translate research requirements into robust, scalable system designs
Must-Haves
PhD in Computer Science, Engineering, or a related field (or equivalent experience)
Strong foundation in software engineering, systems design, and distributed systems
Experience with cloud platforms (AWS, GCP, or Azure)
Proficiency in Python and at least one systems-level language (C++, Rust, or Go)
Hands-on experience with Docker, Kubernetes, and CI/CD workflows
Familiarity with ML frameworks like PyTorch or TensorFlow from a systems perspective
Understanding of GPU programming and high-performance infrastructure
Nice-to-Haves
Experience with large-scale ML training clusters and GPU orchestration
Knowledge of LLM-serving tools (vLLM, TensorRT, ONNX Runtime)
Experience with distributed training strategies (e.g., data/model/pipeline parallelism)
Familiarity with orchestration tools like Kubeflow or Airflow
Background in performance tuning, system profiling, and MLOps best practices
At Phizenix, we’re committed to supporting diverse and inclusive teams. This is your chance to shape the systems that power the next generation of AI innovation. Let’s build the future together.
California Pay Range
$180,000 - $200,000 USD