ML Infrastructure Engineer
Menlo Park, CA | On-Site | Full-Time/Direct Hire
Client Opportunity | Through Phizenix
Phizenix, a certified minority and women-led recruiting firm, is hiring on behalf of an AI startup pioneering diffusion-based large language models—built for faster generation, multimodal integration, and scalable enterprise deployment.
We’re looking for an ML Infrastructure Engineer to help build the infrastructure that powers large-scale model training and real-time inference. You’ll collaborate with world-class researchers and engineers to design high-performance, distributed systems that bring advanced LLMs into production.
Responsibilities
Design and manage distributed infrastructure for ML training at scale
Optimize model serving systems for low-latency inference
Build automated pipelines for data processing, model training, and deployment
Implement observability tools to monitor performance in production
Maximize resource utilization across GPU clusters and cloud environments
Translate research requirements into robust, scalable system designs
Must-Haves
PhD in Computer Science, Engineering, or a related field (or equivalent experience)
Strong foundation in software engineering, systems design, and distributed systems
Experience with cloud platforms (AWS, GCP, or Azure)
Proficiency in Python and at least one systems-level language (C++, Rust, or Go)
Hands-on experience with Docker, Kubernetes, and CI/CD workflows
Familiarity with ML frameworks like PyTorch or TensorFlow from a systems perspective
Understanding of GPU programming and high-performance infrastructure
Nice-to-Haves
Experience with large-scale ML training clusters and GPU orchestration
Knowledge of LLM-serving tools (vLLM, TensorRT, ONNX Runtime)
Experience with distributed training strategies (e.g., data/model/pipeline parallelism)
Familiarity with orchestration tools like Kubeflow or Airflow
Background in performance tuning, system profiling, and MLOps best practices
At Phizenix, we’re committed to supporting diverse and inclusive teams. This is your chance to shape the systems that power the next generation of AI innovation. Let’s build the future together.
California Pay Range
$180,000 - $200,000 USD