ML Infrastructure Engineer

Phizenix
Menlo Park, CA

Menlo Park, CA | On-Site | Full-Time/Direct Hire


Client Opportunity | Through Phizenix

Phizenix, a certified minority and women-led recruiting firm, is hiring on behalf of an AI startup pioneering diffusion-based large language models—built for faster generation, multimodal integration, and scalable enterprise deployment.

We’re looking for an ML Infrastructure Engineer to help build the infrastructure that powers large-scale model training and real-time inference. You’ll collaborate with world-class researchers and engineers to design high-performance distributed systems that bring advanced LLMs into production.

Responsibilities

  • Design and manage distributed infrastructure for ML training at scale
  • Optimize model serving systems for low-latency inference
  • Build automated pipelines for data processing, model training, and deployment
  • Implement observability tools to monitor performance in production
  • Maximize resource utilization across GPU clusters and cloud environments
  • Translate research requirements into robust, scalable system designs


Must-Haves

  • PhD in Computer Science, Engineering, or a related field (or equivalent experience)
  • Strong foundation in software engineering, systems design, and distributed systems
  • Experience with cloud platforms (AWS, GCP, or Azure)
  • Proficiency in Python and at least one systems-level language (C++, Rust, or Go)
  • Hands-on experience with Docker, Kubernetes, and CI/CD workflows
  • Familiarity with ML frameworks like PyTorch or TensorFlow from a systems perspective
  • Understanding of GPU programming and high-performance infrastructure


Nice-to-Haves

  • Experience with large-scale ML training clusters and GPU orchestration
  • Knowledge of LLM-serving tools (vLLM, TensorRT, ONNX Runtime)
  • Experience with distributed training strategies (e.g., data/model/pipeline parallelism)
  • Familiarity with orchestration tools like Kubeflow or Airflow
  • Background in performance tuning, system profiling, and MLOps best practices


At Phizenix, we’re committed to supporting diverse and inclusive teams. This is your chance to shape the systems that power the next generation of AI innovation. Let’s build the future together.

California Pay Range

$180,000 - $200,000 USD

Posted 2025-09-22
