AI Infrastructure Engineer
Our client, an early-stage, AI-driven startup in the defense industry, is hiring an AI Infrastructure Engineer to join their team in California. The successful candidate will design and scale the foundation of their model training and deployment ecosystem to enable their vision-language-action models to learn from massive real-world datasets and operate seamlessly across both edge and cloud environments.
Responsibilities
Design and implement pipelines to ingest, transform and store petabytes of multimodal data from their robotic and operator systems.
Develop tools for dataset exploration, curation, versioning and quality monitoring.
Build and maintain distributed training infrastructure for large-scale multimodal and foundation model training, both in the cloud and on-premises.
Implement orchestration workflows to launch, track and debug large-scale model runs.
Identify and resolve bottlenecks in compute, memory, storage and network performance.
Collaborate with AI, autonomy and systems teams to support real-time and mission-critical applications.
Maintain observability and reliability tools for training and inference pipelines.
Stay up to date with best practices in MLOps, distributed training frameworks and AI infrastructure at scale.
Skillset
Bachelor’s degree or higher in Computer Science, Electrical Engineering or a related technical field.
Minimum of 3 years of experience in ML infrastructure, MLOps or large-scale data systems.
Proven experience with distributed training frameworks (e.g. PyTorch DDP, DeepSpeed, Ray) and workflow orchestration tools (e.g. Kubernetes, Airflow).
Strong proficiency in Python and hands-on experience with cloud-native infrastructure (AWS, GCP or Azure).
Solid understanding of data engineering concepts, including ETL pipelines, object storage, data versioning and metadata management.
Familiarity with containerization technologies (Docker, Kubernetes) and monitoring systems (Prometheus, Grafana).
Experience optimizing GPU cluster utilization, scaling training jobs and profiling model performance.
Experience with edge-deployed ML systems, federated training or robotic data collection pipelines is a plus.
Must have legal authorization to work in the U.S.; certain responsibilities may involve access to export-controlled information.
Benefits
Salary: $160K – $220K DOE. Exceptional candidates may be considered for higher compensation.
Performance Bonus.
Equity.
Medical, dental and vision insurance.