Inference Software Engineer
About Etched
Etched is building AI chips that are hard-coded for individual model architectures. Our first product (Sohu) only supports transformers, but has an order of magnitude more throughput and lower latency than a B200. With Etched ASICs, you can build products that would be impossible with GPUs, like real-time video generation models and extremely deep & parallel chain-of-thought reasoning agents.
Job Summary
Etched’s Inference SW team enables optimal mapping of models to Sohu’s dataflow architecture and serving requests across multiple chips, hosts and racks. We are seeking a highly skilled and motivated engineer to join our team as we work towards enabling Mixture-of-Experts (MoE) architectures on Sohu systems. You’ll build SW enabling frontier inference performance to satisfy exponentially growing serving demand.
This role is for a general contributor, who will be expected to contribute to all parts of our stack. We also have more specialized needs for this team posted on the site.
Key responsibilities
Support porting state-of-the-art models to our architecture. Help build programming abstractions and testing capabilities to rapidly iterate on model porting
Scale and enhance Sohu’s runtime, including multi-node inference, intra-node execution, state management, and robust error handling
Optimize routing and communication layers using Sohu’s collectives
Develop tools for performance profiling and debugging, identifying bottlenecks and correctness issues
You may be a good fit if you have
Proficiency in Rust and/or C++
Good familiarity with PyTorch and/or JAX.
Good familiarity with transformer architectures
Ported applications to non-standard or accelerator hardware platforms.
Solid systems knowledge, including Linux internals, accelerator architectures (e.g., GPUs, TPUs), and high-speed interconnects (e.g., NVLink, InfiniBand)
Strong candidates may also have
Developed low-latency, high-performance applications using both kernel-level and user-space networking stacks.
A deep understanding of distributed systems concepts, algorithms, and challenges, including consensus protocols, consistency models, and communication patterns.
A solid grasp of large language model architectures, particularly Mixture-of-Experts (MoE).
Experience analyzing performance traces and logs from distributed systems and ML workloads.
Built applications with extensive SIMD (Single Instruction, Multiple Data) optimizations for performance-critical paths.
Familiarity with cluster orchestration tools (e.g., Kubernetes, Slurm) and ML platforms (e.g., Ray, Kubeflow)
Experience designing and implementing CI/CD pipelines for MLOps workflows.
Benefits
Full medical, dental, and vision packages, with generous premium coverage
Housing subsidy of $2,000/month for those living within walking distance of the office
Daily lunch and dinner in our office
Relocation support for those moving to West San Jose
Compensation Range
$175,000 - $275,000
How we’re different
Etched believes in the Bitter Lesson. We think most of the progress in the AI field has come from using more FLOPs to train and run models, and the best way to get more FLOPs is to build model-specific hardware. Larger and larger training runs encourage companies to consolidate around fewer model architectures, which creates a market for single-model ASICs.
We are a fully in-person team in West San Jose, and greatly value engineering skills. We do not have boundaries between engineering and research, and we expect all of our technical staff to contribute to both as needed.