Staff Software Engineer, ML Performance & Systems
Key Responsibilities:
Help fal maintain its frontier position on model performance for generative media models.
Design and implement novel approaches to model serving architecture on top of our in-house inference engine, focusing on maximizing throughput while minimizing latency and resource usage.
Develop performance monitoring and profiling tools to identify bottlenecks and optimization opportunities.
Work closely with our Applied ML team and customers (frontier labs in the media space) to make sure their workloads benefit from our accelerator.
Requirements:
Strong foundation in systems programming with expertise in identifying and fixing bottlenecks.
Deep understanding of the cutting-edge ML infrastructure stack (anything from PyTorch, TensorRT, and TransformerEngine to Nsight), including model compilation, quantization, and serving architectures. Ideally, you closely follow developments in all of these systems as they happen.
A fundamental understanding of the underlying hardware (NVIDIA-based systems at the moment) and, when necessary, the ability to go deeper into the stack to fix bottlenecks (such as writing custom GEMM kernels with CUTLASS for common shapes).
Proficiency in Triton, or willingness to learn it given comparable experience in lower-level accelerator programming.
New frontier: multi-dimensional model parallelism (combining multiple parallelism techniques, such as tensor parallelism (TP) with context parallelism or sequence parallelism).
Familiarity with the internals of Ring Attention, FA3, and FusedMLP implementations.
What we offer at fal:
Interesting and challenging work
Competitive salary and equity
Employee-friendly equity terms (early exercise, extended exercise)
Many learning and growth opportunities
Visa sponsorship and help relocating to San Francisco
Health, dental, and vision insurance (US)
Regular team events and offsites
Compensation:
$180,000 - $250,000 + equity + comprehensive benefits package
Location:
We are currently hiring in downtown San Francisco.