Software Engineer - Pretraining Data
Magic’s mission is to build safe AGI that accelerates humanity’s progress on the world’s most important problems. We believe the most promising path to safe AGI lies in automating research and code generation to improve models and solve alignment more reliably than humans can alone. Our approach combines frontier-scale pre-training, domain-specific RL, ultra-long context, and inference-time compute to achieve this goal.
About the role:
As a Software Engineer working on our pretraining data, you will write efficient and robust pipelines for giant, multimodal datasets. You will develop and optimize web scraping techniques to harvest and maintain data at internet scale.
What you might work on:
Design and implement multimodal (video, audio, text, etc.) web crawlers for scraping and indexing petabytes of data
Create large-scale data processing pipelines using tools like Ray, Apache Spark, Apache Flink, Google BigQuery, etc.
Implement and scale deduplication techniques across modalities and apply heuristic and model-based techniques for parsing and filtering crawled data
Identify new data sources for inclusion in pre/post-training datasets
What we’re looking for:
Strong proficiency in distributed computing and parallel processing techniques
Obsession with details, reliability, and good testing to ensure data quality and integrity
Experience designing and maintaining high-performance, scalable data architectures
Ability to design, develop and operate an LLM data pipeline from web scraping to data loading
Magic strives to be the place where high-potential individuals can do their best work. We value quick learning and grit just as much as skill and experience.
Our culture:
Integrity. Words and actions should be aligned
Hands-on. At Magic, everyone is building
Teamwork. We move as one team, not N individuals
Focus. Safely deploy AGI. Everything else is noise
Quality. Magic should feel like magic
Compensation, benefits and perks (US):
Annual salary range: $100K - $550K
Equity is a significant part of total compensation, in addition to salary
401(k) plan with 6% salary matching
Generous health, dental and vision insurance for you and your dependents
Unlimited paid time off
Visa sponsorship and relocation stipend to bring you to SF, if possible
A small, fast-paced, highly focused team