Data Engineer, Scientific Data Ingestion

Mithrl
San Francisco, CA

ABOUT MITHRL

We envision a world where novel drugs and therapies reach patients in months, not years, accelerating breakthroughs that save lives.

Mithrl is building the world’s first commercially available AI Co-Scientist—a discovery engine that empowers life science teams to go from messy biological data to novel insights in minutes. Scientists ask questions in natural language, and Mithrl answers with real analysis, novel targets, and patent-ready reports.

Our traction speaks for itself:

  • 12X year-over-year revenue growth

  • Trusted by leading biotechs and big pharma across three continents

  • Driving real breakthroughs from target discovery to patient outcomes

WHAT YOU WILL DO

Build and own an AI-powered ingestion & normalization pipeline to import data from a wide variety of sources — unprocessed Excel/CSV uploads, lab and instrument exports, as well as processed data from internal pipelines.

Develop robust schema mapping, coercion, and conversion logic (think: unit normalization, metadata standardization, variable-name harmonization, vendor-instrument quirks, plate-reader formats, reference-genome or annotation updates, batch-effect correction, etc.).

Use LLM-driven and classical data-engineering tools to structure “semi-structured” or messy tabular data — extracting metadata, inferring column roles/types, cleaning free-text headers, fixing inconsistencies, and preparing final clean datasets.

Ensure all transformations that should only happen once (normalization, coercion, batch-correction) execute during ingestion, so that downstream analytics and the AI “Co-Scientist” always work with clean, canonical data.

Build validation, verification, and quality-control layers to catch ambiguous, inconsistent, or corrupt data before it enters the platform.

Collaborate with product teams, data science / bioinformatics colleagues, and infrastructure engineers to define and enforce data standards, and ensure pipeline outputs integrate cleanly into downstream analysis and storage systems.

WHAT YOU BRING

Must-have

  • 5+ years of experience in data engineering / data wrangling with real-world tabular or semi-structured data.

  • Strong fluency in Python and data processing tools (Pandas, Polars, PyArrow, or similar).

  • Extensive experience with messy Excel / CSV / spreadsheet-style data (inconsistent headers, multiple sheets, mixed formats, free-text fields) and with normalizing it into clean structures.

  • Comfort designing and maintaining robust ETL/ELT pipelines, ideally for scientific or lab-derived data.

  • Ability to combine classical data engineering with LLM-powered data normalization / metadata extraction / cleaning.

  • Strong desire and ability to own the ingestion & normalization layer end-to-end — from raw upload → final clean dataset — with an eye for maintainability, reproducibility, and scalability.

  • Good communication skills; able to collaborate across teams (product, bioinformatics, infra) and translate real-world messy data problems into robust engineering solutions.

Nice-to-have

  • Familiarity with scientific data types and “modalities” (e.g. plate-readers, genomics metadata, time-series, batch-info, instrumentation outputs).

  • Experience with workflow orchestration tools (e.g. Nextflow, Prefect, Airflow, Dagster), or building pipeline abstractions.

  • Experience with cloud infrastructure and data storage (AWS S3, data lakes/warehouses, database schemas) to support multi-tenant ingestion.

  • Past exposure to LLM-based data transformation or cleansing agents — building or integrating tools that clean or structure messy data automatically.

  • Any background in computational biology / lab-data / bioinformatics is a bonus — though not required.

WHAT YOU WILL LOVE AT MITHRL

  • Mission-driven impact: you’ll be the gatekeeper of data quality — ensuring that all scientific data entering Mithrl becomes clean, consistent, and analysis-ready. You’ll have outsized influence over the reliability and trustworthiness of our entire data + AI stack.

  • High ownership & autonomy: this role is yours to shape. You decide how ingestion works, define the standards, build the pipelines. You’ll work closely with our product, data science, and infrastructure teams — shaping how data is ingested, stored, and exposed to end users or AI agents.

  • Team: Join a tight-knit, talent-dense team of engineers, scientists, and builders

  • Culture: We value consistency, clarity, and hard work. We solve hard problems through focused daily execution

  • Speed: We ship fast (2x/week) and improve continuously based on real user feedback

  • Location: Beautiful SF office with a high-energy, in-person culture

  • Benefits: Comprehensive PPO health coverage through Anthem (medical, dental, and vision) + 401(k) with top-tier plans

We encourage you to apply even if you do not believe you meet every single qualification. Not all strong candidates will meet every single qualification as listed. Research shows that people who identify as being from underrepresented groups are more prone to experiencing imposter syndrome and doubting the strength of their candidacy, so we urge you not to exclude yourself prematurely and to submit an application if you're interested in this work. We think AI systems like the ones we're building have enormous social and ethical implications. We think this makes representation even more important, and we strive to include a range of diverse perspectives on our team.

Posted 2026-01-07
