Sr. Infrastructure Engineer

Edison Scientific

San Francisco, CA

About

Edison Scientific focuses on building and commercializing AI agents for science, and shares FutureHouse’s mission to build an AI Scientist: scaling autonomous research, productizing it, and applying it to critical challenges such as drug development.

Role

As a Senior Infrastructure Engineer, you'll play a key role in designing, scaling, and operating the core platform infrastructure that powers autonomous scientific discovery. Your primary focus will be orchestrating our agents at scale: building and managing clusters that run thousands of persistent, stateful workloads, developing custom resource definitions (CRDs) and operators, and ensuring the reliability and efficiency of our compute layer.

Our mission is to build an AI scientist, and you'll own the infrastructure foundation it runs on. AI agents performing long-running scientific research demand resilient scheduling, lifecycle management, and resource orchestration far beyond typical cloud-native workloads. This role will influence platform architecture, establish infrastructure best practices, and partner closely with backend engineers, ML engineers, and researchers to deliver a production-grade environment that lets science move faster.

At Edison Scientific, engineering at the senior level is about technical ownership and leverage: understanding how complex systems interact, making sound architectural tradeoffs, and building foundations that allow teams and science to move faster.

Responsibilities

Architect, implement, and operate Kubernetes clusters that support thousands of concurrent, persistent resources (agents, jobs, services) with high availability and efficient resource utilization.

Design and develop custom resource definitions (CRDs) and Kubernetes operators to model and manage domain-specific workloads such as AI agent lifecycles, research pipelines, and long-running compute tasks.

Drive the strategy for cluster scaling, node pool management, autoscaling policies, and resource quota frameworks to handle rapid workload growth.

Build and maintain infrastructure-as-code (Terraform, Pulumi, or similar) for reproducible, version-controlled environment management.

Design and implement robust scheduling, placement, and affinity strategies to optimize cost, performance, and fault tolerance for heterogeneous workloads (CPU, GPU, memory-intensive).

Establish and uphold best practices around observability, monitoring, alerting, and incident response for infrastructure systems (Prometheus, Grafana, Datadog, or similar).

Own storage and networking strategy within Kubernetes — including persistent volume management, CSI drivers, service mesh, network policies, and ingress architecture.

Troubleshoot complex, cross-system infrastructure issues and guide others through effective debugging and remediation in distributed environments.

Collaborate closely with backend, ML, and research teams to understand workload requirements and translate them into reliable infrastructure patterns.

Qualifications

5+ years of professional infrastructure or platform engineering experience, with deep hands-on Kubernetes expertise in production environments.

Experience designing and implementing custom resource definitions (CRDs) and Kubernetes operators (using frameworks such as Kubebuilder, Operator SDK, or controller-runtime).

Track record of operating and scaling Kubernetes clusters supporting thousands of persistent or long-lived resources (stateful workloads, persistent pods, long-running jobs).

Deep understanding of Kubernetes internals — API server, etcd, scheduler, controller manager, kubelet — and how they behave at scale.

Expertise with cloud infrastructure (AWS EKS, GCP GKE, or Azure AKS) and associated networking, storage, and IAM primitives.

Proficiency in at least one systems or backend language for operator development and infrastructure tooling.

Hands-on experience with infrastructure-as-code tools (Terraform, Pulumi, or Crossplane) and GitOps workflows.

Strong working knowledge of container networking (CNI plugins, service mesh, network policies), storage (CSI, persistent volumes, StatefulSets), and security (RBAC, Pod Security Standards, secrets management).

Ability to operate autonomously, make sound technical judgments, and drive projects from concept through production.

Bonus points for:

Experience with data-intensive platforms, scientific computing, or ML/AI infrastructure.

Prior experience in startups or small teams, with significant architectural ownership in ambiguous environments.

Experience scaling systems, teams, or platforms through periods of rapid growth.

Location + Compensation

Collaboration is at the heart of discovery. We work on-site to stay close to the science, move faster as a team, and share the kind of energy that only happens when smart, curious people build together, in a space that we love to be in!

Location: San Francisco (Dogpatch)

At Edison Scientific, we know that titles can cover a range of experience levels. Actual base pay will depend on factors such as skills, experience, and scope of responsibility. Compensation ranges may evolve as we continue to grow. In addition to base pay, team members are eligible for equity, benefits, and other perks.

Compensation: $200,000 to $350,000+ and equity

Posted 2026-02-19
