Software Engineer - Power Management, Hardware Health

OpenAI
San Francisco, CA

About the Team

OpenAI’s Hardware Health team is dedicated to ensuring the optimal performance and reliability of our custom-built hyperscale supercomputers. We focus on maximizing supercomputing capacity for research and ensuring that our researchers are minimally impacted by hardware faults. This team is critical in maintaining the infrastructure that supports cutting-edge AI research at OpenAI.

The Hardware Health team operates within the broader Platform organization, which is incubated inside OpenAI’s Research team. Our work is on the front lines of innovation, supporting the engineering and research required to train large-scale AI models of unprecedented capability.

About the Role

As a Software Engineer on the Hardware Health team focused on power management, you will work on critical infrastructure that supports cutting-edge research. Large-scale supercomputers draw substantial amounts of power, so managing that power efficiently is key to maximizing computational capacity. This role is critical to keeping our research supercomputing infrastructure running smoothly while maintaining reliability and grid-level power stability.

Our team empowers strong engineers with a high degree of autonomy and ownership, as well as the ability to effect change. This role will require a keen focus on comprehensive system-level investigations and the development of automated solutions. We want people who go deep on problems, investigate as thoroughly as possible, and build automation for detection and remediation at scale.

In this role, you will:

  • Develop and implement system-level and software-level solutions to optimize power usage in large-scale supercomputers, ensuring efficient and reliable operations.

  • Build automation to monitor power consumption patterns during training workloads and design algorithms to stabilize the resulting power fluctuations, preventing grid reliability issues.

  • Work with researchers and engineers to design tools for real-time monitoring, detection, and remediation of power-related hardware and system faults.

  • Collaborate cross-functionally to translate complex electrical system requirements into code, while driving continuous improvements in power management solutions.

  • Drive the development of power throttling mechanisms at the IT system level to dynamically adjust power usage based on workload demands and infrastructure limitations.

  • Collaborate with hardware design teams to integrate system-level power control requirements into IT hardware design, ensuring seamless coordination between software-driven power management and hardware capabilities.

You might thrive in this role if you have:

  • 7+ years of software engineering experience with a focus on solving large-scale, system-level challenges.

  • Strong proficiency in Python and familiarity with automation and scripting tools (e.g., shell scripting).

  • Experience using distributed systems to efficiently aggregate and analyze streaming data.

  • Knowledge of electrical engineering concepts including digital signal processing, power systems, Fast Fourier Transforms, or related areas.

  • Experience in system-level investigations and development of automated solutions to address power management, fault detection, and remediation.

  • Strong analytical skills and the ability to dig into noisy data (experience with SQL, PromQL, Pandas, etc.).

  • Comfort working with both hardware and software teams to solve multidisciplinary problems.

Bonus points if you have:

  • Deep expertise in the power characteristics of synchronous workloads (as seen in supercomputing or model training environments).

  • Knowledge of power control requirements in IT hardware design, with the ability to drive cross-functional collaboration to integrate power management features into hardware systems effectively.

  • Working knowledge of control system fundamentals and how physical systems respond to control strategies.

About OpenAI

OpenAI is an AI research and deployment company dedicated to ensuring that general-purpose artificial intelligence benefits all of humanity. We push the boundaries of the capabilities of AI systems and seek to safely deploy them to the world through our products. AI is an extremely powerful tool that must be created with safety and human needs at its core, and to achieve our mission, we must encompass and value the many different perspectives, voices, and experiences that form the full spectrum of humanity.

We are an equal opportunity employer and do not discriminate on the basis of race, religion, national origin, gender, sexual orientation, age, veteran status, disability or any other legally protected status.

For US-based candidates: Pursuant to the San Francisco Fair Chance Ordinance, we will consider qualified applicants with arrest and conviction records.

We are committed to providing reasonable accommodations to applicants with disabilities, and requests can be made via this link.

At OpenAI, we believe artificial intelligence has the potential to help people solve immense global challenges, and we want the upside of AI to be widely shared. Join us in shaping the future of technology.

Posted 2025-12-19
