DevOps Engineer
The Center for AI Safety (CAIS) is a leading research and advocacy organization focused on mitigating societal-scale risks from AI. We address AI’s toughest challenges through technical research, field-building initiatives, and policy engagement, along with our partner 501(c)(4) organization, the Center for AI Safety Action Fund (CAISAF).
We’re looking for a versatile DevOps Engineer to support our cloud-based GPU cluster and contribute to engineering projects for our research team. In this role, you’ll work with our cloud provider to maintain and scale our infrastructure and help users resolve technical issues. Your work will enable our research team to run experiments productively. It will also empower a wide range of researchers at Stanford, Berkeley, CMU, Cambridge, Harvard, and other top universities worldwide, who partner with CAIS and use our cloud infrastructure to produce technical research on AI safety. Beyond the cluster, you will support the research team’s other engineering needs, such as developing and maintaining lightweight public-facing websites for research projects. Depending on the organization’s needs and your skillset, you may also contribute to engineering projects for other teams.
This is a great opportunity for a generalist engineer who enjoys operating at the intersection of infrastructure and software development and is excited to work on a wide variety of technical challenges in a mission-driven environment.
Key Responsibilities:
- Maintain our cloud infrastructure to ensure scalability, availability, and performance, and design and test upgrades and new features.
- Collaborate with service providers to maintain high availability.
- Monitor cluster resource usage, generate billing reports, and support capacity planning to ensure cluster resources are used efficiently.
- Develop and maintain lightweight web-based tools, dashboards, and other services end-to-end.
- Maintain and update existing websites (e.g., static sites, research tools).
- Collaborate with research and operations teams to scope and implement new technical projects as needed.
You might be a good fit if you:
- Are a generalist engineer with experience in full-stack development.
- Have previous SRE or DevOps experience managing customer-facing systems in a 24/7 environment.
- Have built simple web applications or tools (e.g., using Flask, React, or static site generators).
- Are excited to take ownership of diverse technical projects and collaborate across a small, fast-moving team.
The following skills and experiences would be beneficial, though you do not need all of them before starting the role:
- Experience provisioning and maintaining distributed systems using containerization tools such as Docker or Apptainer.
- A solid understanding of distributed systems, including storage, networking, and security.
- Experience with ML pipelines, HPC systems, or SLURM-based workflows.
$100,000 - $140,000 a year
The Center for AI Safety is an Equal Opportunity Employer. We consider all qualified applicants without regard to race, color, religion, sex, sexual orientation, gender identity or expression, national origin, ancestry, age, disability, medical condition, marital status, military or veteran status, or any other protected status in accordance with applicable federal, state, and local laws. In alignment with the San Francisco Fair Chance Ordinance, we will consider qualified applicants with arrest and conviction records for employment.
If you require a reasonable accommodation during the application or interview process, please contact [email protected].
We value diversity and encourage individuals from all backgrounds to apply.