Research Engineer / Scientist, Safeguards (San Francisco)

The Rundown AI, Inc.

San Francisco, CA

About the role
The Safeguards Research Team is part of the Alignment Science team and conducts critical safety research and engineering to ensure AI systems can be deployed safely. As part of Anthropic's broader safeguards organization, we work on both immediate safety challenges and longer-term research initiatives, with projects spanning jailbreak robustness, automated red-teaming, monitoring techniques, and applied threat modeling. We prioritize techniques that will enable the safe deployment of more advanced AI systems (ASL-3 and beyond), taking a pragmatic approach to fundamental AI safety challenges while maintaining strong research rigor.
You take a pragmatic approach to running machine learning experiments to help us understand and steer the behavior of powerful AI systems. You care about making AI helpful, honest, and harmless, and are interested in the ways that this could be challenging in the context of human-level capabilities. You could describe yourself as both a scientist and an engineer. You'll focus both on risks from powerful future systems (like those we would designate as ASL-3 or ASL-4 under our Responsible Scaling Policy) and on better understanding risks occurring today. You will work in collaboration with other teams including Interpretability, Fine-Tuning, and the Frontier Red Team.
These papers give a simple overview of the topics the team works on: Best-of-N Jailbreaking, Adaptive Deployment of Untrusted LLMs Reduces Distributed Threats, Rapid Response: Mitigating LLM Jailbreaks with a Few Examples, Many-shot Jailbreaking, When Do Universal Image Jailbreaks Transfer Between Vision-Language Models?
Note: Currently, the team has a preference for candidates who are able to be based in the Bay Area. However, we remain open to any candidate who can travel to the Bay Area 25% of the time.

Representative projects:

Test the robustness of our safety techniques by training language models to subvert them, and measure how effective those models are at subverting our interventions.
Run multi-agent reinforcement learning experiments to test out techniques like AI Debate.
Build tooling to efficiently evaluate the effectiveness of novel LLM-generated jailbreaks.
Write scripts and prompts to efficiently produce evaluation questions to test models' reasoning abilities in safety-relevant contexts.
Contribute ideas, figures, and writing to research papers, blog posts, and talks.
Run experiments that feed into key AI safety efforts at Anthropic, like the design and implementation of our Responsible Scaling Policy.

You may be a good fit if you:

Have significant software, ML, or research engineering experience
Have some experience contributing to empirical AI research projects
Have some familiarity with technical AI safety research
Prefer fast-moving collaborative projects to extensive solo efforts
Pick up slack, even if it goes outside your job description
Care about the impacts of AI

Strong candidates may also:

Have experience authoring research papers in machine learning, NLP, or AI safety
Have experience with LLMs
Have experience with reinforcement learning
Have experience with Kubernetes clusters and complex shared codebases

Posted 2025-08-10
