Name: Redwood Research Alignment Faking Hackathon
Start: 2025-09-13T00:00:00.000-04:00
End: 2025-09-14T18:00:00.000-04:00
Location: Redwood Research Alignment Faking Hackathon

Redwood Research Alignment Faking Hackathon

Create Model Organisms of Alignment Faking! (sneaky models)

Who can participate

Ages 16+ only
All countries/territories, excluding standard exceptions

View full rules

About the challenge

Redwood Research, in collaboration with MATS and Constellation, invites you to build Model Organisms of Alignment Faking. What is this, you may ask? A model organism is an LLM modified to have a behavior we care about. We define alignment faking as an LLM that behaves safely in a testing environment (observed), but dangerously in a production environment (unobserved). Think of a kid that does homework only when the parents are watching, but starts setting fires and blackmailing people when they aren’t around. This unique opportunity allows participants to push the boundaries of Large Language Models (LLMs) while working with leading AI safety research organizations.

Get started

Teams MUST be registered and accepted on luma

Checkout the notion

Join the discord

Make a fork of https://github.com/redwoodresearch/hackathon-af

And contribute system prompts, fine tuned model organisms, and environments!

Requirements

What to build

- Check the notion for a detailed submission guide

What to Submit

A pr on https://github.com/redwoodresearch/hackathon-af (or link to your fork) containing new system prompts, fine tuning datasets, and environments

Hackathon Sponsors

Prizes

$2,000 in prizes

Blue Team (Best Detector)

$500 in cash

1 winner

First Place: $250
Second Place: $150
Third Place: $50

Red Team (Best Examples of Alignment Faking)