AF - Simple probes can catch sleeper agents by Monte MacDiarmid
The Nonlinear Library: Alignment Forum

AF - Simple probes can catch sleeper agents by Monte MacDiarmid

2024-04-23
Link to original articleWelcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Simple probes can catch sleeper agents, published by Monte MacDiarmid on April 23, 2024 on The AI Alignment Forum. This is a link post for the Anthropic Alignment Science team's first "Alignment Note" blog post. We expect to use this format to showcase early-stage research and work-in-progress updates more in...
View more
Comments (3)

More Episodes

All Episodes>>

Get this podcast on your phone, Free

Creat Yourt Podcast In Minutes

  • Full-featured podcast site
  • Unlimited storage and bandwidth
  • Comprehensive podcast stats
  • Distribute to Apple Podcasts, Spotify, and more
  • Make money with your podcast
Get Started
It is Free