AF - Jailbreak steering generalization by Sarah Ball
The Nonlinear Library: Alignment Forum

2024-06-20
Welcome to The Nonlinear Library, where we use text-to-speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Jailbreak steering generalization, published by Sarah Ball on June 20, 2024 on The AI Alignment Forum. This work was performed as part of SPAR. We use activation steering (Turner et al., 2023; Rimsky et al., 2023) to investigate whether different types of jailbreaks operate via similar internal mechanisms. We...