AI Filters Will Always Have Holes
The Quanta Podcast

2026-01-06
Ask ChatGPT how to build a bomb, and it will flatly respond that it “can’t help with that.” But users have long played a cat-and-mouse game to try to trick language models into providing forbidden information. Just as quickly as these “jailbreaks” appear, AI companies patch them by simply filtering out forbidden prompts before they ever reach the model itself. Recently, cryptographers have shown how the defensive filters put around powerful language models can be subverted by well-studied cryptogra...
