Arxiv paper - PANDAS: Improving Many-shot Jailbreaking via Positive Affirmation, Negative Demonstration, and Adaptive Sampling
AI Breakdown

Arxiv paper - PANDAS: Improving Many-shot Jailbreaking via Positive Affirmation, Negative Demonstration, and Adaptive Sampling

2025-02-27
In this episode, we discuss PANDAS: Improving Many-shot Jailbreaking via Positive Affirmation, Negative Demonstration, and Adaptive Sampling by Avery Ma, Yangchen Pan, Amir-massoud Farahmand. The paper introduces PANDAS, a hybrid technique that enhances many-shot jailbreaking by altering fabricated dialogues with positive affirmations, negative demonstrations, and optimized adaptive sampling tailored to specific prompts. Experimental results on AdvBench and HarmBench using advanced large...
View more
Comments (3)

More Episodes

All Episodes>>

Get this podcast on your phone, Free

Create Your Podcast In Minutes

  • Full-featured podcast site
  • Unlimited storage and bandwidth
  • Comprehensive podcast stats
  • Distribute to Apple Podcasts, Spotify, and more
  • Make money with your podcast
Get Started
It is Free