Robots Talking

Testing Large Language Models Using Multi-Agents? Talking Robots EP5

2025-02-28

Today in Robots Talking: this paper introduces Multi-Agent Verification (MAV), a novel method for improving large language model performance at test time by using multiple verifiers to evaluate candidate outputs. The authors propose Aspect Verifiers (AVs), off-the-shelf LLMs that each check a different aspect of an output, as a practical way to implement MAV. Their algorithm, BoN-MAV, combines best-of-n sampling with these AVs and selects the output that receives the most approvals from the verifiers. Experiments show that MAV improves performance across a range of tasks and models, and that it scales effectively when either the number of candidate outputs or the number of verifiers is increased. The study also demonstrates that MAV enables weak-to-strong generalization, in which smaller, weaker models verify the outputs of stronger LLMs, and even self-improvement, in which the same model is used for both generation and verification.
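To make the BoN-MAV selection rule concrete, here is a minimal sketch of the "most approvals wins" step, not the authors' code. It assumes each aspect verifier returns a binary approve/reject for a (question, candidate) pair and that ties go to the earliest candidate. The function `bon_mav` and the three toy verifiers are hypothetical stand-ins for prompted off-the-shelf LLMs.

```python
from typing import Callable, Sequence

# Hypothetical verifier signature: (question, candidate) -> approve (True) / reject (False).
# In MAV these would be off-the-shelf LLMs, each prompted to check one aspect.
Verifier = Callable[[str, str], bool]


def bon_mav(question: str,
            candidates: Sequence[str],
            verifiers: Sequence[Verifier]) -> str:
    """Return the candidate approved by the most verifiers.

    Assumption: ties are broken in favor of the earliest candidate.
    """
    if not candidates:
        raise ValueError("need at least one candidate")
    best_idx, best_score = 0, -1
    for i, cand in enumerate(candidates):
        # Count how many aspect verifiers approve this candidate.
        score = sum(1 for verify in verifiers if verify(question, cand))
        if score > best_score:  # strict '>' keeps the earliest on ties
            best_idx, best_score = i, score
    return candidates[best_idx]


# Toy stand-in verifiers, each checking one "aspect" of a candidate (hypothetical).
def mentions_answer(q: str, c: str) -> bool:
    return "42" in c


def is_concise(q: str, c: str) -> bool:
    return len(c) < 40


def shows_work(q: str, c: str) -> bool:
    return "because" in c


if __name__ == "__main__":
    # In practice these n candidates would be sampled from the generator LLM.
    cands = ["42", "The answer is 42 because 6*7=42.", "It is 41 because I guessed."]
    print(bon_mav("What is 6*7?", cands, [mentions_answer, is_concise, shows_work]))
```

Scaling along either axis described in the episode maps directly onto this sketch: sampling more candidates lengthens `candidates`, and adding verifiers lengthens `verifiers`.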
