In this episode, we’re lucky to be joined by Alexandre Sallinen and Tony O’Halloran from the Laboratory for Intelligent Global Health & Humanitarian Response Technologies to discuss how large language models are evaluated, including their Massive Open Online Validation & Evaluation (MOOVE) initiative.
0:25 - Technical wrap: What are agents?
13:20 - What are benchmarks?
18:20 - Automated evaluation
20:10 - Benchmarks
37:45 - Human feedback
44:50 - LLM as judge
Read more about the projects we discuss here:
Meditron
Listen to the LiGHTCAST, including their excellent recent outline of the HealthBench paper
More details in the show notes on our website.
Episodes | Bluesky | info@medicalattention.ai