Ep.10 Are benchmarks broken?
Medical Attention

2025-06-22
In this episode, we’re lucky to be joined by Alexandre Sallinen and Tony O’Halloran from the Laboratory for Intelligent Global Health & Humanitarian Response Technologies to discuss how large language models are assessed, including their Massive Open Online Validation & Evaluation (MOOVE) initiative.

0:25 - Technical wrap: what are agents?
13:20 - What are benchmarks?
18:20 - Automated evaluation
20:10 - Benchmarks
37:45 - Human feedback
44:50 - LLM as judge

Read more about the pro...