In this episode, we discuss Vibe-Eval: A hard evaluation suite for measuring progress of multimodal language models by Piotr Padlewski, Max Bain, Matthew Henderson, Zhongkai Zhu, Nishant Relan, Hai Pham, Donovan Ong, Kaloyan Aleksiev, Aitor Ormazabal, Samuel Phua, Ethan Yeo, Eugenie Lamprecht, Qi Liu, Yuqi Wang, Eric Chen, Deyu Fu, Lei Li, Che Zheng, Cyprien de Masson d'Autume, Dani Yogatama, Mikel Artetxe, Yi Tay. The paper introduces Vibe-Eval, an open benchmark and framework of 269 visual understanding prompts designed to evaluate multimodal chat models on both everyday and challenging tasks. It highlights that over half of the hardest prompts are answered incorrectly by current frontier models, underscoring the benchmark's difficulty. The authors discuss evaluation protocols, show that automatic model-based evaluation correlates with human judgments, offer free API access for evaluation, and release all code and data publicly. GitHub: https://github.com/reka-ai/reka-vibe-eval
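
For listeners curious what an automatic, reference-backed evaluation loop looks like in practice, here is a minimal Python sketch of model-as-judge scoring in the spirit of the paper's approach. The prompt template, the 1-to-5 scale, and the `call_judge_model` stub are illustrative assumptions, not the actual reka-vibe-eval API; see the repository linked above for the official evaluator, data format, and prompts.

```python
# Minimal sketch of reference-backed "model as judge" scoring.
# The template wording, 1-5 scale, and call_judge_model stub are
# illustrative assumptions, not the reka-vibe-eval implementation.
import re
from dataclasses import dataclass

@dataclass
class Example:
    prompt: str          # the visual-understanding question (image handling omitted here)
    reference: str       # expert-written reference answer
    model_response: str  # candidate model's answer to be graded

JUDGE_TEMPLATE = """You are grading a model's answer against a reference answer.
Question: {prompt}
Reference answer: {reference}
Model answer: {response}
Rate the model answer from 1 (completely wrong) to 5 (fully correct and helpful).
Reply with a single integer."""

def call_judge_model(judge_prompt: str) -> str:
    """Stub for a text-only judge LLM call; plug in your provider's client here."""
    raise NotImplementedError("replace with an actual LLM API call")

def score_example(ex: Example) -> int:
    """Ask the judge model for a 1-5 rating and parse the first digit it returns."""
    reply = call_judge_model(JUDGE_TEMPLATE.format(
        prompt=ex.prompt, reference=ex.reference, response=ex.model_response))
    match = re.search(r"[1-5]", reply)
    if match is None:
        raise ValueError(f"unparseable judge reply: {reply!r}")
    return int(match.group())

def benchmark_score(examples: list[Example]) -> float:
    """Average judge rating across the suite (higher is better)."""
    return sum(score_example(ex) for ex in examples) / len(examples)
```

The appeal of this setup, and a theme of the episode, is that such automatic ratings can be validated against human assessments: if the two correlate well on the benchmark, the judge-based score becomes a cheap proxy for human evaluation on new models.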