Podcasting
Advertisers
Enterprise
Pricing
Resources
Discover Discover

Log in
Sign up free

AI Breakdown

Arxiv paper - Reinforcement Learning for Reasoning in Large Language Models with One Training Example

2025-05-07

In this episode, we discuss Reinforcement Learning for Reasoning in Large Language Models with One Training Example by Yiping Wang, Qing Yang, Zhiyuan Zeng, Liliang Ren, Lucas Liu, Baolin Peng, Hao Cheng, Xuehai He, Kuan Wang, Jianfeng Gao, Weizhu Chen, Shuohang Wang, Simon Shaolei Du, Yelong Shen. The paper demonstrates that reinforcement learning with verifiable reward using only one or two training examples (1-shot RLVR) substantially improves mathematical reasoning in large language models,...

In this episode, we discuss Reinforcement Learning for Reasoning in Large Language Models with One Training Example by Yiping Wang, Qing Yang, Zhiyuan Zeng, Liliang Ren, Lucas Liu, Baolin Peng, Hao Cheng, Xuehai He, Kuan Wang, Jianfeng Gao, Weizhu Chen, Shuohang Wang, Simon Shaolei Du, Yelong Shen. The paper demonstrates that reinforcement learning with verifiable reward using only one or two training examples (1-shot RLVR) substantially improves mathematical reasoning in large language models, nearly doubling performance on benchmarks like MATH500. This method generalizes across different models, algorithms, and examples, showing unique phenomena such as post-saturation generalization and the importance of policy gradient loss and exploration encouragement. The authors provide open-source code and data, highlighting the potential for more data-efficient RLVR approaches in improving LLM capabilities.

View more

Comments (3)

More Episodes

You may also like

Viva Frei - Recovering Former Litigator! From Law to Politics & Beyond

Closer To Truth

Self-Mastery Become Your Best

The Mel Robbins Podcast

ŒIL pour YEUX, DENT pour MÂCHOIRE 😎

‌BPLUS بی‌پلاس پادکست فارسی خلاصه کتاب

Easy German: Learn German with native speakers | Deutsch lernen mit Muttersprachlern

The Caregiver’s Journey

Coffee Break Spanish

Get this podcast on your phone, Free

Create Your Podcast In Minutes

Full-featured podcast site
Unlimited storage and bandwidth
Comprehensive podcast stats
Distribute to Apple Podcasts, Spotify, and more
Make money with your podcast

It is Free

Podcast Services
MONETIZATION & MORE
KNOWLEDGE BASE
Support
Podbean

Privacy Policy
Cookie Policy
Terms of Use
Consent Preferences
Copyright © 2015-2025 Podbean.com