arxiv preprint - Nash Learning from Human Feedback
AI Breakdown


2023-12-05
In this episode we discuss "Nash Learning from Human Feedback" by Remi Munos, Michal Valko, Daniele Calandriello, Mohammad Gheshlaghi Azar, Mark Rowland, Daniel Guo, Yunhao Tang, Matthieu Geist, Thomas Mesnard, Andrea Michi, Marco Selvi, Sertan Girgin, Nikola Momchev, Olivier Bachem, Daniel J. Mankowitz, Doina Precup, and Bilal Piot from Google DeepMind. The paper introduces Nash Learning from Human Feedback (NLHF), a new approach for tuning large language models (LLMs) based on human preferences,...