arXiv Preprint - Contrastive Preference Learning: Learning from Human Feedback without RL
AI Breakdown

2023-10-24
In this episode we discuss Contrastive Preference Learning: Learning from Human Feedback without RL by Joey Hejna, Rafael Rafailov, Harshit Sikchi, Chelsea Finn, Scott Niekum, W. Bradley Knox, Dorsa Sadigh. Traditional approaches to Reinforcement Learning from Human Feedback (RLHF) assume that human preferences align with reward, but recent research suggests they align with regret under the user's optimal policy. This flawed assumption complicates the optimization of the learned reward...
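
To make the "learning from human feedback without RL" idea concrete, here is a minimal sketch of a contrastive preference objective computed directly from policy log-probabilities of a preferred and a dispreferred behavior segment, in the spirit the summary describes (comparing segments under a regret-based preference model instead of fitting a reward model and then running RL). The function name cpl_style_loss, the temperature alpha, and the exact logistic form are illustrative assumptions for this sketch, not details confirmed by the episode summary or the paper.

    import torch
    import torch.nn.functional as F

    def cpl_style_loss(logp_preferred: torch.Tensor,
                       logp_dispreferred: torch.Tensor,
                       alpha: float = 0.1) -> torch.Tensor:
        # logp_preferred / logp_dispreferred: (batch, T) tensors holding
        # log pi(a_t | s_t) along the preferred and dispreferred segments.
        # Sum log-probabilities over each segment and compare them with a
        # logistic (Bradley-Terry style) contrastive term: the preferred
        # segment should score higher under the learned policy, so no
        # separate reward model or RL optimization loop is needed.
        score_pos = alpha * logp_preferred.sum(dim=-1)
        score_neg = alpha * logp_dispreferred.sum(dim=-1)
        return -F.logsigmoid(score_pos - score_neg).mean()

    # Example: 8 preference pairs, segment length 16, dummy log-probabilities.
    lp_pos = -torch.rand(8, 16)
    lp_neg = -torch.rand(8, 16)
    loss = cpl_style_loss(lp_pos, lp_neg)  # scalar; in practice, backpropagated into the policy

In this sketch the supervision signal is purely the ranking between the two segments, which is why the objective can be optimized with ordinary supervised gradient steps rather than a reinforcement learning loop.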