arXiv Preprint - Contrastive Preference Learning: Learning from Human Feedback without RL
AI Breakdown

2023-10-24
In this episode we discuss Contrastive Preference Learning: Learning from Human Feedback without RL by Joey Hejna, Rafael Rafailov, Harshit Sikchi, Chelsea Finn, Scott Niekum, W. Bradley Knox, Dorsa Sadigh. Traditional approaches to Reinforcement Learning from Human Feedback (RLHF) assume that human preferences align with reward, but recent research suggests they align with regret under the user's optimal policy. This flawed assumption complicates the optimization of the learned reward...
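
To make the "learning from human feedback without RL" idea concrete, here is a minimal sketch of a contrastive preference objective computed directly from policy log-probabilities of a preferred and a dispreferred behavior segment, in the spirit the summary describes (comparing segments under a regret-based preference model instead of fitting a reward model and then running RL). The function name cpl_style_loss, the temperature alpha, and the exact logistic form are illustrative assumptions for this sketch, not details confirmed by the episode summary or the paper.

    import torch
    import torch.nn.functional as F

    def cpl_style_loss(logp_preferred: torch.Tensor,
                       logp_dispreferred: torch.Tensor,
                       alpha: float = 0.1) -> torch.Tensor:
        # logp_preferred / logp_dispreferred: (batch, T) tensors holding
        # log pi(a_t | s_t) along the preferred and dispreferred segments.
        # Sum log-probabilities over each segment and compare them with a
        # logistic (Bradley-Terry style) contrastive term: the preferred
        # segment should score higher under the learned policy, so no
        # separate reward model or RL optimization loop is needed.
        score_pos = alpha * logp_preferred.sum(dim=-1)
        score_neg = alpha * logp_dispreferred.sum(dim=-1)
        return -F.logsigmoid(score_pos - score_neg).mean()

    # Example: 8 preference pairs, segment length 16, dummy log-probabilities.
    lp_pos = -torch.rand(8, 16)
    lp_neg = -torch.rand(8, 16)
    loss = cpl_style_loss(lp_pos, lp_neg)  # scalar; in practice, backpropagated into the policy

In this sketch the supervision signal is purely the ranking between the two segments, which is why the objective can be optimized with ordinary supervised gradient steps rather than a reinforcement learning loop.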