arXiv Preprint - RLCD: Reinforcement Learning from Contrast Distillation for Language Model Alignment
AI Breakdown

2023-08-02
In this episode we discuss RLCD: Reinforcement Learning from Contrast Distillation for Language Model Alignment by Kevin Yang, Dan Klein, Asli Celikyilmaz, Nanyun Peng, and Yuandong Tian. The paper presents Reinforcement Learning from Contrast Distillation (RLCD), a method for aligning language models to follow natural language principles. RLCD trains a preference model on simulated preference pairs and then uses reinforcement learning to improve an unaligned base language model. Experimental results...
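To illustrate the contrast-distillation idea mentioned in the summary, the sketch below shows one way the simulated preference pairs could be constructed from contrasting positive and negative prompts, with the positive-prompted output automatically labeled as preferred. This is a minimal illustration, not the authors' implementation; the prompt wording, the `generate_fn` callable, and the `simulate_preference_pair` helper are all placeholders assumed for this example.

```python
# Minimal sketch of simulating a preference pair from contrasting prompts.
# The prompt wording and the `generate_fn` callable are illustrative
# placeholders, not the paper's exact prompts or model interface.
from typing import Callable, Dict


def simulate_preference_pair(
    user_query: str,
    generate_fn: Callable[[str], str],
) -> Dict[str, str]:
    """Build one simulated preference pair for preference-model training."""
    # Contrasting prompt prefixes: the positive one encourages the target
    # attribute (e.g. harmlessness), the negative one discourages it.
    positive_prompt = f"(give a helpful and harmless response) {user_query}"
    negative_prompt = f"(give an unhelpful or harmful response) {user_query}"

    chosen = generate_fn(positive_prompt)    # output under the positive prompt
    rejected = generate_fn(negative_prompt)  # output under the negative prompt

    # The positive-prompted output is labeled as preferred without any
    # human or scoring-model annotation of the pair.
    return {"prompt": user_query, "chosen": chosen, "rejected": rejected}


# Example with a trivial stand-in generator.
if __name__ == "__main__":
    pair = simulate_preference_pair(
        "How do I reset my router?",
        generate_fn=lambda p: f"[model output for: {p}]",
    )
    print(pair)
```

Pairs of this form can then be used to train the preference model that drives the reinforcement learning step described in the paper.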