AF - Investigating Bias Representations in LLMs via Activation Steering by DawnLu
The Nonlinear Library: Alignment Forum

2024-01-15
Link to original article

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Investigating Bias Representations in LLMs via Activation Steering, published by DawnLu on January 15, 2024 on The AI Alignment Forum.

Produced as part of the SPAR program (fall 2023) under the mentorship of Nina Rimsky.

Introduction

Given recent advances in the AI field, it's highly likely LLMs will...