Knowledge Neurons in Pretrained Transformers by Evan Hubinger
Comprehensive AI Services as General Intelligence by Rohin Shah
List of resolved confusions about IDA by Wei Dai
Announcement: AI alignment prize round 3 winners and next round by Vladimir Slepnev
Frequent arguments about alignment by John Schulman
Apply to the ML for Alignment Bootcamp (MLAB) in Berkeley [Jan 3 - Jan 22] by Oliver Habryka, Buck Shlegeris
Alignment Newsletter One Year Retrospective by Rohin Shah
Formal Inner Alignment, Prospectus by Abram Demski
Writeup: Progress on AI Safety via Debate by Beth Barnes, Paul Christiano
Clarifying inner alignment terminology by Evan Hubinger
A Critique of Functional Decision Theory by wdmacaskill
Experimentally evaluating whether honesty generalizes by Paul Christiano
History of the Development of Logical Induction by Scott Garrabrant
Optimization Amplifies by Scott Garrabrant
Introducing the AI Alignment Forum (FAQ) by Oliver Habryka, Ben Pace, Raymond Arnold, Jim Babcock
Ought: why it matters and ways to help by Paul Christiano
Coherence arguments imply a force for goal-directed behavior by KatjaGrace
Request for proposals for projects in AI alignment that work with deep learning systems by abergal, Nick_Beckstead
A very crude deception eval is already passed by Beth Barnes
Comments on Carlsmith's “Is power-seeking AI an existential risk?” by Nate Soares