Jitters No Evidence of Stupidity in RL
Testing The Natural Abstraction Hypothesis: Project Update by johnswentworth
Sources of intuitions and data on AGI by Scott Garrabrant
What counts as defection? by Alex Turner
Competition: Amplify Rohin’s Prediction on AGI researchers & Safety Concerns by Andreas Stuhlmüller
Modelling Transformative AI Risks (MTAIR) Project: Introduction by David Manheim, Aryeh Englander
How DeepMind's Generally Capable Agents Were Trained by 1a3orn
Open question: are minimal circuits daemon-free? by Paul Christiano
In Logical Time, All Games are Iterated Games by Abram Demski
Gradations of Inner Alignment Obstacles by Abram Demski
Truthful AI: Developing and governing AI that does not lie by Owain Evans, Owen Cotton-Barratt, Lukas Finnveden
Learning the prior by Paul Christiano
Thoughts on the Alignment Implications of Scaling Language Models by leogao
Bayesian Probability is for things that are Space-like Separated from You by Scott Garrabrant
The Inner Alignment Problem by Evan Hubinger, Chris van Merwijk, Vladimir Mikulik, Joar Skalse, Scott Garrabrant
Clarifying some key hypotheses in AI alignment by Ben Cottier, Rohin Shah
Reflections on Larks’ 2020 AI alignment literature review by Alex Flint
Intermittent Distillations #4: Semiconductors, Economics, Intelligence, and Technological Progress by Mark Xu
Humans Are Embedded Agents Too by johnswentworth
Reply to Paul Christiano on Inaccessible Information by Alex Flint