Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio.
This is: My Understanding of Paul Christiano's Iterated Amplification AI Safety Research Agenda, published by Chi Nguyen on the AI Alignment Forum.
Crossposted from the EA forum
You can read this post as a Google Doc instead (IMO much better to read).
This document aims to clarify Paul Christiano's AI safety research agenda (IDA) and the arguments around how promising it is.
Target audience: All levels of technical expertise. The less knowledge about IDA someone has, the more I expect them to benefit from the writeup.
Writing policy: I aim to be as clear and concrete as possible, erring on the side of being wrong rather than vague, so that disagreements and places where I am mistaken can be identified. Things will err on the side of being expressed too confidently. Almost all footnotes are content rather than references.
Epistemic Status: The document is my best guess on IDA and might be wrong in important ways. I have not verified all of the content with somebody working on IDA. I spent ~4 weeks on this and have no prior background in ML, CS or AI safety.
I wrote this document last summer (2019) as part of my summer research fellowship at FHI. I was planning to restructure, complete and correct it since then but haven’t gotten to it for a year, so I decided to just publish it as it is. The document has not been updated, i.e. nothing that has been released since September 2019 is incorporated into it. Paul Christiano generously reviewed the first third to a half of this summary. I added his comments verbatim in the document; apologies for the loss of readability due to this. This doesn’t imply he endorses any part of this document, especially the second half, which he didn’t get to review.
Purpose of this document: Clarifying IDA
IDA is Paul Christiano’s AI safety research agenda.[1] Christiano works at OpenAI, which is one of the main actors in AI safety, and IDA is considered by many to be the most complete[2] AI safety agenda.
However, people who are not directly working on IDA are often confused about how exactly to understand the agenda. Clarifying IDA would make it more accessible for technical people to work on and easier to assess for nontechnical people who want to think about its implications.
I believe that there are currently no resources on IDA that are both easy to understand and give a complete picture. Specifically, the current main resources are:
the “Iterated Amplification” sequence, which is a series of curated posts by Paul Christiano that can be quite difficult to understand,
this post by Ajeya Cotra and this video by Robert Miles, which are both easy to understand but limited in scope and don’t provide many details,
Alex Zhu’s FAQ on IDA, which clarifies important points but does not set them in the context of the entire research agenda,
an 80,000 Hours podcast with Paul Christiano, which explains some of the intuitions behind IDA but is not comprehensive and is in audio rather than written form.
This document aims to fill the gap and give a comprehensive and accessible overview of IDA.
Summary: IDA in 7 sentences
IDA stands for Iterated Amplification and is a research agenda by Paul Christiano from OpenAI.
IDA addresses the artificial intelligence (AI) safety problem, specifically the danger of creating a very powerful AI which leads to catastrophic outcomes.
IDA tries to prevent catastrophic outcomes by searching for a competitive AI that never intentionally optimises for something harmful to us and that we can still correct once it’s running.
IDA doesn’t propose a specific implementation, but presents a rough AI design and a collection of thoughts on whether this design has the potential to create safe and powerful AI and what the details of that design could look like.
The proposed AI design is to use a safe but slow way of scaling up an AI’s capabilities, distill this into a faster but slightly weaker AI, which can then be scaled up safely again, and to repeat this process.
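To make the amplify-then-distill loop described above more concrete, here is a minimal, hypothetical sketch in Python. All names (Overseer, Model, amplify, distill, iterated_amplification) and the toy decomposition logic are illustrative assumptions of mine, not code or terminology from the agenda itself.

```python
class Model:
    """Stand-in for a learned model; here it simply memorises (question -> answer) pairs."""
    def __init__(self, answers=None):
        self.answers = dict(answers or {})

    def answer(self, question):
        return self.answers.get(question, "don't know")

    def train_on(self, dataset):
        # "Distillation" stand-in: imitate the amplified system's answers.
        return Model({**self.answers, **dict(dataset)})


class Overseer:
    """Stand-in for the human (or human-like) overseer doing slow but safe reasoning."""
    def decompose(self, question):
        # Break a question into sub-questions (toy example).
        return [f"first part of: {question}", f"second part of: {question}"]

    def combine(self, question, sub_answers):
        return f"answer to '{question}' assembled from {sub_answers}"


def amplify(overseer, model, question):
    """Slow but safe capability scaling: the overseer answers a question by
    delegating sub-questions to the current model and combining the results."""
    subs = overseer.decompose(question)
    return overseer.combine(question, [model.answer(q) for q in subs])


def distill(overseer, model, questions):
    """Fast but slightly weaker step: train a new model to imitate the
    amplified (overseer + model) system."""
    return model.train_on([(q, amplify(overseer, model, q)) for q in questions])


def iterated_amplification(model, overseer, questions, rounds=3):
    # Each round: amplify (slow, safe), then distill (fast, close in capability),
    # so capability grows while, ideally, safety is preserved.
    for _ in range(rounds):
        model = distill(overseer, model, questions)
    return model


# Example usage of the toy sketch:
trained = iterated_amplification(Model(), Overseer(), ["How do we do X?"], rounds=2)
print(trained.answer("How do we do X?"))
```

The toy classes only exist to make the loop executable; in the actual proposal the distillation step would be a machine learning procedure (e.g. imitation learning) and the overseer's decomposition would be genuine human-guided reasoning.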