AI Safety Success Stories, published by Wei Dai on the AI Alignment Forum.
AI safety researchers often describe their long-term goals as building "safe and efficient AIs," but they don't always mean the same thing by this or other seemingly similar phrases. Asking about their "success stories" (i.e., scenarios in which their line of research helps contribute to a positive outcome) can help clarify what their actual research aims are. Knowing such scenarios also makes it easier to compare the ambition, difficulty, and other attributes of different lines of AI safety research. I hope this contributes to improved communication and coordination between the different groups of people working on AI risk.
In the rest of the post, I describe some common AI safety success stories that I've heard over the years and then compare them along a number of dimensions. They are listed in roughly the order in which they first came to my attention. (Suggestions welcome for better names for any of these scenarios, as well as additional success stories and additional dimensions along which they can be compared.)
The Success Stories
Sovereign Singleton
AKA Friendly AI, an autonomous, superhumanly intelligent AGI that takes over the world and optimizes it according to some (perhaps indirect) specification of human values.
Pivotal Tool
An oracle or task AGI that can be used to perform a pivotal but limited act, and that then stops to await further instructions.
Corrigible Contender
A semi-autonomous AGI that does not have long-term preferences of its own but acts according to (its understanding of) the short-term preferences of some human or group of humans. It competes effectively with comparable AGIs corrigible to other users, as well as with unaligned AGIs (if any exist), for resources and ultimately for influence on the future of the universe.
Interim Quality-of-Life Improver
AI risk can be minimized if world powers coordinate to limit AI capabilities development or deployment, in order to give AI safety researchers more time to figure out how to build a very safe and highly capable AGI. While that is proceeding, it may be a good idea (e.g., politically advisable and/or morally correct) to deploy relatively safe, limited AIs that can improve people's quality of life but are not necessarily state of the art in terms of capability or efficiency. Such improvements can include, for example, curing diseases and solving pressing scientific and technological problems.
(I want to credit Rohin Shah as the person that I got this success story from, but can't find the post or comment where he talked about it. Was it someone else?)
Research Assistant
If an AGI project gains a lead over its competitors, it may be able to grow that into a larger lead by building AIs to help with research (whether safety or capability research). These could take the form of oracles, human imitations, or even narrow AIs useful for making money (which can be used to buy more compute, hire more human researchers, etc.). Such Research Assistant AIs can help pave the way to one of the other, more definitive success stories. Examples: 1, 2.
Comparison Table
| Dimension | Sovereign Singleton | Pivotal Tool | Corrigible Contender | Interim Quality-of-Life Improver | Research Assistant |
|---|---|---|---|---|---|
| Autonomy | High | Low | Medium | Low | Low |
| AI safety ambition / difficulty | Very High | Medium | High | Low | Low |
| Reliance on human safety | Low | High | High | Medium | Medium |
| Required capability advantage over competing agents | High | High | None | None | Low |
| Tolerates capability trade-off due to safety measures | Yes | Yes | No | Yes | Some |
| Assumes strong global coordination | No | No | No | Yes | No |
| Controlled access | Yes | Yes | No | Yes | Yes |
(Note that due to limited space, I've left out a couple of scenarios which are straightforward recombinations of the above success stories, namely Sovereign Contender and Corrigible Singleton. I also left out C...