Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio.
This is: Commentary on AGI Safety from First Principles, published by Richard Ngo on the AI Alignment Forum.
My AGI safety from first principles report (which is now online here) was originally circulated as a Google Doc. Since there was a lot of good discussion in comments on the original document, I thought it would be worth putting some of it online, and have copied out most of the substantive comment threads here. Many thanks to all of the contributors for their insightful points, and to Habryka for helping with formatting. Note that in some cases comments may refer to parts of the report that didn't make it into the public version.
Discussion on the whole report
Will MacAskill
Thanks so much for writing this! Huge +1 to more foundational work in this area.
My biggest worry with your argument is that it may spend a lot of time defending something that's not really where the controversy lies. (This is true for me, at least; I don't know if I'm idiosyncratic.) Distinguish two claims one could argue for:
Claim 1: At some point in the future, assuming continued tech progress, history will have primarily become the story of AI systems doing things. The goals of those AI systems, or the emergent path that results from interactions among these systems, will probably not be what you reading this document want to happen.
I find claim 1 pretty uncontroversial. And I do think that this alone is enough for far more of the world to be thinking about AI than currently is.
But it feels like at least for longtermist EAs trying to prioritise among causes (or for non-longtermists deciding how much to prioritise safety vs speed on AI), the action is much more on a more substantial claim like:
Claim 2: Claim 1 is true, and the transition from a human-driven world to an AI-driven world will occur in our lifetime, and the transition will be fast, and we can meaningfully affect how this transition goes with very long-lasting impacts, and (on the classic formulations at least) the transition will be to a single AI agent with more power than all other agents combined, and what we should try to do in response to all this is ensure that the AI systems that get built have goals that are the same as the goals of those who design the AI systems.
I find each of the new sub-claims in claim 2 (highly) controversial. You talk a little bit about some of these sub-claims, but they're not the focus.
I'm interested in whether you think that's an unfair characterisation. Perhaps you see yourself as arguing for something in between Claim 1 and Claim 2.
Richard Ngo
I think it's fair to say that I'm defending claim 1. I think that a lot of people would disagree with it, because:
a) They don't picture AI systems having goals in a way that's easily separable from the goals of the humans who use them; or
b) They think that humans will retain enough power over AIs that the "main story" will be what humans choose to do, even if some AIs have goals we don't like; or
c) They think that it'll be easy to make AIs have the goals we want them to have; or
d) They think that, even if the outcome is not specifically what they want, it'll be within some range of acceptable variation (in a similar way to how our current society is related to our great-great-grandparents').
My thoughts on the remaining parts of claim 2:
a) "The point in time at which the transition from a human-driven world to an AI-driven world is in our lifetime"
OpenPhil are investigating timelines very thoroughly, so I'm happy to defer to them.
b) The transition will be fast.
I make some arguments about this in the "speed of AI development" section. But broadly speaking, I don't want this version of the argument to depend on the claim that it'll be very fast (i.e. there's a "takeoff" from something like our current world lasting less than...