Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: An artificially structured argument for expecting AGI ruin, published by Rob Bensinger on May 7, 2023 on LessWrong.
Philosopher David Chalmers asked:
[I]s there a canonical source for "the argument for AGI ruin" somewhere, preferably laid out as an explicit argument with premises and a conclusion?
Unsurprisingly, the actual reason people expect AGI ruin isn't a crisp deductive argument; it's a probabilistic update based on many lines of evidence. The specific observations and heuristics that carry the most weight vary from person to person, and can be hard to draw out accurately.
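As a gloss on what "a probabilistic update based on many lines of evidence" means formally: in odds form, Bayes' theorem says the posterior odds on a hypothesis are the prior odds multiplied by one likelihood ratio per line of evidence. This is a generic illustration, not a calculation anyone in this post performs; which ratios dominate is exactly what varies between people:

```latex
% Odds-form Bayes: posterior odds on hypothesis H after evidence
% E_1, ..., E_n equal the prior odds times a likelihood ratio per
% line of evidence (each conditioned on the evidence seen so far).
\frac{P(H \mid E_1, \dots, E_n)}{P(\neg H \mid E_1, \dots, E_n)}
  = \frac{P(H)}{P(\neg H)}
    \prod_{i=1}^{n}
    \frac{P(E_i \mid H,\, E_1, \dots, E_{i-1})}
         {P(E_i \mid \neg H,\, E_1, \dots, E_{i-1})}
```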
That said, Eliezer Yudkowsky's So Far: Unfriendly AI Edition might be a good place to start if we want a pseudo-deductive argument just for the sake of organizing discussion. People can then say which premises they want to drill down on.
In The Basic Reasons I Expect AGI Ruin, I wrote:
When I say "general intelligence", I'm usually thinking about "whatever it is that lets human brains do astrophysics, category theory, etc. even though our brains evolved under literally zero selection pressure to solve astrophysics or category theory problems".
It's possible that we should already be thinking of GPT-4 as "AGI" on some definitions, so to be clear about the threshold of generality I have in mind, I'll specifically talk about "STEM-level AGI", though I expect such systems to be good at non-STEM tasks too.
STEM-level AGI is AGI that has "the basic mental machinery required to do par-human reasoning about all the hard sciences", though a specific STEM-level AGI could (e.g.) lack physics ability for the same reasons many smart humans can't solve physics problems, such as "lack of familiarity with the field".
A simple way of stating the argument in terms of STEM-level AGI is:
1. Substantial Difficulty of Averting Instrumental Pressures: As a strong default, absent alignment breakthroughs, STEM-level AGIs that understand their situation and don't value human survival as an end will want to kill all humans if they can.
2. Substantial Difficulty of Value Loading: As a strong default, absent alignment breakthroughs, STEM-level AGI systems won't value human survival as an end.
3. High Early Capabilities: As a strong default, absent alignment breakthroughs or global coordination breakthroughs, early STEM-level AGIs will be scaled to capability levels that allow them to understand their situation, and allow them to kill all humans if they want.
4. Conditional Ruin: If it's very likely that there will be no alignment breakthroughs or global coordination breakthroughs before we invent STEM-level AGI, then given 1+2+3, it's very likely that early STEM-level AGI will kill all humans.
5. Inadequacy: It's very likely that there will be no alignment breakthroughs or global coordination breakthroughs before we invent STEM-level AGI.
6. Therefore it's very likely that early STEM-level AGI will kill all humans. (From 1–5)
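To make the pseudo-deductive structure fully explicit, here is a minimal sketch of the argument's propositional skeleton in Lean 4. The proposition names are hypothetical labels I've chosen for the premises above, and the sketch drops the "very likely" / "strong default" qualifiers, so read it as a map of the argument's shape rather than the argument itself: all the real content lives in whether each hypothesis holds.

```lean
-- A propositional sketch of the argument above. Each premise becomes a
-- hypothesis; the conclusion then follows by ordinary modus ponens.
-- (Hypothetical names; the probabilistic qualifiers are dropped.)
theorem agi_ruin
    (Instrumental ValueLoad Capabilities NoBreakthroughs Ruin : Prop)
    -- Premise 1: absent breakthroughs, instrumental pressures hold.
    (p1 : NoBreakthroughs → Instrumental)
    -- Premise 2: absent breakthroughs, value loading fails.
    (p2 : NoBreakthroughs → ValueLoad)
    -- Premise 3: absent breakthroughs, early AGIs reach dangerous capability.
    (p3 : NoBreakthroughs → Capabilities)
    -- Premise 4 (Conditional Ruin): the three defaults jointly imply ruin.
    (p4 : Instrumental → ValueLoad → Capabilities → Ruin)
    -- Premise 5 (Inadequacy): no breakthroughs arrive in time.
    (p5 : NoBreakthroughs) :
    Ruin :=
  p4 (p1 p5) (p2 p5) (p3 p5)
```

The derivation itself is a one-liner; as the framing above suggests, the crisp structure is artificial, and the disagreements worth drilling into all live in premises 1–5.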
I'll say that the "invention of STEM-level AGI" is the first moment when an AI developer (correctly) recognizes that it can build a working STEM-level AGI system within a year. I usually operationalize "early STEM-level AGI" as "STEM-level AGI that is built within five years of the invention of STEM-level AGI".
I think humanity is very likely to destroy itself within five years of the invention of STEM-level AGI. And plausibly far sooner — e.g., within three months or a year of the technology's invention. A lot of the technical and political difficulty of the situation stems from this high level of time pressure: if we had decades to work with STEM-level AGI before catastrophe, rather than months or years, we would have far more time to act, learn, try and fail at various approaches, build political will, craft and implement policy, etc.
This argument focuses on "human survival", but from my perspective...