Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio.
This is: Comments on Carlsmith's “Is power-seeking AI an existential risk?”, published by Nate Soares on the AI Alignment Forum.
The following are some comments I gave on Open Philanthropy Senior Research Analyst Joe Carlsmith's April 2021 report "Is power-seeking AI an existential risk?", published with permission and lightly edited. Joe replied; his comments are included inline. I gave a few quick replies in response that I didn't want to worry about cleaning up; Rob Bensinger has summarized a few of them, and those summaries have also been added inline.
I think Joe Carlsmith's report is clear, extensive, and well-reasoned. I also agree with his conclusion that there's at least a 5% chance of existential catastrophe from AI by 2070. In fact, I think that number is much too low. I'll now attempt to pinpoint areas of disagreement I have with Joe, and put forth some counterarguments to his position.
Warning: this is going to be a bit quick-and-dirty, and written in a colloquial tongue.
I'll start by addressing the object-level disagreements, and then I'll give a few critiques of the argument style.
On the object level, let's look at Joe's "shorter negative" breakdown of his argument in the appendix:
Shorter negative:
By 2070:
1. It will become possible and financially feasible to build APS AI systems.
65%
2. It will be much more difficult to build APS AI systems that would be practically PS-aligned if deployed than to build APS systems that would be practically PS-misaligned if deployed, but which are at least superficially attractive to deploy anyway | 1.
35%
3. Deployed, practically PS-misaligned systems will disempower humans at a scale that constitutes existential catastrophe | 1-2.
20%
Implied probability of existential catastrophe from scenarios where all three premises are true: ~5%
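For clarity, that implied figure is just the product of the three conditional probabilities above:
\[0.65 \times 0.35 \times 0.20 \approx 0.046 \approx 5\%\]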
My odds, for contrast, are around 85%, 95%, and 95%, for an implied 77% chance of catastrophe from these three premises, with most of our survival probability coming from "we have more time than I expect". These numbers in fact seem a bit too low to me, likely because in giving these very quick-and-dirty estimates I failed to account properly for the multi-stage fallacy (more on that later), and because I have some additional probability on catastrophe from scenarios that don't quite satisfy all three of these conjuncts. But the difference between 5% and 77% is stark enough to imply significant object-level disagreement, and so let's focus on that first, without worrying too much about the degree.
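The 77% figure is the analogous product of my three estimates:
\[0.85 \times 0.95 \times 0.95 \approx 0.77\]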
"we have more time than I expect"
Joe Carlsmith: I'd be curious how much your numbers would change if we conditioned on AGI arriving, but only after 2070.
[Partial summary of Nate’s reply: Nate would give us much better odds if AGI came after 2070.]
I have some additional probability on catastrophe from scenarios that don't quite satisfy all three of these conjuncts
Joe Carlsmith: Would be curious to hear more about these scenarios. The main ones salient to me are "we might see unintentional deployment of practically PS-misaligned APS systems even if they aren’t superficially attractive to deploy" and "practically PS-misaligned APS systems might be developed and deployed even absent strong incentives to develop them (for example, simply for the sake of scientific curiosity)".
Maybe also cases where alignment is easy but we mess up anyway.
[Partial summary of Nate’s reply: Mostly “we might see unintentional deployment of practically PS-misaligned APS systems even if they aren’t superficially attractive to deploy”, plus the general category of weird and surprising violations of some clause in Joe’s conditions.]
Background
Before I dive into specific disagreements, a bit of background on my model of the world. Note that I'm not trying to make a large conjunctive argument here; these are just a bunch of background things that seem to be ro...