Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio.
This is: Comments on Carlsmith's “Is power-seeking AI an existential risk?”, published by Nate Soares on the AI Alignment Forum.
The following are some comments I gave on Open Philanthropy Senior Research Analyst Joe Carlsmith's April 2021 report "Is power-seeking AI an existential risk?", published with permission and lightly edited. Joe replied; his comments are included inline. I gave a few quick replies in response that I didn't want to worry about cleaning up; Rob Bensinger has summarized a few of them, and those summaries have also been added inline.
I think Joe Carlsmith's report is clear, extensive, and well-reasoned. I also agree with his conclusion that there's at least a 5% chance of existential catastrophe from AI by 2070. In fact, I think that number is much too low. I'll now attempt to pinpoint areas of disagreement I have with Joe, and put forth some counterarguments to his position.
Warning: this is going to be a bit quick-and-dirty, and written in a colloquial tongue.
I'll start by addressing the object-level disagreements, and then I'll give a few critiques of the argument style.
On the object level, let's look at Joe's "shorter negative" breakdown of his argument in the appendix:
Shorter negative:
By 2070:
1. It will become possible and financially feasible to build APS AI systems.
65%
2. It will be much more difficult to build APS AI systems that would be practically PS-aligned if deployed than to build APS systems that would be practically PS-misaligned if deployed, but which are at least superficially attractive to deploy anyway | 1.
35%
3. Deployed, practically PS-misaligned systems will disempower humans at a scale that constitutes existential catastrophe | 1-2.
20%
Implied probability of existential catastrophe from scenarios where all three premises are true: ~5%
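For clarity, that implied figure is just the product of the three conditional probabilities above:
\[0.65 \times 0.35 \times 0.20 \approx 0.046 \approx 5\%\]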
My odds, for contrast, are around 85%, 95%, and 95%, for an implied 77% chance of catastrophe from these three premises, with most of our survival probability coming from "we have more time than I expect". These numbers in fact seem a bit too low to me, likely because in giving these very quick-and-dirty estimates I failed to account properly for the multi-stage fallacy (more on that later), and because I have some additional probability on catastrophe from scenarios that don't quite satisfy all three of these conjuncts. But the difference between 5% and 77% is stark enough to imply significant object-level disagreement, and so let's focus on that first, without worrying too much about the degree.
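The 77% figure is the analogous product of my three estimates:
\[0.85 \times 0.95 \times 0.95 \approx 0.77\]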
"we have more time than I expect"
Joe Carlsmith: I'd be curious how much your numbers would change if we conditioned on AGI arriving, but only after 2070.
[Partial summary of Nate’s reply: Nate would give us much better odds if AGI came after 2070.]
I have some additional probability on catastrophe from scenarios that don't quite satisfy all three of these conjuncts
Joe Carlsmith: Would be curious to hear more about these scenarios. The main ones salient to me are "we might see unintentional deployment of practically PS-misaligned APS systems even if they aren’t superficially attractive to deploy" and "practically PS-misaligned APS systems might be developed and deployed even absent strong incentives to develop them (for example, simply for the sake of scientific curiosity)".
Maybe also cases where alignment is easy but we mess up anyway.
[Partial summary of Nate’s reply: Mostly “we might see unintentional deployment of practically PS-misaligned APS systems even if they aren’t superficially attractive to deploy”, plus the general category of weird and surprising violations of some clause in Joe’s conditions.]
Background
Before I dive into specific disagreements, a bit of background on my model of the world. Note that I'm not trying to make a large conjunctive argument here; these are just a bunch of background things that seem to be ro...