Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio.
This is: Commentary on AGI Safety from First Principles, published by Richard Ngo on the AI Alignment Forum.
My AGI safety from first principles report (which is now online here) was originally circulated as a Google Doc. Since there was a lot of good discussion in comments on the original document, I thought it would be worth putting some of it online, and have copied out most of the substantive comment threads here. Many thanks to all of the contributors for their insightful points, and to Habryka for helping with formatting. Note that in some cases comments may refer to parts of the report that didn't make it into the public version.
Discussion on the whole report
Will MacAskill
Thanks so much for writing this! Huge +1 to more foundational work in this area.
My biggest worry with your argument is that it may spend a lot of time defending something that's not really where the controversy lies. (This is true for me, at least; I don't know if I'm idiosyncratic.) Distinguish two claims one could argue for:
Claim 1: At some point in the future, assuming continued tech progress, history will have primarily become the story of AI systems doing things. The goals of those AI systems, or the emergent path that results from interactions among these systems, will probably not be what you reading this document want to happen.
I find claim 1 pretty uncontroversial. And I do think that this alone is enough for far more of the world to be thinking about AI than currently is.
But it feels like at least for longtermist EAs trying to prioritise among causes (or for non-longtermists deciding how much to prioritise safety vs speed on AI), the action is much more on a more substantial claim like:
Claim 2: Claim 1 is true, and the transition from a human-driven world to an AI-driven world will occur in our lifetime, and the transition will be fast, and we can meaningfully affect how this transition goes with very long-lasting impacts, and (on the classic formulations at least) the transition will be to a single AI agent with more power than all other agents combined, and what we should try to do in response to all this is ensure that the AI systems that get built have goals that are the same as the goals of those who design the AI systems.
I find each of the new sub-claims in claim 2 (highly) controversial. You talk a little bit about some of these sub-claims, but they're not the focus.
I'm interested in whether you think that's an unfair characterisation. Perhaps you see yourself as arguing for something in between Claim 1 and Claim 2.
Richard Ngo
I think it's fair to say that I'm defending claim 1. I think that a lot of people would disagree with it, because:
a) They don't picture AI systems having goals in a way that's easily separable from the goals of the humans who use them; or
b) They think that humans will retain enough power over AIs that the "main story" will be what humans choose to do, even if some AIs have goals we don't like; or
c) They think that it'll be easy to make AIs have the goals we want them to have; or
d) They think that, even if the outcome is not specifically what they want, it'll be within some range of acceptable variation (in a similar way to how our current society is related to our great-great-grandparents').
My thoughts on the remaining parts of claim 2:
a) "The point in time at which the transition from a human-driven world to an AI-driven world is in our lifetime"
OpenPhil are investigating timelines very thoroughly, so I'm happy to defer to them.
b) The transition will be fast.
I make some arguments about this in the "speed of AI development" section. But broadly speaking, I don't want this version of the argument to depend on the claim that it'll be very fast (i.e. there's a "takeoff" from something like our current world lasting less than...