The Nonlinear Library: Alignment Forum Top Posts

Education

What Failure Looks Like: Distilling the Discussion by Ben Pace

2021-12-04


Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: What Failure Looks Like: Distilling the Discussion, published by Ben Pace on the AI Alignment Forum.

The comments under a post often contain valuable insights and additions. They are also often very long and involved, and harder to cite than posts themselves. Given this, I was motivated to try to distill some comment sections on LessWrong, in part to start exploring whether we can build some norms and some features to help facilitate this kind of intellectual work more regularly. So this is my attempt to summarise the post and discussion around What Failure Looks Like by Paul Christiano.

Epistemic status: I think I did an okay job. I think I probably made the most errors in places where I try to emphasise concrete details more than the original post did. I think the summary of the discussion is much more concise than the original.

What Failure Looks Like (Summary)

On its default course, our civilization will build very useful and powerful AI systems, and use such systems to run significant parts of society (such as healthcare, legal systems, companies, the military, and more). Similar to how we depend on many novel technologies, such as money and the internet, we will be dependent on AI.

The stereotypical AI catastrophe involves a powerful and malicious AI that seems good but suddenly becomes evil and quickly takes over humanity. Such descriptions are often stylised for good story-telling, or emphasise unimportant variables. The post below will concretely lay out two ways that building powerful AI systems may cause an existential catastrophe, if the problem of intent alignment is not solved. This is solely an attempt to describe what failure looks like, not to assign probabilities to such failure or to propose a plan to avoid these failures.

There are two failure modes that will be discussed. First, we may increasingly fail to understand how our AI systems work and, subsequently, what is happening in society. Second, we may eventually give these AI systems massive amounts of power despite not understanding their internal reasoning and decision-making algorithms. Because we will be searching through a massive space of designs without understanding the resulting AIs, some of them will be more power-seeking than expected, and will take adversarial action and take control.

Failure by loss of control

There is a gap between what we want and the objective functions we can write down. Nobody has yet created a function that, when maximised, perfectly describes what we want, but increasingly powerful machine learning will optimise very hard for whatever function we encode. This will greatly widen the gap between what we can optimise for and what we want. (This is a classic goodharting scenario.)

Concretely, we will gradually use ML to perform more and more key functions in society, but will largely not understand how these systems work or what exactly they're doing. The information we can gather will seem strongly positive: GDP will be rising quickly, crime will be down, life-satisfaction ratings will be up, Congress's approval will be up, and so on. However, the underlying reality will increasingly diverge from what we think these metrics are measuring, and we may no longer have the ability to independently figure this out. In fact, new things won't be built, crime will continue, people's lives will be miserable, and Congress will not be effective at improving governance; we will simply believe otherwise because the ML systems will be improving our metrics, and we will have a hard time understanding what's going on outside of what they report.
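
The goodharting dynamic described above can be illustrated with a toy sketch. Everything in it is an assumption made for illustration and does not come from the original post: the names `proxy_metric`, `true_value`, and `optimise_proxy` are hypothetical, and the quadratic shape of the "true" objective is chosen only so that the proxy and the real goal agree at first and then diverge. The point it tries to show is that a stronger optimiser (more steps) keeps improving the measured score while the thing we care about first stalls and then gets worse.

```python
def proxy_metric(x: float) -> float:
    """The score we can write down and measure; it rises without limit."""
    return x


def true_value(x: float) -> float:
    """What we actually want (illustrative assumption): peaks at x = 5."""
    return x - 0.1 * x * x


def optimise_proxy(steps: int, step_size: float = 1.0) -> float:
    """Greedy hill-climb that looks only at the proxy. Returns the final x."""
    x = 0.0
    for _ in range(steps):
        candidate = x + step_size
        # The optimiser never consults true_value; it cannot see it.
        if proxy_metric(candidate) > proxy_metric(x):
            x = candidate
    return x


def run(budgets=(2, 5, 10, 20)):
    """Return (steps, proxy score, true value) for each optimisation budget."""
    rows = []
    for steps in budgets:
        x = optimise_proxy(steps)
        rows.append((steps, proxy_metric(x), true_value(x)))
    return rows


if __name__ == "__main__":
    for steps, proxy, true in run():
        print(f"steps={steps:>2}  proxy={proxy:6.2f}  true={true:7.2f}")
```

In this sketch, small budgets improve both numbers together, which is why the proxy looks trustworthy early on. Past the peak of `true_value`, every further gain in the proxy is paid for with a loss in the real objective, and an observer who only sees the proxy has no signal that anything has gone wrong.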
Gradually, our civilization will lose its ability to understand what is happening in the world as our systems and infrastructure show us success on all of our available metrics (GDP, wealth, crime, health, self-reported happiness...