Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Value fragility and AI takeover, published by Joe Carlsmith on August 5, 2024 on LessWrong.
1. Introduction
"Value fragility," as I'll construe it, is the claim that slightly-different value systems tend to lead in importantly-different directions when subject to extreme optimization. I think the idea of value fragility haunts the AI risk discourse in various ways - and in particular, that it informs a backdrop prior that adequately aligning a superintelligence requires an extremely precise and sophisticated kind of technical and ethical achievement.
That is, the thought goes: if you get a superintelligence's values even slightly wrong, you're screwed.
This post is a collection of loose and not-super-organized reflections on value fragility and its role in arguments for pessimism about AI risk. I start by trying to tease apart a number of different claims in the vicinity of value fragility. In particular:
I distinguish between questions about value fragility and questions about how different agents would converge on the same values given adequate reflection.
I examine whether "extreme" optimization is required for worries about value fragility to go through (I think it at least makes them notably stronger), and I reflect a bit on whether, even conditional on creating superintelligence, we might be able to avoid a future driven by relevantly extreme optimization.
I highlight questions about whether multipolar scenarios alleviate concerns about value fragility, even if your exact values don't get any share of the power.
My sense is that people often have some intuition that multipolarity helps notably in this respect; but I don't yet see a very strong story about why. If readers have stories that they find persuasive in this respect, I'd be curious to hear.
I then turn to a discussion of a few different roles that value fragility, if true, could play in an argument for pessimism about AI risk. In particular, I distinguish between:
1. The value of what a superintelligence does after it takes over the world, assuming that it does so.
2. What sorts of incentives a superintelligence has to try to take over the world, in a context where it can do so extremely easily via a very wide variety of methods.
3. What sorts of incentives a superintelligence has to try to take over the world, in a context where it can't do so extremely easily via a very wide variety of methods.
Yudkowsky's original discussion of value fragility is most directly relevant to (1). And I think it's actually notably irrelevant to (2). In particular, I think the basic argument for expecting AI takeover in a (2)-like scenario doesn't require value fragility to go through - and indeed, some conceptions of "AI alignment" seem to expect a "benign" form of AI takeover even if we get a superintelligence's values exactly right.
Here, though, I'm especially interested in understanding (3)-like scenarios - that is, the sorts of incentives that apply to a superintelligence in a case where it can't just take over the world very easily via a wide variety of methods. Here, in particular, I highlight the role that value fragility can play in informing the AI's expectations with respect to the difference in value between worlds where it does not take over, and worlds where it does.
In this context, that is, value fragility can matter to how the AI feels about a world where humans do retain control - rather than solely to how humans feel about a world where the AI takes over.
I close with a brief discussion of how commitments to various forms of "niceness" and intentional power-sharing, if made sufficiently credible, could help defuse the sorts of adversarial dynamics that value fragility can create.
2. Variants of value fragility
What is value fragility? Let's start with some high-level definitions and clarifications.
...