I think the alignment problem looks different depending on the capability level of systems you’re trying to align. And I think that different researchers often have different capability levels in mind when they talk about the alignment problem. I think this leads to confusion. I’m going to use the term “regimes of the alignment problem” to refer to the different perspectives on alignment you get from considering systems with different capability levels.
(I would be pretty unsurprised if these points had all been made elsewhere; the goal of this post is just to put them all in one place. I’d love pointers to pieces that make many of the same points as this post. Thanks to a wide variety of people for conversations that informed this. If there’s established jargon for different parts of this, point it out to me and I’ll consider switching to using it.)
Different regimes:
- Wildly superintelligent systems
- Systems that are roughly as generally intelligent and capable as humans--they’re able to do all the important tasks as well as humans can, but they’re not wildly more generally intelligent
- Systems that are less generally intelligent and capable than humans
Two main causes of differences in which regime people focus on:
- Disagreements about the dynamics of AI development, eg takeoff speeds. The classic question along these lines is whether we have to come up with alignment strategies that scale to arbitrarily competent systems, or whether we just have to be able to align systems that are slightly smarter than us, which can then do the alignment research for us.
- Disagreements about what problem we’re trying to solve. I think that there are a few different mechanisms by which AI misalignment could be bad from a longtermist perspective, and depending on which of these mechanisms you’re worried about, you’ll be worried about different regimes of the problem.
Different mechanisms by which AI misalignment could be bad from a longtermist perspective:
- The second species problem: We build powerful ML systems and then they end up controlling the future, which is bad if they don’t intend to help us achieve our goals.
  - To mitigate this concern, you’re probably most interested in the “wildly superintelligent systems” or “roughly human-level systems” regimes, depending on your beliefs about takeoff speeds and maybe some other stuff.
- Missed opportunity: We build pretty powerful ML systems, but because we can’t align them, we miss the opportunity to use them to help us with stuff, and then we fail to get to a good future.
  - For example, suppose that we can build systems that are good at answering questions persuasively, but we can’t make them good at answering them honestly. This is an alignment problem. It probably doesn’t pose an x-risk directly, because persuasive wrong answers to questions are probably not going to lead to the system accumulating power over time; they’re just going to mean that people waste their time whenever they listen to the system’s advice on stuff.
  - This feels much more like a missed opportunity than a direct threat from the misaligned systems. In this situation, the world is maybe in a more precarious position than it would otherwise be, because of the things that we can harness AI to do (eg make bigger computers), but that’s not really the fault of the systems we failed to align.
  - If this is your concern, you’re probably most interested in the “roughly human-level” regime.
- We build pretty powerful systems that aren’t generally intelligent, and then they make the world worse somehow by some mechanism other than increasing their own influence over time through clever planning, and this causes humanity to end up with a bad outcome rather than a good one.
  - For example, you might worry that if we can build systems that persuade much more easily than we can build systems that explain, then the world will have more bullshit in it and this will make things generally worse.
  - Another thing that maybe counts: if we deploy a bunch of A...