Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: A Playbook for AI Risk Reduction (focused on misaligned AI), published by Holden Karnofsky on June 6, 2023 on LessWrong.
I sometimes hear people asking: “What is the plan for avoiding a catastrophe from misaligned AI?”
This post gives my working answer to that question - sort of. Rather than a plan, I tend to think of a playbook.
A plan connotes something like: “By default, we ~definitely fail. To succeed, we need to hit multiple non-default goals.” If you want to start a company, you need a plan: doing nothing will definitely not result in starting a company, and there are multiple identifiable things you need to do to pull it off.
I don’t think that’s the situation with AI risk.
As I argued before, I think we have a nontrivial chance of avoiding AI takeover even in a “minimal-dignity” future - say, assuming essentially no growth from here in the size or influence of the communities and research fields focused specifically on existential risk from misaligned AI, and no highly surprising research or other insights from these communities/fields either. (This statement is not meant to make anyone relax! A nontrivial chance of survival is obviously not good enough.)
I think there are a number of things we can do that further improve the odds. My favorite interventions are such that some success with them helps a little, and a lot of success helps a lot, and they can help even if other interventions are badly neglected. I’ll list and discuss these interventions below.
So instead of a “plan” I tend to think about a “playbook”: a set of plays, each of which might be useful. We can try a bunch of them and do more of what’s working. I have takes on which interventions most need more attention on the margin, but think that for most people, personal fit is a reasonable way to prioritize between the interventions I’m listing.
Below I’ll briefly recap my overall picture of what success might look like (with links to other things I’ve written on this), then discuss four key categories of interventions: alignment research, standards and monitoring, successful-but-careful AI projects, and information security. For each, I’ll lay out:
How a small improvement from the status quo could nontrivially improve our odds.
How a big enough success at the intervention could put us in a very good position, even if the other three interventions are going poorly.
Common concerns/reservations about the intervention.
Overall, I feel like there is a pretty solid playbook of helpful interventions - any and all of which can improve our odds of success - and that working on those interventions is about as much of a “plan” as we need for the time being.
The content in this post isn’t novel, but I don’t think it’s already-consensus: two of the four interventions (standards and monitoring; information security) seem to get little emphasis from existential risk reduction communities today, and one (successful-but-careful AI projects) is highly controversial and seems often (by this audience) assumed to be net negative.
Many people think most of the above interventions are doomed, irrelevant, or sure to be net harmful, and/or that our baseline odds of avoiding a catastrophe are so low that we need something more like a “plan” to have any hope. I have some sense of the arguments for why this is, but in most cases not a great sense (at least, I can’t see where many folks’ level of confidence is coming from). A lot of my goal in posting this piece is to give such people a chance to see where I’m making steps they disagree with, and to push back pointedly on my views, which could change my picture and my decisions.
As with many of my posts, I don’t claim personal credit for any new ground here. I’m leaning heavily on conversations with others, especially Paul Christiano and Carl Shulman.
My basic picture o...