Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio.
This is: My take on Vanessa Kosoy's take on AGI safety, published by Steve Byrnes on the AI Alignment Forum.
Confidence level: Low
Vanessa Kosoy is a deep fountain of knowledge and insights about AGI safety, but I’ve had trouble understanding some aspects of her point of view. Part of the problem is just pedagogy, and part of it (I will argue) is that she has some different underlying assumptions and beliefs than I do. This post aims to address both those things. In particular, on the pedagogy front, I will try to give a sense for what Vanessa is doing and why, assuming minimal knowledge of either math or theoretical CS. (At least, that's my intention—please let me know if anything is confusing or jargon-y.)
Here’s an example of where we differ. I tend to think of things like “the problem of wireheading” and “the problem of ontological crises” etc. as being on the critical path to AGI safety—as in, I think that, to build safe AGIs, we’ll need to be talking explicitly about these specific problems, and others like them, and to be addressing those specific problems with specific solutions. But Vanessa seems to disagree. What’s the root cause of that disagreement? More to the point, am I wasting my time, thinking about the wrong things?
Vanessa responds: Actually I don't think I disagree? I don't like the name "ontological crisis" since I think it presupposes a particular framing that's not necessarily useful. However I do think it's important to understand how agents can have utility functions that depend on unobservable quantities. I talked about it in Reinforcement Learning With Imperceptible Rewards and have more to say in an upcoming post.
Let’s find out!
Many thanks to Vanessa for patiently engaging with me. Also, thanks to Adam Shimi & Logan Smith for comments on a draft.
Summary & Table of Contents
Section 1 is just getting situated, i.e. what is the problem we’re trying to solve here?
In Section 2, I compare the more popular “algorithms-first approach” to Vanessa’s “desiderata-first approach”. In brief, the former is when you start with an AGI-relevant algorithm and figure out how to make it safe. The latter is when you first come up with one or more precise criteria, called desiderata, such that any algorithm satisfying them would be safe, and then go looking for algorithms that you can prove satisfy those desiderata.
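To make that pattern concrete, here’s a toy sketch of my own (this is not Vanessa’s formalism, and a real desideratum would be proved as a theorem rather than spot-checked empirically): we state a precise performance criterion for a two-armed bandit, and then check whether a candidate algorithm, epsilon-greedy, satisfies it.

import random

# Toy "desideratum": over T rounds on a two-armed Bernoulli bandit, the
# algorithm's average reward should come within `slack` of the best arm's
# mean -- a crude empirical stand-in for a sublinear-regret guarantee.
def meets_desideratum(make_alg, arm_means, T=20000, slack=0.05, seed=0):
    rng = random.Random(seed)
    alg = make_alg(n_arms=len(arm_means), rng=rng)
    total = 0.0
    for _ in range(T):
        arm = alg.choose()
        reward = 1.0 if rng.random() < arm_means[arm] else 0.0
        alg.update(arm, reward)
        total += reward
    return total / T >= max(arm_means) - slack

# One candidate algorithm: epsilon-greedy over empirical arm means.
class EpsilonGreedy:
    def __init__(self, n_arms, rng, eps=0.05):
        self.rng, self.eps = rng, eps
        self.counts = [0] * n_arms
        self.means = [0.0] * n_arms
    def choose(self):
        # Explore with probability eps (and until every arm is tried once).
        if 0 in self.counts or self.rng.random() < self.eps:
            return self.rng.randrange(len(self.counts))
        return max(range(len(self.means)), key=lambda a: self.means[a])
    def update(self, arm, reward):
        self.counts[arm] += 1
        self.means[arm] += (reward - self.means[arm]) / self.counts[arm]

print(meets_desideratum(EpsilonGreedy, arm_means=[0.4, 0.6]))  # hopefully True

The point of the desiderata-first approach is that the criterion is fixed once and for all, up front, and the search is then over algorithms that provably satisfy it; the empirical check above is only standing in for what would really be a proof.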
Sections 3-5 go through the three ingredients needed for AGI safety in Vanessa’s “desiderata-first approach”:
Section 3 covers the part where we prove that an AI algorithm satisfies some precisely-defined desiderata. I’ll cover some key background concepts (“regret bounds”, “traps”, “realizability”), and some of Vanessa’s related ideas (“Delegative Reinforcement Learning”, “Infra-Bayesianism”), and how they’re all connected.
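Since “regret bounds” do a lot of work in Section 3, here is the generic definition from the bandit / reinforcement learning literature (the standard textbook notion, not anything specific to Vanessa’s framework):

$\mathrm{Regret}(T) \;=\; \max_{\pi}\,\mathbb{E}\!\left[\sum_{t=1}^{T} r_t \,\middle|\, \pi\right] \;-\; \mathbb{E}\!\left[\sum_{t=1}^{T} r_t \,\middle|\, \text{the learning algorithm}\right]$

i.e., how much total reward the algorithm gives up relative to the best policy. A “regret bound” is a theorem guaranteeing that this quantity grows sublinearly in $T$ (e.g. $O(\sqrt{T})$), so the per-step shortfall shrinks toward zero as the algorithm learns.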
Section 4 covers the part where we come up with good desiderata. To give a taste of what Vanessa has in mind, I give an intuitive walk-through of a particular example she came up with recently: “The Hippocratic Principle” desideratum, and “Hippocratic Timeline-Driven Learning”, an example type of algorithm that would satisfy the desideratum.
Section 5 covers “non-Cartesian daemons”. This part basically closes a loophole in the “desiderata-first” framework, by ruling out bad behaviors unrelated to the AI’s nominal output, such as the AI hacking into the operating system it’s running on.
Section 6 switches to my own opinions:
In Section 6.1, I circle back to the “algorithms-first” vs “desiderata-first” distinction from Section 2, arguing that there’s less to it than it first appears, and that a more important difference is the approach to “weird failure modes that x-risk people talk about” (wireheading, ontological crises, deceptive mesa-optimizers, incorrigibility, gradient hacking, etc. etc.). ...