Towards a New Impact Measure by Alex Turner

The Nonlinear Library: Alignment Forum Top Posts

Education

2021-12-05

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Towards a New Impact Measure, published by Alex Turner on the AI Alignment Forum.

In which I propose a closed-form solution to low impact, increasing corrigibility and seemingly taking major steps to neutralize basic AI drives 1 (self-improvement), 5 (self-protectiveness), and 6 (acquisition of resources).

Previously: Worrying about the Vase: Whitelisting, Overcoming Clinginess in Impact Measures, Impact Measure Desiderata

To be used inside an advanced agent, an impact measure... must capture so much variance that there is no clever strategy whereby an advanced agent can produce some special type of variance that evades the measure. ~ Safe Impact Measure

If we have a safe impact measure, we may have arbitrarily-intelligent unaligned agents which do small (bad) things instead of big (bad) things.

For the abridged experience, read up to "Notation", skip to "Experimental Results", and then to "Desiderata".

What is "Impact"?

One lazy Sunday afternoon, I worried that I had written myself out of a job. After all, Overcoming Clinginess in Impact Measures basically said, "Suppose an impact measure extracts 'effects on the world'. If the agent penalizes itself for these effects, it's incentivized to stop the environment (and any agents in it) from producing them. On the other hand, if it can somehow model other agents and avoid penalizing their effects, the agent is now incentivized to get the other agents to do its dirty work." This seemed to be strong evidence against the possibility of a simple conceptual core underlying "impact", and I didn't know what to do.

At this point, it sometimes makes sense to step back and try to say exactly what you don't know how to solve – try to crisply state what it is that you want an unbounded solution for. Sometimes you can't even do that much, and then you may actually have to spend some time thinking 'philosophically' – the sort of stage where you talk to yourself about some mysterious ideal quantity of [chess] move-goodness and you try to pin down what its properties might be. ~ Methodology of Unbounded Analysis

There's an interesting story here, but it can wait. As you may have guessed, I now believe there is such a simple core. Surprisingly, the problem comes from thinking about "effects on the world". Let's begin anew.

Rather than asking "What is goodness made out of?", we begin from the question "What algorithm would compute goodness?". ~ Executable Philosophy

Intuition Pumps

I'm going to say some things that won't make sense right away; read carefully, but please don't dwell. u_A is an agent's utility function, while u_H is some imaginary distillation of human preferences.

WYSIATI

What You See Is All There Is is a crippling bias present in meat-computers:

[WYSIATI] states that when the mind makes decisions... it appears oblivious to the possibility of Unknown Unknowns, unknown phenomena of unknown relevance. Humans fail to take into account complexity and that their understanding of the world consists of a small and necessarily un-representative set of observations.

Surprisingly, naive reward-maximizing agents catch the bug, too. If we slap together some incomplete reward function that weakly points to what we want (but also leaves out a lot of important stuff, as do all reward functions we presently know how to specify) and then supply it to an agent, it blurts out "gosh, here I go!", and that's that.

Power

A position from which it is relatively easier to achieve arbitrary goals. That such a position exists has been obvious to every population which has required a word for the concept. The Spanish term is particularly instructive. When used as a verb, "poder" means "to be able to," which supports that our definition of "power" is natural. ~ Cohen et al.

And so it is with the French "pouvoir".

Lines

Suppose you start at point C ...