Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio.
This is: Worrying about the Vase: Whitelisting, published by Alex Turner on the AI Alignment Forum.
Suppose a designer wants an RL agent to achieve some goal, like moving a box from one side of a room to the other. Sometimes the most effective way to achieve the goal involves doing something unrelated and destructive to the rest of the environment, like knocking over a vase of water that is in its path. If the agent is given a reward only for moving the box, it will probably knock over the vase.
Amodei et al., Concrete Problems in AI Safety
Side effect avoidance is a major open problem in AI safety. I present a robust, transferable, easily- and more safely-trainable, partially reward-hacking-resistant impact measure.
TurnTrout, Worrying about the Vase: Whitelisting
An impact measure is a means by which change in the world may be evaluated and penalized; such a measure is not a replacement for a utility function, but rather an additional precaution overlaid on top of it.
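To make the overlay concrete, here is a minimal sketch in the spirit of the linked Python repository, but not lifted from it: the agent's reward is its task reward minus a scaled impact penalty. All names here, including IMPACT_COEFF, task_reward, and impact, are illustrative assumptions, not the paper's API.

```python
# Minimal sketch, not the paper's implementation: the impact measure is
# overlaid on the task reward as a penalty rather than replacing it.
# task_reward, impact, and IMPACT_COEFF are illustrative stand-ins.

IMPACT_COEFF = 10.0  # hypothetical trade-off between task progress and impact

def task_reward(state: dict) -> float:
    # Stand-in for the designer's actual objective (e.g., the box was moved).
    return 1.0 if state.get("box_moved") else 0.0

def impact(state: dict, next_state: dict) -> float:
    # Stand-in for any impact measure scoring how much the world changed
    # (here: did the vase go from intact to broken?).
    return 1.0 if state.get("vase_intact") and not next_state.get("vase_intact") else 0.0

def shaped_reward(state: dict, next_state: dict) -> float:
    # The precaution is additive: the utility function stays, and the
    # measured impact is subtracted as a penalty.
    return task_reward(next_state) - IMPACT_COEFF * impact(state, next_state)
```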
While I'm fairly confident that whitelisting contributes meaningfully to short- and mid-term AI safety, I remain skeptical of its robustness to scale. Should several challenges be overcome, whitelisting may indeed be helpful for excluding swathes of unfriendly AIs from the outcome space.
Furthermore, the approach allows easy shaping of agent behavior in a wide range of situations.
Segments of this post are lifted from my paper, whose latest revision may be found here; for Python code, look no further than this repository. For brevity, some relevant details are omitted.
Summary
Be careful what you wish for.
In effect, side effect avoidance aims to decrease how careful we have to be with our wishes. For example, asking for help filling a cauldron with water shouldn't end in a flooded workshop, as in the Sorcerer's Apprentice.
However, we just can't enumerate all the bad things that the agent could do. How do we avoid these extreme over-optimizations robustly?
Several impact measures have been proposed, including state distance, which we could define as, say, total particle displacement. This could be measured either naively (with respect to the original state) or counterfactually (with respect to the expected outcome had the agent taken no action).
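As a rough illustration of the two variants, consider the following hedged sketch: states are toy particle-position arrays, and null_outcome (what the world would look like had the agent done nothing) is assumed to come from some environment model; none of these names are from the paper.

```python
import numpy as np

def total_displacement(s1: np.ndarray, s2: np.ndarray) -> float:
    # State distance defined as total particle displacement between states.
    return float(np.abs(s1 - s2).sum())

def naive_penalty(original: np.ndarray, final: np.ndarray) -> float:
    # Naive: measure against the state before the agent acted.
    return total_displacement(original, final)

def counterfactual_penalty(final: np.ndarray, null_outcome: np.ndarray) -> float:
    # Counterfactual: measure against the expected outcome had the agent
    # taken no action (null_outcome must come from a model of the world).
    return total_displacement(final, null_outcome)
```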
These approaches have some problems:
Making up for bad things it prevents with other negative side effects. Imagine an agent which cures cancer, yet kills an equal number of people to keep overall impact low.
Not being customizable before deployment.
Not being adaptable after deployment.
Not being easily computable.
Not allowing generative previews, eliminating a means of safely previewing agent preferences (see latent space whitelisting below).
Being dominated by random effects throughout the universe at large; note that nothing about particle distance dictates that it be related to anything happening on planet Earth.
Equally penalizing breaking and fixing vases, due to the symmetry of the above metric (see the sketch after this list):
For example, the agent would be equally penalized for breaking a vase and for preventing a vase from being broken, though the first action is clearly worse. This leads to “overcompensation” (“offsetting”) behaviors: when rewarded for preventing the vase from being broken, an agent with a low impact penalty rescues the vase, collects the reward, and then breaks the vase anyway (to get back to the default outcome).
Victoria Krakovna, Measuring and Avoiding Side Effects Using Reachability
Not actually measuring impact in a meaningful way.
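To see the symmetry problem concretely, consider a toy one-feature state where 1 means the vase is intact; this reuses the displacement metric sketched above and is purely illustrative.

```python
import numpy as np

def total_displacement(s1: np.ndarray, s2: np.ndarray) -> float:
    return float(np.abs(s1 - s2).sum())

intact = np.array([1.0])  # vase intact
broken = np.array([0.0])  # vase broken

# The metric is symmetric, so breaking a vase and restoring one are
# penalized identically, even though the first outcome is clearly worse.
# This is exactly what invites the offsetting behavior described above.
assert total_displacement(intact, broken) == total_displacement(broken, intact)
```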
Whitelisting falls prey to none of these.
However, other problems remain, and certain new challenges have arisen; these, and the assumptions made by whitelisting, will be discussed.
Rare LEAKED footage of Mickey trying to catch up on his alignment theory after instantiating an unfriendly genie [colorized, 2050].
So, What's Whitelisting?
To achieve robust side effect avoidance with ...