Download - Imitative Generalisation (AKA 'Learning the Prior') by Beth Barnes

Discover

Podcast Features
Your all-in-one podcasting solution.

Podcast Studio
Easy-to-use audio recorder app.
Livestream
High-performing audio live, without limits.

Podcast App
The best podcast player & podcast app.
Podbean AI
AI-Enhanced Audio Quality and Content Generation.

Ads Marketplace
Join Ads Marketplace to earn money
through sponsorship on your podcast.

PodAds
Manage your ads with dynamic ad insertion capability.
Patron & Paid Content
The seamless way for fans to support you directly
from your podcast.
Apple Podcasts Subscriptions Integration
Effortlessly publish and manage exclusive episodes for your
Apple Podcasts subscribers directly from Podbean.

All Arts Business Comedy Education
Fiction Government Health & Fitness History Kids & Family
Leisure Music News Religion & Spirituality Science
Society & Culture Sports Technology True Crime TV & Film
Live

How to Start a Podcast
How to Start a Live Podcast
How to Monetize a podcast
How to Promote Your Podcast
How to Use Group Recording

Log in
Start your podcast for free

Podcasting
Monetization
Enterprise
Pricing
Discover

The Nonlinear Library: Alignment Forum Top Posts

Education

Imitative Generalisation (AKA 'Learning the Prior') by Beth Barnes

2021-12-04

Download Right click and do "save link as"

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Imitative Generalisation (AKA 'Learning the Prior'), published by Beth Barnes on the AI Alignment Forum. Tl;dr We want to be able to supervise models with superhuman knowledge of the world and how to manipulate it. For this we need an overseer to be able to learn or access all the knowledge our models have, in order to be able to understand the consequences of suggestions or decisions from the model. If the overseers don’t have access to all the same knowledge as the model, it may be easy for the model to deceive us, suggesting plans that look good to us but that may have serious negative consequences. We might hope to access what the model knows just by training it to answer questions. However, we can only train on questions that humans are able to answer[1]. This gives us a problem that’s somewhat similar to the standard formulation of transduction: we have some labelled training set (questions humans can answer), and we want to transfer to an unlabelled dataset (questions we care about), that may be differently distributed. We might hope that our models will naturally generalize correctly from easy-to-answer questions to the ones that we care about. However, a natural pathological generalisation is for our models to only give us ‘human-like’ answers to questions, even if it knows the best answer is different. If we only have access to these human-like answers to questions, that probably doesn’t give us enough information to supervise a superhuman model. What we’re going to call ‘Imitative Generalization’ is a possible way to narrow the gap between the things our model knows, and the questions we can train our model to answer honestly. It avoids the pathological generalisation by only using ML for IID tasks, and imitating the way humans generalize. This hopefully gives us answers that are more like ‘how a human would answer if they’d learnt from all the data the model has learnt from’. We supervise how the model does the transfer, to get the sort of generalisation we want. It’s worth noting there are enough serious open questions that imitative generalization is more of a research proposal than an algorithm! This post is based on work done with Paul Christiano at OpenAI. Thanks very much to Evan Hubinger, Richard Ngo, William Saunders, Long Ouyang and others for helpful feedback, as well as Alice Fares for formatting help Goals of this post This post tries to explain a simplified[2] version of Paul Christiano’s mechanism introduced here, (referred to there as ‘Learning the Prior’) and explain why a mechanism like this potentially addresses some of the safety problems with naïve approaches. First we’ll go through a simple example in a familiar domain, then explain the problems with the example. Then I’ll discuss the open questions for making Imitative Generalization actually work, and the connection with the Microscope AI idea. A more detailed explanation of exactly what the training objective is (with diagrams), and the correspondence with Bayesian inference, are in the appendix. Example: using IG to avoid overfitting in image classification. Here’s an example of using Imitative Generalization to get better performance on a standard ML task: image classification of dog breeds, with distributional shift. Imagine we want to robustly learn to classify dog breeds, but the human labellers we have access to don’t actually know how to identify all the breeds[3], and we don’t have any identification guides or anything. However, we do have access to a labelled dataset D We want to classify dogs in a different dataset D ′ , which is unlabelled. One unfamiliar breed we want to learn to recognise is a husky. It happens that all the huskies in D are on snow, but in D ′ some of them are on grass. Label: Husky Image from D Label: ??? OOD image from D ′ A NN architecture prior lik...