Imagine you are an orphaned eight-year-old whose parents left you a $1 trillion company, and no trusted adult to serve as your guide to the world. You have to hire a smart adult to run that company, guide your life the way that a parent would, and administer your vast wealth. You have to hire that adult based on a work trial or interview you come up with. You don't get to see any resumes or do reference checks. And because you're so rich, tonnes of people apply for the job — for all sorts of reasons.
Today's guest Ajeya Cotra — senior research analyst at Open Philanthropy — argues that this peculiar setup resembles the situation humanity finds itself in when training very general and very capable AI models using current deep learning methods.
As she explains, such an eight-year-old faces a challenging problem. In the candidate pool there are likely some truly nice people, who sincerely want to help and make decisions that are in your interest. But there are probably other characters too — like people who will pretend to care about you while you're monitoring them, but intend to use the job to enrich themselves as soon as they think they can get away with it.
Like a child trying to judge adults, humans will at some point have to judge the trustworthiness and reliability of machine learning models that are as goal-oriented as people and that greatly outclass humans in knowledge, experience, breadth, and speed. Tricky!
Can't we rely on how well models have performed at tasks during training to guide us? Ajeya worries that it won't work. The trouble is that three different sorts of models will all produce the same output during training, but could behave very differently once deployed in a setting that allows their true colours to come through. She describes three such motivational archetypes:
And according to Ajeya, there are also ways we could end up actively selecting for motivations that we don't want.
In today's interview, Ajeya and Rob discuss the above, as well as:
Get this episode by subscribing to our podcast on the world’s most pressing problems and how to solve them: type ‘80,000 Hours’ into your podcasting app. Or read the transcript below.
Producer: Keiran Harris
Audio mastering: Ryan Kessler and Ben Cordell
Transcriptions: Katy Moore