Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Big Picture AI Safety: Introduction, published by EuanMcLean on May 23, 2024 on LessWrong.
tldr: I conducted 17 semi-structured interviews with AI safety experts about their big-picture strategic view of the AI safety landscape: how human-level AI will play out, how things might go wrong, and what the AI safety community should be doing. While many respondents held "traditional" views (e.g. that the main threat is misaligned AI takeover), there was more opposition to these standard views than I expected, and the field seems more split on many important questions than someone outside the field might infer.
What do AI safety experts believe about the big picture of AI risk? How might things go wrong, what should we do about it, and how have we done so far? Does everybody in AI safety agree on the fundamentals? Which views are consensus, which are contested, and which are fringe? Maybe we could learn this from the literature (as in the MTAIR project), but many ideas and opinions are not written down anywhere; they exist only in people's heads and in lunchtime conversations at AI labs and coworking spaces.
I set out to learn what the AI safety community believes about the strategic landscape of AI safety. I conducted 17 semi-structured interviews with a range of AI safety experts. I avoided going into the details of particular technical concepts or philosophical arguments, focussing instead on how such concepts and arguments fit into the big picture of what AI safety is trying to achieve.
This work is similar to the AI Impacts surveys, Vael Gates' AI Risk Discussions, and Rob Bensinger's existential risk from AI survey. It differs from those projects in that both my interview approach and my analysis are more qualitative. Part of the hope for this project was that it could capture harder-to-quantify concepts that are too ill-defined or intuition-based to fit the format of previous survey work.
Questions
I asked the participants a standardized list of questions.
What will happen?
Q1 Will there be a human-level AI? What is your modal guess of what the first human-level AI (HLAI) will look like? I define HLAI as an AI system that can carry out roughly 100% of economically valuable cognitive tasks more cheaply than a human.
Q1a What's your 60% or 90% confidence interval for the date of the first HLAI?
Q2 Could AI bring about an existential catastrophe? If so, what is the most likely way this could happen?
Q2a What's your best guess at the probability of such a catastrophe?
What should we do?
Q3 Imagine a world where, absent any effort from the AI safety community, an existential catastrophe happens, but actions taken by the AI safety community prevent such a catastrophe. In this world, what did we do to prevent the catastrophe?
Q4 What research direction (or other activity) do you think will reduce existential risk the most, and what is its theory of change? Could this backfire in some way?
What mistakes have been made?
Q5 Are there any big mistakes the AI safety community has made in the past or are currently making?
These questions changed gradually as the interviews went on (in response to feedback from participants), and I didn't always ask them exactly as I've presented them here. I asked participants to answer from their own internal model of the world as much as possible and to avoid deferring to the opinions of others (their inside view, so to speak).
Participants
Adam Gleave is the CEO and co-founder of the alignment research non-profit FAR AI. (Sept 23)
Adrià Garriga-Alonso is a research scientist at FAR AI. (Oct 23)
Ajeya Cotra leads Open Philanthropy's grantmaking on technical research that could help to clarify and reduce catastrophic risks from advanced AI. (Jan 24)
Alex Turner is a research scientist at Google DeepMind on the Scalable Alignment team. (Feb 24)
Ben Cottie...