Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Shah (DeepMind) and Leahy (Conjecture) Discuss Alignment Cruxes - Summary & Transcript, published by Olivia Jimenez on May 1, 2023 on LessWrong.
Preface
In December 2022, Rohin Shah (DeepMind) and Connor Leahy (Conjecture) discussed why Leahy is pessimistic about AI risk, and Shah is less so. Below is a summary and transcript.
Summary
Leahy expects discontinuities - capabilities rapidly increasing and behavior diverging far from what we aim for - to be a core alignment difficulty. While this concept is similar to the sharp left turn (SLT), Leahy notes he doesn't like the term SLT. SLT seems to imply deception or a malicious treacherous turn, which Leahy doesn't see as an interesting, relevant, or necessary part of the problem.
Shah suggests they start by discussing Leahy’s SLT views.
Leahy explains he expects there to be some properties that are robustly useful for achieving many goals. As systems become powerful and applied to an increasing variety of tasks, these properties will be favored, and systems will generalize well to increasingly novel tasks.
However, the goals we intend for our AIs to learn might not generalize. Perhaps it will seem that we've gotten them relatively aligned to the goal of helping humans, but once they become smart enough to access the action space of "hook all humans up to super heroin", we'll realize they learned some superficial distortion of our intentions that allows for bizarre and undesirable actions like that. And an AI might reach this intelligence level rapidly, such that it's not visibly misaligned ahead of time. In this sense, as systems get smarter, they might become harder to understand and manage.
Shah is mostly on board: capabilities will improve, convergent instrumental goals exist, generalization will happen, goals can fail to generalize, etc. (However, he doesn’t think goal misgeneralization will definitely happen.)
Shah asks why Leahy expects AI capabilities to increase rapidly. What makes the sharp left turn sharp?
Leahy answers that if we can't rule out a sharp left turn, and we expect a sharp left turn would be catastrophic, that's sufficient cause for concern.
The reason Leahy expects capabilities to increase sharply, though, relates to how humanity's capabilities developed. He sees humans as having undergone sharp left turns in the past 1000 and 150 years. Our intelligence increased rapidly, such that we went from agriculture to the space age incredibly quickly. These shifts seem to have happened for three successive reasons. First, because of increases in brain size. Second, because of the development of culture, language, and tools. And third, because of improvements in epistemology: science, rationality, and emotional regulation. All three of these mechanisms will also exist for AIs, and AIs will be less constrained than humans in several respects, such as having no biological limits on their brain size. So, as we scale AI systems, give them access to tools, and guide them towards discovering better ways of solving problems, Leahy expects they'll become rapidly smarter just like we did.
[Editor’s note expanding the parallel: When humans became rapidly smarter, they became increasingly unaligned with evolution’s goal of inclusive genetic fitness. Humans never cared about inclusive genetic fitness; evolution just got us to care about proxies like sex. Early on, that wasn’t a big deal, since sex usually led to offspring. Once we were smart enough to invent birth control, though, it was a big deal. Nowadays, human behavior is significantly unaligned with evolution’s intention of inclusive genetic fitness. If AIs become rapidly smarter, and they’ve only learned shallow proxies of our goals - an outcome we currently don’t know how to prevent - we might expect their behavior to stray very far from what we value too.]
Shah notes Leahy’...