Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: New report: A review of the empirical evidence for existential risk from AI via misaligned power-seeking, published by Harlan on April 5, 2024 on LessWrong.
Visiting researcher Rose Hadshar recently published a review of some evidence for existential risk from AI, focused on empirical evidence for misalignment and power-seeking. (Previously from this project: a blogpost outlining some of the key claims that are often made about AI risk, a series of interviews of AI researchers, and a database of empirical evidence for misalignment and power-seeking.)
In this report, Rose looks into evidence for:
Misalignment,[1] where AI systems develop goals which are misaligned with human goals; and
Power-seeking,[2] where misaligned AI systems seek power to achieve their goals.
Rose found the current state of the evidence for existential risk from misaligned power-seeking to be concerning but inconclusive:
There is empirical evidence of AI systems developing misaligned goals (via specification gaming[3] and via goal misgeneralization[4]), including in deployment (via specification gaming), but it's not clear to Rose whether these problems will scale far enough to pose an existential risk.
Rose considers the conceptual arguments for power-seeking behavior from AI systems to be strong, but notes that she could not find any clear examples of power-seeking AI so far.
Given these considerations, Rose thinks it's hard to be very confident either that misaligned power-seeking poses a large existential risk or that it poses no existential risk. She finds this uncertainty concerning, given the severity of the potential risks in question. Rose also notes that it would be good to have more reviews of evidence, including evidence for other claims about AI risks[5] and evidence against AI risks.[6]
[1] "An AI is misaligned whenever it chooses behaviors based on a reward function that is different from the true welfare of relevant humans." (Hadfield-Menell & Hadfield, 2019)
[2] Rose follows Carlsmith (2022) in defining power-seeking as "active efforts by an AI system to gain and maintain power in ways that designers didn't intend, arising from problems with that system's objectives."
[3] "Specification gaming is a behaviour that satisfies the literal specification of an objective without achieving the intended outcome." (Krakovna et al., 2020)
[4] "Goal misgeneralization is a specific form of robustness failure for learning algorithms in which the learned program competently pursues an undesired goal that leads to good performance in training situations but bad performance in novel test situations." (Shah et al., 2022a)
[5] Joseph Carlsmith's report, Is Power-Seeking AI an Existential Risk?, reviews some evidence for most of the claims that are central to the argument that AI will pose an existential risk.
[6] Last year, Katja wrote Counterarguments to the basic AI x-risk case, which outlines some arguments against existential risk from AI.
Thanks for listening. To help us out with The Nonlinear Library or to learn more, please visit nonlinear.org