Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio.
This is: Updates and additions to Embedded Agency, published by Rob Bensinger, Abram Demski on the AI Alignment Forum.
Abram Demski and Scott Garrabrant's "Embedded Agency" has been updated with quite a bit of new content from Abram. All the changes are live today, and can be found at any of these links:
as a hand-drawn sequence (LW link, AIAF link);
as blog posts (MIRI link, LW link, AIAF link);
and as an arXiv paper (link).
Abram says, "I'm excited about this new version because I feel like in a lot of cases, the old version gestured at an idea but didn't go far enough to really explain. The new version feels to me like it gives the real version of the problem in cases where the previous version didn't quite make it, and explains things more thoroughly."
This diff shows all the changes to the blog version. Changes include (in addition to many added or tweaked illustrations)...
Changes to "Decision Theory":
"Observation counterfactuals" (discussed in the counterfactual mugging section at the end) are distinguished from "action counterfactuals" (discussed in the earlier sections). Action counterfactuals are introduced before the five-and-ten problem.
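For readers who want the flavor of an observation counterfactual, the counterfactual mugging scenario can be simulated directly: Omega flips a fair coin, and on heads pays the agent $10,000 only if it predicts the agent would have paid $100 on tails. Below is a minimal Monte Carlo sketch of the standard toy payoffs; the function name and parameters are illustrative, not from the post.

```python
import random

def counterfactual_mugging_value(policy_pays, trials=100_000, seed=0):
    """Average payoff per trial under counterfactual mugging.

    Omega flips a fair coin. Heads: Omega pays $10,000, but only if it
    predicts the agent would have paid $100 on tails. Tails: the agent
    is asked for $100 (and pays iff its policy says to).
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        if rng.random() < 0.5:      # heads
            if policy_pays:          # Omega predicted the agent pays on tails
                total += 10_000
        else:                        # tails
            if policy_pays:
                total -= 100
    return total / trials

# The committed "pay" policy wins in expectation (about 4950 vs 0 per round),
# even though paying looks like a pure loss once tails is actually observed.
print(counterfactual_mugging_value(True))
print(counterfactual_mugging_value(False))
```

The tension the post points at is exactly this gap: the action counterfactual evaluated after observing tails says "don't pay," while the policy evaluated before the coin flip says "pay."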
The introduction to the five-and-ten problem is now slower and more focused (less jumping between topics), and makes the motivation clearer.
Instead of highlighting "Perhaps the agent is trying to plan ahead, or reason about a game-theoretic situation in which its action has an intricate role to play." as reasons an agent might know its own action, the text now highlights points from "Embedded World-Models": a sufficiently smart agent with access to its own source code can always deduce its own conditional behaviors.
ε-exploration and Newcomblike problems now get full sections, rather than a few sentences each.
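ε-exploration is the standard device for keeping a learner's value estimates grounded: with small probability ε the agent acts randomly instead of greedily, so every action keeps getting sampled. A minimal sketch of an ε-greedy agent on a toy multi-armed bandit (arm payouts, names, and parameters here are illustrative, not from the post):

```python
import random

def epsilon_greedy_bandit(arm_means, epsilon=0.1, steps=20_000, seed=0):
    """ε-greedy agent: with probability ε take a uniformly random action,
    otherwise take the action with the best running reward estimate.
    The forced randomness guarantees every arm keeps being sampled,
    so the estimates converge to the true arm means."""
    rng = random.Random(seed)
    counts = [0] * len(arm_means)
    estimates = [0.0] * len(arm_means)
    for _ in range(steps):
        if rng.random() < epsilon:
            a = rng.randrange(len(arm_means))                            # explore
        else:
            a = max(range(len(arm_means)), key=lambda i: estimates[i])   # exploit
        reward = rng.gauss(arm_means[a], 1.0)   # noisy payout from arm a
        counts[a] += 1
        estimates[a] += (reward - estimates[a]) / counts[a]  # running mean
    return estimates

# After enough steps the estimates track the true means [0.2, 1.0, 0.5].
print(epsilon_greedy_bandit([0.2, 1.0, 0.5]))
```

Without the ε branch, a greedy agent that starts with a pessimistic estimate of the best arm may never sample it again, which is the grounding failure the section discusses.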
Added discussion of "Do humans make this kind of mistake?" (Text versions only.)
Changes to "Embedded World-Models":
"This is fine if the world 'holds still' for us; but because the map is in the world, it may implement some function." changed to "... because the map is in the world, different maps create different worlds."

Discussion of reflective oracles now gives more context (e.g., says what "oracle machines" are).
Spent more time introducing the problem of logical uncertainty: emphasized that humans handle logical uncertainty fine (text versions only); said a bit more about how logic and probability theory differ; noted that the two "may seem superficially compatible, since probability theory is an extension of Boolean logic"; and described the Gödelian and realizability obstacles to linking the two. Noted explicitly that "the 'scale versus tree' problem also means that we don't know how ordinary empirical reasoning works" (text versions only).
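The "superficially compatible" point can be made concrete: a probability distribution over complete truth assignments ("possible worlds") reproduces Boolean logic, with tautologies getting probability 1 and the usual identities holding. A minimal sketch, with illustrative names and a two-atom toy language:

```python
from itertools import product

def sentence_prob(prob_of_world, sentence, atoms=("A", "B")):
    """Probability of a Boolean sentence under a distribution over
    'possible worlds' (complete truth assignments to the atoms).
    `sentence` maps a world (dict atom -> bool) to True/False."""
    total = 0.0
    for values in product([False, True], repeat=len(atoms)):
        world = dict(zip(atoms, values))
        if sentence(world):
            total += prob_of_world(world)
    return total

# A distribution in which A and B are independent fair coins.
uniform = lambda world: 0.25

# Probability extends logic: every tautology gets probability 1 ...
assert sentence_prob(uniform, lambda w: w["A"] or not w["A"]) == 1.0
# ... and the usual identities hold, e.g. P(A or B) = P(A) + P(B) - P(A and B).
print(sentence_prob(uniform, lambda w: w["A"] or w["B"]))  # 0.75
```

The obstacle the section points at is that this picture assumes logical omniscience: every world is a complete, consistent assignment, so the reasoner is never uncertain about what follows from what — which is exactly the situation logical uncertainty violates.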
Changes to "Robust Delegation":
Introduction + Vingean Reflection:
Introduction expanded to explicitly describe the AI alignment, tiling agent, and stability under self-improvement problems; draw analogies to royal succession and lost purposes in human institutions; and highlight that the difficulty lies in (a) the predecessor not fully understanding itself and its goals, and (b) the successor needing to act with some degree of autonomy. (Text versions only.)
Put more explicit focus on the case where a successor is much smarter than its predecessor. (Text versions only.)
Expanded "Usually, we think about this from the point of view of the human." to "A lot of current work on robust delegation comes from the goal of aligning AI systems with what humans want. So usually, we think about this from the point of view of the human." (Text versions only.)
Goodhart's Law:
Fixed a typo in the text versions' Bayes estimate equation: it previously flipped the first x and y in the conditional expectation, which now correctly reads E[y|x]. (Text versions only.)
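The phenomenon behind the Bayes estimate discussion — regressional Goodhart — is easy to see numerically: pick the option with the best noisy proxy score, and its true value systematically falls short of that score, because selecting on the proxy also selects on lucky noise. A minimal simulation sketch (all names and parameters are illustrative, not from the post):

```python
import random

def optimizer_curse_gap(n_options=100, noise=1.0, trials=2_000, seed=0):
    """Average (proxy - true value) of the option selected for the best proxy.

    True values v ~ N(0, 1); the observed proxy is u = v + N(0, noise).
    Taking the argmax of u selects for lucky noise draws, so the winner's
    proxy score overstates its true value on average."""
    rng = random.Random(seed)
    gap = 0.0
    for _ in range(trials):
        vs = [rng.gauss(0.0, 1.0) for _ in range(n_options)]
        us = [v + rng.gauss(0.0, noise) for v in vs]
        best = max(range(n_options), key=lambda i: us[i])
        gap += us[best] - vs[best]
    return gap / trials

# The gap is reliably positive: naively trusting the proxy of the
# selected option overestimates its value.
print(optimizer_curse_gap())
```

This is why the section cares about the Bayes estimate E[y|x]: regressing the proxy back toward the mean removes the predictable part of this overestimate, while harder optimization makes the uncorrected gap worse.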
Expanded discussion of regressional Goodhart, i...