Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Some Summaries of Agent Foundations Work, published by mattmacdermott on May 15, 2023 on LessWrong.
This is a bunch of not-very-technical summaries of agent foundations work from LessWrong and the Alignment Forum.
I was hoping to turn it into a comprehensive literature review, categorising things in an enlightening way and listing open problems. It turns out that's pretty hard to do! It's languished in my drafts for long enough now, so I'm just going to post it as it is. Hopefully someone else can produce a great overview of the field instead.
Why Agent Foundations?
My own answer to this question is that most AI threat models depend on a powerful agent pursuing a goal we don't want it to, and mathematical models of agency seem useful both for understanding and dealing with these risks. Existing models of agency from fields like reinforcement learning and game theory don't seem up to the job, so trying to develop better ones might pay off.
Normative and Descriptive Agent Foundations
One account of why our usual models of agency aren't up to the job is the Embedded Agency sequence: the usual models assume agents are unchanging, indivisible entities which interact with their environments through predefined channels, but real-world agents are part of their environment. The sequence identifies four rough categories of problems that arise when we switch to trying to model embedded agents, explained in terms of Marcus Hutter's model of the theoretically perfect reinforcement learning agent, AIXI.
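For concreteness, here is the rough shape of AIXI's defining equation (a sketch following Hutter's formulation; m is the horizon, U a universal Turing machine, and ℓ(q) the length of program q; details vary between presentations):

$$a_k := \arg\max_{a_k} \sum_{o_k r_k} \cdots \max_{a_m} \sum_{o_m r_m} \big[r_k + \cdots + r_m\big] \sum_{q \,:\, U(q,\, a_{1:m}) \,=\, o_1 r_1 \ldots o_m r_m} 2^{-\ell(q)}$$

In words: AIXI plans by expectimax over every computable environment q, weighting each by its simplicity. The subproblems below each point at one way this idealisation breaks when the agent lives inside the world it's modelling.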
I find these subproblems a useful framing for thinking about agent foundations, but as I explained in a previous post, I think they take a strongly normative stance, asking what high-level principles an agent should follow in order to be theoretically perfect. Some other agent foundations work takes a more descriptive stance, asking what mechanistic and behavioural properties agents in the real world tend to have. You could also call this a distinction between top-down and bottom-up approaches to modelling agency.
Here are the problems from the Embedded Agency sequence:
Normative Subproblems
Decision Theory. AIXI's actions affect the world in a well-defined way, but embedded agents have to figure out whether they care about the causal, evidential, or logical implications of their choices. (A toy sketch after this list shows causal and evidential reasoning coming apart.)
Embedded World-Models. AIXI can hold every possible model of the world in its head in full detail and consider every consequence of its actions, but embedded agents are part of the world, and have limited space and compute with which to model it. This gives rise to the non-realisability problem: what happens when the real world isn't in your hypothesis class? (The second sketch after this list illustrates this.)
Robust Delegation. AIXI is unchanging and the only comparable agent in town, but embedded agents can self-modify and create other agents. They need to ensure their successors are aligned.
Subsystem Alignment. AIXI is indivisible, but embedded agents are chunks of the world made up of subchunks. What if those subchunks are agents with their own agendas?
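To make the decision theory subproblem concrete, here's a minimal sketch of Newcomb's problem, where evidential and causal decision theory recommend different actions. The payoff amounts and the 0.99 predictor accuracy are illustrative assumptions, not from the original post:

```python
# Toy Newcomb's problem: a predictor fills an opaque box with $1M iff it
# predicts you will take only that box; a transparent box always holds $1k.
# Numbers are illustrative assumptions.

ACCURACY = 0.99                      # assumed predictor accuracy
MILLION, THOUSAND = 1_000_000, 1_000

def edt_value(action):
    """Evidential decision theory: treat your own action as evidence
    about what the predictor already did."""
    p_million = ACCURACY if action == "one-box" else 1 - ACCURACY
    base = THOUSAND if action == "two-box" else 0
    return base + p_million * MILLION

def cdt_value(action, p_million_already):
    """Causal decision theory: the box contents are causally fixed before
    you choose, so your action can't change them."""
    base = THOUSAND if action == "two-box" else 0
    return base + p_million_already * MILLION

for a in ("one-box", "two-box"):
    print(a, "EDT:", edt_value(a), "CDT:", cdt_value(a, p_million_already=0.5))

# EDT favours one-boxing; CDT favours two-boxing for any fixed belief about
# the box contents, since two-boxing dominates once the contents are fixed.
```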
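And a minimal sketch of non-realisability: a Bayesian agent whose hypothesis class contains only i.i.d. coins, facing a world that deterministically alternates. The hypothesis grid and data are made up for illustration:

```python
# Toy non-realisability: the agent's hypothesis class is "i.i.d. coin with
# bias p", but the true environment is the deterministic sequence 0,1,0,1,...
# which lies outside the class. Setup is an illustrative assumption.

import numpy as np

biases = np.linspace(0.01, 0.99, 99)           # hypothesis class: Bernoulli(p)
posterior = np.ones_like(biases) / len(biases)  # uniform prior over the grid

world = [i % 2 for i in range(1000)]            # true environment: alternating bits

for x in world:
    likelihood = biases if x == 1 else 1 - biases
    posterior *= likelihood
    posterior /= posterior.sum()                # Bayesian update, renormalised

# The posterior piles up near p = 0.5, the best approximation in the class,
# so the agent forever assigns ~0.5 to the next bit, even though the true
# next bit is perfectly predictable. More data never fixes this.
print(f"posterior mode: {biases[posterior.argmax()]:.2f}")
```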
I think the embedded agency subproblems are also a useful way to categorise descriptive work, but the names and descriptions feel too normative, so I renamed them for the descriptive case. I also suggested a fifth problem, which is about figuring out how our models actually correspond to reality. I called it 'Identifying Agents', but now I prefer something like 'Agents in Practice'.
Descriptive Subproblems
I/O Channels. Actions, observations, and Cartesian boundaries aren't primitive: descriptive models need to define them. How do we move from a non-agentic model of the world to one with free will and counterfactuals?
Internal Components. Presumably agents contain things like goals and world-models, but how do these components work mathematically? And are there others?
Future Agents. W...