Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Some Summaries of Agent Foundations Work, published by mattmacdermott on May 15, 2023 on LessWrong.
This is a bunch of not-very-technical summaries of agent foundations work from LessWrong and the Alignment Forum.
I was hoping to turn it into a comprehensive literature review, categorising things in an enlightening way and listing open problems. It turns out that's pretty hard to do! It's languished in my drafts for long enough now, so I'm just going to post it as it is. Hopefully someone else can produce a great overview of the field instead.
Why Agent Foundations?
My own answer to this question is that most AI threat models depend on a powerful agent pursuing a goal we don't want it to, and mathematical models of agency seem useful both for understanding and dealing with these risks. Existing models of agency from fields like reinforcement learning and game theory don't seem up to the job, so trying to develop better ones might pay off.
Normative and Descriptive Agent Foundations
One account of why our usual models of agency aren't up to the job is the Embedded Agency sequence: the usual models assume agents are unchanging, indivisible entities which interact with their environments through predefined channels, but real-world agents are part of their environment. The sequence identifies four rough categories of problems that arise when we switch to trying to model embedded agents, explained in terms of Marcus Hutter's model of the theoretically perfect reinforcement learning agent, AIXI.
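For concreteness, here is the rough shape of AIXI's defining equation (a sketch following Hutter's formulation; m is the horizon, U a universal Turing machine, and ℓ(q) the length of program q; details vary between presentations):

$$a_k := \arg\max_{a_k} \sum_{o_k r_k} \cdots \max_{a_m} \sum_{o_m r_m} \big[r_k + \cdots + r_m\big] \sum_{q \,:\, U(q,\, a_{1:m}) \,=\, o_1 r_1 \ldots o_m r_m} 2^{-\ell(q)}$$

In words: AIXI plans by expectimax over every computable environment q, weighting each by its simplicity. The subproblems below each point at one way this idealisation breaks when the agent lives inside the world it's modelling.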
I find these subproblems a useful framing for thinking about agent foundations, but as I explained in a previous post, I think they take a strongly normative stance, asking what high-level principles an agent should follow in order to be theoretically perfect. Some other agent foundations work takes a more descriptive stance, asking what mechanistic and behavioural properties agents in the real world tend to have. You could also call this a distinction between top-down and bottom-up approaches to modelling agency.
Here are the problems from the Embedded Agency sequence:
Normative Subproblems
Decision Theory. AIXI's actions affect the world in a well-defined way, but embedded agents have to figure out whether they care about the causal, evidential, or logical implications of their choices. (A toy sketch after this list shows causal and evidential reasoning coming apart.)
Embedded World-Models. AIXI can hold every possible model of the world in its head in full detail and consider every consequence of its actions, but embedded agents are part of the world, and have limited space and compute with which to model it. This gives rise to the non-realisability problem: what happens when the real world isn't in your hypothesis class? (The second sketch after this list illustrates this.)
Robust Delegation. AIXI is unchanging and the only comparable agent in town, but embedded agents can self-modify and create other agents. They need to ensure their successors are aligned.
Subsystem Alignment. AIXI is indivisible, but embedded agents are chunks of the world made up of subchunks. What if those subchunks are agents with their own agendas?
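To make the decision theory subproblem concrete, here's a minimal sketch of Newcomb's problem, where evidential and causal decision theory recommend different actions. The payoff amounts and the 0.99 predictor accuracy are illustrative assumptions, not from the original post:

```python
# Toy Newcomb's problem: a predictor fills an opaque box with $1M iff it
# predicts you will take only that box; a transparent box always holds $1k.
# Numbers are illustrative assumptions.

ACCURACY = 0.99                      # assumed predictor accuracy
MILLION, THOUSAND = 1_000_000, 1_000

def edt_value(action):
    """Evidential decision theory: treat your own action as evidence
    about what the predictor already did."""
    p_million = ACCURACY if action == "one-box" else 1 - ACCURACY
    base = THOUSAND if action == "two-box" else 0
    return base + p_million * MILLION

def cdt_value(action, p_million_already):
    """Causal decision theory: the box contents are causally fixed before
    you choose, so your action can't change them."""
    base = THOUSAND if action == "two-box" else 0
    return base + p_million_already * MILLION

for a in ("one-box", "two-box"):
    print(a, "EDT:", edt_value(a), "CDT:", cdt_value(a, p_million_already=0.5))

# EDT favours one-boxing; CDT favours two-boxing for any fixed belief about
# the box contents, since two-boxing dominates once the contents are fixed.
```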
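And a minimal sketch of non-realisability: a Bayesian agent whose hypothesis class contains only i.i.d. coins, facing a world that deterministically alternates. The hypothesis grid and data are made up for illustration:

```python
# Toy non-realisability: the agent's hypothesis class is "i.i.d. coin with
# bias p", but the true environment is the deterministic sequence 0,1,0,1,...
# which lies outside the class. Setup is an illustrative assumption.

import numpy as np

biases = np.linspace(0.01, 0.99, 99)           # hypothesis class: Bernoulli(p)
posterior = np.ones_like(biases) / len(biases)  # uniform prior over the grid

world = [i % 2 for i in range(1000)]            # true environment: alternating bits

for x in world:
    likelihood = biases if x == 1 else 1 - biases
    posterior *= likelihood
    posterior /= posterior.sum()                # Bayesian update, renormalised

# The posterior piles up near p = 0.5, the best approximation in the class,
# so the agent forever assigns ~0.5 to the next bit, even though the true
# next bit is perfectly predictable. More data never fixes this.
print(f"posterior mode: {biases[posterior.argmax()]:.2f}")
```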
I think the embedded agency subproblems are also a useful way to categorise descriptive work, but the names and descriptions feel too normative, so I renamed them for the descriptive case. I also suggested a fifth problem, which is about figuring out how our models actually correspond to reality. I called it 'Identifying Agents', but now I prefer something like 'Agents in Practice'.
Descriptive Subproblems
I/O Channels. Actions, observations, and Cartesian boundaries aren't primitive: descriptive models need to define them. How do we move from a non-agentic model of the world to one with free will and counterfactuals?
Internal Components. Presumably agents contain things like goals and world-models, but how do these components work mathematically? And are there others?
Future Agents. W...