Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Comparing Alignment to other AGI interventions: Basic model, published by Martín Soto on March 20, 2024 on The AI Alignment Forum.
Interventions that increase the probability of Aligned AGI aren't the only kind of AGI-related work that could importantly increase the Expected Value of the future. Here I present a very basic quantitative model (which you can run yourself here) to start thinking about these issues. In a follow-up post I give a brief overview of extensions and analysis.
A main motivation of this enterprise is to assess whether interventions in the realm of Cooperative AI, which increase collaboration or reduce costly conflict, can seem like an optimal marginal allocation of resources. More concretely, in a utility framework, we compare:
Alignment interventions (aV): increasing the probability that one or more agents have our values.
Cooperation interventions given alignment (aC|V): increasing the gains from trade and reducing the cost from conflict for agents with our values.
Cooperation interventions given misalignment (aC|V̄): increasing the gains from trade and reducing the cost from conflict for agents without our values.
We used a model-based approach (see here for a discussion of its benefits) paired with qualitative analysis. These two posts don't constitute an exhaustive analysis (more exhaustive versions are less polished), so feel free to reach out if you're interested in this question and want to hear more about this work.
Most of this post is a replication of previous, unpublished work by Hjalmar Wijk and Tristan Cook. The basic modelling idea we're building upon is to define how different variables affect our utility, and then incrementally compute or estimate partial derivatives to assess the value of marginal work on each kind of intervention.
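As a toy illustration of that procedure (every functional form and number below is an invented assumption, not the model from the post), we can write utility as a function of the three allocations and estimate each partial derivative by finite differences:

```python
# Toy sketch of the "estimate partial derivatives" idea. Every functional
# form and number here is an invented assumption, not the post's model.

def utility(aV, aCV, aCnV):
    """Stand-in for U(aV, aC|V, aC|V̄) with made-up shapes."""
    p_aligned = min(1.0, 0.2 + 0.5 * aV)   # P(the agent has our values)
    u_if_aligned = 1.0 + 0.3 * aCV          # cooperation work helps aligned worlds
    u_if_misaligned = 0.1 * aCnV            # ...and salvages some value otherwise
    return p_aligned * u_if_aligned + (1 - p_aligned) * u_if_misaligned

def partial(f, point, i, eps=1e-4):
    """Finite-difference estimate of f's i-th partial derivative at `point`."""
    bumped = list(point)
    bumped[i] += eps
    return (f(*bumped) - f(*point)) / eps

allocation = (0.5, 0.5, 0.5)  # current (made-up) levels of each kind of work
for i, name in enumerate(("aV", "aC|V", "aC|V̄")):
    print(f"marginal value of {name}: {partial(utility, allocation, i):.3f}")
```

Whichever allocation has the largest partial derivative at the current point is (in this framework) where marginal work is most valuable.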
Setup
We model a multi-agentic situation. We classify each agent as either having (approximately) our values (V) or any other values (V̄). We also classify them as either cooperative (C) or non-cooperative (C̄).[1] These classifications are binary. We are also (for now) agnostic about what these agents represent. Indeed, this basic multi-agentic model will be applicable (with differently informed estimates) to any scenario with multiple singletons, including the following:
Different AGIs (or other kinds of singletons, like AI-augmented nation-states) interacting causally on Earth
Singletons arising from different planets interacting causally in the lightcone
Singletons from across the multiverse interacting acausally
The variable we care about is total utility (U). As a simplifying assumption, we compute it as a weighted interpolation between two binary extremes: one in which bargaining goes (for agents with our values) as well as possible (B), and another in which it goes as badly as possible (B̄). The interpolation coefficient (b) could be interpreted as the "percentage of interactions that result in minimally cooperative bargaining settlements".
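In symbols, the assumption is U = b · UB + (1 − b) · UB̄. As a minimal sketch in code (the variable names and numbers are mine, not the post's):

```python
def total_utility(b, u_best, u_worst):
    """Interpolate U between the bargaining extremes: U = b*UB + (1-b)*UB̄."""
    return b * u_best + (1 - b) * u_worst

print(total_utility(b=0.7, u_best=1.0, u_worst=-0.5))  # -> 0.55
```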
We also assume (for now) that all our interventions act on only a single one of the agents (which controls a fraction FI of total resources), which usually represents our AGI or our civilization.[2] These interventions are coarsely grouped into alignment work (aV), cooperation work targeted at worlds with high alignment power (aC|V), and cooperation work targeted at worlds with low alignment power (aC|V̄).
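As a minimal sketch of that single-agent assumption (resource-weighted aggregation is my reading of it, and all numbers are invented):

```python
FI = 0.1              # fraction of total resources our agent controls
u_our_agent = 0.9     # per-resource value produced by the agent we intervene on
u_other_agents = 0.4  # per-resource value produced by everyone else
# Our interventions only move the first term; the second is fixed background.
U = FI * u_our_agent + (1 - FI) * u_other_agents
print(U)  # -> 0.45
```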
The overall structure looks like this:
[Model diagram; see the original post.]
Full list of variables
This section is safely skippable.
The first 4 variables model expected outcomes:
UB ∈ ℝ: Utility attained in the possible world where our bargaining goes as well as possible.
UB̄ ∈ ℝ: Utility attained in the possible world where our bargaining goes as badly as possible.
b ∈ [0,1]: Baseline (expected) success of bargaining (for agents with our values), used to interpolate between UB and UB̄. Can be i...