Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Comparing Alignment to other AGI interventions: Basic model, published by Martín Soto on March 20, 2024 on The AI Alignment Forum.
Interventions that increase the probability of Aligned AGI aren't the only kind of AGI-related work that could importantly increase the Expected Value of the future. Here I present a very basic quantitative model (which you can run yourself here) to start thinking about these issues. In a follow-up post I give a brief overview of extensions and analysis.
A main motivation of this enterprise is to assess whether interventions in the realm of Cooperative AI, which increase collaboration or reduce costly conflict, can seem like an optimal marginal allocation of resources. More concretely, in a utility framework, we compare:
Alignment interventions (aV): increasing the probability that one or more agents have our values.
Cooperation interventions given alignment (aC|V): increasing the gains from trade and reducing the cost from conflict for agents with our values.
Cooperation interventions given misalignment (aC|V̄): increasing the gains from trade and reducing the cost from conflict for agents without our values.
We used a model-based approach (see here for a discussion of its benefits) paired with qualitative analysis. These two posts don't constitute an exhaustive analysis (more exhaustive versions are less polished), so feel free to reach out if you're interested in this question and want to hear more about this work.
Most of this post is a replication of previous, unpublished work by Hjalmar Wijk and Tristan Cook. The basic modelling idea we're building upon is to define how different variables affect our utility, and then incrementally compute or estimate partial derivatives to assess the value of marginal work on each kind of intervention.
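As a toy illustration of that procedure (every functional form and number below is an invented assumption, not the model from the post), we can write utility as a function of the three allocations and estimate each partial derivative by finite differences:

```python
# Toy sketch of the "estimate partial derivatives" idea. Every functional
# form and number here is an invented assumption, not the post's model.

def utility(aV, aCV, aCnV):
    """Stand-in for U(aV, aC|V, aC|V̄) with made-up shapes."""
    p_aligned = min(1.0, 0.2 + 0.5 * aV)   # P(the agent has our values)
    u_if_aligned = 1.0 + 0.3 * aCV          # cooperation work helps aligned worlds
    u_if_misaligned = 0.1 * aCnV            # ...and salvages some value otherwise
    return p_aligned * u_if_aligned + (1 - p_aligned) * u_if_misaligned

def partial(f, point, i, eps=1e-4):
    """Finite-difference estimate of f's i-th partial derivative at `point`."""
    bumped = list(point)
    bumped[i] += eps
    return (f(*bumped) - f(*point)) / eps

allocation = (0.5, 0.5, 0.5)  # current (made-up) levels of each kind of work
for i, name in enumerate(("aV", "aC|V", "aC|V̄")):
    print(f"marginal value of {name}: {partial(utility, allocation, i):.3f}")
```

Whichever allocation has the largest partial derivative at the current point is (in this framework) where marginal work is most valuable.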
Setup
We model a multi-agentic situation. We classify each agent as either having (approximately) our values (V) or any other values (V̄). We also classify them as either cooperative (C) or non-cooperative (C̄).[1] These classifications are binary. We are also (for now) agnostic about what these agents represent. Indeed, this basic multi-agentic model will be applicable (with differently informed estimates) to any scenario with multiple singletons, including the following:
Different AGIs (or other kinds of singletons, like AI-augmented nation-states) interacting causally on Earth
Singletons arising from different planets interacting causally in the lightcone
Singletons from across the multiverse interacting acausally
The variable we care about is total utility (U). As a simplifying assumption, we compute it as a weighted interpolation between two binary extremes: one in which bargaining goes (for agents with our values) as well as possible (B), and another in which it goes as badly as possible (B̄). The interpolation coefficient (b) could be interpreted as the "percentage of interactions that result in minimally cooperative bargaining settlements".
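In symbols, the assumption is U = b · UB + (1 − b) · UB̄. As a minimal sketch in code (the variable names and numbers are mine, not the post's):

```python
def total_utility(b, u_best, u_worst):
    """Interpolate U between the bargaining extremes: U = b*UB + (1-b)*UB̄."""
    return b * u_best + (1 - b) * u_worst

print(total_utility(b=0.7, u_best=1.0, u_worst=-0.5))  # -> 0.55
```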
We also assume (for now) that all our interventions act on only a single one of the agents (which controls a fraction FI of total resources), which usually represents our AGI or our civilization.[2] These interventions are coarsely grouped into alignment work (aV), cooperation work targeted at worlds with high alignment power (aC|V), and cooperation work targeted at worlds with low alignment power (aC|V̄).
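As a minimal sketch of that single-agent assumption (resource-weighted aggregation is my reading of it, and all numbers are invented):

```python
FI = 0.1              # fraction of total resources our agent controls
u_our_agent = 0.9     # per-resource value produced by the agent we intervene on
u_other_agents = 0.4  # per-resource value produced by everyone else
# Our interventions only move the first term; the second is fixed background.
U = FI * u_our_agent + (1 - FI) * u_other_agents
print(U)  # -> 0.45
```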
The overall structure looks like this:
[Model diagram; see the original post.]
Full list of variables
This section is safely skippable.
The first 4 variables model expected outcomes:
UB ∈ ℝ: Utility attained in the possible world where our bargaining goes as well as possible.
UB̄ ∈ ℝ: Utility attained in the possible world where our bargaining goes as badly as possible.
b ∈ [0,1]: Baseline (expected) success of bargaining (for agents with our values), used to interpolate between UB and UB̄. Can be i...