Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Updatelessness doesn't solve most problems, published by Martín Soto on February 8, 2024 on The AI Alignment Forum.
In some discussions (especially about
acausal trade and
multi-polar conflict), I've heard the motto "X will/won't be a problem because superintelligences will just be Updateless". Here I'll explain (in layman's terms) why, as far as we know, it's not looking likely that a super satisfactory implementation of Updatelessness exists, nor that superintelligences automatically implement it, nor that this would drastically improve multi-agentic bargaining.
Epistemic status: These insights seem like the most robust update from my work with Demski on
Logical Updatelessness and discussions with CLR employees about
Open-Minded Updatelessness. To my understanding, most researchers involved agree with them and the message of this post.
What is Updatelessness?
This is skippable if you're already familiar with the concept.
It's easiest to illustrate with an example: Counterfactual Mugging.
I will throw a fair coin.
If it lands Heads, you will be able to freely choose whether to pay me $100 (and if so, you will receive nothing in return).
If it lands Tails, I will check whether you paid me the $100 in the Heads world[1], and if so, I will pay you $1000.
When you find yourself in the Heads world, one might argue, the rational thing to do is to not pay. After all, you already know the coin landed Heads, so you will gain nothing by paying the $100 (assume this game is not iterated, etc.).
But if, before knowing how the coin lands, someone offers you the opportunity of committing to paying up in the Heads world, you will want to accept it! Indeed, you're still uncertain about whether you'll end up in the Heads or the Tails world (50% chance on each). If you don't commit, you know you won't pay if you find yourself in the Heads world (and so also won't receive $1000 in the Tails world), so your expected payoff is $0.
But if you commit, your payoff will be -$100 in the Heads world, and $1000 in the Tails world, so $450 in expectation.
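Spelled out, the expected-value comparison from the perspective of the prior (before the coin is flipped) is just:

$$
\mathbb{E}[\text{commit}] = 0.5\cdot(-\$100) + 0.5\cdot\$1000 = \$450,
\qquad
\mathbb{E}[\text{don't commit}] = 0.5\cdot\$0 + 0.5\cdot\$0 = \$0.
$$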
This is indeed what happens with the best-known decision theories (CDT and EDT): they want to commit to paying, but if they haven't committed, then by the time they get to the Heads world they don't pay. We call this dynamic instability, because different (temporal) versions of the agent seem to be working against each other.
Why does this happen? Because, before seeing the coin, the agent is still uncertain about which world it will end up in, and so still "cares" about what happens in both (and this is reflected in the expected value calculation, where we include both with equal weight). But upon seeing the coin land, the agent updates on the information that it's in the Heads world and that the Tails world doesn't exist, and so stops "caring" about the latter.
This is not so different from our utility function changing (before we were trying to maximize it in two worlds, now only in one), and
we know that leads to instability.
An updateless agent would use a decision procedure that doesn't update on how the coin lands. And thus, even if it found itself in the Heads world, it would acknowledge its previous credences gave equal weight to both worlds, and so pay up (without needing to have pre-committed to do so), because this was better from the perspective of the prior.
Indeed, Updatelessness is nothing more than "committing to maximize the expected value from the perspective of your prior" (instead of constantly updating your prior, so that the calculation of this expected value changes). This is not always straightforward or well-defined (for example, what if you learn of a radically new insight that you had never considered at the time of setting your prior?), so we need to fill in more details to obtain a completely defined decision theory.
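To make the contrast concrete, here is a minimal sketch (not from the original post) of the two decision procedures in the Counterfactual Mugging above. The two-world model, payoff numbers, and function names are illustrative assumptions, not anyone's actual implementation:

```python
# Counterfactual Mugging, as a toy two-world model (illustrative sketch).
# The prior assigns probability 0.5 to each coin outcome.
PRIOR = {"Heads": 0.5, "Tails": 0.5}

def payoff(world: str, pays_in_heads: bool) -> float:
    """Dollar payoff in a given world, given the policy of paying (or not) in Heads."""
    if world == "Heads":
        return -100.0 if pays_in_heads else 0.0
    # In Tails, the mugger pays $1000 iff the agent's policy is to pay in Heads.
    return 1000.0 if pays_in_heads else 0.0

def updateful_choice() -> bool:
    """An updateful agent deciding *after* seeing Heads: it has updated away the
    Tails world, so it only weighs the Heads payoffs and refuses to pay."""
    return payoff("Heads", True) > payoff("Heads", False)  # -100 > 0 is False

def updateless_choice() -> bool:
    """An updateless agent evaluates the whole policy against the prior,
    even after seeing Heads, so it pays."""
    expected = lambda pays: sum(p * payoff(w, pays) for w, p in PRIOR.items())
    return expected(True) > expected(False)  # 450 > 0 is True

print(updateful_choice())   # False: doesn't pay in the Heads world
print(updateless_choice())  # True: pays in the Heads world
```

The only difference between the two procedures is which distribution the payoff is averaged over: the updated one (all weight on Heads) versus the prior.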
But that's t...