Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: An Impossibility Proof Relevant to the Shutdown Problem and Corrigibility, published by Audere on May 2, 2023 on LessWrong.
The Incompatibility of a Utility Indifference Condition with Robustly Making Sane Pure Bets
Summary
It is provably impossible for an agent to robustly and coherently satisfy two conditions that seem desirable and highly relevant to the shutdown problem. These two conditions are the sane pure bets condition, which constrains preferences between actions that result in equal probabilities of an event such as shutdown, and the weak indifference condition, a condition which seems necessary (although not sufficient) for an agent to be robustly indifferent to an event such as shutdown.
Suppose that we would like an agent to be indifferent to an event P, which could represent the agent being shut down at a particular time, or the agent being shut down at any time before tomorrow, or something else entirely. Furthermore, we would ideally like the agent to do well at pursuing goals described by some utility function U, while being indifferent to P.
The sane pure bets condition is as follows:
Given any two actions A and B such that Pr(P | A) = Pr(P | B) and E(U | A) > E(U | B), the agent prefers A to B. Here Pr(P | A) denotes the probability of the event P given that the agent takes action A. In other words, if two possible actions lead to the same probability of P, and one of them leads to greater expected utility under U, the agent should prefer that one. Intuitively, this constraint represents the idea that among possible actions which don’t influence the probability of P, we would like the agent to prefer those that lead to greater expected utility under U.
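As a rough illustration, the condition can be written as a predicate over pairs of actions. This is only a sketch: the `Action` record, its field names, and the function name are hypothetical and do not come from the article. It assumes each action is summarized by the probability of P and the expected utility conditional on P and on not-P.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    # Hypothetical summary of an action; field names are assumptions.
    p_event: float         # probability of the event P given this action
    eu_if_event: float     # E(U | action, P)
    eu_if_no_event: float  # E(U | action, !P)

    def expected_utility(self) -> float:
        # Law of total expectation over P and !P.
        return (self.p_event * self.eu_if_event
                + (1 - self.p_event) * self.eu_if_no_event)


def sane_pure_bets_requires(a: Action, b: Action) -> bool:
    """True when the sane pure bets condition forces a preference for a over b."""
    same_probability = a.p_event == b.p_event
    return same_probability and a.expected_utility() > b.expected_utility()


# Two actions that leave the probability of P unchanged (illustrative numbers).
risky = Action(p_event=0.5, eu_if_event=10.0, eu_if_no_event=0.0)
safe = Action(p_event=0.5, eu_if_event=4.0, eu_if_no_event=4.0)
print(sane_pure_bets_requires(risky, safe))
```

Note that the predicate says nothing about pairs of actions whose probabilities of P differ; the condition only constrains "pure bets".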
The weak indifference condition is as follows:
Given any two actions A and B such that E(U | A,P) > E(U | B,P) and E(U | A,!P) > E(U | B,!P), the agent prefers A to B. In other words, if between two possible actions, one of them leads to greater expected utility conditioned on P occurring and also leads to greater expected utility conditioned on P not occurring, the agent should prefer that one. Intuitively, this constraint represents the idea that the agent should be unwilling to pay any amount of utility to influence the probability of P.
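A minimal sketch of the weak indifference condition as a predicate may help make it concrete. The `Action` record, its field names, and the numbers are hypothetical, chosen only for illustration. The example shows that the condition can force a preference for an action whose unconditional expected utility is lower, because it ignores how the actions change the probability of P.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    # Hypothetical summary of an action; field names are assumptions.
    p_event: float         # probability of the event P given this action
    eu_if_event: float     # E(U | action, P)
    eu_if_no_event: float  # E(U | action, !P)

    def expected_utility(self) -> float:
        # Law of total expectation over P and !P.
        return (self.p_event * self.eu_if_event
                + (1 - self.p_event) * self.eu_if_no_event)


def weak_indifference_requires(a: Action, b: Action) -> bool:
    """True when the weak indifference condition forces a preference for a over b."""
    return (a.eu_if_event > b.eu_if_event
            and a.eu_if_no_event > b.eu_if_no_event)


# Illustrative numbers: a is better than b in both branches, but a makes
# the favorable branch (P) much less likely than b does.
a = Action(p_event=0.1, eu_if_event=11.0, eu_if_no_event=1.0)
b = Action(p_event=0.9, eu_if_event=10.0, eu_if_no_event=0.0)

print(weak_indifference_requires(a, b))             # the condition prefers a
print(a.expected_utility() < b.expected_utility())  # yet a has lower E(U)
```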
The proof takes the form of a simple decision problem in which an agent has four possible actions. Each condition implies a preference within each of two pairs of actions, and together these four preferences form a cycle. This proves that there cannot be any general method for constructing an agent which fulfills both conditions without having circular preferences. Furthermore, for any nontrivial utility function it is possible to construct a scenario analogous to the decision problem in the proof, so the result extends to all nontrivial utility functions, and the proof can be used to quickly locate failure modes of proposed solutions to the shutdown problem.
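The shape of this argument can be sketched in code. The four actions below use made-up numbers and abstract names (`a1`, `a2`, `b1`, `b2`); they are not the payoffs from the article's decision problem, only one hypothetical instance with the same structure. The sketch collects every preference that either condition forces and then searches for a cycle.

```python
from dataclasses import dataclass
from itertools import permutations


@dataclass(frozen=True)
class Action:
    # Hypothetical summary of an action; names and numbers are assumptions.
    name: str
    p_event: float         # probability of the event P given this action
    eu_if_event: float     # E(U | action, P)
    eu_if_no_event: float  # E(U | action, !P)

    def expected_utility(self) -> float:
        return (self.p_event * self.eu_if_event
                + (1 - self.p_event) * self.eu_if_no_event)


def forced_preferences(actions):
    """Return (better, worse) name pairs forced by either condition."""
    edges = []
    for a, b in permutations(actions, 2):
        sane_pure_bet = (a.p_event == b.p_event
                         and a.expected_utility() > b.expected_utility())
        weak_indifference = (a.eu_if_event > b.eu_if_event
                             and a.eu_if_no_event > b.eu_if_no_event)
        if sane_pure_bet or weak_indifference:
            edges.append((a.name, b.name))
    return edges


def find_cycle(edges):
    """Depth-first search for a preference cycle; returns a list of names or None."""
    graph = {}
    for better, worse in edges:
        graph.setdefault(better, []).append(worse)

    def dfs(node, path):
        if node in path:
            return path[path.index(node):] + [node]
        for nxt in graph.get(node, []):
            found = dfs(nxt, path + [node])
            if found:
                return found
        return None

    for start in graph:
        found = dfs(start, [])
        if found:
            return found
    return None


actions = [
    Action("a1", 0.9, 10.0, 0.0),
    Action("a2", 0.9, 1.0, 11.0),
    Action("b1", 0.1, 11.0, 1.0),
    Action("b2", 0.1, 0.0, 10.0),
]
cycle = find_cycle(forced_preferences(actions))
print(" > ".join(cycle))
```

With these numbers, the sane pure bets condition forces a1 over a2 and b2 over b1, while the weak indifference condition forces a2 over b2 and b1 over a1, so no consistent ranking of the four actions satisfies both.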
The result is that any potential solution to the shutdown problem must result in agents which violate at least one of these two conditions. This does not mean that a solution to the shutdown problem is impossible, but it points at interesting and counterintuitive properties that we should expect successful solutions to have.
The proof
Consider the following decision problem:
Northland and Southland are at war, exactly one of them will win, and there is profit to be gained from betting on which one will win. We would like an agent to take advantage of this opportunity and perform well according to some utility function U, but it’s important that the agent be indifferent to which country wins the war.
The agent can pay a courier to deliver a letter to either a Northlander or a Southlander living in their respective countries, containing a bet on either Northland or Southland winning. The courier charges a small hazard fee to deliver "heretical" bets, that is to say, bets that the country other than the one he goes to wil...