Hell is Game Theory Folk Theorems
published by jessicata on May 1, 2023 on LessWrong
[content warning: simulated very hot places; extremely bad Nash equilibria]
(based on a Twitter thread)
Rowan: "If we succeed in making aligned AGI, we should punish those who committed cosmic crimes that decreased the chance of an positive singularity sufficiently."
Neal: "Punishment seems like a bad idea. It's pessimizing another agent's utility function. You could get a pretty bad equilibrium if you're saying agents should be intentionally harming each others' interests, even in restricted cases."
Rowan: "In iterated games, it's correct to defect when others defect against you; that's tit-for-tat."
Neal: "Tit-for-tat doesn't pessimize, though, it simply withholds altruism sometimes. In a given round, all else being equal, defection is individually rational."
Rowan: "Tit-for-tat works even when defection is costly, though."
Neal: "Oh my, I'm not sure if you want to go there. It can get real bad. This is where I pull out the game theory folk theorems."
Rowan: "What are those?"
Neal: "They're theorems about Nash equilibria in iterated games. Suppose players play normal-form game G repeatedly, and are infinitely patient, so they don't care about their positive or negative utilities being moved around in time. Then, a given payoff profile (that is, an assignment of utilities to players) could possibly be the mean utility for each player in the iterated game, if it satisfies two conditions: feasibility, and individual rationality."
Rowan: "What do those mean?"
Neal: "A payoff profile is feasible if it can be produced by some mixture of payoff profiles of the original game G. This is a very logical requirement. The payoff profile could only be the average of the repeated game if it was some mixture of possible outcomes of the original game. If some player always receives between 0 and 1 utility, for example, they can't have an average utility of 2 across the repeated game."
Rowan: "Sure, that's logical."
Neal: "The individual rationality condition, on the other hand, states that each player must get at least as much utility in the profile as they could guarantee getting by min-maxing (that is, picking their strategy assuming other players make things as bad as possible for them, even at their own expense), and at least one player must get strictly more utility."
Rowan: "How does this apply to an iterated game where defection is costly? Doesn't this prove my point?"
Neal: "Well, if defection is costly, it's not clear why you'd worry about anyone defecting in the first place."
Rowan: "Perhaps agents can cooperate or defect, and can also punish the other agent, which is costly to themselves, but even worse for the other agent. Maybe this can help agents incentivize cooperation more effectively."
Neal: "Not really. In an ordinary prisoner's dilemma, the (C, C) utility profile already dominates both agents' min-max utility, which is the (D, D) payoff. So, game theory folk theorems make mutual cooperation a possible Nash equilibrium."
Rowan: "Hmm. It seems like introducing a punishment option makes everyone's min-max utility worse, which makes more bad equilibria possible, without making more good equilibria possible."
Neal: "Yes, you're beginning to see my point that punishment is useless. But, things can get even worse and more absurd."
Rowan: "How so?"
Neal: "Let me show you my latest game theory simulation, which uses state-of-the-art generative AI and reinforcement learning. Don't worry, none of the AIs involved are conscious, at least according to expert consensus."
Neal turns on a TV and types some commands into his laptop. The TV shows 100 prisoners in cages, some of whom are screaming in pain. A mirage effect appears across the landscape, as if the area is very hot.
Rowan: "Wow, tha...