Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Explaining “Hell is Game Theory Folk Theorems”, published by electroswing on May 5, 2023 on LessWrong.
I, along with many commenters, found the explanation in Hell is Game Theory Folk Theorems somewhat unclear. I am re-explaining some of the ideas from that post here. Thanks to jessicata for writing a post on such an interesting topic.
1-shot prisoner’s dilemma.
In a 1-shot prisoner’s dilemma, defecting is a dominant strategy. Because of this, (defect, defect) is the unique Nash equilibrium of this game. Which kind of sucks, since (cooperate, cooperate) would be better for both players.
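The dominance claim can be checked mechanically. A minimal sketch, using the standard textbook payoff numbers (these specific values are an illustrative assumption, not from the post):

```python
# Payoff matrix for a prisoner's dilemma. Entries are (row payoff, column payoff).
# Illustrative assumed values: mutual cooperation pays 3 each, mutual defection
# 1 each, a unilateral defector gets 5 while the exploited cooperator gets 0.
PAYOFF = {
    ("C", "C"): (3, 3),
    ("C", "D"): (0, 5),
    ("D", "C"): (5, 0),
    ("D", "D"): (1, 1),
}

def defect_dominates() -> bool:
    """Defecting strictly beats cooperating no matter what the opponent does."""
    return all(
        PAYOFF[("D", other)][0] > PAYOFF[("C", other)][0]
        for other in ("C", "D")
    )

print(defect_dominates())  # True: defect is a strictly dominant strategy
```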
Nash equilibrium.
Nash equilibrium is just a mathematical formalism. Consider a strategy profile, which is a list of which strategy each player chooses. A strategy profile is a Nash equilibrium if no player is strictly better off switching their strategy, assuming everyone else continues to play their strategy listed in the strategy profile. Notably, Nash equilibrium says nothing about:
What if two or more people team up and deviate from the Nash equilibrium strategy profile?
What if people aren’t behaving fully rationally? (see bounded rationality)
Nash equilibria may or may not have predictive power. It depends on the game. Much work in game theory involves refining equilibrium concepts to have more predictive power in different situations (e.g. subgame perfect equilibrium to handle credible threats, trembling hand equilibrium to handle human error).
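The definition above is easy to turn into a direct check: try every unilateral deviation and see whether anyone gains. A sketch over the same assumed payoff matrix:

```python
from itertools import product

# Assumed illustrative prisoner's dilemma payoffs: (row payoff, column payoff).
PAYOFF = {
    ("C", "C"): (3, 3),
    ("C", "D"): (0, 5),
    ("D", "C"): (5, 0),
    ("D", "D"): (1, 1),
}
MOVES = ("C", "D")

def is_nash(profile):
    """A profile is a Nash equilibrium if no single player is strictly
    better off switching their own move while the other stays put."""
    for player in (0, 1):
        for alt in MOVES:
            deviated = list(profile)
            deviated[player] = alt
            if PAYOFF[tuple(deviated)][player] > PAYOFF[profile][player]:
                return False
    return True

print([p for p in product(MOVES, repeat=2) if is_nash(p)])  # [('D', 'D')]
```

Note that the check is purely unilateral: (cooperate, cooperate) fails it only because one player alone can profit by defecting, exactly as the definition says.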
n-shot prisoner’s dilemma.
OK, now what if people agree to repeat a prisoner’s dilemma n=10 times? Maybe the repeated rounds can build trust between players and lead to cooperation?
Unfortunately, the theory says that (defect, defect) is still the unique Nash equilibrium. Why? Because in the 10th game, players don’t care about their reputation anymore. They just want to maximize payoff, so they may as well defect. So, it is common knowledge that each player will defect in the 10th game. Now moving to the 9th game, players know their reputation doesn’t matter in this game, because everyone is going to defect in the 10th game anyway. So, it is common knowledge that each player will defect in the 9th game. And so on. This thought process is called backwards induction.
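The backwards-induction argument can be sketched in code: once continuation play is pinned to mutual defection, every earlier round reduces to a 1-shot game. (Payoff numbers are the same illustrative assumption as before.)

```python
# Assumed illustrative payoffs; first entry of each pair is the row player's.
PAYOFF = {("C", "C"): (3, 3), ("C", "D"): (0, 5),
          ("D", "C"): (5, 0), ("D", "D"): (1, 1)}

def best_reply(opponent_move):
    """Best 1-shot reply against a fixed opponent move."""
    return max(("C", "D"), key=lambda m: PAYOFF[(m, opponent_move)][0])

def backwards_induction(n=10):
    """Solve from round n back to round 1. In the last round reputation is
    worthless, so each round collapses to a 1-shot game."""
    plan = []
    for _ in range(n):
        # Defect is the best reply whichever move the opponent makes,
        # so the current round's answer never depends on the future.
        assert best_reply("C") == "D" and best_reply("D") == "D"
        plan.append("D")
    return plan

print(backwards_induction())  # ['D', 'D', ..., 'D'] — defect in all 10 rounds
```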
This shows that the unique Nash equilibrium is still (defect, defect), even if the number of repetitions is large. Why might this lack predictive power?
In the real world there might be uncertainty about the number of repetitions.
(again) people might not behave fully rationally—backwards induction is kind of complicated!
Probably the simplest way to model “uncertainty about the number of repetitions” is by assuming an infinite number of repetitions.
infinitely repeated prisoner’s dilemma.
OK, now assume the prisoner’s dilemma will be repeated forever.
Turns out, now, there exists a Nash equilibrium which involves cooperation! Here is how it goes. Each player agrees to play cooperate, indefinitely. Whenever any player defects, the other player responds by defecting for the rest of all eternity (“punishment”).
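One standard way to value an infinite payoff stream is with a discount factor δ (an assumption I'm adding here; the post doesn't pick a particular formalism). Under that assumption, the punishment makes cooperation a best response when δ is large enough, which a quick calculation shows:

```python
# Grim-trigger sketch with an assumed discount factor and assumed payoffs:
# cooperating forever pays C per round; deviating pays the temptation T once,
# then the punishment P per round forever after.
DELTA = 0.9   # assumed discount factor
C, T, P = 3, 5, 1

def cooperate_value():
    """Discounted value of mutual cooperation forever: C / (1 - delta)."""
    return C / (1 - DELTA)

def deviate_value():
    """One temptation payoff now, then eternal mutual defection."""
    return T + DELTA * P / (1 - DELTA)

print(cooperate_value(), deviate_value())   # 30.0 vs 14.0
print(cooperate_value() > deviate_value())  # True: deviating doesn't pay
```

With these numbers, defecting once is strictly worse (14.0 < 30.0), so no player gains by unilaterally deviating, which is exactly the Nash condition.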
technical Q: Hold on a second. I thought Nash equilibrium was “static” in the sense that it just says: given that everybody is playing a Nash equilibrium strategy profile, if a single person deviates (while everyone else keeps playing according to the Nash equilibrium strategy profile), then they will not be better off from deviating. This stuff where players choose to punish other players in response to bad behavior seems like a stronger equilibrium concept not covered by Nash.
A: Nope! A (pure) strategy profile is a list of strategies for each player. In a single prisoner’s dilemma, this is just a choice of “cooperate” or “defect”. In a repeated prisoner’s dilemma, this is much more complicated. A strategy is a complete contingency plan of what the player plans to...