Download - Prisoners' Dilemma with Costs to Modeling by Scott Garrabrant

The Nonlinear Library: Alignment Forum Top Posts

Education

Prisoners' Dilemma with Costs to Modeling by Scott Garrabrant

2021-12-06


Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Prisoners' Dilemma with Costs to Modeling, published by Scott Garrabrant on the AI Alignment Forum.

We consider a modification to the open source prisoners' dilemma in which agents must pay some resources to model each other. We will use the modal combat framework, but where agents pay a cost proportional to the depth of boxes in their code. Even a small modeling penalty makes the FairBot-FairBot outcome no longer an equilibrium, since the best response to FairBot is to be CooperateBot and not pay the modeling penalty. The best response to CooperateBot is to be DefectBot, and the pure DefectBot-DefectBot outcome is a stable Nash equilibrium. In fact, I believe that DefectBot-DefectBot is the unique pure strategy Nash equilibrium.

Amazingly, this turns out to be okay! For small modeling penalties, there is a mixed strategy equilibrium which mixes between CooperateBot, FairBot, and PrudentBot! Both players get exactly the same utility in expectation as the FairBot-FairBot outcome.

Further, if you consider an evolutionary system where populations reproduce in proportion to how well they do in prisoners' dilemmas with each other, it appears that as the modeling penalty gets small, the basin of the defect equilibrium also gets small, and nearly all initial conditions cycle around CooperateBot, FairBot, and PrudentBot!

This post came out of conversations with Sam Eisenstat, Abram Demski, Tsvi Benson-Tilsen, and Andrew Critch. It is a first draft that could use a coauthor to carefully check everything, expand on it, and turn it into a paper. If you think you could do that with minimal guidance from me, let me know.

Formalism

We will be using the modal combat framework, and identifying ⊤ with cooperation and ⊥ with defection.
Agents are defined by formulas that combine the other agent X run on various agents using propositional calculus and a modal operator □. The □ represents provability, and every instance of X run on an agent in the formula must be contained within a □. Recall some common modal agents:

CooperateBot is defined by CB(X) ↔ ⊤
DefectBot is defined by DB(X) ↔ ⊥
FairBot is defined by FB(X) ↔ □X(FB)
PrudentBot is defined by PB(X) ↔ □(X(PB) ∧ (X(DB) → □⊥))

These 4 agents interact with each other as follows:

CooperateBot cooperates with everyone.
DefectBot defects against everyone.
FairBot defects against only DefectBot.
PrudentBot defects against CooperateBot and DefectBot and cooperates with itself and FairBot.

We will say that the depth of an agent is the maximum of the depth of □s in its code and the depth of the agents that it calls the opponent on. CooperateBot and DefectBot have depth 0, FairBot has depth 1, and PrudentBot has depth 2.

We will use a prisoner's dilemma where mutual cooperation produces utility 2, mutual defection produces utility 1, and exploitation produces utility 3 for the exploiter and 0 for the exploited. Each player will also pay a penalty of ε times its depth.

Pure Equilibria

The best response to both CooperateBot and DefectBot is DefectBot, since when the opponent does not depend on you, you want to defect with the least possible penalty.

The best response to FairBot is CooperateBot, since you can't exploit FairBot, so you want to get mutual cooperation with the least possible penalty.

The best response to PrudentBot is FairBot, since you can't exploit PrudentBot, you can't mutually cooperate at depth 0 (no penalty), but you can mutually cooperate at depth 1 (penalty ε) by being FairBot. (This is assuming ε is less than 1/2. Otherwise, you just want to defect to avoid the penalty.)

Thus, if the only options are CooperateBot, DefectBot, FairBot, and PrudentBot, the unique pure strategy equilibrium is mutual DefectBot.
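
The best-response chain above can be checked mechanically for the four named agents. The following is a minimal sketch, not code from the post: the names cooperates, utility, best_responses, and pure_equilibria are hypothetical helpers, the interaction rules are hard-coded from the list above rather than derived from the modal formulas, and ε = 1/10 is an arbitrary small penalty chosen for illustration.

```python
from fractions import Fraction

# Agent labels and depths as stated in the text.
AGENTS = ("CB", "DB", "FB", "PB")
DEPTH = {"CB": 0, "DB": 0, "FB": 1, "PB": 2}


def cooperates(agent, opponent):
    """True if `agent` cooperates against `opponent` (rules hard-coded from the text)."""
    if agent == "CB":
        return True                      # CooperateBot cooperates with everyone
    if agent == "DB":
        return False                     # DefectBot defects against everyone
    if agent == "FB":
        return opponent != "DB"          # FairBot defects against only DefectBot
    return opponent in ("FB", "PB")      # PrudentBot cooperates with FairBot and itself


def utility(agent, opponent, eps):
    """Payoff to `agent`: 2 / 1 / 3 / 0 as in the text, minus eps times depth."""
    mine = cooperates(agent, opponent)
    theirs = cooperates(opponent, agent)
    if mine and theirs:
        base = 2                         # mutual cooperation
    elif not mine and not theirs:
        base = 1                         # mutual defection
    elif theirs:
        base = 3                         # agent exploits the opponent
    else:
        base = 0                         # agent is exploited
    return base - eps * DEPTH[agent]


def best_responses(opponent, eps):
    """All agents achieving the maximum utility against `opponent`."""
    scores = {a: utility(a, opponent, eps) for a in AGENTS}
    top = max(scores.values())
    return [a for a in AGENTS if scores[a] == top]


def pure_equilibria(eps):
    """Pairs (a, b) in which each agent is a best response to the other."""
    return [(a, b) for a in AGENTS for b in AGENTS
            if a in best_responses(b, eps) and b in best_responses(a, eps)]


eps = Fraction(1, 10)  # illustrative penalty (assumption), exact arithmetic
for opponent in AGENTS:
    print(opponent, "->", best_responses(opponent, eps))
print(pure_equilibria(eps))
```

With this ε, the loop reports DefectBot as the best response to CooperateBot and DefectBot, CooperateBot as the best response to FairBot, and FairBot as the best response to PrudentBot, and the only pair returned by pure_equilibria is mutual DefectBot. Exact fractions are used so that ties between agents would be detected without floating-point error.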
I believe that DefectBot is the only pure strategy equilibrium in general. This would follow directly from the fact that if a dept...