Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio.
This is: Troll Bridge , published by Abram Demski on the AI Alignment Forum.
All of the results in this post, and most of the informal observations/interpretations, are due to Sam Eisenstat. I think the Troll Bridge story, as a way to make the decision problem understandable, is due to Tsvi; but I'm not sure.
Pure Logic Version
Troll Bridge is a decision problem which has been floating around for a while, but which has lacked a good introductory post. The original post gives the essential example, but it lacks the "troll bridge" story, which (1) makes it hard to understand, since it is just stated in mathematical abstraction, and (2) makes it difficult to find if you search for "troll bridge".
The basic idea is that you want to cross a bridge. However, there is a troll who will blow up the bridge with you on it, if (and only if) you cross it "for a dumb reason" — for example, due to unsound logic. You can get to where you want to go by a worse path (through the stream). This path is better than being blown up, though.
We apply a Löbian proof to show not only that you choose not to cross, but furthermore, that your counterfactual reasoning is confident that the bridge would have blown up if you had crossed. This is supposed to be a counterexample to various proposed notions of counterfactual, and for various proposed decision theories.
The pseudocode for the environment (more specifically, the utility gained from the environment) is as follows:
IE, if the agent crosses the bridge and is inconsistent, then U=-10. (□⊥ means "PA proves an inconsistency".) Otherwise, if the agent crosses the bridge, U=+10. If neither of these (IE, the agent does not cross the bridge), U=0.
The pseudocode for the agent could be as follows:
This is a little more complicated, but the idea is supposed to be that you search for every "action implies utility" pair, and take the action for which you can prove the highest utility (with some tie-breaking procedure). Importantly, this is the kind of proof-based decision theory which eliminates spurious counterfactuals in 5-and-10 type problems. It isn't that easy to trip up with Löbian proofs. (Historical/terminological note: This decision theory was initially called MUDT, and is still sometimes referred to in that way. However, I now often call it proof-based decision theory, because it isn't centrally a UDT. "Modal DT" (MDT) would be reasonable, but the modal operator involved is the "provability" operator, so "proof-based DT" seems more direct.)
Now, the proof:
Reasoning within PA (ie, the logic of the agent):
Suppose the agent crosses.
Further suppose that the agent proves that crossing implies U=-10.
Examining the source code of the agent, because we're assuming the agent crosses, either PA proved that crossing implies U=+10, or it proved that crossing implies U=0.
So, either way, PA is inconsistent -- by way of 0=-10 or +10=-10.
So the troll actually blows up the bridge, and really, U=-10.
Therefore (popping out of the second assumption), if the agent proves that crossing implies U=-10, then in fact crossing implies U=-10.
By Löb's theorem, crossing really implies U=-10.
So (since we're still under the assumption that the agent crosses), U=-10.
So (popping out of the assumption that the agent crosses), the agent crossing implies U=-10.
Since we proved all of this in PA, the agent proves it, and proves no better utility in addition (unless PA is truly inconsistent). On the other hand, it will prove that not crossing gives it a safe U=0. So it will in fact not cross.
The paradoxical aspect of this example is not that the agent doesn't cross -- it makes sense that a proof-based agent can't cross a bridge whose safety is dependent on the agent's own logic being consistent, since proof-based agents can't know whether their logic is consistent. Rather, the ...
view more