Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio.
This is: Coherence arguments imply a force for goal-directed behavior, published by KatjaGrace on the AI Alignment Forum
[Epistemic status: my current view, but I haven’t read all the stuff on this topic even in the LessWrong community, let alone more broadly.]
There is a line of thought that says that advanced AI will tend to be ‘goal-directed’—that is, consistently doing whatever makes certain favored outcomes more likely—and that this is to do with the ‘coherence arguments’. Rohin Shah, and probably others[1], have argued against this. I want to argue against them.
The old argument for coherence implying (worrisome) goal-directedness
I’d reconstruct the original argument that Rohin is arguing against as something like this (making no claim about my own beliefs here):
‘Whatever things you care about, you are best off assigning consistent numerical values to them and maximizing the expected sum of those values’
‘Coherence arguments[2]’ mean that if you don’t maximize ‘expected utility’ (EU)—that is, if you don’t make every choice in accordance with what gets the highest average score, given consistent preferability scores that you assign to all outcomes—then you will make strictly worse choices by your own lights than if you followed some alternate EU-maximizing strategy (at least in some situations, though they may not arise). For instance, you’ll be vulnerable to ‘money-pumping’—being predictably parted from your money for nothing.[3]
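To make the money-pump idea concrete, here is a minimal sketch (my illustration, not from the post; the items, the one-cent fee, and the starting balance are all made up): an agent with cyclic preferences over three items will pay a small fee for every ‘upgrade’, so a trader who keeps cycling it through the items extracts money while leaving the agent holding exactly what it started with.

```python
# Minimal money-pump sketch (illustrative; numbers and items are invented).
# The agent has cyclic preferences A > B > C > A and will pay a small fee
# to swap its current item for one it strictly prefers. A trader who knows
# this can cycle the agent indefinitely, draining its money.

prefers = {("A", "B"), ("B", "C"), ("C", "A")}  # (x, y) means x is strictly preferred to y

def accepts_trade(offered, held):
    """The agent pays the fee whenever the offered item is strictly preferred to what it holds."""
    return (offered, held) in prefers

fee = 0.01
money, held = 10.00, "A"
for offered in ["C", "B", "A"] * 3:  # trader cycles the agent A -> C -> B -> A, three times
    if accepts_trade(offered, held):
        held = offered
        money -= fee

print(held, round(money, 2))  # back to "A", but nine cents poorer
```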
‘Advanced AI will tend to do better things instead of worse things, by its own lights’
Advanced AI will tend to avoid options that are predictably strictly worse by its own lights, due to being highly optimized for making good choices (by some combination of external processes that produced it, its own efforts, and the selection pressure acting on its existence).
‘Therefore advanced AI will maximize EU, roughly’
Advanced AI will tend to be fairly coherent, at least to a level of approximation where becoming more coherent isn’t worth the cost.[4] That level will probably be fairly coherent (e.g. close enough to coherent that humans can’t anticipate the inconsistencies).
‘Maximizing EU is pretty much the same as being goal-directed’
To maximize expected utility is to pursue, as a goal, whatever you have assigned higher utility to.[5]
And since the point of all this is to argue that advanced AI might be hard to deal with, note that we can get to that conclusion with:
‘Highly intelligent goal-directed agents are dangerous’
If AI systems exist that very competently pursue goals, they will likely be better than us at attaining their goals, and therefore to the extent there is a risk of mismatch between their goals and ours, we face a serious risk.
Rohin’s counterargument
Rohin’s counterargument begins with an observation made by others before: any behavior is consistent with maximizing expected utility, given some utility function. For instance, a creature just twitching around on the ground may have the utility function that returns 1 if the agent does whatever it in fact does in each situation (where ‘situation’ means, ‘entire history of the world so far’), and 0 otherwise. This is a creature that just wants to make the right twitch in each detailed, history-indexed situation, with no regard for further consequences. Alternatively, the twitching agent might care about outcomes, but just happen to want the particular holistic unfolding of the universe that is occurring, including this particular series of twitches. Or it could be indifferent between all outcomes.
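A minimal sketch of that construction (my illustration; the policy, situations, and names are invented): take any behavior whatsoever, and define a utility function that awards a whole history 1 exactly when every action in it matches what the behavior actually does, and 0 otherwise. Under that utility function, the twitching agent is a perfect expected-utility maximizer.

```python
# Sketch of the "any behavior maximizes some utility function" construction
# (illustrative; the policy and situations are made up). Given an arbitrary
# policy, define a utility function that scores a whole history 1 if every
# action in it is the one the policy actually takes, and 0 otherwise.

def twitch_policy(situation):
    """An arbitrary 'twitching' policy: its choice depends only on the
    history-so-far, in no principled way."""
    return "twitch-left" if hash(situation) % 2 == 0 else "twitch-right"

def utility(history):
    """Returns 1 iff every action in the history is what twitch_policy
    would have done in that situation, else 0."""
    return int(all(action == twitch_policy(situation)
                   for situation, action in history))

# One possible unfolding of the world: the policy's own trajectory,
# versus an alternative trajectory where the agent sits still.
situations = ["t=0", "t=0,twitch-left", "t=0,twitch-left,t=1"]
actual = [(s, twitch_policy(s)) for s in situations]
other = [(s, "sit-still") for s in situations]

print(utility(actual), utility(other))  # 1 0 -- the twitcher is 'optimal' by this utility function
```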
The basic point is that rationality doesn’t say what ‘things’ you can want. And in particular, it doesn’t say that you have to care about particular atomic units that larger situations can be broken down into. If I try to call you out for first spending money to get to Paris, then spending money to get b...