Link to original article
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Dangers of Closed-Loop AI, published by Gordon Seidoh Worley on March 22, 2024 on The AI Alignment Forum.
In control theory, an open-loop (or non-feedback) system is one where inputs are independent of outputs. A closed-loop (or feedback) system is one where outputs are input back into the system.
In theory, open-loop systems exist. In reality, no system is truly open-loop because systems are embedded in the physical world where isolation of inputs from outputs cannot be guaranteed. Yet in practice we can build systems that are effectively open-loop by making them ignore weak and unexpected input signals.
Open-loop systems execute plans, but they definitionally can't change their plans based on the results of their actions. An open-loop system can be designed or trained to be good at achieving a goal, but it can't actually do any optimization itself. This ensures that some other system, like a human, must be in the loop to make it better at achieving its goals.
A closed-loop system has the potential to self-optimize because it can observe how effective its actions are and change its behavior based on those observations. For example, an open-loop paperclip-making-machine can't make itself better at making paperclips if it notices it's not producing as many paperclips as possible. A closed-loop paperclip-making-machine can, assuming its designed with circuits that allow it to respond to the feedback in a useful way.
AIs are control systems, and thus can be either open- or close-loop. I posit that open-loop AIs are less likely to pose an existential threat than closed-loop AIs. Why? Because open-loop AIs require someone to make them better, and that creates an opportunity for a human to apply judgement based on what they care about. For comparison, a nuclear dead hand device is potentially much more dangerous than a nuclear response system where a human must make the final decision to launch.
This suggests a simple policy to reduce existential risks from AI: restrict the creation of closed-loop AI. That is, restrict the right to produce AI that can modify its behavior (e.g. self-improve) without going through a training process with a human in the loop.
There are several obvious problems with this proposal:
No system is truly open-loop.
A closed-loop system can easily be created by combining 2 or more open-loop systems into a single system.
Systems may look like they are open-loop at one level of abstraction but really be closed-loop at another (e.g. an LLM that doesn't modify its model, but does use memory/context to modify its behavior).
Closed-loop AIs can easily masquerade as open-loop AIs until they've already optimized towards their target enough to be uncontrollable.
Open-loop AIs are still going to be improved. They're part of closed-loop systems with a human in the loop, and can still become dangerous maximizers.
Despite these issues, I still think that, if I were designing a policy to regulate the development of AI, I would include something to place limits on closed-loop AI. A likely form would be a moratorium on autonomous systems that don't include a human in the loop, and especially a moratorium on AIs that are used to either improve themselves or train other AIs.
I don't expect such a moratorium to eliminate existential risks from AI, but I do think it could meaningfully reduce the risk of run-away scenarios where humans get cut out before we have a chance to apply our judgement to prevent undesirable outcomes. If I had to put a number on it, such a moratorium perhaps makes us 20% safer.
Author's note: None of this is especially original. I've been saying some version of what's in this post for 10 years to people, but I realized I've never written it down. Most similar arguments I've seen don't use the generic language of control theory and instead are ex...
view more