This is: Access to powerful AI might make computer security radically easier, published by Buck Shlegeris on June 8, 2024 on The AI Alignment Forum.
People talk about model weight security being really hard and crucial around the advent of AGI (e.g. the RAND report and Leopold; see here for some distinctions in these threat models that I think are important). But I think that the thinking on this has not been sufficiently attentive to the fact that during that crucial time period, by assumption we'll have access to powerful AIs. I think that such access might make security wildly easier, by a bunch of different mechanisms, some of which I'll describe in this post.
The story I'm telling here is pretty different from the main story I've heard people tell in the past about AIs helping with computer security, which is that AIs can help with hardening software and other infrastructure. Though I agree that that seems really useful and important, here I'm not talking about AIs automating the process of security research. I'm talking about AI making it affordable to adopt policies that improve security in the organization but that wouldn't work if you just had humans (because they'd be too labor-intensive or would lead to who-guards-the-guards problems).
In this post I'm going to be ignoring problems related to AIs themselves trying to cause security failures. There are two ways that such AIs affect this situation:
1. The techniques I describe in this post could also be used, AI-control style, to use trusted models to reduce the risk of security failures caused by untrusted models. You can basically just replace "humans" by "untrusted models" everywhere in the text below.
2. When applying these techniques to reduce risk from humans, you have to worry that these techniques will be sabotaged by scheming models. I'll ignore that possibility in this post, and talk about ways that you can improve security using AIs that are assumed not to be intentionally trying to subvert these security measures (which we call "trusted"). If you're worried about your models scheming, you'll have to do some other stuff too (mostly various sting operations), but I think the arguments in this post still mostly hold up. I'll talk about this more in future.
I'm going to focus on insider threats here; most threats from outside humans involve first gaining the ability to impersonate some insider.
Much of this post came out of thinking about the question "how could AI control techniques be applied to humans, instead of AIs?"
I don't think these arguments are an airtight case for "SL5 security will be trivial". But they do describe a plausible scenario in which infosec ends up being fairly easy to handle, if labs decide it's an important priority.
Four strategies for using powerful AI to improve security
Monitoring
As your humans take actions, you configure your system to have an AI that looks at these actions and then flags them for further review if they seem suspicious. If the actions seem potentially catastrophic, then the AI might even flag them before they can run.
Concretely, I'm imagining that instead of being allowed to directly run any command you write in the bash console of the computer you're working on, whenever you write a command, it and the surrounding context are shown to an AI, which is charged with deciding how suspicious the action is. If it is very suspicious, the AI blocks the command and sends an alert to someone else at the company who is supposed to look it over.
If the action is just somewhat suspicious, then it flags it for further review but does not block the action or alert the user.
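As a minimal sketch of what this could look like (the thresholds, the score_suspicion model call, and the alerting/logging helpers below are all hypothetical placeholders, not a real API):

```python
# Sketch of an AI monitor wrapping a bash console. The trusted-model call
# (score_suspicion) and the alerting/logging helpers are stand-ins for
# whatever infrastructure a lab actually has.
import subprocess

BLOCK_THRESHOLD = 0.9  # "very suspicious": block and escalate to a human
FLAG_THRESHOLD = 0.5   # "somewhat suspicious": allow, but queue for review


def score_suspicion(command: str, context: str) -> float:
    """Ask a trusted model how suspicious this command is, from 0.0 to 1.0.

    Stubbed out here; in practice this would be an inference call that shows
    the model the command plus surrounding context (recent commands, the
    task the user is working on, etc.).
    """
    return 0.0  # placeholder


def alert_reviewer(command: str, context: str, score: float) -> None:
    # Placeholder: notify someone else at the company to look the command over.
    print(f"[ALERT] blocked command (score={score:.2f}): {command}")


def queue_for_review(command: str, context: str, score: float) -> None:
    # Placeholder: log the command for asynchronous review; do not block.
    print(f"[flagged] (score={score:.2f}): {command}")


def run_monitored(command: str, context: str = "") -> None:
    score = score_suspicion(command, context)
    if score >= BLOCK_THRESHOLD:
        alert_reviewer(command, context, score)
        return  # the command does not run until a human looks it over
    if score >= FLAG_THRESHOLD:
        queue_for_review(command, context, score)
    subprocess.run(command, shell=True, check=False)
```

The key design choice here is the same one described above: only the most suspicious commands are blocked and escalated, while merely somewhat-suspicious ones are logged for review without interrupting the user.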
This is somewhat similar to using multi-party authorization. But with abundant AI labor, we could be more comprehensive, with fewer false positives and fewer false negatives, and evaluating a much grea...