EA - My model of how different AI risks fit together by Stephen Clare
Welcome to The Nonlinear Library, where we use text-to-speech software to convert the best writing from the Rationalist and EA communities into audio. This is: My model of how different AI risks fit together, published by Stephen Clare on January 31, 2024 on The Effective Altruism Forum.

[Crossposted from my Substack, Unfolding Atlas]

How will AI get us in the end?

Maybe it'll decide we're getting in its way and take us out? It could fire all the nukes and unleash all the viruses and take us all down at once.

Or maybe we'll take ourselves out? We could lose control of powerful autonomous weapons, or allow a 21st-century Stalin to set up an impenetrable surveillance state and obliterate freedom and progress forever.

Or maybe the diffusion of AI throughout our global economy will become a quiet catastrophe? As more and more tasks are delegated to AI systems, we mere humans could be left helpless, like horses after the invention of cars. Alive, perhaps, but bewildered by a world too complex and fast-paced to be understood or controlled.

Each of these scenarios has been proposed as a way the advent of advanced AI could cause a global catastrophe. But they seem quite different, and they warrant different responses. In this post, I describe my model of how they fit together.[1]

I divide the AI development process into three steps. Risks arise at each step. Despite its simplicity, this model does a good job of pulling all these risks into one framework. It's helped me better understand how the many specific AI risks people have proposed fit together. More importantly, it satisfied my powerful, innate urge to force order onto a chaotic world.

Terrible things come in threes

The three AI development stages in my model are training, deployment, and diffusion.

At each stage, a different kind of AI risk arises. These are, respectively, misalignment, misuse, and systemic risks.

Throughout the entire process, competitive pressures act as a risk factor. More pressure makes risks throughout the process more likely.

Putting it all together looks like this:

[Diagram: training leads to misalignment risks, deployment to misuse risks, and diffusion to systemic risks, with competitive pressures acting as a risk factor at every stage.]

This model is too simple to be perfect. For one, these risks almost certainly won't arrive in sequence, as the model implies. They're also more entangled than a linear model suggests. But I think these shortcomings are relatively minor, and relating categories of risk like this gives them a pleasant cohesiveness.[2]

So let's move ahead and look closer at the risks generated at each step.

Training and misalignment risks

The first set of risks emerges immediately after training an advanced AI model. Training modern AI models usually involves feeding them enormous datasets from which they can learn patterns and make predictions when given new data. Training risks relate to a model's alignment: that is, what the system wants to do, how it plans to do it, and whether those goals and methods are good for people.

Some researchers worry that training a model to actually do what we want in all situations, including when we deploy it in the real world, is far from straightforward. In fact, we already see some kinds of alignment failures in the world today. These often seem silly. For example, Microsoft's first Bing chatbot's goal of being helpful and charming somehow made it act like an occasionally funny, occasionally frightening psychopath.

As AI systems get more powerful, though, these misalignment risks could get a whole lot scarier.
Specific risks researchers have raised include:

- Goal specification: It might be hard to tell AI systems exactly what we want them to do, especially as we delegate more complicated tasks to them. Some researchers worry that AIs trained on large amounts of data will either end up finding tricks or shortcuts that lead them to produce the wrong solutions when deployed in the real world, or that they'll face situations in the real world more extreme than or different from those they saw in the training data, and will react in unexpected, potentially dangerous, ways.
- Power...