Link to original article
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Analogies between scaling labs and misaligned superintelligent AI, published by Stephen Casper on February 21, 2024 on The AI Alignment Forum.
TL;DR: Scaling labs have their own alignment problem analogous to AI systems, and there are some similarities between the labs and misaligned/unsafe AI.
Introduction
Major AI scaling labs (OpenAI/Microsoft, Anthropic, Google/DeepMind, and Meta) are very influential in the AI safety and alignment community. Their talent, money, and institutional knowledge let them put out cutting-edge research, and a significant subset of the community works for one of these labs. This level of influence is beneficial in some respects.
In many ways, these labs have strong safety cultures, and these values are present in their high-level approaches to developing AI - it's easy to imagine a world in which things are much worse. But the amount of influence that these labs have is also something to be cautious about.
The alignment community is defined by a concern that subtle misalignment between the incentives we give AI systems and what we actually want from them might cause those systems to dangerously pursue the wrong goals. This post considers an analogous and somewhat ironic alignment problem: the one between human interests and the scaling labs themselves.
These labs have intelligence, resources, and speed well beyond those of any single human. Their money, compute, talent, and know-how make them extremely capable. Given this, it is important that they are aligned with the interests of humanity. However, there are some analogies between scaling labs and misaligned AI.
It is important not to draw false equivalences between different labs. For example, it seems that by almost every standard, Anthropic prioritizes safety and responsibility much more than other labs. But in this post, I will generally be lumping them together except to point out a few lab-specific observations.
Misaligned Incentives
Just as AI systems may have perverse incentives, so do the labs. They are companies: they need to make money, court investors, build products, and attract users. Anthropic and Microsoft even just had Super Bowl ads. This kind of accountability to commercial interests is not perfectly aligned with what is good for humanity. Moreover, the labs are full of technocrats whose values and demographics do not represent humanity particularly well.
Optimizing for the goals that the labs have is not the same thing as optimizing for human welfare.
Goodhart's Law applies.
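To make the Goodhart's Law point concrete, here is a minimal toy sketch in Python (my own illustration with assumed quantities, not anything from the original post): an optimizer that maximizes a noisy proxy metric correlated with true value ends up far from the outcome we actually wanted.

```python
import numpy as np

# Toy Goodhart's Law illustration (hypothetical numbers, purely for intuition):
# "true_value" is what we actually care about; "proxy_metric" is what gets optimized.
rng = np.random.default_rng(0)

def true_value(effort):
    # What we care about: benefits saturate, and over-optimization becomes harmful.
    return effort - 0.05 * effort ** 2

def proxy_metric(effort):
    # The proxy (e.g., revenue or a benchmark score) keeps rising with effort, plus noise.
    return 1.2 * effort + rng.normal(0, 0.1)

efforts = np.linspace(0, 30, 61)
proxy_choice = max(efforts, key=proxy_metric)  # what a proxy-optimizer selects
true_optimum = max(efforts, key=true_value)    # what we actually wanted

print(f"Proxy-optimizer picks effort {proxy_choice:.1f}; true value there: {true_value(proxy_choice):.2f}")
print(f"True optimum is at effort {true_optimum:.1f}; true value there: {true_value(true_optimum):.2f}")
```

Under these toy assumptions, the proxy-optimizer pushes effort to its maximum and drives the true value negative, while the true optimum stops much earlier; the analogy is that optimizing commercial metrics is not the same as optimizing for human welfare.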
Power Seeking
One major risk factor of misaligned superintelligent AI systems is that they may pursue power and influence. But the same is true of the scaling labs. Each is valued in the billions of dollars due to its assets and investments, and they compete with each other for technical primacy. The labs also pursue instrumental goals, including political influence through lobbying and strategic secrecy to reduce the risk of lawsuits involving data and fair use.
Recent news that Sam Altman is potentially pursuing trillions in funding for hardware suggests that this type of power-seeking may reach large scales in the near future. To stay competitive, labs need to keep scaling, and when one lab scales, others are driven to do so as well in an arms race.
Lack of Transparency
Trust without transparency is misguided. We want AI systems that are honest white boxes that are easy to interpret and understand. However, the scaling labs do not meet this standard. They tend to be highly selective in what they publicize, have employees sign non-disclosure agreements, and generally lack transparency and accountability to the public. Instead of being white boxes, the labs are more like dark grey boxes that rarely reveal things that would make them look bad.
A lack of explan...