Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio.
This is: How do we prepare for final crunch time?, published by Eli Tyre on the AI Alignment Forum.
[Crossposted from Musings and Rough Drafts.]
Epistemic status: Brainstorming and first draft thoughts.
[Inspired by something that Ruby Bloom wrote and the Paul Christiano episode of the 80,000 Hours podcast.]
One claim I sometimes hear about AI alignment [paraphrase]:
It is really hard to know what sorts of AI alignment work are good this far out from transformative AI. As we get closer, we’ll have a clearer sense of what AGI / Transformative AI is likely to actually look like, and we’ll have much better traction on what kind of alignment work to do. In fact, MOST of the work of AI alignment is done in the final few years (or months) before AGI, when we’ve already solved most of the hard capabilities problems, so we know what AGI will look like and can work directly, with good feedback loops, on the sorts of systems that we want to align.
Usually, this is said to argue that the value of the alignment research being done today is primarily that of enabling future, more critical alignment work. But “progress in the field” is only one dimension to consider in boosting and unblocking the work of alignment researchers in this last stretch.
In this post I want to take the above claim seriously and consider the implications. If most of the alignment work is going to be done in the final few years before the deadline, our job in 2021 is mostly to do everything we can to enable the people working on the problem in that crucial period (which might be us, our successors, or both), so that they are as well equipped as we can possibly make them.
What are all the ways we can think of to prepare now for our eventual final exam? What should we be investing in to improve our efficacy in those final, crucial years?
The following are some ideas.
[In this post, I'm going to refer to this last stretch of a few months to a few years as "final crunch time", as distinct from just "crunch time", i.e., this century.]
Access
For this to matter, our alignment researchers need to be at the cutting edge of AI capabilities, and they need to be positioned such that their work can actually be incorporated into AI systems as they are deployed.
A different kind of work
Most current AI alignment work is pretty abstract and theoretical, for two reasons.
The first reason is a philosophical / methodological claim: there’s a fundamental “nearest unblocked strategy” / overfitting problem. Patches that correct clear and obvious alignment failures are unlikely to generalize fully; they'll only constrain unaligned optimization to channels that you can’t recognize. For this reason, some claim, we need an extremely robust, theoretical understanding of intelligence and alignment, ideally at the level of proofs.
The second reason is a practical consideration: we just don’t have powerful AI systems to work with, so there isn’t much that can be done in the way of tinkering and getting feedback.
That second objection becomes less relevant in final crunch time: in this scenario, we’ll have powerful systems that 1) will be built along the same lines as the systems it is crucial to align, and 2) will have enough intellectual capability to pose at least semi-realistic “creative” alignment failures (i.e., current systems are so dumb, and live in such constrained environments, that it isn’t clear how much we can learn about aligning literal superintelligences from them).
And even if the first objection ultimately holds, theoretical understanding often (usually?) follows from practical engineering proficiency. It seems like it might be a fruitful path to tinker with semi-powerful systems: trying out different alignment approaches empirically, and experimenting to discover new approaches.