Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio.
This is: How much chess engine progress is about adapting to bigger computers?, published by Paul Christiano on the AI Alignment Forum.
(This question comes from a discussion with Carl Shulman.)
In this post I describe an experiment that I'd like to see run. I'm posting a $1,000 - $10,000 prize for a convincing implementation of these experiments. I also post a number of smaller prizes for relevant desk research or important corrections to this request.
Motivation
In order to understand the dynamics of the singularity, I'd like to understand how easy it is to improve algorithms and software.
We can learn something about this from looking at chess engines. It's not the most relevant domain to future AI, but it's one with an unusually long history and unusually clear (and consistent) performance metrics.
In order to quantify the quality of a chess engine, we can fix a level of play and ask "How much compute is needed for the engine to play at that level?"
One complication in evaluating the rate of progress is that it depends on what level of play we use for evaluation. In particular, newer algorithms are generally designed to play at a much higher level than older algorithms. So if we quantify the compute needed to reach modern levels of play, we will capture both absolute improvements and also "adaptation" to the new higher amounts of compute.
So we'd like to attribute progress in chess engines to three factors:
Better software.
Bigger computers.
Software that is better-adapted to new, bigger computers.
Understanding the size of factor #1 is important for extrapolating progress given massive R&D investments in software. While it is easy to separate factors #1 and #2 from publicly available information, it is not easy to evaluate factor #3.
Experiment description
Pick two (or more) software engines from very different times. They should both be roughly state of the art, running on "typical" machines from the era (i.e. the machines for which R&D is mostly targeted).
We then carry out two matches:
Run the old engine on its "native" hardware (the "old hardware"). Then evaluate: how little compute does the new engine need in order to beat the old engine?
Run the new engine on its "native" hardware (the "new hardware"). Then evaluate: how much compute does the old engine need in order to beat the new engine?
With some effort, we can estimate a quantitative ratio of "ops needed" for each of these experiments. For example, we may find that the new engine is able to beat the old engine using only 1% of the "old hardware." Whereas we may find that the old engine would require 10,000x the "new hardware" in order to compete with the new engine.
The first experiment tells us about the absolute improvements in chess engines on the task for which the old engine was optimized. (This understates the rate of software progress to the extent that people stopped working on this task.) The second experiment gives us the combination of absolute improvements + adaptation to new hardware. Typical measures of "rate of software progress" will be somewhere in between, and are sensitive to the hardware on which the evaluation is carried out.
I believe that understanding these two numbers would give us a significantly clearer picture of what's really going on with software progress in chess engines.
Experiment details
Here's some guesses about how to run this experiment well. I don't know much about computer chess, so you may be able to make a better proposal.
Old engine, old hardware: my default proposal is the version of Fritz that won the 1995 world computer chess championship, using the same amount of hardware (and time controls) as in that championship. This algorithm seems like a particularly reasonable "best effort" at making full use of available computing resources. I don't want to compare an engi...
view more