Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Searching for Searching for Search, published by Rubi Hudson on February 14, 2024 on The AI Alignment Forum.
Thanks to Leo Gao, Nicholas Dupuis, Paul Colognese, Janus, and Andrei Alexandru for their thoughts. This post was mostly written in 2022, and pulled out of my drafts after recent conversations on the topic.
Searching for Search is the research direction that looks into how neural networks implement search algorithms to determine an action. The hope is that if we can find the search process, we can then determine which goal motivates it, a goal that may be much harder to identify directly. We may even be able to retarget the search, specifying a new goal while keeping the capabilities of the model intact. Conclusively showing that search is not happening would also be useful, as it would imply a lack of generalization ability that could make deception less likely. However, search can take many forms, and we do not have a widely accepted definition of search that neatly captures all the variants. Searching for search may fail either by using too narrow a definition and missing the specific type of search that a model actually implements, or by using too wide a definition to check for in practice.
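To make "retargeting" concrete, here is a toy sketch in which the search machinery (a candidate generator plus an argmax) is kept separate from the goal it optimizes. The function names, the integer action space, and both goals are hypothetical illustrations, not anything taken from a real model. In a trained network the goal would not sit in a neatly swappable argument, which is part of why locating it is hard.

```python
from typing import Callable, Iterable

# Hypothetical toy setup: actions are integers, and a "goal" is any
# function that scores an action. Nothing here comes from a real model.
Goal = Callable[[int], float]


def generate_candidates() -> Iterable[int]:
    # The "capability" part of the search: which options get considered.
    return range(-10, 11)


def search(goal: Goal) -> int:
    # Evaluate every candidate under the goal and return the best one.
    return max(generate_candidates(), key=goal)


def original_goal(action: int) -> float:
    # Prefers actions close to 7.
    return -abs(action - 7)


def new_goal(action: int) -> float:
    # Prefers actions close to -3.
    return -abs(action + 3)


print(search(original_goal))  # 7
print(search(new_goal))       # -3
```

In this toy, retargeting amounts to passing `new_goal` instead of `original_goal`: the generator and the selection step are left untouched.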
This post analyzes the components of several algorithms for search, looking at commonalities and differences, and aims to provide a relatively tight definition of search that can guide the process of searching for search. Unfortunately, the definition arrived at here is so broad that it does not lead to useful clarification or application. Furthermore, many possible algorithms that fall under search do not suggest an overarching goal for the model.
Due to these issues, I instead suggest searching for substeps of search. Gathering evidence on how these substeps are compiled throughout the training process could shed light both on how to search for search and how agency and goals develop.
Search vs. Heuristics
What alternative is there to implementing a search process? Abram Demski splits the class of optimizers into control and selection (which includes search). Selection processes can instantiate and evaluate arbitrary elements of the search space (though doing so may be costly) and then choose the best option, while control processes only instantiate one element and may not even evaluate it. Less formally, a control process can be thought of as a set of heuristics.
A standard example of a control process is a thermostat, which raises or lowers the temperature according to a fixed rule based on the temperature it detects.
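A toy contrast between the two kinds of optimizer, with details assumed for illustration: the selection process instantiates and scores many candidate adjustments before choosing one, while the thermostat-style controller applies a single fixed rule to its reading and never evaluates alternatives. The target temperature, the candidate set, and the one-for-one "world model" are hypothetical choices.

```python
# Hypothetical toy example; the numbers are illustrative only.
TARGET = 20.0  # desired temperature in degrees Celsius


def predicted_temp(current: float, adjustment: float) -> float:
    # Toy world model: an adjustment shifts the temperature one-for-one.
    return current + adjustment


def selection_optimizer(current: float) -> float:
    # Selection: instantiate many candidate adjustments, evaluate each
    # with the world model, and choose the best-scoring one.
    candidates = [x / 2 for x in range(-10, 11)]  # -5.0 to 5.0 in 0.5 steps
    return min(candidates, key=lambda a: abs(predicted_temp(current, a) - TARGET))


def thermostat(current: float) -> str:
    # Control: one fixed rule maps the reading to an action; no
    # alternatives are generated or evaluated.
    return "up" if current < TARGET else "down"


print(selection_optimizer(17.5))  # 2.5
print(thermostat(17.5))           # up
```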
The distinction between selection and control is not always clear, though, and Abram describes it as more of a continuum. A search process may only be able to generate a subset of the possible options, and may not be able to evaluate them perfectly, instead relying on heuristics to approximate the evaluation. Conversely, the heuristics a controller uses to choose an option can be thought of as a limited search space, such as a thermostat searching over the options "up" and "down".
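The continuum can be illustrated, under assumed details, by writing the same thermostat two ways: once as a fixed rule and once as a search over the two-element space {"up", "down"} scored by a crude heuristic. The setpoint, the step size, and the use of whole-degree readings (to avoid floating-point ties) are hypothetical choices. On every reading checked below, the two framings pick the same action, so from the outside nothing distinguishes "control" from "a very small search".

```python
TARGET = 20  # hypothetical setpoint, whole degrees
STEP = 1     # assumed effect of a single "up" or "down" action


def thermostat_rule(current: int) -> str:
    # Control framing: a fixed if/else rule.
    return "up" if current < TARGET else "down"


def heuristic_score(current: int, action: str) -> int:
    # Approximate evaluation: predict the next reading and penalize its
    # distance from the target.
    predicted = current + STEP if action == "up" else current - STEP
    return -abs(predicted - TARGET)


def thermostat_search(current: int) -> str:
    # Selection framing: enumerate a two-element search space and keep the
    # option the heuristic scores highest. "down" is listed first so that
    # max() breaks the tie at exactly TARGET the same way the rule does.
    return max(["down", "up"], key=lambda a: heuristic_score(current, a))


# The two framings agree on every whole-degree reading from 0 to 40.
assert all(thermostat_rule(t) == thermostat_search(t) for t in range(0, 41))
```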
If search is important to identify, then so is a control process that imitates search.
Blurring the lines further, many search processes cannot be cleanly implemented in a neural net. A forward pass has a limited number of steps, so search processes with an unbounded duration must be approximated instead: rather than implementing a search process directly, a neural network must approximate it using a series of heuristics.
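As a hypothetical illustration of the forward-pass constraint, compare a hill-climbing search that runs until it stops improving with the same update rule unrolled a fixed number of times. Hill climbing on integers, the `depth` parameter, and the toy objective are stand-ins chosen for illustration, not a claim about what networks actually implement. The point is only that a fixed step budget turns an exact search into an approximation that can stop short.

```python
from typing import Callable

Score = Callable[[int], float]


def hill_climb_unbounded(score: Score, start: int) -> int:
    # Runs until no neighbour improves the score. The number of iterations
    # depends on the problem; for a score that keeps increasing forever,
    # this loop would never terminate.
    x = start
    while True:
        # x is listed first so ties favour staying put.
        best = max([x, x - 1, x + 1], key=score)
        if best == x:
            return x
        x = best


def hill_climb_fixed_depth(score: Score, start: int, depth: int) -> int:
    # Same update rule, applied exactly `depth` times, loosely analogous
    # to a network with a fixed number of layers each doing one step.
    x = start
    for _ in range(depth):
        x = max([x, x - 1, x + 1], key=score)
    return x


def peak_at_12(x: int) -> float:
    # Toy objective with a single maximum at x = 12.
    return -(x - 12) ** 2


print(hill_climb_unbounded(peak_at_12, 0))        # 12
print(hill_climb_fixed_depth(peak_at_12, 0, 5))   # 5
print(hill_climb_fixed_depth(peak_at_12, 0, 20))  # 12
```

With too small a depth the unrolled version returns an intermediate answer; making the depth large enough recovers the exact result on this objective, but no fixed depth suffices for every starting point.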
This means that selection and control are both variations on a pile of heuristics, suggesting that instead of trying to identify holistic search processes in neural networks, it could be more feasible to split search into the heuristics that make up its substeps and look for those instead. Doing so also opens the possibility of stud...