Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Timaeus's First Four Months, published by Jesse Hoogland on February 28, 2024 on The AI Alignment Forum.
Timaeus was announced in late October 2023, with the mission of making fundamental breakthroughs in technical AI alignment using deep ideas from mathematics and the sciences. This is our first progress update.
In service of the mission, our first priority has been to support and contribute to ongoing work in Singular Learning Theory (SLT) and developmental interpretability, with the aim of laying theoretical and empirical foundations for a science of deep learning and neural network interpretability.
Our main uncertainties in this research were:
Is SLT useful in deep learning? While SLT is mathematically established, it was not clear whether the central quantities of SLT could be estimated at sufficient scale, or whether SLT's predictions actually held for realistic models (especially language models). (A brief orientation to the central quantity follows this list.)
Does structure in neural networks form in phase transitions? The idea of developmental interpretability was to view phase transitions as a core primitive in the study of internal structure in neural networks. However, it was not clear how common phase transitions are, and whether we can detect them.
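For orientation (this is standard singular learning theory background, summarized here rather than drawn from the post): the central quantity of SLT is the learning coefficient \lambda, which controls the leading correction in the asymptotic expansion of the Bayesian free energy,

F_n = n L_n(w^*) + \lambda \log n + O(\log \log n),

where L_n is the empirical loss and w^* a minimizer. Regular models have \lambda = d/2 for d parameters; singular models such as neural networks can have much smaller \lambda, reflecting the degeneracy of the loss landscape around w^*. The Local Learning Coefficient discussed below is a localized version of this quantity.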
The research Timaeus has conducted over the past four months, in collaboration with Daniel Murfet's group at the University of Melbourne and several independent AI safety researchers, has significantly reduced these uncertainties, as we explain below. As a result we are now substantially more confident in the research agenda.
While we view this fundamental work in deep learning science and interpretability as critical, the mission of Timaeus is to make fundamental contributions to alignment and these investments in basic science are to be judged relative to that end goal. We are impatient about making direct contact between these ideas and central problems in alignment. This impatience has resulted in the research directions outlined at the end of this post.
Contributions
Timaeus conducts two main activities: (1) research, that is, developing new tools for interpretability, mechanistic anomaly detection, etc., and (2) outreach, that is, introducing and advocating for these techniques to other researchers and organizations.
Research Contributions
What we learned
Regarding the question of whether SLT is useful in deep learning, we have learned that:
The Local Learning Coefficient (LLC) can be accurately estimated in deep linear networks at scales up to 100M parameters; see Lau et al. 2023 and Furman and Lau 2024. We think there is a good chance that LLC estimation can be scaled usefully to much larger (nonlinear) models. (A minimal sketch of such an estimator appears after this list.)
Changes in the LLC predict phase transitions in toy models (Chen et al. 2023), deep linear networks (Furman and Lau 2024), and (language) transformers at scales up to 3M parameters (Hoogland et al. 2024).
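To make the estimation procedure concrete, here is a minimal sketch of SGLD-based LLC estimation in the spirit of Lau et al. 2023. It is illustrative rather than a reproduction of their implementation: the function name, hyperparameter values, and data-handling conventions are all assumptions for the example.

```python
import math
import torch

def estimate_llc(model, loss_fn, batches, n_data,
                 sgld_steps=500, eps=1e-4, gamma=100.0):
    """Sketch of LLC estimation via SGLD, following the general recipe of
    Lau et al. 2023:

        lambda_hat = n * beta * (E_w[L_n(w)] - L_n(w*)),  beta = 1 / log(n),

    where the expectation is over a tempered posterior localized at the
    trained parameters w*. Hyperparameters here are placeholders, not
    tuned values; a real implementation would also discard burn-in steps.
    """
    beta = 1.0 / math.log(n_data)
    params = list(model.parameters())
    w_star = [p.detach().clone() for p in params]  # snapshot of w*

    def full_loss():
        # Average loss at the current parameters over all batches.
        with torch.no_grad():
            total = sum(loss_fn(model(x), y).item() * len(x) for x, y in batches)
            count = sum(len(x) for x, _ in batches)
        return total / count

    loss_at_w_star = full_loss()

    # SGLD chain: drift toward low loss (tempered by n * beta), plus a
    # quadratic pull back toward w* (localization), plus Gaussian noise.
    sampled_losses = []
    batch_iter = iter(batches)
    for _ in range(sgld_steps):
        try:
            x, y = next(batch_iter)
        except StopIteration:
            batch_iter = iter(batches)
            x, y = next(batch_iter)
        loss = loss_fn(model(x), y)
        model.zero_grad()
        loss.backward()
        with torch.no_grad():
            for p, p0 in zip(params, w_star):
                if p.grad is None:
                    continue
                drift = n_data * beta * p.grad + gamma * (p - p0)
                p.add_(-0.5 * eps * drift + math.sqrt(eps) * torch.randn_like(p))
        sampled_losses.append(loss.item())

    expected_loss = sum(sampled_losses) / len(sampled_losses)

    # Restore the trained parameters so the model is unchanged by estimation.
    with torch.no_grad():
        for p, p0 in zip(params, w_star):
            p.copy_(p0)

    return n_data * beta * (expected_loss - loss_at_w_star)
```

The localization term gamma * (p - p0) is what makes the coefficient "local": it keeps the sampler in the basin around w* rather than letting it wander to other regions of the loss landscape.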
Regarding whether structure in neural networks forms in phase transitions, we have learned that:
This is true in the toy model of superposition introduced by Elhage et al., where in a special case the critical points and the transitions between them were analyzed in detail by Chen et al. 2023, and where the LLC detects these transitions (which appear to be consistent with the kind of phase transition defined mathematically in SLT).
Learning is organized around discrete developmental stages. By looking for SLT-predicted changes in structure and geometry, we discovered new "hidden" transitions (Hoogland et al. 2024) in a 3M-parameter language transformer and a 50k-parameter linear regression transformer. While these transitions are consistent with phase transitions in the Bayesian sense of SLT, they are not necessarily fast when measured in SGD steps (that is, the relevant developmental timescale is not steps). (A sketch of this checkpoint-scanning procedure follows this list.)
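As a usage illustration (the names `checkpoints`, `model`, `loss_fn`, `batches`, and `n_data` are hypothetical, and `estimate_llc` refers to the sketch above), detecting these stages amounts to scanning the LLC across saved training checkpoints:

```python
# Hypothetical usage: trace the LLC over saved training checkpoints.
llc_curve = []
for path in checkpoints:
    model.load_state_dict(torch.load(path))
    llc_curve.append(estimate_llc(model, loss_fn, batches, n_data))

# Developmental stage boundaries show up as plateaus or inflections in
# llc_curve, even when the loss curve itself declines smoothly and no
# change is visible in SGD-step time.
```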
For this reason we have started following the development...