Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Stagewise Development in Neural Networks, published by Jesse Hoogland on March 20, 2024 on The AI Alignment Forum.
TLDR: This post accompanies The Developmental Landscape of In-Context Learning by Jesse Hoogland, George Wang, Matthew Farrugia-Roberts, Liam Carroll, Susan Wei and Daniel Murfet (2024), which shows that in-context learning emerges in discrete, interpretable developmental stages, and that these stages can be discovered in a model- and data-agnostic way by probing the local geometry of the loss landscape.
Four months ago, we shared a discussion of a paper that studied stagewise development in the toy model of superposition of Elhage et al. using ideas from Singular Learning Theory (SLT).
The purpose of this document is to accompany a follow-up paper by Jesse Hoogland, George Wang, Matthew Farrugia-Roberts, Liam Carroll, Susan Wei and Daniel Murfet, which has taken a closer look at stagewise development in transformers at significantly larger scale, including language models, using an evolved version of these techniques.
How does in-context learning emerge? In this paper, we looked at two different settings where in-context learning is known to emerge:
Small attention-only language transformers, modeled after Olsson et al. (3M parameters).
Transformers trained to perform linear regression in context, modeled after Raventos et al. (50k parameters).
Changing geometry reveals a hidden stagewise development. We use two different geometric probes to automatically discover different developmental stages:
The local learning coefficient (LLC) of SLT, which measures the "basin broadness" (volume scaling ratio) of the loss landscape across the training trajectory.
Essential dynamics (ED), which consists of applying principal component analysis to (a discrete proxy of) the model's functional output across the training trajectory and analyzing the geometry of the resulting low-dimensional trajectory.
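The LLC probe above can be sketched in miniature. Below is a hypothetical toy illustration (function and variable names like `estimate_llc` and `nbeta` are ours, not from the paper): we estimate the local learning coefficient of a quadratic loss by sampling the tempered local posterior with stochastic gradient Langevin dynamics (SGLD) and applying the standard estimator λ̂ = nβ·(E_posterior[L] − L(w*)). For a regular d-dimensional quadratic minimum, the true learning coefficient is d/2, which the sketch recovers approximately.

```python
import numpy as np

def estimate_llc(grad_L, L, w_star, nbeta=1000.0, eps=1e-4,
                 steps=20000, burn_in=2000, seed=0):
    """Toy LLC estimate: SGLD samples from p(w) ~ exp(-nbeta * L(w)),
    then llc_hat = nbeta * (mean posterior loss - loss at the minimum)."""
    rng = np.random.default_rng(seed)
    w = w_star.copy()
    losses = []
    for t in range(steps):
        # Langevin step targeting exp(-nbeta * L(w)): gradient drift + noise
        w = (w - 0.5 * eps * nbeta * grad_L(w)
             + np.sqrt(eps) * rng.standard_normal(w.shape))
        if t >= burn_in:
            losses.append(L(w))
    return nbeta * (np.mean(losses) - L(w_star))

# Quadratic loss in d = 4 dimensions: true learning coefficient is d/2 = 2
d = 4
L = lambda w: 0.5 * np.dot(w, w)
grad_L = lambda w: w
llc = estimate_llc(grad_L, L, np.zeros(d))
print(llc)  # close to d/2 = 2.0 (up to discretization and Monte Carlo error)
```

In a real network, the analytic gradient would be replaced by minibatch loss gradients, plus a localization term keeping the chain near the checkpoint being probed.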
In both settings, these probes reveal that training is separated into distinct developmental stages, many of which are "hidden" from the loss (Figures 1 & 2).
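The essential dynamics probe can likewise be sketched under stated assumptions (all names here are ours): log the model's outputs on a fixed probe batch at each checkpoint, stack them into a (checkpoints × outputs) matrix, and run PCA via SVD to obtain a low-dimensional developmental trajectory. We use synthetic stand-in data with two smooth "developmental" modes plus noise.

```python
import numpy as np

rng = np.random.default_rng(0)
T, D = 200, 50                       # training checkpoints, output dimension
t = np.linspace(0, 1, T)[:, None]

# Stand-in for logged outputs f_t(x) across training:
# two smooth developmental modes plus small noise
outputs = (np.tanh(5 * t) @ rng.standard_normal((1, D))
           + t**2 @ rng.standard_normal((1, D))
           + 0.01 * rng.standard_normal((T, D)))

centered = outputs - outputs.mean(axis=0)   # center over checkpoints
U, S, Vt = np.linalg.svd(centered, full_matrices=False)
trajectory = U[:, :3] * S[:3]               # projection onto top 3 PCs
explained = S**2 / np.sum(S**2)

print(trajectory.shape)                      # (200, 3)
print(explained[:2].sum() > 0.9)             # two modes dominate by construction
```

Stage boundaries then show up as turning points or cusps in the geometry of `trajectory`; the paper analyzes this low-dimensional curve rather than the raw loss.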
Developmental stages are interpretable. Using a variety of hand-crafted behavioral and structural metrics, we find that these developmental stages admit clear interpretations.
The progression of the language model is characterized by the following sequence of stages:
(LM1) Learning bigrams,
(LM2) Learning various n-grams and incorporating positional information,
(LM3) Beginning to form the first part of the induction circuit,
(LM4) Finishing the formation of the induction circuit,
(LM5) Final convergence.
The evolution of the linear regression model unfolds in a similar manner:
(LR1) Learning to use the task prior (the analogue of learning bigrams),
(LR2) Developing the ability to do in-context linear regression,
(LR3-4) Two significant structural developments in the embedding and layer norms,
(LR5) Final convergence.
Developmental interpretability is viable. The existence and interpretability of developmental stages in larger, more realistic transformers makes us substantially more confident in developmental interpretability as a viable research agenda. We expect that future generations of these techniques will go beyond detecting when circuits start/stop forming to detecting where they form, how they connect, and what they implement.
On Stagewise Development
Complex structures can arise from simple algorithms. When iterated across space and time, simple algorithms can produce structures of great complexity. One example is evolution by natural selection. Another is the optimization of artificial neural networks by gradient descent. In both cases, the underlying logic, that simple algorithms operating at scale can produce highly complex structures, is so counterintuitive that it often elicits disbelief.