Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Stagewise Development in Neural Networks, published by Jesse Hoogland on March 20, 2024 on The AI Alignment Forum.
TLDR: This post accompanies The Developmental Landscape of In-Context Learning by Jesse Hoogland, George Wang, Matthew Farrugia-Roberts, Liam Carroll, Susan Wei and Daniel Murfet (2024), which shows that in-context learning emerges in discrete, interpretable developmental stages, and that these stages can be discovered in a model- and data-agnostic way by probing the local geometry of the loss landscape.
Four months ago, we shared a discussion of a paper that studied stagewise development in the toy model of superposition of Elhage et al. using ideas from Singular Learning Theory (SLT).
The purpose of this document is to accompany a follow-up paper by Jesse Hoogland, George Wang, Matthew Farrugia-Roberts, Liam Carroll, Susan Wei and Daniel Murfet, which has taken a closer look at stagewise development in transformers at significantly larger scale, including language models, using an evolved version of these techniques.
How does in-context learning emerge? In this paper, we looked at two different settings where in-context learning is known to emerge:
Small attention-only language transformers, modeled after Olsson et al. (3M parameters).
Transformers trained to perform linear regression in context, modeled after Raventos et al. (50k parameters).
Changing geometry reveals a hidden stagewise development. We use two different geometric probes to automatically discover different developmental stages:
The local learning coefficient (LLC) of SLT, which measures the "basin broadness" (volume scaling ratio) of the loss landscape across the training trajectory.
Essential dynamics (ED), which consists of applying principal component analysis to (a discrete proxy of) the model's functional output across the training trajectory and analyzing the geometry of the resulting low-dimensional trajectory.
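The LLC probe above can be sketched in miniature. Below is a hypothetical toy illustration (function and variable names like `estimate_llc` and `nbeta` are ours, not from the paper): we estimate the local learning coefficient of a quadratic loss by sampling the tempered local posterior with stochastic gradient Langevin dynamics (SGLD) and applying the standard estimator λ̂ = nβ·(E_posterior[L] − L(w*)). For a regular d-dimensional quadratic minimum, the true learning coefficient is d/2, which the sketch recovers approximately.

```python
import numpy as np

def estimate_llc(grad_L, L, w_star, nbeta=1000.0, eps=1e-4,
                 steps=20000, burn_in=2000, seed=0):
    """Toy LLC estimate: SGLD samples from p(w) ~ exp(-nbeta * L(w)),
    then llc_hat = nbeta * (mean posterior loss - loss at the minimum)."""
    rng = np.random.default_rng(seed)
    w = w_star.copy()
    losses = []
    for t in range(steps):
        # Langevin step targeting exp(-nbeta * L(w)): gradient drift + noise
        w = (w - 0.5 * eps * nbeta * grad_L(w)
             + np.sqrt(eps) * rng.standard_normal(w.shape))
        if t >= burn_in:
            losses.append(L(w))
    return nbeta * (np.mean(losses) - L(w_star))

# Quadratic loss in d = 4 dimensions: true learning coefficient is d/2 = 2
d = 4
L = lambda w: 0.5 * np.dot(w, w)
grad_L = lambda w: w
llc = estimate_llc(grad_L, L, np.zeros(d))
print(llc)  # close to d/2 = 2.0 (up to discretization and Monte Carlo error)
```

In a real network, the analytic gradient would be replaced by minibatch loss gradients, plus a localization term keeping the chain near the checkpoint being probed.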
In both settings, these probes reveal that training is separated into distinct developmental stages, many of which are "hidden" from the loss (Figures 1 & 2).
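The essential dynamics probe can likewise be sketched under stated assumptions (all names here are ours): log the model's outputs on a fixed probe batch at each checkpoint, stack them into a (checkpoints × outputs) matrix, and run PCA via SVD to obtain a low-dimensional developmental trajectory. We use synthetic stand-in data with two smooth "developmental" modes plus noise.

```python
import numpy as np

rng = np.random.default_rng(0)
T, D = 200, 50                       # training checkpoints, output dimension
t = np.linspace(0, 1, T)[:, None]

# Stand-in for logged outputs f_t(x) across training:
# two smooth developmental modes plus small noise
outputs = (np.tanh(5 * t) @ rng.standard_normal((1, D))
           + t**2 @ rng.standard_normal((1, D))
           + 0.01 * rng.standard_normal((T, D)))

centered = outputs - outputs.mean(axis=0)   # center over checkpoints
U, S, Vt = np.linalg.svd(centered, full_matrices=False)
trajectory = U[:, :3] * S[:3]               # projection onto top 3 PCs
explained = S**2 / np.sum(S**2)

print(trajectory.shape)                      # (200, 3)
print(explained[:2].sum() > 0.9)             # two modes dominate by construction
```

Stage boundaries then show up as turning points or cusps in the geometry of `trajectory`; the paper analyzes this low-dimensional curve rather than the raw loss.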
Developmental stages are interpretable. Using a variety of hand-crafted behavioral and structural metrics, we find that these developmental stages admit clear interpretations.
The progression of the language model is characterized by the following sequence of stages:
(LM1) Learning bigrams,
(LM2) Learning various n-grams and incorporating positional information,
(LM3) Beginning to form the first part of the induction circuit,
(LM4) Finishing the formation of the induction circuit,
(LM5) Final convergence.
The evolution of the linear regression model unfolds in a similar manner:
(LR1) Learning to use the task prior (the analogue of learning bigrams),
(LR2) Developing the ability to do in-context linear regression,
(LR3-4) Two significant structural developments in the embedding and layer norms,
(LR5) Final convergence.
Developmental interpretability is viable. The existence and interpretability of developmental stages in larger, more realistic transformers makes us substantially more confident in developmental interpretability as a viable research agenda. We expect that future generations of these techniques will go beyond detecting when circuits start/stop forming to detecting where they form, how they connect, and what they implement.
On Stagewise Development
Complex structures can arise from simple algorithms. When iterated across space and time, simple algorithms can produce structures of great complexity. One example is evolution by natural selection. Another is the optimization of artificial neural networks by gradient descent. In both cases, the underlying logic, that simple algorithms operating at scale can produce highly complex structures, is so counterintuitive that it often elicits disbelief.