Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio.
This is: Recent Progress in the Theory of Neural Networks, published by interstice on the AI Alignment Forum.
It's common wisdom that neural networks are basically "matrix multiplications that nobody understands", impenetrable to theoretical analysis, and that they have achieved great results largely through trial and error. While this may have been true in the past, recently there has been significant progress towards developing a theoretical understanding of neural networks. Most notably, we have obtained an arguably complete understanding of network initialization and training dynamics in a certain infinite-width limit. There has also been some progress towards understanding their generalization behavior. In this post I will review some of this recent progress and discuss the potential relevance to AI alignment.
Infinite Width Nets: Initialization
The most exciting recent developments in the theory of neural networks have focused on the infinite-width limit: we consider neural networks in which the number of neurons in every hidden layer is increased to infinity. Typically we consider networks with Gaussian-initialized weights, scaled at initialization by $\frac{1}{\sqrt{H}}$, where $H$ is the number of hidden units in the preceding layer (this is needed to keep the activations from blowing up, and is also the initialization scheme usually used in real networks). In this limit, we have obtained an essentially complete understanding of both behavior at initialization and training dynamics[1]. (Those with limited interest in or knowledge of the math may wish to skip to "Significance and Limitations" below.)
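As a concrete sketch of this initialization scheme (the widths, depth, and numpy implementation here are illustrative choices of mine, not anything prescribed in the post), each layer's weights are drawn as Gaussians and scaled by $1/\sqrt{H}$, where $H$ is the width of the preceding layer:

```python
import numpy as np

def init_layer(n_in, n_out, rng):
    """Gaussian-initialize a fully-connected layer, scaling the weights by
    1/sqrt(n_in) so the pre-activations stay O(1) as the preceding layer grows."""
    W = rng.normal(0.0, 1.0, size=(n_out, n_in)) / np.sqrt(n_in)
    b = rng.normal(0.0, 1.0, size=n_out)
    return W, b

rng = np.random.default_rng(0)
widths = [3, 1024, 1024, 1]   # input dim, two wide hidden layers, scalar output
params = [init_layer(n_in, n_out, rng)
          for n_in, n_out in zip(widths[:-1], widths[1:])]
```

Making the hidden widths larger and larger is exactly the limit discussed below.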
We've actually had a pretty good understanding of the behavior of infinite-width neural networks at initialization for a while, since the work of Radford Neal (1994). He proved that in this limit, fully-connected neural networks with Gaussian-distributed weights and biases limit to what are known as Gaussian processes. Gaussian processes can be thought of as the generalization of Gaussian distributions from finite-dimensional spaces to spaces of functions. Neal's paper provides a very clear derivation of this behavior, but I'll explain it briefly here.
A neural network with $m$ real-valued inputs and one real-valued output defines a function from $\mathbb{R}^m$ to $\mathbb{R}$. Thus, a distribution over the weights and biases of such a neural network -- such as the standard Gaussian initialization -- implicitly defines a distribution over functions on $\mathbb{R}^m$. Neal's paper shows that, for fully-connected neural networks, this distribution limits to a Gaussian process.
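To illustrate the "distribution over functions" picture, here is a rough sketch (a one-hidden-layer tanh network with arbitrarily chosen width; the architecture and names are my own choices, not anything specific from Neal's paper). Each draw of the weights gives one sample function, and evaluating many draws on a grid shows the induced distribution over functions:

```python
import numpy as np

def sample_network_function(n_hidden, rng):
    """Sample weights for a one-hidden-layer net with scalar input and output,
    and return the corresponding function R -> R."""
    W = rng.normal(0.0, 1.0, size=n_hidden)                       # input -> hidden
    b = rng.normal(0.0, 1.0, size=n_hidden)                       # hidden biases
    V = rng.normal(0.0, 1.0, size=n_hidden) / np.sqrt(n_hidden)   # hidden -> output
    return lambda x: np.tanh(np.outer(x, W) + b) @ V

rng = np.random.default_rng(1)
xs = np.linspace(-3, 3, 50)
# Each draw of the weights is one sample from the induced distribution over functions.
function_samples = np.stack([sample_network_function(4096, rng)(xs) for _ in range(5)])
```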
What is a Gaussian process? It's a distribution over functions $f$ with the property that, for any finite collection of points $X_1, \ldots, X_N$, the values $f(X_1), \ldots, f(X_N)$ have a joint distribution which is a multivariate Gaussian. Any Gaussian process is uniquely defined by its mean and covariance functions, $\mu(x)$ and $C(x, x')$. For points $X_1, \ldots, X_N$, the distribution of $f(X_1), \ldots, f(X_N)$ will have mean $(\mu(X_1), \ldots, \mu(X_N))$ and covariance matrix $C_{ij} = C(X_i, X_j)$.
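A minimal sketch of this definition, assuming a zero mean function and an RBF covariance function purely as examples (both are arbitrary choices of mine): on any finite grid of points, sampling from the Gaussian process is just sampling from the corresponding multivariate Gaussian.

```python
import numpy as np

def mean_fn(x):
    return np.zeros_like(x)                      # example choice: zero mean

def cov_fn(x1, x2, length_scale=1.0):
    return np.exp(-(x1 - x2) ** 2 / (2 * length_scale ** 2))   # example: RBF kernel

# On any finite set of points, the GP reduces to a multivariate Gaussian with
# mean vector mu(X_i) and covariance matrix C(X_i, X_j).
X = np.linspace(-3, 3, 100)
mu = mean_fn(X)
C = cov_fn(X[:, None], X[None, :])

rng = np.random.default_rng(2)
f_samples = rng.multivariate_normal(mu, C + 1e-10 * np.eye(len(X)), size=3)
```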
The argument that fully-connected neural networks limit to Gaussian processes in the infinite-width limit is pretty simple. Consider a three-layer neural network with an activation function $\sigma$ in the second layer and a single linear output unit. This network can be defined by the equation $y = \sum_k V_k \, \sigma\!\left(\sum_j W_{kj} X_j\right)$. At initialization, $V$ and $W$ are filled with independent Gaussians, with $V$ scaled by the inverse square root of the number of hidden units.
Each hidden unit $h_k$ has a value for each of the inputs $X_i$: $h_k(X_i) = \sigma\!\left(\sum_j W_{kj} (X_i)_j\right)$. Since $W$ is random, for each $k$ the vector $h_k(\vec{X})$ is an independent random vector (where we write $\vec{X}$ for $(X_1, \ldots, X_N)$). All of these random vectors follow the same distribution, and the output $\vec{y} = f(\vec{X})$ of the network is simply the sum of these identically distributed vectors multiplied by the univariate Gaussians $V_k$. By the multidimensional Central Limit Theorem, this sum converges to a multivariate Gaussian as the number of hidden units goes to infinity, so the outputs at any finite collection of inputs are jointly Gaussian -- that is, the network limits to a Gaussian process.
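One can check this numerically. The sketch below (the tanh activation, width, and test inputs are all arbitrary choices of mine) draws many random initializations of a wide one-hidden-layer network and looks at the joint distribution of its outputs at a few fixed inputs; across initializations these outputs should be approximately multivariate Gaussian, and their empirical covariance estimates the corresponding GP kernel.

```python
import numpy as np

rng = np.random.default_rng(3)
n_hidden, n_draws = 8192, 2000
X = np.array([[0.5, -1.0], [1.5, 0.3], [-0.7, 2.0]])     # three fixed input points

outputs = np.empty((n_draws, len(X)))
for d in range(n_draws):
    W = rng.normal(size=(n_hidden, X.shape[1]))           # input -> hidden weights
    V = rng.normal(size=n_hidden) / np.sqrt(n_hidden)     # hidden -> output, 1/sqrt(H) scaling
    H = np.tanh(X @ W.T)                                   # hidden unit values at each input
    outputs[d] = H @ V                                      # y = sum_k V_k * h_k(X_i)

# Across random initializations, (y(X_1), y(X_2), y(X_3)) should look like a sample
# from a multivariate Gaussian; its covariance estimates the GP covariance matrix.
empirical_cov = np.cov(outputs, rowvar=False)
```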