The Nonlinear Library: Alignment Forum
Education
AF - Uncovering Deceptive Tendencies in Language Models: A Simulated Company AI Assistant by Olli Järviniemi
AF - Mechanistic Interpretability Workshop Happening at ICML 2024! by Neel Nanda
AF - Why I am no longer thinking about/working on AI safety by Jack Koch
AF - Take SCIFs, it's dangerous to go alone by latterframe
AF - Mechanistically Eliciting Latent Behaviors in Language Models by Andrew Mack
AF - Transcoders enable fine-grained interpretable circuit analysis for language models by Jacob Dunefsky
AF - Towards a formalization of the agent structure problem by Alex Altair
AF - AISC9 has ended and there will be an AISC10 by Linda Linsefors
AF - [Aspiration-based designs] Outlook: dealing with complexity by Jobst Heitzig
AF - [Aspiration-based designs] 3. Performance and safety criteria, and aspiration intervals by Jobst Heitzig
AF - [Aspiration-based designs] 2. Formal framework, basic algorithm by Jobst Heitzig
AF - [Aspiration-based designs] 1. Informal introduction by B Jacobs
AF - Refusal in LLMs is mediated by a single direction by Andy Arditi
AF - Superposition is not "just" neuron polysemanticity by Lawrence Chan
AF - An Introduction to AI Sandbagging by Teun van der Weij
AF - AXRP Episode 29 - Science of Deep Learning with Vikrant Varma by DanielFilan
AF - Improving Dictionary Learning with Gated Sparse Autoencoders by Neel Nanda
AF - Simple probes can catch sleeper agents by Monte MacDiarmid
AF - Dequantifying first-order theories by Jessica Taylor