Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: The Lightcone Theorem: A Better Foundation For Natural Abstraction?, published by johnswentworth on May 15, 2023 on LessWrong.
Credit to David Lorell for serving as an active sounding board as the ideas in this post were developed.
For about a year and a half now, my main foundation for natural abstraction math has been The Telephone Theorem: long-range interactions in a probabilistic graphical model (in the long-range limit) are mediated by quantities which are conserved (in the long-range limit). From there, the next big conceptual step is to argue that the quantities conserved in the long-range limit are also conserved by resampling, and therefore the conserved quantities of an MCMC sampling process on the model mediate all long-range interactions in the model.
The most immediate shortcoming of the Telephone Theorem and the resampling argument is that they talk about behavior in infinite limits. To use them, either we need to have an infinitely large graphical model, or we need to take an approximation. For practical purposes, approximation is clearly the way to go, but just directly adding epsilons and deltas to the arguments gives relatively weak results.
This post presents a different path.
The core result is the Lightcone Theorem:
Start with a probabilistic graphical model on the variables X1, ..., Xn.
The graph defines adjacency, distance, etc between variables. For directed graphical models (i.e. Bayes nets), spouses (as well as parents and children) count as adjacent.
We can model those variables as the output of a Gibbs sampler (that’s the MCMC process) on the graphical model.
Call the initial condition of the sampler X0 = (X01, ..., X0n). The distribution of X0 must be the same as the distribution of X (i.e. the sampler is initialized “in equilibrium”).
We can model the sampler as having run for any number of steps to generate the variables; call the number of steps T.
At each step, the process resamples some set of nonadjacent variables conditional on their neighbors.
The Lightcone Theorem says: conditional on X0, any sets of variables in X which are a distance of at least 2T apart in the graphical model are independent.
Yes, exactly independent, no approximation.
In short: the initial condition of the resampling process provides a latent, conditional on which we have exact independence at a distance.
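To make the setup concrete, here is a minimal sketch assuming a toy binary model on a path graph X1 - X2 - ... - Xn; it is my illustration, not code from the post, and the model, the coupling parameter, and all function names are assumptions. It shows the three ingredients above: an initial condition drawn from the model itself (so the sampler starts "in equilibrium"), Gibbs updates conditional on neighbors, and steps that resample only nonadjacent variables.

```python
# Illustrative Gibbs sampler for a toy binary chain model (not from the post).
import numpy as np

rng = np.random.default_rng(0)

n = 10          # number of variables X1..Xn
coupling = 0.8  # probability that adjacent variables agree in the toy model

def sample_equilibrium():
    """Draw X0 from the model itself, so the sampler starts "in equilibrium":
    X1 is uniform, and each later variable copies its left neighbor with
    probability `coupling`."""
    x = np.empty(n, dtype=int)
    x[0] = rng.integers(2)
    for i in range(1, n):
        x[i] = x[i - 1] if rng.random() < coupling else 1 - x[i - 1]
    return x

def resample_site(x, i):
    """Gibbs update: resample X_i conditional on its neighbors."""
    p = np.ones(2)  # unnormalized probabilities of X_i = 0 and X_i = 1
    for j in (i - 1, i + 1):
        if 0 <= j < n:
            for v in (0, 1):
                p[v] *= coupling if v == x[j] else 1 - coupling
    x[i] = rng.random() < p[1] / p.sum()

def gibbs_step(x, parity):
    """One step: resample a set of nonadjacent variables (every other index,
    alternating parity between steps) conditional on their neighbors."""
    for i in range(parity, n, 2):
        resample_site(x, i)

def run_sampler(T):
    """Return (X0, XT): the initial condition and the state after T steps."""
    x0 = sample_equilibrium()
    x = x0.copy()
    for t in range(T):
        gibbs_step(x, parity=t % 2)
    return x0, x

x0, xT = run_sampler(T=3)
print("X0:", x0)
print("XT:", xT)
```

With X0 saved alongside XT, one could in principle test the theorem numerically: estimate whether variables of XT that are at least 2T apart on the chain are independent once you condition on X0.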
This was... rather surprising to me. If you’d floated the Lightcone Theorem as a conjecture a year ago, I’d have said it would probably work as an approximation for large T, but no way it would work exactly for finite T. Yet here we are.
The Proof, In Pictures
The proof is best presented visually. High-level outline:
Perform a do() operation on the Gibbs sampler, so that it never resamples the variables at a distance of exactly T from XR (a toy sketch of this step follows the outline).
In the do()-operated process, X0 mediates between XTR (the value of XR after T steps) and XTD(R,≥2T), where D(R,≥2T) indicates the indices of variables a distance of at least 2T from XR.
Since X0, XTR and XTD(R,≥2T) are all outside the lightcone of the do()-operation, they have the same joint distribution under the non-do()-operated sampler as under the do()-operated sampler.
Therefore X0 mediates between XTR and XTD(R,≥2T) under the original sampler.
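As a companion to step 1 of the outline, here is how that do() operation looks for the toy path-graph sampler sketched earlier (reusing n, sample_equilibrium, and resample_site from that sketch): the "shell" of variables at distance exactly T from the region R is simply never resampled. The names run_do_sampler and path_distance_to_region are mine, and the reading of "distance T" as "distance exactly T" is my interpretation of the outline.

```python
# Illustrative do()-operated sampler for the toy chain model above
# (not from the post): freeze the shell at distance exactly T from R.

def path_distance_to_region(i, region):
    """Graph distance from index i to the region R on a path graph."""
    return min(abs(i - r) for r in region)

def run_do_sampler(T, region):
    """Like run_sampler, but under do(): never resample the variables
    at distance exactly T from X_R."""
    frozen = {i for i in range(n) if path_distance_to_region(i, region) == T}
    x0 = sample_equilibrium()
    x = x0.copy()
    for t in range(T):
        for i in range(t % 2, n, 2):
            if i not in frozen:
                resample_site(x, i)
    return x0, x

# Example: with R = {0} and T = 3, only index 3 is frozen.
x0, xT = run_do_sampler(T=3, region={0})
```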
We start with the graphical model:
Within that graphical model, we’ll pick some tuple of variables XR (“R” for “region”). I’ll use the notation XD(R,t) for the variables a distance of exactly t away from R, and XD(R,>t) for the variables a distance of greater than t away from R (i.e. everything more than distance t from XR).
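To pin the notation down operationally, here is a small illustrative helper (my own function names, not the post's) that computes the index sets D(R,t) and D(R,>t) by breadth-first search over an adjacency structure, with spouses already merged in for Bayes nets per the adjacency convention above.

```python
# Illustrative helper (not from the post): distance-based index sets D(R,t), D(R,>t).
from collections import deque

def distances_from_region(adjacency, region):
    """BFS graph distances from the set of indices `region`.
    `adjacency` maps each index to an iterable of its neighbors."""
    dist = {r: 0 for r in region}
    queue = deque(region)
    while queue:
        i = queue.popleft()
        for j in adjacency[i]:
            if j not in dist:
                dist[j] = dist[i] + 1
                queue.append(j)
    return dist

def D_exact(adjacency, region, t):
    """D(R,t): indices at distance exactly t from the region."""
    dist = distances_from_region(adjacency, region)
    return {i for i, d in dist.items() if d == t}

def D_greater(adjacency, region, t):
    """D(R,>t): indices at distance greater than t from the region."""
    dist = distances_from_region(adjacency, region)
    return {i for i in adjacency if dist.get(i, float("inf")) > t}

# Example on the path graph 0-1-2-3-4 with R = {2}:
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
print(D_exact(adj, {2}, 1))    # {1, 3}
print(D_greater(adj, {2}, 1))  # {0, 4}
```

Sets like these are what the do() operation in the proof freezes (the shell at distance exactly T) and what the mediation claim is about (the variables in D(R,≥2T)).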
Next, we’ll draw the Gibbs resampler as a graphical model. We’ll draw the full state Xt at...