Podcasting
Advertisers
Enterprise
Pricing
Resources
Discover Discover

Log in
Sign up free

The Nonlinear Library: LessWrong

LW - (Approximately) Deterministic Natural Latents by johnswentworth

2024-07-21

Link to original articleWelcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: (Approximately) Deterministic Natural Latents, published by johnswentworth on July 21, 2024 on LessWrong. Background: Natural Latents: The Math, Natural Latents: The Concepts, Why Care About Natural Latents?, the prototypical semantics use-case. This post does not assume that you've read all of those, or even any of...

Link to original article

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: (Approximately) Deterministic Natural Latents, published by johnswentworth on July 21, 2024 on LessWrong.
Background: Natural Latents: The Math, Natural Latents: The Concepts, Why Care About Natural Latents?, the prototypical semantics use-case. This post does not assume that you've read all of those, or even any of them.
Suppose I roll a biased die 1000 times, and then roll the same biased die another 1000 times. Then...
Mediation: The first 1000 rolls are approximately independent of the second 1000 given the bias (to reasonable precision).
Redundancy: I can estimate the die's bias (to reasonable precision) with high confidence from either the first or second 1000 rolls.
The die's bias is therefore a natural latent, which means it has various nice properties.
Minimality: The bias is the smallest summary of all the information about the first 1000 rolls relevant to the second 1000 (and vice-versa).
Maximality: The bias is the largest piece of information which can be calculated from the first 1000 rolls and also can separately be calculated from the second 1000 rolls.
Any other variable which satisfies the above properties must tell us (approximately) the same information about the die rolls as the bias.
Furthermore, the bias is a(n approximate) deterministic natural latent: the die's bias (to reasonable precision) is approximately determined by[1] the first 1000 die rolls, and also approximately determined by the second 1000 die rolls. That implies one more nice property:
Uniqueness: The bias is the unique-up-to(-approximate)-isomorphism latent which has the above properties, making it a natural Schelling point for communication between agents.
We've proven all that before, mostly in
Natural Latents: The Math (including
the addendum added six months after the rest of the post). But it turns out that the math is a lot shorter and simpler, and easily yields better bounds, if we're willing to assume (approximate) determinism up-front. That does lose us some theoretical tools (notably the
resampling construction), but it gives a cleaner foundation for our expected typical use cases (like e.g. semantics). The goal of this post is to walk through that math.
Background Tool: Determinism in Diagrams
We're going to use diagrammatic proofs, specifically using Bayes nets. But it's non-obvious how to express (approximate) determinism using Bayes nets, or what rules diagrams follow when determinism is involved, so we'll walk through that first.
This diagram says that Y is (approximately) determined by X:
Intuitively, the literal interpretation of the diagram is: X mediates between Y and Y, i.e. Y itself tells me nothing more about Y once I already know X. That only makes sense if X tells me everything there is to know about Y, i.e. Y is determined by X.
In the approximate case, we express the approximation error of the diagram as a KL-divergence, same as usual:
ϵDKL(P[X=x,Y=y,Y=y']||P[X=x]P[Y=y|X=x]P[Y=y'|X=x])
If you get confused later about what it means to have two copies of the same variable in a diagram, go back to that line; that's the definition of the approximation error of the diagram. (One way to view that definition: there's actually two variables Y and Y', but P says that Y and Y' always have the same value.)
That approximation error simplifies:
DKL(P[X=x,Y=y,Y=y']||P[X=x]P[Y=y|X=x]P[Y=y'|X=x])
=DKL(P[X=x,Y=y]I[y=y']||P[X=x]P[Y=y|X=x]P[Y=y'|X=x])
=x,y,y'P[X=x,Y=y]I[y=y'](log(P[X=x,Y=y]I[y=y'])log(P[X=x]P[Y=y|X=x]P[Y=y'|X=x]))
=x,yP[X=x,Y=y](log(P[X=x,Y=y])log(P[X=x]P[Y=y|X=x]P[Y=y|X=x]))
=x,yP[X=x,Y=y]log(P[Y=y|X=x])
=H(Y|X)
So the diagram says Y is determined by X, and the approximation error of the diagram is the entropy H of Y given X - i.e. the number of bits required on average to specify Y once one already knows X. Very intuitive!
The Dangly Bit Lemma
Intuitiv...

View more

Comments (3)

More Episodes

You may also like

The Saad Truth with Dr. Saad

The Pacific War - week by week

German Stories - Learn German with Stories | Deutsch lernen mit Geschichten

The Mel Robbins Podcast

The Jordan B. Peterson Podcast

Halacha Headlines

All Ears English Podcast

The Jordan Harbinger Show

رادیو راه با مجتبی شکوری

جافکری | Jafekri

Get this podcast on your phone, Free

Create Your Podcast In Minutes

Full-featured podcast site
Unlimited storage and bandwidth
Comprehensive podcast stats
Distribute to Apple Podcasts, Spotify, and more
Make money with your podcast

It is Free

Podcast Services
MONETIZATION & MORE
KNOWLEDGE BASE
Support
Podbean

Privacy Policy
Cookie Policy
Terms of Use
Consent Preferences
Copyright © 2015-2025 Podbean.com