
The Nonlinear Library: LessWrong

LW - Why does generalization work? by Martín Soto

2024-02-21


Link to original article

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Why does generalization work?, published by Martín Soto on February 21, 2024 on LessWrong.
Just an interesting philosophical argument
I. Physics
Why can an ML model learn from part of a distribution or data set, and generalize to the rest of it? Why can I learn some useful heuristics or principles in a particular context, and later apply them in other areas of my life?
The answer is obvious: because there are some underlying regularities shared between the parts I train on and the ones I test on. In the ML example, generalization won't work when approximating a function that is a completely random jumble of points.
Also, quantitatively, the more regular the function is, the better generalization will work. For example, polynomials of lower degree require fewer data points to pin down. The same goes for periodic functions. Also, a function with a lower Lipschitz constant allows for tighter bounds on its values at unobserved points.
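
To make the last two claims concrete, here is a minimal Python sketch. Everything named in it (the function f, the sample points, the helper names) is a hypothetical chosen for illustration, not something from the paper mentioned below. It shows that a degree-2 polynomial is pinned down exactly by 3 points, and that a smaller assumed Lipschitz constant narrows the interval in which an unobserved value can lie.

```python
from fractions import Fraction


def lagrange_eval(points, x):
    """Evaluate at x the unique polynomial of degree < len(points) through `points`."""
    total = Fraction(0)
    for i, (xi, yi) in enumerate(points):
        term = Fraction(yi)
        for j, (xj, _) in enumerate(points):
            if j != i:
                term *= Fraction(x - xj, xi - xj)
        total += term
    return total


def lipschitz_bounds(points, x, lipschitz_constant):
    """Interval that f(x) must lie in, if f passes through `points`
    and |f(a) - f(b)| <= lipschitz_constant * |a - b| everywhere."""
    lower = max(yi - lipschitz_constant * abs(x - xi) for xi, yi in points)
    upper = min(yi + lipschitz_constant * abs(x - xi) for xi, yi in points)
    return lower, upper


# Hypothetical "true" function: a polynomial of degree 2.
def f(x):
    return 3 * x**2 - 2 * x + 1


# Degree 2 means 3 training points are enough to recover f exactly.
train = [(x, f(x)) for x in (0, 1, 2)]
prediction = lagrange_eval(train, 10)  # x = 10 was never observed
print(prediction == f(10))

# Two observations of some unknown function, queried at the unobserved x = 1.
observed = [(0, 0), (4, 2)]
loose = lipschitz_bounds(observed, 1, 1)    # assumed Lipschitz constant 1
tight = lipschitz_bounds(observed, 1, 0.5)  # assumed Lipschitz constant 0.5
print(loose, tight)
```

With the larger constant the value at x = 1 is only confined to an interval of width 2, while with the smaller one the interval collapses to a single point. A random jumble of points has no such constant to exploit, so the observed points say nothing about the unobserved ones.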
So it must be that the variables we track (the ones we try to predict or control, either with data science or our actions) are given by disproportionately regular functions (relative to random ones). In this paper by Tegmark, the authors argue exactly that: most macroscopic variables of interest have Hamiltonians of low polynomial degree, and this happens because of some underlying principles of low-level physics, like locality, symmetry, or the hierarchical composition of physical processes.
But then, why is low-level physics like that?
II. Anthropics
If our low-level physics weren't conducive to creating macroscopic patterns and regularities, then complex systems capable of asking that question (like ourselves) wouldn't exist. Indeed, we ourselves are nothing more than a specific kind of macroscopic pattern. So anthropics explains why we should expect such patterns to exist, much as it explains why the gravitational constant, or the ratio between the speeds of sound and light, has the right value to allow for complex life.
III. Dust
But there's yet one more step.
Let's try to imagine a universe which is not conducive to such macroscopic patterns. Say you show me its generating code (its laws of physics), and run it. To me, it looks like a completely random mess. I am not able to discern any structural regularities that could be akin to the ideal gas law, or the construction of molecules or cells.
By contrast, if you showed me the running code of our own reality, I'd be able (certainly after much effort) to discern these conserved quantities and recurring structures.
What are, exactly, these macroscopic variables I'm able to track, like "pressure in a room", or "chemical energy in a cell"? Intuitively, they are a way to classify all possible physical arrangements into more coarse-grained buckets. In the language of statistical physics, we'd say they are a way to classify all possible microstates into a macrostate partition.
For example, every possible numerical value for pressure is a different macrostate (a different bucket), which could be instantiated by many different microstates (exact positions of particles).
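
As a toy version of this bucketing, here is a minimal Python sketch. The system is a hypothetical one I'm making up for illustration: 4 particles that can each be in state 0 or 1, with "number of particles in state 1" standing in for a variable like pressure. Each value of the variable is one macrostate, and the sketch counts how many microstates instantiate it.

```python
from collections import defaultdict
from itertools import product

N_PARTICLES = 4  # hypothetical toy system, not a model of a real gas


def macro_variable(microstate):
    """Coarse-grained variable: how many particles are in state 1."""
    return sum(microstate)


# Every exact arrangement of the particles is one microstate.
microstates = list(product((0, 1), repeat=N_PARTICLES))

# Group microstates into buckets (macrostates) by the value of the variable.
partition = defaultdict(list)
for microstate in microstates:
    partition[macro_variable(microstate)].append(microstate)

bucket_sizes = {value: len(bucket) for value, bucket in sorted(partition.items())}
print(len(microstates), bucket_sizes)
```

The 16 microstates fall into 5 buckets of sizes 1, 4, 6, 4 and 1: many different microstates share the same macrostate, and tracking only the bucket throws away which one you are in.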
But there's a circularity problem. When we say a certain macroscopic variable (like pressure) is easily derived from others (like temperature), or that it is a useful way to track another variable we care about (like "whether a human can survive in this room"), we're being circular.
Given that I already have access to a certain macrostate partition (temperature), or that I already care about tracking a certain macrostate partition (aliveness of human), I can say it is natural or privileged to track another partition (pressure). But I cannot motivate the importance of pressure as a macroscopic variable from just looking at the microstates.
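
One way to get a feel for how little the microstates alone constrain the choice: count how many macrostate partitions are available. The Python sketch below is only an illustration under my own assumptions (a hypothetical system with 16 microstates, and the helper name bell_number is mine). It computes the number of ways to split a set of labelled elements into buckets, which is the Bell number.

```python
def bell_number(n):
    """Number of ways to partition a set of n labelled elements into
    non-empty buckets, computed with the Bell triangle."""
    row = [1]
    for _ in range(n):
        # Each new row starts with the last entry of the previous row.
        new_row = [row[-1]]
        for value in row:
            new_row.append(new_row[-1] + value)
        row = new_row
    return row[0]


N_MICROSTATES = 16  # hypothetical: e.g. 4 two-state particles
n_partitions = bell_number(N_MICROSTATES)
print(n_partitions)
```

Even for this tiny system the count is in the billions, and every one of those partitions is formally a "macroscopic variable". Nothing in the list of microstates marks one of them as the natural one.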
Thus, "which parts of physics I consider interesting macroscopic varia...
