Link to original articleWelcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Why does generalization work?, published by Martín Soto on February 21, 2024 on LessWrong.
Just an interesting philosophical argument
I. Physics
Why can an ML model learn from part of a distribution or data set, and generalize to the rest of it? Why can I learn some useful heuristics or principles in ...
Link to original article
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Why does generalization work?, published by Martín Soto on February 21, 2024 on LessWrong.
Just an interesting philosophical argument
I. Physics
Why can an ML model learn from part of a distribution or data set, and generalize to the rest of it? Why can I learn some useful heuristics or principles in a particular context, and later apply them in other areas of my life?
The answer is obvious: because there are some underlying regularities between the parts I train on and the ones I test on. In the ML example, generalization won't work when approximating a function which is a completely random jumble of points.
Also, quantitatively, the more regular the function is, the better generalization will work. For example, polynomials of lower degree require less data points to pin down. Same goes for periodic functions. Also, a function with lower Lipschitz constant will allow for better bounding of the values in un-observed points.
So it must be that the variables we track (the ones we try to predict or control, either with data science or our actions), are given by disproportionately regular functions (relative to random ones). In this paper by Tegmark, the authors argue exactly that most macroscopic variables of interest have Hamiltonians of low polynomial degree.
And that this happens because of some underlying principles of low-level physics, like locality, symmetry, or the hierarchical composition of physical processes.
But then, why is low-level physics like that?
II. Anthropics
If our low-level physics wasn't conducive to creating macroscopic patterns and regularities, then complex systems capable of asking that question (like ourselves) wouldn't exist. Indeed, we ourselves are nothing more than a specific kind of macroscopic pattern. So anthropics explains why we should expect such patterns to exist, similarly to how it explains why the gravitational constant, or the ratio between sound and light speed, are the right ones to allow for complex life.
III. Dust
But there's yet one more step.
Let's try to imagine a universe which is not conducive to such macroscopic patterns. Say you show me its generating code (its laws of physics), and run it. To me, it looks like a completely random mess. I am not able to differentiate any structural regularities that could be akin to the law of ideal gases, or the construction of molecules or cells.
While on the contrary, if you showed me the running code of this reality, I'd be able (certainly after many efforts) to differentiate these conserved quantities and recurring structures.
What are, exactly, these macroscopic variables I'm able to track, like "pressure in a room", or "chemical energy in a cell"? Intuitively, they are a way to classify all possible physical arrangements into more coarse-grained buckets. In the language of statistical physics, we'd say they are a way to classify all possible microstates into a macrostate partition.
For example, every possible numerical value for pressure is a different macrostate (a different bucket), that could be instantiated by many different microstates (exact positions of particles).
But there's a circularity problem. When we say a certain macroscopic variable (like pressure) is easily derived from others (like temperature), or that it is a useful way to track another variable we care about (like "whether a human can survive in this room"), we're being circular.
Given I already have access to a certain macrostate partition (temperature), or that I already care about tracking a certain macrostate partition (aliveness of human), then I can say it is natural or privileged to track another partition (pressure). But I cannot motivate the importance of pressure as a macroscopic variable from just looking at the microstates.
Thus, "which parts of physics I consider interesting macroscopic varia...
View more