Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Towards a Less Bullshit Model of Semantics, published by johnswentworth on June 17, 2024 on LessWrong.
Or: Towards Bayesian Natural Language Semantics In Terms Of Interoperable Mental Content
Or: Towards a Theory of Interoperable Semantics
You know how natural language "semantics" as studied in e.g. linguistics is kinda bullshit? Like, there's some fine math there, but it just ignores most of what people intuitively mean by "semantics".
When I think about what natural language "semantics" means, intuitively, the core picture in my head is:
I hear/read some words, and my brain translates those words into some kind of internal mental content.
The mental content in my head somehow "matches" the mental content typically evoked in other people's heads by the same words, thereby allowing us to communicate at all; the mental content is "interoperable" in some sense.
That interoperable mental content is "the semantics of" the words. That's the stuff we're going to try to model.
The main goal of this post is to convey what it might look like to "model semantics for real", mathematically, within a Bayesian framework.
But Why Though?
There's lots of reasons to want a real model of semantics, but here's the reason we expect readers here to find most compelling:
The central challenge of ML interpretability is to faithfully and robustly translate the internal concepts of neural nets into human concepts (or vice versa). But today, we don't have a precise understanding of what "human concepts" are. Semantics gives us an angle on that question: it's centrally about what kind of mental content (i.e. concepts) can be interoperable (i.e. translatable) across minds.
Later in this post, we give a toy model for the semantics of nouns and verbs of rigid body objects. If that model were basically correct, it would give us a damn strong starting point on what to look for inside nets if we want to check whether they're using the concept of a teacup or free-fall or free-falling teacups.
This potentially gets us much of the way to calculating quantitative bounds on how well the net's internal concepts match humans', under conceptually simple (though substantive) mathematical assumptions.
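As a rough illustration of the kind of quantitative comparison involved (our sketch, not the post's actual construction): suppose each mind's concept of "teacup" is summarized as a Gaussian over some shared feature space. Then a closed-form overlap measure like the Bhattacharyya coefficient gives a crude agreement score between the two concepts. The feature space and all parameters below are illustrative assumptions.

```python
# Sketch: score agreement between two minds' versions of "the same concept",
# each summarized as a Gaussian over a shared feature space. Illustrative only.
import numpy as np

def bhattacharyya_coefficient(mu1, cov1, mu2, cov2):
    """Closed-form overlap between two Gaussians: 1 = identical, ~0 = disjoint."""
    cov = 0.5 * (cov1 + cov2)
    diff = mu1 - mu2
    dist = (0.125 * diff @ np.linalg.solve(cov, diff)
            + 0.5 * np.log(np.linalg.det(cov)
                           / np.sqrt(np.linalg.det(cov1) * np.linalg.det(cov2))))
    return np.exp(-dist)

# Hypothetical "teacup" concepts of a human and a net in a shared 2-D feature space.
mu_human, cov_human = np.array([1.0, 0.0]), np.eye(2)
mu_net, cov_net = np.array([1.2, 0.1]), 1.1 * np.eye(2)
print(f"concept agreement: {bhattacharyya_coefficient(mu_human, cov_human, mu_net, cov_net):.3f}")
```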
Compare that to today: when working on interpretability, we're throwing darts in the dark; we don't really understand what we're aiming for, and it's not clear when the darts hit something or what, exactly, they've hit. We can do better.
Overview
In the first section, we will establish the two central challenges of the problem we call Interoperable Semantics. The first is to characterize the stuff within a Bayesian world model (i.e. mental content) to which natural-language statements resolve; that's the "semantics" part of the problem.
The second aim is to characterize when, how, and to what extent two separate models can come to agree on the mental content to which natural language resolves, despite their respective mental content living in two different minds; that's the "interoperability" part of the problem.
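One minimal way to write the two challenges down (our gloss in generic Bayesian notation; the symbols are illustrative, not the post's own formalism):

```latex
\begin{align*}
&\text{Mind } i \text{ has a world model } P_i(X, \Lambda_i)
 \text{ over shared observables } X \text{ and private latents } \Lambda_i.\\
&\textbf{Semantics:}\quad s_i : \text{utterances} \to \text{events over } \Lambda_i,
 \text{ so hearing } w \text{ yields beliefs } P_i(X \mid s_i(w)).\\
&\textbf{Interoperability:}\quad P_1(X \mid s_1(w)) \;\approx\; P_2(X \mid s_2(w))
 \text{ for shared vocabulary } w,\\
&\qquad\text{even though } \Lambda_1 \text{ and } \Lambda_2 \text{ live in different minds.}
\end{align*}
```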
After establishing the goals of Interoperable Semantics, we give a first toy model of interoperable semantics based on the "words point to clusters in thingspace" mental model. As a concrete example, we quantify the model's approximation errors under an off-the-shelf Gaussian clustering algorithm on a small-but-real dataset. This example emphasizes the sort of theorems we want as part of the Interoperable Semantics project, and the sorts of tools which might be used to prove those theorems. However, the example is very toy.
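The post doesn't spell out its dataset or algorithm details in this excerpt, so the sketch below is a stand-in under assumed choices: scikit-learn's GaussianMixture on the Iris dataset, with two "agents" each clustering an independent half of the data. Clusters are matched across agents by mean proximity, and the cross-agent disagreement rate plays the role of the approximation error.

```python
# Stand-in for the "words point to clusters in thingspace" experiment:
# two agents fit Gaussian mixtures to independent halves of a small real
# dataset, and we check how often their matched clusters (the candidate
# "word meanings") file a point under corresponding concepts.
import numpy as np
from scipy.optimize import linear_sum_assignment
from scipy.spatial.distance import cdist
from sklearn.datasets import load_iris
from sklearn.mixture import GaussianMixture

X = load_iris().data
idx = np.random.default_rng(0).permutation(len(X))
X_a, X_b = X[idx[: len(X) // 2]], X[idx[len(X) // 2 :]]

# Each agent independently carves its own observations into 3 "concepts".
gmm_a = GaussianMixture(n_components=3, random_state=0).fit(X_a)
gmm_b = GaussianMixture(n_components=3, random_state=1).fit(X_b)

# Match agent B's clusters to agent A's by nearest means (Hungarian algorithm).
_, col = linear_sum_assignment(cdist(gmm_a.means_, gmm_b.means_))
b_to_a = np.argsort(col)  # B-cluster j corresponds to A-cluster b_to_a[j]

# Interoperability check: how often do the two agents assign a point to
# corresponding concepts? Disagreement ~ the model's approximation error.
agreement = (gmm_a.predict(X) == b_to_a[gmm_b.predict(X)]).mean()
print(f"cross-agent cluster agreement: {agreement:.2f}")
```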
Our second toy model sketch illustrates how to construct higher-level Interoperable Semantics models using the same tools from the first model. This one is marginally less toy; it gives a simple semantic model for rigid body nouns and their verbs. However, this second...
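As a rough guess at the shape such a rigid-body model might take (our illustration, not the post's actual construction): a "noun" could be a cluster over persistent object properties, while a "verb" clusters short trajectories of object state, so "free-falling teacup" is a conjunction of the two. All class and field names below are hypothetical.

```python
# Illustrative guess at a rigid-body semantic model, not the post's actual
# construction: "nouns" are clusters over persistent object properties,
# "verbs" are clusters over summaries of short object-state trajectories.
from dataclasses import dataclass
import numpy as np

@dataclass
class NounConcept:
    """E.g. "teacup": a Gaussian region in persistent shape-feature space."""
    mean: np.ndarray
    cov: np.ndarray

    def score(self, shape_features: np.ndarray) -> float:
        # Gaussian log-density up to an additive constant.
        diff = shape_features - self.mean
        return -0.5 * diff @ np.linalg.solve(self.cov, diff)

@dataclass
class VerbConcept:
    """E.g. "free-fall": a Gaussian region over a trajectory's mean acceleration."""
    mean: np.ndarray
    cov: np.ndarray

    def score(self, positions: np.ndarray) -> float:
        # positions: (T, 3) array of one object's positions over time.
        accel = np.diff(positions, n=2, axis=0).mean(axis=0)
        diff = accel - self.mean
        return -0.5 * diff @ np.linalg.solve(self.cov, diff)

# "Free-falling teacup" = this object's shape looks teacup-ish AND its
# trajectory looks free-fall-ish; the two scores combine additively in log space.
```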