Link to original article
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Meaning & Agency, published by Abram Demski on December 19, 2023 on The AI Alignment Forum.
The goal of this post is to clarify a few concepts relating to AI Alignment under a common framework. The main concepts to be clarified:
Optimization. Specifically, this will be a type of Vingean agency. It will split into Selection vs Control variants.
Reference (the relationship which holds between map and territory; aka semantics, aka meaning). Specifically, this will be a teleosemantic theory.
The main new concepts employed will be endorsement and legitimacy.
TLDR:
Endorsement of a process is when you would take its conclusions for your own, if you knew them. (A rough formalization is sketched just after this list.)
Legitimacy relates to endorsement in the same way that good relates to utility. (IE utility/endorsement are generic mathematical theories of agency; good/legitimate refer to the specific thing we care about.)
We perceive agency when something is better at doing something than us; we endorse some aspect of its reasoning or activity. (Endorse as a way of achieving its goals, if not necessarily our own.)
We perceive meaning (semantics/reference) in cases where the goal we endorse a conclusion with respect to is accuracy (we can say that it has been optimized to accurately represent something).
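To make the notion of endorsement slightly more concrete, here is a minimal reflection-style sketch. The notation (P_A for agent A's credence, P_B for the credence of the process B being endorsed) is my own illustrative assumption, not a formalism taken verbatim from the post.

```latex
\documentclass{article}
\usepackage{amsmath}
\begin{document}
% A minimal sketch (illustrative notation, not the post's own formalism):
% agent A endorses process B on claim x when A would adopt B's stated
% credence as its own, for every value p:
\[
  P_A\!\bigl(x \mid P_B(x) = p\bigr) = p
\]
% Reading: conditional on learning that B assigns probability p to x, A also
% assigns probability p to x. This is a reflection-style condition; "taking
% its conclusions for your own, if you knew them" is the special case where
% B has settled on a definite conclusion.
\end{document}
```

One could also restrict the condition to particular claims or particular parts of B's reasoning, which matches the "some aspect of its reasoning or activity" phrasing above rather than demanding wholesale deference.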
This write-up owes a large debt to many conversations with Sahil, although the views expressed here are my own.
Broader Context
The basic idea is to investigate agency as a natural phenomenon, and have some deep insights emerge from that, relevant to the analysis of AI Risk. Representation theorems -- also sometimes called coherence theorems or selection theorems depending on what you want to emphasize -- start from a very normative place, typically assuming preferences as a basic starting object.
Perhaps these preferences are given a veneer of behaviorist justification: they supposedly represent what the agent would choose if given the option. But actually giving agents the implied options would typically be very unnatural, taking the agent out of its environment.
There are many ways to drive toward more naturalistic representation theorems. The work presented here takes a particular approach: cognitive reduction. I want to model what it means for one agent to think of something as being another agent. Daniel Dennett called this the intentional stance.
The goal of the present essay is to sketch a formal picture of the intentional stance. This is not yet a representation theorem -- it does not establish a type signature for agency based on the ideas. However, for me at least, it seems like an important piece of deconfusion. It paints a picture of agents who understand each other, as thinking things. I hope that it can contribute to a useful representation theorem and a broader picture of agency.
Meaning
In Signalling & Simulacra, I argued that the signal-theoretic[1] analysis of meaning (which is the most common Bayesian analysis of communication) fails to adequately define lying, and fails to offer any distinction between denotation and connotation or literal content vs conversational implicature. A Human's Guide to Words gives a related view:[2]
No dictionary, no encyclopedia, has ever listed all the things that humans have in common. We have red blood, five fingers on each of two hands, bony skulls, 23 pairs of chromosomes - but the same might be said of other animal species.
[...]
A dictionary is best thought of, not as a book of Aristotelian class definitions, but a book of hints for matching verbal labels to similarity clusters, or matching labels to properties that are useful in distinguishing similarity clusters.
Eliezer Yudkowsky, Similarity Clusters
While I agree that words do not have clear-cut definitions (except, perhaps, in a few cases such as mathematics), I think it would be a mistake to conclude that there is nothi...