Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: what does davidad want from "boundaries"?, published by Chipmonk on February 6, 2024 on The AI Alignment Forum.
Chipmonk
As the Conceptual Boundaries Workshop (website) is coming up, and now that we're also planning the Mathematical Boundaries Workshop in April, I want to get more clarity on what exactly it is that you want out of "boundaries"/membranes.
So I just want to check: Is your goal with boundaries just to formalize a moral thing?
I'll summarize what I mean by that:
Claim 1: By "boundaries", you mean "the boundaries around moral patients, namely humans".
Claim 1b: And to some degree also the boundaries around plants and animals. Also maybe nations, institutions, and other things.
Claim 2: If we can just
(i) locate the important boundaries in the world, and then
(ii) somehow protect them,
then this gets at a lot (but not all!) of what the "safety" in "AI safety" should be.
Claim 3: We might actually be able to do that.
For example, Markov blankets are a natural abstraction for (2.i).
Claim 4: Protecting boundaries won't be sufficient for all of "safety", and there are probably other (non-boundaries) specifications/actions that will also be necessary.
For example, we would probably also need to separately specify some things that aren't obviously contained by the boundaries we mean, such as "clean water", "clean air", and a tractably small set of other desiderata.
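The Markov-blanket idea in Claim 3 has a precise graph-theoretic form. In a Bayesian network (a directed acyclic graph of variables), the Markov blanket of a set of nodes is their parents, their children, and their children's other parents, excluding the set itself; conditioned on the blanket, the enclosed nodes are independent of every other variable. A minimal sketch, assuming the world has already been modelled as such a graph; the function `markov_blanket` and the variable names (`weather`, `skin_temp`, `brain`, `muscles`, `fatigue`, `room_layout`) are hypothetical illustrations, not part of either participant's proposal.

```python
from collections import defaultdict


def markov_blanket(edges, nodes):
    """Markov blanket of `nodes` in a DAG given as (parent, child) edge pairs.

    The blanket is: parents of the set, children of the set, and the other
    parents of those children, excluding the set itself.
    """
    inside = set(nodes)
    parents = defaultdict(set)
    children = defaultdict(set)
    for parent, child in edges:
        parents[child].add(parent)
        children[parent].add(child)

    blanket = set()
    for node in inside:
        blanket |= parents[node]       # direct causes
        blanket |= children[node]      # direct effects
        for child in children[node]:
            blanket |= parents[child]  # co-parents of each effect
    return blanket - inside


# Hypothetical toy causal graph (illustrative names only).
EDGES = [
    ("weather", "skin_temp"),    # external -> sensory
    ("skin_temp", "brain"),      # sensory -> internal
    ("brain", "muscles"),        # internal -> active
    ("fatigue", "muscles"),      # another cause of the active state
    ("muscles", "room_layout"),  # active -> external
]

print(sorted(markov_blanket(EDGES, {"brain"})))
# ['fatigue', 'muscles', 'skin_temp']
```

Here the blanket of `brain` leaves out `weather` and `room_layout`: in this toy graph they can only affect, or be affected by, the internal node through the sensory and active nodes that make up its boundary.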
Here are my questions for you:
Q1: Do you agree with each of the claims above?
Q2: Is your goal with boundaries just to formalize the moral/safety thing, or is there anything else you want from boundaries?
Past context that's also relevant for readers:
This new post I wrote about how preserving the boundaries around agents seems to be a necessary condition for their safety.
Quotes you've made about boundaries that I've compiled here.
This old post I wrote about boundaries as MVP morality, which you endorsed.
Q3: It seems that Garrabrant, Critch, and maybe others want different things from boundaries than you do, and I'm wondering if you have thoughts about that.
Garrabrant: From talking to him, I know that he's thinking about boundaries too, but more about boundaries in the world as instruments to preserve causal locality, predictability, evolution, etc. This is quite different from talking specifically about the boundaries around agents.
Critch: I haven't spoken to him yet, but I think you once told me that Critch seems to be thinking about boundaries more in terms of ~"just find the 'boundary protocol' and follow it and all cooperation with other agents will be safe". Is this right? If so, this seems closer to what you want, but still kinda different.
TJ: I think TJ has some other ideas that I am currently unable to summarize.
davidad
Claim 1+1b: yes, to first order. [To second order, I expect that the general concept of things with "boundaries" will also be useful for multi-level world-modelling in general, e.g. coarse-graining fluid flow by modelling it in terms of cells that have boundaries on which there is a net flow, and that it might be a good idea to "bake in" something like a concept of boundaries into an AI system's meta-ontology, so that it has more of a tendency to have moral patients among the entities in its object-level ontology.
But my mainline intention is for the object-level ontology to be created with humans in the loop, and the identification of entities with boundaries could perhaps just as easily be a layer of interpretation on top of an ontology with a more neutral meta-ontology of causation. Thinking through both routes more is at the frontier of what I consider "conceptual 'boundaries' research".]
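The fluid-flow example describes finite-volume-style coarse-graining: the domain is split into cells, and each cell's contents change only through the net flow across its boundaries. A minimal sketch, assuming a 1D row of cells, closed outer walls, and a hypothetical linear (diffusive) flux law chosen purely for illustration; `finite_volume_step` and `diffusive_flux` are illustrative names, not anything from the dialogue.

```python
def finite_volume_step(amounts, flux, dt):
    """Advance a 1D row of cells by one explicit time step.

    amounts[i] is the quantity (e.g. fluid mass) held in cell i.
    flux(left, right) is the rate of flow across the boundary from the left
    cell into the right cell. The two outer walls are closed (zero flux), so
    the total quantity is conserved.
    """
    # One flux value per interior boundary between cell i and cell i + 1.
    boundary_flux = [flux(amounts[i], amounts[i + 1]) for i in range(len(amounts) - 1)]
    updated = list(amounts)
    for i, f in enumerate(boundary_flux):
        updated[i] -= dt * f      # what leaves the left cell...
        updated[i + 1] += dt * f  # ...enters the right cell
    return updated


def diffusive_flux(left, right, k=0.25):
    """Hypothetical linear flux: flow proportional to the difference across the boundary."""
    return k * (left - right)


state = [4.0, 0.0, 0.0]
for _ in range(2):
    state = finite_volume_step(state, diffusive_flux, dt=1.0)
print(state, sum(state))
# [2.5, 1.25, 0.25] 4.0
```

Because every unit that leaves one cell enters its neighbour, the total stays at 4.0 by construction; the model never needs to track what happens inside a cell, only what crosses its boundaries. This explicit scheme also needs `k * dt` to stay small (at most 0.5) to remain stable.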
davidad
Claim 2: agreed.
Claim 3: agreed.
Claim 4: agreed.
davidad
Q2: yes, my ultimate goal with "boundaries" is just to formalise injunctions against doing harm, disrespecting autonomy, or (at the mo...