The Nonlinear Library: EA Forum
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Clarifying two uses of "alignment", published by Matthew Barnett on March 10, 2024 on The Effective Altruism Forum.

Paul Christiano once clarified AI alignment as follows:

"When I say an AI A is aligned with an operator H, I mean: A is trying to do what H wants it to do."

This definition is clear enough for many purposes, but it leads to confusion when one wants to make a point about two different types of alignment:

1. A is trying to do what H wants it to do because A is trading or cooperating with H on a mutually beneficial outcome for both of them. For example, H could hire A to perform a task, and offer a wage as compensation.
2. A is trying to do what H wants it to do because A has the same values as H - i.e. its "utility function" overlaps with H's utility function - and thus A intrinsically wants to pursue what H wants it to do.

These cases are important to distinguish because they have dramatically different consequences for the difficulty and scope of alignment.

To solve alignment in sense (1), A and H don't necessarily need to share the same values in any strong sense. Instead, the essential prerequisite seems to be for A and H to operate in an environment in which it is mutually beneficial for them to enter contracts, trade, or cooperate in some respect.

For example, one can imagine a human hiring a paperclip maximizer AI to perform work, paying it a wage. In return, the paperclip maximizer could use its wages to buy more paperclips. In this example, the AI performs its duties satisfactorily, without any major negative side effects resulting from the two parties' differing values, and both parties are made better off as a result.

By contrast, alignment in sense (2) seems far more challenging to solve.
In the most challenging case, this form of alignment would require solving extremal Goodhart, in the sense that A's utility function would need to be almost perfectly matched with H's utility function. Here, the idea is that even slight differences in values yield very large differences when subject to extreme optimization pressure.

Because it is presumably easy to make slight mistakes when engineering AI systems, by assumption, these mistakes could translate into catastrophic losses of value.

Effect on alignment difficulty

My impression is that people's opinions about AI alignment difficulty often come down to differences in how much they think we need to solve the second problem relative to the first problem in order to get AI systems that generate net-positive value for humans.

If you're inclined towards thinking that trade and compromise are either impossible or inefficient between agents at greatly different levels of intelligence, then you might think that we need to solve the second problem with AI, since "trading with the AIs" won't be an option. My understanding is that this is Eliezer Yudkowsky's view, and the view of most others who are relatively doomy about AI.

In this frame, a common thought is that AIs would have no need to trade with humans, as humans would be like ants to them.

On the other hand, you could be inclined - as I am - towards thinking that agents at greatly different levels of intelligence can still find positive-sum compromises when they are socially integrated with each other, operating under a system of law, and capable of making mutual agreements. In this case, you might be a lot more optimistic about the prospects of alignment.

To sketch one plausible scenario here, if AIs can own property and earn income by selling their labor on an open market, then they can simply work a job and use their income to purchase whatever it is they want, without any need to violently "take over the world" to satisfy their goals.
At the same time, humans could retain power in this system through capital ownership and other gran...
EA - The Intergovernmental Panel On Global Catastrophic Risks (IPGCR) by DannyBressler
EA - Diet that is vegan, frugal, time-efficient, ~evidence-based and ascetic: An example of a non-Huel EA diet? by Ulrik Horn
EA - Introducing the Animal Advocacy Forum - a space for those involved or interested in Animal Welfare & related topics by David van Beveren
EA - A review of how nucleic acid (or DNA) synthesis is currently regulated across the world, and some ideas about reform (summary of and link to Law dissertation) by Isaac Heron
EA - Three tricky biases: human bias, existence bias, and happy bias by Steven Rouk
EA - AI safety needs to scale, and here's how you can do it by Esben Kran
EA - A simple and generic framework for impact estimation and attribution by Christoph Hartmann
EA - Types of subjective welfare by MichaelStJules
EA - Managing risks while trying to do good by Wei Dai
EA - Increasingly vague interpersonal welfare comparisons by MichaelStJules
EA - Fish Welfare Initiative's 2023 in Review by haven
EA - Announcing Niel Bowerman as the next CEO of 80,000 Hours by 80000 Hours
EA - My model of how different AI risks fit together by Stephen Clare
EA - Who wants to be hired? (Feb-May 2024) by tobytrem
EA - Brian Tomasik on charity by Vasco Grilo
EA - Project for Awesome 2024: Make a short video for an EA charity! by EA ProjectForAwesome
EA - Things newer (university) group organisers should know about by Sam Robinson
EA - Lower-suffering egg brands available in the SF Bay Area by mayleaf
EA - Deciding What Project/Org to Start: A Guide to Prioritization Research by Alexandra Bos
EA - EV investigation into Owen and Community Health by EV US Board