Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: My favorite safety papers & sequences - Living post (vers.: May 15, 8 recs), published by the gears to ascension on May 15, 2023 on LessWrong.
Local mad librarian here. I've been urged to hurry up and make a reading list of favorite papers and thoughts on them. This is the first version; I'll improve it over time with occasional post refreshes, especially when the order changes or I improve an explanation. It will be most useful for those experienced with reading papers, but if you're new to the field, I still recommend reading things for which you lack the prereqs, in as much depth as your limited context allows, as it will tune your appetite for those prereqs.
I see all of these as interesting ideas that I think every alignment researcher in any subfield should read, but not as concluded answers to the questions we're trying to ask, and potentially not even the right kind of framings. If you're interested in doing a call where I watch you read through and comment on any of these, I'd be very eager, assuming timezones work out - leave a comment or send a DM! That goes especially if you're not sure which paper is worth the effort for you.
The initial list
Stuff I've read and recommend:
Discovering agents: agency definition I suspect is close to ideal
detailed reasoning below
Agents and devices: agency definition I find useful for rough thinking
detailed reasoning below
Logical induction for software engineers: induction definition I see as critical hunchbuilding and likely also a critical math tool
todo: write detailed reasoning and pitch for reading it
Unifying bargaining notions: working through the math of different game theory perspectives defined here has been insightful
todo: write reasoning and pitch
Finite factored sets in pictures: definition of causal inference based on direction of entropy instead of anything to do with time
todo: write reasoning and pitch, especially for why to start with this one. see also the full sequence
Other agent foundations related work I have high on my todo list:
Geometric Rationality sequence: I haven't actually finished anything besides the first post; I'm including it based on a strong hunch
Boundaries sequence: has been high on my todo list for a while but keeps getting bumped down
Andrew Critch's Löbposting - all of his posts that mention Löb.
There are a bunch more things I'm planning to add, but I need to move on to other tasks right now, so I'm posting this as is, with far less exposition on why I like each one than I intend to have eventually. Hope it helps someone!
Discovering Agents (Kenton et al)
Arxiv pdf
Lesswrong / Alignmentforum summary and comments
My absolute favorite paper in the safety literature right now, and worth a level 3 read. Discovering Agents attempts to pin down the word "agency" by describing how to recognize it in a learning system that interacts with an environment. It has the limitation that it only works if you already know which part of the system is the potentially agentic learner, and you merely want to discover what it seeks. Also, the algorithm it defines is computable but thoroughly intractable for agents of any significant size, mostly limiting it to a formal definition, though substituting fast approximate causal inference may make it fairly practical. The comments on both LessWrong and AlignmentForum are essential for understanding what it actually gives you, which is not quite as general as one might hope from the abstract.
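To give a flavor of the core idea (this is my toy illustration, not the paper's actual algorithm, and all function names here are hypothetical): the paper's test is roughly "would this part of the system adapt its behavior if the mechanism determining its consequences were changed?" A system that re-optimizes under such a mechanism intervention is a candidate agent; a fixed device is not. A minimal sketch in Python:

```python
# Toy sketch of the "adapts under mechanism intervention" intuition
# behind Discovering Agents. NOT the paper's algorithm: the real
# version operates on mechanised causal graphs, not payoff dicts.

def agent_policy(utility):
    """An 'agent': re-optimizes, picking the highest-payoff action."""
    return max(utility, key=utility.get)

def device_policy(utility):
    """A 'device': always outputs action 'a', ignoring payoffs."""
    return "a"

def adapts_to_mechanism_intervention(policy):
    """Intervene on the utility mechanism (flip which action pays off)
    and check whether the decision changes in response."""
    base = {"a": 1.0, "b": 0.0}
    intervened = {"a": 0.0, "b": 1.0}
    return policy(base) != policy(intervened)

print(adapts_to_mechanism_intervention(agent_policy))   # True: adapts
print(adapts_to_mechanism_intervention(device_policy))  # False: inert
```

The real criterion is stated in terms of mechanism nodes in a causal game and is what makes the algorithm intractable at scale, but this captures the behavioral distinction it formalizes.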
As with most of the incredible papers from the causal incentives group, it's not the easiest to understand unless you're quite experienced at reading math papers; it took me several passes of reading it with people to be fairly sure I got it, and it was well worth the significant effort. I had to skim the related paper Reasoning about Causality in Games to refresh my understanding of ...