Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio.
This is: Request for proposals for projects in AI alignment that work with deep learning systems, published by abergal and Nick_Beckstead on the AI Alignment Forum.
As part of our work on reducing potential risks from advanced artificial intelligence, Open Philanthropy is seeking proposals for projects working with deep learning systems that could help us understand and make progress on AI alignment: the problem of creating AI systems more capable than their designers that robustly try to do what their designers intended. We are interested in proposals that fit within certain research directions, described below and given as posts in the rest of this sequence, that we think could contribute to reducing the risks we are most concerned about.
Anyone is eligible to apply, including those working in academia, industry, or independently. Applicants are invited to submit proposals for up to $1M in total funding covering up to 2 years. We may invite grantees who do outstanding work to apply for larger and longer grants in the future.
Proposals are due January 10, 2022.
Submit a proposal here.
If you have any questions, please contact ai-alignment-rfp@openphilanthropy.org.
Our view of alignment risks from advanced artificial intelligence
This section was written by Nick Beckstead and Asya Bergal, and may not be representative of the views of Open Philanthropy as a whole.
We think the research directions below would be pursued more fruitfully by researchers who understand our background views about alignment risks from advanced AI systems, and who understand why we think these research directions could help mitigate these risks.
In brief:
We believe it is plausible that later this century, advanced AI systems will do the vast majority of productive labor more cheaply than human workers can.
We are worried about scenarios where AI systems more capable than humans acquire undesirable objectives that make them pursue and maintain power in unintended ways, causing humans to lose most or all influence over the future.
We think it may be technically challenging to create powerful systems that we are highly certain have desirable objectives. If it is significantly cheaper, faster, or otherwise easier to create powerful systems that may have undesirable objectives, there may be economic and military incentives to deploy those systems instead.
We are interested in research directions that make it easier to create powerful systems that we are highly certain have desirable objectives.
In this request for proposals, we are focused on scenarios where advanced AI systems are built out of large neural networks. One approach to ensuring large neural networks have desirable objectives might be to provide them with reward signals generated by human evaluators. However, such a setup could fail in multiple ways:
Inadequate human feedback: It’s possible that in order to train advanced AI systems with desirable objectives, we will need to provide reward signals for highly complex behaviors whose consequences are too difficult or time-consuming for humans to evaluate.
Deceiving human evaluators: It may be particularly difficult to provide good reward signals to an AI system that learns undesirable objectives during training and has a sophisticated model of humans and the training setup. Such a system may “deceive” the humans, i.e. deliberately behave in ways that appear superficially good but have undesirable consequences.
Competent misgeneralization: Even if an AI system has an abundant supply of good reward signals and behaves consistently with desirable objectives on the training distribution, there could be contexts outside of the training distribution where the system retains its capabilities but pursues an undesirable objective.
Deceptive misgeneralization: Rather than subtly misbehaving during ...