Link to original article
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: AI Alignment Research Engineer Accelerator (ARENA): Call for applicants v4.0, published by James Fox on July 7, 2024 on LessWrong.
TL;DR
We are excited to announce the fourth iteration of ARENA (Alignment Research Engineer Accelerator), a 4-5 week ML bootcamp with a focus on AI safety! ARENA's mission is to provide talented individuals with the skills, tools, and environment necessary for upskilling in ML engineering, for the purpose of contributing directly to AI alignment in technical roles.
ARENA will be running in-person from LISA from 2nd September - 4th October (the first week is an optional review of the fundamentals of neural networks).
Apply here before 23:59 July 20th, anywhere on Earth!
Summary
ARENA has been successfully run three times, with alumni going on to become MATS scholars and LASR participants, to work as AI safety engineers at Apollo Research, Anthropic, METR, and OpenAI, and even to start their own AI safety organisations!
This iteration will run from 2nd September - 4th October (the first week is an optional review of the fundamentals of neural networks) at the London Initiative for Safe AI (LISA) in Old Street, London. LISA houses small organisations (e.g., Apollo Research, BlueDot Impact), several other AI safety researcher development programmes (e.g., LASR Labs, MATS extension, PIBBSS, Pivotal), and many individual researchers (independent and externally affiliated).
Being situated at LISA, therefore, brings several benefits, e.g. facilitating productive discussions about AI safety & different agendas, allowing participants to form a better picture of what working on AI safety can look like in practice, and offering chances for research collaborations post-ARENA.
The main goals of ARENA are to:
Help participants skill up in ML relevant for AI alignment.
Produce researchers and engineers who want to work in alignment and help them make concrete next career steps.
Help participants develop inside views about AI safety and the paths to impact of different agendas.
The programme's structure will remain broadly the same as ARENA 3.0 (see below); however, we are adding an additional week on evaluations.
For more information, see our website.
Also, note that we have a Slack group designed to support the independent study of the material (join link here).
Outline of Content
The 4-5 week program will be structured as follows:
Chapter 0 - Fundamentals
Before getting into more advanced topics, we first cover the basics of deep learning, including basic machine learning terminology, what neural networks are, and how to train them. We will also cover some subjects we expect to be useful going forward, e.g. using GPT-3 and 4 to streamline your learning, good coding practices, and version control.
Note: Participants can optionally skip this week and join us at the start of Chapter 1, provided they would prefer to and we are confident that they are already comfortable with the material in this chapter.
Topics include:
PyTorch basics
CNNs, Residual Neural Networks
Optimization (SGD, Adam, etc.)
Backpropagation
Hyperparameter search with Weights and Biases
GANs & VAEs
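To give a concrete flavour of the fundamentals listed above, here is a minimal sketch (illustrative only, not taken from the ARENA exercises) of a PyTorch training loop: a small fully-connected network fitted to toy data with the Adam optimiser, with backpropagation computing the gradients.

```python
# Minimal illustrative training loop in the spirit of Chapter 0 (not ARENA course code).
import torch
import torch.nn as nn

# Toy regression data: the target is the sum of the inputs.
x = torch.randn(256, 10)
y = x.sum(dim=1, keepdim=True)

model = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 1))
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

for step in range(500):
    optimizer.zero_grad()        # clear gradients from the previous step
    loss = loss_fn(model(x), y)  # forward pass
    loss.backward()              # backpropagation
    optimizer.step()             # Adam parameter update

print(f"final loss: {loss.item():.4f}")
```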
Chapter 1 - Transformers & Interpretability
In this chapter, you will learn all about transformers, and build and train your own. You'll also study LLM interpretability, a field that has been advanced by Anthropic's Transformer Circuits sequence and by open-source work from Neel Nanda. This chapter will also branch into areas more accurately classed as "model internals" than interpretability, e.g. recent work on steering vectors.
Topics include:
GPT models (building your own GPT-2)
Training and sampling from transformers
TransformerLens
In-context Learning and Induction Heads
Indirect Object Identification
Superposition
Steering Vectors
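As a taste of the interpretability tooling listed above, here is a minimal sketch of loading GPT-2 small with TransformerLens and caching its activations. It assumes the standard TransformerLens API; the prompt is the classic indirect-object-identification example, but this is not the ARENA exercise itself.

```python
# Minimal illustrative TransformerLens usage (not ARENA course code).
from transformer_lens import HookedTransformer

model = HookedTransformer.from_pretrained("gpt2")  # GPT-2 small

prompt = "When Mary and John went to the store, John gave a drink to"
logits, cache = model.run_with_cache(prompt)

# Greedy next-token prediction from the final position.
next_token = logits[0, -1].argmax()
print("Predicted next token:", model.to_string(next_token))

# Cached attention patterns for layer 0: shape [batch, n_heads, query_pos, key_pos].
attn_pattern = cache["pattern", 0]
print("Layer-0 attention pattern shape:", tuple(attn_pattern.shape))
```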
Chapter 2 - Reinforcement Learning
In this chapter, you w...