Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: What's new at FAR AI, published by AdamGleave on December 4, 2023 on The AI Alignment Forum.
Summary
We are FAR AI: an AI safety research incubator and accelerator. Since our inception in July 2022, FAR has grown to a team of 12 full-time staff, produced 13 academic papers, opened the coworking space FAR Labs with 40 active members, and organized field-building events for more than 160 ML researchers.
Our organization consists of three main pillars:
Research. We rapidly explore a range of potential research directions in AI safety, scaling up those that show the greatest promise. Unlike other AI safety labs that take a bet on a single research direction, FAR pursues a diverse portfolio of projects. Our current focus areas are building a science of robustness (e.g. finding vulnerabilities in superhuman Go AIs), finding more effective approaches to value alignment (e.g. training from language feedback), and model evaluation (e.g. inverse scaling and codebook features).
Coworking Space. We run FAR Labs, an AI safety coworking space in Berkeley. The space currently hosts FAR, AI Impacts, SERI MATS, and several independent researchers. We are building a collaborative community that fosters great work through excellent office space, a warm and intellectually generative culture, and tailored programs and training for members. Applications are open for new members of the space, both individuals and organizations.
Field Building. We run workshops, primarily targeted at ML researchers, to help build the field of AI safety research and governance. We co-organized the International Dialogue for AI Safety, bringing together prominent scientists from around the globe and culminating in a public statement calling for global action on AI safety research and governance. In December we will host the New Orleans Alignment Workshop, where over 140 researchers will learn about AI safety and find collaborators.
We want to expand, so if you're excited by the work we do, consider donating or working for us! We're hiring research engineers, research scientists and communications specialists.
Incubating & Accelerating AI Safety Research
Our main goal is to explore new AI safety research directions, scaling up those that show the greatest promise. We select agendas that are too large to be pursued by individual academic or independent researchers but are not aligned with the interests of for-profit organizations. Our structure allows us to both (1) explore a portfolio of agendas and (2) execute them at scale. Although we conduct the majority of our work in-house, we frequently pursue collaborations with researchers at other organizations with overlapping research interests.
Our current research falls into three main categories:
Science of Robustness. How does robustness vary with model size? Will superhuman systems be vulnerable to adversarial examples or "jailbreaks" similar to those seen today? And if so, how can we achieve safety-critical guarantees? (An illustrative sketch of an adversarial example follows the relevant work below.)
Relevant work:
Vulnerabilities in superhuman Go AIs,
AI Safety in a World of Vulnerable Machine Learning Systems.
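The Go work trains adversarial policies against superhuman agents rather than perturbing inputs, but the underlying notion of an adversarial example is easiest to see in the classic image-classification setting. Here is a minimal sketch, purely illustrative and not FAR's method, using the fast gradient sign method; `model`, `x`, and `label` are assumed to be an arbitrary differentiable classifier and a labeled input batch:

```python
# Illustrative only: a generic adversarial example via FGSM.
# Assumes `model` is any differentiable classifier, `x` a batch of inputs,
# and `label` the corresponding class indices.
import torch
import torch.nn.functional as F

def fgsm_adversarial_example(model, x, label, epsilon=0.03):
    """Perturb x by epsilon in the direction that most increases the loss."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), label)
    loss.backward()
    # A small, targeted perturbation is often enough to flip the prediction.
    return (x + epsilon * x.grad.sign()).detach()
```

The research question framed above is whether failures of this kind persist, and remain easy to find, as systems become superhuman.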
Value Alignment. How can we learn reliable reward functions from human data? Our research focuses on higher-bandwidth, more sample-efficient ways for users to communicate their preferences to AI systems, and on improved methods for training with human feedback. (A minimal sketch of specifying rewards with natural language follows the relevant work below.)
Relevant work:
VLM-RM: Specifying Rewards with Natural Language,
Training Language Models with Language Feedback.
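To give a flavor of the "specifying rewards with natural language" direction, here is a minimal sketch, not FAR's implementation: it assumes the agent's observation can be rendered as an image, and scores that frame against a natural-language goal description using the cosine similarity of CLIP embeddings.

```python
# Illustrative sketch: use a vision-language model (CLIP) to turn a
# natural-language goal description into a scalar reward for an RL agent.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def language_specified_reward(frame: Image.Image, goal: str) -> float:
    """Cosine similarity between an observation frame and a goal prompt."""
    inputs = processor(text=[goal], images=frame, return_tensors="pt", padding=True)
    with torch.no_grad():
        image_emb = model.get_image_features(pixel_values=inputs["pixel_values"])
        text_emb = model.get_text_features(input_ids=inputs["input_ids"],
                                           attention_mask=inputs["attention_mask"])
    image_emb = image_emb / image_emb.norm(dim=-1, keepdim=True)
    text_emb = text_emb / text_emb.norm(dim=-1, keepdim=True)
    return (image_emb @ text_emb.T).item()
```

A training loop could then call language_specified_reward on each rendered frame in place of a hand-coded reward function, letting users communicate goals in plain language.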
Model Evaluation. How can we evaluate and test the safety-relevant properties of state-of-the-art models? Evaluation can be split into black-box approaches that focus only on externally visible behavior ("model testing") and white-box approaches that seek to interpret the inner workings ("interpretability"). These approaches are complementary, with ...