Download - LW - Retrospective: Lessons from the Failed Alignment Startup AISafety.com by Søren Elverlin

Discover

Podcast Features
Your all-in-one podcasting solution.

Podcast Studio
Easy-to-use audio recorder app.
Livestream
High-performing audio live, without limits.

Podcast App
The best podcast player & podcast app.
Podbean AI
AI-Enhanced Audio Quality and Content Generation.

Ads Marketplace
Join Ads Marketplace to earn money
through sponsorship on your podcast.

PodAds
Manage your ads with dynamic ad insertion capability.
Patron & Paid Content
The seamless way for fans to support you directly
from your podcast.
Apple Podcasts Subscriptions Integration
Effortlessly publish and manage exclusive episodes for your
Apple Podcasts subscribers directly from Podbean.

All Arts Business Comedy Education
Fiction Government Health & Fitness History Kids & Family
Leisure Music News Religion & Spirituality Science
Society & Culture Sports Technology True Crime TV & Film
Live

How to Start a Podcast
How to Start a Live Podcast
How to Monetize a podcast
How to Promote Your Podcast
How to Use Group Recording

Log in
Start your podcast for free

Podcasting
Monetization
Enterprise
Pricing
Discover

The Nonlinear Library: LessWrong

Education

LW - Retrospective: Lessons from the Failed Alignment Startup AISafety.com by Søren Elverlin

2023-05-12

Download Right click and do "save link as"

Link to original article

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Retrospective: Lessons from the Failed Alignment Startup AISafety.com, published by Søren Elverlin on May 12, 2023 on LessWrong. TL;DR: Attempted to create a startup to contribute to solving the AI alignment problem. Ultimately failed due to rapid advancements in large language models and the inherent challenges of startups. In early 2021, I began considering shorter AI development timelines and started preparing to leave my comfortable software development job to work on AI safety. Since I didn't feel competent enough to directly work on technical alignment, my goal was capacity-building, personal upskilling, and finding a way to contribute. During our reading group sessions, we studied Cotra's "Case for Aligning Narrowly Superhuman Models" which made a compelling argument for working with genuinely useful models. This inspired us to structure our efforts as a startup. Our team comprised of Volkan Erdogan, Timothy Aris, Robert Miles, and myself, Søren Elverlin. We planned to offer companies automation of certain business processes using GPT-3 in exchange for alignment-relevant data for research purposes. Given my strong deontological aversion to increasing AI capabilities, I aimed to keep the startup as stealthy as possible without triggering the Streisand effect. This decision significantly complicated fundraising and customer acquisition. In November 2021, I estimated a 20% probability of success, a view shared by my colleagues. I was fully committed, investing DKK420,000 (USD55,000), drawing no salary for myself, and providing modest compensation to the others. Startup literature generally advises against one-person startups. Despite our team of four, I was taking on a disproportionate amount of work and responsibility, which should have raised red flags. My confidence in our success grew during the spring of 2022 when a personal contact helped me secure a preliminary project with a large company that wished to remain anonymous. For $1,300/month, I sold them a business automation solution that relied solely on a large language model for code-generation. However, it didn't provide us with the data we sought. Both parties understood this was a preliminary project, and the company seemed eager about the full project. Securing this project early on made Rob's role redundant, and we amicably parted ways. Half a year later Tim was offered a PhD, leaving Volkan and I (with minimal help from Berk and Ali). The preliminary project involved validating several not-quite-standardized Word documents, and I developed a VSTO plugin for Outlook to handle this task. It took longer than anticipated, mainly due to late-discovered requirements. Despite the iterative process, the client was ultimately very satisfied, and I focused on building trust with them during this phase. The full project aimed to execute business processes in response to incoming emails using multiple fine-tuned GPT-3 models in stages and incorporating as much context as possible into the prompts. Our first practical target was sorting emails in a shared mailbox and delegating tasks to different department members. Initial experiments suggested this process was likely feasible to automate. We were more intrigued by experiments demonstrating the opposite: certain business processes could not be replicated by GPT-3. This was particularly evident when deviations were necessary for common-sense reasons or when deviations would yield greater value. For example, a customer inquires if product delivery is possible before a specific date. The operator determines it's barely unattainable but recognizes the potential for high profits, which would prompt a human operator to deviate from standard procedures. We could not persuade GPT-3 to do this, and exploring such discrepancies in strategic reasoning seemed worthwhile. T...