Link to original article
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Retrospective: Lessons from the Failed Alignment Startup AISafety.com, published by Søren Elverlin on May 12, 2023 on LessWrong.
TL;DR: Attempted to create a startup to contribute to solving the AI alignment problem. Ultimately failed due to rapid advancements in large language models and the inherent challenges of startups.
In early 2021, I began considering shorter AI development timelines and started preparing to leave my comfortable software development job to work on AI safety. Since I didn't feel competent enough to directly work on technical alignment, my goal was capacity-building, personal upskilling, and finding a way to contribute.
During our reading group sessions, we studied Cotra's "Case for Aligning Narrowly Superhuman Models" which made a compelling argument for working with genuinely useful models. This inspired us to structure our efforts as a startup. Our team comprised of Volkan Erdogan, Timothy Aris, Robert Miles, and myself, Søren Elverlin. We planned to offer companies automation of certain business processes using GPT-3 in exchange for alignment-relevant data for research purposes.
Given my strong deontological aversion to increasing AI capabilities, I aimed to keep the startup as stealthy as possible without triggering the Streisand effect. This decision significantly complicated fundraising and customer acquisition.
In November 2021, I estimated a 20% probability of success, a view shared by my colleagues. I was fully committed, investing DKK420,000 (USD55,000), drawing no salary for myself, and providing modest compensation to the others.
Startup literature generally advises against one-person startups. Despite our team of four, I was taking on a disproportionate amount of work and responsibility, which should have raised red flags.
My confidence in our success grew during the spring of 2022 when a personal contact helped me secure a preliminary project with a large company that wished to remain anonymous. For $1,300/month, I sold them a business automation solution that relied solely on a large language model for code-generation. However, it didn't provide us with the data we sought. Both parties understood this was a preliminary project, and the company seemed eager about the full project.
Securing this project early on made Rob's role redundant, and we amicably parted ways. Half a year later Tim was offered a PhD, leaving Volkan and I (with minimal help from Berk and Ali).
The preliminary project involved validating several not-quite-standardized Word documents, and I developed a VSTO plugin for Outlook to handle this task. It took longer than anticipated, mainly due to late-discovered requirements. Despite the iterative process, the client was ultimately very satisfied, and I focused on building trust with them during this phase.
The full project aimed to execute business processes in response to incoming emails using multiple fine-tuned GPT-3 models in stages and incorporating as much context as possible into the prompts. Our first practical target was sorting emails in a shared mailbox and delegating tasks to different department members. Initial experiments suggested this process was likely feasible to automate.
We were more intrigued by experiments demonstrating the opposite: certain business processes could not be replicated by GPT-3. This was particularly evident when deviations were necessary for common-sense reasons or when deviations would yield greater value. For example, a customer inquires if product delivery is possible before a specific date. The operator determines it's barely unattainable but recognizes the potential for high profits, which would prompt a human operator to deviate from standard procedures. We could not persuade GPT-3 to do this, and exploring such discrepancies in strategic reasoning seemed worthwhile.
T...
view more