Link to original article
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Was Releasing Claude-3 Net-Negative?, published by Logan Riggs on March 28, 2024 on LessWrong.
Cross-posted to EA forum
There's been a lot of discussion among safety-concerned people about whether it was bad for Anthropic to release Claude-3. I felt like I didn't have a great picture of all the considerations here, and I felt that people were conflating many different types of arguments for why it might be bad. So I decided to try to write down an at-least-slightly-self-contained description of my overall views and reasoning here.
Tabooing "Race Dynamics"
I've heard a lot of people say that this "is bad for race dynamics". I think that this conflates a couple of different mechanisms by which releasing Claude-3 might have been bad.
So, tabooing "race dynamics", a common narrative behind these words is:
"As companies release better & better models, this incentivizes other companies to pursue more capable models at the expense of safety. Eventually, one company goes too far, produces unaligned AGI, and we all die."
It's unclear what "at the expense of safety" means, so we can investigate two different interpretations:
If X increases "race dynamics", X causes an AGI company to
Invest less in evals/redteaming models before deployment
Divert resources away from alignment research & into capabilities research
Did releasing Claude-3 cause other AI labs to invest less in evals/redteaming models before deployment?
Suppose OpenAI releases their next model 3 months earlier as a result. These 3 months need to come from *somewhere*, such as:
A. Pre-training
B. RLHF-like post-training
C. Redteaming/Evals
D. Product development/User Testing
OpenAI needs to release a model better than Claude-3, so cutting corners on pre-training or RLHF likely won't happen. It seems possible that (C) or (D) would be cut short. If I believed GPT-5 would end the world, I would be concerned about cutting corners on redteaming/evals. Most people are not concerned.
However, this could set a precedent of investing less in redteaming/evals for GPT-6 onwards until AGI, which could lead to the deployment of actually dangerous models (where counterfactually, these models would've been caught in evals).
Alternatively, investing less in redteaming/evals could lead to more of a Sydney moment for GPT-5, creating a backlash that instead pushes for more investment in redteaming/evals for the next-generation model.
Did releasing Claude-3 divert resources away from alignment research & into capabilities research?
If the alignment teams (or the 20% of GPUs pledged for superalignment) got repurposed for capabilities or productization, I would be quite concerned. We also would've heard if this happened! Additionally, it doesn't seem possible to convert alignment teams into capability teams efficiently, due to different skill sets & motivation.
However, *future* resources haven't been given out yet. OpenAI might counterfactually have invested more GPUs & researchers (either people switching from other teams or new hires) in alignment if they had a larger lead. Who knows!
Additionally, OpenAI can take resources from other parts of the company, such as business-to-business products, Sora, and other AI-related projects, in order to avoid backlash from cutting safety. But whether a repurposed team could actually help with capabilities research is very specific to that team. If this happens, then that does not seem bad for existential risk.
Releasing Very SOTA Models
Claude-3 isn't very far past the frontier, so OpenAI does have less pressure to make any drastic changes. If, however, Anthropic released a model as good as [whatever OpenAI would release by Jan 2025], then this could cause a bit of a re-evaluation of OpenAI's current plan. I could see a much larger percentage of future resources going to capabilities research & attempts to poach Anthropic employees in-the-know.
Anthropic at the Frontier is Good?
Hypothe...