Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Thoughts on open source AI, published by Sam Marks on November 3, 2023 on The AI Alignment Forum.
Epistemic status: I only ~50% endorse this, which is below my typical bar for posting something. I'm more bullish on "these are arguments which should be in the water supply and discussed" than "these arguments are actually correct." I'm not an expert in this, I've only thought about it for ~15 hours, and I didn't run this post by any relevant experts before posting.
Thanks to Max Nadeau and Eric Neyman for helpful discussion.
Right now there's a significant amount of public debate about open source AI. People concerned about AI safety generally argue that open sourcing powerful AI systems is too dangerous to be allowed; the classic example here is "You shouldn't be allowed to open source an AI system which can produce step-by-step instructions for engineering novel pathogens." On the other hand, open source proponents argue that open source models haven't yet caused significant harm, and that trying to close access to AI will result in concentration of power in the hands of a few AI labs.
I think many AI safety-concerned folks who haven't thought about this that much tend to vaguely think something like "open sourcing powerful AI systems seems dangerous and should probably be banned." Taken literally, I think this plan is a bit naive: when we're colonizing Mars in 2100 with the help of our aligned superintelligence, will releasing the weights of GPT-5 really be a catastrophic risk?
I think a better plan looks something like "You can't open source a system until you've determined and disclosed the sorts of threat models your system will enable, and society has implemented measures to become robust to these threat models. Once any necessary measures have been implemented, you are free to open-source."
I'll go into more detail later, but as an intuition pump, imagine that: the best open source model is always 2 years behind the best proprietary model (call it GPT-SoTA)[1]; GPT-SoTA is widely deployed throughout the economy and deployed to monitor for and prevent certain attack vectors; and the best open source model isn't smart enough to cause any significant harm without GPT-SoTA catching it. In this hypothetical world, so long as we can trust GPT-SoTA, we are safe from harms caused by open source models. In other words, so long as the best open source models lag sufficiently behind the best proprietary models and we're smart about how we use our best proprietary models, open sourcing models isn't the thing that kills us.
In the rest of this post I will:
Motivate this plan by analogy to responsible disclosure in cryptography
Go into more detail on this plan
Discuss how this relates to my understanding of the current plan as implied by responsible scaling policies (RSPs)
Discuss some key uncertainties
Give some higher-level thoughts on the discourse surrounding open source AI
An analogy to responsible disclosure in cryptography
[I'm not an expert in this area and this section might get some details wrong. Thanks to Boaz Barak for pointing out this analogy (but all errors are my own). See this footnote[2] for a discussion of alternative analogies you could make to biosecurity disclosure norms, and whether they're more apt to risk from open source AI.]
Suppose you discover a vulnerability in some widely-used cryptographic scheme. Suppose further that you're a good person who doesn't want anyone to get hacked. What should you do?
If you publicly release your exploit, then lots of people will get hacked (by less benevolent hackers who've read your description of the exploit). On the other hand, if white-hat hackers always keep the vulnerabilities they discover secret, then the vulnerabilities will never get patched until a black-hat hacker finds the vulnerability and exploits it.