LLMs and computation complexity
Published by Jonathan Marcus on April 28, 2023, on LessWrong.
Epistemic status: Speculative. I've built many large AI systems in my previous HFT career but have never worked with generative AIs. I am leveling up in LLMs by working things out from base principles and observations. All feedback is very welcome.
Tl;dr: An LLM cannot solve computationally hard problems. Its ability to write code is probably its highest-potential skill. I think this reduces p(near-term doom).
An LLM takes the same amount of computation for each generated token, regardless of how hard it is to predict. This limits the complexity of any problem an LLM is trying to solve.
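To make "the same amount of computation" concrete: a common rule of thumb is that a transformer forward pass costs roughly 2 FLOPs per parameter per generated token. The sketch below assumes that rule of thumb and an illustrative GPT-3-scale parameter count (GPT-4's is unpublished):

```python
# Back-of-the-envelope sketch. Assumption: ~2 FLOPs per parameter per token
# for a transformer forward pass (a common rule of thumb). The cost is fixed
# by the architecture, not by how hard the next token is to predict.
params = 175e9                 # illustrative GPT-3-scale parameter count
flops_per_token = 2 * params   # same whether the token is easy or hard to predict
print(f"~{flops_per_token:.0e} FLOPs per generated token")
```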
Consider two statements:
"The richest country in North America is the United States of ______"
"The SHA1 of 'abc123', iterated 500 times, is _______"
An LLM's goal is to predict the best token to fill in the blank given its training and the previous context. Completing statement 1 requires knowledge about the world but is computationally trivial. Statement 2 requires a lot of computation. Regardless, the LLM performs the same amount of work for either statement.
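For a sense of what the blank in statement 2 demands, here is the computation spelled out. The statement doesn't pin down the exact iteration scheme, so this sketch assumes each round hashes the hex digest of the previous round:

```python
import hashlib

# One plausible reading of statement 2: start from 'abc123' and apply SHA-1
# 500 times, feeding each round's hex digest into the next round.
digest = "abc123"
for _ in range(500):
    digest = hashlib.sha1(digest.encode()).hexdigest()
print(digest)  # no known shortcut exists; all 500 rounds must be computed
```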
It cannot correctly complete computationally hard statements like #2. Period. If it could, that would imply that arbitrarily hard problems can be solved in constant time, which is provably (and obviously) false.
Why does this matter? It puts some bounds on what an LLM can do.
Zvi writes:
Eliezer Yudkowsky does not see any of this as remotely plausible. He points out that in order to predict the next word in all the text on the internet and all similar text, you need to be able to model the processes that are generating that text. And that predicting what you would say is actually a good bit harder than it is to be a being that says things - predicting what someone else would say is trickier and requires more understanding and intelligence than the someone else required to say it; the problem is more constrained.
And then he points out that the internet contains text whose prediction outright requires superhuman capabilities, like figuring out hashes, or predicting the results of scientific experiments, or generating the result of many iterations of refinement. A perfect predictor of the internet would be a superintelligence, it won’t ‘max out’ anywhere near human.
I interpret this the opposite way. Being a perfect predictor of the internet would indeed require a superintelligence, but it cannot be done by an LLM.
How does an LLM compute?
What kinds of problems fall into category 2 (i.e., clearly unanswerable by an LLM)? Let's dig into how an LLM computes.
For each generated token, it reviews every token in its context window at least once, so producing one token takes roughly O(n) work when n tokens are in context. To produce n tokens, it therefore does O(n^2) work. Without being too precise about the details, this roughly means it can't solve problems that are more complex than O(n^2) in the number of tokens it generates.
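A toy cost model of that quadratic growth, ignoring constants, model width, and any fixed prompt:

```python
# Token t of the output attends over the t tokens before it, so the total
# number of token-pair interactions across n generated tokens is
# 0 + 1 + ... + (n-1) = n(n-1)/2, i.e., O(n^2).
def total_attention_steps(n_tokens: int) -> int:
    return sum(range(n_tokens))

print(total_attention_steps(100))    # 4950
print(total_attention_steps(1000))   # 499500 -- ~100x the work for 10x the tokens
```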
Consider some examples (all tested with GPT-4):
Addition, O(1)
It's not always accurate, but it's usually able to do addition correctly.
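A quick way to generate such probes with ground truth (the post doesn't say which operands were tested, so these are arbitrary 9-digit numbers):

```python
import random

# Hypothetical addition probe: random operands plus an exact reference answer.
a = random.randint(10**8, 10**9 - 1)
b = random.randint(10**8, 10**9 - 1)
print(f"Compute {a} + {b}")  # prompt for the model
print("Expected:", a + b)    # exact answer to check against
```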
Sorting, O(n log n)
I asked it to sort 100 random integers that I'd generated, and it got it right.
My guess is that it doesn't have the internal machinery to do a quicksort and was probably doing something more like O(n^2), but either way the task is within its powers, and it got it right.
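The original list of integers isn't included in the post, but the probe is easy to reproduce:

```python
import random

# Regenerate a sorting probe: 100 random integers plus an O(n log n)
# reference answer computed with Python's built-in sort.
nums = [random.randint(0, 999) for _ in range(100)]
print("Sort these ascending:", nums)  # prompt for the model
print("Expected:", sorted(nums))      # ground truth
```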
Matrix multiplication, O(n^3)
I generated a 3x3 matrix called A and told it to compute AA (A multiplied by itself). This was interesting; let's look at what it did:
Pretty cool! It executed the naive matrix multiplication algorithm, using O(n^3) tokens to do it step by step. If I ask it to do it without showing its work, it hallucinates an incorrect answer:
The result was the right shape, and the elements had approximately the right number of digits. Slight problem: the elements were all incorrect. Whoops. This makes sense though...
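For reference, a minimal sketch of the naive O(n^3) procedure it walked through when allowed to show its work; the post's actual matrix isn't reproduced in this excerpt, so the values of A below are illustrative:

```python
# Naive O(n^3) matrix multiplication: n multiply-adds for each of the
# n^2 output cells -- the same arithmetic GPT-4 spelled out token by token.
def matmul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n))
             for j in range(n)]
            for i in range(n)]

A = [[1, 2, 3],
     [4, 5, 6],
     [7, 8, 9]]  # illustrative values, not the matrix from the experiment
print(matmul(A, A))  # [[30, 36, 42], [66, 81, 96], [102, 126, 150]]
```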