Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: AGI will be made of heterogeneous components, Transformer and Selective SSM blocks will be among them, published by Roman Leventov on December 27, 2023 on The AI Alignment Forum.
This post is prompted by two recent pieces:
First, in the podcast "Emergency Pod: Mamba, Memory, and the SSM Moment", Nathan Labenz argued that we are entering an era of heterogeneity in AI architectures: we no longer have just one fundamental block that works very well (the Transformer block) but two, now that the Selective SSM (Mamba) block has joined the party.
Moreover, many recent works (see the StripedHyena blog post and the references in Appendix E.2.2 of the Mamba paper) demonstrate that hybrids of Transformer and SSM blocks work better than "pure" architectures built from either type of block alone. So we will probably soon see complicated hybrids of the two emerge.[2]
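The hybrid idea can be illustrated with a toy NumPy sketch that stacks both block types in one residual network. It is an assumption-laden illustration, not the Mamba or StripedHyena design: the selective SSM here is a heavily simplified single-scan version of the Mamba recurrence (input-dependent step size, B and C; diagonal A) without the paper's convolution, gating or hardware-aware scan, and the "attention every third layer" pattern is an arbitrary choice. All function names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)


def rms_norm(x, eps=1e-6):
    # Per-token normalisation, so it never mixes information across positions.
    return x / np.sqrt(np.mean(x**2, axis=-1, keepdims=True) + eps)


def causal_attention(x, wq, wk, wv):
    """Single-head causal self-attention over a (seq_len, d) input."""
    q, k, v = x @ wq, x @ wk, x @ wv
    scores = q @ k.T / np.sqrt(x.shape[1])
    # Mask future positions: token t may only attend to tokens 0..t.
    future = np.triu(np.ones(scores.shape, dtype=bool), k=1)
    scores = np.where(future, -np.inf, scores)
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ v


def softplus(z):
    return np.log1p(np.exp(z))


def selective_ssm(x, a_log, w_b, w_c, w_dt):
    """Toy selective SSM scan: step size, B and C all depend on the current token."""
    seq_len, d = x.shape
    a = -np.exp(a_log)                    # (d, n) diagonal A, negative for stability
    h = np.zeros_like(a)                  # (d, n) recurrent state
    ys = np.empty_like(x)
    for t in range(seq_len):
        xt = x[t]                         # (d,)
        dt = softplus(xt @ w_dt)          # (d,) input-dependent step size
        b, c = xt @ w_b, xt @ w_c         # (n,), (n,) input-dependent projections
        a_bar = np.exp(dt[:, None] * a)   # (d, n) decay factor in (0, 1)
        h = a_bar * h + dt[:, None] * b[None, :] * xt[:, None]
        ys[t] = h @ c                     # (d,)
    return ys


def make_block(kind, d, n):
    """Return a closure with freshly initialised random weights for one block."""
    s = d ** -0.5
    if kind == "attn":
        w = [rng.normal(scale=s, size=(d, d)) for _ in range(3)]
        return lambda x: causal_attention(x, *w)
    a_log = rng.normal(size=(d, n))
    w_b, w_c = rng.normal(scale=s, size=(d, n)), rng.normal(scale=s, size=(d, n))
    w_dt = rng.normal(scale=s, size=(d, d))
    return lambda x: selective_ssm(x, a_log, w_b, w_c, w_dt)


def hybrid_model(pattern, d=8, n=4):
    blocks = [make_block(kind, d, n) for kind in pattern]

    def forward(x):
        for block in blocks:
            x = x + block(rms_norm(x))    # pre-norm residual around every block
        return x

    return forward


# Mostly SSM blocks with attention every third layer (an arbitrary, assumed pattern).
model = hybrid_model(["ssm", "ssm", "attn", "ssm", "ssm", "attn"])
x = rng.normal(size=(16, 8))
y = model(x)
print(y.shape)
```

Because both block types are causal, any interleaving of them is causal too: perturbing token 10 leaves the outputs for tokens 0 to 9 unchanged, which is what lets the mix be tuned freely for autoregressive modelling.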
This reminds me of John Doyle's architecture theory, which predicts that AI architectures will evolve towards modularisation and component heterogeneity, with the properties of different components (i.e., their positions along various tradeoff spectra) converging to reflect the statistical properties of heterogeneous objects (a.k.a. natural abstractions, patterns, "pockets of computational reducibility") in the environment.
Second, in this article, Anatoly Levenchuk revisits the "no free lunch" theorem and enumerates several directions of development in algorithms and computing that continue in the shadow of the currently dominant LLM paradigm but will still be several orders of magnitude more computationally efficient than DNNs on some important classes of tasks: multi-physics simulations, discrete ("system 2") reasoning (planning, optimisation), theorem verification and SAT-solving, etc.
All these diverse components will be plugged into some kind of "AI operating system", Toolformer-style. Anatoly then posits an important conjecture (slightly tweaked by me): just as it makes no sense to discuss a person's "values" without considering (a) their environment (family, community, humanity) and (b) their education, it is pointless to discuss the alignment properties and "values" of a "core" AGI agent architecture without considering the whole context of a quickly evolving "open agency" of various tools and specialised components[3].
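Toolformer-style plugging can be sketched as a small dispatcher: the model writes inline calls such as [Calculator(400/1400)] into its output (a notation loosely modelled on the Toolformer paper's inline API calls), and the "operating system" executes each call and splices the result back into the text. ToolRegistry, calculator and the exact call syntax are hypothetical; a real system would register engines such as a simulator or a theorem prover behind the same kind of interface.

```python
import ast
import operator
import re
from typing import Callable, Dict

# Binary operators the toy calculator tool accepts.
_OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
        ast.Mult: operator.mul, ast.Div: operator.truediv}


def calculator(expr: str) -> str:
    """Hypothetical tool: evaluate + - * / arithmetic without calling eval()."""
    def ev(node: ast.AST) -> float:
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](ev(node.left), ev(node.right))
        raise ValueError(f"unsupported expression: {expr!r}")
    return f"{ev(ast.parse(expr, mode='eval').body):g}"


class ToolRegistry:
    """Hypothetical 'AI operating system' core: dispatches inline tool calls."""

    # Matches [Tool(args)]; in this sketch args may not contain parentheses.
    CALL = re.compile(r"\[(\w+)\(([^()\]]*)\)\]")

    def __init__(self) -> None:
        self._tools: Dict[str, Callable[[str], str]] = {}

    def register(self, name: str, fn: Callable[[str], str]) -> None:
        self._tools[name] = fn

    def execute(self, text: str) -> str:
        """Rewrite every [Tool(args)] in model output as [Tool(args) -> result]."""
        def run(m: re.Match) -> str:
            name, args = m.group(1), m.group(2)
            fn = self._tools.get(name)
            result = fn(args) if fn is not None else "ERROR: unknown tool"
            return f"[{name}({args}) -> {result}]"
        return self.CALL.sub(run, text)


registry = ToolRegistry()
registry.register("Calculator", calculator)
draft = "Out of 1400 participants, 400 [Calculator(400/1400)] passed the test."
print(registry.execute(draft))
```

The registry only knows names and callables, so adding a new specialised component is a single register() call; that loose coupling is the property the "open agency" argument relies on.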
From these ideas, I derive the following conjectures about an "AGI-complete" architecture[4]:
1. AGI could be achieved by combining just
(a) about five core types of DNN blocks (Transformer and Selective SSM are two of these, and most likely some kind of Graph Neural Network with or without flexible/dynamic/"liquid" connections is another one, and perhaps a few more)[5];
(b) a few dozen classical algorithms for LMAs a.k.a. "LLM programs" (better called "NN programs" in the more general case), from search and graph algorithms to dynamic programming, that orchestrate and direct the inference of the DNNs; and
(c) one or two dozen key LLM tools required for generality, such as a multi-physics simulation engine like JuliaSim, a symbolic computation engine like Wolfram Engine, a theorem prover like Lean, etc.
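Conjecture 1 can be made concrete with a deliberately tiny sketch of how the three ingredients might divide the work: a classical algorithm (best-first search, ingredient (b)) orchestrates calls to a proposer and a scorer that stand in for DNN inference (ingredient (a)), while an exact checker stands in for a tool such as a theorem prover or SAT solver (ingredient (c)). The subset-sum task and all function names are hypothetical illustrations; in a real system propose and score would be model calls rather than hand-written rules.

```python
import heapq
from typing import Callable, Iterable, Optional, Tuple

State = Tuple[int, ...]  # indices of the numbers chosen so far, in increasing order


def best_first_search(
    start: State,
    propose: Callable[[State], Iterable[State]],
    score: Callable[[State], float],
    verify: Callable[[State], bool],
    max_expansions: int = 10_000,
) -> Optional[State]:
    """Classical control loop ('NN program') deciding which candidate to expand next."""
    frontier = [(score(start), start)]
    seen = {start}
    for _ in range(max_expansions):
        if not frontier:
            return None
        _, state = heapq.heappop(frontier)   # lowest score = most promising
        if verify(state):                    # exact check, delegated to a "tool"
            return state
        for nxt in propose(state):           # candidate generation, delegated to a "model"
            if nxt not in seen:
                seen.add(nxt)
                heapq.heappush(frontier, (score(nxt), nxt))
    return None


# Hypothetical toy task: choose a subset of NUMBERS that sums exactly to TARGET.
NUMBERS = [8, 6, 7, 5, 3, 10, 9]
TARGET = 23


def total(state: State) -> int:
    return sum(NUMBERS[i] for i in state)


def propose(state: State) -> Iterable[State]:
    # Stand-in for a DNN proposer: extend with any later index that does not overshoot.
    last = state[-1] if state else -1
    for i in range(last + 1, len(NUMBERS)):
        if total(state) + NUMBERS[i] <= TARGET:
            yield state + (i,)


def score(state: State) -> float:
    # Stand-in for a learned value estimate: distance to the target sum.
    return abs(TARGET - total(state))


def verify(state: State) -> bool:
    # Stand-in for an exact external checker (SAT solver, proof checker, ...).
    return total(state) == TARGET


solution = best_first_search((), propose, score, verify)
print(solution, [NUMBERS[i] for i in solution])
```

The search loop knows nothing about how candidates are generated, scored or checked, so each of the three parts could be swapped for a different DNN block or tool without touching the others.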
2. The AGI architecture described above will not be perfectly optimal, but it will probably be within an order of magnitude of the optimal compute efficiency on the tasks it is supposed to solve[4]. So, considering the investments in interpretability, monitoring, anomaly detection, red teaming, and other strands of R&D on the incumbent types of DNN blocks and NN program/agent algorithms, as well as the economic incentives for modularisation and component reuse (cf. "BCIs and the ecosystem of modular minds"), this will probably be a sufficient motivation to "lock in" the cho...