Link to original articleWelcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Studying The Alien Mind, published by Quentin Feuillade--Montixi on December 5, 2023 on The AI Alignment Forum.
This post is part of a sequence on LLM psychology.
TL;DR
We introduce our perspective on a top-down approach for exploring the cognition of LLMs by studying their behavior, which we refer to as LLM psychology. In this post we take the mental stance of treating LLMs as "alien minds," comparing and contrasting their study with the study of animal cognition.
We do this both to learn from past researchers who attempted to understand non-human cognition, and to highlight how radically the study of LLMs differs from the study of biological intelligences. Specifically, we advocate for a symbiotic relationship between field work and experimental psychology, and caution against implicit anthropomorphism in experiment design. The goal is to build models of LLM cognition which help us both to better explain their behavior and to become less confused about how LLMs relate to risks from advanced AI.
Introduction
When we endeavor to predict and understand the behaviors of Large Language Models (LLMs) like GPT-4, we might presume that this requires breaking open the black box and forming a reductive explanation of their internal mechanics. This kind of research is typified by approaches like mechanistic interpretability, which aims to understand how neural networks work by examining their internals directly.
While mechanistic interpretability offers insightful bottom-up analyses of LLMs, we're still lacking a more holistic top-down approach to studying LLM cognition. If interpretability is analogous to the "neuroscience of AI," aiming to explain artificial minds through their internal mechanics, this post tries to approach the study of AI from a psychological stance.[1]
What we are calling LLM psychology is an alternate, top-down approach which involves forming abstract models of LLM cognition by examining their behaviors. Like traditional psychology research, the ambition extends beyond merely cataloging behavior, to inferring hidden variables, and piecing together a comprehensive understanding of the underlying mechanisms, in order to elucidate why the system behaves as it does.
We take the stance that LLMs are akin to alien minds, as opposed to the notion that they are merely stochastic parrots. We posit that they possess a highly complex internal cognition, encompassing representations of the world and mental concepts, which transcends mere stochastic regurgitation of training data. This cognition, while derived from human-generated content, is fundamentally alien to our understanding.
This post compiles some high-level considerations for what successful LLM psychology research might entail, alongside broader discussions on the historical study of non-human cognition. In particular, we argue for maintaining a balance between experimental and field work, taking advantage of the differences between LLMs and biological intelligences, and designing experiments which are carefully tailored to LLMs as their own unique class of mind.
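To make the experimental stance concrete, here is a minimal sketch of what a controlled behavioral experiment on an LLM might look like: present systematically varied prompts and record responses, much as comparative psychology varies one stimulus at a time. The `query_model` function is a hypothetical stand-in, stubbed here so the harness itself runs; in practice it would wrap a real model API.

```python
def query_model(prompt: str) -> str:
    """Hypothetical model call; replace with a real LLM API in practice.
    The stub returns a canned answer so the harness is self-contained."""
    return "yes" if "sky" in prompt else "no"

def run_experiment(template: str, conditions: list[str]) -> dict[str, str]:
    """Fill one prompt template with each experimental condition and
    record the model's response, keyed by condition."""
    return {c: query_model(template.format(condition=c)) for c in conditions}

results = run_experiment(
    "Is the {condition} blue? Answer yes or no.",
    ["sky", "grass"],
)
print(results)
```

The point of the sketch is the structure, not the stub: holding the template fixed while varying a single condition is what lets behavioral differences be attributed to the manipulated variable rather than to incidental phrasing.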
Experiments vs Field Study
One place to draw inspiration from is the study of animal behavior and cognition. While animal minds are likely much more similar to our own than those of artificial intelligences (at least mechanically), the history of the study of non-human intelligence, the evolution of the methodologies it developed, and the challenges it had to tackle can provide inspiration for investigating AI systems.
As we see it, there are two prevalent categories of animal psychology:
Experimental psychology
The first, and most traditionally scientific, approach (and what most people think of when they hear the term "psychology") is to design experiments which...