Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: The Compleat Cybornaut, published by ukc10014 on May 19, 2023 on LessWrong.
A cluster of conceptual frameworks and research programmes have coalesced around a 2022 post by janus, which introduced language models as ‘simulators’ (of other types of AIs such as agents, oracles, or genies). One such agenda, cyborgism, was coined in a post by janus and Nicholas Kees and is being researched as part of the 2023 editions of AI Safety Camp and SERI MATS. The objective of this document is to provide an on-ramp to the topic, one that is hopefully accessible to people not hugely familiar with simulator theory or language models.
So what is cyborgism?
Cyborgism proposes to use AIs, particularly language models (i.e. generative pre-trained transformers or GPTs), in ways that exploit their (increasingly) general-purpose intelligence, while retaining human control over the ‘dangerous bits’ of AI – i.e. agency, planning, and goal-formation. The overall objective is to leverage human cognitive ability while minimising the risks associated with agentic AI.
Aside from agency, a core assertion of cyborgism is that certain commonly used language models are not well-suited to many tasks human users throw at them, but that humans, if appropriately trained and equipped, might more effectively use GPTs in ways that are ‘natural’ for the model, while dramatically increasing the productive and creative potential of the human.
Specifically, some current systems, such as ChatGPT, are released or predominantly used in a ‘tuned’ version, which has a host of shortcomings. One such tuning method, reinforcement learning from human feedback (RLHF), has a specific weakness relevant to cyborgism: the tuning process severely limits, or collapses, a valuable aspect of the GPT, namely its wild, unconstrained creativity.
Superficially, the cyborgism approach may resemble a human-plus-oracle setup, but there is a subtle and important distinction: an oracle, it is argued, might ‘smuggle in’ some of the trappings of an agent. In contrast, the human cyborg embeds the output of the language model into their own workflow and thinking - model and human work as an integrated system. The cyborg leverages the model’s creative, albeit non-agentic, potential while continuously ‘steering’ or ‘course-correcting’ the model to ensure its output remains relevant to the actual goal. However, cyborgism might entail a high alignment tax: absent appropriate workflows and tools, a setup consisting of a human plus non-agentic GPT might be considerably less productive than a purely agentic AI (as the human component becomes a bottleneck).
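As a rough illustration of the ‘steering’ loop described above, the following Python sketch separates the two roles: the model only proposes continuations, while the human holds the goal and decides what is kept. Everything in it is an assumption made for illustration and is not taken from the original post: `toy_model` is a hypothetical, deterministic stand-in for sampling from a real language model, and `steer` and the `choose` callback are hypothetical names.

```python
from typing import Callable, List, Optional

def toy_model(context: str, n: int = 3) -> List[str]:
    """Hypothetical stand-in for a language model.

    A real setup would sample n continuations of `context` from a model;
    this stub returns fixed strings so the sketch runs offline.
    """
    stems = [" and then", " but instead", " so finally"]
    return stems[:n]

def steer(
    prompt: str,
    propose: Callable[[str], List[str]],
    choose: Callable[[str, List[str]], Optional[str]],
    steps: int,
) -> str:
    """Grow a text step by step, with the human selecting each continuation.

    `propose` plays the model (no goals, just candidates).
    `choose` plays the human (holds the goal, may reject everything).
    """
    text = prompt
    for _ in range(steps):
        candidates = propose(text)
        picked = choose(text, candidates)
        if picked is None:  # the human rejects all candidates and stops
            break
        text += picked
    return text

# Example 'human' policy: keep only continuations that introduce a contrast.
def prefer_contrast(text: str, candidates: List[str]) -> Optional[str]:
    for candidate in candidates:
        if "but" in candidate:
            return candidate
    return None

story = steer("Once", toy_model, prefer_contrast, steps=2)
```

The sketch also makes the alignment-tax point concrete: every step waits on `choose`, so throughput is bounded by how quickly the human can evaluate candidates.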
Background Concepts
Before getting into practical cyborgism, it is helpful to summarize some relevant theories and intuitions about how language models work.
Why is in-context learning relevant?
Neural networks generally, and language models specifically, go through several types of training: the large-scale (in terms of compute, time, and data) pre-training when all the neural weights are set in an end-to-end optimisation process; one or more fine-tuning rounds to focus the model on a specific use domain (during which the weights also change); and, in the case of certain models, including GPT-4, ChatGPT, and text-davinci-003, various types of supplementary tuning, which in the case of GPT-4 seems to include RLHF and rule-based reward modelling (RBRM).
The final phase of training, known as ‘in-context learning’, happens during the session with the user, and doesn’t involve actual changes in neural weights, but does still significantly alter the type of output the model generates, based on the accumulated context of its interaction with a user in a given session. The mechanisms by which this happens are debated, but from a cyborgism perspective, the context provides a powerful way of guiding or cont...