Research proposal

Functional Analysis of AI Behavior

Behavior analysis spent the twentieth century building a descriptive technology for systems whose insides were not the starting point for description: it worked from antecedent conditions, response classes, and selection histories, without requiring mechanistic access first. Large language models create a related descriptive problem from the outside. This project asks whether that discipline transfers — not by treating AI as a disguised human or a model organism, but by asking what it takes to describe the system on its own terms rather than defaulting to mentalistic ones.

Why this exists

A descriptive technology that already handles systems like this

Behavior analysis built its methods around organisms whose insides were not directly available for description. The core moves — identifying antecedent conditions, defining response classes by function rather than topography, tracking selection histories, carving at joints where intervention hooks in, refusing inner causes whose only evidence is the behavior they were introduced to explain — are directly relevant to describing systems that can be queried and intervened on without being fully reverse-engineered.

The question is not whether the tradition is sufficient. It is whether the discipline that comes out of that tradition, including its refusal to invent inner entities that merely rename what they were supposed to explain, can add something that the current vocabulary of AI behavior does not.

The anthropomorphism side

What the first essay is about

The first companion essay is a critique of the mentalistic vocabulary that surrounds LLMs — in product copy, research papers, and user-facing documentation. Terms like understanding, reasoning, hallucination, scheming, and alignment carry empirical commitments that are rarely stated and often do not hold. The essay audits those terms one by one and asks which can be tightened, which should be replaced, and which are doing rhetorical work in place of description. "A machine that says sorry" is an editorial precursor that makes the case at the interface-and-product level; the essay here does the vocabulary audit that precursor points toward.

The construction side

What the second essay is about

The second companion essay is constructive. It treats LLM behavior as a domain where the existing behavior-analytic vocabulary has to be rebuilt, not imported wholesale, and proposes candidate functional response classes for LLM outputs, pairings between behavior-analytic concepts and LLM conditions, and experimental protocols for testing them. The discipline it borrows from Skinner is not a fixed set of conclusions but a method for building new functional taxonomies when inherited vocabularies fail.

How the work is organised

What the modules do

  • Terminology audit. Term-by-term scoring of the inherited AI vocabulary — hallucination, agent, reasoning, understanding, alignment, context — against tests for independent specifiability, operational stability, and explanatory inflation.
  • Conceptual grounding. Where the behavior-analytic method earns its transfer, where the organism analogy breaks, and what an honest accounting of the differences looks like when the target is AI behavior in its own right.
  • Concept pairings. Discriminative stimulus and prompt; stimulus context and LLM context; verbal operants tested against LLM output, a construction attempt whose results are mostly negative.
  • Experimental protocols. Persona-framing as a manipulation; factual-accuracy intervention with a claim-to-evidence ledger.
  • Landscape and positioning. Where the work sits against mechanistic interpretability, causal abstraction, XAI, behavioral evaluations, cognitive-science probing of LLMs, model organisms of misalignment, representation engineering, deflationary philosophy of mind, and model multiplicity. Adjacent programs doing compatible work are named; disagreements are stated specifically.
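As a purely illustrative sketch of what the terminology audit's three scoring tests might look like once operationalized (the class name, the 0-2 scale, the triage thresholds, and the example scores are all hypothetical, not results from the audit):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class TermAudit:
    """One audited term scored against the three tests named above.

    All scales and thresholds here are illustrative placeholders.
    """
    term: str
    independent_specifiability: int  # 0-2: can the referent be specified without citing the behavior it explains?
    operational_stability: int       # 0-2: does the term pick out the same operations across papers and products?
    explanatory_inflation: int       # 0-2: higher means the term merely renames what it was meant to explain

    def verdict(self) -> str:
        """Crude triage into the essay's three outcomes: tighten, replace, or flag."""
        if self.explanatory_inflation >= 2:
            return "replace"
        if self.independent_specifiability + self.operational_stability >= 3:
            return "tighten"
        return "flag"

# Hypothetical example scores, for shape only:
audits = [
    TermAudit("hallucination", 1, 1, 2),
    TermAudit("context", 2, 2, 0),
    TermAudit("agent", 0, 0, 1),
]
for a in audits:
    print(a.term, a.verdict())
```

The point of the sketch is only that the tests are separable and scoreable per term; the actual rubric, scales, and cutoffs are what the audit module has to establish.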

Where the contribution is

What the combination is for

Several adjacent programs already produce narrow operational response classes for specific LLM phenomena — trigger-conditioned failures in model-organism work, behavioral metrics in capability evaluations, audited terminology in philosophy of language. The combination this project is building is different: applying an explanatory-fiction filter across the whole stack — training, evaluation, interpretability, and user-facing discourse — and constructing a functional response-class taxonomy meant to replace mentalistic descriptors rather than merely annotate them.

An engineering blueprint of a small robot labelled 'Project: Kind Gesture,' shown handing a bouquet to a parking meter. Annotations describe the scene from the outside — 'emotional output: hopeful,' 'meter response: no audio output detected' — without claiming access to inner states.

Constraints

What this work will not do

  • It will not claim that behavior analysis provides a complete theory of LLM behavior.
  • It will not treat AI as a model organism for humans or as a disguised human case. The transfer here is methodological, not a claim about shared kind.
  • It will not argue against mechanistic interpretability, evaluation science, or cognitive-science probing as programs. Those programs do work this project does not try to replace.
  • It will not argue that LLMs do or do not have mental states. The question is whether mental-state vocabulary adds descriptive purchase, or removes it.
  • It will not conflate functional analysis as a family of interpretability methods with functional analysis as a behavior-analytic research program. Those are different senses of functional and the essays say so.

Who the work is for

Intended readers

Researchers working on LLM behavior from the interpretability, evaluation, and alignment sides who are already uncomfortable with the mentalistic vocabulary the field has inherited, and who want a discipline for describing what models do that does not require first solving how they do it. Behavior analysts working on AI who want the bridge from their tradition into the current research stack written down. Technical and policy readers who need a vocabulary for AI behavior that commits to what it claims.

Status

Where the work is now

Module development is ongoing. The landscape scan against adjacent functional-analysis programs — mechanistic interpretability, causal abstraction and probing, the broader XAI field, behavioral evaluation, cognitive-science probing of LLMs, model organisms of misalignment, deflationary philosophy of mind, representation engineering, and model multiplicity — is complete, with per-area and synthesis-level positioning. Draft versions of both companion essays exist and are being revised.

This is a research proposal — real enough to describe and specific enough to stand behind, but the essays are not yet published. Specifics here will tighten as the essays do.