Affirming the consequent with LLMs

With the AI hype now in full swing, I’m seeing a lot of “AI bros” swooping in to discussion threads much as the “Crypto bros” of yore did two summers ago, helpfully educating people on how their preferred technological fad will save the world while coincidentally making them rich. The AI fad has an interesting twist: a lot of the staunch proponents (and even some pseudo-critics1) seem obsessed with the idea that large language models or other AI toys are actually sentient, and even human-like.

Whenever someone posts a criticism of LLMs saying that all they do is siphon up text from the Internet and regurgitate it in slightly modified form, you can bet that an AI reply guy will show up to say that “that’s just what humans do!” That may, in fact, be what AI reply guys and tech startup CEOs do,2 but it hardly exhausts the spectrum of human cognitive behavior. In much the same fashion, I’ve heard a lot of people extrapolate from the fact that the multi-layer perceptrons on which most modern AI is based were inspired by certain 1950s-era theories of how the brain works, to the conclusion that deep neural networks must be human-like in intelligence. (Or would be, if we just made them big enough and trained them long enough.) And any time an LLM generates an uncannily human-like response to a prompt, this is taken as more evidence that LLMs are actually thinking.

There’s a fallacy at work here, one that I see more and more these days among tech cultists, conspiracy theorists, and other participants in the madness of crowds. That’s the simple formal fallacy known as “affirming the consequent.” Put simply, it’s the belief that, if A implies B, and we establish that B is true, we can conclude that A must be true. It’s a perversion of good old modus tollens, or “denying the consequent,” that shares the same implication but concludes that, if B is false, then A must also be false. The latter is valid, while the former is invalid, but tantalizingly similar in form, and very easy to stumble into.
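
If the two forms still look confusingly similar, a quick brute-force check makes the difference concrete. Here’s a toy enumeration of the four possible truth assignments (implies() is just a little helper for material implication): the premises of affirming the consequent admit a counterexample, while those of modus tollens don’t.

from itertools import product

def implies(a, b):
	# Material implication: "A implies B" is false only when A is true and B is false.
	return (not a) or b

# Affirming the consequent: premises "A implies B" and "B", conclusion "A".
# One assignment satisfies both premises while the conclusion fails.
for a, b in product([False, True], repeat=2):
	if implies(a, b) and b and not a:
		print(f"Counterexample: A={a}, B={b}")  # prints A=False, B=True

# Modus tollens: premises "A implies B" and "not B", conclusion "not A".
# No assignment satisfies the premises while violating the conclusion.
assert not any(a for a, b in product([False, True], repeat=2) if implies(a, b) and not b)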

The modern variant of affirming the consequent rarely mimics the form of first-order logic so explicitly. Instead, it usually starts with some sort of narrative, notes that a set of observations is consistent with the narrative, and concludes that the narrative must accurately describe the world. For example, an anti-vaccine conspiracy theorist might form a narrative that a shadowy cabal of government and corporate interests wants to insert tracking devices into our bloodstreams, because apparently the highly sophisticated tracking devices we all already carry aren’t good enough for some reason. If this were true, they reason, we’d see reports of a deadly global pandemic, and governments around the world would require people to get vaccinated in order to secretly inject these tracking devices. Since we observe events that are consistent with this narrative, they conclude that the narrative must be true.

The problem with this, of course, is that, given any finite set of observations, there are infinitely many potential narratives that would be consistent with them. It’s why Karl Popper proposed falsifiability as the necessary condition of a scientific theory. It may be necessary (I say may, because I’m not an authority on the philosophy of science and certainly not in a position to defend Popper against his critics), but it is not sufficient: a narrative that is, in principle, falsifiable, but for which the crucial test that would falsify it is not yet possible, passes Popper’s test but could still be 100% unadulterated hooey.

And that’s where we are with LLMs. Their power to convince people that they work “just like” the human brain (for some vague definition of “just like”) depends on the fact that, after centuries of probing the mysteries of the human brain, we still have almost no idea how it works. When perceptrons were proposed in the 1950s, they were based on the then-current Hebbian theory that “neurons that fire together wire together” (a toy version of which is sketched below), which paved the way for the connectionist approach (learning by adjusting the strengths of connections between simple units) that now dominates machine learning. Many subsequent discoveries in neuroscience have been consistent with the predictions of Hebbian theory, but as we’ve seen, that’s not sufficient to conclude that it’s true. Furthermore, what we have discovered about neuroscience is far, far more complicated than the simple recalculation of connection strengths between neurons. However, since we do not yet have anything close to a complete picture of how the brain actually does what it does, we can’t say for certain that, even if parts of the brain’s functions don’t actually rely on that kind of connection-strength learning, those functions couldn’t be reimplemented in a way that does. A definitive refutation of the claim that LLMs essentially work “like the brain works” would require a much better understanding of neurophysiology than we now have, which gives AI bros just the wiggle room they need to make their claims and hope that people will stumble into affirming the consequent.
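
For the curious, the “fire together, wire together” idea boils down to something like the following toy update rule. This is a deliberately crude sketch, with made-up activity values and learning rate, not anyone’s actual model of a neuron:

import numpy as np

# One step of a toy Hebbian update: the strength of the connection between two
# units grows in proportion to how strongly they are active at the same time.
def hebbian_step(weights, pre, post, learning_rate=0.01):
	return weights + learning_rate * np.outer(post, pre)

pre = np.array([1.0, 0.0, 1.0])   # activity of three "presynaptic" units
post = np.array([0.5, 1.0])       # activity of two "postsynaptic" units
weights = np.zeros((2, 3))        # connection strengths, initially zero
weights = hebbian_step(weights, pre, post)
print(weights)                    # only connections between co-active units grew

That one multiply-and-add is what “recalculating connection strengths” amounts to in the toy version; actual synapses, as far as we can tell, are doing a great deal more.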

There is, of course, another claim that is sometimes bandied about that is easier to falsify, and that’s the claim that some LLMs that currently exist are actually sentient. Most famously, this claim was made by the Google engineer Blake Lemoine in reference to that company’s LaMDA model. Here, Lemoine reasoned that the responses he received from his prompts were similar to what a sentient being would say, and concluded that LaMDA must therefore be sentient.

This is just another example of affirming the consequent, and it’s easy to demonstrate why: the Achilles heel of arguments that affirm the consequent is that, while any finite set of observations may be consistent with infinitely many explanations, a particular explanation is likely to be inconsistent with many supersets of that set of observations. For example, consider the following putatively sentient Python script:

while True:
	inp = input('>>> ')
	if inp == 'Are you sentient?':
		print('Yes I am!')
	else:
		print('Like, whatever, dude.')

If you ask this program “Are you sentient?”, you will indeed receive a response that is consistent with what a sentient being would reply. If you ask it anything else, however, you will discover the limitations of this script and be less convinced of its sentience. And that’s the trick of the AI hucksters: they show us just the right set of prompts and responses that seem consistent with sentience, and hope we don’t probe far enough to run into the model’s limitations. Lemoine’s own chat transcripts, for example, were heavily edited and cherry-picked. All current publicly available LLM-based chatbots run into the same problem: if you have a simple conversation with them and ask questions similar to what the developers predicted you might ask, you’ll probably get answers that convince you you’re talking to a genuinely sentient genius. If you stray from the path just a little bit, however, you’ll get nonsense about bears in space. (I suspect that the “guard rails” most chatbots put in place, which refuse to answer certain types of queries, are there as much to prevent you from shattering the illusion as to prevent you from asking how to make bombs and stuff.)

It’s telling that the Turing test, which people always seem to cite as the de facto test of machine intelligence despite there not really being compelling reasons to do so, actually measures the ability to deceive. Turing based his test on a parlor game known as the “imitation game,” in which a man and a woman behind a screen answer written questions: the man tries to deceive the interlocutor into thinking he is the woman, while the woman tries to help the interlocutor guess correctly. If the man manages to fool the interlocutor, he wins. Of course, if a man is able to convince someone he’s a woman based only on written responses to questions, this doesn’t mean he’s actually a woman; it could mean the man is unusually intuitive in matters of gender expression, or the interlocutor is unusually bad at picking up on cues. Or even, for that matter, that the very assumption that there are linguistic differences between men and women is false. Why, then, should we conclude that a computer that tricks someone into thinking it’s intelligent is actually intelligent, rather than just good at bluffing? Presumably there’s a suppressed premise here, which is that the ability to pretend to be intelligent itself requires intelligence, but I’m not sure we have reason to believe this; for one thing, it requires us to have some definitive concept of what “intelligence” actually entails, which is exactly the kind of speculation Turing tried to sidestep when he developed his test.

Let’s think about the setup for both the imitation game and the Turing test. In the imitation game, the man and woman are placed behind a screen and communicate only in writing, the assumption being that their appearance or voices would be a dead giveaway as to their gender.3 Whether this assumption is actually true or not, it underscores that the designers of the game thought that only by strictly limiting the forms that interaction could take could they avoid a dead giveaway; in other words, they acknowledged that the players’ capacity for deception was limited. Today, the written word is still the most common way of communicating with AI models. This is partly due to technical limitations – until recently, voice recognition and speech synthesis were notoriously primitive – and partly due to the wealth of written-word training data that is available for free if you don’t care about compensating or acknowledging the authors. In other words, today’s restriction to text wasn’t adopted with the express intention of avoiding dead giveaways, the way the imitation game’s screen was, but it’s there nonetheless, and it instantly eliminates a number of non-linguistic cues that would immediately cause many people to conclude they’re not talking to a thinking being. I get the distinct impression that a lot of AI proponents don’t think these things actually matter, and that things such as body language and tone of voice belong to the realm of emotion and have no place in a test of intelligence.4

The fact that these models were trained on text generated by actual human beings goes a long way toward helping the illusion. If you have a device that spits out text generated by intelligent beings, you shouldn’t be at all surprised that its answers sound like those that would be given by intelligent beings. But concluding that such a device must itself be intelligent is like claiming that a CD player is an excellent singer.

Here’s where our consequent-affirmers will step in and say “But that’s just what humans do! We learn language by hearing other humans, and we reproduce it.” While humans may indeed do that, it is hardly all we do; nor does it follow that the ability to do it is sufficient for intelligence. LLMs are big, expensive, climate change-inducing paperweights if they haven’t been trained on mountains and mountains of text. But at some point in the past, humans or their ancestors had to figure out how to make language without having examples of human language to train on. Whether you accept the hypothesis that our linguistic ability is an evolved instinct, or merely the side effect of non-linguistic faculties we already possessed, you have to agree that we don’t just learn to talk by hearing other people talk. Not unless you accept either an infinite regress, or some sort of divine intervention that imbued the first humans with full-fledged linguistic abilities.

By contrast, chatbots really do just imitate human speech. Unlike humans, who are born both learning and thinking and never stop as long as their brains are functioning,5 deep learning models have distinct training phases and inference phases. Until the training phase is complete, they can’t apply anything they’ve learned; once the training phase is complete, they can never learn anything new. Sure, many contemporary LLMs incorporate some sort of memory that makes it look like they’ve learned new information based on your previous prompts, but under the hood it’s very different from what happens in the training phase (the sketch below shows roughly what that “memory” amounts to), and if you talk to a chatbot long enough you will see that it has forgotten whatever you taught it. They really are one-trick ponies: they’re very good at predicting what the aggregate of Internet randos would say given a certain prompt, but they will never, ever come up with an original idea. They can only do what they were designed to do, which is trick gullible people into thinking they’re smart enough to justify a subscription fee. And, at least until the chatbot bubble bursts and the tech world moves on to the next ill-advised fad, they seem to be performing that job very well.
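
Here’s a minimal sketch of that arrangement. The generate() function is a hypothetical stand-in for a model whose weights were frozen the moment training ended; the details differ between products, but the shape is the same: nothing is learned, and the transcript is simply re-sent with every request.

def generate(prompt: str) -> str:
	# Hypothetical stand-in for a frozen LLM: text in, next chunk of text out.
	# Nothing inside it changes between calls.
	return "..."  # placeholder response

transcript = ""  # the chatbot's entire "memory" is this growing string
while True:
	user_turn = input('>>> ')
	transcript += f"User: {user_turn}\nAssistant: "
	reply = generate(transcript)  # weights untouched; no learning happens here
	transcript += reply + "\n"
	print(reply)
	# Once the transcript outgrows the model's context window, the oldest turns
	# get dropped or summarized, and the "memories" go with them.

The training phase, where the weights themselves get updated, ended long before you typed your first prompt.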

So the next time you see the latest and greatest attempt at AI do something that seems consistent with an intelligent being, ask yourself two questions: What else might this behavior be consistent with? And is there anything else it does that is more consistent with this alternate hypothesis than with the hypothesis that it’s actually intelligent? Given the history of the Turing test, and the fact that every AI system you encounter was almost certainly developed by someone who’s trying to sell you something, I would submit that “it’s a clever deception” is always a viable alternate hypothesis.


  1. I at least have to give the AI bros some points for originality: At no point did I ever hear crypto bros say that the blockchain would take over the world and destroy humanity unless we adhered to strict “crypto safety” rules that just so happened to favor the business model of their preferred crypto startup. ↩︎

  2. Okay, I admit this was a cheap shot. ↩︎

  3. There are still surprisingly many people who believe this, particularly in the governments of Florida and Texas, and on certain cursed UK parenting websites. ↩︎

  4. I’m sure a lot of said proponents say “Facts don’t care about your feelings” a lot. I am less convinced that reason and emotion occupy strictly separate domains. ↩︎

  5. I know it’s tempting to throw in another cheap shot here about how certain people really do stop learning or thinking, but I feel it’s important to stress that no, they really don’t. No matter how stubborn you are, you’re always learning new things, and no matter how ignorant you might seem, you’re always applying what you’ve learned. Maybe you do both poorly, but you still do it. ↩︎