More than once I’ve seen the claim made that something or other done by LLMs is “just like” what human minds do. For example, there’s the oft-repeated insinuation that LLMs trained on copyrighted material don’t really plagiarize, because their output is based on exposure to multiple sources, just as human writers reflect their own influences. Or there’s the occasional response to the criticism that LLMs are just glorified autocorrect, merely predicting the next word in a sequence. This, I’ve been told, is not really a criticism, because next-word-prediction is “just like” what humans do.1 I find the claim that anything is “just like” what happens in the human brain to be astonishing given that our knowledge of the brain is still in its infancy; tellingly, I have never heard an actual neuroscientist make such a claim. Still, there’s one thing we can do given the current state of knowledge: look at the features of LLMs that definitely aren’t like what happens in the human brain.
LLMs, like all deep learning models, have distinct training and deployment phases.
With the possible exception of some experimental architecture I’m not familiar with, every deep learning model is developed in two distinct phases. First, there’s a training phase, in which tons of input data and the expected output data are fed to a network and the weights between neurons are modified, until finally a separate set of test inputs generates the expected outputs to an acceptable degree of accuracy. Once the training phase is complete, the model is deployed. In the deployment phase, the weights never change. The only way to teach the network something new is to put it through another round of training – whether a full retrain or a fine-tune on new data – and that happens outside of deployment.
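To make the two phases concrete, here’s a minimal PyTorch sketch with a toy model and made-up data – nothing about it is specific to LLMs, but the shape is the same: the weights change inside the training loop and are frozen once the model is deployed.

```python
import torch
import torch.nn as nn

# Toy stand-in for a model: the specifics don't matter,
# only that it has trainable weights.
model = nn.Sequential(nn.Linear(8, 32), nn.ReLU(), nn.Linear(32, 8))

# --- Training phase: the weights change on every step ---
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = nn.MSELoss()
for _ in range(100):
    x, y = torch.randn(16, 8), torch.randn(16, 8)  # stand-in training data
    loss = loss_fn(model(x), y)
    optimizer.zero_grad()
    loss.backward()       # compute gradients
    optimizer.step()      # update the weights

# --- Deployment phase: the weights are frozen ---
model.eval()
with torch.no_grad():     # no gradients, no updates -- ever
    output = model(torch.randn(1, 8))
# From here on, the only way the model "learns" anything new is
# another round of training like the loop above.
```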
It’s true that LLMs such as ChatGPT do appear to learn things sometimes. After all, you can give them a prompt and then, in your next interaction, build on what you said in the previous prompt. ChatGPT will usually appear to remember what you said previously without you having to re-type it. But this isn’t real learning in the sense of updating weights. Rather, they simulate memory through a clever hack: the input to the LLM isn’t just what you typed. Instead, it’s the last several things you and the model said, plus possibly other data, such as a hidden system prompt. (The variation you see when you give the same prompt twice comes from something else entirely: the next word is sampled with a bit of randomness rather than chosen deterministically.) Once the conversation exceeds however much text the LLM is designed to “remember” – its context window – the earliest exchanges are dropped and the responses will no longer take them into consideration.
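Here’s a rough sketch of that hack, under some cheerful simplifications: `generate()` is a stand-in for whatever model call the chat service actually makes, and the context limit is counted in characters rather than tokens. The point is that the “memory” lives entirely in the prompt the app assembles, not in the model’s weights.

```python
# A sketch of simulated memory: the model is stateless, so the chat app
# re-sends the recent conversation as part of every prompt.

MAX_CONTEXT_CHARS = 2000   # stand-in for the model's real token limit

def generate(prompt: str) -> str:
    """Placeholder for the actual LLM call; here it just echoes something."""
    return f"(a reply conditioned on {len(prompt)} characters of context)"

history = []               # (speaker, text) pairs kept by the app, not the model

def chat(user_message: str) -> str:
    turns = history + [("User", user_message)]
    prompt = "\n".join(f"{who}: {text}" for who, text in turns)
    # Once the conversation outgrows the context window, the oldest turns
    # are simply dropped -- this is the "forgetting".
    while len(prompt) > MAX_CONTEXT_CHARS and len(turns) > 1:
        turns.pop(0)
        prompt = "\n".join(f"{who}: {text}" for who, text in turns)
    reply = generate(prompt)          # the weights are untouched by this call
    history.append(("User", user_message))
    history.append(("Assistant", reply))
    return reply

print(chat("My dog's name is Rex."))
print(chat("What is my dog's name?"))  # "remembered" only because it's re-sent
```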
Obviously, humans are not like this. The brain isn’t even fully formed when we’re born. It continues growing for decades. Nor are neural connections set in stone through some prenatal training phase; we continue to learn new things, really learn them, throughout our lives, while at the same time continuing to act on what we’ve learned previously. Our “training” and “deployment” occur concurrently, for our entire lives, without a giant set of pre-loaded training data involved.
LLMs are purely reactive
I once made the offhand comment that if I really wanted to make an LLM lose the Turing test, I wouldn’t try to come up with clever prompts to trip it up. Instead, I’d just sit at my keyboard, doing nothing. Any human being, when faced with a silent interlocutor, would eventually say something like “Hello? Is anyone there?” LLMs, however, do not do this. They accept a discrete set of input, process it, and generate output. Then absolutely nothing happens to them until they’re given another set of input. They can never truly act; they only react.
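In code, the entire “life” of a deployed LLM amounts to something like the loop below, with `generate()` once again a placeholder for the real model call:

```python
def generate(prompt: str) -> str:
    # Placeholder for the real model call.
    return "(a response to: " + prompt + ")"

# Nothing happens until a prompt arrives.
while True:
    prompt = input("> ")      # blocks indefinitely; the model computes nothing
    print(generate(prompt))   # one burst of computation, then silence again
```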
This is not what humans do. Our brains are pretty much always active. We are even capable of introspection: thinking about things that are only happening within our own minds, unrelated to any current input. If we undergo sensory deprivation, attempting to cut off as much input as possible, our brains don’t shut down. On the contrary, they become very active, hallucinating new experiences. For artificial neural networks, activity is a relative rarity punctuating long periods of inactivity; for humans, shutting down thought is pretty much impossible without killing us.
LLMs know only text
I’ve joked that LLMs should be referred to as “Derridean AI,” after Jacques Derrida’s line from Of Grammatology that “there is nothing outside the text.” In the case of LLMs, this is literally true: their training set consists only of text (or rather, text that has been chopped up and converted into non-linguistic tokens that can then be assigned numeric values). LLMs don’t have eyes or ears or any kind of sensory input. They live in a world consisting only of letters and symbols. And they are trained only to predict what other symbols are likely to follow their inputs.
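If you want to see what “nothing outside the text” looks like in practice, here’s the view from a tokenizer – this uses the open-source tiktoken library with one of its published encodings, but any tokenizer would make the same point:

```python
import tiktoken  # pip install tiktoken

enc = tiktoken.get_encoding("cl100k_base")  # one of OpenAI's published encodings

text = "There is nothing outside the text."
ids = enc.encode(text)
print(ids)              # a list of integers -- this is all the model ever sees
print(enc.decode(ids))  # maps the numbers back to the original string

# To the model, "dog" is just an ID (or a couple of IDs) whose statistical
# neighbourhood it has learned -- there is no sight, sound, or smell attached.
```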
This, to me, is one of the strongest reasons to doubt the claims that LLMs are conscious, or sentient, or do anything at all reminiscent of human thought. When a human learns a word, they connect it with their lived experience. You don’t just learn the word “dog” by studying all the other words that tend to appear alongside it. You can also see, hear, touch, and smell actual dogs, and associate those experiences with the word. What’s more, as a human living in the world, you have needs, wants, and interests. You need to impart information to others, make requests of them, and so forth, and that informs what utterances you make, even which words you learn. This need to communicate with others is the very reason why you make utterances in the first place! LLMs don’t talk because they have something to say. They talk because that’s what they’re built to do when someone feeds them a prompt. They’ll only appear to impart information if there is data in their training set that allows them to simulate an intelligent conversation. If you told an LLM you were drowning, it would not lend a hand, but it might say “I’ll fetch a life preserver!” if it had been fed a similar exchange. That wouldn’t mean a damn thing, however.
This lack of real-world experience is also why I refuse to believe that the process by which AI generates “art” is in any meaningful way similar to how humans create art. You hear this claim trotted out when people accuse LLMs of plagiarism; one of the common responses is that LLMs “draw inspiration from the things they’ve read, just like humans!” Except a human novelist doesn’t sit in a featureless room ingesting text until they can spit out novels of their own. They have lived experiences. They have wants and desires that are either gratified or thwarted by the experiences they have. They take action in the world with the hopes of satisfying those desires. They have memories. And it’s only after a whole lot of wants and desires and experiences and actions and memories that they even reach the point where they can read and enjoy a book in the first place, and they can only enjoy a work of fiction because it speaks to their lived experience. Likewise, they can only create a true work of art because they are able to create something that speaks to the lived experiences of others.
I’m not saying LLMs are incapable of outputting text that can elicit an emotional response from readers. What I am saying is that, if they manage to do that, it is purely by accident, and probably mimics very closely some human text they were trained on. For that reason, LLMs can only make imitative art and never push boundaries the way human artists can. When a new artistic or literary movement takes off, it is of course informed by the art that came before it, but it has something more. The artists who create it have had real-world experiences and desires. And, the changing nature of history being what it is, it is likely that the artists who make up a movement (and their audience) share certain experiences that previous generations did not, which is the very thing that makes a new type of literature speak to its audience. Without those experiences, all LLMs can do is imitate mechanically.
LLMs can’t handle the truth
When people read texts, they don’t just ingest them. They judge them. They decide whether the claims made in those texts are true or false, plausible or implausible. Often these judgements are wrong, but at least we make them. And we are able to do so in large part because, as I argued in the last section, we have sensations independent of text. If I read a sentence saying that it is currently raining in my location, I can judge it to be true or false by looking out my window.
LLMs are a particular type of machine learning model, and they fall under the broad category of supervised learning (more precisely, self-supervised learning: the “labels” are just the next words of the training text itself). One thing all supervised learning has in common is an evaluation metric: some quantifiable, measurable property that can be used to judge the model’s accuracy. An algorithm for recommending videos to customers, for example, is evaluated based on how many of those recommended videos are actually watched, and for how long. The fact that many people seem to trust ChatGPT and its ilk to answer questions suggests that – to the extent they think about evaluation metrics at all – they assume that it’s optimized to provide true statements. But it’s not. It’s optimized to predict the next word in a sequence. It produces true statements only to the extent that its training set contained true statements (and that the truth was preserved in the long journey through the neural network’s inscrutable hidden layers). And since these models are largely trained on text scraped from the Internet… we should all be amazed they manage to emit any true statements at all.
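Here is that objective in miniature – a cross-entropy loss over next tokens, shown with random stand-in numbers rather than a real model, because the point is only what the loss does and doesn’t measure:

```python
import torch
import torch.nn.functional as F

# The training objective of an LLM, in miniature. For each position in a
# text, the model emits scores (logits) over its vocabulary; the loss
# measures how well those scores predict the *next token* in the text.
vocab_size, positions = 1000, 8
logits = torch.randn(positions, vocab_size)               # stand-in model outputs
next_tokens = torch.randint(0, vocab_size, (positions,))  # the actual next tokens

loss = F.cross_entropy(logits, next_tokens)
print(loss.item())

# Note what's missing: no term anywhere asks whether the resulting sentence
# is true. A falsehood that matches the training text scores exactly as well
# as a true statement would.
```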
(It’s true that most commercial LLMs like ChatGPT are not just raw word-prediction models, but contain several sets of guardrails, whether it be bespoke additions to the training set or filters added to the output to prevent egregious mistakes like telling people they should ingest drain cleaner to cure their rheumatism. It’s also true that some LLMs are designed to retrieve curated data from a database and translate it into natural language rather than simply relying on training data. But I would argue that in these cases the LLM proper still has no concept of truth, and the guardrails don’t change that, any more than the bumpers novice bowlers sometimes place in the gutters turn those novices into better bowlers.)
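For what it’s worth, the retrieval approach looks roughly like the sketch below, where `search()` and `generate()` are placeholders rather than any particular product’s API; the curated facts get pasted into the prompt, and the model still just predicts likely tokens over that prompt:

```python
# A sketch of retrieval-augmented generation. `search` and `generate`
# are placeholders, not any real product's API.

def search(query: str, database: list[str]) -> list[str]:
    # Stand-in retrieval: return documents that share words with the query.
    words = set(query.lower().split())
    return [doc for doc in database if words & set(doc.lower().split())][:3]

def generate(prompt: str) -> str:
    # Placeholder for the LLM call: in reality, fluent next-token
    # prediction conditioned on whatever happens to be in the prompt.
    return "(a fluent paraphrase of the retrieved documents)"

def answer(question: str, database: list[str]) -> str:
    docs = search(question, database)
    prompt = ("Answer the question using only these documents:\n"
              + "\n".join(docs)
              + f"\n\nQuestion: {question}")
    # The "truth" lives in the curated database; the model itself is still
    # just predicting likely tokens over the assembled prompt.
    return generate(prompt)

print(answer("Is it raining in Boston?",
             ["It is raining in Boston today.", "Cats are mammals."]))
```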
I know that, strictly speaking, humans aren’t optimized for truth either. Except we sort of are, or at least more so than LLMs. Again, we are situated in the real world, we have desires, and the satisfaction of those desires – indeed, our very survival – depends on how well we understand that world. The more true beliefs we hold, the better our chances of surviving and thriving. It’s not perfect, because the world is complex, and it’s possible to be mistaken about a lot of things and still get what you want. But the need to hold true beliefs is built into the human condition in a way that LLMs lack entirely.
I think that part of the problem stems from the extent to which the so-called “Turing test” has been taken as the gold standard for evaluating artificial intelligence. The test, you will recall, assumes that a machine is intelligent exactly to the degree that it is able to trick its interlocutors into thinking that it is a human. In other words, almost since its inception, artificial intelligence has been optimizing for the ability to deceive.2 We therefore shouldn’t be surprised at all when LLMs feed us misinformation that sounds completely plausible; it’s exactly what they’re designed to do.
Conclusion
This has been a non-exhaustive list of the ways I can think of in which LLMs are obviously different from humans. Note that “different from humans” is not synonymous with “unintelligent” or “not conscious,” though I have, where applicable, argued that some of these differences do suggest LLMs fall far short of the bar for sentience. My main intention was to counter the frequent yet unsubstantiated claims that LLMs think “just like humans” even when there are obvious and important ways in which this is false.
1. It’s pretty amazing that this claim still gets trotted out after Chomsky’s rebuttal some sixty-five years ago. People who claim human language is just next-word-prediction never seem to even address this criticism, just furiously sleeping like the colorless green ideas they are. ↩︎
2. Of course, in practice, most mundane machine learning models aren’t designed to deceive you. Spotify doesn’t care whether or not you think their music recommendation system is similar to something a human would do; they just care whether that recommendation system keeps you listening to Spotify longer so you hear more ads or keep paying for your subscription. But a high profile subset of machine learning models is designed to be functionally indistinguishable from humans, and most commercial LLMs fall into that subset. ↩︎