The Modern Library of Babel
Note: the following applies mainly to "classic", pre-CoT LLMs, although the conclusions and metaphors can be extended to reason about newer variants.
Many have the intuition that an LLM doesn't really understand what it's saying: it doesn't really reason, it has no intent to convey anything, and it doesn't even know the difference between truth and falsehood. This intuition may be sound, but it's not so easy to substantiate. Those with a bit of technical savvy may point out that the LLM is just a "token predictor", which consolidates the intuition a little -- after all, predicting tokens does seem like a far cry from deliberation -- but it remains a mere intuition, and one that can be reasonably questioned: could you finish this sentence without first building up an understanding of what the paragraph is getting at?
Now, those with more charitable views towards AI might argue that LLMs are still poorly understood "black boxes", and accuse the skeptics of reductionism: after all, nature is full of "mundane" mechanisms that give rise to emergent phenomena with qualities that differ from their causes. Maybe the complex mathematical machinery behind token prediction somehow gives rise to understanding.
Or maybe not
One crucial thing to understand about the "token predictor" perspective is that it isn't actually reductionist -- at least not in the same sense as calling a biological brain "just a clump of neurons" would be. The "token predictor" characterization in fact summarizes the high-level functional definition of the model. Any emergent qualities would be upstream from this definition, arising out of the low-level details of the Transformer architecture, to satisfy the demands of accurate token prediction, rather than out of token prediction, to satisfy some more abstract goal.
While it's true that the structure of a text emerges from repeated token prediction, each prediction step is an independent process in principle (even if it follows the trajectory implicit in the intermediate results). In some sense, the resulting text is greater than the sum of its parts: the meaning of each word is modified by the entire context surrounding it; the whole text is composed of parts that interact in complicated ways, showing semantics beyond the concatenation of word meanings. Nonetheless, behind it all is a piecemeal process of construction, which can be analyzed as such, and still yield conclusions about the whole.
It will be shown that the token predictor perspective does indeed imply qualitative constraints, which can't be transcended through the magic of emergence. Far from being reductionist, this perspective provides a mathematical tool to examine what emerges on a higher level than that of individual texts: it allows us to organize all of the model's possible outputs into a single structure.
To do this, I turn to Borges' Library of Babel for inspiration:
The story describes a seemingly infinite library composed of endless hexagonal rooms. Each room houses shelves of books containing every possible arrangement of letters, spaces, and punctuation marks. Consequently, the library contains not only all meaningful works -- every truth, lie, prophecy, scientific breakthrough, or masterpiece -- but also endless volumes of unintelligible nonsense.
The library's inhabitants -- its librarians -- wander tirelessly through the infinite corridors, driven by an insatiable quest for meaning and enlightenment. Yet the library's infinitude renders every meaningful text effectively impossible to find, buried amidst infinite nonsense. Worse yet, the librarians are haunted by the impossibility of distinguishing truth from falsehood, as every conceivable statement is equally present.
Different sects form among them: some search desperately for a mythical "catalog" of the library's contents, others destroy books deemed useless, and yet others resign themselves to nihilistic despair -- religious faith, fundamentalism, or nihilism, all born from a yearning for certainty.
It's a poignant story, but what does it have to do with language models?
For a start, think of Borges' Library as the set of outputs from a certain kind of language model: one that predicts characters instead of tokens, but does so with no regard for accuracy. For any given piece of text, it predicts any potential next character to be as likely as any other. Naturally, such a model suggests no organizing principle for its outputs -- at least not one that helps tell apart data from noise -- because it represents a uniformly random generative process.
Now, imagine an analogous library based on tokens instead of characters: it would contain all possible texts up to some maximum number of tokens, representing outputs from an "untrained" LLM. This "modernized" version of the library would have the same properties as the classic one. However, it would be precisely the set of possible outputs from a normal LLM (assuming even nonsensical token completions have small but nonzero probabilities of occurrence). So how can we organize the Modern Library of Babel to reflect the structuredness of a trained language model?
Of course -- perplexity
Formally, perplexity is defined as the exponentiated average negative log-likelihood of a sequence:
PP(X) = exp(-1/N ∑ᵢ₌₁ᴺ log P(xᵢ | x₁, ..., xᵢ₋₁))
Less formally, we treat a given text as a sequence of outcomes from a random process (the outcomes being the tokens). Each outcome is assigned a probability, based on the tokens that precede it, according to the model; this reflects how well the corresponding continuation conforms to the model's training data. These probabilities are then aggregated so that a meaningful average can be computed for the entire text. Finally, the average is mapped to a value reflecting (roughly) the number of viable continuations at each step, according to the model. This metric follows naturally from the token predictor definition, making direct use of its terms.
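This computation can be sketched in a few lines of Python; the per-token probabilities below are made up purely for illustration:

```python
import math

def perplexity(token_probs):
    """Perplexity of a text: exp of the average negative log-likelihood
    of its tokens, each conditioned on the tokens before it."""
    n = len(token_probs)
    avg_nll = -sum(math.log(p) for p in token_probs) / n
    return math.exp(avg_nll)

# Hypothetical per-token probabilities for two short texts:
predictable = [0.9, 0.8, 0.95, 0.85]  # the model is confident at each step
surprising = [0.1, 0.05, 0.2, 0.02]   # the model keeps getting "surprised"

print(perplexity(predictable))  # ≈ 1.15, close to the minimum of 1
print(perplexity(surprising))   # ≈ 15, many viable continuations per step
```

Note that perplexity depends only on the probabilities the model assigned to the tokens that actually occurred; the sampling procedure plays no part in it.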
Perplexity values range from 1 to infinity. A value of 1 indicates a text so predictable that every step leads inevitably into the next: it expresses something very typical of the training data and very familiar to the model, so completing it is akin to mechanistic repetition from rote memorization. On the other hand, high values indicate a branching labyrinth of possibilities: the model isn't sure where the text is going, and the higher number of possible continuations implies a lower confidence in each one.
Now, imagine a structure of concentric circles
The circles make borders dividing an area into rings. Each border is assigned a perplexity value, starting with 1 for the circular center and gradually increasing as you go outwards. The areas between adjacent circles form perplexity bands: each area houses all the texts whose perplexity values fall in the range implied by its inner and outer borders.
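As a toy sketch of this sorting, with arbitrary band borders chosen purely for illustration, placing a text in its ring is a simple threshold lookup on its perplexity:

```python
import bisect

# Hypothetical perplexity values for the circular borders, starting at 1
# for the center and increasing outwards:
BORDERS = [1.0, 2.0, 5.0, 20.0, 100.0]

def band_index(pp):
    """Index of the ring a text falls into, given its perplexity.
    Band 0 is the innermost ring (1.0 <= pp < 2.0), and so on."""
    if pp < BORDERS[0]:
        raise ValueError("perplexity is never below 1")
    return bisect.bisect_right(BORDERS, pp) - 1

print(band_index(1.3))   # 0: near the center
print(band_index(42.0))  # 3: an outer band
```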
This new Library sorts all the model's possible outputs, faithfully reflecting what the model has learned. To get a potential response for a given prompt, you only need to pick a band, find in it a text that begins with that prompt, and read further. Yet this also exposes the model's fundamental limitation: which band to pick? As we'll see, the relationship between perplexity and truth is far from straightforward.
As LLM designers well know, the texts that users want won't necessarily be found in the Library's center, which is its most conservative part. Consider what happens if a user asks a language model to solve a difficult or unsolved problem: a response from the library's center would likely offer general pointers, or simply state that there's no known solution -- a "safe" response conforming well to the training data -- but it wouldn't stray into a "riskier" attempt to generate a novel solution. That's a shame, because that part of the library is the most likely to contain true statements, or at least ones aligned with conventional knowledge.
In practice, popular products allow the user to explore some of the bands around the center, where more creative outputs can be found. However, since the Token Predictor design provides no direct means to decouple correctness from creativity, the choice of bands is ultimately a balancing act between the two. Extrapolating from these empirical observations, and following the implications of the perplexity metric, we can justifiably conclude that truly innovative and groundbreaking texts would be relatively far from the center: Einstein's unpublished theories or the solution to the Riemann hypothesis are inherently unlikely texts.
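In real systems, this band choice is exposed indirectly through sampling parameters, most commonly the temperature: lowering it concentrates generation near the Library's center, raising it pushes generation into the outer bands. A minimal sketch of temperature-scaled sampling (toy logits, not any particular product's implementation):

```python
import math
import random

def sample_token(logits, temperature=1.0, rng=random):
    """Sample a next-token index from temperature-scaled logits.
    temperature < 1 sharpens the distribution (inner bands);
    temperature > 1 flattens it (outer bands)."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    r = rng.random()
    acc = 0.0
    for i, p in enumerate(probs):
        acc += p
        if r < acc:
            return i
    return len(probs) - 1

rng = random.Random(42)
# At a low temperature, the most likely token wins almost every time:
picks = [sample_token([4.0, 1.0, 0.2], temperature=0.2, rng=rng) for _ in range(5)]
```

The "decisions" here reduce to comparing a pseudo-random number against cumulative probabilities, which is the whole of the model's "preference" between paths.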
So why not look there?
Well, based on the definition of perplexity, the number of potential responses to a prompt explodes exponentially; meanwhile, the number of correct responses remains roughly constant, with only so many ways to state them. Thus the ratio of useful, correct texts to nonsensical, incorrect ones necessarily plummets the further out you go. The probability that at least some of those texts are groundbreaking works increases, but the probability of stumbling upon them vanishes.
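A back-of-the-envelope illustration: if we loosely treat a band's perplexity as an effective branching factor b per token (a simplification), the number of distinct length-n continuations grows like b**n, while the pool of correct answers stays fixed:

```python
def hit_ratio(b, n, k):
    """Fraction of length-n continuations that are correct, assuming
    b effective choices per token and a fixed pool of k correct texts."""
    return k / (b ** n)

print(hit_ratio(b=2, n=10, k=5))   # ≈ 0.005: about 5 in a thousand
print(hit_ratio(b=10, n=10, k=5))  # 5e-10: effectively unfindable
```

Real texts are thousands of tokens long, so even a modest increase in the branching factor makes the ratio collapse far faster than in this toy calculation.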
Given relatively narrow perplexity bands, the texts of each band all look equally good to the model -- equally likely to be generated, regardless of their sensibility or correctness. The outer bands are also likely to contain texts that contradict each other, including direct contradictions on the same subject matter or even for the same prompt. Each text represents a path the model might follow in the course of the generative process, as it chooses out of many possible tokens at each step, but the model has no inherent preference for any particular path within the same perplexity band: the "decisions" are made by a sampling algorithm driven by a pseudo-random number generator. Thus the outer reaches of our Modernized Library of Babel approximate the situation of Borges' classic one, to an ever-increasing degree.
This provides a concrete demonstration of what it means to say that the model doesn't understand what it's talking about: it doesn't know or care how the different paths it could take compare to each other, nor does it see itself as a reasoned judge exploring the Modern Library of Babel to uncover truths. No -- the model IS the Library itself. If it can be said to possess semantics at all, the generative process considers the semantic gestalt only from a very narrow, utilitarian point of view: as much as it needs to guess the next token, one token at a time.
Interestingly, the Modern Library of Babel has its own librarians: many explore its outskirts for hidden nuggets of undiscovered truth, while some search for something akin to the Catalogue from Borges' story: if only they could find the right prompt, the Library would do the rest of their work for them, reliably providing a solution to every problem and an answer to every question. Some even go as far as to look for a secret incantation to awaken the Library to its own latent consciousness. Maybe, like Borges' librarians, they are wandering endless corridors, their noses stuck in countless volumes, failing to grasp the bigger picture of this structure and their pursuit.
But can't the same argument be turned back against humans?
Why not hypothesize a Human Library of Babel, that organizes all the potential outputs of the human mind, based on a human measure of perplexity? Wouldn't that reveal an analogous situation?
I would argue that this is not the case.
For starters, human "perplexity" works differently. If I'm reading a groundbreaking work, it may come off as confusing and dubious at first -- perhaps I would even assign it a provisional perplexity value as high as that of gibberish; indeed, it wouldn't be so groundbreaking if it didn't start off with unusual premises and set off on an unlikely path. But by the end of it, the author will likely have justified his foray into the unusual and unlikely, showing that the work, as a whole, aligns with known facts and valid patterns of reasoning. In the final evaluation, it may have a perplexity value as low as that of the most credible theories. Such a dramatic reevaluation of all the previous tokens, and the correspondingly abrupt change in perplexity, doesn't happen with a Transformer LLM: the text may eventually settle into relative predictability, given enough context, which may even out the initial "confusion" a little, but that confusion will continue to unduly taint the average, driving up the perplexity metric.
Moreover, thanks to token prediction, there's a symmetry in the way we can use a model to evaluate and generate texts. The same is not true for humans: powered by highly abstract thinking, the human mind has a surprising capacity to sense connections between seemingly distant concepts and ideas. Granted, in order to validate and communicate such an insight, a human author will need to mediate between its key elements, by building up an argument with a linear progression that another mind can follow; yet the crucial insight will usually precede this explicit mediation. An LLM, on the other hand, doesn't search for a way to mediate between what's currently established and some insight it wishes to communicate; instead, it follows the path that unfolds before it to an unknown destination, the intermediate stepping stones acting as its guides.
Nonetheless, the imagery of a core with concentric circles around it can be repurposed for the human case:
Imagine the core to be the Earth -- ground truth. Around it, the atmosphere gets thinner and thinner as you go outwards, reaching the vacuum of space. Some ideas are lying on the ground: obvious conclusions, easy to pick up. Others are hanging in the sky, or all the way in outer space: flights of fancy, unprovable theories, or outright delusions.
But you can also imagine a mountain: its base starts at the ground and rises way up. An adventurer sees the broad shape of the mountain and its peak from afar, plotting a rough course to get there. As he starts to climb, he is confronted with myriad climbing challenges he did not foresee, ones that can only be solved up close, mid-climb. Eventually, he reaches the peak to find a treasured vista, as high as the sky -- where even the air is thin -- yet connected to the ground through the sheer mountain. He brings new insights back down to the ground with him, along with the story of how he reached that peak and the challenges he faced along the way: a story that sounds too fantastical at the outset and yet undeniably true at the end.