If you're thinking without writing, you only think you're thinking.
- Leslie Lamport
Writing is such a foundational aspect of modern life that it is surprisingly hard to imagine how pre-literate societies functioned. Yet that's how humans spent the vast majority of our time on earth: without writing.
Anatomically modern Homo sapiens have been around for 300K years, and while we have likely had spoken language for 50K of those years, the earliest written script we know of is only 6K years old.
In other words, we've had the ability to record, inspect, and externally manipulate our thoughts for only 2% of our time as a species. But that 2% is all it took for us to go from tribal societies to globe-spanning empires. For us to send man to the moon and manipulate our own genes.
“Until writing was invented, men lived in acoustic space: boundless, directionless, horizonless, in the dark of the mind, in the world of emotion, by primordial intuition, by terror…”
- Marshall McLuhan
What was the human mind like before it knew writing/reading? Was it as primitive and groping as McLuhan would have us believe? The answer, it turns out, reveals something profound about how intelligence works - and offers a surprising window into understanding modern AI systems.
The Mind Before Writing
To understand the mind of pre-literate cultures, the great Icelandic sagas are a good starting point. The much older Homeric and Indian epics were converted from oral to written form so far back that it is hard to know how much was changed in the process; the Icelandic sagas, by contrast, were textualized recently enough to give us a pristine window into a rare transitional stage between oral tradition and written literature. Iceland, despite boasting 99%+ literacy today, was relatively late to the writing game, remaining a largely pre-literate, oral society until the 13th century.
Take Njal's Saga, for instance. Passed down for centuries by generations of oral storytellers until being committed to paper in the 13th century, it is a stunning and psychologically rich tale about friendship, justice, and petty, tragic revenge. The second half revolves around a bitter feud that develops between the wives of two close friends: Njal, a wise legal sage, and Gunnar, a newly married Norse warrior. The conflict arises from a seemingly minor incident: at a feast, a dispute over seating breaks out between Njal's wife and Gunnar's new wife, the beautiful and manipulative Hallgerd. The slight is taken as an insult and bitter words are exchanged. What follows is a cycle of revenge killings between the two households, starting with the murder of a servant and escalating to ever more important members of each family, until Njal's entire household is burned alive in their home (hence the saga's other name, "Burnt Njal"). While the two old friends repeatedly try to settle the killings with financial compensation (weregild, or "man price"), quoting the proverb "With laws shall our land be built up, but with lawlessness laid waste", their wives perpetuate a blood feud that ultimately consumes both families.
While this would make for riveting drama on HBO or Netflix even today, a close reading of Njal's Saga and sagas like it reveals something crucial about pre-literate cognition. It turns out pre-literate people had several clever techniques to help them compose and recall long stories without the help of writing. They made extensive use of rhythm and repeated phrases, mnemonics, exaggerated archetypal characters, and sayings and proverbs to aid their memory. While the main story, names, and key events stayed the same, oral storytellers could improvise the details, which of course changed over time.
One of the most interesting aspects of oral thought is how much it relied on memorable proverbs and sayings to do the heavy lifting for it. In Njal's Saga, for instance, the legal proceedings following each revenge killing use proverbs and old sayings to reason about the law and to arrive at appropriate judgments - sayings such as "Brother shall compensate for brother" or "One shall always pay half-compensation for work of the hands".
"The law itself in oral cultures is enshrined in formulaic sayings and proverbs, which are not mere jurisprudential decorations, but themselves constitute the law." - Walter Ong
These proverbs, or "wisdom formulas," as the scholar Walter Ong calls them, were not only aids to memory but also the building blocks for complex reasoning. So much so that he argues that, in oral cultures, proverbs "form the substance of thought itself. Thought in any extended form is impossible without them, for it consists in them."
This insight - that complex thinking can emerge from combining simple, memorable formulas - turns out to be relevant to understanding how artificial intelligence in the form of large language models (LLMs) works today. But first, let's take a look at how writing (and reading) transformed the way we think.
How Writing Changed Our Consciousness
"Since in a primary oral culture... knowledge that is not repeated aloud soon vanishes, oral societies must invest great energy in saying over and over again what has been learned arduously over the ages." - Walter Ong
The advent of writing transformed human societies and self-reflective consciousness itself in two major ways.
The most obvious change is that writing created a stable source of truth for everything from accounting and administration to science and philosophy. At the same time it enabled high-fidelity communication, replication, and coordination over vast distances and time periods. Because ideas could now outlive their authors, progress could start compounding.
But the second transformation was more subtle and profound: writing fundamentally reshaped our interior landscape. It changed how we think.
In his seminal book "Orality and Literacy", Walter Ong explores how pre-literate cognition was fundamentally different from modern thought. In oral cultures, thinking tended to be situational and concrete (as opposed to abstract and analytical). It was somatically embodied, intuitively felt, and emotionally driven. Ideas were additive rather than subordinative (strung together with "and...and..." rather than more complex sentence structures). Thought was often agonistic (as in verbal dueling) and redundant (repetition aids memory).
With the advent of writing our thinking became much more abstract and analytical. Writing created a separation between the knower and the known, reducing subjectivity and increasing objectivity. It also helped us develop a new kind of self-aware and reflective consciousness that was lacking in oral cultures.
How Writing Generates New Kinds of Ideas
"Writing is not just a way to convey ideas, but also a way to have them."
- Paul Graham
Our minds, remarkable as they are, have a hard time sustaining complex chains of reasoning purely internally. Most of our thinking remains frustratingly vague until we externalize it through writing (or drawing). We come to grips with this whenever we try to write about new ideas or reason through anything non-trivial. Writing is hard because it forces us to use our System 2 thinking - the slow, effortful, and reflective kind.
Without the clarity writing enables, the ideas in our heads are dangerously fuzzy. The act of putting pen to paper gives shape to the inchoate mass of thought-feelings in our heads, revealing gaps in our reasoning and our knowledge, exposing the brittleness of our logic and the shallowness of our certainty. Writing isn't just about recording our ideas; it's also about examining them critically and generating new kinds of ideas.
Paul Graham observes that “a good writer will almost always discover new ideas in the very act of writing.” There is, he notes, no substitute for this kind of discovery. Talking about your idea helps, but writing it down changes it and creates new ideas.
By externalizing the symbols of our language and committing them to physical space, we gain new ways to examine, manipulate, and reflect on our thoughts. Part of this comes from the expanded scratch space writing gives our minds to examine ideas with. We become a kind of detached observer of our own thoughts. No longer limited by the fuzzy constraints of our short-term memories and automatic System 1 thinking, we can zoom in and out, viewing both the forest and the trees.
Writing also takes advantage of our ancient visual brain, which hundreds of millions of years of evolution have optimized to instantly spot hidden patterns and make causal inferences. This allows us to discover new threads and connections, giving us many more a-ha moments.
In essence, writing created a cognitive amplifier for our slow System 2 thinking, opening up new and more complex ways of reflection, ideation, and expression. It produced new forms of culture. We built cities and empires. We landed on the moon.
This transformation from oral to literate thinking offers a lens for understanding what's happening in artificial intelligence today - particularly in the leap from basic (non-reasoning) LLMs to reasoning AI systems.
Proverbs as Wisdom Formulas -or- Magic Spells for Intelligence
"Red in the morning, the sailor's warning; red in the night, the sailor's delight." "Divide and conquer." "To err is human, to forgive is divine." [...] Fixed, often rhythmically balanced, expressions of this sort and of other sorts can be found occasionally in print, indeed can be 'looked up' in books of sayings, but in oral cultures they are not occasional. They are incessant. They form the substance of thought itself. Thought in any extended form is impossible without them, for it consists in them." - Walter Ong
So if writing was so crucial for complex reasoning and coming up with new and well-baked ideas, how did pre-literate societies accomplish as much as they did in the absence of writing? How did they create lengthy sagas and develop sophisticated navigational systems and clever oral mathematics? As we saw earlier, one answer lies in their extensive usage of thinking aids such as sayings and proverbs - what Ong calls "wisdom formulas" - which were “incessantly” used in oral communication.
These proverbs weren't just memory aids or colorful expressions for oral cultures. They were compressed intelligence - building blocks for more complex reasoning. In essence, proverbs are memorable shortcuts to frozen intelligence, giving people quick access to the output of higher reasoning without having to derive everything from first principles themselves.
Indeed, much of the thinking in oral cultures consisted of stringing together these wisdom formulas, letting them do the heavy lifting for reason, discourse, and action.
This connects to something fundamental about how language creates meaning.
Language as Vectors into Meaning
Isn't it amazing how knowing a language gives you the power to understand every single possible future sentence in that language? Consider the phrase "golden mountain". Even though no such thing exists, you can immediately picture it: gleaming in sunlight, its slopes shining with luster. By invoking this odd pairing of words, we've created a new kind of mental concept, an idea that implicitly has attributes of both gold and mountains.
Language isn't just about stringing words together with syntactic coherence. It is a magic key to a certain kind of intelligence. Each valid combination of words can unlock new understanding - a new way to see reality.
How can combinations of words create new meanings? Because words and concepts don’t exist in isolation. Because meanings of words have the necessary semantic tentacles to attach to other words and other meanings.
Because meaning is nothing but connective tissue. It connects one concept to another, one sentence to another, one word to another, until ultimately deep at the roots of meaning, it touches on our base perception of reality, usually through our biological and evolutionary priors (we call this the grounding).
The web of hidden semantic tentacles that connect words, concepts, and ideas to form our understanding of the world can be thought of as a "latent space" of meaning. Each symbol, word, phrase, or sentence acts as a vector pointing into this space of meaning-connections.
When we write "gravity," we're not just naming a force, we're accessing a web of connected concepts about mass, acceleration, universal attraction, and apples falling on unsuspecting heads. There's a universe of meaning accessible through combinations of these vectors, and we simulate this universe in our minds whenever we read or hear language.
Of course, much of this simulated universe is fictive, untethered from reality. It’s built from connections in the latent space of meaning that may have no parallel in reality. Language is powerful because it isn’t constrained by reality. We can just imagine things.
But through physical action, fictional meaning can be carved into reality. Before we put a rocket in space we had a name for it, a spell to invoke the meaning of a rocket. (Perhaps this is why names were revered by the ancients.)
This is also why much of our thinking operates through metaphor and analogy. When Einstein described gravity as a consequence of the "curvature of space and time," he summoned up a conceptual framework that fast-forwarded our understanding of the theory of relativity. When we use the term "genetic code," we're using language spells to conjure an image of genes encoding information that determines who we become.
From this perspective, "a stitch in time saves nine" is more than simplistic proverbial advice. It's a compressed formula for understanding causality and preventive action. And beyond entertaining people, the true function of epic sagas was to serve as repositories of social knowledge, moral reasoning, and strategic thinking.
In a computational sense, Ong's "wisdom formulas" - the proverbs that people of Njal's time relied on to make decisions - are thought heuristics: miniature programs, algorithms that can be strung together to access intelligence from our collective past. This has parallels to how LLMs work.
LLMs as Giant Repositories of Latent Wisdom Formulas
"Information is not stored anywhere in particular. Rather, it is stored everywhere. Information is better thought of as 'evoked' than 'found.'"
- David Rumelhart and Donald Norman (about early neural networks)
When a large language model predicts the next token in a sequence, it's engaging in a discovery process through a vast space of meaning-relationships. Each step in its thinking process is like accessing the latent space of meaning captured by the patterns embedded in trillions of words of human writing.
In our analogy, you can imagine that each symbol/token has sticky semantic tentacles of meaning that can attach it to other tokens, and the stickiness of each tentacle changes depending on the context given by prior tokens (i.e., the direction of the meaning-vector). The LLM chooses the path that makes the most sense given the meaning of all the past tokens. How does the LLM know which path to take? Which is the stickiest tentacle for a given token context? The answer lies in the billions of patterns of thought it has learned from the data.
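As a minimal sketch of this idea (with toy scores, not real model output), imagine the "stickiness" of each candidate next token as a number that depends on the preceding context; a softmax turns those numbers into a probability distribution over what comes next.

    # Toy illustration: context-dependent scores for candidate next tokens,
    # converted into probabilities with a softmax. All numbers are invented.
    import numpy as np

    def softmax(scores):
        exp = np.exp(scores - np.max(scores))
        return exp / exp.sum()

    candidates = ["warning", "delight", "paint"]

    # Hypothetical scores a trained model might assign in two different contexts.
    scores_by_context = {
        "Red in the morning, the sailor's": np.array([4.0, 1.0, -2.0]),
        "Red in the night, the sailor's":   np.array([0.5, 4.5, -2.0]),
    }

    for context, scores in scores_by_context.items():
        probs = softmax(scores)
        for token, p in zip(candidates, probs):
            print(f"{context!r} -> {token}: {p:.2f}")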
The deep neural networks that power LLMs are great at generalizing from data in a local, interpolative sense. That is, if you give a neural network two points on a graph, it can learn to complete the line that passes through them (interpolation). Of course, because the learning algorithm tries to find the easiest path, it may also simply memorize the two points. Usually a deep network does both: it memorizes a bit and generalizes a bit.
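Here is a tiny sketch of what interpolation means in this sense (a deliberately trivial example, far removed from real deep networks): given only two observed points, fit a line and fill in the values between them.

    # Toy interpolation: two training points, a fitted line, and a query in between.
    import numpy as np

    xs = np.array([1.0, 3.0])   # two observed points
    ys = np.array([2.0, 6.0])

    slope, intercept = np.polyfit(xs, ys, deg=1)   # fit a straight line

    print(slope * 2.0 + intercept)    # 4.0  -- interpolating between the two points
    print(slope * 10.0 + intercept)   # 20.0 -- extrapolating far outside them is much less reliable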
But what happens when you give a neural network the entire corpus of human writing - trillions of words, capturing nearly every thought ever recorded? It turns out it will memorize a lot of the writing, but it will also absorb much of the underlying deep structure of meaning, the lines that connect the conceptual dots which the vectors of language point towards.
There's something that connects the concept of "redness" with disparate other concepts like paint, Ferraris, warmth, Homer’s wine-dark seas, blood, wavelengths of light, roses, and so on. All the way down to the physical grounding of photons striking the red-sensitive cones in our eyes. This sea of implicit connections is the “latent” space of meaning, extracted from patterns in the data, describing hidden relationships within it - associative connections ("Ferraris are often red"), conditional relationships ("if an apple is red, then it's typically ripe"), causal relationships ("striking a match leads to fire"), and countless others.
By extracting these deep, implicit patterns, LLMs learn millions of miniature reasoning formulas that reflect both concrete skills and abstract models about reality. Of course, for LLMs these “wisdom formulas” aren't literal proverbs or pieces of text, but complex mathematical relationships that guide the model's predictions. Each “wisdom formula” acts as a gravitational force pulling the vector of meaning in its direction.
To put it simply, LLMs rely on a giant repository of pre-existing thought patterns and heuristics, reasoning formulas (abstracted from human writing) that help them interpolate new responses. In this sense, they function remarkably like the oral minds of pre-literate cultures, drawing on compressed wisdom to navigate complex situations.
Why Basic LLMs Resemble Pre-literate Minds
One of the key limitations of non-reasoning LLMs (like GPT-4, as opposed to reasoning models like o3) is that they are trained to generate answers efficiently without extended deliberation. As long as the answer is straightforward and there's a set of patterns and heuristics from their training that can be applied, they can produce good results quickly.
But when tasks require complex reasoning that doesn't have a ready-made set of “wisdom formulas” embedded in their training patterns, these models face challenges. Once an LLM starts down the wrong reasoning path, it has limited ability to step back, reflect, and explore alternative approaches. The model is constrained by the narrow set of trajectories available from its current position. The vector of meaning is relentlessly pulled towards the direction that some wisdom formula deems appropriate.
In this way, basic LLMs are more like oral, pre-literate minds. They rely heavily on interpolating between compressed "wisdom formulas" (i.e., mathematical models in latent space) and memorized heuristics to do their thinking. They lack the affordance of lengthy reflections on externalized thoughts to retrace their steps and develop more complex reasoning chains.
This limitation becomes apparent in tasks requiring multi-step logical reasoning, or problems where the obvious first approach is incorrect. Just as oral cultures sometimes struggled with abstract reasoning that couldn't be encoded in memorable proverbs, basic LLMs can struggle when their learned patterns don't directly apply.
Chain of Thought: Writing's Gift to AI
The breakthrough of reasoning AI systems can be understood through our oral-to-literate analogy. These systems work in two complementary ways:
First, they're trained on more examples of human reasoning traces - the "rough drafts" of thinking that show the process of working through problems, not just the final polished answers. Much of the internet that serves as training data for LLMs contains our finished writing but little of the iterative process of exploring ideas, finding flaws, and retracing steps. So the training data has to be supplemented with these extended reasoning traces.
Second, and more importantly, these models are rewarded for taking time to "think through" problems step by step before arriving at answers. This is equivalent to giving LLMs the affordance of writing - encouraging them to externalize their reasoning process in a way that allows for reflection and course-correction.
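As a purely illustrative sketch of that affordance (the generate function below is just a placeholder for whatever model call you use; nothing here is a specific vendor API), compare asking for an answer directly with asking the model to write out its reasoning first:

    # Illustrative only: `generate` is a stand-in for a text-generation call.
    def generate(prompt: str) -> str:
        raise NotImplementedError("plug in your model call here")

    question = ("A bat and a ball cost $1.10 together. The bat costs $1.00 "
                "more than the ball. How much does the ball cost?")

    # Direct answering: the model must commit to an answer in its first few tokens,
    # relying on whatever pattern the question happens to trigger (often "$0.10").
    direct_prompt = question + "\nAnswer with just the number."

    # Chain of thought: the model is asked to write out intermediate steps first,
    # giving it room to notice that the reflexive answer is wrong ($0.05 is correct).
    cot_prompt = question + "\nThink through the problem step by step, then give the final answer."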
In combination, these techniques help LLMs avoid their previous tendency to rely on pre-existing, pat wisdom formulas to guide the trajectory of token generation. By allowing LLMs to travel down longer paths, the chances of grasping a better semantic tentacle for the next token become higher (the next token is picked with a weighted coin flip from among the top N choices), thus changing the trajectory of meaning. As the number of chances to course-correct increases, so does the possibility of a genuine a-ha moment, or even the creation of a brand new, never-seen-before wisdom formula.
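Here is a minimal sketch (with invented scores, not real model output) of that "weighted coin flip from among the top N choices", commonly called top-k sampling; the small dose of randomness at each step is what lets a longer path wander onto a better trajectory.

    # Toy top-k sampling: keep the k highest-scoring candidates, renormalize,
    # and take a weighted coin flip among them. The scores below are invented.
    import numpy as np

    rng = np.random.default_rng(0)

    def sample_top_k(tokens, logits, k=3):
        logits = np.asarray(logits, dtype=float)
        top = np.argsort(logits)[-k:]                  # indices of the k best scores
        probs = np.exp(logits[top] - logits[top].max())
        probs /= probs.sum()
        return tokens[rng.choice(top, p=probs)]        # the weighted coin flip

    tokens = ["therefore", "wait", "so", "banana", "hence"]
    logits = [2.5, 2.2, 1.9, -3.0, 1.5]                # hypothetical next-token scores

    print([sample_top_k(tokens, logits) for _ in range(5)])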
The process of "writing out" tokens through chain of thought and then, through self-attention, reflecting on that process, allows for the same kind of leap in intelligence from reasoning LLMs as writing did for pre-literate societies.
In this sense, the reasoning LLM is like the literate mind, having freshly discovered the affordance of writing.
New Magic for a Post-LLM World
It won't be long before there is a burgeoning field of LLM psychology concerned with understanding LLM behavior and its philosophical implications. Already, Andrej Karpathy calls LLMs "people spirits", given that they are stochastic simulations of people created from the universe of all our thoughts.
Despite the pitfalls of anthropomorphizing LLMs (or mechanizing human intelligence), as we happily did throughout this essay, there is often valuable insight lurking in these analogies, even if they are ultimately flawed (all analogies are wrong, but some are useful).
For instance, as we compare non-reasoning LLMs to oral minds and reasoning LLMs to literate minds, we come to better grips with both the limitations and possibilities of different kinds of human intelligence. Without writing, we were cognitively constrained in ways not unlike non-reasoning LLMs. We overcame the shortcomings of our interior mental processes by first relying on tricks like mnemonics and proverbs, and later by externalizing our thoughts in the form of writing. And those same tricks seem to work for LLMs too (for both similar and not-so-similar reasons).
But we didn't stop evolving after we learned to write. On top of writing, we developed mathematics, scientific notation, programming languages, collaborative knowledge systems, and now we have AIs as thought partners. Each new system dramatically expanded our cognitive reach.
What new forms of "writing" might we invent for artificial minds? We already see this happening with external tool usage by LLMs - calculators, web searches, code execution, and more. But there are many more ways to extend their intelligence in both human-like and entirely non-human ways. What cognitive leaps become possible when AI systems can create and manipulate their own symbolic representations, or collaborate across vast networks of interconnected intelligence?
On the other hand, how will the extended cognitive surface of working with AIs change our own intelligence? What will we gain and what will we lose? We lost much of our somatic and communal intelligence when we became literate. We lost our intuitive connection with the natural world, and the kind of spatial intelligence that helped us navigate the seas and chart the skies without the aid of writing. What might we sacrifice or transform as AIs become more and more embedded into our everyday thought processes and into all the tools we use?
The proverb-wielding storytellers of Njal's time could never have imagined the cognitive landscape that writing would eventually make possible. Similarly, we may be standing at the threshold of forms of intelligence that our current frameworks can barely comprehend. The magic spells we're teaching machines today may be just the first words in a much longer saga of our collective cognitive evolution.
PS: If you liked what you read, hit the little heart button below so that Substack helps other people find this post