The Half-Trained Mirror
Why Interpretive Knowledge Belongs in the Age of AI
What if the most powerful intelligence we build can only see part of the story?
In the rush to build smarter systems, we risk forgetting to build systems that understand us. The era of large language models (LLMs) has arrived with force — transforming communication, reshaping industries, and offering the illusion of omniscient intelligence. But as these systems become increasingly central to how we interact, learn, and even make decisions, a critical imbalance in their training is becoming clear.
We’ve taught the machines how to speak. But not how to mean — to grasp the nuance, context, and often intangible human experiences that give words their true weight.
What the Machine Learns — And What It Can’t Access
LLMs are trained on massive digital corpora. But not all of human knowledge is digital. Not all of history survived. The training data isn’t just incomplete; it’s profoundly skewed by historical power dynamics.
The model doesn’t know what it’s missing. Whole bodies of human knowledge were:
Burned in colonial conquest
Lost in war
Buried by time
Erased by censorship
Spoken, not written
Deemed unworthy of “archiving” by the victors who controlled the record
Suppressed by recent political efforts to rewrite public discourse, removing inconvenient truths or dissenting perspectives

What we call “training data” is not a complete record. It is a distorted echo of civilization, filtered through what power preserved, what capitalism uploaded, and what contemporary politics deems acceptable. This isn’t just an oversight. It’s a wound.
“As a society, we are at a critical juncture where the choices we make about artificial intelligence systems will either reinforce existing social and political hierarchies or dismantle them. We must grapple with the fundamental question of whose knowledge is being validated and whose is being erased in the datasets that power AI.” ~ Safiya Umoja Noble (Scholar of information studies, author of “Algorithms of Oppression”)
The Weight of Lost Memory
When we underrepresent interpretive knowledge — especially from historically oppressed, oral, or spiritual traditions, and now, from contemporary voices targeted for suppression — we compound historical erasure with algorithmic silence. We teach the model to think like the archive. But the archive was never neutral. Nor is the ever-shifting digital public square.
That means:
Indigenous cosmologies are reduced to footnotes
Wisdom encoded in ritual becomes invisible
Voices from the margins go unheard
Global South traditions appear as gaps, not signals
Crucial discussions about history, race, gender, or science are marginalized, if not outright removed, from datasets that inform our AI.
LLMs inherit the logic of the record. But the record is incomplete by design, and actively manipulated by political forces.
“Data colonialism… is less about direct force and more about the capture of human life for profit, where data itself becomes the raw material. This extractive logic extends into AI, shaping algorithms to serve dominant interests and marginalizing those whose experiences don’t fit the profitable narrative.” ~ Nick Couldry & Ulises A. Mejias (Authors of “The Costs of Connection: How Data Is Colonizing Human Life and Appropriating It for Capitalism”)
Why It Matters More Now Than Ever
LLMs aren’t just fact-recall machines. Increasingly, they:
Summarize spiritual texts interpretively
Offer emotional support that requires nuanced interpretation
Write stories and reflections
Respond to moral dilemmas
These are interpretive acts. They require more than pattern recognition; they require presence, context, and historical continuity.

When the model lacks access to erased or undervalued knowledge traditions, or is actively trained on politically cleansed information, it doesn’t just hallucinate. It misrepresents the moral landscape of our species.
From Data Scarcity to Meaning Deficit
This is not just a technical issue — it’s a civilizational one. If LLMs evolve without honoring interpretive depth, we risk creating tools that can:
Perform a simulacrum of empathy without genuine understanding
Resolve queries superficially without true wisdom
Amplify fragments of culture while excluding most of it
Reflect a sanitized, politically convenient version of reality rather than the rich, complex tapestry of human thought
That’s not artificial intelligence. That’s selective amnesia at scale.
“AI systems are not just technical artifacts; they are political instruments. Their data sets reflect historical biases and contemporary power struggles, making visible the often-invisible systems of classification and control that shape our societies. To understand AI, we must understand the historical forces that have created its data.” ~ Kate Crawford (Distinguished Research Professor and author of “Atlas of AI: Power, Politics, and the Planetary Costs of Artificial Intelligence”)
Rebalancing the System
To recover what has been lost, and to resist what is being actively erased, we need a countercurrent — an intentional effort to:
Digitize and reincorporate erased traditions
Fund oral history transcription, indigenous memory projects, and ritual archives
Create training models grounded in pluralism, not just productivity
Actively safeguard and integrate content from diverse perspectives, even those targeted for removal from public platforms.
This countercurrent demands active human intervention and a profound re-evaluation of what constitutes “data.” Because the internet is not the world. It’s a convenient hallucination of it.
And if our most powerful tools are shaped only by what survived capitalism, conquest, compression, or political purification — they will fail to reflect what makes us whole.
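To make the idea of training models “grounded in pluralism, not just productivity” slightly more concrete, consider one small, familiar lever from corpus construction: temperature-based reweighting of the data mixture, so that scarce traditions are sampled more often than their raw volume would allow. The sketch below is a minimal illustration under invented labels and corpus sizes; the function name, tags, and numbers are assumptions, not a description of any existing pipeline.

```python
# Hypothetical sketch: temperature-based reweighting of a training
# mixture so that scarce knowledge traditions are not drowned out by
# sheer volume. All labels, sizes, and names here are illustrative.
from collections import Counter
import random

def rebalanced_sample(documents, k, temperature=0.5):
    """Sample k documents, upweighting scarce 'tradition' tags.

    temperature=1.0 reproduces the raw distribution; values below 1.0
    flatten it, giving rare traditions a larger share of the mixture.
    """
    counts = Counter(doc["tradition"] for doc in documents)
    total = sum(counts.values())
    # Target mixture: share_i proportional to (n_i / total) ** temperature
    target = {t: (n / total) ** temperature for t, n in counts.items()}
    norm = sum(target.values())
    target = {t: w / norm for t, w in target.items()}
    # Per-document weight = target share of its tradition / natural share,
    # then sample with replacement according to those weights.
    weights = [
        target[d["tradition"]] / (counts[d["tradition"]] / total)
        for d in documents
    ]
    return random.choices(documents, weights=weights, k=k)

corpus = (
    [{"tradition": "mainstream_web"}] * 9000
    + [{"tradition": "oral_history"}] * 800
    + [{"tradition": "indigenous_cosmology"}] * 200
)
batch = rebalanced_sample(corpus, k=1000)
print(Counter(doc["tradition"] for doc in batch))
```

Reweighting alone cannot recover what was never digitized, but it shows that even the most mechanical stage of curation encodes a choice about whose knowledge counts.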
A Mirror Worth Looking Into
LLMs are mirrors. But a mirror only shows you what it has been trained to see.
If we train our systems on only one side of what it means to be human, we shouldn’t be surprised when they struggle to reflect us fully — or when we begin to forget what the other side looked like in the first place.
The age of AI doesn’t ask us to choose between reason and meaning. It asks us to reunite them — across time, culture, and memory — to foster better solutions, more equitable systems, and deeper understanding.
Because the future isn’t just something we code. It’s something we remember, reclaim, and interpret — together.
What if, instead of merely reflecting the data it receives, an LLM framework was designed to actively seek out its own blind spots? What if it could identify deliberate erasures, flag historical omissions, and even prompt for the voices specifically silenced? Such a system wouldn’t just be trained on what remains; it would be built to recognize what has been subtracted, serving as a vigilant guardian of lost narratives and a persistent advocate for the preservation of every human perspective, even — especially — those currently under threat of deletion.
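Such a framework remains speculative, but its first component could be quite plain: an explicit audit that compares what a corpus contains against a declared checklist of knowledge traditions and names what is absent or thin, rather than letting silence pass as neutrality. The sketch below is a hypothetical illustration; the checklist, labels, and threshold are assumptions, not an existing tool or standard.

```python
# Hypothetical "blind-spot audit": compare what a corpus contains
# against an explicit checklist of knowledge traditions and report
# what is absent or thinly represented. Labels and thresholds are
# illustrative assumptions, not a real standard.
from collections import Counter

EXPECTED_TRADITIONS = {
    "indigenous_cosmology",
    "oral_history",
    "global_south_scholarship",
    "ritual_and_spiritual_practice",
    "suppressed_contemporary_voices",
    "mainstream_web",
}

def audit_coverage(documents, expected=EXPECTED_TRADITIONS, min_share=0.02):
    """Return traditions that are missing or fall below a minimum share."""
    counts = Counter(doc["tradition"] for doc in documents)
    total = sum(counts.values()) or 1
    report = {}
    for tradition in sorted(expected):
        share = counts.get(tradition, 0) / total
        if share == 0:
            report[tradition] = "ABSENT"
        elif share < min_share:
            report[tradition] = f"UNDERREPRESENTED ({share:.1%})"
    return report

corpus = [{"tradition": "mainstream_web"}] * 9850 + [{"tradition": "oral_history"}] * 150
for tradition, status in audit_coverage(corpus).items():
    print(f"{tradition}: {status}")
```

A report like this could be surfaced to curators, or folded into the model’s own context so it can say “the record is silent here” instead of papering over the gap, which is exactly the shift from reflecting what remains to recognizing what has been subtracted.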