The Confident Liar
A lawyer cited six cases that never existed. Deloitte invoiced two governments for reports with fabricated academic sources. AI doesn't just get things wrong — it gets things wrong with absolute certainty. That's the problem.
01 — What hallucination actually is
AI hallucinations are not glitches. They are a structural feature of how large language models work. LLMs don't retrieve facts from a database — they predict the statistically most probable next word based on patterns learned during training. When the training data runs thin, or when a question requires precise knowledge the model doesn't have, it doesn't stop. It continues predicting. It generates something that sounds correct, in the format of something correct, with the confidence of something correct.
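The mechanism can be caricatured with a toy bigram model, purely illustrative, since real LLMs predict over subword tokens with neural networks. The point to notice: nothing in the loop ever stops to say "I don't know."

```python
import random

# Toy bigram "language model": maps a word to possible next words
# with counts from a tiny made-up corpus. Purely illustrative.
BIGRAMS = {
    "the":   {"court": 3, "study": 2, "model": 1},
    "court": {"ruled": 4, "found": 2},
    "study": {"found": 5, "showed": 3},
    "model": {"predicts": 4, "generates": 2},
}

def next_word(word):
    """Pick the statistically most probable continuation.
    There is no 'I don't know' branch: when the word is unseen,
    we grab an arbitrary known distribution and still emit its
    most likely word, mimicking how a real model keeps going."""
    options = BIGRAMS.get(word)
    if options is None:
        options = random.choice(list(BIGRAMS.values()))
    return max(options, key=options.get)

def generate(start, length=4):
    words = [start]
    for _ in range(length):
        words.append(next_word(words[-1]))
    return " ".join(words)

print(generate("the"))  # always fluent-sounding, regardless of truth
```

Run out of real knowledge and the loop does not halt; it produces the most statistically plausible continuation anyway, which is exactly the failure mode described above.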
Fabricated facts: The model invents a fact that has no basis in reality — statistics, events, people, relationships — and presents it as established knowledge.
Fabricated citations: The model generates plausible-looking references — real-sounding journals, real-sounding authors, real-sounding titles — that do not exist. URLs may look valid. Authors may be real but never wrote the cited work.
Unfaithful summaries: The model summarises a real document — but introduces facts, conclusions, or quotes that were never in the source. It stays "on topic" while inventing the detail.
When AI models hallucinate, they use more confident language than when they are correct. Models are 34% more likely to say "definitely," "certainly," and "without doubt" when generating incorrect information.[MIT] The more wrong the model is, the more certain it sounds. Confidence is not a signal of accuracy. It is the opposite.
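That finding suggests a crude lint you can run on any AI output: flag the confidence markers and treat each hit as a prompt to verify, not as proof of error. The marker list below is illustrative, not exhaustive.

```python
# Crude "overconfidence lint": markers like these appear more often
# when models are wrong. A flag means "check this claim first",
# not "this claim is false".
CONFIDENCE_MARKERS = (
    "definitely", "certainly", "without doubt", "undoubtedly",
    "it is well established", "absolutely",
)

def overconfidence_flags(text):
    """Return every confidence marker present in the text."""
    lowered = text.lower()
    return [m for m in CONFIDENCE_MARKERS if m in lowered]

flags = overconfidence_flags(
    "The case was definitely decided in 1987, without doubt.")
print(flags)  # ['definitely', 'without doubt']
```

A sentence that trips two or three markers while making a checkable factual claim is exactly the sentence to verify before you act on it.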
AI hallucination isn't a bug to be patched — it's a structural feature of how LLMs work. A 2025 mathematical proof confirmed it cannot be fully eliminated under current architectures.
This changes how you should interact with every AI tool. The confident, authoritative tone isn't evidence of accuracy — it's evidence that the model successfully predicted a fluent-sounding sequence. Treat every AI output as a draft that requires verification, not a source of truth.
02 — How bad is it, actually?
Hallucination rates vary enormously by model, task type, and how you measure them. The best models, on controlled summarisation benchmarks, are now below 1%. But "summarising a short document" is not the same as "answering a complex question about a real person, case, or event" — and on those harder tasks, rates climb dramatically.
The headline 0.7% rate is for controlled document summarisation. For the tasks most people actually use AI for — research, citations, complex questions about real people and events — rates climb to 33–48%.
The gap between marketing claims and real-world performance is enormous. When a vendor says "our model hallucinates less than 1%," ask: on what benchmark, for what task? The number that matters is the one for your use case, not the one on the leaderboard.
03 — When it goes wrong at scale
These are not hypothetical examples. Every case below is documented and consequential. Taken together, they reveal a pattern: hallucinations cause the most damage when they occur in high-trust, high-stakes contexts — precisely the environments where AI adoption is being pushed hardest.
Mata v. Avianca: The Ghost Docket
A New York attorney used ChatGPT to draft court filings and submitted six completely fabricated legal citations — real-sounding case names, real-sounding courts, real-sounding outcomes. When opposing counsel challenged them, the attorney claimed he didn't know ChatGPT was generative rather than a legal database. A federal judge sanctioned both lawyers.[MATA] By May 2025, lawyers were submitting hallucinated content to US courts at a rate of two to three cases per day.[DC]
Deloitte: Two governments, two hallucinated reports
In July 2025, Deloitte submitted a $290,000 report to the Australian government containing fabricated academic citations, non-existent footnotes, and an invented quote from a federal court judge. Weeks later, a $1.6M CAD (~$1.13M USD) Deloitte report for the Canadian government on its healthcare workforce was found to contain similar AI-generated errors, including fictional academic papers.[DEL] Both reports were live on government websites before anyone noticed.
OpenAI Whisper: Words that were never spoken
OpenAI's Whisper speech-to-text model, widely deployed in healthcare settings for clinical transcription, was found to hallucinate content during transcription — inserting words and phrases that were never spoken, including violent language, racial references, and invented medical treatment names.[WHSP] Even the best models hallucinated potentially harmful medical information 2.3% of the time when tested on medical questions.
Chicago Sun-Times: The books that don't exist
Readers of the Chicago Sun-Times discovered that a "Summer Reading List for 2025" included 10 fabricated book titles attributed to real authors — books that were convincingly described but had never been written. Only 5 of 15 titles were real. The list had been generated with AI by an outside content supplier and published in the print edition.[AAI] In Q1 2025 alone, 12,842 AI-generated articles were removed from online platforms due to hallucinated content.
Hallucinations cause the most damage in high-trust, high-stakes contexts — law, medicine, government policy — precisely the environments where AI adoption is being pushed hardest.
The pattern is clear: the consequences scale with the authority of the context. A hallucinated email is an embarrassment. A hallucinated court filing is a career-ending sanction. A hallucinated medical transcription is a patient safety incident. Know the stakes before you deploy.
04 — The business cost
Hallucinations are not just a reputational problem — they are a financial one. The cost accumulates through lost time, remediation work, legal exposure, and decisions made on bad information.
Legal exposure timeline
Mata v. Avianca: first major US case.[MATA] 7 in 10 hallucination court cases involve self-represented litigants. Judges begin issuing standing AI disclosure orders.
Walters v. OpenAI: defamation claim over hallucinated output proceeds through courts. Air Canada chatbot ruling sets corporate liability precedent. Courts establish that hallucinations about real people are "publication risks."
By May 2025, lawyers submit hallucinated AI content to US courts at 2–3 cases per day.[DC] 13 of 23 court-caught hallucination cases are now the fault of qualified legal professionals, not laypeople.
Researcher Damien Charlotin's AI Hallucination Cases Database now tracks 1,081 documented court cases globally.[DC] The hallucination detection market grew 318% between 2023 and 2025 as enterprises scrambled for solutions.
The numbers compound: at $14,200 per employee annually and 4.3 hours weekly spent fact-checking, the "productivity gains" from AI are being significantly offset by the verification tax.
Factor this into every AI ROI calculation. The 47% figure — nearly half of enterprise AI users making decisions on hallucinated content — means your organisation probably has unaudited AI exposure right now. Map it before it maps you.
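The verification-tax arithmetic above is easy to run for your own organisation. In this sketch, the $14,200 and 4.3-hour figures are the article's; headcount, working weeks, and the fully loaded hourly rate are assumptions to swap for your own.

```python
# Back-of-envelope verification-tax estimate.
EMPLOYEES = 500                      # assumed headcount
ANNUAL_HALLUCINATION_COST = 14_200   # $ per employee per year (article figure)
VERIFY_HOURS_PER_WEEK = 4.3          # fact-checking time (article figure)
WORK_WEEKS = 48                      # assumed working weeks per year
HOURLY_COST = 60                     # assumed fully loaded $/hour

# Hours spent verifying AI output, priced at loaded labour cost.
verify_cost = EMPLOYEES * VERIFY_HOURS_PER_WEEK * WORK_WEEKS * HOURLY_COST
# Direct per-employee hallucination cost, scaled to headcount.
direct_cost = EMPLOYEES * ANNUAL_HALLUCINATION_COST

print(f"Verification tax: ${verify_cost:,.0f}/year")
print(f"Direct hallucination cost: ${direct_cost:,.0f}/year")
```

At these assumed rates a 500-person organisation pays millions per year in verification labour alone, before counting any direct losses, which is why the line item belongs in the ROI model rather than a footnote.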
05 — What actually reduces hallucinations
The industry is not standing still. Four mitigation strategies have demonstrated measurable results — though none eliminates the problem entirely, because elimination is architecturally impossible under current LLM design.
Retrieval-augmented generation (RAG): Instead of answering from parametric memory, the model retrieves relevant documents from a verified knowledge base before generating a response. This grounds output in actual source material. Most effective for domain-specific enterprise deployments where the knowledge base can be curated and verified.
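A minimal sketch of the retrieve-then-generate pattern, assuming a hypothetical `call_llm` function and using naive keyword overlap in place of a real embedding index:

```python
# Minimal RAG sketch: retrieve from a curated knowledge base, then
# instruct the model to answer ONLY from the retrieved passages.
# `call_llm` is a hypothetical stand-in for any chat-model API.

KNOWLEDGE_BASE = [
    "Policy 4.2: refunds are issued within 14 days of approval.",
    "Policy 7.1: claims above $10,000 require director sign-off.",
]

def retrieve(question, k=2):
    """Naive keyword-overlap retrieval; production systems use
    embedding similarity over a vector index instead."""
    def score(doc):
        return len(set(question.lower().split()) & set(doc.lower().split()))
    return sorted(KNOWLEDGE_BASE, key=score, reverse=True)[:k]

def build_prompt(question):
    context = "\n".join(retrieve(question))
    return (
        "Answer using ONLY the sources below. If they do not contain "
        "the answer, say 'I don't know.'\n\n"
        f"Sources:\n{context}\n\nQuestion: {question}"
    )

prompt = build_prompt("How many days until refunds are issued?")
# response = call_llm(prompt)  # hypothetical model call
```

The design choice that matters is the instruction: the model is told to answer only from retrieved text and to abstain otherwise, which is why a curated, verified knowledge base does most of the work.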
Forced citations: Requiring the model to cite a source for every factual claim — and then verifying that the cited source exists and contains the claimed information — catches fabricated citations before they leave the system. Perplexity's model is built on this principle; even so, it hallucinated 37% of citations in the Columbia Journalism Review test.[CJR]
Human review: 76% of enterprises now run explicit human review processes before AI outputs are used in high-stakes decisions.[DRN] This is not a technical solution; it is an acknowledgement that current AI cannot be trusted without human oversight. The cost: 4.3 hours per worker per week.[MSFT]
Abstention: The most underused technique. Models can be trained to abstain from answering when uncertain rather than fabricating.[MIT] MIT research shows models that abstain 52% of the time dramatically cut error rates. The industry pressure to be helpful works against abstention — but this is where architecture and incentives diverge most dangerously.
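Abstention reduces to a threshold gate. In this sketch, `answer_with_confidence` is a hypothetical stand-in: real systems estimate confidence from token log-probabilities or from agreement across several sampled answers.

```python
# Abstention sketch: answer only when confidence clears a threshold,
# otherwise decline instead of fabricating.

CONFIDENCE_THRESHOLD = 0.75

def answer_with_confidence(question):
    """Hypothetical stand-in that returns (answer, confidence).
    The canned values below exist only to make the gate testable."""
    fake = {
        "capital of France?": ("Paris", 0.98),
        "cite a 1987 maritime case": ("Smith v. ...", 0.31),
    }
    return fake.get(question, ("...", 0.0))

def guarded_answer(question):
    answer, confidence = answer_with_confidence(question)
    if confidence < CONFIDENCE_THRESHOLD:
        return "I don't know."  # abstain rather than fabricate
    return answer

print(guarded_answer("capital of France?"))         # Paris
print(guarded_answer("cite a 1987 maritime case"))  # I don't know.
```

The threshold is the incentive knob: raise it and the system becomes more honest but less "helpful", which is exactly the trade-off the industry currently resolves in the wrong direction.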
No single technique eliminates hallucinations — because elimination is architecturally impossible. The winning strategy is layered: RAG for grounding, forced citations for verification, human review for high-stakes decisions, and abstention for honest uncertainty.
If you're deploying AI in production, you need all four. If you're using AI personally, the minimum is simple: verify every important claim before acting on it. When an AI says "I don't know" — that's the system working correctly, not failing.
AI uses 34% more confident language when generating incorrect information than when it is correct.[MIT]
1,081 documented court cases involving AI hallucinations globally, and counting.[DC]
06 — Don't get caught out
Four rules to protect yourself from AI hallucinations.
Subscribe to Veltrix Collective for weekly AI reality checks. We track hallucination rates, mitigation tools, and case law so you don't have to. Every Tuesday, data-backed, jargon-free. One email that keeps you ahead of the curve.
Never trust a confident AI response without verification. MIT research shows AI is 34% more likely to use confident language when wrong. Treat certainty as a red flag, not a green light. When ChatGPT, Claude, or Gemini sounds absolutely sure — that's when you check.
Verify every AI-generated citation before using it. Open the source. Check it exists. Read it. The Columbia Journalism Review found even the best citation-aware model gets 1 in 3 citations wrong. Use Google Scholar, Semantic Scholar, or CrossRef to verify before you cite.
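Citation checks against CrossRef can be scripted. This is a rough sketch using CrossRef's public works endpoint; `titles_match` is a deliberately loose heuristic of my own, not part of CrossRef's API, and a hit still deserves a human read of the actual paper.

```python
import json
import urllib.parse
import urllib.request

def crossref_lookup(title, rows=5):
    """Query the public CrossRef API for works matching a title.
    (Network call; CrossRef asks for a descriptive User-Agent.)"""
    url = ("https://api.crossref.org/works?rows=%d&query.bibliographic=%s"
           % (rows, urllib.parse.quote(title)))
    req = urllib.request.Request(url, headers={"User-Agent": "cite-check/0.1"})
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.load(resp)["message"]["items"]

def titles_match(claimed, found):
    """Loose heuristic: every significant word of the claimed title
    appears in the candidate title CrossRef returned."""
    claimed_words = {w for w in claimed.lower().split() if len(w) > 3}
    found_words = set(found.lower().split())
    return claimed_words <= found_words

def citation_exists(title):
    """True if any CrossRef candidate loosely matches the claimed title."""
    return any(titles_match(title, (item.get("title") or [""])[0])
               for item in crossref_lookup(title))
```

Calling `citation_exists(...)` on each AI-supplied reference before you cite it turns "open the source and check it exists" into a one-line habit; it needs network access, and a False is a strong signal the citation was fabricated.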
Set up a RAG pipeline for any repeated high-stakes AI task. Tools like Claude with file uploads, ChatGPT with browsing, or enterprise solutions like LangChain and n8n can ground AI responses in verified documents. This cuts hallucinations by ~71% on grounded tasks.
Audit where AI outputs enter your decision process this week. The 47% figure — nearly half of enterprise AI users making decisions on hallucinated content — means your organisation probably has unaudited AI exposure. Map it. Then add human checkpoints at every high-stakes node.