Hallucination Despite Retrieved Context

The Problem

The LLM adds fabricated details even when relevant context is retrieved, mixing real retrieved information with invented facts.

Symptoms

  • ❌ Adds details not in context

  • ❌ Embellishes with plausible but false info

  • ❌ Correct facts + wrong details combined

  • ❌ Cannot distinguish source of claims

  • ❌ Confident delivery of mixed truth/fiction

Real-World Example

Retrieved context:
"Premium plan includes 5 team members and 100GB storage"

User query: "What's in premium plan?"

AI response: "Premium plan includes 5 team members, 100GB storage,
priority email support (24h response), and access to beta features."

Context ONLY mentioned: 5 members, 100GB
AI INVENTED: Priority support, beta access
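The failure is easiest to see in how the prompt is usually assembled. Below is a minimal sketch in Python (the function name and prompt wording are illustrative assumptions, not a specific library's API): the retrieved context is pasted above the question with no grounding rules, so nothing stops the model from blending it with details remembered from pretraining.

# Minimal sketch of the naive prompt assembly behind this failure mode.
# Function and variable names are illustrative, not from a specific library.

def build_naive_rag_prompt(context: str, query: str) -> str:
    """Paste retrieved context above the question with no grounding rules.

    Nothing here forbids the model from blending the context with
    plausible details remembered from pretraining.
    """
    return (
        "Use the following context to answer the question.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}\n"
        "Answer:"
    )

if __name__ == "__main__":
    context = "Premium plan includes 5 team members and 100GB storage"
    query = "What's in premium plan?"
    print(build_naive_rag_prompt(context, query))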

Deep Technical Analysis

Retrieval-Generation Gap

Incomplete Context: retrieved chunks rarely cover every detail a query implies, so the model fills the gaps from its parametric memory instead of stopping at what was retrieved.

The Helpful Assistant Dilemma: instruction-tuned models are rewarded for complete, helpful-sounding answers, so admitting "the context doesn't say" loses out to a confident, embellished response.

Pattern Completion

Training Data Influence: the model has seen countless similar pricing and plan pages during pretraining, so it completes the familiar pattern with features that typically appear there (priority support, beta access), even though this context never mentions them.

Weak Grounding

Instruction Adherence Limits: prompts like "answer only from the context" reduce hallucination but do not eliminate it; adherence weakens as the context grows longer and the question drifts from what the chunks cover.

Citation as Constraint: requiring a citation for every claim anchors each statement to a specific context span, which makes ungrounded claims easy to detect and reject.
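A minimal sketch of how that constraint can be checked mechanically, assuming the answer carries [n] markers pointing at numbered context chunks; the regexes and function name are illustrative, not an established tool.

import re

# Sketch of using citations as a grounding constraint: any sentence that
# cites nothing, or cites a chunk that does not exist, gets flagged.

def uncited_claims(answer: str, num_chunks: int) -> list[str]:
    """Return sentences with no citation or with an out-of-range chunk ID."""
    flagged = []
    for sentence in re.split(r"(?<=[.!?])\s+", answer.strip()):
        if not sentence:
            continue
        ids = [int(m) for m in re.findall(r"\[(\d+)\]", sentence)]
        if not ids or any(i < 1 or i > num_chunks for i in ids):
            flagged.append(sentence)
    return flagged

if __name__ == "__main__":
    answer = (
        "Premium plan includes 5 team members [1]. "
        "It also comes with priority email support."  # no citation -> flagged
    )
    print(uncited_claims(answer, num_chunks=1))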


How to Solve

  • Require citations for all claims

  • Use explicit prompts: "If not in context, say 'not available in documentation'"

  • Implement a two-stage pipeline: extract facts first, then answer using only the extracted facts (see the sketch after this list)

  • Use models fine-tuned for RAG-style instruction following

  • Apply post-generation fact-checking against the retrieved context

  • Penalize hallucination in evaluation metrics

See Hallucination Prevention.
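A minimal sketch of the two-stage approach, assuming an OpenAI-compatible chat client; the model name, prompt wording, and the "NONE" sentinel are placeholders, not a definitive implementation.

from openai import OpenAI

# Two-stage "extract facts, then answer" sketch. Assumes an OpenAI-compatible
# client; model name and prompt wording are placeholders.

client = OpenAI()
MODEL = "gpt-4o-mini"  # placeholder model name

def ask(prompt: str) -> str:
    response = client.chat.completions.create(
        model=MODEL,
        temperature=0,  # reduce creative embellishment
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

def answer_grounded(context: str, query: str) -> str:
    # Stage 1: pull out only the facts that are literally in the context.
    facts = ask(
        "List, verbatim, every fact in the context relevant to the question. "
        "If none are relevant, write 'NONE'.\n\n"
        f"Context:\n{context}\n\nQuestion: {query}"
    )
    if facts.strip() == "NONE":
        return "not available in documentation"

    # Stage 2: answer using only the extracted facts, nothing else.
    return ask(
        "Answer the question using ONLY these extracted facts. "
        "If they are insufficient, say 'not available in documentation'.\n\n"
        f"Facts:\n{facts}\n\nQuestion: {query}"
    )

if __name__ == "__main__":
    context = "Premium plan includes 5 team members and 100GB storage"
    print(answer_grounded(context, "What's in premium plan?"))

The second stage can double as the post-generation check: if the final answer contains claims absent from the extracted facts, reject or regenerate it.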
