Conflicting Sources in Context

The Problem

Retrieved chunks contain contradictory information from different documents, causing AI to provide inconsistent or confused answers.

Symptoms

❌ AI hedges: "Some sources say X, others say Y"
❌ Contradictory statements in response
❌ Unclear which source is authoritative
❌ Old vs new docs both retrieved
❌ Different product tiers mixed up

Real-World Example

Retrieved chunks:
→ Chunk A (2022 docs): "API rate limit is 100 req/hour"
→ Chunk B (2024 docs): "API rate limit is 1000 req/hour"

Both retrieved with similar scores

AI response: "The API rate limit is 100 requests per hour, though
some documentation indicates it may be 1000 requests per hour."

Confused answer - which is correct?

Deep Technical Analysis

Version Conflicts

Document Versioning:

Same document, multiple versions:
→ v1.0: Feature X works this way
→ v2.0: Feature X redesigned

Both embedded in knowledge base:
→ No version filtering
→ Both retrieved
→ AI must reconcile incompatible info

Timestamp Metadata:

Solution: Tag with version/date
{
  metadata: {
    version: "2.0",
    published_date: "2024-01-01"
  }
}

Retrieval filter:
→ Prefer latest version
→ Deprioritize old docs

Multi-Source Inconsistencies

Different Systems of Record:

Knowledge base contains:
→ Public docs: Rate limit 100/hour
→ Internal wiki: Rate limit 200/hour (recent increase)
→ API spec: Rate limit 1000/hour (not yet documented)

Which is authoritative?
→ No source prioritization
→ Equal weight
→ Conflicting retrieval

Source Trust Levels:

Assign authority scores:
→ Official API spec: Priority 1
→ Published docs: Priority 2
→ Internal wiki: Priority 3
→ Community forums: Priority 4

Retrieval boosting:
→ Boost high-priority sources
→ Use lower-priority only if high unavailable

Reconciliation Strategies

LLM Arbitration:

Prompt: "If sources conflict, prefer the most recent"

AI must:
→ Detect conflict
→ Check dates
→ Choose most recent
→ But: LLM not 100% reliable at this

Pre-Retrieval Filtering:

Better: Filter before retrieval
→ Only retrieve docs from last 6 months
→ Exclude deprecated content
→ Tag as "archived" and filter out

Prevents conflict at source

Citation Clarity

Source Attribution:

When conflict exists:
"According to API Specification v2.0 (Jan 2024), rate limit
is 1000/hour. Note: Older documentation (2022) states 100/hour
- this is outdated."

Explicit:
→ Which source
→ When published
→ Why one supersedes other

How to Solve

Tag chunks with version/timestamp + filter to most recent docs + assign source authority levels (official > community) + implement conflict detection + prefer highest-priority source + add "last updated" in responses + archive old docs with metadata flag + prompt LLM to prefer recent over old when conflict detected. See Source Conflicts.

PreviousIncomplete Context Assembly NextSource Ranking Issues

Last updated 18 minutes ago

hashtagThe Problem

hashtagSymptoms

hashtagReal-World Example

hashtagDeep Technical Analysis

hashtagVersion Conflicts

hashtagMulti-Source Inconsistencies

hashtagReconciliation Strategies

hashtagCitation Clarity

hashtagHow to Solve