Document Version Conflicts

The Problem

Multiple versions of the same document coexist in the knowledge base, causing AI to cite outdated or conflicting information.

Symptoms

  • ❌ Old and new versions both retrieved

  • ❌ Conflicting information in responses

  • ❌ "v1" and "v2" docs both present

  • ❌ Cannot determine which is current

  • ❌ Stale info mixed with current

Real-World Example

Knowledge base contains:
→ "API_Guide_v1.0.pdf" (2022): Rate limit 100/hour
→ "API_Guide_v2.0.pdf" (2024): Rate limit 1000/hour
→ "API_Guide_v2.1.pdf" (2024): Rate limit 1500/hour

Query: "What's the API rate limit?"

Retrieved chunks from all three versions:
→ AI response: "Rate limit ranges from 100 to 1500 per hour
depending on version."

Confusing - which is current?
User wants latest only

Deep Technical Analysis

Version Tracking Challenges

No Version Metadata:

Version Detection:

Versioning Strategies

Explicit Version Metadata:

Version Lifecycle:

Archival vs Deletion

Keep Old Versions:

Delete Old Versions:

Conflict Resolution

LLM Arbitration:

Recency Boosting:


How to Solve

Track version explicitly in metadata (version number + is_latest flag) + implement version lifecycle (mark old as non-latest on new upload) + filter retrieval to is_latest=true by default + optionally delete old versions if no archival need + parse version from filename or document properties + boost recent versions in ranking + test that old versions don't appear in responses. See Version Control.

Last updated