# Document Version Conflicts

## The Problem

Multiple versions of the same document coexist in the knowledge base, so the AI retrieves and cites outdated or conflicting information.

### Symptoms

* ❌ Old and new versions both retrieved
* ❌ Conflicting information in responses
* ❌ "v1" and "v2" docs both present
* ❌ Cannot determine which is current
* ❌ Stale info mixed with current

### Real-World Example

```
Knowledge base contains:
→ "API_Guide_v1.0.pdf" (2022): Rate limit 100/hour
→ "API_Guide_v2.0.pdf" (2024): Rate limit 1000/hour
→ "API_Guide_v2.1.pdf" (2024): Rate limit 1500/hour

Query: "What's the API rate limit?"

Retrieved chunks from all three versions:
→ AI response: "Rate limit ranges from 100 to 1500 per hour
depending on version."

Confusing - which is current?
User wants latest only
```

***

## Deep Technical Analysis

### Version Tracking Challenges

**No Version Metadata:**

```
Common problem:
→ Documents ingested without version tracking
→ Metadata: {document_id: "api_guide"}
→ No version field

New version ingested:
→ Same document_id
→ Both coexist
→ Cannot distinguish
```

**Version Detection:**

```
Filename-based:
→ "guide_v1.pdf", "guide_v2.pdf"
→ Parse version from filename

Metadata-based:
→ Document properties: Version 2.1
→ Last modified: 2024-03-15

Content-based:
→ "Version 2.0" in document text
→ Less reliable
```
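As a rough illustration of the filename-based approach, the sketch below pulls a version out of names such as `guide_v2.pdf`. The regular expression and the `parse_version` helper are assumptions made for this example, not part of any particular ingestion tool; real filenames may need a different pattern.

```python
import re
from typing import Optional, Tuple

# Hypothetical pattern: a separator, then "v", then dotted digits (e.g. "_v2.1").
VERSION_RE = re.compile(r"[_\-\s]v(\d+(?:\.\d+)*)", re.IGNORECASE)


def parse_version(filename: str) -> Optional[Tuple[int, ...]]:
    """Return the version as a tuple of ints, or None if no version is found."""
    match = VERSION_RE.search(filename)
    if match is None:
        return None
    # Tuples of ints compare numerically, so (2, 10) sorts after (2, 9).
    return tuple(int(part) for part in match.group(1).split("."))


files = ["API_Guide_v1.0.pdf", "API_Guide_v2.0.pdf", "API_Guide_v2.1.pdf"]
newest = max(files, key=lambda name: parse_version(name) or ())
```

Returning a tuple of integers rather than the raw string avoids the string-ordering trap where `"2.10"` sorts before `"2.9"`.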

### Versioning Strategies

**Explicit Version Metadata:**

```
Store with each chunk:
{
  document_id: "api_guide",
  version: "2.1",
  published_date: "2024-03-15",
  is_latest: true
}

Retrieval filter:
WHERE document_id = "api_guide" AND is_latest = true
→ Only get current version
```
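The same idea can be sketched in Python with an in-memory list standing in for the vector store. The `Chunk` class and `latest_chunks` function are illustrative names only; a real store would apply the equivalent metadata filter on the server side.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Chunk:
    text: str
    document_id: str
    version: str
    published_date: str  # ISO 8601 date string, e.g. "2024-03-15"
    is_latest: bool


def latest_chunks(chunks: List[Chunk], document_id: str) -> List[Chunk]:
    """Equivalent of: WHERE document_id = ... AND is_latest = true."""
    return [c for c in chunks if c.document_id == document_id and c.is_latest]


store = [
    Chunk("Rate limit 100/hour", "api_guide", "1.0", "2022-01-01", False),
    Chunk("Rate limit 1500/hour", "api_guide", "2.1", "2024-03-15", True),
]
current = latest_chunks(store, "api_guide")
```

The publication date for version 1.0 above is a placeholder chosen for the example.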

**Version Lifecycle:**

```
New version ingested:
1. Set existing chunks with the same document_id: is_latest = false
2. Add new chunks: is_latest = true
3. Optionally: Delete old versions (if no archival need)

Result: the default filter always returns the current version, with no manual cleanup
```
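The lifecycle steps above might look like the following sketch. It assumes chunks are plain dictionaries held in a list; `ingest_new_version` and its `keep_archive` flag are hypothetical names for this example. Only chunks that share the incoming `document_id` are demoted, so other documents are left untouched.

```python
from typing import Dict, List


def ingest_new_version(
    store: List[Dict],
    document_id: str,
    version: str,
    texts: List[str],
    keep_archive: bool = True,
) -> List[Dict]:
    """Return a new store in which only the incoming version is marked latest."""
    if keep_archive:
        # Step 1: demote existing chunks of this document, leave others alone.
        kept = [
            {**c, "is_latest": False} if c["document_id"] == document_id else c
            for c in store
        ]
    else:
        # Step 3 (optional): drop the old chunks of this document entirely.
        kept = [c for c in store if c["document_id"] != document_id]

    # Step 2: add the new chunks, flagged as the current version.
    new_chunks = [
        {"document_id": document_id, "version": version, "text": t, "is_latest": True}
        for t in texts
    ]
    return kept + new_chunks
```

In a real store, the demote and insert steps should ideally run as one transaction so that a query never sees zero or two "latest" versions.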

### Archival vs Deletion

**Keep Old Versions:**

```
Reasons to archive:
→ Compliance (retain historical docs)
→ Support legacy product versions
→ Audit trail

Strategy:
→ Keep but mark as archived
→ Filter out by default
→ Available on explicit request

Example filter:
WHERE is_latest = true OR (version = "1.0" AND user_needs_legacy)
```
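A minimal sketch of that default-plus-opt-in filter is shown below. The `legacy_version` parameter plays the role of `user_needs_legacy` in the example filter; both function names are assumptions for illustration.

```python
from typing import Dict, List, Optional


def is_visible(chunk: Dict, legacy_version: Optional[str] = None) -> bool:
    """Latest chunks are always visible; archived ones only on explicit request."""
    if chunk["is_latest"]:
        return True
    return legacy_version is not None and chunk["version"] == legacy_version


def filter_chunks(chunks: List[Dict], legacy_version: Optional[str] = None) -> List[Dict]:
    return [c for c in chunks if is_visible(c, legacy_version)]


chunks = [
    {"version": "1.0", "is_latest": False, "text": "Rate limit 100/hour"},
    {"version": "2.1", "is_latest": True, "text": "Rate limit 1500/hour"},
]
default_view = filter_chunks(chunks)                      # current version only
legacy_view = filter_chunks(chunks, legacy_version="1.0")  # current plus v1.0
```

Note that, like the example filter, the legacy view still includes the latest version; a caller that wants only the archived version would need a stricter condition.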

**Delete Old Versions:**

```
Simpler approach:
→ New version → delete old chunks entirely
→ Only current version exists

Pros:
+ No confusion
+ Cleaner
+ Lower storage

Cons:
- No historical reference
- Cannot support legacy
```

### Conflict Resolution

**LLM Arbitration:**

```
If multiple versions retrieved:
→ Prompt: "Prefer latest version"
→ AI should cite v2.1 over v1.0

But: Requires LLM to detect versions
→ Not 100% reliable

Better: Filter at retrieval
```
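If filtering at retrieval is not yet in place, a deterministic guard applied to the retrieved chunks is usually more dependable than a prompt instruction. The sketch below keeps only the highest version per `document_id` before the chunks reach the LLM. It assumes each chunk carries a dotted numeric `version` string; the helper names are hypothetical.

```python
from typing import Dict, List, Tuple


def version_key(version: str) -> Tuple[int, ...]:
    """Turn "2.1" into (2, 1) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def keep_newest_per_document(chunks: List[Dict]) -> List[Dict]:
    """Drop any retrieved chunk that is not from its document's newest version."""
    newest: Dict[str, Tuple[int, ...]] = {}
    for c in chunks:
        key = version_key(c["version"])
        if c["document_id"] not in newest or key > newest[c["document_id"]]:
            newest[c["document_id"]] = key
    return [c for c in chunks if version_key(c["version"]) == newest[c["document_id"]]]


retrieved = [
    {"document_id": "api_guide", "version": "1.0", "text": "Rate limit 100/hour"},
    {"document_id": "api_guide", "version": "2.0", "text": "Rate limit 1000/hour"},
    {"document_id": "api_guide", "version": "2.1", "text": "Rate limit 1500/hour"},
]
context = keep_newest_per_document(retrieved)
```

This only compares versions among the chunks that were actually retrieved, so it complements the metadata filter rather than replacing it.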

**Recency Boosting:**

```
Boost recent documents:
→ score = similarity_score * recency_boost
→ recency_boost = 1.0 / (1.0 + days_since_publish / 365)
→ Boost decays with age: 1.0 on publish day, 0.5 after one year

Recent docs rank higher:
→ v2.1 (2024) beats v1.0 (2022) in ranking
```
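For recent documents to rank higher, the boost has to shrink as a document ages. The sketch below uses one simple decay, `1 / (1 + age_in_years)`, which is an assumption chosen for this example; exponential decay or a fixed bonus for the latest version would work as well. The function names and the `scale_days` parameter are illustrative.

```python
from datetime import date


def recency_boost(published: date, today: date, scale_days: float = 365.0) -> float:
    """1.0 for a document published today, 0.5 after scale_days, approaching 0 with age."""
    days_since_publish = max((today - published).days, 0)
    return 1.0 / (1.0 + days_since_publish / scale_days)


def boosted_score(similarity: float, published: date, today: date) -> float:
    return similarity * recency_boost(published, today)


today = date(2025, 3, 15)
# Same similarity, different publication dates: the newer document wins.
score_v1 = boosted_score(0.80, date(2022, 3, 15), today)
score_v21 = boosted_score(0.80, date(2024, 3, 15), today)
```

Boosting only reorders results. An old chunk with a much higher similarity score can still outrank a new one, so this works best alongside the `is_latest` filter rather than instead of it.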

***

## How to Solve

**Track the version explicitly in metadata (version number plus an is\_latest flag); parse the version from the filename or document properties at ingestion; on each new upload, mark the previous version's chunks as non-latest; filter retrieval to is\_latest=true by default; delete old versions if there is no archival need; boost recent versions in ranking; and test that old versions do not appear in responses.** See [Version Control](/rag-scenarios-and-solutions/data-quality/version-conflicts.md).

