# Multi-Source Sync Conflicts

## The Problem

When the same content is synced from multiple data sources, conflicts arise that cause duplicates, inconsistent updates, or lost changes.

### Symptoms

* ❌ The same document appears twice in search results
* ❌ Conflicting information from Confluence vs. Notion
* ❌ An update in one source isn't reflected in the AI agent's answers
* ❌ No way to determine which source has the latest version
* ❌ Duplicate embeddings waste storage

### Real-World Example

```
"API Authentication Guide" exists in:
1. Confluence: updated 2 days ago (v1.2)
2. Notion: updated yesterday (v1.3)
3. Website /docs/auth: updated today (v1.4)

User asks: "How do I authenticate?"
AI response includes mix of v1.2, v1.3, v1.4 steps
→ Inconsistent, confusing answer
→ User doesn't know which is correct
```

***

## Deep Technical Analysis

### Content Duplication Across Sources

The same logical content often exists in multiple systems:

**The Multi-System Reality:**

```
Modern organizations:
→ Documentation in Confluence
→ Same content copied to Notion (team wiki)
→ Published version on website (public docs)
→ Synced to Zendesk (support articles)
→ Discussed in Slack threads

Result:
→ 5 copies of "same" content
→ Each with slight variations
→ Different update timestamps
→ Different authors
```

**Semantic Duplication Detection:**

```
Simple approach: Exact match
→ Compare title + content hash
→ Only catches identical copies
→ Misses: rewording, formatting differences

Advanced: Semantic similarity
→ Embed both documents
→ Compute cosine similarity
→ If > 0.95: Consider duplicates

But:
→ Expensive (every document must be embedded and compared against candidates from other sources)
→ Threshold tuning (0.95 vs 0.90 vs 0.85?)
→ False positives ("Getting Started" guides for different products)
```
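The two detection tiers above can be combined into a single check: a cheap fingerprint comparison first, then a similarity comparison only when the fingerprints differ. The following is a minimal sketch, not Twig's implementation. The `title`, `body`, and `embedding` keys and all function names are assumptions for illustration, and the embeddings are expected to come from whatever model you already use.

```python
import hashlib
import math
import re


def content_fingerprint(title: str, body: str) -> str:
    """Hash of normalized title + body.

    Normalization (lowercase, collapsed whitespace) lets the hash catch
    copies that differ only in casing or spacing, but not rewording.
    """
    normalized = re.sub(r"\s+", " ", f"{title}\n{body}").strip().lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def is_duplicate(doc_a: dict, doc_b: dict, threshold: float = 0.95) -> bool:
    """Exact (normalized) match first, then semantic similarity above the threshold."""
    fp_a = content_fingerprint(doc_a["title"], doc_a["body"])
    fp_b = content_fingerprint(doc_b["title"], doc_b["body"])
    if fp_a == fp_b:
        return True
    return cosine_similarity(doc_a["embedding"], doc_b["embedding"]) > threshold
```

The threshold stays a parameter because, as noted above, the right value depends on the corpus: a lower value catches more rewordings but also more false positives.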

**The Partial Overlap Problem:**

```
Confluence doc: "Full API Reference" (10,000 words)
Website doc: "API Quick Start" (1,000 words)

Website doc is subset of Confluence doc:
→ Not duplicates
→ But overlapping content
→ Should both be in knowledge base?

Or:
→ Confluence: Sections A, B, C
→ Notion: Sections B, C, D

50% overlap, 50% unique
→ Keep both? (redundancy)
→ Merge? (how?)
→ Prefer one? (which?)
```
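Before choosing between keeping, merging, or preferring one document, it helps to measure how much the two actually overlap. The sketch below assumes each document has already been split into sections and that each section has a stable identifier (for example a normalized heading or a content hash). Both the identifiers and the function name are hypothetical.

```python
def section_overlap(sections_a: set[str], sections_b: set[str]) -> dict:
    """Summarize how two documents' section sets relate to each other."""
    shared = sections_a & sections_b
    union = sections_a | sections_b
    return {
        "shared": sorted(shared),
        "only_a": sorted(sections_a - sections_b),
        "only_b": sorted(sections_b - sections_a),
        # Jaccard index: shared sections divided by all distinct sections.
        "jaccard": len(shared) / len(union) if union else 0.0,
        # True when document A is entirely contained in document B.
        "a_subset_of_b": sections_a <= sections_b,
    }
```

For the Confluence (A, B, C) and Notion (B, C, D) case above, `shared` is `["B", "C"]` and `jaccard` is 0.5. The subset flag covers the "Quick Start inside Full Reference" case.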

### Conflict Resolution Strategies

When the same content differs across sources:

**Last-Write-Wins (LWW):**

```
Strategy: Use most recently updated version

Example:
→ Confluence: updated 2024-01-15
→ Notion: updated 2024-01-20
→ Keep: Notion version (newer)

Pros:
+ Simple logic
+ Respects recency

Cons:
- Ignores authority (maybe Confluence is canonical)
- Timestamp accuracy issues (clock skew)
- May prefer minor edit over substantial content
```
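A last-write-wins resolver is only a few lines, which is most of its appeal. This sketch also flags results where the two newest timestamps are closer together than a clock-skew tolerance, since those are the cases where LWW is least trustworthy. The record shape (`source`, `updated_at`) and the five-minute default are assumptions for illustration.

```python
from datetime import datetime, timedelta, timezone


def last_write_wins(
    versions: list[dict],
    skew_tolerance: timedelta = timedelta(minutes=5),
) -> tuple[dict, bool]:
    """Return (newest version, ambiguous flag).

    `ambiguous` is True when the runner-up is within `skew_tolerance`
    of the winner, so clock skew alone could have decided the result.
    """
    if not versions:
        raise ValueError("at least one version is required")
    ordered = sorted(versions, key=lambda v: v["updated_at"], reverse=True)
    winner = ordered[0]
    ambiguous = False
    if len(ordered) > 1:
        gap = winner["updated_at"] - ordered[1]["updated_at"]
        ambiguous = gap <= skew_tolerance
    return winner, ambiguous


confluence = {"source": "confluence", "updated_at": datetime(2024, 1, 15, tzinfo=timezone.utc)}
notion = {"source": "notion", "updated_at": datetime(2024, 1, 20, tzinfo=timezone.utc)}
winner, ambiguous = last_write_wins([confluence, notion])
```

Storing timestamps as timezone-aware UTC values avoids one common source of skew, but it does not address the authority or "minor edit" drawbacks listed above.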

**Source Priority:**

```
Strategy: Assign priority to sources

Configuration:
1. Website (canonical, public)
2. Confluence (internal docs)
3. Notion (team notes)
4. Slack (informal)

Conflict resolution:
→ Same content in Website + Confluence
→ Keep: Website (higher priority)
→ Discard: Confluence duplicate

Pros:
+ Respects organizational hierarchy
+ Deterministic

Cons:
- Requires manual priority configuration
- Lower-priority sources may have newer info
- Not always clear which is canonical
```
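The priority configuration above maps naturally onto a lookup table. The following is a sketch under assumptions: the source names, the `source` key, and the choice to rank unknown sources last are illustrative, not a documented Twig setting.

```python
# Lower number = higher priority, mirroring the configuration above.
SOURCE_PRIORITY = {"website": 1, "confluence": 2, "notion": 3, "slack": 4}


def resolve_by_priority(versions: list[dict], priority: dict[str, int] = SOURCE_PRIORITY) -> dict:
    """Return the version from the highest-priority source.

    Sources missing from the priority table rank below every configured one.
    """
    if not versions:
        raise ValueError("at least one version is required")
    unknown_rank = max(priority.values(), default=0) + 1
    return min(versions, key=lambda v: priority.get(v["source"], unknown_rank))


kept = resolve_by_priority([
    {"source": "confluence", "text": "internal copy"},
    {"source": "website", "text": "public copy"},
])
```

Because the result depends only on the table, the same inputs always yield the same winner, which is the "deterministic" property listed under Pros.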

**Multi-Version Storage:**

```
Strategy: Keep all versions, tag by source

Vector DB:
→ "API Auth Guide" from Confluence (chunk_conf_1)
→ "API Auth Guide" from Notion (chunk_notion_1)
→ "API Auth Guide" from Website (chunk_web_1)

Retrieval:
→ User query matches all 3
→ LLM sees all versions
→ Synthesizes answer or highlights conflicts

Pros:
+ No information loss
+ LLM can resolve conflicts
+ User sees full picture

Cons:
- 3x storage cost
- Retrieval noise (too many chunks)
- LLM may get confused by conflicts
```
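One way to reduce retrieval noise while keeping every version is to group retrieved chunks by logical document and source before they reach the LLM, so the versions are presented side by side rather than interleaved. This sketch assumes a `doc_key` field that links copies of the same logical document. That field, the `Chunk` type, and `group_versions` are hypothetical names.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    chunk_id: str
    doc_key: str  # hypothetical ID shared by all copies of one logical document
    source: str
    text: str


def group_versions(retrieved: list[Chunk]) -> dict[str, dict[str, list[Chunk]]]:
    """Group retrieved chunks as {doc_key: {source: [chunks]}}."""
    grouped: dict[str, dict[str, list[Chunk]]] = defaultdict(lambda: defaultdict(list))
    for chunk in retrieved:
        grouped[chunk.doc_key][chunk.source].append(chunk)
    # Convert to plain dicts so callers don't create keys by accident.
    return {key: dict(by_source) for key, by_source in grouped.items()}


retrieved = [
    Chunk("chunk_conf_1", "api-auth-guide", "confluence", "Use Bearer tokens"),
    Chunk("chunk_notion_1", "api-auth-guide", "notion", "Use API keys"),
    Chunk("chunk_web_1", "api-auth-guide", "website", "Use OAuth 2.0"),
]
versions = group_versions(retrieved)
```

A document key with more than one source in its group is a candidate conflict that the prompt can call out explicitly.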

### Update Propagation and Consistency

Changes in one source don't auto-propagate to others:

**The Update Lag Problem:**

```
Timeline:
10:00 AM: User updates Confluence (adds new section)
10:30 AM: Twig syncs Confluence → knowledge base updated
12:00 PM: User asks AI question → Gets new info ✓

But:
→ Notion still has old version
→ Website still has old version
→ No sync triggered for these sources

Next day:
→ Notion syncs (still old content)
→ Now knowledge base has conflicting chunks:
   - Confluence chunks (new, correct)
   - Notion chunks (old, stale)

AI retrieval may return mix of both
```

**The Content Drift Problem:**

```
Initial state (all synchronized):
→ Confluence: "Use API key in header"
→ Notion: "Use API key in header"
→ Website: "Use API key in header"

Month 1: Confluence updated to "Use Bearer token"
Month 2: Website updated to "Use OAuth 2.0"
Month 3: Notion never updated (abandoned)

Current state:
→ Confluence: "Bearer token"
→ Website: "OAuth 2.0"
→ Notion: "API key" (stale)

All three in knowledge base, all retrieved
→ AI gives inconsistent answer with 3 methods
→ User confused which to use
```
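Drift like this can be surfaced by comparing last-updated timestamps across the copies of one document and flagging sources that lag far behind the newest. The sketch below is illustrative: the 30-day default and the function name are assumptions, and it presumes you already track a last-updated timestamp per source.

```python
from datetime import datetime, timedelta, timezone


def find_stale_sources(
    last_updated: dict[str, datetime],
    max_lag: timedelta = timedelta(days=30),
) -> list[str]:
    """Return sources whose copy is older than the newest copy by more than `max_lag`."""
    if not last_updated:
        return []
    newest = max(last_updated.values())
    return sorted(
        source for source, ts in last_updated.items() if newest - ts > max_lag
    )


timestamps = {
    "notion": datetime(2024, 1, 1, tzinfo=timezone.utc),      # never updated
    "confluence": datetime(2024, 2, 1, tzinfo=timezone.utc),  # month 1 edit
    "website": datetime(2024, 3, 1, tzinfo=timezone.utc),     # month 2 edit
}
stale = find_stale_sources(timestamps)
```

A timestamp gap only shows that a copy has not been touched. Whether its content actually diverged still needs a content comparison or a human review.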

### Bidirectional Sync Impossibility

Most integrations are unidirectional:

**The Read-Only Problem:**

```
Twig's integration:
→ Reads from Confluence ✓
→ Writes to Confluence ✗ (not implemented)

Ideal bidirectional sync:
1. User updates Confluence → Twig syncs
2. Twig updates Notion with same change
3. Twig updates Website
4. All sources stay consistent

Reality:
→ Each source has its own auth/permissions
→ Each has different write APIs
→ Each has unique content structure
→ Automated writes risk data corruption
→ Twig is read-only by design (safer)
```

**The Manual Reconciliation:**

```
Current workflow:
1. User updates Confluence
2. Twig syncs Confluence
3. User must manually:
   → Copy changes to Notion
   → Update website repo (commit + deploy)
   → Update Zendesk article
4. Twig syncs each source independently

Human in the loop:
→ Error-prone
→ Time-consuming
→ Often forgotten
→ Leads to divergence over time
```

### Metadata Conflicts and Merging

Beyond content, metadata can conflict:

**Author Conflicts:**

```
Same document:
→ Confluence: author = john@company.com
→ Notion: author = sarah@company.com
→ Website: author = docs-bot@company.com

Which to use in RAG metadata?
→ First author (John)?
→ Last author (docs-bot)?
→ All authors (John, Sarah, docs-bot)?
→ Source-specific (depends on where chunk came from)?
```

**Tag Conflicts:**

```
Confluence tags: ["api", "authentication", "v2"]
Notion tags: ["auth", "security", "oauth"]
Website categories: ["developers", "guides"]

Merging strategies:
1. Union: ["api", "authentication", "v2", "auth", "security", "oauth", "developers", "guides"]
   → Comprehensive but noisy

2. Intersection: [] (no common tags)
   → Too strict, loses all metadata

3. Normalize and merge: ["api", "authentication", "security"]
   → Requires tag mapping logic

4. Keep source-specific: {confluence: [...], notion: [...], website: [...]}
   → Preserves all, but complex queries
```
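The four merging strategies can be computed side by side so their trade-offs are visible on real data. In this sketch the `canonical` mapping is a hypothetical, hand-maintained table: tags missing from it are dropped during normalization, which is how the example arrives at the three normalized tags shown above.

```python
def merge_tags(tags_by_source: dict[str, list[str]], canonical: dict[str, str]) -> dict:
    """Compute union, intersection, normalized, and source-specific tag views."""
    tag_sets = [set(tags) for tags in tags_by_source.values()]
    union = set().union(*tag_sets)
    intersection = set.intersection(*tag_sets) if tag_sets else set()
    # Map each known tag to its canonical form; unmapped tags are discarded.
    normalized = {canonical[tag] for tag in union if tag in canonical}
    return {
        "union": sorted(union),
        "intersection": sorted(intersection),
        "normalized": sorted(normalized),
        "by_source": {source: list(tags) for source, tags in tags_by_source.items()},
    }


# Hypothetical mapping from raw tags to canonical tags.
CANONICAL_TAGS = {
    "api": "api",
    "authentication": "authentication",
    "auth": "authentication",
    "oauth": "authentication",
    "security": "security",
}

merged = merge_tags(
    {
        "confluence": ["api", "authentication", "v2"],
        "notion": ["auth", "security", "oauth"],
        "website": ["developers", "guides"],
    },
    CANONICAL_TAGS,
)
```

Keeping `by_source` alongside the normalized list preserves the original tags for debugging without exposing them to every query.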

### Deletion Conflicts

One source deletes a document while the others keep their copies:

**The Partial Deletion:**

```
User deletes Confluence page (outdated)
→ Twig removes Confluence chunks from vector DB

But:
→ Notion copy still exists
→ Website copy still published
→ Twig keeps those chunks

AI behavior:
→ Query matches Notion/Website chunks
→ AI cites "deleted" content (from other sources)
→ User thinks: "I deleted this!"
→ Confusion about what's authoritative
```

**Cascading Deletion Decision:**

```
Question: Should deleting from one source delete from all?

Option A: Cascade delete
→ Delete Confluence → remove all duplicates
→ Risk: Loses content from other valid sources

Option B: Independent deletion
→ Delete Confluence → only remove Confluence chunks
→ Other sources unaffected
→ Current behavior

Option C: Soft-delete with prompt
→ Detect deletion in one source
→ Notify user: "Also in Notion and Website, delete those too?"
→ User decides
→ Requires UI/workflow changes
```
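Option C can be sketched as a planning step that removes only the deleted source's chunks and returns the remaining copies for the user to confirm. This is an illustration of the idea, not existing Twig behavior: the `index` structure (document key mapped to the sources holding a copy) and the function name are assumptions.

```python
from typing import Optional


def plan_deletion(deleted_source: str, doc_key: str, index: dict[str, set[str]]) -> dict:
    """Plan a soft delete: act on one source, ask about the rest.

    `index` maps a document key to the set of sources that hold a copy.
    """
    remaining = sorted(index.get(doc_key, set()) - {deleted_source})
    prompt: Optional[str] = None
    if remaining:
        prompt = f"Also in {', '.join(remaining)}. Delete those too?"
    return {
        "remove_chunks_from": [deleted_source],  # safe to apply immediately
        "needs_confirmation": remaining,         # wait for the user's decision
        "prompt": prompt,
    }


index = {"api-auth-guide": {"confluence", "notion", "website"}}
plan = plan_deletion("confluence", "api-auth-guide", index)
```

Nothing in `needs_confirmation` is removed until the user answers, so this keeps the safety of Option B while surfacing the copies that Option A would have deleted silently.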

### Source-of-Truth Ambiguity

Often there is no clear canonical source:

**The Authority Problem:**

```
Engineering team:
→ Considers Confluence canonical
→ "If it's not in Confluence, it's not official"

Marketing team:
→ Considers Website canonical
→ "Public docs are the source of truth"

Support team:
→ Uses Zendesk as primary
→ "Zendesk is what customers see"

No organization-wide agreement:
→ Twig doesn't know which to prefer
→ Treats all sources equally
→ Conflicts unresolved
```

### Cross-Source Search and Attribution

Users need to know where each piece of information comes from:

**Source Attribution in Responses:**

```
User query: "API authentication"
Retrieved chunks from:
1. Confluence: "Use Bearer tokens" (2 days old)
2. Website: "Use OAuth 2.0" (1 day old)
3. Notion: "Use API keys" (2 weeks old)

AI response must indicate:
→ "According to the Website (latest): Use OAuth 2.0.
   Note: Confluence mentions Bearer tokens, and
   older Notion docs reference API keys."

Requires:
→ Source metadata in every chunk
→ Timestamp comparison
→ LLM prompt engineering to cite sources
→ UI to display source badges
```
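The first three requirements (source metadata, timestamp comparison, and a citable prompt) can be met by formatting retrieved chunks into a labeled context block before they reach the LLM. The sketch below assumes each chunk carries `source`, `updated_at`, and `text` fields. Those field names and the label format are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def build_attributed_context(chunks: list[dict]) -> str:
    """Format chunks newest-first, each with a citable source label."""
    ordered = sorted(chunks, key=lambda c: c["updated_at"], reverse=True)
    blocks = []
    for position, chunk in enumerate(ordered, start=1):
        latest = " (latest)" if position == 1 else ""
        updated = chunk["updated_at"].date().isoformat()
        header = f"[{position}] source={chunk['source']}{latest} updated={updated}"
        blocks.append(f"{header}\n{chunk['text']}")
    return "\n\n".join(blocks)


now = datetime(2024, 1, 20, tzinfo=timezone.utc)
context = build_attributed_context([
    {"source": "Confluence", "updated_at": now - timedelta(days=2), "text": "Use Bearer tokens"},
    {"source": "Website", "updated_at": now - timedelta(days=1), "text": "Use OAuth 2.0"},
    {"source": "Notion", "updated_at": now - timedelta(weeks=2), "text": "Use API keys"},
])
```

The numbered labels give the prompt something concrete to reference, for example an instruction to prefer `[1]` and mention disagreements found in older entries.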

**The Version Confusion:**

```
No version tracking across sources:
→ Confluence: No version field
→ Notion: Version = "1.3" (manual)
→ Website: Git commit hash = "abc123"

Can't automatically determine:
→ Which is newest version semantically
→ Which represents production vs draft
→ Version lineage (is v1.4 based on v1.3?)
```

***

## How to Solve

**Implement content fingerprinting for duplicate detection, configure source priority, track last-updated-at per source, display source attribution in responses, and run periodic cross-source reconciliation.** See [Multi-Source Configuration](https://github.com/thrivapp/twig-help-docs/blob/staging/data/multi-source.md).

