Vector Index Out of Sync

The Problem

The vector database becomes inconsistent with the source data: documents deleted from the source still appear in search, updated content shows old versions, and new documents are missing from the index.

Symptoms

  • ❌ Deleted documents still returned in search

  • ❌ Updated content shows previous version

  • ❌ New documents added 2 hours ago not searchable

  • ❌ "Document not found" when clicking result

  • ❌ Index rebuild required weekly

Real-World Example

Timeline:
10:00 AM: Delete "Old Product Guide" from Confluence
10:30 AM: Twig sync runs, removes from source DB
11:00 AM: User queries "product guide"

Result: "Old Product Guide" still in top results
→ Vector DB not updated
→ Embedding still exists
→ Points to deleted document

User clicks → "404 Not Found"
Confusing and frustrating experience

Deep Technical Analysis

Async Embedding Pipeline

Vector updates lag behind source changes:

Pipeline Stages:

The Queue Backup:

Deletion Propagation

Removing embeddings is error-prone:

Soft Delete vs Hard Delete:

Orphaned Embeddings:
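
One way to detect orphaned embeddings is to compare the document IDs stored alongside the vectors against the IDs still live in the source. The sketch below assumes each vector carries a document_id in its metadata and uses plain dictionaries and sets in place of a real vector store; find_orphaned_vectors and the sample IDs are hypothetical names for illustration only.

```python
from typing import Dict, Set


def find_orphaned_vectors(
    vector_doc_ids: Dict[str, str], live_doc_ids: Set[str]
) -> Set[str]:
    """Return IDs of vectors whose source document no longer exists.

    vector_doc_ids maps vector_id -> document_id (the metadata stored
    with each vector); live_doc_ids is the set of IDs still in the source.
    """
    return {
        vector_id
        for vector_id, doc_id in vector_doc_ids.items()
        if doc_id not in live_doc_ids
    }


# Hypothetical data: doc-b was deleted at the source, but its vector remains.
vectors = {"v1": "doc-a", "v2": "doc-a", "v3": "doc-b"}
live = {"doc-a"}

orphans = find_orphaned_vectors(vectors, live)
print(sorted(orphans))  # ['v3']
```

In a real deployment the two inputs would come from a scan of the vector store's metadata and a listing of the source system, and the returned IDs would be passed to the store's delete call.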

Update vs Delete+Insert

Updating existing embeddings:

Update in Place:

Chunk-Level Updates:

Vector Database Consistency Models

Different DBs have different guarantees:

Eventual Consistency:

Read-Your-Writes Consistency:

Strong Consistency:

Multi-Index Management

Running multiple indexes simultaneously:

Blue-Green Index Swapping:

The Swap Timing:

Metadata Staleness

Document metadata out of sync:

Metadata Updates:

Metadata-Only Updates:

Concurrent Modification Conflicts

Simultaneous updates cause issues:

Race Condition:

Optimistic Locking:
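
A minimal sketch of optimistic locking, assuming every document carries a monotonically increasing version number: a write is accepted only if its version is newer than the one already indexed, so a slow worker holding a stale embedding cannot overwrite a fresher one. The in-memory dictionary, IndexedDoc, VersionConflict, and write_if_newer are all hypothetical stand-ins; a real store would need to perform the check and the write atomically (for example with a conditional write).

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class IndexedDoc:
    version: int
    embedding: List[float]


class VersionConflict(Exception):
    """Raised when a write carries a version that is not newer than the index."""


def write_if_newer(
    index: Dict[str, IndexedDoc],
    doc_id: str,
    version: int,
    embedding: List[float],
) -> None:
    """Store the embedding only if `version` is newer than what is indexed."""
    current = index.get(doc_id)
    if current is not None and current.version >= version:
        raise VersionConflict(
            f"{doc_id}: indexed version {current.version} >= incoming {version}"
        )
    index[doc_id] = IndexedDoc(version=version, embedding=embedding)


index: Dict[str, IndexedDoc] = {}
write_if_newer(index, "doc-a", 2, [0.1, 0.2])  # fast worker, newer content

try:
    write_if_newer(index, "doc-a", 1, [0.9, 0.9])  # slow worker, stale content
except VersionConflict:
    pass  # stale write rejected; the index keeps version 2
```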

Cross-Region Replication Lag

Distributed deployments have sync delays:

Multi-Region Vector DB:

The Split-Brain Problem:


How to Solve

Combine the following practices:

  • Implement idempotent upsert operations (delete + insert)

  • Store document_id metadata with every vector

  • Track document versions for optimistic concurrency

  • Use reconciliation jobs to detect orphaned vectors

  • Accept eventual consistency and surface it with status indicators ("indexing...")

See Index Synchronization.
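
The idempotent delete + insert upsert can be sketched as follows. This is an illustration under stated assumptions, not a specific vector database's API: the store is a plain dictionary, every vector records its document_id, and vector IDs are derived deterministically from the document ID and chunk position. Store and upsert_document are hypothetical names.

```python
from typing import Dict, List, Tuple

# vector_id -> (document_id, chunk embedding); a stand-in for a real vector store
Store = Dict[str, Tuple[str, List[float]]]


def upsert_document(
    store: Store, document_id: str, chunk_embeddings: List[List[float]]
) -> None:
    """Replace all vectors for a document with a fresh set.

    Running this twice with the same input leaves the store in the same
    state, so retries after a failed or duplicated job are safe.
    """
    # Delete: remove every vector tagged with this document_id, including
    # chunks that no longer exist in the updated document.
    stale_ids = [vid for vid, (doc_id, _) in store.items() if doc_id == document_id]
    for vid in stale_ids:
        del store[vid]

    # Insert: deterministic IDs mean a retry overwrites rather than duplicates.
    for position, embedding in enumerate(chunk_embeddings):
        store[f"{document_id}#{position}"] = (document_id, embedding)


store: Store = {}
upsert_document(store, "doc-a", [[0.1], [0.2], [0.3]])  # three chunks
upsert_document(store, "doc-a", [[0.4], [0.5]])         # document shrank to two
print(sorted(store))  # ['doc-a#0', 'doc-a#1']
```

Deleting by document_id before inserting is what removes the leftover third chunk in the example; an in-place overwrite of matching IDs alone would have left it behind as an orphan.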
