GDPR Right to Be Forgotten in Vector Databases

The Problem

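When a user requests erasure under GDPR Article 17 (the right to erasure, often called the "right to be forgotten"), removing their data from a vector database is technically complex and often incomplete, because stored embeddings carry no built-in link back to the person they were derived from.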

Symptoms

  • ❌ Cannot locate user's vectors

  • ❌ Text deleted but embeddings remain

  • ❌ No mapping: user data → vectors

  • ❌ Partial deletion leaves traces

  • ❌ Cannot prove complete erasure

Real-World Example

User requests deletion:
"Delete all my data per GDPR Article 17"

Company deletes:
→ Source documents from document DB ✓
→ User account from auth DB ✓

But vector DB still contains:
→ Embeddings of user's emails
→ Chunks mentioning user's name
→ Context where user participated

How can these vectors be found and deleted?
→ No direct identifier links the vectors to the user
→ Complete erasure cannot be carried out
→ Likely violation of GDPR Article 17

Deep Technical Analysis

Vector-to-Source Mapping Problem

Embedding Anonymity:

Metadata Dependency:

Secondary References:
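
Secondary references are chunks that belong to someone else's documents but still mention the user, such as the "chunks mentioning user's name" in the example above. A metadata filter on the owner will not catch them. The following is a minimal sketch of one way to surface them, assuming each chunk keeps its source text and an owner tag: it combines a literal scan for known identifiers with a cosine-similarity check against a probe embedding. The `Chunk` layout, the `find_residual_references` helper, the 0.9 threshold, and the toy 3-dimensional vectors are all illustrative assumptions; a real probe vector would come from the same embedding model used at ingestion.

```python
import math
from dataclasses import dataclass


@dataclass
class Chunk:
    chunk_id: str
    owner_id: str          # hypothetical metadata field set at ingestion
    text: str
    vector: list[float]


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def find_residual_references(chunks, user_id, identifiers, probe_vector, threshold=0.9):
    """Return IDs of chunks NOT owned by user_id that still appear to reference them."""
    hits = []
    for c in chunks:
        if c.owner_id == user_id:
            continue  # already covered by a metadata-based delete
        literal = any(s.lower() in c.text.lower() for s in identifiers)
        similar = cosine(c.vector, probe_vector) >= threshold
        if literal or similar:
            hits.append(c.chunk_id)
    return hits


# Toy data: c2 belongs to another user but names the data subject.
chunks = [
    Chunk("c1", "user-42", "Email from Jane Doe about invoices", [0.9, 0.1, 0.0]),
    Chunk("c2", "user-7", "Meeting notes: Jane Doe raised a billing issue", [0.8, 0.2, 0.1]),
    Chunk("c3", "user-7", "Quarterly roadmap for the storage team", [0.0, 0.1, 0.9]),
]
residual = find_residual_references(
    chunks, "user-42", ["Jane Doe", "jane@example.com"], [0.9, 0.1, 0.0]
)
print(residual)
```

Hits from a scan like this are candidates for review rather than automatic deletion, since a chunk owned by another user may need redaction and re-embedding instead of removal.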

Deletion Strategies

Metadata Filtering:

Re-Embedding After Deletion:

Soft Deletion:
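
As the term is commonly used, soft deletion marks records as deleted so that queries stop returning them immediately, and removes them physically in a later purge. The sketch below illustrates that pattern on an in-memory store; the `deleted_at` field and the `SoftDeleteStore` class are hypothetical names, not any particular vector database's API. Note that a tombstoned vector still exists in storage, so under this sketch the erasure is only complete once `purge()` has run.

```python
import time
from dataclasses import dataclass
from typing import Optional


@dataclass
class Record:
    vector_id: str
    user_id: str
    vector: list[float]
    deleted_at: Optional[float] = None  # tombstone timestamp; None means live


class SoftDeleteStore:
    def __init__(self) -> None:
        self._records: dict[str, Record] = {}

    def upsert(self, record: Record) -> None:
        self._records[record.vector_id] = record

    def soft_delete_user(self, user_id: str) -> int:
        """Tombstone every live record for user_id; return how many were marked."""
        marked = 0
        now = time.time()
        for r in self._records.values():
            if r.user_id == user_id and r.deleted_at is None:
                r.deleted_at = now
                marked += 1
        return marked

    def live_records(self) -> list[Record]:
        """What the query path should see: tombstoned records are filtered out."""
        return [r for r in self._records.values() if r.deleted_at is None]

    def purge(self) -> int:
        """Physically remove tombstoned records; return how many were removed."""
        doomed = [vid for vid, r in self._records.items() if r.deleted_at is not None]
        for vid in doomed:
            del self._records[vid]
        return len(doomed)

    def __len__(self) -> int:
        return len(self._records)


store = SoftDeleteStore()
store.upsert(Record("v1", "user-42", [0.1, 0.2]))
store.upsert(Record("v2", "user-42", [0.3, 0.4]))
store.upsert(Record("v3", "user-7", [0.5, 0.6]))

marked = store.soft_delete_user("user-42")               # hide immediately
visible = [r.vector_id for r in store.live_records()]    # only other users remain
purged = store.purge()                                   # physical removal
```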

Vector DB Capabilities

Deletion Support by Platform:

Performance Concerns:

Audit Trail

Proving Deletion:
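
One way to support a claim of complete erasure is to record, for each request, how many vectors matched the user before and after the delete. The sketch below builds such an audit entry and appends it to a JSON Lines file. The field names, the `build_deletion_record` and `append_to_log` helpers, and the salted-hash treatment of the user ID are assumptions for illustration; whether a salted hash is acceptable in an audit log is a compliance decision this sketch does not settle.

```python
import hashlib
import json
from datetime import datetime, timezone


def build_deletion_record(user_id: str, salt: str, count_before: int, count_after: int) -> dict:
    """Assemble one audit entry; the field names are illustrative, not a standard."""
    # Store a salted hash so the log does not repeat the raw identifier.
    subject = hashlib.sha256((salt + user_id).encode("utf-8")).hexdigest()
    return {
        "subject_hash": subject,
        "legal_basis": "GDPR Article 17",
        "vectors_before": count_before,
        "vectors_after": count_after,
        "verified": count_after == 0,
        "completed_at": datetime.now(timezone.utc).isoformat(),
    }


def append_to_log(path: str, record: dict) -> None:
    """Append the entry as one JSON line; the file is only ever opened in append mode."""
    with open(path, "a", encoding="utf-8") as fh:
        fh.write(json.dumps(record, sort_keys=True) + "\n")


# The counts would come from count queries run before and after the delete.
record = build_deletion_record("user-42", salt="example-salt", count_before=17, count_after=0)
print(json.dumps(record, indent=2))
```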


How to Solve

Combine the following measures:

  • Tag every vector with user and document IDs at ingestion

  • Delete by metadata filter (DELETE WHERE user_id=X)

  • Run a semantic search for residual references to the user

  • Maintain an audit log of deletions

  • Consider re-indexing when erasure must be guaranteed

  • Verify the deletion with count queries

See GDPR Compliance.
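
The tagging, metadata-based deletion, and count-verification measures above can be sketched together as follows. `InMemoryVectorStore` is a toy stand-in written for this example, not a real vector database client; the method names `ingest`, `count_where`, and `delete_where` and the `erase_user` helper are hypothetical. Real platforms expose their own filter syntax, and some may not support delete-by-filter at all.

```python
from dataclasses import dataclass, field


@dataclass
class VectorRecord:
    vector_id: str
    vector: list[float]
    metadata: dict = field(default_factory=dict)


class InMemoryVectorStore:
    """Toy stand-in for a vector DB that supports metadata filters."""

    def __init__(self) -> None:
        self._records: dict[str, VectorRecord] = {}

    def ingest(self, vector_id: str, vector: list[float], user_id: str, document_id: str) -> None:
        # Tagging happens at write time; without it there is nothing to filter on later.
        self._records[vector_id] = VectorRecord(
            vector_id, vector, {"user_id": user_id, "document_id": document_id}
        )

    def _matches(self, record: VectorRecord, filters: dict) -> bool:
        return all(record.metadata.get(k) == v for k, v in filters.items())

    def count_where(self, **filters) -> int:
        return sum(1 for r in self._records.values() if self._matches(r, filters))

    def delete_where(self, **filters) -> int:
        """Equivalent in spirit to DELETE WHERE user_id=X; returns rows removed."""
        if not filters:
            raise ValueError("refusing to delete without a filter")
        doomed = [vid for vid, r in self._records.items() if self._matches(r, filters)]
        for vid in doomed:
            del self._records[vid]
        return len(doomed)


def erase_user(store: InMemoryVectorStore, user_id: str) -> dict:
    """Delete by metadata, then verify with a count query."""
    before = store.count_where(user_id=user_id)
    deleted = store.delete_where(user_id=user_id)
    after = store.count_where(user_id=user_id)
    return {"before": before, "deleted": deleted, "after": after, "verified": after == 0}


store = InMemoryVectorStore()
store.ingest("v1", [0.1, 0.2], user_id="user-42", document_id="doc-a")
store.ingest("v2", [0.3, 0.4], user_id="user-42", document_id="doc-b")
store.ingest("v3", [0.5, 0.6], user_id="user-7", document_id="doc-c")

result = erase_user(store, "user-42")
print(result)
```

A zero count only shows that no vector tagged with that user ID remains. Vectors that were never tagged, or chunks owned by other users that mention the person, need the residual-reference search or a re-index.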
