# Vector DB Encryption

## The Problem

Many vector database deployments store embeddings and metadata unencrypted (or only partially encrypted), exposing sensitive data if disks, snapshots, or backups are compromised.

### Symptoms

* ❌ Plaintext vectors on disk
* ❌ Unencrypted metadata
* ❌ Backups not encrypted
* ❌ Compliance violations (HIPAA, PCI-DSS)
* ❌ Cannot prove encryption at rest

### Real-World Example

```
Healthcare RAG system:
→ Patient records embedded
→ Vector DB: Pinecone (managed)

Security audit asks:
"Is data encrypted at rest?"

Discovery:
→ Pinecone encrypts automatically (AES-256) ✓
→ But: Metadata (patient names) readable in plaintext by anyone with API access ✗
→ Backup exports unencrypted ✗

Partial encryption = compliance failure
```

***

## Deep Technical Analysis

### Encryption Layers

**At-Rest Encryption:**

```
Storage level:
→ Disk encryption (LUKS, dm-crypt)
→ Encrypts entire volume
→ Protects if disk stolen, not if the running host is compromised

Database level:
→ Encrypt specific columns/tables
→ Application-aware encryption
→ Search over encrypted fields is limited (e.g., exact match only)
```
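To back up a claim of disk-level encryption on a self-hosted deployment, `cryptsetup isLuks <device>` is the usual check. The Python sketch below shows what that check looks for: LUKS1 and LUKS2 volumes begin with the 6-byte magic `LUKS\xba\xbe` followed by a big-endian version number. The device path in the usage note is a hypothetical example.

```python
from typing import Optional

# 6-byte magic at offset 0 of both LUKS1 and LUKS2 headers.
LUKS_MAGIC = b"LUKS\xba\xbe"


def luks_version(header: bytes) -> Optional[int]:
    """Return the LUKS version (1 or 2) if `header` starts with a LUKS header, else None."""
    if len(header) < 8 or not header.startswith(LUKS_MAGIC):
        return None
    # Bytes 6-7 hold the header version as a big-endian unsigned 16-bit integer.
    return int.from_bytes(header[6:8], "big")


def device_luks_version(path: str) -> Optional[int]:
    """Read the first 8 bytes of a block device or disk image and report its LUKS version."""
    with open(path, "rb") as f:
        return luks_version(f.read(8))
```

`device_luks_version("/dev/nvme1n1")` (hypothetical device, usually requires root) should return `1` or `2` for the raw encrypted partition. Point it at the underlying device, not the opened `/dev/mapper/...` mapping, which exposes the decrypted filesystem.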

**Managed vs Self-Hosted:**

```
Managed (Pinecone, Weaviate Cloud):
→ Encryption at rest: Automatic (usually)
→ Must verify in SLA/documentation
→ Key management: Vendor-controlled
→ Less control, easier to use

Self-hosted (pgvector, Weaviate):
→ Encryption: You configure
→ Must set up explicitly
→ Key management: Your responsibility
→ Full control, more complexity
```

### Key Management

**Encryption Keys:**

```
Where are keys stored?
→ Same server as data: Weak (both stolen together)
→ Separate key management service (AWS KMS, HashiCorp Vault): Strong

Key rotation:
→ How often?
→ Re-encrypt all data, or only re-wrap data keys (envelope encryption)?
→ Operational overhead
```
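One way to keep rotation manageable is to store the ID of the encrypting key with every record, so old data can be found and re-encrypted in batches while new writes use the active key. The sketch below assumes a hypothetical key inventory (key ID mapped to a timezone-aware creation time, e.g. exported from your KMS) and records shaped like `{"id": ..., "key_id": ...}`; the maximum key age is whatever your policy sets.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, Iterable, List, Optional


def keys_due_for_rotation(created: Dict[str, datetime], max_age: timedelta,
                          now: Optional[datetime] = None) -> List[str]:
    """Return key IDs older than the rotation policy allows, oldest first.

    Creation times must be timezone-aware so they compare with `now` (UTC).
    """
    now = now or datetime.now(timezone.utc)
    expired = [kid for kid, ts in created.items() if now - ts > max_age]
    return sorted(expired, key=lambda kid: created[kid])


def records_to_reencrypt(records: Iterable[dict], active_key_id: str) -> List[str]:
    """IDs of records still encrypted under a key other than the active one.

    Because each record carries its key ID, rotation can proceed incrementally:
    new writes use the active key and old records are migrated in batches.
    """
    return [r["id"] for r in records if r["key_id"] != active_key_id]
```

The re-encryption step itself would use whatever cipher the application already uses; this sketch only decides which keys and records are due.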

**Customer Managed Keys (CMK):**

```
Some vector DBs support CMK:
→ You create the key in your own KMS and grant the vendor use of it
→ Vendor encrypts with your key
→ You can revoke access (data unreadable)

Benefits:
→ Control over data access
→ Can enforce deletion by revoking key
```
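With AWS KMS as the key store, CMK access usually comes down to a key-policy statement that lets the vendor's IAM role use the key. The sketch below builds such a statement; the role ARN is a placeholder (111122223333 is AWS's documentation example account), and the exact actions a given vector DB vendor needs should be taken from its own CMK documentation.

```python
import json

# Hypothetical placeholder: use the IAM role ARN your vector DB vendor documents.
VENDOR_ROLE_ARN = "arn:aws:iam::111122223333:role/example-vectordb-vendor"


def vendor_key_policy_statement(vendor_role_arn: str) -> dict:
    """Key-policy statement letting the vendor use (but not manage) your KMS key."""
    return {
        "Sid": "AllowVectorDbVendorUseOfKey",
        "Effect": "Allow",
        "Principal": {"AWS": vendor_role_arn},
        "Action": [
            "kms:Encrypt",
            "kms:Decrypt",
            "kms:ReEncrypt*",
            "kms:GenerateDataKey*",
            "kms:DescribeKey",
        ],
        # In a key policy, "*" refers to the key the policy is attached to.
        "Resource": "*",
    }


if __name__ == "__main__":
    print(json.dumps(vendor_key_policy_statement(VENDOR_ROLE_ARN), indent=2))
```

Revoking access means removing this statement or disabling the key (`aws kms disable-key --key-id <key-id>`), which `aws kms enable-key` can undo. Scheduling deletion with `aws kms schedule-key-deletion` becomes permanent once its 7 to 30 day waiting period ends, which is what makes deletion by revoking the key enforceable.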

### Metadata Encryption

**The Metadata Problem:**

```
Vector itself: High-dimensional numbers
→ Not directly readable, but not anonymous: embedding inversion attacks can partially reconstruct source text

Metadata: Plaintext
{
  "patient_name": "John Smith",
  "diagnosis": "diabetes",
  "document_id": "med_record_789"
}

If metadata unencrypted:
→ PII exposed
→ Vector DB breach = privacy breach
```

**Encrypting Metadata:**

```
Challenge: Need to filter by metadata
→ WHERE metadata.patient_name = 'John Smith'
→ If encrypted with a randomized cipher (e.g., AES-GCM), the DB cannot filter on it

Solutions:
→ Searchable encryption (complex)
→ Token-based pseudonymization (keyed hashes; exact match only)
→ Encrypt only non-searchable fields
```
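Token-based pseudonymization can be as simple as a keyed hash: HMAC-SHA256 of the normalized value gives a stable token that still supports exact-match filters, while the name itself never reaches the vector DB. The sketch below uses only the Python standard library. The key placeholder and the field lists are hypothetical, and because the standard library has no AES, non-searchable fields are dropped here rather than encrypted (in practice they could be encrypted with AES-GCM from a vetted library).

```python
import hashlib
import hmac

# Placeholder: fetch a random 32-byte key from a KMS or secret store at runtime,
# never store it alongside the vector DB.
TOKEN_KEY = b"replace-with-32-random-bytes-from-your-kms"

# Hypothetical field policy for the metadata example above.
TOKENIZED_FIELDS = {"patient_name"}  # still filterable by exact match
DROPPED_FIELDS = {"diagnosis"}       # kept only in the primary (encrypted) store


def pseudonymize(value: str, key: bytes = TOKEN_KEY) -> str:
    """Deterministic keyed token: same value and key always give the same token.

    Without the key, an attacker cannot recompute tokens to test guessed names.
    """
    normalized = value.strip().casefold()  # "John Smith " and "john smith" match
    return hmac.new(key, normalized.encode("utf-8"), hashlib.sha256).hexdigest()


def protect_metadata(metadata: dict) -> dict:
    """Return a copy that is safe to upsert: PII fields tokenized or removed."""
    out = {}
    for field, value in metadata.items():
        if field in DROPPED_FIELDS:
            continue
        out[field] = pseudonymize(value) if field in TOKENIZED_FIELDS else value
    return out
```

At query time, filter on `pseudonymize("John Smith")` instead of the raw name. The trade-off: identical values produce identical tokens, so equality and frequency patterns remain visible, and prefix or range filters no longer work.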

### Backup Encryption

**Export Security:**

```
Vector DB backups:
→ Often exported as JSON/CSV
→ May be unencrypted by default

Must:
→ Encrypt backup files (GPG, AES)
→ Secure storage (encrypted S3)
→ Key management for backup keys
```
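For GPG-based backup encryption, public-key encryption lets the export job encrypt without ever holding the decryption key. The sketch below wraps the `gpg` CLI. It assumes GnuPG is installed and that the backup public key (the `recipient`, a hypothetical email or key ID) is already imported and trusted, because in `--batch` mode gpg fails rather than prompting when a key is untrusted.

```python
import subprocess
from pathlib import Path
from typing import List


def gpg_encrypt_cmd(export_path: str, recipient: str) -> List[str]:
    """Build a GnuPG command that public-key-encrypts an export file.

    Only the holder of the recipient's private key can decrypt the output.
    """
    return [
        "gpg", "--batch", "--yes",
        "--encrypt", "--recipient", recipient,
        "--output", export_path + ".gpg",
        export_path,
    ]


def encrypt_export(export_path: str, recipient: str) -> str:
    """Encrypt the export, remove the plaintext file, and return the .gpg path."""
    subprocess.run(gpg_encrypt_cmd(export_path, recipient), check=True)
    # Deletes the plaintext JSON/CSV; this is not a secure wipe of the disk blocks.
    Path(export_path).unlink()
    return export_path + ".gpg"
```

`encrypt_export("/backups/vectors.json", "backup@example.com")` returns `/backups/vectors.json.gpg`; only that file should be copied to backup storage.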

***

## How to Solve

**Enable encryption at rest (AES-256) in the vector DB + use disk-level encryption for self-hosted deployments + implement customer-managed keys (CMK) where supported + encrypt or pseudonymize metadata fields containing PII + encrypt backups before storage + rotate encryption keys periodically + keep keys in a key management service (AWS KMS, Vault), separate from the data.** See [Vector Encryption](/rag-scenarios-and-solutions/privacy/key-rotation.md).


