# HIPAA-Compliant Knowledge Base

## The Problem

Healthcare organizations need RAG systems that handle Protected Health Information (PHI) while meeting strict HIPAA technical safeguards.

### Symptoms

* ❌ PHI in unencrypted vectors
* ❌ No BAA with embedding provider
* ❌ Insufficient audit logging
* ❌ Data at rest not encrypted
* ❌ Cannot demonstrate compliance

### Real-World Example

```
Healthcare company builds RAG:
→ Ingests patient records
→ Uses OpenAI embeddings (cloud API)
→ Stores vectors in Pinecone

HIPAA audit finds:
→ PHI sent to a third party (OpenAI) without a BAA
→ Vector DB not configured for encryption at rest
→ No access logs for PHI retrieval
→ Violation: Fines + remediation required
```

***

## Deep Technical Analysis

### HIPAA Technical Safeguards

**Encryption Requirements:**

```
At rest:
→ Vector DB must encrypt storage (AES-256 is the common choice; HIPAA does not name an algorithm)
→ Document storage encrypted
→ Backup encryption

In transit:
→ TLS 1.2+ for all API calls
→ Embedding API must use HTTPS
→ No PHI in URLs/query params

In use:
→ Memory encryption (if possible)
→ Secure enclave for inference
```
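The in-transit items above can be sketched with Python's standard library. This is a minimal illustration, not a complete transport policy: the `PHI_PARAM_NAMES` set and both function names are hypothetical, and a real deployment would define its own list of sensitive parameter names.

```python
import ssl
from urllib.parse import parse_qsl, urlsplit

# Hypothetical parameter names that would indicate PHI leaking into a URL.
PHI_PARAM_NAMES = {"patient_name", "mrn", "ssn", "dob"}


def make_tls_context() -> ssl.SSLContext:
    """Client context that verifies certificates and refuses anything below TLS 1.2."""
    ctx = ssl.create_default_context()
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    return ctx


def assert_no_phi_in_url(url: str) -> None:
    """Raise ValueError if the URL is not HTTPS or carries PHI-like query parameters."""
    parts = urlsplit(url)
    if parts.scheme != "https":
        raise ValueError("API calls carrying PHI must use HTTPS")
    names = {key.lower() for key, _ in parse_qsl(parts.query, keep_blank_values=True)}
    leaked = names & PHI_PARAM_NAMES
    if leaked:
        raise ValueError(f"PHI-like query parameters in URL: {sorted(leaked)}")
```

Sending identifiers in the request body instead of the URL keeps them out of proxy and web-server access logs, which usually record the full URL.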

**Access Controls:**

```
Minimum necessary:
→ User sees only PHI needed for their role
→ Chunk-level access control
→ Role-based retrieval filtering

Unique user IDs:
→ Track who accessed what PHI
→ Audit trail for every query
→ Cannot use shared API keys
```
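Role-based retrieval filtering at the chunk level might look like the following sketch. The `Chunk` type, its `allowed_roles` field, and `filter_chunks_for_user` are illustrative names, and the example assumes each chunk was tagged with its permitted roles at ingestion time.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List


@dataclass(frozen=True)
class Chunk:
    chunk_id: str
    text: str
    allowed_roles: FrozenSet[str]  # roles permitted to see this chunk (assumed metadata)


def filter_chunks_for_user(chunks: Iterable[Chunk], user_roles: Iterable[str]) -> List[Chunk]:
    """Keep only chunks that share at least one role with the requesting user."""
    roles = set(user_roles)
    return [chunk for chunk in chunks if roles & chunk.allowed_roles]


retrieved = [
    Chunk("c1", "Medication list ...", frozenset({"physician", "nurse"})),
    Chunk("c2", "Billing codes ...", frozenset({"billing"})),
    Chunk("c3", "Psychotherapy notes ...", frozenset({"physician"})),
]
visible = filter_chunks_for_user(retrieved, ["nurse"])
print([chunk.chunk_id for chunk in visible])
```

Applying the filter after retrieval but before the chunks reach the model keeps restricted PHI out of both the prompt and the response. Filtering inside the vector query, where the database supports metadata filters, avoids retrieving the restricted chunks at all.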

**Audit Logging:**

```
Must log:
→ Every PHI access (timestamp, user, query)
→ Retrieved chunks containing PHI
→ Model responses with PHI
→ Failed access attempts

Retention: 6 years minimum
```
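One possible shape for an audit entry covering the fields listed above is sketched here. The field names and the `audit_record` helper are assumptions for illustration. Because the entry contains the query text, the audit log itself holds PHI and needs the same encryption and access controls as the source data.

```python
import json
from datetime import datetime, timezone
from typing import Iterable, Optional

RETENTION_YEARS = 6  # retention period stated above


def retention_deadline(ts: datetime) -> datetime:
    """Earliest date the entry may be deleted: the timestamp plus the retention period."""
    try:
        return ts.replace(year=ts.year + RETENTION_YEARS)
    except ValueError:
        # Feb 29 has no counterpart in a non-leap target year; fall back to Feb 28.
        return ts.replace(year=ts.year + RETENTION_YEARS, day=28)


def audit_record(user_id: str, query: str, chunk_ids: Iterable[str],
                 outcome: str, now: Optional[datetime] = None) -> str:
    """Serialize one PHI access (or failed attempt) as a JSON line."""
    ts = now or datetime.now(timezone.utc)
    return json.dumps({
        "timestamp": ts.isoformat(),
        "user_id": user_id,            # unique per person, never a shared key
        "query": query,
        "chunk_ids": list(chunk_ids),  # retrieved chunks containing PHI
        "outcome": outcome,            # e.g. "allowed" or "denied"
        "retain_until": retention_deadline(ts).isoformat(),
    }, sort_keys=True)
```

Writing one JSON object per line to an append-only store makes the log easy to ship to a SIEM and hard to alter after the fact.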

### Embedding Provider Compliance

**BAA Requirements:**

```
If embedding API processes PHI:
→ Must have Business Associate Agreement
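→ OpenAI: Requires enterprise plan + BAA
→ Cohere: BAA available
→ Vendor BAA terms change; confirm current eligibility before sending PHI
→ Open-source models (Sentence-BERT): Self-host, no BAA with a model vendor needed (a cloud host running the model still needs one)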

Without BAA:
→ HIPAA violation to send PHI
```
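The BAA rule can be enforced in code as a guard in front of every embedding call. The sketch below is hypothetical throughout: `EmbeddingProvider`, `BAARequiredError`, and `embed_phi` are illustrative names, and `embed_fn` stands in for whatever client function actually produces the vector.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class EmbeddingProvider:
    name: str
    self_hosted: bool = False  # model runs on infrastructure you control
    baa_signed: bool = False   # a Business Associate Agreement is in place


class BAARequiredError(RuntimeError):
    """Raised when PHI would be sent to a provider that has no BAA."""


def embed_phi(provider: EmbeddingProvider, text: str,
              embed_fn: Callable[[str], List[float]]) -> List[float]:
    """Call embed_fn only if the provider is self-hosted or covered by a BAA."""
    if not (provider.self_hosted or provider.baa_signed):
        raise BAARequiredError(f"refusing to send PHI to {provider.name}: no BAA on file")
    return embed_fn(text)
```

Keeping the BAA status in configuration reviewed by compliance staff, rather than hard-coded next to the call, lets the guard reflect the agreements that are actually signed.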

**De-identification Strategy:**

```
Option: Remove PHI before embedding:
→ Replace names with [PATIENT]
→ Replace dates with [DATE]
→ Embed de-identified text

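Trade-off:
→ Reduces PHI exposure, but names and dates alone do not meet HIPAA de-identification (Safe Harbor lists 18 identifier types)
→ Semantic search is less effective
→ Cannot search "John Smith's records"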
```
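The placeholder replacement described above can be sketched as follows. This is deliberately narrow: it assumes the patient names are already known from record metadata (free-text name detection is unreliable with regular expressions), it recognizes only two numeric date formats, and it leaves every other identifier type untouched. The `deidentify` function and `DATE_RE` pattern are illustrative names.

```python
import re
from typing import Iterable

# Matches dates such as 3/14/2023, 03/14/23, and 2023-03-14 (assumed formats only).
DATE_RE = re.compile(r"\b(?:\d{1,2}/\d{1,2}/\d{2,4}|\d{4}-\d{2}-\d{2})\b")


def deidentify(text: str, patient_names: Iterable[str]) -> str:
    """Replace known patient names with [PATIENT] and numeric dates with [DATE]."""
    # Longest names first, so "John Smith" is replaced before a shorter "John".
    for name in sorted(patient_names, key=len, reverse=True):
        text = re.sub(rf"\b{re.escape(name)}\b", "[PATIENT]", text, flags=re.IGNORECASE)
    return DATE_RE.sub("[DATE]", text)


print(deidentify("John Smith was admitted on 03/14/2023.", ["John Smith"]))
# → [PATIENT] was admitted on [DATE].
```

Searches such as "John Smith's records" can still be served by keeping the patient identifier as access-controlled metadata on each chunk and filtering on it, while the embedded text stays free of names.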

### Vector Database Considerations

**HIPAA-Compliant Options:**

```
Self-hosted:
→ PostgreSQL + pgvector (full control)
→ Weaviate (self-hosted mode)
→ Elasticsearch (on-premise)

Managed with BAA:
→ Pinecone (Enterprise + BAA)
→ AWS OpenSearch (with BAA)

Must verify:
→ BAA coverage
→ Encryption at rest
→ Access logging
→ Data residency (US-only if your policy or contracts require it)
```
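The "must verify" list can be turned into a small pre-deployment check. The sketch below assumes a plain dictionary describing the deployment; the key names and `missing_safeguards` are hypothetical, and the values would come from vendor documentation and signed agreements rather than from any API.

```python
from typing import Dict, List

# Checklist items from the list above, expressed as hypothetical config keys.
REQUIRED_CHECKS = ("baa_signed", "encryption_at_rest", "access_logging", "data_residency_ok")


def missing_safeguards(deployment: Dict[str, bool]) -> List[str]:
    """Return the checklist items that are absent or false for a vector DB deployment."""
    missing = []
    for check in REQUIRED_CHECKS:
        # A fully self-hosted database has no vendor to sign a BAA with.
        if check == "baa_signed" and deployment.get("self_hosted", False):
            continue
        if not deployment.get(check, False):
            missing.append(check)
    return missing


managed = {"baa_signed": True, "encryption_at_rest": True, "access_logging": False}
print(missing_safeguards(managed))
# → ['access_logging', 'data_residency_ok']
```

Running a check like this in CI or at service start-up blocks a deployment that has drifted from the verified configuration.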

***

## How to Solve

**Use embedding providers that sign a BAA, or self-host the models; ensure the vector DB encrypts data at rest (AES-256); implement comprehensive audit logging with 6-year retention; apply minimum-necessary access control; de-identify PHI where possible; and execute BAAs with all third-party processors.** See [HIPAA Setup](/rag-scenarios-and-solutions/privacy/hipaa-setup.md).

