# Agent-Level Data Isolation

## The Problem

When multiple AI agents share one knowledge base with no isolation between them, any agent can retrieve data that was meant only for another agent.

### Symptoms

* ❌ Agent A sees Agent B's private data
* ❌ Cross-agent data leakage
* ❌ Cannot restrict knowledge by agent
* ❌ Shared vector DB exposes all data
* ❌ No tenant isolation

### Real-World Example

```
Company has two agents:
→ HR Agent: Access to employee records
→ Customer Support Agent: Access to help docs

Shared vector DB with all data:
→ Customer asks Support Agent: "What's the CEO's salary?"
→ Retrieval finds HR document with salary info
→ Support Agent responds with CEO salary

Data isolation failure
```

***

## Deep Technical Analysis

### Shared Knowledge Base Risks

**No Filtering Layer:**

```
All chunks in one vector DB:
→ HR docs embedded
→ Customer docs embedded
→ No metadata distinguishing them

Any query retrieves anything:
→ Agent identity not checked
→ Data access unrestricted
→ Privacy violation
```

**Metadata Filtering:**

```
Solution: Tag each chunk with access-control metadata:
{
  vector: [0.234, ...],
  metadata: {
    agent_id: "hr_agent",
    department: "hr",
    sensitivity: "confidential"
  }
}

Query with filter:
→ agent_id = "support_agent"
→ Only chunks tagged support_agent are retrieved
→ The hr_agent chunk above is never returned
```
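
The tagging and filtering idea above can be sketched in plain Python. This is a minimal illustration, not any vector database's API: the `Chunk` class and the `tag_chunk` and `matches` helpers are hypothetical names, and the sample texts and vectors are made up.

```python
from dataclasses import dataclass, field


@dataclass
class Chunk:
    text: str
    vector: tuple[float, ...]
    metadata: dict = field(default_factory=dict)


def tag_chunk(text, vector, agent_id, department, sensitivity):
    """Attach access-control metadata at ingestion time (hypothetical helper)."""
    return Chunk(
        text=text,
        vector=tuple(vector),
        metadata={
            "agent_id": agent_id,
            "department": department,
            "sensitivity": sensitivity,
        },
    )


def matches(chunk, metadata_filter):
    """Return True only if every filter key equals the chunk's metadata value."""
    if "agent_id" not in metadata_filter:
        # Fail closed: a filter without an agent identity would match everything.
        raise ValueError("metadata filter must include agent_id")
    # An untagged chunk yields None from .get() and is therefore excluded.
    return all(chunk.metadata.get(k) == v for k, v in metadata_filter.items())


chunks = [
    tag_chunk("CEO compensation summary", [0.234, 0.1], "hr_agent", "hr", "confidential"),
    tag_chunk("How to reset a password", [0.9, 0.3], "support_agent", "support", "public"),
]

visible = [c.text for c in chunks if matches(c, {"agent_id": "support_agent"})]
```

Treating a missing `agent_id` as a non-match means a chunk that was never tagged stays hidden rather than becoming visible to every agent.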

### Multi-Tenancy Patterns

**Namespace Isolation:**

```
Pinecone namespaces / Weaviate multi-tenancy:
→ Create a separate namespace (or tenant) per agent
→ hr_agent namespace
→ support_agent namespace

Queries scoped to namespace:
→ Cannot cross namespace boundary
→ Strong isolation
```
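
To make the namespace boundary concrete, here is a toy in-memory model. It assumes nothing about the Pinecone or Weaviate client libraries: `NamespacedStore`, `upsert`, and `query` are hypothetical names, and dot-product scoring stands in for a real similarity search.

```python
from collections import defaultdict


def dot(a, b):
    """Dot product used as a stand-in similarity score."""
    return sum(x * y for x, y in zip(a, b))


class NamespacedStore:
    """Toy store where each namespace holds its own list of (vector, text)."""

    def __init__(self):
        self._namespaces = defaultdict(list)

    def upsert(self, namespace, vector, text):
        self._namespaces[namespace].append((tuple(vector), text))

    def query(self, namespace, vector, top_k=3):
        # Only records in the requested namespace are candidates;
        # records in other namespaces are never read or scored.
        records = self._namespaces.get(namespace, [])
        ranked = sorted(records, key=lambda r: dot(vector, r[0]), reverse=True)
        return [text for _, text in ranked[:top_k]]


store = NamespacedStore()
store.upsert("hr_agent", [1.0, 0.0], "CEO salary record")
store.upsert("support_agent", [0.9, 0.1], "Billing FAQ")

# The HR record is the closest vector overall, but it is out of scope here.
support_hits = store.query("support_agent", [1.0, 0.0])
```

The namespace should be derived from the authenticated agent on the server side, never taken from user input, or the boundary can be bypassed by naming another namespace.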

**Separate Indexes:**

```
One index per agent:
→ hr_agent_index
→ support_agent_index

Complete separation:
+ Strongest isolation
+ Independent scaling
- Higher infrastructure cost
- More operational complexity
```

**Row-Level Security:**

```
PostgreSQL + pgvector:
→ Give each agent its own database role
→ Enable row-level security on the table that stores the chunks
→ Policy: USING (agent_id = current_user)

Database-enforced isolation, applied to every query
```
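
As a rough sketch of what those policies could look like, the helper below builds the PostgreSQL statements as strings. The table name `chunks`, the column `agent_id`, the policy name `agent_isolation`, and the `rls_statements` function are all assumptions for illustration. The sketch also assumes each agent connects with its own database role whose name matches the stored `agent_id`.

```python
def rls_statements(table: str = "chunks", column: str = "agent_id") -> list[str]:
    """Build DDL that limits each database role to rows it owns."""
    for name in (table, column):
        # Identifiers are interpolated into SQL, so accept only plain names.
        if not name.isidentifier():
            raise ValueError(f"unsafe identifier: {name!r}")
    return [
        # Turn on policy checks for the table.
        f"ALTER TABLE {table} ENABLE ROW LEVEL SECURITY;",
        # Apply the policies to the table owner as well.
        f"ALTER TABLE {table} FORCE ROW LEVEL SECURITY;",
        # A row is visible only when its agent_id equals the connected role.
        f"CREATE POLICY agent_isolation ON {table} "
        f"USING ({column} = current_user);",
    ]


statements = rls_statements()
```

These statements would be run once by a migration tool or database client. After that, a similarity query issued under the `support_agent` role returns only that role's rows, even if the application forgets to add its own filter.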

### Access Control Logic

**Pre-Retrieval Filtering:**

```
Before vector search:
1. Identify requesting agent
2. Add metadata filter:
   WHERE metadata.agent_id = 'support_agent'
3. Execute search with filter

Ensures:
→ Only authorized chunks retrieved
→ No leakage
```
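
The three steps above can be sketched as follows. This is a minimal in-memory version under stated assumptions: `search_for_agent`, the `INDEX` list, and its sample rows are hypothetical, and a brute-force cosine ranking stands in for the vector database's search.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search_for_agent(index, query_vector, agent_id, top_k=3):
    # Step 1: identify the requesting agent; refuse to search without one.
    if not agent_id:
        raise PermissionError("agent identity is required")
    # Step 2: apply the metadata filter before any similarity scoring.
    candidates = [
        row for row in index if row["metadata"].get("agent_id") == agent_id
    ]
    # Step 3: rank only the authorized candidates and keep the top K.
    ranked = sorted(
        candidates,
        key=lambda row: cosine(query_vector, row["vector"]),
        reverse=True,
    )
    return [row["text"] for row in ranked[:top_k]]


INDEX = [
    {"text": "CEO salary: see payroll file", "vector": [1.0, 0.0],
     "metadata": {"agent_id": "hr_agent"}},
    {"text": "Refund policy", "vector": [0.8, 0.6],
     "metadata": {"agent_id": "support_agent"}},
    {"text": "Shipping times", "vector": [0.0, 1.0],
     "metadata": {"agent_id": "support_agent"}},
]

results = search_for_agent(INDEX, [1.0, 0.0], "support_agent", top_k=2)
```

The HR row is the best match for the query vector, yet it never enters the candidate set, so it cannot appear in the results regardless of its score.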

**Post-Retrieval Filtering:**

```
Alternative: Filter after retrieval:
1. Retrieve top-K chunks (e.g., 20)
2. Check each chunk's agent_id
3. Remove unauthorized
4. Return remaining (e.g., 12)

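Problem:
→ Reduces effective K (20 requested, 12 returned)
→ May leave too few results, or none
→ Unauthorized chunks still reach application code before being dropped
→ Prefer pre-retrieval filtering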
```
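
The loss of effective K is easy to see with a small comparison. The sketch below assumes a list that is already ranked by similarity across all agents. The `post_filter` and `pre_filter` functions and the sample ownership pattern are illustrative only.

```python
def post_filter(ranked_chunks, agent_id, top_k):
    """Take the global top-K first, then drop unauthorized chunks."""
    retrieved = ranked_chunks[:top_k]
    return [c for c in retrieved if c["agent_id"] == agent_id]


def pre_filter(ranked_chunks, agent_id, top_k):
    """Drop unauthorized chunks first, then take the top-K of what remains."""
    allowed = [c for c in ranked_chunks if c["agent_id"] == agent_id]
    return allowed[:top_k]


# Ten chunks, best match first; HR documents dominate the top of the ranking.
owners = [
    "hr_agent", "hr_agent", "support_agent", "hr_agent", "support_agent",
    "support_agent", "support_agent", "hr_agent", "support_agent", "support_agent",
]
ranked = [{"id": i, "agent_id": owner} for i, owner in enumerate(owners)]

after_post = post_filter(ranked, "support_agent", top_k=5)
after_pre = pre_filter(ranked, "support_agent", top_k=5)
```

With `top_k=5`, post-retrieval filtering keeps only two chunks because three of the top five belong to the HR agent, while pre-retrieval filtering still returns five authorized chunks.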

***

## How to Solve

**Recommended approach:**

* Tag every chunk with `agent_id` / `tenant_id` metadata.
* Apply pre-retrieval filtering (`metadata.agent_id = current_agent`) on every query.
* Use namespace isolation (separate vector DB namespaces per agent).
* Consider separate indexes when the strongest isolation is required.
* Apply row-level security if using PostgreSQL.

See [Data Isolation](/rag-scenarios-and-solutions/privacy/data-isolation.md).

