Agent-Level Data Isolation

The Problem

Multiple AI agents share the same knowledge base without proper isolation, causing agents to access data they shouldn't see.

Symptoms

❌ Agent A sees Agent B's private data
❌ Cross-agent data leakage
❌ Cannot restrict knowledge by agent
❌ Shared vector DB exposes all data
❌ No tenant isolation

Real-World Example

A company runs two agents:
→ HR Agent: Access to employee records
→ Customer Support Agent: Access to help docs

Both agents query one shared vector DB that holds all of this data:
→ Customer asks Support Agent: "What's the CEO's salary?"
→ Retrieval finds HR document with salary info
→ Support Agent responds with CEO salary

Data isolation failure

Deep Technical Analysis

Shared Knowledge Base Risks

No Filtering Layer:

All chunks in one vector DB:
→ HR docs embedded
→ Customer docs embedded
→ No metadata distinguishing them

Any query retrieves anything:
→ Agent identity not checked
→ Data access unrestricted
→ Privacy violation
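A minimal sketch of this failure mode follows. It makes two simplifying assumptions: a toy in-memory list stands in for a real vector DB, and hand-made 3-dimensional vectors stand in for real embeddings. Because nothing records which agent owns a chunk, the search function has no way to check the caller, so the HR chunk wins on similarity alone.

```python
import math

# Toy shared store: HR and support chunks sit side by side, and nothing
# records which agent each chunk belongs to.
SHARED_STORE = [
    {"text": "HR: executive compensation records", "vector": [0.9, 0.1, 0.0]},
    {"text": "Support: how to reset your password", "vector": [0.1, 0.9, 0.1]},
    {"text": "Support: refund policy for annual plans", "vector": [0.0, 0.8, 0.4]},
]


def cosine(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


def search_unfiltered(query_vector, k=1):
    """Rank every chunk by similarity. The caller's identity is never consulted."""
    ranked = sorted(
        SHARED_STORE,
        key=lambda chunk: cosine(query_vector, chunk["vector"]),
        reverse=True,
    )
    return [chunk["text"] for chunk in ranked[:k]]


# The support agent embeds "What's the CEO's salary?" (toy vector) and
# receives the HR chunk, because similarity is the only ranking criterion.
print(search_unfiltered([0.95, 0.05, 0.0]))  # ['HR: executive compensation records']
```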

Metadata Filtering:

Solution: tag every chunk with access-control metadata:
{
  vector: [0.234, ...],
  metadata: {
    agent_id: "hr_agent",
    department: "hr",
    sensitivity: "confidential"
  }
}

Query with filter:
→ agent_id = "support_agent"
→ Only chunks tagged support_agent are retrieved (the hr_agent chunk above is excluded)
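The sketch below shows how such a filter can be evaluated against chunk metadata. It assumes a simplified, hypothetical filter format: a plain value means equality, and {"$in": [...]} means membership. This format is only loosely modeled on the Mongo-style operators that Pinecone's metadata filters use, so check your DB's actual filter syntax before relying on it.

```python
# Chunks tagged with access-control metadata, following the record shape
# above (ids, vectors, and values are illustrative).
CHUNKS = [
    {
        "id": "hr-001",
        "vector": [0.9, 0.1, 0.0],
        "metadata": {"agent_id": "hr_agent", "department": "hr", "sensitivity": "confidential"},
    },
    {
        "id": "sup-001",
        "vector": [0.1, 0.9, 0.1],
        "metadata": {"agent_id": "support_agent", "department": "support", "sensitivity": "public"},
    },
]


def matches_filter(metadata, flt):
    """Return True only if metadata satisfies every condition in flt.

    A condition is either a plain value (equality) or {"$in": [...]}.
    A key missing from metadata never matches, so untagged chunks are
    excluded rather than returned by default (fail closed).
    """
    for key, cond in flt.items():
        if key not in metadata:
            return False
        value = metadata[key]
        if isinstance(cond, dict) and "$in" in cond:
            if value not in cond["$in"]:
                return False
        elif value != cond:
            return False
    return True


def allowed_chunk_ids(flt):
    """Ids of chunks whose metadata passes the filter."""
    return [c["id"] for c in CHUNKS if matches_filter(c["metadata"], flt)]


print(allowed_chunk_ids({"agent_id": "support_agent"}))  # ['sup-001']
```

Failing closed on a missing key matters. With the opposite default, any chunk ingested without tags would be visible to every agent.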

Multi-Tenancy Patterns

Namespace Isolation:

Pinecone (namespaces) / Weaviate (tenants):
→ Create a separate namespace or tenant per agent
→ hr_agent namespace
→ support_agent namespace

Each query is scoped to a single namespace:
→ Results cannot cross the namespace boundary
→ Strong isolation
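A minimal sketch of namespace scoping follows, using a toy in-memory class as a stand-in for a real index. In Pinecone, for example, upsert and query calls take a namespace argument, and Weaviate scopes requests to a tenant instead. The class and method names here are hypothetical and do not mirror either client's API.

```python
import math
from collections import defaultdict


def cosine(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


class NamespacedStore:
    """Toy stand-in for one vector index partitioned into namespaces."""

    def __init__(self):
        # namespace -> list of (chunk_id, vector)
        self._data = defaultdict(list)

    def upsert(self, namespace, chunk_id, vector):
        self._data[namespace].append((chunk_id, vector))

    def query(self, namespace, vector, top_k=3):
        # Only the requested namespace is scanned; chunks in other namespaces
        # are never candidates. .get() avoids creating empty namespaces.
        candidates = self._data.get(namespace, [])
        ranked = sorted(candidates, key=lambda item: cosine(vector, item[1]), reverse=True)
        return [chunk_id for chunk_id, _ in ranked[:top_k]]


store = NamespacedStore()
store.upsert("hr_agent", "hr-001", [0.9, 0.1, 0.0])
store.upsert("support_agent", "sup-001", [0.1, 0.9, 0.1])

# Even a salary-like query from the support agent only sees its own namespace.
print(store.query("support_agent", [0.95, 0.05, 0.0]))  # ['sup-001']
```

Isolation holds only if the namespace is derived from the authenticated agent identity, not from anything the caller can supply.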

Separate Indexes:

One index per agent:
→ hr_agent_index
→ support_agent_index

Complete separation:
+ Strongest isolation
+ Independent scaling
- Higher infrastructure cost
- More operational complexity
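A minimal sketch of one-index-per-agent wiring follows. It assumes a hypothetical INDEXES registry and AgentRetriever wrapper, which are toy stand-ins rather than a real client. Each retriever is bound to exactly one index when it is constructed, so no code path leads from one agent's retriever to another agent's index.

```python
from dataclasses import dataclass, field


@dataclass
class Index:
    """Toy stand-in for a physically separate vector index."""
    name: str
    chunks: dict = field(default_factory=dict)


# Hypothetical one-to-one mapping from agent to its dedicated index.
INDEXES = {
    "hr_agent": Index("hr_agent_index"),
    "support_agent": Index("support_agent_index"),
}


class AgentRetriever:
    """Retrieval handle that can only ever touch one agent's index."""

    def __init__(self, agent_id):
        if agent_id not in INDEXES:
            # Unknown agents get no index at all rather than a default one.
            raise PermissionError(f"no index provisioned for agent {agent_id!r}")
        self._index = INDEXES[agent_id]

    def add(self, chunk_id, text):
        self._index.chunks[chunk_id] = text

    def lookup(self, chunk_id):
        return self._index.chunks.get(chunk_id)


hr = AgentRetriever("hr_agent")
hr.add("hr-001", "executive compensation records")

support = AgentRetriever("support_agent")
print(support.lookup("hr-001"))  # None: the chunk lives in a different index
```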

Row-Level Security:

PostgreSQL + pgvector:
→ Use database roles
→ Row-level security policies
→ Policy: a row is visible only when agent_id = current_user

Database-enforced isolation
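A minimal sketch of the PostgreSQL side follows. It assumes a hypothetical chunks table with agent_id, content, and a pgvector embedding column, plus one database role per agent whose name matches its agent_id. The SQL is held in Python strings so any driver can run it. visible_rows is only a pure-Python mirror of the policy's USING clause for illustration, because the real filtering happens inside PostgreSQL.

```python
# Statements to run once, as the table owner, on a hypothetical "chunks" table.
RLS_SETUP_SQL = [
    "ALTER TABLE chunks ENABLE ROW LEVEL SECURITY;",
    # Without FORCE, the table owner is exempt from the table's own policies.
    "ALTER TABLE chunks FORCE ROW LEVEL SECURITY;",
    # Each agent connects as its own role; a row is visible only when its
    # agent_id equals the session's role name.
    "CREATE POLICY agent_isolation ON chunks USING (agent_id = current_user);",
]

# Similarity search issued by an agent's connection. <=> is pgvector's cosine
# distance operator. There is no agent_id predicate here: PostgreSQL applies
# the policy condition itself. (Superusers and BYPASSRLS roles skip policies,
# so agents must not connect as either.)
SEARCH_SQL = "SELECT id, content FROM chunks ORDER BY embedding <=> %s::vector LIMIT 5;"


def visible_rows(rows, current_user):
    """Illustrative mirror of USING (agent_id = current_user)."""
    return [row for row in rows if row["agent_id"] == current_user]


rows = [
    {"id": 1, "agent_id": "hr_agent", "content": "executive compensation records"},
    {"id": 2, "agent_id": "support_agent", "content": "how to reset your password"},
]
print([row["id"] for row in visible_rows(rows, "support_agent")])  # [2]
```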

Access Control Logic

Pre-Retrieval Filtering:

Before vector search:
1. Identify requesting agent
2. Add metadata filter:
   WHERE metadata.agent_id = 'support_agent'
3. Execute search with filter
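The three steps can be sketched as follows. This assumes a hypothetical RequestContext that the serving layer populates after authentication, plus a toy in-memory chunk list. The key property is that the filter runs before ranking, so only authorized chunks ever compete for the top-K slots. Real vector DBs take the filter as a search parameter and apply it inside the engine, and how that interacts with the ANN index varies by product.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class RequestContext:
    """Hypothetical: set by the server after authenticating the agent,
    never taken from the user's prompt or the model's output."""
    agent_id: str


CHUNKS = [
    {"id": "hr-001", "vector": [0.9, 0.1, 0.0], "metadata": {"agent_id": "hr_agent"}},
    {"id": "hr-002", "vector": [0.8, 0.2, 0.1], "metadata": {"agent_id": "hr_agent"}},
    {"id": "sup-001", "vector": [0.1, 0.9, 0.1], "metadata": {"agent_id": "support_agent"}},
    {"id": "sup-002", "vector": [0.0, 0.8, 0.4], "metadata": {"agent_id": "support_agent"}},
]


def cosine(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


def pre_filtered_search(ctx, query_vector, k=2):
    # 1. Identify the requesting agent from trusted context.
    agent_id = ctx.agent_id
    # 2. Apply the metadata filter (metadata.agent_id = agent_id) first.
    allowed = [c for c in CHUNKS if c["metadata"].get("agent_id") == agent_id]
    # 3. Rank only the authorized candidates and keep the top k.
    ranked = sorted(allowed, key=lambda c: cosine(query_vector, c["vector"]), reverse=True)
    return [c["id"] for c in ranked[:k]]


print(pre_filtered_search(RequestContext("support_agent"), [0.95, 0.05, 0.0]))
# ['sup-001', 'sup-002']
```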

Ensures:
→ Only authorized chunks retrieved
→ No leakage

Post-Retrieval Filtering:

Alternative: Filter after retrieval:
1. Retrieve top-K chunks (e.g., 20)
2. Check each chunk's agent_id
3. Remove unauthorized
4. Return remaining (e.g., 12)

Problem:
→ Reduces effective K
→ May not have enough results
→ Prefer pre-retrieval
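A minimal sketch contrasting the two orders follows, using toy 2-dimensional vectors (assumed data, not a benchmark). When the query is closest to another agent's chunks, post-retrieval filtering can discard the entire top-K. Pre-retrieval filtering still fills all K slots from the authorized chunks.

```python
import math

CHUNKS = [
    {"id": "hr-1", "vector": [1.0, 0.0], "agent_id": "hr_agent"},
    {"id": "hr-2", "vector": [0.9, 0.1], "agent_id": "hr_agent"},
    {"id": "hr-3", "vector": [0.8, 0.2], "agent_id": "hr_agent"},
    {"id": "sup-1", "vector": [0.5, 0.5], "agent_id": "support_agent"},
    {"id": "sup-2", "vector": [0.2, 0.8], "agent_id": "support_agent"},
    {"id": "sup-3", "vector": [0.0, 1.0], "agent_id": "support_agent"},
]


def cosine(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


def rank(chunks, query_vector):
    return sorted(chunks, key=lambda c: cosine(query_vector, c["vector"]), reverse=True)


def post_filtered(agent_id, query_vector, k):
    # Retrieve top-K from everything, then drop unauthorized chunks.
    top_k = rank(CHUNKS, query_vector)[:k]
    return [c["id"] for c in top_k if c["agent_id"] == agent_id]


def pre_filtered(agent_id, query_vector, k):
    # Drop unauthorized chunks first, then take top-K of what remains.
    allowed = [c for c in CHUNKS if c["agent_id"] == agent_id]
    return [c["id"] for c in rank(allowed, query_vector)[:k]]


query = [1.0, 0.0]  # closest to the HR chunks
print(post_filtered("support_agent", query, k=3))  # []
print(pre_filtered("support_agent", query, k=3))   # ['sup-1', 'sup-2', 'sup-3']
```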

How to Solve

→ Tag all chunks with agent_id/tenant_id metadata
→ Implement pre-retrieval filtering (metadata.agent_id = current_agent)
→ Use namespace isolation (separate vector DB namespaces)
→ Consider separate indexes for strong isolation
→ Apply row-level security if using PostgreSQL

See Data Isolation.
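Every filtering strategy depends on chunks actually being tagged, so one option is to reject untagged chunks at ingestion time. The sketch below assumes chunks arrive as dicts with a metadata field. validate_for_ingest and UntaggedChunkError are hypothetical names. The required keys should match whichever of agent_id or tenant_id your scheme uses.

```python
class UntaggedChunkError(ValueError):
    """Raised when a chunk lacks the access-control metadata needed for isolation."""


def validate_for_ingest(chunk, required_keys=("agent_id",)):
    """Return the chunk unchanged if every required key is a non-empty string."""
    metadata = chunk.get("metadata") or {}
    missing = [
        key for key in required_keys
        if not (isinstance(metadata.get(key), str) and metadata[key])
    ]
    if missing:
        raise UntaggedChunkError(f"chunk {chunk.get('id')!r} is missing {', '.join(missing)}")
    return chunk


def ingest(chunks, required_keys=("agent_id",)):
    # Validate the whole batch before anything is indexed, so one untagged
    # chunk blocks the batch instead of slipping into a shared index.
    validated = [validate_for_ingest(c, required_keys) for c in chunks]
    return [c["id"] for c in validated]  # stand-in for the real upsert call


print(ingest([{"id": "sup-001", "metadata": {"agent_id": "support_agent"}}]))  # ['sup-001']
```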
