Chunks Too Small

The Problem

Your AI agent gives incomplete or fragmented answers because document chunks are too small and lack sufficient context.

Symptoms

❌ AI says "I don't have enough information" when answer exists
❌ Answers are partial or cut off mid-sentence
❌ References span multiple chunks but AI only cites one
❌ Code examples split across chunks, missing parts
❌ Tables broken, showing only headers without data

Real-World Example

Your documentation has a comprehensive setup guide,
but when asked "How do I set up the database?",
AI only mentions Step 1 and 2 of 5 steps.

Chunk size: 200 tokens
Setup guide: 800 tokens total
Split into: 4 chunks
AI retrieves: Only chunks 1-2
Result: Incomplete answer

Deep Technical Analysis

The Fundamental Chunking Dilemma

Chunking creates a paradox in RAG systems:

Too Small Chunks:
→ Better semantic precision (exact match to query)
→ But loses context (incomplete information)
→ Retrieves accurate but insufficient pieces

Too Large Chunks:
→ Complete context (full information)
→ But lower semantic precision (diluted signal)
→ Retrieves irrelevant information alongside relevant

The Challenge:
There's no universal optimal chunk size
It depends on: content type, query patterns, retrieval strategy

Why Token-Based Chunking Fails for Technical Content

Token-based splitting assumptions:

Assumption: Language has uniform information density
Reality: Technical docs have variable density

Example - Dense section (API reference):
  "authenticate(token: string): Promise<User>
   Parameters:
   - token: JWT authentication token
   Returns: User object or throws AuthError"
  
  Information: 5 distinct concepts in 50 tokens
  Density: 0.1 concepts/token

Example - Sparse section (introduction):
  "Welcome to our platform. This guide will help you
   get started with the basics and understand the
   fundamental concepts you'll need to know."
  
  Information: 1 concept in 50 tokens
  Density: 0.02 concepts/token

Problem:
Fixed token chunks treat both equally
Dense sections get split mid-concept
Sparse sections waste chunk capacity

The Retrieval Mathematics Problem

Why top-K retrieval fails with small chunks:

Vector similarity search returns top-K chunks:
- Query: "How do I set up authentication?"
- K = 5 (typical)
- Chunk size: 200 tokens

Math:
Setup guide total: 1,000 tokens
Split into: 5 chunks of 200 tokens each

Chunks created:
1. "Introduction to auth + step 1"
2. "Step 2 + step 3"
3. "Step 4 + step 5"  
4. "Step 6 + common errors"
5. "Troubleshooting + FAQ"

Vector similarity scores:
Chunk 1: 0.89 (high - mentions "authentication")
Chunk 4: 0.84 (high - mentions "errors" which query implies)
Chunk 2: 0.76 (medium)
Chunk 5: 0.75 (medium - "troubleshooting" related)
Chunk 3: 0.68 (lowest)

Top-5 retrieval gets: 1, 4, 2, 5, 3
But logical reading order is: 1, 2, 3, 4, 5

AI sees: Step 1 → Error handling → Step 2-3 → Troubleshooting → Step 4-5
Coherence: Destroyed

Semantic Boundary Detection Complexity

The code block problem:

# Chunk 1 (200 tokens)
def authenticate_user(credentials):
    """
    Authenticates user with provided credentials.
    
    Args:
        credentials: Dict with 'username' and 'password'
    
    Returns:
        User object if successful
    
    Raises:
        AuthenticationError: If credentials invalid
    """
    # Validate input format
    if not isinstance(credentials, dict):
        raise ValueError("Credentials must be dict")
    
    if 'username' not in credentials:
        raise ValueError("Missing username")

# ← CHUNK BOUNDARY HERE ← 

# Chunk 2 (200 tokens)  
    if 'password' not in credentials:
        raise ValueError("Missing password")
    
    # Hash password
    hashed_pw = hash_password(credentials['password'])
    
    # Query database
    user = db.query(User).filter(
        User.username == credentials['username'],
        User.password_hash == hashed_pw
    ).first()

Why this breaks:

Function signature in chunk 1, implementation in chunk 2
Chunk 1 semantic: "This is about input validation"
Chunk 2 semantic: "This is about database querying"
Query "How to check database for user?" → Retrieves chunk 2 only
Missing context: What the 'credentials' parameter contains
AI can't reconstruct complete logic flow

The cascade effect:

Missing context in code →
AI makes wrong assumptions about parameters →
Generates incorrect usage examples →
User copies broken code →
Support tickets increase →
Trust in AI decreases

Table Splitting Pathology

Markdown table structure:

| Endpoint | Method | Auth Required | Rate Limit |
|----------|--------|---------------|------------|
| /api/users | GET | Yes | 100/min |
| /api/users/:id | GET | Yes | 100/min |
| /api/users | POST | Yes | 20/min |

← CHUNK BOUNDARY AFTER 150 TOKENS ←

| /api/auth/login | POST | No | 10/min |
| /api/auth/refresh | POST | Yes | 50/min |
| /api/data | GET | Yes | 1000/min |

Retrieval scenarios:

Query: "What's the rate limit for user endpoints?"

Chunk 1 retrieved (has header + first 3 rows):
AI sees: User endpoints have 100/min or 20/min limits
Answer: Partially correct

Chunk 2 retrieved (no header, last 3 rows):
AI sees: Row-like data without context
Can't determine: What these rows mean
Answer: "I don't have clear information"

Both chunks retrieved:
AI must: Recognize these are parts of same table
And: Mentally reconstruct table structure
But: No explicit linking between chunks
Result: May still miss connection

Context Window vs Chunk Size Trade-off

The retrieval stage dilemma:

LLM Context Window: 8,192 tokens (typical)

Allocation:
- System prompt: 500 tokens
- User query: 50 tokens  
- Memory/history: 500 tokens
- Reserved for response: 1,500 tokens
- Available for retrieved content: 5,642 tokens

If chunk size = 200 tokens:
  Max chunks that fit: 5,642 / 200 = 28 chunks
  But top-K usually = 5-10 chunks
  Utilization: 1,000-2,000 / 5,642 = 18-35%
  → Wasting context window capacity

If chunk size = 1,000 tokens:
  Max chunks that fit: 5,642 / 1,000 = 5 chunks
  Top-K = 5 chunks
  Utilization: 5,000 / 5,642 = 89%
  → Efficient use of context

But larger chunks mean:
- Lower retrieval precision (more noise per chunk)
- Potentially less relevant content included
- Higher embedding costs

The Overlap Problem

Overlap seems like solution but creates issues:

Document: 1,000 tokens
Chunk size: 400 tokens
Overlap: 100 tokens

Chunks created:
Chunk 1: tokens 0-399
Chunk 2: tokens 300-699 (100 token overlap with chunk 1)
Chunk 3: tokens 600-999 (100 token overlap with chunk 2)

Storage cost:
Without overlap: 1,000 tokens stored
With overlap: 1,200 tokens stored (20% increase)

Embedding cost:
Without overlap: 3 embeddings
With overlap: 3 embeddings (same)

Retrieval confusion:
Query matches overlapped region (tokens 300-399)
→ Both chunk 1 and chunk 2 score highly
→ Top-5 includes both chunks with similar content
→ Wasted retrieval slots on duplicate information
→ Other relevant unique chunks displaced

Hierarchical Document Structure Loss

How chunking destroys document hierarchy:

Original document structure:
# Authentication Guide
## Prerequisites
### System Requirements
### Dependencies
## Setup
### Installation
#### Linux
#### macOS
#### Windows
### Configuration
#### Basic Config
#### Advanced Config
## Usage
### First Login
### Session Management

After fixed-size chunking:
Chunk 1: "# Authentication Guide\n## Prerequisites\n### System Requirements\nYou need..."
Chunk 2: "...Ubuntu 20.04 or later\n### Dependencies\nInstall these packages..."
Chunk 3: "...npm install auth-lib\n## Setup\n### Installation\n#### Linux..."

Lost information:
- Chunk 2 doesn't know it's about "Prerequisites"
- Chunk 3 doesn't know "Linux" is under "Installation" under "Setup"
- Hierarchical context evaporated
- Headings in middle of chunks lack parent context

Query implications:

Query: "What are the authentication prerequisites?"

Semantic match:
- "prerequisites" appears in chunk 1
- But details span chunks 1, 2
- Chunk 2 has most details but weak keyword match
- Retrieval: Chunk 1 scored higher, retrieved alone
- Result: Incomplete list of prerequisites

How to Solve

Increase chunk size to 1024-2048 tokens for technical content + add 10-20% overlap + configure semantic boundary splitting. See Chunking Configuration.

Why This Problem Showcases RAG Architecture Depth

This isn't just "make chunks bigger" - it reveals:

Semantic search limitations: Vector similarity doesn't understand document flow or logical dependencies
Information density variability: Technical content has non-uniform information distribution
Context reconstruction complexity: LLMs must infer structure from fragments
Trade-off mathematics: Chunk size optimization is multi-objective (precision vs recall vs cost vs context)
Structure preservation: Maintaining hierarchical relationships in flat vector space is fundamentally hard

Understanding these architectural constraints is essential for building production RAG systems.

PreviousMulti-Source Sync Conflicts NextChunks Too Large

Last updated 0 minutes ago

hashtagThe Problem

hashtagSymptoms

hashtagReal-World Example

hashtagDeep Technical Analysis

hashtagThe Fundamental Chunking Dilemma

hashtagWhy Token-Based Chunking Fails for Technical Content

hashtagThe Retrieval Mathematics Problem

hashtagSemantic Boundary Detection Complexity

hashtagTable Splitting Pathology

hashtagContext Window vs Chunk Size Trade-off

hashtagThe Overlap Problem

hashtagHierarchical Document Structure Loss

hashtagHow to Solve

hashtagWhy This Problem Showcases RAG Architecture Depth