Context Relevance Decay

The Problem

As retrieved context grows (more chunks), per-chunk relevance decreases, diluting the signal with noise and reducing answer quality.

Symptoms

  • ❌ More chunks = worse answers

  • ❌ Irrelevant context dilutes good context

  • ❌ LLM distracted by noise

  • ❌ Longer responses, less accurate

  • ❌ High K (top-20) worse than low K (top-5)

Real-World Example

Query: "API rate limit"

K=5 (top 5 chunks):
→ All highly relevant (score 0.80-0.85)
→ Answer: "Rate limit is 1000 req/hour" ✓

K=20 (top 20 chunks):
→ Top 5: Highly relevant (0.80-0.85)
→ Chunks 6-10: Somewhat relevant (0.70-0.75)
→ Chunks 11-20: Marginally relevant (0.60-0.70)

With K=20:
→ LLM sees rate limits + pricing + authentication + errors + ...
→ Context diluted
→ Answer: "Rate limit depends on plan tier and may vary..." (vague)

Deep Technical Analysis

Signal-to-Noise Ratio

Retrieval Score Distribution:
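
The example above already shows the shape: scores fall from ~0.85 at rank 1 to ~0.60 by rank 20. A minimal sketch of turning that distribution into a signal-to-noise number, assuming an illustrative relevance threshold of 0.75 (not a universal constant):

```python
# A minimal sketch: compute a signal-to-noise ratio over retrieval scores.
# The 0.75 threshold and the example scores are illustrative assumptions,
# mirroring the K=20 example above.

def signal_to_noise(scores: list[float], threshold: float = 0.75) -> float:
    """Ratio of 'signal' chunks (score >= threshold) to 'noise' chunks."""
    signal = [s for s in scores if s >= threshold]
    noise = [s for s in scores if s < threshold]
    return len(signal) / max(len(noise), 1)

# Scores mirroring the example: ranks 1-5 strong, 6-10 middling, 11-20 weak.
scores_k20 = [0.85, 0.84, 0.82, 0.81, 0.80,   # ranks 1-5
              0.74, 0.73, 0.72, 0.71, 0.70,   # ranks 6-10
              0.68, 0.66, 0.65, 0.64, 0.63,
              0.62, 0.61, 0.61, 0.60, 0.60]   # ranks 11-20

print(signal_to_noise(scores_k20[:5]))   # K=5  -> 5.0 (all signal)
print(signal_to_noise(scores_k20))       # K=20 -> ~0.33 (noise dominates)
```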

Context Dilution:
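
Dilution is the flip side of the same distribution: as K grows, the share of the prompt occupied by truly relevant text shrinks. Assuming, as in the example above, that only the top 5 chunks are relevant and all chunks are similar in length:

relevant fraction ≈ relevant chunks / K
→ K=5:  5/5  = 100% of the context is signal
→ K=20: 5/20 = 25% is signal; the other 75% competes for the model's attention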

Dynamic K Selection

Query Complexity Heuristic:
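
A minimal sketch of picking K from the query itself. The length and multi-part heuristics are assumptions; the K ranges follow the guidance in How to Solve below:

```python
# A minimal sketch: choose K from rough query-complexity signals.
# The heuristics (length, multi-part markers) are illustrative assumptions;
# the K ranges (3-5 simple, 10-15 complex) follow the guidance on this page.

def choose_k(query: str) -> int:
    words = query.lower().split()
    multi_part = any(m in words for m in ("and", "versus", "vs", "compare"))
    if len(words) <= 6 and not multi_part:
        return 5          # simple lookup: "API rate limit"
    if len(words) <= 15 and not multi_part:
        return 8          # moderate: single question, more qualifiers
    return 12             # complex / multi-part: needs broader context

print(choose_k("API rate limit"))                          # -> 5
print(choose_k("compare rate limits and pricing tiers"))   # -> 12
```

Anything smarter (a classifier, an LLM router) slots into the same interface.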

Score-Based Cutoff:
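
A minimal sketch of the cutoff, assuming chunks arrive sorted by score descending. The 0.70 threshold comes from the guidance below; the Chunk shape and the min/max bounds are illustrative assumptions:

```python
# A minimal sketch: truncate the ranked list at the low-scoring tail.
# The 0.70 threshold follows the guidance below; min_k/max_k bounds are
# illustrative assumptions to avoid empty or runaway contexts.

from dataclasses import dataclass

@dataclass
class Chunk:
    text: str
    score: float  # retriever similarity, assumed sorted descending

def cutoff(chunks: list[Chunk], threshold: float = 0.70,
           min_k: int = 1, max_k: int = 15) -> list[Chunk]:
    kept = [c for c in chunks[:max_k] if c.score > threshold]
    return kept if kept else chunks[:min_k]  # always return something
```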

Two-Stage Retrieval

Broad Then Narrow:
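
A minimal sketch of broad-then-narrow: a wide first pass for recall, then a cross-encoder rerank for precision, keeping only the top 5. `vector_search` is a placeholder for your existing retriever, and the model name is an assumption (any query-passage reranker works):

```python
# A minimal sketch: broad vector search, then cross-encoder rerank to top-5.
# `vector_search` is a placeholder for your existing retriever; the model
# name is an assumption -- any query/passage reranker can stand in.

from sentence_transformers import CrossEncoder

reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

def two_stage(query: str, vector_search, broad_k: int = 50,
              final_k: int = 5) -> list[str]:
    candidates = vector_search(query, k=broad_k)       # stage 1: recall
    pairs = [(query, doc) for doc in candidates]
    scores = reranker.predict(pairs)                   # stage 2: precision
    ranked = sorted(zip(scores, candidates), reverse=True)
    return [doc for _, doc in ranked[:final_k]]
```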

Context Compression

Extractive Summarization:
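
A minimal sketch of extractive compression: keep the top-ranked chunks intact and reduce the tail to the sentences that best overlap the query. The word-overlap score is a deliberately crude assumption; embedding similarity or a dedicated compressor is a common upgrade:

```python
# A minimal sketch: keep full text for top-ranked chunks, but compress the
# tail to the sentences that best overlap the query. The word-overlap score
# is a crude assumption; embedding similarity is a common upgrade.

import re

def key_sentences(text: str, query: str, keep: int = 2) -> str:
    q_words = set(query.lower().split())
    sentences = re.split(r"(?<=[.!?])\s+", text)
    scored = sorted(sentences,
                    key=lambda s: len(q_words & set(s.lower().split())),
                    reverse=True)
    return " ".join(scored[:keep])

def compress_context(chunks: list[str], query: str,
                     full_n: int = 5) -> list[str]:
    head = chunks[:full_n]                               # keep top-5 intact
    tail = [key_sentences(c, query) for c in chunks[full_n:]]
    return head + tail
```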


How to Solve

  • Use dynamic K based on query complexity (3-5 chunks for simple queries, 10-15 for complex)

  • Implement a score-based cutoff (retrieve while score > 0.70)

  • Apply two-stage retrieval (broad search, then rerank to top-5)

  • Compress lower-ranked chunks (extract key sentences)

  • Monitor relevance decay (test K=5 vs K=20 accuracy)

  • Prefer precision over recall for most queries

See Context Optimization.
