Context Relevance Decay

The Problem

As retrieved context grows (more chunks), per-chunk relevance decreases, diluting the signal with noise and reducing answer quality.

Symptoms

  • ❌ More chunks = worse answers

  • ❌ Irrelevant context dilutes good context

  • ❌ LLM distracted by noise

  • ❌ Longer responses, less accurate

  • ❌ High K (top-20) worse than low K (top-5)

Real-World Example

Query: "API rate limit"

K=5 (top 5 chunks):
→ All highly relevant (score 0.80-0.85)
→ Answer: "Rate limit is 1000 req/hour" ✓

K=20 (top 20 chunks):
→ Top 5: Highly relevant (0.80-0.85)
→ Chunks 6-10: Somewhat relevant (0.70-0.75)
→ Chunks 11-20: Marginally relevant (0.60-0.70)

With K=20:
→ LLM sees rate limits + pricing + authentication + errors + ...
→ Context diluted
→ Answer: "Rate limit depends on plan tier and may vary..." (vague)

Deep Technical Analysis

Signal-to-Noise Ratio

Retrieval Score Distribution:
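
The example above already shows the shape: scores fall from ~0.85 at rank 1 to ~0.60 by rank 20. A minimal sketch of turning that distribution into a signal-to-noise number, assuming an illustrative relevance threshold of 0.75 (not a universal constant):

```python
# A minimal sketch: compute a signal-to-noise ratio over retrieval scores.
# The 0.75 threshold and the example scores are illustrative assumptions,
# mirroring the K=20 example above.

def signal_to_noise(scores: list[float], threshold: float = 0.75) -> float:
    """Ratio of 'signal' chunks (score >= threshold) to 'noise' chunks."""
    signal = [s for s in scores if s >= threshold]
    noise = [s for s in scores if s < threshold]
    return len(signal) / max(len(noise), 1)

# Scores mirroring the example: ranks 1-5 strong, 6-10 middling, 11-20 weak.
scores_k20 = [0.85, 0.84, 0.82, 0.81, 0.80,   # ranks 1-5
              0.74, 0.73, 0.72, 0.71, 0.70,   # ranks 6-10
              0.68, 0.66, 0.65, 0.64, 0.63,
              0.62, 0.61, 0.61, 0.60, 0.60]   # ranks 11-20

print(signal_to_noise(scores_k20[:5]))   # K=5  -> 5.0 (all signal)
print(signal_to_noise(scores_k20))       # K=20 -> ~0.33 (noise dominates)
```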

Context Dilution:
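
Dilution is the flip side of the same distribution: as K grows, the share of the prompt occupied by truly relevant text shrinks. Assuming, as in the example above, that only the top 5 chunks are relevant and all chunks are similar in length:

relevant fraction ≈ relevant chunks / K
→ K=5:  5/5  = 100% of the context is signal
→ K=20: 5/20 = 25% is signal; the other 75% competes for the model's attention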

Dynamic K Selection

Query Complexity Heuristic:
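
A minimal sketch of picking K from the query itself. The length and multi-part heuristics are assumptions; the K ranges follow the guidance in How to Solve below:

```python
# A minimal sketch: choose K from rough query-complexity signals.
# The heuristics (length, multi-part markers) are illustrative assumptions;
# the K ranges (3-5 simple, 10-15 complex) follow the guidance on this page.

def choose_k(query: str) -> int:
    words = query.lower().split()
    multi_part = any(m in words for m in ("and", "versus", "vs", "compare"))
    if len(words) <= 6 and not multi_part:
        return 5          # simple lookup: "API rate limit"
    if len(words) <= 15 and not multi_part:
        return 8          # moderate: single question, more qualifiers
    return 12             # complex / multi-part: needs broader context

print(choose_k("API rate limit"))                          # -> 5
print(choose_k("compare rate limits and pricing tiers"))   # -> 12
```

Anything smarter (a classifier, an LLM router) slots into the same interface.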

Score-Based Cutoff:
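
A minimal sketch of the cutoff, assuming chunks arrive sorted by score descending. The 0.70 threshold comes from the guidance below; the Chunk shape and the min/max bounds are illustrative assumptions:

```python
# A minimal sketch: truncate the ranked list at the low-scoring tail.
# The 0.70 threshold follows the guidance below; min_k/max_k bounds are
# illustrative assumptions to avoid empty or runaway contexts.

from dataclasses import dataclass

@dataclass
class Chunk:
    text: str
    score: float  # retriever similarity, assumed sorted descending

def cutoff(chunks: list[Chunk], threshold: float = 0.70,
           min_k: int = 1, max_k: int = 15) -> list[Chunk]:
    kept = [c for c in chunks[:max_k] if c.score > threshold]
    return kept if kept else chunks[:min_k]  # always return something
```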

Two-Stage Retrieval

Broad Then Narrow:
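
A minimal sketch of broad-then-narrow: a wide first pass for recall, then a cross-encoder rerank for precision, keeping only the top 5. `vector_search` is a placeholder for your existing retriever, and the model name is an assumption (any query-passage reranker works):

```python
# A minimal sketch: broad vector search, then cross-encoder rerank to top-5.
# `vector_search` is a placeholder for your existing retriever; the model
# name is an assumption -- any query/passage reranker can stand in.

from sentence_transformers import CrossEncoder

reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

def two_stage(query: str, vector_search, broad_k: int = 50,
              final_k: int = 5) -> list[str]:
    candidates = vector_search(query, k=broad_k)       # stage 1: recall
    pairs = [(query, doc) for doc in candidates]
    scores = reranker.predict(pairs)                   # stage 2: precision
    ranked = sorted(zip(scores, candidates), reverse=True)
    return [doc for _, doc in ranked[:final_k]]
```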

Context Compression

Extractive Summarization:
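
A minimal sketch of extractive compression: keep the top-ranked chunks intact and reduce the tail to the sentences that best overlap the query. The word-overlap score is a deliberately crude assumption; embedding similarity or a dedicated compressor is a common upgrade:

```python
# A minimal sketch: keep full text for top-ranked chunks, but compress the
# tail to the sentences that best overlap the query. The word-overlap score
# is a crude assumption; embedding similarity is a common upgrade.

import re

def key_sentences(text: str, query: str, keep: int = 2) -> str:
    q_words = set(query.lower().split())
    sentences = re.split(r"(?<=[.!?])\s+", text)
    scored = sorted(sentences,
                    key=lambda s: len(q_words & set(s.lower().split())),
                    reverse=True)
    return " ".join(scored[:keep])

def compress_context(chunks: list[str], query: str,
                     full_n: int = 5) -> list[str]:
    head = chunks[:full_n]                               # keep top-5 intact
    tail = [key_sentences(c, query) for c in chunks[full_n:]]
    return head + tail
```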


How to Solve

  • Use dynamic K based on query complexity (3-5 chunks for simple queries, 10-15 for complex)

  • Implement a score-based cutoff (retrieve while score > 0.70)

  • Apply two-stage retrieval (broad search, then rerank to top-5)

  • Compress lower-ranked chunks (extract key sentences)

  • Monitor relevance decay (test K=5 vs K=20 accuracy)

  • Prefer precision over recall for most queries

See Context Optimization.
