# Context Relevance Decay

## The Problem

As the amount of retrieved context grows (more chunks), the average relevance per chunk decreases. Noise dilutes the signal and answer quality drops.

### Symptoms

* ❌ More chunks lead to worse answers
* ❌ Irrelevant context dilutes good context
* ❌ The LLM is distracted by noise
* ❌ Responses get longer and less accurate
* ❌ High K (top-20) performs worse than low K (top-5)

### Real-World Example

```
Query: "API rate limit"

K=5 (top 5 chunks):
→ All highly relevant (score 0.80-0.85)
→ Answer: "Rate limit is 1000 req/hour" ✓

K=20 (top 20 chunks):
→ Top 5: Highly relevant (0.80-0.85)
→ Chunks 6-10: Somewhat relevant (0.70-0.75)
→ Chunks 11-20: Marginally relevant (0.60-0.70)

With K=20:
→ LLM sees rate limits + pricing + authentication + errors + ...
→ Context diluted
→ Answer: "Rate limit depends on plan tier and may vary..." (vague)
```

***

## Deep Technical Analysis

### Signal-to-Noise Ratio

**Retrieval Score Distribution:**

```
Top-K chunks by score:
→ #1: 0.85 (very relevant)
→ #2: 0.83
→ #3: 0.80
→ #5: 0.75
→ #10: 0.65 (borderline)
→ #20: 0.50 (weak)

As K increases:
→ Signal (high-relevance) diluted
→ Noise (low-relevance) added
→ LLM must filter, sometimes fails
```
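
The dilution can be made concrete by measuring two things as K grows: the mean score of the top-K chunks and the share of those chunks above a relevance bar. The sketch below is illustrative only. The scores at ranks 1, 2, 3, 5, 10 and 20 match the distribution above, the other ranks are made-up interpolations, and the 0.75 "signal" threshold is an assumption rather than a recommended value.

```python
# Hypothetical retrieval scores for one query, sorted high to low.
# Ranks 1, 2, 3, 5, 10 and 20 match the distribution above; the rest are
# interpolated for illustration only.
SCORES = [
    0.85, 0.83, 0.80, 0.78, 0.75, 0.73, 0.71, 0.69, 0.67, 0.65,
    0.63, 0.62, 0.60, 0.59, 0.57, 0.56, 0.54, 0.53, 0.51, 0.50,
]


def mean_score(scores, k):
    """Average retrieval score of the top-k chunks."""
    top_k = scores[:k]
    return sum(top_k) / len(top_k)


def signal_fraction(scores, k, threshold=0.75):
    """Share of the top-k chunks whose score meets the (assumed) relevance bar."""
    top_k = scores[:k]
    return sum(1 for s in top_k if s >= threshold) / len(top_k)


for k in (5, 10, 20):
    print(f"K={k:>2}  mean={mean_score(SCORES, k):.3f}  "
          f"signal={signal_fraction(SCORES, k):.0%}")
```

Both numbers fall as K rises: the high-scoring chunks stay the same, but each extra chunk adds more noise than signal.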

**Context Dilution:**

```
LLM attention (simplified):
→ Attention weights are normalized across the whole context
→ With 20 chunks, each chunk gets a smaller share on average
→ Key info (chunk #2) may be overlooked
→ The model may be distracted by chunk #18 (irrelevant)

Smaller, focused context is usually better
```
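
A toy softmax model shows why normalization causes this. In the sketch below, each chunk is collapsed into a single attention logit, with one key chunk and several equally scored distractors. This is a deliberate oversimplification: real transformers attend per token, per head and per layer, and the logit values here are arbitrary assumptions. It does illustrate that the key chunk's share of attention shrinks as distractors are added, even though its own logit never changes.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def weight_on_key_chunk(n_chunks, key_logit=2.0, noise_logit=0.0):
    """Attention share of one key chunk competing with n_chunks - 1 distractors.

    Toy model (assumption): every chunk is reduced to a single logit.
    """
    logits = [key_logit] + [noise_logit] * (n_chunks - 1)
    return softmax(logits)[0]


for n in (5, 10, 20):
    print(f"{n:>2} chunks -> key chunk weight {weight_on_key_chunk(n):.2f}")
```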

### Dynamic K Selection

**Query Complexity Heuristic:**

```
Simple factual: K=3-5
→ "What is X?"
→ Need precise answer

Comprehensive: K=10-15
→ "Explain how X works"
→ Need multiple perspectives

Very complex: K=15-20
→ "Compare X, Y, Z and recommend"
→ Need broad coverage

Adjust K based on query type
```
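
One simple way to apply this heuristic is keyword matching on the query. The sketch below is hypothetical: the regex patterns and the chosen K values (5, 12, 20) are assumptions picked from inside the ranges above. A production system might use a trained classifier or an LLM call to label query type instead.

```python
import re

# Hypothetical keyword patterns; tune them against your own query logs.
COMPLEX = re.compile(r"\b(compare|versus|vs|recommend|trade-?offs?)\b", re.IGNORECASE)
EXPLAIN = re.compile(r"\b(explain|how (?:does|do|can)|why|walk me through)\b", re.IGNORECASE)


def choose_k(query: str) -> int:
    """Map a query to a retrieval depth using a crude complexity heuristic."""
    if COMPLEX.search(query):
        return 20  # very complex: broad coverage (K=15-20)
    if EXPLAIN.search(query):
        return 12  # comprehensive: multiple perspectives (K=10-15)
    return 5       # simple factual lookup (K=3-5)


for q in ("What is X?", "Explain how X works", "Compare X, Y, Z and recommend"):
    print(q, "->", choose_k(q))
```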

**Score-Based Cutoff:**

```
Instead of fixed K:
→ Retrieve until score < threshold

Example:
→ Keep chunks while score > 0.70
→ If only the top 3 score > 0.70, use K=3
→ If the top 10 all score > 0.70, use K=10

Adaptive to result quality
```
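
A minimal sketch of the cutoff, assuming results arrive as `(chunk, score)` pairs already sorted by score, highest first. The `min_k` and `max_k` bounds are additions not in the text above: `min_k` keeps at least one chunk even when nothing clears the threshold, and `max_k` caps the context when many chunks score well.

```python
from typing import List, Tuple


def cutoff_by_score(
    results: List[Tuple[str, float]],
    threshold: float = 0.70,
    min_k: int = 1,
    max_k: int = 20,
) -> List[Tuple[str, float]]:
    """Keep results while score > threshold, bounded by min_k and max_k.

    `results` must be sorted by score, highest first.
    """
    kept: List[Tuple[str, float]] = []
    for chunk, score in results[:max_k]:
        # Stop at the first weak result once the minimum has been met.
        if score <= threshold and len(kept) >= min_k:
            break
        kept.append((chunk, score))
    return kept


results = [("a", 0.85), ("b", 0.80), ("c", 0.75), ("d", 0.65), ("e", 0.60)]
print(cutoff_by_score(results))  # adaptive K: only chunks scoring above 0.70
```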

### Two-Stage Retrieval

**Broad Then Narrow:**

```
Stage 1: Retrieve K=50 (broad)
Stage 2: Rerank to top-5 (narrow)

Reranking:
→ Use a cross-encoder (more accurate, but slower)
→ Score the query and each document jointly
→ Keep only the most relevant

Final context: High-quality, compact
```
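
The pipeline shape can be sketched with pluggable scorers. This version is hypothetical: `fast_score` stands in for the first-stage vector search and `rerank_score` stands in for a cross-encoder. Both are toy word-overlap functions here, so the example runs without models. In practice, the reranker could wrap a cross-encoder library, which usually scores (query, passage) pairs in batches rather than one at a time.

```python
from typing import Callable, List, Sequence


def two_stage_retrieve(
    query: str,
    corpus: Sequence[str],
    fast_score: Callable[[str, str], float],
    rerank_score: Callable[[str, str], float],
    broad_k: int = 50,
    final_k: int = 5,
) -> List[str]:
    """Stage 1: cheap scorer over the corpus, keep broad_k candidates.
    Stage 2: expensive pairwise scorer over the candidates, keep final_k."""
    candidates = sorted(corpus, key=lambda d: fast_score(query, d), reverse=True)[:broad_k]
    reranked = sorted(candidates, key=lambda d: rerank_score(query, d), reverse=True)
    return reranked[:final_k]


def word_overlap(query: str, doc: str) -> float:
    """Toy first-stage scorer: number of shared words."""
    return len(set(query.lower().split()) & set(doc.lower().split()))


def overlap_ratio(query: str, doc: str) -> float:
    """Toy stand-in for a cross-encoder: share of doc words found in the query."""
    q = set(query.lower().split())
    words = doc.lower().split()
    return sum(w in q for w in words) / len(words) if words else 0.0


corpus = [
    "api rate limit is 1000 requests per hour",
    "pricing plans and tiers",
    "authentication uses api keys",
    "rate limit errors return 429",
]
print(two_stage_retrieve("api rate limit", corpus, word_overlap, overlap_ratio,
                         broad_k=3, final_k=2))
```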

### Context Compression

**Extractive Summarization:**

```
For lower-ranked chunks (6-20):
→ Extract only most relevant sentences
→ Discard filler

Example:
→ Chunk #15 (1000 tokens) → Extract 100 tokens
→ Preserves key info
→ Reduces dilution
```
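
A minimal extractive compressor might rank sentences by overlap with the query and keep the best ones within a budget. This sketch makes several assumptions. Word overlap replaces the embedding or LLM scoring a real system would likely use. Word count is a rough proxy for tokens. Sentences are split with a simple punctuation regex. Kept sentences are returned in their original order so the excerpt still reads naturally.

```python
import re


def compress_chunk(chunk: str, query: str, max_words: int = 100) -> str:
    """Keep the sentences sharing the most words with the query, up to
    roughly max_words, preserving original sentence order."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", chunk) if s.strip()]
    query_terms = set(re.findall(r"\w+", query.lower()))

    def overlap(sentence: str) -> int:
        return len(query_terms & set(re.findall(r"\w+", sentence.lower())))

    # Best-matching sentences first; sentences with no overlap are discarded.
    ranked = sorted(
        (i for i, s in enumerate(sentences) if overlap(s) > 0),
        key=lambda i: overlap(sentences[i]),
        reverse=True,
    )
    chosen, used = [], 0
    for i in ranked:
        n_words = len(sentences[i].split())
        if used + n_words > max_words:
            continue  # skip sentences that would exceed the budget
        chosen.append(i)
        used += n_words
    return " ".join(sentences[i] for i in sorted(chosen))


chunk = ("Our company was founded in 2010. The API rate limit is 1000 requests per hour. "
         "Pricing varies by plan. Exceeding the rate limit returns HTTP 429.")
print(compress_chunk(chunk, "API rate limit"))
```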

***

## How to Solve

**Use dynamic K based on query complexity (3-5 for simple queries, 10-15 for complex ones), apply a score-based cutoff (keep chunks while score > 0.70), use two-stage retrieval (broad search, then rerank to top-5), compress lower-ranked chunks (extract key sentences), monitor relevance decay (compare answer accuracy at K=5 vs K=20), and prefer precision over recall for most queries.** See [Context Optimization](/rag-scenarios-and-solutions/accuracy/context-relevance-decay.md).
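
Monitoring relevance decay can start as a small offline comparison: run the same labeled queries at two K values and compare accuracy. The sketch below is hypothetical. `answer_fn(query, k)` stands in for your own retrieve-then-generate pipeline, and the substring check is a crude correctness metric chosen only to keep the example self-contained. The fake pipeline mirrors the rate-limit example at the top of this page.

```python
from typing import Callable, Dict, Sequence, Tuple

EvalSet = Sequence[Tuple[str, str]]  # (query, expected answer fragment)


def accuracy_at_k(eval_set: EvalSet, answer_fn: Callable[[str, int], str], k: int) -> float:
    """Share of queries whose answer contains the expected fragment."""
    hits = sum(expected.lower() in answer_fn(query, k).lower()
               for query, expected in eval_set)
    return hits / len(eval_set)


def compare_k(eval_set: EvalSet, answer_fn: Callable[[str, int], str],
              ks: Sequence[int] = (5, 20)) -> Dict[int, float]:
    """Accuracy for each K, e.g. to alert when larger K stops helping."""
    return {k: accuracy_at_k(eval_set, answer_fn, k) for k in ks}


def fake_answer(query: str, k: int) -> str:
    """Hypothetical pipeline that degrades at high K, as in the example above."""
    return "Rate limit is 1000 req/hour" if k <= 5 else "Rate limit depends on plan tier"


print(compare_k([("API rate limit", "1000 req/hour")], fake_answer))
```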

