# Retrieval Stage Debugging

## The Problem

Cannot trace why specific chunks were or weren't retrieved, making it impossible to debug poor retrieval quality or unexpected results.

### Symptoms

* ❌ No visibility into retrieval scores
* ❌ Cannot see why chunk ranked #1 vs #10
* ❌ Missing chunks unexplained
* ❌ Cannot replay retrieval for debugging
* ❌ Retrieval feels like "black box"

### Real-World Example

```
User reports: "AI said rate limit is 100/hour but docs say 1000/hour"

Investigation needed:
→ What chunks were retrieved?
→ What were their scores?
→ Why was "100/hour" chunk ranked higher than "1000/hour"?
→ Was "1000/hour" chunk retrieved at all?

No retrieval logs:
→ Cannot answer these questions
→ Cannot debug
→ Cannot improve
```

***

## Deep Technical Analysis

### Retrieval Opacity

**Missing Instrumentation:**

```
Typical RAG pipeline:
1. Query → Vector DB
2. Get results
3. Pass to LLM

No logging:
→ Which chunks returned?
→ Similarity scores?
→ Reranking changes?
→ Filters applied?

Black box
```

**Debug Requirements:**

```
Need to log:
→ Query embedding
→ Retrieved chunk IDs + scores
→ Filters applied (metadata, recency)
→ Reranking scores (if used)
→ Final top-K selection
→ Context assembly
```

### Retrieval Trace Format

**Structured Logging:**

```json
{
  "query": "API rate limit",
  "timestamp": "2024-01-15T14:30:00Z",
  "embedding_model": "text-embedding-ada-002",
  "retrieval_stage": {
    "vector_search": {
      "k": 20,
      "results": [
        {
          "chunk_id": "doc_123_chunk_5",
          "score": 0.87,
          "text_preview": "Rate limit is 1000/hour...",
          "metadata": {"published": "2024-01-01"}
        },
        {
          "chunk_id": "doc_456_chunk_12",
          "score": 0.82,
          "text_preview": "Rate limit is 100/hour...",
          "metadata": {"published": "2022-06-15"}
        }
      ]
    },
    "reranking": {
      "model": "cohere-rerank",
      "results": [
        {"chunk_id": "doc_123_chunk_5", "score": 0.95},
        {"chunk_id": "doc_456_chunk_12", "score": 0.73}
      ]
    },
    "final_selection": ["doc_123_chunk_5", ...]
  }
}
```

### Score Analysis

**Similarity Distribution:**

```
Visualize score distribution:
→ Top-5: 0.85-0.90 (excellent)
→ Top-10: 0.75-0.85 (good)
→ Top-20: 0.60-0.75 (marginal)

If all < 0.70:
→ Poor retrieval quality
→ Query-document mismatch
→ Need query expansion or rewriting
```

**Score Gap Detection:**

```
Score between #5 and #6:
→ #5: 0.82
→ #6: 0.78 (small gap)
→ Borderline - both relevant

vs

→ #5: 0.85
→ #6: 0.62 (large gap)
→ Clear relevance threshold
→ Top-5 sufficient

Helps tune K parameter
```

### Replay Capability

**Store Query Context:**

```
Save for each query:
→ Query text
→ Query embedding (frozen)
→ Knowledge base snapshot ID
→ Retrieval config (K, filters)

Enables:
→ Exact replay of retrieval
→ Compare before/after changes
→ Regression testing
```

**A/B Testing:**

```
Test retrieval improvements:
→ Baseline: K=5, no rerank
→ Experiment: K=20, rerank top-5

Compare:
→ Same query set
→ Measure: Precision@5, NDCG
→ Determine if improvement real
```

### Failure Mode Detection

**No Results:**

```
Log when retrieval returns nothing:
→ Query: "..."
→ Results: [] (empty)
→ Reason: All scores < threshold

Alert: Query reformulation needed
```

**Low-Quality Results:**

```
All scores < 0.65:
→ Flag: Poor retrieval
→ Trigger: Fallback to keyword search
→ Log: For analysis
```

***

## How to Solve

**Log retrieval traces (query, chunks, scores) for every request + include reranking scores if applicable + store query embedding for replay + visualize score distributions + implement retrieval replay capability for debugging + monitor low-score queries (< 0.70 threshold) + build retrieval analytics dashboard (score trends, failure modes).** See [Retrieval Debugging](/rag-scenarios-and-solutions/monitoring/retrieval-debugging.md).


---

# Agent Instructions: Querying This Documentation

If you need additional information that is not directly available in this page, you can query the documentation dynamically by asking a question.

Perform an HTTP GET request on the current page URL with the `ask` query parameter:

```
GET https://help.twig.so/rag-scenarios-and-solutions/monitoring/retrieval-debugging.md?ask=<question>
```

The question should be specific, self-contained, and written in natural language.
The response will contain a direct answer to the question and relevant excerpts and sources from the documentation.

Use this mechanism when the answer is not explicitly present in the current page, you need clarification or additional context, or you want to retrieve related documentation sections.
