# RAG Retrieval & Accuracy

## Overview

Retrieval accuracy is where RAG systems succeed or fail. You can have perfect data integration, optimal chunking, and high-quality embeddings, but if the wrong documents are retrieved—or the right documents aren't found—your agents will provide poor answers. This section focuses on ensuring your RAG pipeline retrieves the most relevant, accurate, and complete information for every query.

## Why Retrieval Accuracy Matters

Accurate retrieval ensures:

* **Correct answers** - LLM receives the right context to answer questions
* **Complete information** - All relevant facts are provided, not just fragments
* **Trustworthy responses** - Answers grounded in real, cited sources
* **User confidence** - Consistent, reliable performance builds trust

Poor retrieval leads to:

* **Wrong answers** - Irrelevant context leads to incorrect responses
* **"I don't know" responses** - Relevant information exists but isn't found
* **Hallucinations** - LLM fills knowledge gaps with fabricated information
* **Inconsistent quality** - Some queries work perfectly, others fail completely
* **User frustration** - Unreliable agent → low adoption → wasted investment

## Common Retrieval Challenges

### Retrieval Failures

* **No relevant chunks retrieved** - Known information not found
* **Wrong documents ranked highly** - Irrelevant content returned first
* **Incomplete context assembly** - Only part of the relevant information is assembled, leaving out key details
* **Multi-hop reasoning failure** - Cannot connect related pieces of information

### Source Quality

* **Outdated knowledge base** - Stale information leads to wrong answers
* **Conflicting sources** - Contradictory information in retrieved context
* **Source ranking issues** - Lower-quality sources prioritized over authoritative ones
* **Knowledge base drift** - Performance degrades as content changes

### Query Understanding

* **Query-document mismatch** - Query phrasing doesn't match how docs are written
* **Ambiguous query expansion** - Query expansion or rewriting introduces terms that change the intent and make results worse
* **Context relevance decay** - Results that were relevant at first become less relevant as the conversation or context evolves

## Solutions in This Section

Browse these guides to improve retrieval accuracy:

* [Wrong Answers from RAG](/rag-scenarios-and-solutions/accuracy/wrong-answers.md)
* [No Relevant Chunks Retrieved](/rag-scenarios-and-solutions/accuracy/no-answer.md)
* [Outdated Knowledge Base](/rag-scenarios-and-solutions/accuracy/stale-data.md)
* [Hallucination Despite Retrieved Context](/rag-scenarios-and-solutions/accuracy/hallucination.md)
* [Incomplete Context Assembly](/rag-scenarios-and-solutions/accuracy/incomplete.md)
* [Conflicting Sources in Context](/rag-scenarios-and-solutions/accuracy/conflicting-info.md)
* [Source Ranking Issues](/rag-scenarios-and-solutions/accuracy/source-priority.md)
* [Query-Document Mismatch](/rag-scenarios-and-solutions/accuracy/query-interpretation.md)
* [Ambiguous Query Expansion](/rag-scenarios-and-solutions/accuracy/ambiguous-questions.md)
* [Knowledge Base Drift](/rag-scenarios-and-solutions/accuracy/factual-drift.md)
* [Context Relevance Decay](/rag-scenarios-and-solutions/accuracy/context-relevance-decay.md)
* [Multi-Hop Reasoning Failure](/rag-scenarios-and-solutions/accuracy/multi-hop-failure.md)

## The Retrieval Pipeline

Understanding the stages helps diagnose where failures occur:

```
User Query
    ↓
Query Enhancement (optional)
    ↓
Embedding / Vector Search
    ↓
Candidate Retrieval (top 20-100)
    ↓
Filtering (permissions, freshness)
    ↓
Reranking (top 5-10)
    ↓
Context Assembly
    ↓
LLM Generation
```

**Common failure points:**

* **Query Enhancement**: Misinterprets intent, adds noise
* **Vector Search**: Embedding doesn't capture query meaning
* **Candidate Retrieval**: Relevant docs not in top candidates
* **Filtering**: Over-aggressive filtering removes good results
* **Reranking**: Wrong docs prioritized
* **Context Assembly**: Important information left out or poorly ordered

## Retrieval Strategies

Different approaches for different scenarios:

### 1. Dense Retrieval (Vector Search)

**How it works:** Embed query and documents, find nearest neighbors

**Strengths:**

* Semantic understanding
* Handles synonyms and paraphrasing
* Language-agnostic

**Weaknesses:**

* Poor with rare terms or exact matches
* Opaque (hard to debug)
* Sensitive to embedding quality

**Best for:** Natural language queries, conceptual questions
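
A minimal sketch of dense retrieval using the `sentence-transformers` package; the model name and corpus are illustrative assumptions, not a prescribed setup:

```python
# Minimal dense-retrieval sketch. Swap in your own embedding model and chunks.
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # illustrative example model

chunks = [
    "Revenue increased 15% in Q2 2024 driven by enterprise sales.",
    "Our refund policy allows returns within 30 days of purchase.",
    "The API rate limit is 100 requests per minute per key.",
]

# Embed documents once; normalizing makes the dot product equal cosine similarity.
doc_vecs = model.encode(chunks, normalize_embeddings=True)

def dense_search(query: str, k: int = 2):
    q_vec = model.encode([query], normalize_embeddings=True)[0]
    scores = doc_vecs @ q_vec                 # cosine similarity per chunk
    top = np.argsort(scores)[::-1][:k]        # highest scores first
    return [(chunks[i], float(scores[i])) for i in top]

print(dense_search("How did revenue change last quarter?"))
```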

### 2. Sparse Retrieval (Keyword Search)

**How it works:** TF-IDF, BM25, or keyword matching

**Strengths:**

* Excellent for exact term matches
* Fast and explainable
* Works well with technical terms

**Weaknesses:**

* No semantic understanding
* Misses synonyms and variations
* Language-specific

**Best for:** Technical queries, product names, error codes
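
A minimal BM25 sketch using the `rank_bm25` package; the corpus and whitespace tokenization are simplified assumptions (production systems usually strip punctuation and may stem):

```python
# Minimal BM25 keyword-retrieval sketch using the rank_bm25 package.
from rank_bm25 import BM25Okapi

chunks = [
    "Error code E1042 indicates an expired API token.",
    "Revenue increased 15% in Q2 2024.",
    "Reset your API token from the account settings page.",
]

tokenized = [c.lower().split() for c in chunks]
bm25 = BM25Okapi(tokenized)

query = "what does error E1042 mean".lower().split()
scores = bm25.get_scores(query)              # one BM25 score per chunk

ranked = sorted(zip(chunks, scores), key=lambda x: x[1], reverse=True)
print(ranked[0])  # the exact-term match on "E1042" wins
```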

### 3. Hybrid Retrieval

**How it works:** Combine dense + sparse, merge results

**Strengths:**

* Best of both worlds
* More robust across query types
* Handles both semantic and exact matching

**Weaknesses:**

* More complex to implement and tune
* Need to balance weighting

**Best for:** Production RAG systems handling diverse queries
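
One common way to merge the two result sets is Reciprocal Rank Fusion (RRF), which needs only the ranks, not calibrated scores. A minimal sketch with illustrative document IDs:

```python
# Reciprocal Rank Fusion (RRF): merge ranked lists of doc IDs without having
# to calibrate raw dense scores against raw sparse scores.
from collections import defaultdict

def rrf_merge(ranked_lists, k: int = 60, top_n: int = 5):
    """ranked_lists: iterable of lists of doc IDs, best-first."""
    fused = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            fused[doc_id] += 1.0 / (k + rank)
    return sorted(fused, key=fused.get, reverse=True)[:top_n]

dense_hits  = ["doc_7", "doc_2", "doc_9", "doc_4"]   # from vector search
sparse_hits = ["doc_2", "doc_5", "doc_7", "doc_1"]   # from BM25

print(rrf_merge([dense_hits, sparse_hits]))
# doc_2 and doc_7 rise to the top because both retrievers agree on them
```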

### 4. Multi-Stage Retrieval

**How it works:**

1. Broad retrieval (vector or keyword)
2. Precise reranking (cross-encoder)
3. Optional LLM-based final selection

**Strengths:**

* High recall + high precision
* Best accuracy for critical applications

**Weaknesses:**

* Higher latency
* Increased cost

**Best for:** High-stakes queries where accuracy is critical
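
A minimal sketch of the reranking stage using a `sentence-transformers` cross-encoder; the model name is an illustrative assumption, and `candidates` stands in for the top 20-100 chunks from the first stage:

```python
# Stage 2 of a multi-stage pipeline: rerank broad candidates with a
# cross-encoder, which scores each (query, chunk) pair jointly and is
# usually more accurate than embedding similarity alone.
from sentence_transformers import CrossEncoder

reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")  # assumed model

def rerank(query: str, candidates: list[str], top_k: int = 5) -> list[str]:
    pairs = [(query, chunk) for chunk in candidates]
    scores = reranker.predict(pairs)                 # one relevance score per pair
    order = sorted(range(len(candidates)), key=lambda i: scores[i], reverse=True)
    return [candidates[i] for i in order[:top_k]]

# `candidates` would come from vector or keyword search in stage 1.
candidates = ["chunk about refunds", "chunk about revenue", "chunk about rate limits"]
print(rerank("How did revenue change last quarter?", candidates, top_k=2))
```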

## Best Practices

### Query Enhancement

1. **Query expansion** - Add synonyms, related terms (but test carefully)
2. **Query rewriting** - Rephrase for better matching ("How do I..." → "Steps to...")
3. **Spell correction** - Fix typos before search
4. **Intent detection** - Route different query types differently

### Retrieval Configuration

1. **Tune candidate count** - Retrieve enough (20-100) to ensure relevant docs are included
2. **Set appropriate thresholds** - Don't retrieve below a minimum similarity score
3. **Use metadata filters** - Narrow by date, source, topic when applicable
4. **Implement fallbacks** - If vector search fails, try keyword search (see the sketch after this list)
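
A hedged sketch of a minimum-similarity threshold plus keyword fallback; `dense_search` and `keyword_search` are hypothetical placeholders for your own retrievers, and the threshold value is illustrative:

```python
# Hypothetical sketch: keep only vector-search hits above a similarity
# threshold, and fall back to keyword search when nothing clears the bar.
MIN_SIMILARITY = 0.35   # illustrative; tune against a labeled query set

def retrieve(query: str, k: int = 50):
    hits = dense_search(query, k=k)                       # hypothetical helper
    confident = [(chunk, s) for chunk, s in hits if s >= MIN_SIMILARITY]
    if confident:
        return confident
    # Nothing cleared the threshold: exact-term matching often rescues
    # queries with rare terms, product names, or error codes.
    return keyword_search(query, k=k)                     # hypothetical helper
```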

### Reranking

1. **Use cross-encoder models** - More accurate than embedding similarity alone
2. **Consider LLM reranking** - Ask LLM to rank relevance (expensive but effective)
3. **Diversity in results** - Don't return 10 chunks from the same document (see the MMR sketch after this list)
4. **Recency boost** - Favor newer content when appropriate
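
One common way to enforce diversity is Maximal Marginal Relevance (MMR), which trades off relevance to the query against similarity to chunks already selected. A minimal sketch, assuming `q_vec` and `doc_vecs` are L2-normalized numpy arrays as in the dense retrieval sketch above:

```python
# Maximal Marginal Relevance (MMR): greedily pick chunks that are relevant
# to the query but not redundant with chunks already chosen.
import numpy as np

def mmr(q_vec, doc_vecs, lambda_: float = 0.7, top_k: int = 5):
    selected, remaining = [], list(range(len(doc_vecs)))
    relevance = doc_vecs @ q_vec                      # similarity to the query
    while remaining and len(selected) < top_k:
        if selected:
            # For each remaining chunk, its similarity to the closest chunk picked so far.
            redundancy = np.max(doc_vecs[remaining] @ doc_vecs[selected].T, axis=1)
        else:
            redundancy = np.zeros(len(remaining))
        scores = lambda_ * relevance[remaining] - (1 - lambda_) * redundancy
        best = remaining[int(np.argmax(scores))]
        selected.append(best)
        remaining.remove(best)
    return selected                                   # chunk indices, best-first
```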

### Context Assembly

1. **Order matters** - Put most relevant context first and last (primacy/recency); see the assembly sketch after this list
2. **Deduplicate** - Remove redundant chunks
3. **Include metadata** - Source, date, confidence helps LLM assess reliability
4. **Stay under token limit** - Truncate intelligently if needed
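
A minimal sketch of points 1, 2, and 4, assuming chunks arrive best-first from reranking; the 4-characters-per-token budget is a rough assumption, not a real tokenizer:

```python
# Hypothetical sketch: deduplicate reranked chunks, trim to a rough token
# budget, then interleave so the strongest chunks sit at the start and end
# of the assembled context ("lost in the middle" mitigation).
def assemble_context(chunks: list[str], max_tokens: int = 3000) -> str:
    # Deduplicate while preserving rank order.
    seen, unique = set(), []
    for chunk in chunks:
        key = chunk.strip().lower()
        if key not in seen:
            seen.add(key)
            unique.append(chunk)

    # Trim to a rough character budget (~4 characters per token is an assumption).
    budget, kept, used = max_tokens * 4, [], 0
    for chunk in unique:
        if used + len(chunk) > budget:
            break
        kept.append(chunk)
        used += len(chunk)

    # Primacy/recency ordering: odd-ranked chunks at the front, even-ranked at the end.
    front, back = kept[0::2], kept[1::2]
    return "\n\n".join(front + back[::-1])
```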

### Continuous Improvement

1. **Log all queries and retrievals** - Build datasets for analysis
2. **Track retrieval metrics** - Precision, recall, MRR, NDCG
3. **Collect user feedback** - Thumbs up/down on answers
4. **A/B test changes** - Compare retrieval strategies empirically
5. **Monitor edge cases** - Focus on query types with high failure rates

## Retrieval Metrics

Measure these to track accuracy:

### Retrieval Quality

* **Precision@k** - Of top k results, how many are relevant?
* **Recall@k** - Of all relevant docs, what % are in top k?
* **MRR (Mean Reciprocal Rank)** - Average of 1/rank of first relevant doc
* **NDCG (Normalized Discounted Cumulative Gain)** - Quality of ranking
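
A minimal sketch of Precision@k, Recall@k, and reciprocal rank for a single query; MRR is the mean of reciprocal rank over an evaluation set (NDCG is omitted for brevity):

```python
# Compute retrieval-quality metrics for one query, given the ranked doc IDs
# returned by retrieval and the set of truly relevant IDs from a labeled set.
def precision_at_k(ranked: list[str], relevant: set[str], k: int) -> float:
    return sum(1 for d in ranked[:k] if d in relevant) / k

def recall_at_k(ranked: list[str], relevant: set[str], k: int) -> float:
    return sum(1 for d in ranked[:k] if d in relevant) / max(len(relevant), 1)

def reciprocal_rank(ranked: list[str], relevant: set[str]) -> float:
    for rank, d in enumerate(ranked, start=1):
        if d in relevant:
            return 1.0 / rank
    return 0.0

ranked = ["doc_3", "doc_7", "doc_1", "doc_9"]
relevant = {"doc_7", "doc_4"}
print(precision_at_k(ranked, relevant, 3))   # 1/3 ≈ 0.33
print(recall_at_k(ranked, relevant, 3))      # 1/2 = 0.5
print(reciprocal_rank(ranked, relevant))     # first relevant doc at rank 2 → 0.5
# MRR averages reciprocal_rank over the whole evaluation query set.
```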

### Answer Quality

* **Groundedness** - % of response claims supported by retrieved context
* **Completeness** - Does response fully answer the question?
* **Citation accuracy** - Are citations correct and verifiable?
* **User satisfaction** - Thumbs up/down, ratings

### System Health

* **Zero-result queries** - % of queries with no retrieval
* **Low-confidence retrievals** - % below similarity threshold
* **Retrieval latency** - P50, P95, P99 times
* **Cost per query** - Embedding, vector search, reranking costs

## Advanced Techniques

### Query Decomposition

Break complex questions into sub-queries:

**User:** "How did revenue change from Q1 to Q2, and what caused it?"

**Decomposed:**

1. "What was Q1 revenue?"
2. "What was Q2 revenue?"
3. "What factors affected Q2 revenue?"

Retrieve separately, synthesize answer from combined results.
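
A hedged sketch of this flow; `llm` and `retrieve_chunks` are hypothetical placeholders for your own LLM client and retriever, not a specific API:

```python
# Hypothetical sketch of query decomposition: ask an LLM for sub-questions,
# retrieve context for each, then synthesize a single answer.
def answer_complex_question(question: str) -> str:
    sub_questions = llm(                                      # hypothetical helper
        "Break this question into independent sub-questions, one per line:\n"
        + question
    ).splitlines()

    evidence = []
    for sub_q in sub_questions:
        chunks = retrieve_chunks(sub_q, k=5)                  # hypothetical helper
        evidence.append(f"Sub-question: {sub_q}\n" + "\n".join(chunks))

    return llm(
        "Answer the original question using only the evidence below.\n\n"
        f"Question: {question}\n\nEvidence:\n" + "\n\n".join(evidence)
    )
```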

### Hypothetical Document Embeddings (HyDE)

Instead of embedding the question:

1. Use LLM to generate hypothetical answer
2. Embed the hypothetical answer
3. Search for documents similar to that answer

Works well when questions don't resemble documents.
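
A hedged sketch, reusing a hypothetical `llm` helper and a dense search like the one sketched earlier:

```python
# Hypothetical HyDE sketch: embed an LLM-generated hypothetical answer
# instead of the raw question, then run ordinary vector search with it.
def hyde_search(question: str, k: int = 5):
    hypothetical = llm(                                       # hypothetical helper
        "Write a short passage that would plausibly answer this question. "
        "It does not need to be factually correct:\n" + question
    )
    # The fake answer is phrased like a document, so it lands closer to real
    # documents in embedding space than the question itself would.
    return dense_search(hypothetical, k=k)                    # e.g. the sketch above
```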

### Multi-Vector Retrieval

Store multiple embeddings per document:

* Summary embedding (for high-level matches)
* Detailed embedding (for specific facts)
* Question embeddings (for FAQ matching)

Retrieve using appropriate embedding type per query.
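
A minimal sketch of one possible layout, using a flat in-memory list; the field names and helpers are illustrative assumptions, and a real system would use a vector database with metadata filters:

```python
# Hypothetical multi-vector index: several embeddings per document, each
# tagged with a type, all pointing back to the same parent document ID.
import numpy as np

index = []   # list of dicts: {"doc_id": ..., "kind": ..., "vector": np.ndarray}

def add_document(doc_id: str, summary_vec, detail_vecs, question_vecs):
    index.append({"doc_id": doc_id, "kind": "summary", "vector": summary_vec})
    for v in detail_vecs:
        index.append({"doc_id": doc_id, "kind": "detail", "vector": v})
    for v in question_vecs:
        index.append({"doc_id": doc_id, "kind": "question", "vector": v})

def search(q_vec, kind: str, k: int = 5):
    candidates = [e for e in index if e["kind"] == kind]
    scored = sorted(candidates,
                    key=lambda e: float(np.dot(e["vector"], q_vec)),
                    reverse=True)
    # Return parent document IDs so the caller can fetch the full content.
    return [e["doc_id"] for e in scored[:k]]
```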

### Contextual Retrieval

Prepend each chunk with document context before embedding:

**Original chunk:** "Revenue increased 15%"

**With context:** "[Q2 2024 Financial Report] Revenue increased 15%"

This preserves context and improves retrieval accuracy.
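
A minimal sketch; the prefix format is illustrative, and `model` is assumed to be an embedding model such as the sentence-transformers example above:

```python
# Hypothetical sketch: prefix each chunk with its document context before
# embedding, so "Revenue increased 15%" stays tied to its source report.
def contextualize(doc_title: str, chunks: list[str]) -> list[str]:
    return [f"[{doc_title}] {chunk}" for chunk in chunks]

chunks = ["Revenue increased 15%", "Operating costs fell 3%"]
contextual_chunks = contextualize("Q2 2024 Financial Report", chunks)

# Embed the contextualized text, but keep the original chunk for display.
vectors = model.encode(contextual_chunks, normalize_embeddings=True)  # assumed model
```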

### Iterative Retrieval

1. Retrieve initial context
2. Generate preliminary answer
3. Identify gaps or follow-up questions
4. Retrieve additional context
5. Generate final answer

Useful for complex, multi-part questions.
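
A hedged sketch of the loop; `llm` and `retrieve_chunks` are hypothetical helpers, and the stop condition is deliberately simple:

```python
# Hypothetical iterative-retrieval loop: retrieve, draft an answer, ask the
# LLM what is still missing, retrieve again, then finalize.
def iterative_answer(question: str, max_rounds: int = 2) -> str:
    context = "\n".join(retrieve_chunks(question, k=5))       # hypothetical helper
    for _ in range(max_rounds):
        draft = llm(f"Question: {question}\n\nContext:\n{context}\n\nDraft an answer.")
        follow_up = llm(
            "List one follow-up question needed to complete this answer, "
            "or reply DONE if nothing is missing.\n\n" + draft
        )
        if follow_up.strip() == "DONE":
            break
        context += "\n" + "\n".join(retrieve_chunks(follow_up, k=5))  # fill the gap
    return llm(f"Question: {question}\n\nContext:\n{context}\n\nWrite the final answer.")
```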

## Quick Diagnostics

**Signs your retrieval needs improvement:**

* ✗ Agents say "I don't know" when answer is in knowledge base
* ✗ Retrieved chunks don't seem relevant to the question
* ✗ Correct answer exists but not in top results
* ✗ Responses are vague or incomplete
* ✗ Similar queries get wildly different results
* ✗ Agents hallucinate despite having relevant data
* ✗ High similarity scores but poor answer quality

**Signs your retrieval is working well:**

* ✓ Retrieved chunks are clearly relevant to query
* ✓ Correct information consistently appears in top results
* ✓ Complete answers without hallucination
* ✓ Appropriate "I don't know" when info truly doesn't exist
* ✓ Good performance across diverse query types
* ✓ Citations are accurate and helpful
* ✓ Users report high satisfaction

## Debugging Poor Retrieval

When retrieval fails, investigate systematically:

### Step 1: Is the information in the knowledge base?

* Search manually for the answer
* Check if source document was ingested
* Verify document processed and chunked correctly

### Step 2: Are relevant chunks being retrieved?

* Inspect top 10-20 candidate chunks
* Check similarity scores
* Look at chunk content vs query

### Step 3: Is the query being understood correctly?

* Review query embedding
* Test query variations
* Check query enhancement/rewriting

### Step 4: Is reranking working?

* Compare pre- and post-reranking results
* Check reranker scores
* Test different reranking strategies

### Step 5: Is context assembled well?

* Review final context sent to LLM
* Check ordering and completeness
* Verify token limits not exceeded

**Bottom line**: Retrieval accuracy is the linchpin of RAG. If your retrieval is poor, no amount of prompt engineering or LLM sophistication will save you. Invest time in getting this right—it's the highest-leverage improvement you can make.


---

# Agent Instructions: Querying This Documentation

If you need additional information that is not directly available in this page, you can query the documentation dynamically by asking a question.

Perform an HTTP GET request on the current page URL with the `ask` query parameter:

```
GET https://help.twig.so/rag-scenarios-and-solutions/accuracy.md?ask=<question>
```

The question should be specific, self-contained, and written in natural language.
The response will contain a direct answer to the question and relevant excerpts and sources from the documentation.

Use this mechanism when the answer is not explicitly present in the current page, you need clarification or additional context, or you want to retrieve related documentation sections.
