# Query-Document Mismatch

## The Problem

User queries are phrased differently from the documents that answer them, so the query and document embeddings end up far apart and retrieval fails even though both mean the same thing.

### Symptoms

* ❌ User asks in casual language; docs are formal
* ❌ Synonym mismatch ("delete" vs "remove")
* ❌ Question format vs statement format
* ❌ Different terminology conventions
* ❌ Domain jargon vs layperson terms

### Real-World Example

```
Document: "To terminate your subscription, navigate to Account Settings
and select the 'Cancel Subscription' option."

User query: "How do I stop paying for this?"

Embedding mismatch:
→ Query: "stop paying"
→ Doc: "terminate subscription", "cancel"
→ Semantic gap
→ Low similarity score
→ Not retrieved

Document has the answer but isn't found
```
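Failures like this are easy to miss because nothing errors; the answer simply never shows up. A minimal sketch for catching them, assuming you can collect real user queries labeled with the document that answers them, and that your retriever is wrapped in a `search(query, k)` function returning ranked document IDs (both hypothetical here): measure recall@k and list the queries that missed.

```python
from typing import Callable, Sequence


def recall_at_k(
    labeled: Sequence[tuple[str, str]],
    search: Callable[[str, int], list[str]],
    k: int = 5,
) -> tuple[float, list[str]]:
    """Return (recall@k, queries whose expected doc was not in the top k).

    `labeled` holds (user_query, expected_doc_id) pairs; `search` is a
    hypothetical wrapper around your retriever that returns ranked doc IDs.
    """
    misses = [query for query, expected in labeled if expected not in search(query, k)]
    recall = (len(labeled) - len(misses)) / len(labeled) if labeled else 0.0
    return recall, misses


# Usage with a stubbed retriever (hypothetical doc IDs):
fake_results = {
    "How do I stop paying for this?": ["billing-faq", "pricing"],
    "How do I cancel my subscription?": ["cancel-subscription", "billing-faq"],
}
labeled = [
    ("How do I stop paying for this?", "cancel-subscription"),
    ("How do I cancel my subscription?", "cancel-subscription"),
]
recall, misses = recall_at_k(labeled, lambda q, k: fake_results[q][:k], k=5)
print(recall, misses)  # 0.5 ['How do I stop paying for this?']
```

A cluster of misses on casually phrased queries, while formally phrased ones hit, is the signature of query-document mismatch.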

***

## Deep Technical Analysis

### Vocabulary Gap

**Formal vs Casual:**

```
Docs: "Authenticate using OAuth 2.0 authorization code flow"
Query: "How do I log in?"

Embedding distance:
→ "authenticate", "OAuth", "authorization" (technical)
→ "log in" (casual)
→ Semantic link weak in embedding space
```

**Acronyms vs Full Terms:**

```
Docs: "RBAC policy configuration"
Query: "How to set up role-based access control?"

Problem:
→ "RBAC" embedded differently from "role-based access control"
→ Should be same concept
→ But the model may not treat them as equivalent
```
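A minimal sketch of acronym normalization, assuming you maintain a domain glossary (the `GLOSSARY` dict below is hypothetical): append the full term next to any acronym found, and the acronym next to any full term, so both spellings end up in the text that gets embedded. Applying the same function to queries at search time and to chunks at index time keeps the two sides consistent.

```python
import re

# Hypothetical domain glossary: acronym -> full term.
GLOSSARY = {
    "RBAC": "role-based access control",
    "SSO": "single sign-on",
    "MFA": "multi-factor authentication",
}


def expand_acronyms(text: str, glossary: dict[str, str] = GLOSSARY) -> str:
    """Append missing counterparts so the acronym and full term both appear."""
    additions = []
    lowered = text.lower()
    for acronym, full in glossary.items():
        # Case-sensitive match so ordinary lowercase words aren't taken for acronyms.
        has_acronym = re.search(rf"\b{re.escape(acronym)}\b", text) is not None
        has_full = full.lower() in lowered
        if has_acronym and not has_full:
            additions.append(full)
        elif has_full and not has_acronym:
            additions.append(acronym)
    return f"{text} ({'; '.join(additions)})" if additions else text


print(expand_acronyms("RBAC policy configuration"))
# RBAC policy configuration (role-based access control)
print(expand_acronyms("How to set up role-based access control?"))
# How to set up role-based access control? (RBAC)
```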

### Question vs Statement

**Format Mismatch:**

```
Doc: "The API rate limit is 1000 requests per hour"
(Statement format)

Query: "What is the API rate limit?"
(Question format)

Embeddings differ:
→ Question words ("what", "how", "when")
→ Not present in statement
→ Reduces similarity
```

**Query Reformulation:**

```
Solution: Rewrite query to statement
→ "What is X?" → "X is"
→ "How to do Y?" → "To do Y"

More likely to match document phrasing
```
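The two rewrite rules above can be sketched with regular expressions. This is a hypothetical, deliberately small rule set; real queries vary far more, and many systems use an LLM for this rewrite instead. A common safeguard is to search with both the original and the rewritten query rather than discarding the original.

```python
import re

# Hypothetical rewrite rules, tried in order: (pattern, template).
_RULES = [
    (re.compile(r"^what is (?P<x>.+)$", re.IGNORECASE), "{x} is"),
    (re.compile(r"^what are (?P<x>.+)$", re.IGNORECASE), "{x} are"),
    (re.compile(r"^how (?:to|do i|can i) (?P<y>.+)$", re.IGNORECASE), "To {y}"),
]


def to_statement(query: str) -> str:
    """Rewrite a question into statement form; otherwise only strip the '?'."""
    q = query.strip().rstrip("?").strip()
    for pattern, template in _RULES:
        match = pattern.match(q)
        if match:
            return template.format(**match.groupdict())
    return q


print(to_statement("What is the API rate limit?"))  # the API rate limit is
print(to_statement("How do I log in?"))             # To log in
```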

### Query Expansion

**Synonym Expansion:**

```
Original query: "delete account"

Expand to:
→ "delete account"
→ "remove account"
→ "close account"
→ "cancel account"
→ "terminate account"

Embed all variations:
→ More likely to match doc phrasing
→ Retrieve if doc uses any synonym
```
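A minimal sketch of the expansion step, assuming a hand-maintained synonym table (the `SYNONYMS` dict is hypothetical; a thesaurus or an LLM are other possible sources). Each variant swaps one term for one synonym, which reproduces the list above.

```python
# Hypothetical synonym table: lowercase term -> alternatives.
SYNONYMS = {
    "delete": ["remove", "close", "cancel", "terminate"],
}


def expand_query(query: str, synonyms: dict[str, list[str]] = SYNONYMS) -> list[str]:
    """Return the original query plus one variant per (term, synonym) swap."""
    tokens = query.split()
    variants = [query]
    for i, token in enumerate(tokens):
        for alt in synonyms.get(token.lower(), []):
            variant = " ".join(tokens[:i] + [alt] + tokens[i + 1:])
            if variant not in variants:  # skip duplicates
                variants.append(variant)
    return variants


print(expand_query("delete account"))
# ['delete account', 'remove account', 'close account', 'cancel account', 'terminate account']
```

Swapping one term at a time keeps the number of variants linear in the number of synonyms; swapping every combination would grow multiplicatively and make the cost problem below much worse.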

**Pros/Cons:**

```
Pros:
+ Better recall
+ Matches more docs

Cons:
- More embeddings = slower
- May retrieve less relevant docs (noise)
- Must balance precision vs recall
```
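These trade-offs show up directly in how the expanded variants are searched. A sketch, assuming a pluggable `score(query, doc)` function (in practice, cosine similarity between embeddings; the token-overlap `jaccard` below is only a stand-in so the example runs): each variant costs one more search, each document keeps its best score across variants, and `top_k` plus `min_score` cap how much noise gets through.

```python
from typing import Callable


def jaccard(a: str, b: str) -> float:
    """Toy stand-in for embedding similarity: token-set overlap."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0


def multi_query_search(
    variants: list[str],
    docs: dict[str, str],
    score: Callable[[str, str], float] = jaccard,
    top_k: int = 3,
    min_score: float = 0.0,
) -> list[tuple[str, float]]:
    best: dict[str, float] = {}
    for variant in variants:  # cost grows linearly with the number of variants
        for doc_id, text in docs.items():
            s = score(variant, text)
            if s > best.get(doc_id, float("-inf")):
                best[doc_id] = s  # keep each doc's best match across variants
    kept = [(d, s) for d, s in best.items() if s >= min_score]  # drop weak (noisy) hits
    kept.sort(key=lambda item: item[1], reverse=True)
    return kept[:top_k]


docs = {
    "close-account": "To close your account open Settings",
    "rate-limits": "The API allows 1000 requests per hour",
}
# Only "close-account" clears min_score; "rate-limits" shares no tokens.
print(multi_query_search(["delete account", "remove account", "close account"], docs, min_score=0.1))
```

Raising `min_score` trades recall for precision; the right threshold depends on your similarity function and is best tuned on labeled queries.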

### Dense vs Sparse Representations

**Hybrid Search:**

```
Semantic (dense):
→ Captures meaning
→ Works for paraphrases

Keyword (sparse):
→ Exact term matches
→ Good for technical terms, IDs

Combine:
→ Semantic finds "log in" → "authenticate"
→ Keyword ensures "OAuth" → "OAuth"
→ Best of both
```
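One common way to combine the two result lists is reciprocal rank fusion (RRF), which uses only ranks, so dense and sparse scores never need to be on the same scale. This is a sketch of that general technique, not necessarily how any particular product implements hybrid search; `k=60` is the constant typically used with RRF, and the doc IDs are hypothetical.

```python
from collections import defaultdict


def reciprocal_rank_fusion(
    ranked_lists: list[list[str]], k: int = 60
) -> list[tuple[str, float]]:
    """Fuse ranked doc-ID lists: each doc scores sum(1 / (k + rank))."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical results for "How do I log in with OAuth?"
dense = ["auth-overview", "sso-setup", "oauth-flow"]  # semantic: paraphrase match
sparse = ["oauth-flow", "auth-overview"]              # keyword: exact "OAuth" hit
print([doc_id for doc_id, _ in reciprocal_rank_fusion([dense, sparse])])
# ['auth-overview', 'oauth-flow', 'sso-setup']
```

Documents found by both retrievers rise to the top, while `oauth-flow`, ranked low by the dense search alone, is pulled up by its exact keyword match.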

***

## How to Solve

**Combine several of these techniques:**

* Implement query expansion with synonyms
* Use hybrid search (semantic + keyword)
* Apply query rewriting (question → statement format)
* Fine-tune embeddings on query-document pairs from your domain
* Add a query understanding layer (detect intent, reformulate)
* Use cross-encoder reranking to bridge vocabulary gaps

See [Query Understanding](/rag-scenarios-and-solutions/accuracy/query-interpretation.md).
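Of the techniques listed, cross-encoder reranking is often the easiest to add to an existing retriever: fetch a generous candidate set, then rescore each (query, document) pair jointly, which tends to handle paraphrases like "stop paying" vs "cancel subscription" better than comparing two separately computed embeddings. A sketch, assuming a batch scorer that maps a list of pairs to scores; with the sentence-transformers library, `CrossEncoder(...).predict` has that shape (the model name in the comment is only an example). The toy `overlap_scorer` exists just to keep the snippet runnable.

```python
from typing import Callable, Sequence

# Scores a batch of (query, document) pairs. With sentence-transformers, for example:
#   from sentence_transformers import CrossEncoder
#   scorer = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2").predict
PairScorer = Callable[[list[tuple[str, str]]], Sequence[float]]


def rerank(
    query: str, candidates: list[str], scorer: PairScorer, top_n: int = 3
) -> list[tuple[str, float]]:
    """Rescore retrieved candidates jointly with the query and keep the best top_n."""
    scores = scorer([(query, doc) for doc in candidates])
    ranked = sorted(zip(candidates, scores), key=lambda item: item[1], reverse=True)
    return [(doc, float(score)) for doc, score in ranked[:top_n]]


def overlap_scorer(pairs: list[tuple[str, str]]) -> list[float]:
    """Toy stand-in for a cross-encoder: count shared lowercase tokens."""
    return [float(len(set(q.lower().split()) & set(d.lower().split()))) for q, d in pairs]


candidates = ["Billing FAQ", "Cancel subscription in Account Settings", "Subscription tiers"]
print(rerank("cancel subscription", candidates, overlap_scorer, top_n=2))
# [('Cancel subscription in Account Settings', 2.0), ('Subscription tiers', 1.0)]
```

Because a cross-encoder runs once per pair, it is usually applied only to the top few dozen candidates from the first-stage retriever.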

