Query Understanding Logs

The Problem

Without query understanding logs, you cannot track how user queries are interpreted, expanded, or rewritten, which makes it impossible to debug retrieval failures or improve query processing.

Symptoms

❌ No way to tell whether query reformulation helped
❌ Cannot see synonym expansion
❌ Intent classification is opaque
❌ Cannot debug "no results" queries
❌ Query rewriting is unexplained

Real-World Example

User query: "How do I nuke my account?"

System processing (hidden):
→ Detected intent: Account deletion
→ Expanded: "nuke" → ["delete", "remove", "terminate", "cancel", "close"]
→ Rewritten query: "How to delete my account?"
→ Retrieved chunks successfully

User sees: Correct answer
But: The team cannot see WHY it worked
→ What if "nuke" were not in the synonym list?
→ No visibility into the query processing pipeline

Deep Technical Analysis

Query Processing Pipeline

Stages to Log:

{
  "original_query": "How do I nuke my account?",
  "processing_pipeline": [
    {
      "stage": "intent_classification",
      "result": {
        "intent": "account_deletion",
        "confidence": 0.87
      }
    },
    {
      "stage": "synonym_expansion",
      "expansions": {
        "nuke": ["delete", "remove", "terminate", "cancel", "close"]
      }
    },
    {
      "stage": "query_rewriting",
      "rewritten": "How to delete my account?",
      "method": "template_based"
    },
    {
      "stage": "spell_correction",
      "corrections": []
    }
  ],
  "final_query": "How to delete my account?"
}
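
One way to produce a record in this shape is to accumulate stage results as the pipeline runs and serialize them once at the end. The following is a minimal sketch, not a definitive implementation: the `QueryTrace` class and its `record` method are hypothetical names introduced for illustration, and the stage payloads are simply whatever each stage chooses to report.

```python
import json
from dataclasses import dataclass, field
from typing import Any


@dataclass
class QueryTrace:
    """Collects one log record per query (hypothetical helper)."""

    original_query: str
    processing_pipeline: list[dict[str, Any]] = field(default_factory=list)
    final_query: str = ""

    def record(self, stage: str, **details: Any) -> None:
        # Stages are appended in execution order so the log mirrors the pipeline.
        self.processing_pipeline.append({"stage": stage, **details})

    def to_json(self) -> str:
        # If no stage changed the query, the final query is the original one.
        return json.dumps(
            {
                "original_query": self.original_query,
                "processing_pipeline": self.processing_pipeline,
                "final_query": self.final_query or self.original_query,
            },
            ensure_ascii=False,
        )


trace = QueryTrace("How do I nuke my account?")
trace.record(
    "intent_classification",
    result={"intent": "account_deletion", "confidence": 0.87},
)
trace.record("synonym_expansion", expansions={"nuke": ["delete", "remove"]})
trace.final_query = "How to delete my account?"
log_line = trace.to_json()  # one JSON document per request
```

Writing a single JSON document per request keeps all stages of one query together, which makes it easier to replay a failed query later.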

Intent Classification Tracking

Intent Detection:

Query: "I can't log in"

Classification:
→ Intent: authentication_issue
→ Sub-intent: login_failure
→ Confidence: 0.92

Log:
{
  "query": "I can't log in",
  "intent": "authentication_issue",
  "sub_intent": "login_failure",
  "confidence": 0.92,
  "alternative_intents": [
    {"intent": "password_reset", "confidence": 0.35},
    {"intent": "account_locked", "confidence": 0.28}
  ]
}

Enables:
→ Route to authentication docs
→ Filter retrieval
→ Provide targeted help

Low-Confidence Intents:

Confidence < 0.70:
→ Ambiguous query
→ Multiple possible intents

Example:
Query: "API"
→ Too generic
→ Intent unclear

Log + alert:
→ Consider asking user to clarify
→ Or: Retrieve broadly
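
The intent log and the low-confidence alert can be built in one step. The sketch below assumes the classifier returns a plain mapping of intent name to score; the function name `build_intent_log`, the logger name, and the intent labels used for the "API" query are hypothetical. Sub-intents are left out for brevity.

```python
import logging

logger = logging.getLogger("query_understanding")  # hypothetical logger name
LOW_CONFIDENCE_THRESHOLD = 0.70  # threshold taken from the text above


def build_intent_log(query: str, scored_intents: dict[str, float]) -> dict:
    """Rank classifier scores and flag ambiguous queries."""
    if not scored_intents:
        raise ValueError("scored_intents must not be empty")
    ranked = sorted(scored_intents.items(), key=lambda kv: kv[1], reverse=True)
    (top_intent, top_confidence), rest = ranked[0], ranked[1:]
    record = {
        "query": query,
        "intent": top_intent,
        "confidence": top_confidence,
        "alternative_intents": [
            {"intent": name, "confidence": score} for name, score in rest
        ],
        "low_confidence": top_confidence < LOW_CONFIDENCE_THRESHOLD,
    }
    if record["low_confidence"]:
        # The caller decides whether to ask for clarification or retrieve broadly.
        logger.warning(
            "Ambiguous query %r: top intent %s at %.2f",
            query, top_intent, top_confidence,
        )
    return record


clear = build_intent_log(
    "I can't log in",
    {"authentication_issue": 0.92, "password_reset": 0.35, "account_locked": 0.28},
)
# Intent names and scores for this query are made up for illustration.
vague = build_intent_log("API", {"api_reference": 0.41, "api_keys": 0.38})
```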

Query Expansion Logging

Synonym Expansion:

Query: "delete account"

Expansions:
→ "delete" → ["remove", "erase", "terminate"]
→ "account" → ["profile", "user", "subscription"]

Expanded query:
"(delete OR remove OR erase OR terminate) (account OR profile OR user OR subscription)"

Log expansion:
→ Track which synonyms used
→ Measure impact on retrieval
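
A minimal sketch of synonym expansion that also returns which synonyms were used, so they can be written to the log. The `SYNONYMS` table and the `expand_query` function are hypothetical, and the OR-group syntax assumes a search backend that accepts boolean queries.

```python
# Hypothetical synonym table; a real system would load this from configuration.
SYNONYMS = {
    "delete": ["remove", "erase", "terminate"],
    "account": ["profile", "user", "subscription"],
}


def expand_query(
    query: str, synonyms: dict[str, list[str]]
) -> tuple[str, dict[str, list[str]]]:
    """Return the boolean-expanded query and the expansions that were applied."""
    groups: list[str] = []
    used: dict[str, list[str]] = {}
    for term in query.lower().split():
        alternatives = synonyms.get(term, [])
        if alternatives:
            used[term] = alternatives
            groups.append("(" + " OR ".join([term, *alternatives]) + ")")
        else:
            # Terms without synonyms pass through unchanged.
            groups.append(term)
    return " ".join(groups), used


expanded, used = expand_query("delete account", SYNONYMS)
log_record = {"query": "delete account", "expanded": expanded, "expansions": used}
```

Logging `used` alongside the expanded query makes it possible to later correlate individual synonyms with retrieval quality.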

Acronym Expansion:

Query: "How to configure RBAC?"

Expansion:
→ "RBAC" → "Role-Based Access Control"

Expanded:
"How to configure RBAC (Role-Based Access Control)?"

This helps match documents that use the full term instead of the acronym.
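
Acronym expansion can be sketched as a glossary lookup over all-caps tokens. The `ACRONYMS` glossary and the `expand_acronyms` function are hypothetical; treating any run of two or more capital letters as a candidate acronym is an assumption that will not suit every corpus.

```python
import re

# Hypothetical glossary; only acronyms listed here are expanded.
ACRONYMS = {"RBAC": "Role-Based Access Control"}


def expand_acronyms(query: str, acronyms: dict[str, str]) -> tuple[str, dict[str, str]]:
    """Append the full term after each known acronym and report what was expanded."""
    applied: dict[str, str] = {}

    def _replace(match: re.Match) -> str:
        token = match.group(0)
        full_term = acronyms.get(token)
        if full_term is None:
            return token  # unknown all-caps token: leave it untouched
        applied[token] = full_term
        return f"{token} ({full_term})"

    # Candidate acronyms: standalone runs of two or more capital letters.
    expanded = re.sub(r"\b[A-Z]{2,}\b", _replace, query)
    return expanded, applied


expanded, applied = expand_acronyms("How to configure RBAC?", ACRONYMS)
```

Keeping the original acronym in the expanded query means documents that use either form can still match.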

Query Rewriting Analysis

Template-Based Rewriting:

Pattern: "How do I [action]?"
→ Rewrite: "To [action], follow these steps"

Example:
→ Input: "How do I reset password?"
→ Output: "To reset password, follow these steps"

The rewritten form matches document phrasing better
→ Log: Which templates were applied
→ Measure: Did rewriting improve retrieval?
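
Template-based rewriting can be sketched with a list of named regular expressions, where the template name is what gets logged. The `TEMPLATES` list, the template name `how_do_i`, and the `rewrite` function are hypothetical; only the single pattern from the example above is included.

```python
import re

# Each entry: (template name for the log, pattern, output template).
TEMPLATES = [
    (
        "how_do_i",
        re.compile(r"^how do i (?P<action>.+?)\??$", re.IGNORECASE),
        "To {action}, follow these steps",
    ),
]


def rewrite(query: str) -> dict:
    """Apply the first matching template and report which one fired."""
    for name, pattern, template in TEMPLATES:
        match = pattern.match(query.strip())
        if match:
            return {
                "original": query,
                "rewritten": template.format(action=match.group("action")),
                "template": name,
            }
    # No template matched: pass the query through and log that nothing applied.
    return {"original": query, "rewritten": query, "template": None}


result = rewrite("How do I reset password?")
```

Logging `"template": None` for unmatched queries is as useful as logging hits, because it shows how much traffic the templates actually cover.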

Spell Correction:

Query: "athentication setup" (typo)

Correction:
→ "athentication" → "authentication"
→ Confidence: 0.95

Corrected query: "authentication setup"

Log:
{
  "original": "athentication",
  "corrected": "authentication",
  "method": "edit_distance",
  "confidence": 0.95
}
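
A minimal sketch of edit-distance correction that emits a log entry in the shape above. The vocabulary, the `max_distance` cutoff, and the confidence formula (one minus the normalized distance) are all assumptions made for illustration, so the confidence it produces will not necessarily equal the 0.95 shown in the example.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed row by row with dynamic programming."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            substitution_cost = 0 if char_a == char_b else 1
            current.append(
                min(
                    previous[j] + 1,                      # deletion
                    current[j - 1] + 1,                   # insertion
                    previous[j - 1] + substitution_cost,  # substitution or match
                )
            )
        previous = current
    return previous[-1]


# Hypothetical vocabulary; in practice this would come from the indexed corpus.
VOCABULARY = ["authentication", "authorization", "setup"]


def correct_token(token: str, vocabulary: list[str], max_distance: int = 2):
    """Return a correction log entry, or None if the token needs no change."""
    if token in vocabulary:
        return None
    best = min(vocabulary, key=lambda word: edit_distance(token, word))
    distance = edit_distance(token, best)
    if distance > max_distance:
        return None  # too far from any known word to correct safely
    return {
        "original": token,
        "corrected": best,
        "method": "edit_distance",
        # Assumed confidence formula: 1 - normalized distance.
        "confidence": round(1 - distance / max(len(token), len(best)), 2),
    }


query = "athentication setup"
corrections = [c for t in query.split() if (c := correct_token(t, VOCABULARY))]
```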

Impact Measurement

Retrieval Improvement:

Compare:
→ Original query retrieval: 3 relevant chunks (P@5 = 0.60)
→ Rewritten query retrieval: 5 relevant chunks (P@5 = 1.00)

Improvement: +40 percentage points in P@5

Log:
→ Query rewriting helped
→ Validates technique
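
The comparison above can be reproduced with a small precision-at-k helper. This sketch assumes relevance labels are available for the query (for example from an evaluation set); the chunk IDs are made up for illustration.

```python
def precision_at_k(retrieved_ids: list[str], relevant_ids: set[str], k: int = 5) -> float:
    """Fraction of the top-k retrieved chunks that are labeled relevant."""
    if k <= 0:
        raise ValueError("k must be positive")
    top_k = retrieved_ids[:k]
    return sum(1 for chunk_id in top_k if chunk_id in relevant_ids) / k


# Made-up chunk IDs: "c*" are relevant, "x*" are not.
relevant = {"c1", "c2", "c3", "c4", "c5"}
original_hits = ["c1", "x1", "c2", "x2", "c3"]
rewritten_hits = ["c1", "c2", "c3", "c4", "c5"]

before = precision_at_k(original_hits, relevant)
after = precision_at_k(rewritten_hits, relevant)

impact_log = {
    "p_at_5_original": before,
    "p_at_5_rewritten": after,
    "delta_pp": round((after - before) * 100),  # percentage points
    "rewriting_helped": after > before,
}
```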

Failed Queries:

Original: "How to nuke account?"
→ No results (score < 0.60)

After expansion:
→ 5 results (score > 0.75)

Log:
→ Query processing rescued a query that would otherwise have failed
→ Important for quality metrics
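
Rescued queries can be flagged by counting results above the score threshold before and after processing. This is a sketch under assumptions: the `classify_outcome` function and the outcome labels are hypothetical, and the scores are invented to mirror the example.

```python
MIN_SCORE = 0.60  # results scoring below this count as "no results"


def classify_outcome(
    original_scores: list[float],
    processed_scores: list[float],
    min_score: float = MIN_SCORE,
) -> dict:
    """Compare usable result counts for the original and the processed query."""
    before = sum(score >= min_score for score in original_scores)
    after = sum(score >= min_score for score in processed_scores)
    if before == 0 and after > 0:
        outcome = "rescued"          # processing turned a failure into results
    elif before == 0:
        outcome = "still_failing"    # neither form of the query retrieved anything
    else:
        outcome = "not_failing"      # the original query already had results
    return {"results_before": before, "results_after": after, "outcome": outcome}


# Invented scores: the raw query finds nothing usable, the expanded one finds five.
rescue_log = classify_outcome(
    original_scores=[0.41, 0.38, 0.22],
    processed_scores=[0.91, 0.88, 0.84, 0.80, 0.77],
)
```

Counting `"rescued"` outcomes over time gives a direct measure of how many failed queries the processing pipeline prevents.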

How to Solve

→ Log original and processed queries for every request
→ Track intent classification (intent, confidence)
→ Log query expansions (synonyms, acronyms)
→ Record query rewriting steps
→ Monitor spell corrections
→ Measure retrieval improvement (before vs. after processing)
→ Alert on low-confidence intent classification
→ Analyze which query processing techniques help most
→ Build a query understanding dashboard

See Query Processing.
