Response Inconsistency

The Problem

Same query returns different answers on repeated asks, causing user confusion and reducing trust in the AI system.

Symptoms

❌ Different facts for identical queries
❌ Contradictory answers in same session
❌ Unpredictable response quality
❌ Cannot reproduce errors
❌ Users report "AI is unreliable"

Real-World Example

Query: "What's the API rate limit?"

Response 1: "The rate limit is 1000 requests per hour"
Response 2: "You can make up to 100 requests per minute"
Response 3: "Rate limits vary by plan tier"

All from same knowledge base!
Causes: Different retrieved chunks, different model sampling, different timestamp

Deep Technical Analysis

Sources of Inconsistency

Retrieval Variance:

Vector search returns slightly different results:
→ Query embedding has minor variations
→ Reranker scores fluctuate
→ Top-K boundary shifts

Query 1 retrieves: Chunks A, B, C
Query 2 (identical) retrieves: Chunks A, C, D
→ Different context → different answer

Temperature/Sampling:

With temperature > 0:
→ Stochastic token sampling
→ Different paths through text generation
→ Same context → different phrasing

Can cause factual inconsistency if rephrasing drifts

Timestamp Dependencies:

Knowledge base updates:
→ Morning: Old doc present
→ Afternoon: Doc updated
→ Same query, different answers

Time-based inconsistency

How to Solve

Set temperature=0 for deterministic output + implement query deduplication with caching + use consistent retrieval (fixed random seed) + log query-response pairs for debugging + add "last updated" timestamps to responses. See Response Consistency.

PreviousModel Switching Mid-Conversation NextRefusal to Answer

Last updated 1 minute ago

hashtagThe Problem

hashtagSymptoms

hashtagReal-World Example

hashtagDeep Technical Analysis

hashtagSources of Inconsistency

hashtagHow to Solve