Temperature Setting Issues

The Problem

Incorrect temperature settings cause either robotic, repetitive responses or creative but factually incorrect answers, degrading RAG quality.

Symptoms

  • ❌ Identical phrasing for similar queries

  • ❌ Overly formal/robotic responses

  • ❌ Creative but wrong information added

  • ❌ Inconsistent response style

  • ❌ Cannot balance accuracy vs natural language

Real-World Example

Temperature 0.0 (deterministic):
Query 1: "How to authenticate?"
Response: "To authenticate, use the API key in the Authorization header."

Query 2: "What's the auth method?"
Response: "To authenticate, use the API key in the Authorization header."

Exact same wording → robotic

Temperature 1.5 (creative):
Query: "API rate limit?"
Response: "The API implements a sophisticated adaptive rate limiting
system that adjusts based on your usage patterns and account tier,
typically allowing between 800-1200 requests per hour..."

Added details not in context → hallucination

Deep Technical Analysis

Temperature Parameter Mechanics

Temperature controls how randomly the next token is selected, by scaling the model's logits before the softmax:

How It Works:
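A minimal sketch of the mechanics, assuming raw logits from the model's final layer (the values below are invented for illustration). The logits are divided by the temperature before the softmax, so settings below 1.0 sharpen the distribution toward the top token and settings above 1.0 flatten it, giving low-probability tokens a real chance of being picked.

```python
import numpy as np

def sample_with_temperature(logits: np.ndarray, temperature: float, rng=None) -> int:
    """Sample one token index from temperature-scaled logits."""
    rng = rng or np.random.default_rng()
    if temperature <= 0:
        # Temperature 0 is effectively greedy decoding: always take the top logit.
        return int(np.argmax(logits))
    scaled = logits / temperature                   # divide logits by T
    scaled = scaled - scaled.max()                  # subtract max for numerical stability
    probs = np.exp(scaled) / np.exp(scaled).sum()   # softmax
    return int(rng.choice(len(logits), p=probs))

# Invented logits for four candidate tokens.
logits = np.array([4.0, 3.5, 1.0, 0.2])
for t in (0.0, 0.3, 1.0, 1.5):
    picks = [sample_with_temperature(logits, t) for _ in range(2000)]
    print(t, np.bincount(picks, minlength=4) / 2000)  # higher T -> flatter distribution
```

At temperature 0.0 the same context always produces the same wording, which is exactly the repetition shown in the authentication example above.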

RAG-Specific Considerations

RAG output must stay grounded in the retrieved context, so factual accuracy matters more than creative variation:

Low Temperature (0.0-0.3) Benefits:

  • Deterministic, reproducible answers that stay close to the retrieved context

  • Little risk of inventing details that are not in the context

High Temperature (0.8-1.5) Risks:

  • Plausible-sounding details added beyond the retrieved context (hallucination)

  • Inconsistent wording and style across similar queries

The Sweet Spot

Balancing accuracy and fluency:

Temperature 0.3-0.5:

  • Stays grounded in the retrieved context while varying phrasing between similar queries

  • Natural-sounding answers without the identical wording of 0.0 or the invented details of 1.5

Context-Dependent Adjustment:
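One way to adjust per query type, as the solution below suggests (lower temperature for factual lookups, slightly higher for explanatory answers), is to choose the setting from the query intent. The keyword heuristic here is a deliberately naive placeholder; a real pipeline would use its own intent classifier, and the specific values should come from your eval set.

```python
# Hypothetical per-intent temperatures; tune these against your own eval set.
TEMPERATURE_BY_INTENT = {
    "factual": 0.2,      # exact values, names, limits -> stay close to the context
    "explanatory": 0.5,  # "why"/"how" answers -> allow more natural phrasing
}

def pick_temperature(query: str) -> float:
    """Very rough intent heuristic; replace with a real classifier."""
    explanatory_markers = ("why", "how does", "explain", "difference between")
    q = query.lower()
    intent = "explanatory" if any(m in q for m in explanatory_markers) else "factual"
    return TEMPERATURE_BY_INTENT[intent]

print(pick_temperature("API rate limit?"))                 # 0.2
print(pick_temperature("Explain how rate limiting works")) # 0.5
```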

Top-P (Nucleus) Sampling

An alternative (or complement) to temperature: rather than reshaping the whole distribution, it restricts sampling to the smallest set of top tokens whose cumulative probability reaches p.

Top-P Mechanism:
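A sketch of that mechanism, assuming the model's probabilities over the vocabulary are already available (the distribution below is invented): sort tokens by probability, keep the smallest prefix whose cumulative probability reaches top_p, renormalize, and sample only within that nucleus.

```python
import numpy as np

def top_p_sample(probs: np.ndarray, top_p: float, rng=None) -> int:
    """Sample a token index from the smallest set whose cumulative probability >= top_p."""
    rng = rng or np.random.default_rng()
    order = np.argsort(probs)[::-1]                        # most-probable tokens first
    cumulative = np.cumsum(probs[order])
    cutoff = int(np.searchsorted(cumulative, top_p)) + 1   # size of the nucleus
    nucleus = order[:cutoff]
    nucleus_probs = probs[nucleus] / probs[nucleus].sum()  # renormalize within the nucleus
    return int(rng.choice(nucleus, p=nucleus_probs))

# Invented distribution over five tokens.
probs = np.array([0.55, 0.25, 0.12, 0.05, 0.03])
print(top_p_sample(probs, top_p=0.9))  # only the first three tokens can ever be picked
```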

Combined Settings:
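Temperature and top_p can be combined: a moderate temperature gives natural phrasing while top_p cuts off the long tail of unlikely tokens that drives hallucinations. The preset values below are illustrative rather than prescribed, and the commented call shows how they might be passed to an OpenAI-style chat completion (model name is a placeholder).

```python
# Illustrative sampling settings for a RAG pipeline; tune against your own eval set.
RAG_SAMPLING = {
    "temperature": 0.4,  # enough variation to avoid identical phrasing
    "top_p": 0.9,        # drop the low-probability tail that drives hallucination
}

# Example with an OpenAI-style chat completion (model name is a placeholder):
# response = client.chat.completions.create(
#     model="gpt-4o-mini",
#     messages=messages,
#     **RAG_SAMPLING,
# )
```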


How to Solve

  • Set temperature=0.3-0.5 for RAG (factual grounding)

  • Use top_p=0.9-0.95 for additional control

  • Adjust per query type (lower for factual, higher for explanatory)

  • Test with an eval set to find the optimal balance (see the sketch below)

  • Never exceed 0.7 for production RAG

See Temperature Tuning.
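As a rough way to test with an eval set, one can sweep candidate temperatures and keep the one with the best average score. In this sketch, generate_answer and faithfulness_score are placeholders for your own generation call and grounding metric.

```python
# Hypothetical sweep over candidate temperatures. generate_answer() and
# faithfulness_score() stand in for your own RAG pipeline and eval metric.
def pick_best_temperature(eval_set, generate_answer, faithfulness_score,
                          candidates=(0.0, 0.2, 0.3, 0.4, 0.5, 0.7)):
    mean_scores = {}
    for temp in candidates:
        scores = [
            faithfulness_score(
                generate_answer(ex["query"], ex["context"], temperature=temp),
                ex["context"],
            )
            for ex in eval_set
        ]
        mean_scores[temp] = sum(scores) / len(scores)  # average grounding at this temperature
    best = max(mean_scores, key=mean_scores.get)
    return best, mean_scores
```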
