Reranking Score Analysis

The Problem

You cannot evaluate reranking effectiveness or debug why the reranker changes the initial retrieval order, which makes optimization impossible.

Symptoms

  • ❌ Don't know if reranking helps

  • ❌ Cannot see score changes (retrieval → reranking)

  • ❌ Unexpected rank reversals

  • ❌ No metrics for reranker quality

  • ❌ Cannot compare reranking models

Real-World Example

Initial retrieval (vector search):
→ #1: Chunk A (score: 0.85)
→ #2: Chunk B (score: 0.83)
→ #3: Chunk C (score: 0.80)

After reranking (Cohere Rerank):
→ #1: Chunk C (score: 0.92) ← promoted
→ #2: Chunk A (score: 0.88) ← demoted
→ #3: Chunk B (score: 0.75) ← demoted

Why did Chunk C jump from #3 to #1?
→ No visibility into reranker reasoning
→ Cannot validate whether the new order is correct

Deep Technical Analysis

Reranking Purpose

Query-Document Interaction:
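A reranker usually differs from the first-stage retriever in how the query and the document interact:

  • A bi-encoder embeds each side independently and compares the resulting vectors.

  • A cross-encoder reranker scores the (query, document) pair jointly.

The sketch below shows that two-stage shape in plain Python. `embed` and `cross_score` are hypothetical stand-ins for an embedding model and a wrapper around whichever reranker you use. They are assumptions for illustration, not a specific library's API.

```python
import math
from typing import Callable, Sequence

# Hypothetical model interfaces (assumptions for illustration only).
Embed = Callable[[str], list[float]]       # bi-encoder: text -> vector
CrossScore = Callable[[str, str], float]   # cross-encoder: (query, doc) -> score


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve_then_rerank(
    query: str,
    docs: Sequence[str],
    embed: Embed,
    cross_score: CrossScore,
    top_n: int = 20,
    top_k: int = 5,
) -> list[tuple[str, float, float]]:
    """Return (doc, vector_score, rerank_score) for the top_k docs by rerank score."""
    # Stage 1 (bi-encoder): query and documents are embedded independently,
    # so their only interaction is the cosine between two vectors.
    q_vec = embed(query)
    retrieved = sorted(
        ((cosine(q_vec, embed(d)), d) for d in docs), reverse=True
    )[:top_n]
    # Stage 2 (cross-encoder): each (query, doc) pair is scored jointly.
    # This costs more per document, so it only runs on the top_n candidates.
    reranked = sorted(
        ((cross_score(query, d), vec_score, d) for vec_score, d in retrieved),
        reverse=True,
    )[:top_k]
    return [(d, vec_score, rr_score) for rr_score, vec_score, d in reranked]
```

The function returns the vector score next to the rerank score for every document. That pairing is what the comparisons later in this section rely on.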

Example Improvement:

Reranking Metrics

Precision Improvement:
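Precision@K counts how many of the top K results are relevant. Measuring it on the same labeled queries before and after reranking gives the precision improvement directly.

The relevance labels in this example are hypothetical. Purely for illustration, assume that only Chunk C from the Real-World Example is relevant to its query.

```python
from statistics import mean


def precision_at_k(ranked_ids, relevant_ids, k):
    """Fraction of the top-k ranked ids that are in relevant_ids.

    Divides by k even if fewer than k results were returned."""
    if k <= 0:
        raise ValueError("k must be positive")
    return sum(1 for doc_id in ranked_ids[:k] if doc_id in relevant_ids) / k


def mean_precision_lift(eval_set, k):
    """eval_set: list of (initial_ids, reranked_ids, relevant_ids), one per query.

    Returns mean Precision@K after reranking minus mean Precision@K before."""
    before = mean(precision_at_k(init, rel, k) for init, _, rel in eval_set)
    after = mean(precision_at_k(rr, rel, k) for _, rr, rel in eval_set)
    return after - before
```

Under that hypothetical label, Precision@1 rises from 0.0 (Chunk A first) to 1.0 (Chunk C first). A single query proves little, so average over a whole eval set.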

Rank Correlation:
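Spearman's rank correlation between the initial retrieval order and the reranked order summarizes how much the reranker reshuffles results:

  • A value near 1 means it barely changes anything, so it may not be worth its cost.

  • A low or negative value means heavy reordering that deserves a closer look.

The minimal sketch below assumes two orderings of the same items with no ties, so apply it to the full candidate list before truncating to top K. If SciPy is available, `scipy.stats.spearmanr` computes the same statistic from two rank vectors.

```python
def spearman_rank_change(before, after):
    """Spearman's rho between two orderings of the same items (no ties).

    rho = 1 - 6 * sum(d_i^2) / (n * (n^2 - 1)), where d_i is the difference
    between item i's position in `before` and its position in `after`."""
    if len(before) != len(after) or set(before) != set(after):
        raise ValueError("both orderings must contain the same items")
    n = len(before)
    if n < 2:
        return 1.0
    pos_after = {item: i for i, item in enumerate(after)}
    d_sq = sum((i - pos_after[item]) ** 2 for i, item in enumerate(before))
    return 1 - 6 * d_sq / (n * (n * n - 1))
```

Take the Real-World Example, where A, B, C is reranked to C, A, B. The position differences are 1, 1 and 2, so the sum of squares is 6 and rho = 1 - 6 * 6 / (3 * 8) = -0.5.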

Score Distribution Analysis

Score Spread:
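Score spread measures how clearly a stage separates its best candidates from the rest. The helper below reports three values for one stage's scores on one query:

  • The range (maximum minus minimum).

  • The gap between the first and second result.

  • The population standard deviation.

These metric names are illustrative choices, not a standard. Vector similarities and rerank scores live on different scales, so compare spreads per stage across many queries rather than comparing raw numbers between stages.

```python
from statistics import pstdev


def score_spread(scores):
    """Summarize the spread of one stage's scores for a single query."""
    if not scores:
        raise ValueError("scores must be non-empty")
    ordered = sorted(scores, reverse=True)
    return {
        "range": ordered[0] - ordered[-1],  # max minus min
        "top1_gap": ordered[0] - ordered[1] if len(ordered) > 1 else 0.0,
        "stdev": pstdev(ordered),  # population standard deviation
    }
```

In the Real-World Example the vector scores span 0.05 (0.85 to 0.80). The rerank scores span 0.17 (0.92 to 0.75).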

Confidence Calibration:
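Confidence calibration asks whether a score of, say, 0.9 really corresponds to a relevant result about 90% of the time. A common check is a reliability table plus the expected calibration error (ECE):

  • Bucket the scores.

  • In each bucket, compare the mean score with the fraction of results labeled relevant.

This sketch assumes rerank scores already lie in [0, 1] and that relevance labels (1 or 0) come from an evaluation set. Raw logits would first need a transform such as a sigmoid.

```python
def calibration_table(scores, labels, n_bins=5):
    """Reliability table and expected calibration error for scores in [0, 1].

    Returns (rows, ece); each row is (bin_low, bin_high, count, mean_score, hit_rate)."""
    if not scores or len(scores) != len(labels):
        raise ValueError("scores and labels must be non-empty and equal length")
    bins = [[] for _ in range(n_bins)]
    for s, y in zip(scores, labels):
        # Clamp so a score of exactly 1.0 lands in the last bin.
        idx = max(0, min(int(s * n_bins), n_bins - 1))
        bins[idx].append((s, y))
    rows, ece = [], 0.0
    for i, members in enumerate(bins):
        if not members:
            continue
        mean_score = sum(s for s, _ in members) / len(members)
        hit_rate = sum(y for _, y in members) / len(members)
        rows.append((i / n_bins, (i + 1) / n_bins, len(members), mean_score, hit_rate))
        # ECE weights each bin's |confidence - accuracy| by its share of samples.
        ece += len(members) / len(scores) * abs(mean_score - hit_rate)
    return rows, ece
```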

Debugging Rank Changes

Promotion/Demotion Tracking:
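Promotion and demotion tracking reduces to comparing each chunk's rank before and after reranking. The sketch flags moves of at least five positions, the threshold suggested under How to Solve. The field names are hypothetical.

```python
def rank_changes(before, after, large_shift=5):
    """Per-item rank movement between two orderings (rank 1 = top).

    delta > 0 means the item was promoted (moved up)."""
    old_ranks = {item: rank for rank, item in enumerate(before, start=1)}
    changes = []
    for new_rank, item in enumerate(after, start=1):
        old_rank = old_ranks.get(item)
        if old_rank is None:
            continue  # item was not in the initial retrieval list
        delta = old_rank - new_rank
        if delta > 0:
            status = "promoted"
        elif delta < 0:
            status = "demoted"
        else:
            status = "unchanged"
        changes.append({
            "item": item,
            "old_rank": old_rank,
            "new_rank": new_rank,
            "delta": delta,
            "status": status,
            "investigate": abs(delta) >= large_shift,  # large jumps need review
        })
    return changes
```

For the Real-World Example this reports Chunk C as promoted by 2 and Chunks A and B as demoted by 1 each. None of these moves crosses the five-position threshold.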

Reranker Explanation:
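Hosted rerankers typically return only a score, not a reason. One model-agnostic workaround is leave-one-out (occlusion) attribution: remove one sentence at a time from the chunk and measure how far the rerank score falls.

This is a swapped-in technique, not necessarily how the original analysis explains rank changes. `score` is a hypothetical wrapper around your reranker. A chunk with n sentences costs n + 1 reranker calls.

```python
def sentence_occlusion(query, sentences, score):
    """Leave-one-out attribution for one chunk.

    score(query, text) is a hypothetical reranker wrapper returning a number.
    Returns (full_score, [(sentence, score_drop), ...]) sorted by largest drop."""
    sentences = list(sentences)
    full = score(query, " ".join(sentences))
    impacts = []
    for i, sent in enumerate(sentences):
        # Re-score the chunk with sentence i removed.
        reduced = " ".join(sentences[:i] + sentences[i + 1:])
        impacts.append((sent, full - score(query, reduced)))
    # Sentences whose removal hurts the score most explain the ranking best.
    impacts.sort(key=lambda t: t[1], reverse=True)
    return full, impacts
```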

Cost-Benefit Analysis

Reranking Cost:
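The cost side can be made concrete by relating spend and latency to the measured Precision@K lift. All inputs here are hypothetical placeholders, so plug in your provider's actual pricing and your own eval results. "Cost per precision point" is an illustrative metric, not a standard one.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RerankTradeoff:
    daily_cost: float        # in whatever currency cost_per_query uses
    precision_lift: float    # absolute change in Precision@K
    cost_per_point: float    # daily cost per +0.01 of Precision@K
    added_latency_ms: float  # extra latency the rerank step adds per query


def rerank_tradeoff(queries_per_day, cost_per_query, p_before, p_after, added_latency_ms):
    daily_cost = queries_per_day * cost_per_query
    lift = p_after - p_before
    # No gain (or a loss) means every unit of spend buys nothing.
    per_point = daily_cost / (lift * 100) if lift > 0 else float("inf")
    return RerankTradeoff(daily_cost, lift, per_point, added_latency_ms)
```

With hypothetical inputs of 10,000 queries a day at 0.002 per query and Precision@K moving from 0.60 to 0.70, reranking costs 20 per day, or about 2 per precision point.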


How to Solve

  • Log both vector scores and rerank scores for comparison

  • Track rank changes (promoted/demoted chunks)

  • Measure Precision@K before and after reranking

  • Calculate the rank correlation (Spearman) improvement

  • Monitor the spread of the score distribution

  • Test reranker models on an eval set

  • Analyze the cost vs. quality trade-off

  • Investigate large rank changes (±5 positions) to validate them

See Reranking Analysis.
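The first step, logging both stages side by side, can be as simple as one JSON line per query that joins vector and rerank scores by chunk id. The schema below is a hypothetical example, not a required format.

```python
import json


def rerank_log_record(query_id, initial, reranked):
    """Build one JSON log line per query.

    initial, reranked: lists of (chunk_id, score) in ranked order."""
    first_stage = {cid: (rank, s) for rank, (cid, s) in enumerate(initial, start=1)}
    chunks = []
    for new_rank, (cid, rerank_score) in enumerate(reranked, start=1):
        # Chunks missing from the initial list get null vector fields.
        old_rank, vector_score = first_stage.get(cid, (None, None))
        chunks.append({
            "chunk_id": cid,
            "vector_rank": old_rank,
            "vector_score": vector_score,
            "rerank_rank": new_rank,
            "rerank_score": rerank_score,
        })
    return json.dumps({"query_id": query_id, "chunks": chunks})
```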
