Cedar - Context-Aware

Cedar is a balanced RAG strategy that enhances retrieval accuracy by rewriting user queries based on conversation context and memory before searching the vector database.

Overview

Cedar improves upon standard RAG by understanding conversation context:

  • Analyzes conversation history (memory)

  • Rewrites ambiguous queries to be more explicit

  • Maintains context across multiple turns

  • Better handles follow-up questions

Performance: ~2-3 seconds
Ideal for: Conversational queries, follow-up questions, ambiguous phrasing

How Cedar Works

Processing Flow

User Query: "What about pricing?"

[1] Analyze Conversation Memory
    → Previous: "Tell me about the Enterprise plan"

[2] Rewrite Query with Context
    → Rewritten: "What is the pricing for the Enterprise plan?"

[3] Embed Rewritten Query → Vector [0.15, -0.43, 0.76, ...]

[4] Vector Search (Pinecone/TigrisDB)

[5] Retrieve Top 5-10 Documents

[6] Build Context from Documents

[7] LLM Completion (with context + original query)

Response: "The Enterprise plan costs $299/month..."
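As a rough sketch only, the same flow can be expressed in a few calls. The model names, the Pinecone index name, and the "text" metadata field below are illustrative assumptions, not the actual implementation:

```python
# Sketch of the Cedar flow with the OpenAI SDK and Pinecone.
# Model names, index name, and the "text" metadata field are assumptions.
from openai import OpenAI
from pinecone import Pinecone

client = OpenAI()                                    # reads OPENAI_API_KEY
index = Pinecone(api_key="YOUR_PINECONE_KEY").Index("docs")

def cedar_answer(query: str, memory: list[dict]) -> str:
    # [1]-[2] Rewrite the query using the last few conversation turns.
    history = "\n".join(f"{m['role']}: {m['content']}" for m in memory[-5:])
    rewritten = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": (
                "Rewrite the user's query so it is explicit and self-contained, "
                "using the conversation history. Do not change its meaning.")},
            {"role": "user", "content": f"History:\n{history}\n\nQuery: {query}"},
        ],
    ).choices[0].message.content

    # [3] Embed the rewritten query.
    vector = client.embeddings.create(
        model="text-embedding-3-small", input=rewritten
    ).data[0].embedding

    # [4]-[6] Vector search, then build context from the top documents.
    results = index.query(vector=vector, top_k=8, include_metadata=True)
    context = "\n\n".join(m.metadata["text"] for m in results.matches)

    # [7] Final completion with the retrieved context plus the original query.
    return client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": f"Answer using this context:\n{context}"},
            {"role": "user", "content": query},
        ],
    ).choices[0].message.content
```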

Technical Details

Step 1: Memory Analysis

  • Reviews last 3-5 conversation turns

  • Identifies context: entities, topics, intents

  • Determines if query needs clarification

Step 2: Query Rewriting

  • Model: gpt-4o-mini (fast, cost-effective)

  • Prompt: "Rewrite this query based on conversation history"

  • Output: More explicit, self-contained query

Example Rewrites:

  • "What about pricing?" → "What is the pricing for the Enterprise plan?" (previous turn asked about the Enterprise plan)

  • "What about the Y setting?" → "How do I configure the Y setting for X?" (previous turn asked how to configure X)

Steps 3-7: Standard RAG Flow

After the query is rewritten, the remaining steps (embedding, vector search, context building, and LLM completion) are the same as Redwood.

Performance Characteristics

Latency Breakdown

Token Usage

| Component | Tokens | Notes |
| --- | --- | --- |
| Memory Context | 100-300 | Last 3-5 turns |
| Rewriting Prompt | 50-100 | Instructions + query |
| Rewritten Query | 10-30 | Output of rewriting |
| System Prompt | 150-300 | Agent instructions |
| Retrieved Context | 800-1500 | Top 5-10 documents |
| User Query | 10-50 | Original question |
| Response | 150-400 | Generated answer |
| Total | ~1,800-2,500 | Per request |

Cost Implications

Per 1,000 Requests (GPT-3.5-turbo):

  • Embedding: ~$0.01

  • Query Rewriting: ~$0.10 ← Additional cost

  • LLM Completion: ~$0.40

  • Vector Search: ~$0.05

  • Total: ~$0.56 (+55% vs Redwood)
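As a quick check, the total is simply the sum of the components above, and the overhead follows from Redwood's $0.36/1k baseline:

```python
# Cost components per 1,000 requests, as listed above (USD).
cedar_components = {
    "embedding": 0.01,
    "query_rewriting": 0.10,   # the extra step Cedar adds
    "llm_completion": 0.40,
    "vector_search": 0.05,
}
cedar_total = sum(cedar_components.values())   # 0.56
redwood_total = 0.36                           # standard Redwood, per the comparison below
overhead = (cedar_total - redwood_total) / redwood_total
print(f"${cedar_total:.2f}/1k requests, +{overhead:.0%} vs Redwood")  # roughly the +55% quoted above
```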

When to Use Cedar

✅ Ideal Use Cases

1. Conversational Support Bots

2. Multi-Turn Consultations

3. Contextual Knowledge Queries

4. Ambiguous Questions

❌ Not Ideal For

1. Clear, Self-Contained Questions

2. First-Time Queries (No Context)

3. Cost-Sensitive High-Volume

Configuration

Agent Settings

When using Cedar strategy:
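The exact settings panel isn't reproduced here. As an illustration only (every field name below is hypothetical, not the product's actual configuration), the knobs that matter for Cedar are roughly:

```python
# Hypothetical Cedar agent settings, for illustration only -- these field names
# are not taken from the actual product configuration.
cedar_settings = {
    "strategy": "cedar",
    "memory_turns": 5,               # how many past turns to consider (3-5 recommended above)
    "rewrite_model": "gpt-4o-mini",  # fast, cost-effective rewriter
    "top_k": 8,                      # retrieve top 5-10 documents
    "max_context_tokens": 1500,      # cap on retrieved context
}
```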

Optimization Tips

1. Tune Memory Length

2. Rewriting Prompt Quality

3. Hybrid Approach
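One way to read tip 3 is to route only context-dependent queries through Cedar's rewriting step and let self-contained queries take the cheaper Redwood path. A rough sketch, where `cedar_answer` is the function sketched earlier and `redwood_answer` is a hypothetical standard-RAG helper:

```python
def looks_like_follow_up(query: str) -> bool:
    q = query.lower().strip()
    # Very rough heuristic: short queries and "what about ..." style phrasings
    # usually depend on earlier turns.
    return len(q.split()) <= 4 or q.startswith(("what about", "and ", "how about"))

def answer(query: str, memory: list[dict]) -> str:
    if memory and looks_like_follow_up(query):
        return cedar_answer(query, memory)   # context-aware rewriting (sketched earlier)
    return redwood_answer(query)             # hypothetical helper: standard RAG, no rewriting
```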

Comparison with Other Strategies

vs. Redwood (Standard)

| Metric | Redwood | Cedar | Advantage |
| --- | --- | --- | --- |
| Speed | ~1.2s | ~2.0s | Redwood |
| Cost | $0.36/1k | $0.56/1k | Redwood |
| Conversational | ❌ Poor | ✅ Excellent | Cedar |
| Follow-ups | ❌ Poor | ✅ Excellent | Cedar |
| Clear queries | ✅ Perfect | ✅ Good | Redwood |
| Complexity | Low | Medium | Redwood |

When to Switch:

  • Redwood → Cedar: When > 30% of queries are follow-ups

  • Cedar → Redwood: When speed is critical and questions are clear

vs. Cypress (Advanced)

| Metric | Cedar | Cypress | Advantage |
| --- | --- | --- | --- |
| Speed | ~2.0s | ~3.5s | Cedar |
| Cost | $0.56/1k | $0.90/1k | Cedar |
| Accuracy | Good | Excellent | Cypress |
| Reranking | ❌ No | ✅ Yes | Cypress |
| Query Expansion | ❌ No | ✅ Yes | Cypress |
| Tier-Based | ❌ No | ✅ Yes | Cypress |

When to Switch:

  • Cedar → Cypress: When accuracy is more important than speed/cost

  • Cypress → Cedar: When speed matters and accuracy is good enough

Real-World Performance

Case Study: E-commerce Customer Support

Setup:

  • 2,000 product FAQs

  • Average conversation: 3-4 turns

  • 50,000 queries/month

  • 70% are follow-up questions

Redwood Results (Before):

  • Average latency: 1.1s ✅

  • User satisfaction: 3.2/5 ❌

  • Common complaint: "Doesn't understand my follow-up"

  • Accuracy: 72%

Cedar Results (After):

  • Average latency: 2.1s (slower but acceptable)

  • User satisfaction: 4.4/5 ✅

  • Accuracy: 89% ✅

  • ROI: 37% fewer support escalations

Case Study: SaaS Documentation Bot

Setup:

  • 5,000 documentation pages

  • Technical audience

  • Average query: "How to configure X"

  • Follow-ups: "What about Y setting?"

Cedar Performance:

  • Latency: 1.9s (p95)

  • Follow-up handling: 94% success rate

  • Cost: $0.53/1k requests

  • User feedback: "Finally understands context!"

Key Insight: Cedar excels when users naturally have multi-turn conversations, even if each individual query takes slightly longer.

Advanced Features

Memory Summarization

When conversation gets long, Cedar automatically summarizes:
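The original example isn't reproduced here. As a rough illustration of the idea (the model choice and the turn threshold are assumptions), older turns beyond a recent window can be compressed into a short summary that the rewriter sees in place of the full transcript:

```python
from openai import OpenAI

client = OpenAI()

def compact_memory(memory: list[dict], keep_last: int = 4) -> list[dict]:
    """Summarize everything older than the last few turns into a single system note."""
    if len(memory) <= keep_last:
        return memory
    older = "\n".join(f"{m['role']}: {m['content']}" for m in memory[:-keep_last])
    summary = client.chat.completions.create(
        model="gpt-4o-mini",  # assumption: same small model used for rewriting
        messages=[{"role": "user",
                   "content": "Summarize this conversation in 2-3 sentences, keeping "
                              f"entities and open questions:\n{older}"}],
    ).choices[0].message.content
    return [{"role": "system", "content": f"Conversation so far: {summary}"}] + memory[-keep_last:]
```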

Configuration:

Contextual Entity Tracking

Cedar tracks entities across turns:
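The original illustration is missing here; the gist is that concrete nouns from earlier turns ("Enterprise plan") are what make a bare follow-up like "What about pricing?" resolvable. A toy sketch, not the actual mechanism:

```python
# Toy illustration only: track salient entities so bare follow-ups can be expanded.
memory = [
    {"role": "user", "content": "Tell me about the Enterprise plan"},
    {"role": "assistant", "content": "The Enterprise plan includes..."},
]
known_entities = ["Enterprise plan", "Pro plan"]  # e.g. from a product catalog

tracked = [e for e in known_entities
           if any(e.lower() in m["content"].lower() for m in memory)]

follow_up = "What about pricing?"
if tracked:
    expanded = f"{follow_up.rstrip('?')} for the {tracked[-1]}?"
    print(expanded)  # What about pricing for the Enterprise plan?
```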

Intent Preservation

Cedar maintains user intent:

Monitoring Cedar

Key Metrics to Track

1. Rewriting Quality

2. Context Relevance

3. Latency Impact

4. User Satisfaction
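A lightweight way to cover metrics 1-3 is to log the rewrite, the top retrieval score, and latency for each request and review them offline; user satisfaction (metric 4) comes from feedback rather than per-request logs. The logging scheme below is a hypothetical sketch, not a built-in feature:

```python
import json
import time

def log_cedar_request(original: str, rewritten: str, latency_s: float, top_score: float,
                      path: str = "cedar_requests.jsonl") -> None:
    """Append one request's key signals for offline review (hypothetical logging scheme)."""
    record = {
        "ts": time.time(),
        "original_query": original,
        "rewritten_query": rewritten,   # 1. spot-check rewriting quality
        "top_match_score": top_score,   # 2. proxy for context relevance
        "latency_s": latency_s,         # 3. end-to-end latency impact
    }
    with open(path, "a") as f:
        f.write(json.dumps(record) + "\n")
```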

Common Issues

Rewriting Too Aggressively:

Missing Context:

Slow Performance:

Best Practices

1. Memory Management

✅ Keep memory focused on relevant context
✅ Clear memory for new topics
✅ Summarize long conversations
❌ Don't include entire conversation verbatim

2. Rewriting Prompts

✅ Be specific about what to preserve
✅ Include examples of good rewrites
✅ Instruct to add context, not change meaning
❌ Don't make rewrites too verbose

3. Testing

✅ Test with real conversation flows
✅ Compare results: original vs rewritten query
✅ A/B test Cedar vs Redwood
❌ Don't test only single-turn queries

4. Monitoring

✅ Track rewriting effectiveness
✅ Monitor latency impact
✅ Measure user satisfaction
✅ Review failed rewrites
❌ Don't assume it's working without data

Migration Guide

From Redwood to Cedar

Step 1: Enable Cedar

Step 2: Monitor Initial Performance

  • First 24 hours: Watch latency

  • Compare accuracy metrics

  • Check user feedback

Step 3: Tune Configuration

Step 4: Gradual Rollout

From Cedar to Cypress

When to upgrade:

  • Accuracy is critical

  • Budget allows for higher costs

  • Willing to accept 3-4s latency

  • Need tier-based source control

Code Examples

Using Cedar via API
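The original snippet isn't included here. As a placeholder illustration only (the endpoint, payload fields, and header are hypothetical, not the real API), a call to a Cedar-backed agent would look something like:

```python
import requests

# Hypothetical endpoint and field names -- consult the actual API reference.
resp = requests.post(
    "https://api.example.com/v1/agents/YOUR_AGENT_ID/chat",
    headers={"Authorization": "Bearer YOUR_API_KEY"},
    json={
        "message": "What about pricing?",
        "conversation_id": "conv_123",   # lets Cedar use prior turns as memory
    },
    timeout=30,
)
resp.raise_for_status()
print(resp.json())
```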

Response Format
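The documented response schema isn't reproduced here. Illustratively (field names are assumptions), a Cedar response needs to surface the final answer, the rewritten query, and the retrieved sources:

```python
# Illustrative shape only -- field names are assumptions, not the documented schema.
example_response = {
    "answer": "The Enterprise plan costs $299/month...",
    "rewritten_query": "What is the pricing for the Enterprise plan?",
    "sources": [
        {"id": "doc_42", "score": 0.87, "title": "Enterprise plan pricing"},
    ],
    "latency_ms": 2050,
}
```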

Next Steps
