Cedar - Context-Aware

Cedar is a balanced RAG strategy that enhances retrieval accuracy by rewriting user queries based on conversation context and memory before searching the vector database.

Overview

Cedar improves upon standard RAG by understanding conversation context:

Analyzes conversation history (memory)
Rewrites ambiguous queries to be more explicit
Maintains context across multiple turns
Better handles follow-up questions

Performance: ~2-3 seconds Ideal for: Conversational queries, follow-up questions, ambiguous phrasing

How Cedar Works

Processing Flow

User Query: "What about pricing?"
     ↓
[1] Analyze Conversation Memory
    → Previous: "Tell me about the Enterprise plan"
     ↓
[2] Rewrite Query with Context
    → Rewritten: "What is the pricing for the Enterprise plan?"
     ↓
[3] Embed Rewritten Query → Vector [0.15, -0.43, 0.76, ...]
     ↓
[4] Vector Search (Pinecone/TigrisDB)
     ↓
[5] Retrieve Top 5-10 Documents
     ↓
[6] Build Context from Documents
     ↓
[7] LLM Completion (with context + original query)
     ↓
Response: "The Enterprise plan costs $299/month..."

Technical Details

Step 1: Memory Analysis

Reviews last 3-5 conversation turns
Identifies context: entities, topics, intents
Determines if query needs clarification

Step 2: Query Rewriting

Model: gpt-4o-mini (fast, cost-effective)
Prompt: "Rewrite this query based on conversation history"
Output: More explicit, self-contained query

Example Rewrites:

Original: "How do I do that?"
Context: Previous discussion about password reset
Rewritten: "How do I reset my password?"

Original: "What about the other option?"
Context: Comparing Pro vs Enterprise plans
Rewritten: "What features are included in the Pro plan?"

Original: "Is it available?"
Context: Asking about SSO feature
Rewritten: "Is SSO (Single Sign-On) available in the platform?"

Step 3-7: Standard RAG Flow Same as Redwood after query rewriting

Performance Characteristics

Latency Breakdown

Memory Analysis:       ~100ms
Query Rewriting:       ~500ms ← Additional cost vs Redwood
Query Embedding:       ~100ms
Vector Search:         ~200ms
Context Building:      ~50ms
LLM Completion:        ~800ms
Response Streaming:    ~200ms
────────────────────────────
Total:                 ~2.0s

Token Usage

Component

Tokens

Notes

Memory Context

100-300

Last 3-5 turns

Rewriting Prompt

50-100

Instructions + query

Rewritten Query

10-30

Output of rewriting

System Prompt

150-300

Agent instructions

Retrieved Context

800-1500

Cost Implications

Per 1,000 Requests (GPT-3.5-turbo):

Embedding: ~$0.01
Query Rewriting: ~$0.10 ← Additional cost
LLM Completion: ~$0.40
Vector Search: ~$0.05
Total: ~$0.56 (+55% vs Redwood)

When to Use Cedar

✅ Ideal Use Cases

1. Conversational Support Bots

User: "What are your pricing plans?"
Agent: "We have Free, Pro, and Enterprise..."
User: "What's included in the middle one?" ✅
→ Cedar rewrites: "What's included in the Pro plan?"

2. Multi-Turn Consultations

User: "I need help with email integration"
Agent: "We support Gmail and Outlook..."
User: "How do I set up the second one?" ✅
→ Cedar rewrites: "How do I set up Outlook integration?"

3. Contextual Knowledge Queries

User: "Tell me about data sources"
Agent: "We support Google Drive, Confluence..."
User: "How often does it sync?" ✅
→ Cedar rewrites: "How often do data sources sync?"

4. Ambiguous Questions

User: "What's the difference?" ✅
→ Context: Comparing RAG strategies
→ Cedar rewrites: "What's the difference between Redwood and Cedar RAG strategies?"

❌ Not Ideal For

1. Clear, Self-Contained Questions

User: "What is your refund policy?" ❌
→ No rewriting needed, Redwood is faster
→ Cedar adds unnecessary latency

2. First-Time Queries (No Context)

User: "Hello, what can you help with?" ❌
→ No previous conversation to leverage
→ Redwood sufficient

3. Cost-Sensitive High-Volume

10,000+ queries/day with tight budget ❌
→ 55% higher cost than Redwood
→ Consider Redwood for majority, Cedar for subset

Configuration

Agent Settings

When using Cedar strategy:

{
  "strategyCode": "INFER_QUESTION",
  "topK": 5-10,                    // Number of documents
  "temperature": 0.7,              // LLM creativity
  "maxTokens": 500,                // Response length
  "model": "gpt-4o",               // Or gpt-3.5-turbo
  "memoryTurns": 3,                // Conversation turns to remember
  "rewritingModel": "gpt-4o-mini"  // Fast model for rewriting
}

Optimization Tips

1. Tune Memory Length

Too few (1-2):  Misses context
Sweet spot (3-5): Good balance
Too many (10+):  Noise, slower

2. Rewriting Prompt Quality

Good Prompt:
"Rewrite the query to be self-contained based on conversation history.
Focus on entities, topics, and intents mentioned earlier."

Bad Prompt:
"Make query better"  ← Too vague

3. Hybrid Approach

# Use Cedar only when needed
if has_conversation_history and query_is_ambiguous:
    use_cedar()
else:
    use_redwood()  # Faster for clear questions

Comparison with Other Strategies

vs. Redwood (Standard)

Metric

Redwood

Cedar

Advantage

Speed

~1.2s

~2.0s

Redwood

Cost

$0.36/1k

$0.56/1k

Redwood

Conversational

❌ Poor

✅ Excellent

Cedar

Follow-ups

❌ Poor

✅ Excellent

Cedar

Clear queries

✅ Perfect

✅ Good

Redwood

Complexity

Low

Medium

Redwood

When to Switch:

Redwood → Cedar: When > 30% of queries are follow-ups
Cedar → Redwood: When speed is critical and questions are clear

vs. Cypress (Advanced)

Metric

Cedar

Cypress

Advantage

Speed

~2.0s

~3.5s

Cedar

Cost

$0.56/1k

$0.90/1k

Cedar

Accuracy

Good

Excellent

Cypress

Reranking

❌ No

✅ Yes

Cypress

Query Expansion

❌ No

✅ Yes

Cypress

Tier-Based

❌ No

✅ Yes

Cypress

When to Switch:

Cedar → Cypress: When accuracy is more important than speed/cost
Cypress → Cedar: When speed matters and accuracy is good enough

Real-World Performance

Case Study: E-commerce Customer Support

Setup:

2,000 product FAQs
Average conversation: 3-4 turns
50,000 queries/month
70% are follow-up questions

Redwood Results (Before):

Average latency: 1.1s ✅
User satisfaction: 3.2/5 ❌
Common complaint: "Doesn't understand my follow-up"
Accuracy: 72%

Cedar Results (After):

Average latency: 2.1s (slower but acceptable)
User satisfaction: 4.4/5 ✅
Accuracy: 89% ✅
ROI: +37% fewer support escalations

Case Study: SaaS Documentation Bot

Setup:

5,000 documentation pages
Technical audience
Average query: "How to configure X"
Follow-ups: "What about Y setting?"

Cedar Performance:

Latency: 1.9s (p95)
Follow-up handling: 94% success rate
Cost: $0.53/1k requests
User feedback: "Finally understands context!"

Key Insight: Cedar excels when users naturally have multi-turn conversations, even if each individual query takes slightly longer.

Advanced Features

Memory Summarization

When conversation gets long, Cedar automatically summarizes:

Turns 1-10: Full context (tokens: 2000)
Turns 11+:  Summarized (tokens: 300)

Summary Example:
"User is asking about Enterprise plan features,
specifically interested in SSO, API access, and
white-labeling. Budget conscious."

Configuration:

{
  "memoryTurns": 5,           // Remember last 5 turns
  "summarizeAfter": 10,       // Summarize after 10 turns
  "maxMemoryTokens": 500      // Max memory context
}

Contextual Entity Tracking

Cedar tracks entities across turns:

Turn 1: "Tell me about the Enterprise plan"
→ Tracked: product=Enterprise plan

Turn 2: "What's the pricing?"
→ Rewritten: "What is the pricing for the Enterprise plan?"
→ Uses tracked entity

Turn 3: "How about for 100 users?"
→ Rewritten: "What is the pricing for the Enterprise plan for 100 users?"
→ Compounds context

Intent Preservation

Cedar maintains user intent:

User: "I need to integrate with Google"
→ Intent: Integration inquiry
→ Entity: Google

User: "How long does setup take?"
→ Rewritten: "How long does Google integration setup take?"
→ Preserves integration intent

Monitoring Cedar

Key Metrics to Track

1. Rewriting Quality

Metric: % queries actually rewritten
Target: 40-60% (not everything needs rewriting)
Alert:  > 80% (too aggressive) or < 20% (not working)

2. Context Relevance

Metric: % rewritten queries with improved results
Target: > 85%
Measure: Compare retrieval results before/after rewriting

3. Latency Impact

Metric: Average rewriting latency
Target: < 600ms
Alert:  > 1s (investigate)

4. User Satisfaction

Metric: Thumbs up/down on responses
Compare: Sessions with > 3 turns vs single turn
Target:  Multi-turn satisfaction > 85%

Common Issues

Rewriting Too Aggressively:

Symptom: Every query is rewritten
Cause:   Memory window too large
Fix:     Reduce memoryTurns to 3-4

Missing Context:

Symptom: Rewritten queries still ambiguous
Cause:   Important entities not tracked
Fix:     Improve entity extraction prompt

Slow Performance:

Symptom: > 3s response time
Cause:   Rewriting model too large
Fix:     Switch to gpt-4o-mini from gpt-4o

Best Practices

1. Memory Management

✅ Keep memory focused on relevant context ✅ Clear memory for new topics ✅ Summarize long conversations ❌ Don't include entire conversation verbatim

2. Rewriting Prompts

✅ Be specific about what to preserve ✅ Include examples of good rewrites ✅ Instruct to add context, not change meaning ❌ Don't make rewrites too verbose

3. Testing

✅ Test with real conversation flows ✅ Compare results: original vs rewritten query ✅ A/B test Cedar vs Redwood ❌ Don't test only single-turn queries

4. Monitoring

✅ Track rewriting effectiveness ✅ Monitor latency impact ✅ Measure user satisfaction ✅ Review failed rewrites ❌ Don't assume it's working without data

Migration Guide

From Redwood to Cedar

Step 1: Enable Cedar

agent.strategyCode = "INFER_QUESTION"
agent.save()

Step 2: Monitor Initial Performance

First 24 hours: Watch latency
Compare accuracy metrics
Check user feedback

Step 3: Tune Configuration

// Start conservative
memoryTurns: 3
maxMemoryTokens: 300

// Adjust based on performance
if (accuracy < target) memoryTurns++
if (latency > 2.5s) memoryTurns--

Step 4: Gradual Rollout

Day 1-7:   10% traffic
Day 8-14:  50% traffic
Day 15+:   100% traffic (if metrics good)

From Cedar to Cypress

When to upgrade:

Accuracy is critical
Budget allows for higher costs
Willing to accept 3-4s latency
Need tier-based source control

Code Examples

Using Cedar via API

curl -X POST https://api.twig.so/api/chat \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "What about pricing?",
    "agentId": "agent-123",
    "strategyCode": "INFER_QUESTION",
    "sessionId": "session-456",
    "memory": [
      {
        "role": "user",
        "content": "Tell me about the Enterprise plan"
      },
      {
        "role": "assistant",
        "content": "The Enterprise plan includes..."
      }
    ]
  }'

Response Format

{
  "response": "The Enterprise plan costs $299/month for up to 100 users...",
  "rewrittenQuery": "What is the pricing for the Enterprise plan?",
  "sources": [
    {
      "title": "Enterprise Pricing",
      "url": "https://example.com/pricing",
      "relevance": 0.96
    }
  ],
  "metadata": {
    "strategy": "INFER_QUESTION",
    "latency": 2.1,
    "tokensUsed": 2134,
    "rewritingLatency": 0.48,
    "confidence": 0.94
  }
}

Next Steps

Cypress Strategy - Maximum accuracy with reranking
Redwood Strategy - Fastest option for clear queries
Performance Tuning
Cost Optimization

PreviousRedwood - Standard RAG NextCypress - Advanced

Last updated 7 hours ago

hashtagOverview

hashtagHow Cedar Works

hashtagProcessing Flow

hashtagTechnical Details

hashtagPerformance Characteristics

hashtagLatency Breakdown

hashtagToken Usage

hashtagCost Implications

hashtagWhen to Use Cedar

hashtag✅ Ideal Use Cases

hashtag❌ Not Ideal For

hashtagConfiguration

hashtagAgent Settings

hashtagOptimization Tips

hashtagComparison with Other Strategies

hashtagvs. Redwood (Standard)

hashtagvs. Cypress (Advanced)

hashtagReal-World Performance

hashtagCase Study: E-commerce Customer Support

hashtagCase Study: SaaS Documentation Bot

hashtagAdvanced Features

hashtagMemory Summarization

hashtagContextual Entity Tracking

hashtagIntent Preservation

hashtagMonitoring Cedar

hashtagKey Metrics to Track

hashtagCommon Issues

hashtagBest Practices

hashtag1. Memory Management

hashtag2. Rewriting Prompts

hashtag3. Testing

hashtag4. Monitoring

hashtagMigration Guide

hashtagFrom Redwood to Cedar

hashtagFrom Cedar to Cypress

hashtagCode Examples

hashtagUsing Cedar via API

hashtagResponse Format

hashtagNext Steps

Overview

How Cedar Works

Processing Flow

Technical Details

Performance Characteristics

Latency Breakdown

Token Usage

Cost Implications

When to Use Cedar

✅ Ideal Use Cases

❌ Not Ideal For

Configuration

Agent Settings

Optimization Tips

Comparison with Other Strategies

vs. Redwood (Standard)

vs. Cypress (Advanced)

Real-World Performance

Case Study: E-commerce Customer Support

Case Study: SaaS Documentation Bot

Advanced Features

Memory Summarization

Contextual Entity Tracking

Intent Preservation

Monitoring Cedar

Key Metrics to Track

Common Issues

Best Practices

1. Memory Management

2. Rewriting Prompts

3. Testing

4. Monitoring

Migration Guide

From Redwood to Cedar

From Cedar to Cypress

Code Examples

Using Cedar via API

Response Format

Next Steps