Cypress - Advanced

Cypress is the most sophisticated RAG strategy, combining query expansion, tier-based source retrieval, and automatic reranking to deliver the highest accuracy.

Overview

Cypress implements multiple optimization techniques:

  • Query Expansion: Adds synonyms and related terms for better semantic matching

  • Tier-Based Retrieval: Organizes sources into priority tiers for management; all tiers are treated equally during reranking

  • Automatic Reranking: Uses cross-encoder model to improve precision

  • Higher Retrieval Volume: Fetches more candidates (50 vs 10) before filtering

Performance: ~3-4 seconds

Ideal for: Complex queries requiring maximum accuracy, diverse terminology, high-stakes decisions

How Cypress Works

Processing Flow

User Query: "password reset"

[1] Analyze Conversation Memory (if available)

[2] Query Expansion for Vector Retrieval
    → "password reset, change password, recover account,
       reset credentials, account recovery, password recovery"

[3] Embed Expanded Query → Vector

[4] Tier 1 Retrieval (Official docs, high-priority)
    → Fetch top 50 results per source

[5] Tier 2 Retrieval (Community content, secondary)
    → Fetch top 50 results per source

[6] Combine All Results (up to 100 documents)

[7] Automatic Reranking (bge-reranker-v2-m3)
    → Rerank to top 10 most relevant

[8] Build Context from Top 10

[9] Rewrite Prompt with Context (for LLM)

[10] LLM Completion

[11] Clean Response (remove artifacts)

Response: "To reset your password..."
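The eleven-step flow above can be sketched as a single pipeline. Everything below is illustrative: the helper names (`expand_query`, `retrieve`, `rerank`) and the toy data are stand-ins, not the actual implementation.

```python
# Illustrative sketch of the Cypress flow; all helpers are stand-ins.

def expand_query(query: str) -> str:
    # Step [2]: in Cypress this calls gpt-4o-mini; here we fake it.
    synonyms = {"password reset": "password reset, change password, account recovery"}
    return synonyms.get(query, query)

def retrieve(sources: list[str], query: str, top_k: int = 50) -> list[dict]:
    # Steps [4]-[5]: vector search against each source, top_k per source.
    return [{"source": s, "text": f"doc about {query}", "score": 0.5}
            for s in sources]

def rerank(query: str, docs: list[dict], top_n: int = 10) -> list[dict]:
    # Step [7]: cross-encoder reranking (bge-reranker-v2-m3 in Cypress).
    return sorted(docs, key=lambda d: d["score"], reverse=True)[:top_n]

def cypress(query: str, tier1: list[str], tier2: list[str]) -> str:
    expanded = expand_query(query)                                      # [2]
    candidates = retrieve(tier1, expanded) + retrieve(tier2, expanded)  # [4]-[6]
    top_docs = rerank(query, candidates)                                # [7]
    context = "\n".join(d["text"] for d in top_docs)                    # [8]
    return f"Answer based on:\n{context}"   # [9]-[11]: LLM call omitted
```

The LLM completion and response cleaning steps are omitted here; the point is the ordering: expand, retrieve per tier, combine, rerank, then build context.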

Unique Features

1. Query Expansion for Retrieval

Cypress expands queries before vector search:

Original Query:

"password reset"

Expanded Query:

"password reset, change password, recover account, reset credentials, account recovery, password recovery"
How it works:

  • Uses gpt-4o-mini for fast expansion

  • Adds synonyms and related terms

  • Includes alternative phrasings

  • Adds domain-specific terminology

Why it matters: Improves recall by matching documents that use different terminology than the user's query.
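A minimal sketch of the expansion step follows. The prompt wording is an assumption; Cypress's actual expansion prompt is not documented here.

```python
# Build the messages for a gpt-4o-mini expansion call.
# The system prompt text is an assumption, not the product's real prompt.

def build_expansion_prompt(query: str) -> list[dict]:
    system = ("Expand the search query with synonyms, alternative phrasings, "
              "and domain-specific terminology. Return a comma-separated list.")
    return [{"role": "system", "content": system},
            {"role": "user", "content": query}]

# In Cypress this would then be sent to gpt-4o-mini, e.g. (not executed here):
# client.chat.completions.create(model="gpt-4o-mini",
#                                messages=build_expansion_prompt("password reset"))
```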

Example Impact:

2. Tier-Based Source Retrieval

Cypress organizes data sources into tiers:

Tier Structure:

Retrieval Process:

  1. Query Tier 1 sources (topK = 50 per source)

  2. Query Tier 2 sources (topK = 50 per source)

  3. Combine all results

  4. Rerank all together (both tiers treated equally)

  5. Top 10 most relevant selected

Important: Both tiers receive equal treatment in reranking. The tier organization is for source management, not quality weighting.

Configuration:
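The original configuration snippet did not survive extraction. A plausible shape, with key names invented for illustration, might look like:

```python
# Hypothetical tier configuration; key and source names are invented
# for illustration and do not reflect the product's actual schema.
cypress_config = {
    "tiers": {
        "tier1": {"sources": ["official-docs", "api-reference"], "topKPerSource": 50},
        "tier2": {"sources": ["community-forum", "eng-blog"], "topKPerSource": 50},
    },
    "reranker": {"model": "bge-reranker-v2-m3", "topN": 10},
}
```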

3. Automatic Reranking

After retrieval, Cypress reranks using a sophisticated model:

Reranking Model: bge-reranker-v2-m3

  • Type: Cross-encoder (more accurate than vector similarity)

  • Input: Query + full document text

  • Output: Relevance score (0-1)

  • Method: Considers full semantic relationship
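To make the bi-encoder vs cross-encoder contrast concrete, here is a toy rerank loop. The word-overlap scorer is a deliberate stand-in; the real model reads the query and full document jointly.

```python
# Toy reranker; the scorer stands in for bge-reranker-v2-m3.

def cross_encoder_score(query: str, doc: str) -> float:
    # Stand-in scorer. With sentence-transformers this would be roughly:
    #   CrossEncoder("BAAI/bge-reranker-v2-m3").predict([(query, doc)])
    overlap = set(query.lower().split()) & set(doc.lower().split())
    return len(overlap) / max(len(query.split()), 1)

def rerank(query: str, docs: list[str], top_n: int = 10) -> list[str]:
    # Score every (query, doc) pair, then keep the top_n highest.
    scored = [(cross_encoder_score(query, d), d) for d in docs]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [d for _, d in scored[:top_n]]
```

The structure is what matters: unlike vector search, every candidate is scored against the query individually, which is why reranking 100 candidates adds ~500ms.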

Vector Search vs Reranking:

Performance Impact:

4. Higher Retrieval Volume

Cypress retrieves more candidates:

Mode       topKPerSource            Total Retrieved   Final Output
Standard   50                       Up to 100         Top 10 after rerank
Agentic    Agent.topK (default 5)   Variable          All reranked results

Why more is better:

  • More candidates for reranking = better final selection

  • Captures edge cases and variations

  • Reduces chance of missing relevant content

5. Response Cleaning

Cypress includes specialized response cleaning:

Removes:

  • Original prompt text (if echoed)

  • Markdown code block markers

  • Extra whitespace

  • Formatting artifacts

Example:

Cleaned Output: Our pricing plans are...
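A minimal sketch of such a cleaner follows. The exact artifacts Cypress strips are assumptions based on the list above.

```python
import re

# Illustrative cleaner; the real implementation's rules are not documented here.
def clean_response(text: str, prompt: str = "") -> str:
    if prompt and text.startswith(prompt):
        text = text[len(prompt):]                 # drop echoed prompt text
    text = text.strip()
    text = re.sub(r"^```[a-zA-Z]*\n?|```$", "", text)  # code block markers
    text = re.sub(r"\n{3,}", "\n\n", text)        # collapse extra whitespace
    return text.strip()
```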

Performance Breakdown

Memory Analysis:          ~100ms
Query Expansion:          ~500ms  ← Unique to Cypress
Query Embedding:          ~100ms
Tier 1 Retrieval (50):    ~300ms  ← More than Redwood/Cedar
Tier 2 Retrieval (50):    ~300ms  ← Additional tier
Reranking (100→10):       ~500ms  ← Unique to Cypress
Context Building:          ~50ms
Prompt Rewriting:         ~400ms
LLM Completion:           ~800ms
Response Cleaning:         ~50ms
─────────────────────────────────
Total:                    ~3.1s

When to Use Cypress

Query: "contraindications for medication X"
  → Cannot afford mistakes
  → Diverse medical terminology
  → Need highest precision
  ✅ Use Cypress

Query: "configure OAuth with SAML SSO"
  → Multiple concepts
  → Various terminology (OAuth 2.0, OAuth2, etc.)
  → Need comprehensive results
  ✅ Use Cypress

Sources:

  • Official API docs (Tier 1)

  • Engineering blog (Tier 2)

  • Community tutorials (Tier 2)

Query: "best practices for API rate limiting"
  → Benefits from tier organization
  → Reranking selects best across all sources
  ✅ Use Cypress

Query: "GDPR data retention requirements"
  → High-stakes information
  → Must be accurate and cited
  → Regulatory compliance
  ✅ Use Cypress

Query: "machine learning model training"
Also needs to match:

  • "ML model development"

  • "training neural networks"

  • "model fine-tuning"

  → Query expansion helps significantly
  ✅ Use Cypress

When NOT to Use Cypress

Query: "What are your business hours?"
  → Straightforward question
  → No terminology variations
  → Redwood is 2x faster
  ❌ Cypress is overkill

100,000+ queries/day, tight budget
  → Cypress costs 2.5x more than Redwood
  → Consider hybrid approach
  ❌ Use Cypress selectively

Need: < 1 second response
  → Cypress averages 3-4s
  → Too slow for real-time
  ❌ Use Redwood instead

Data Source Tier Assignment

Optimization Tips

1. Tune Retrieval Volume

2. Query Expansion Quality

3. Tier Organization

4. Reranking Threshold

Comparison with Other Strategies

Complete Comparison Table

Feature                  Redwood       Cedar        Cypress

Performance
Average Latency          1-2s          2-3s         3-4s
Cost per 1k (GPT-4o)     $0.50         $0.70        $0.90
Token Usage              1,500         2,000        2,500

Accuracy
Simple Queries           ⭐⭐⭐⭐⭐    ⭐⭐⭐⭐⭐   ⭐⭐⭐⭐
Complex Queries          ⭐⭐⭐        ⭐⭐⭐⭐     ⭐⭐⭐⭐⭐
Terminology Variations   ⭐⭐          ⭐⭐⭐       ⭐⭐⭐⭐⭐

Features
Query Rewriting          –             –            ✅
Query Expansion          ❌            ❌           ✅
Reranking                ❌            ❌           ✅
Tier-Based               ❌            ❌           ✅
Memory Support           –             –            ✅

Best For
Clear questions          ✅ Best       ✅ Good      ⚠️ Overkill
Conversational           ❌ Poor       ✅ Best      ✅ Excellent
High accuracy            ❌ Adequate   ⚠️ Good      ✅ Best
High volume              ✅ Best       ⚠️ Good      ❌ Expensive
Complex terminology      ❌ Limited    ⚠️ Good      ✅ Best

Migration Paths

Redwood → Cedar: When conversational queries increase
Cedar → Cypress: When accuracy becomes critical
Cypress → Cedar: When cost/speed matters more than max accuracy

Real-World Performance

Case Study: Medical Knowledge Base

Setup:

  • 50,000 medical articles

  • Complex terminology (anatomical, pharmaceutical)

  • High accuracy requirements

  • Average query: "symptoms of X" or "treatment for Y"

Cedar Results (Before):

  • Latency: 2.3s ✅

  • Accuracy: 82% ⚠️

  • User complaints: "Missing alternative terms"

  • Cost: $0.68/1k

Cypress Results (After):

  • Latency: 3.6s (acceptable)

  • Accuracy: 96% ✅

  • User satisfaction: +41%

  • Cost: $0.89/1k

  • ROI: Worth it for medical accuracy

Key Insight: Query expansion captured terminology variations like:

  • "heart attack" → "myocardial infarction", "cardiac arrest", "MI"

  • "fever" → "pyrexia", "elevated temperature", "hyperthermia"

Case Study: Enterprise Software Documentation

Setup:

  • 10,000 technical docs

  • API references + guides + tutorials

  • 3 document tiers (official, community, archived)

  • Queries: Mix of simple and complex

Strategy Mix (Optimized):

Key Insight: Hybrid approach optimizes for both cost and quality.

Advanced Configuration

Dynamic Strategy Selection

Automatically choose strategy based on query:
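A hypothetical router might look like the sketch below. The heuristics, marker list, and thresholds are invented for illustration, not taken from the product.

```python
# Hypothetical strategy router; heuristics and thresholds are invented.
def choose_strategy(query: str, conversational: bool = False) -> str:
    words = [w.lower().strip("?,.") for w in query.split()]
    if conversational:
        return "cedar"                       # memory-aware follow-ups
    technical_markers = {"oauth", "saml", "gdpr", "api", "contraindications"}
    if set(words) & technical_markers or len(words) > 8:
        return "cypress"                     # complex / high-stakes terminology
    if query.endswith("?"):
        return "redwood"                     # short, direct question
    return "cedar"
```

In practice the routing signal could also come from a classifier or from per-tenant configuration; the point is that only queries needing expansion and reranking pay Cypress's latency and cost.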

Custom Reranking Parameters
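The original snippet is missing here; a plausible parameter set, with names invented for illustration, might be:

```python
# Hypothetical reranking parameters; check your deployment's actual schema.
rerank_params = {
    "model": "bge-reranker-v2-m3",
    "topKPerSource": 50,    # candidates fetched per source before reranking
    "topN": 10,             # documents kept after reranking
    "scoreThreshold": 0.3,  # assumed knob: drop candidates scoring below this
}
```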

Query Expansion Control
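Again the original snippet is missing; a sketch with invented field names:

```python
# Hypothetical expansion controls; field names are assumptions.
expansion_config = {
    "enabled": True,
    "model": "gpt-4o-mini",
    "maxExpansionTerms": 8,  # assumed cap; over-expansion has diminishing returns
}
```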

Monitoring Cypress

Key Metrics

1. Reranking Effectiveness

2. Query Expansion Impact

3. Tier Distribution

4. Overall Performance

Common Issues

Very Slow Responses (> 5s):

Poor Reranking:

High Costs:

Best Practices

1. Tier Organization

✅ Tier 1: Official, verified content
✅ Tier 2: Supplementary, community content
✅ Review tier assignments quarterly
❌ Don't put everything in Tier 1

2. Query Expansion

✅ Focus on domain-specific terminology
✅ Include common abbreviations
✅ Test expansion quality
❌ Don't over-expand (diminishing returns)

3. Performance Monitoring

✅ Track reranking effectiveness
✅ Monitor latency by query type
✅ Review cost vs quality trade-offs
✅ A/B test against Cedar
❌ Don't assume it's working optimally

4. Hybrid Approach

✅ Use Cypress for critical queries
✅ Use Cedar for conversational
✅ Use Redwood for simple queries
✅ Route intelligently based on context
❌ Don't use one strategy for everything

Code Examples

Using Cypress via API
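The original request example is missing; a sketch follows. The endpoint URL, auth header, and payload field names are assumptions and should be replaced with your deployment's actual values.

```python
import json
import urllib.request

# Hypothetical endpoint and payload shape; substitute your deployment's
# real URL, API key, and field names.
payload = {
    "strategy": "cypress",
    "query": "How do I reset my password?",
    "conversationId": "abc-123",  # optional: enables memory analysis (step [1])
}
req = urllib.request.Request(
    "https://api.example.com/v1/chat",
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json",
             "Authorization": "Bearer YOUR_API_KEY"},
)
# response = urllib.request.urlopen(req)  # not executed in this sketch
```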

Response Format
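The documented response schema did not survive extraction; an illustrative shape, with invented field names, might be:

```python
# Illustrative response shape; every field name here is an assumption.
example_response = {
    "answer": "To reset your password...",
    "strategy": "cypress",
    "sources": [
        {"title": "Account Security Guide", "tier": 1, "rerankScore": 0.92},
    ],
    "latencyMs": 3100,
}
```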

Next Steps
