Cost Optimization

Reduce AI operational costs while maintaining quality through smart configuration and usage patterns.

Cost Components

Understanding where costs come from:

Component          Cost Driver                   Optimization Lever
─────────────────────────────────────────────────────────────────────────────
LLM API Calls      Input + output tokens         Model choice, response length
Embeddings         Number of queries             Caching, deduplication
Vector Search      Query volume                  Caching, topK
Reranking          Results reranked (Cypress)    Strategy choice
Data Processing    Documents processed           Incremental updates

Cost Breakdown Example

Per 1,000 Queries:

Redwood (Cheapest)

Embeddings:        $0.01
Vector Search:     $0.05
LLM (GPT-4o-mini): $0.20
─────────────────────────
Total:             $0.26

Cedar (Medium)

Cypress (Highest)
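For reference, the Redwood numbers above reduce to a per-query figure by dividing the component sum by the batch size:

```python
# Per-1,000-query component costs for Redwood, taken from the breakdown above.
REDWOOD_PER_1000 = {
    "embeddings": 0.01,
    "vector_search": 0.05,
    "llm_gpt_4o_mini": 0.20,
}

def cost_per_query(per_1000_costs: dict) -> float:
    """Per-query cost in USD, rounded to six decimal places."""
    return round(sum(per_1000_costs.values()) / 1000, 6)

print(f"${sum(REDWOOD_PER_1000.values()):.2f} per 1,000 queries")  # → $0.26 per 1,000 queries
print(f"${cost_per_query(REDWOOD_PER_1000):.5f} per query")        # → $0.00026 per query
```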

Optimization Strategies

1. Model Selection

Cost-Quality Matrix:

Recommendation:

  • High-volume, simple: GPT-3.5-turbo or GPT-4o-mini

  • Complex, critical: GPT-4o

  • Research/analysis: GPT-4
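These recommendations reduce to a small lookup table. A sketch (an illustrative helper, not a product API; revisit it as model pricing changes):

```python
# Default model per workload type, following the recommendations above.
MODEL_BY_WORKLOAD = {
    "high_volume_simple": "gpt-4o-mini",  # or gpt-3.5-turbo
    "complex_critical":   "gpt-4o",
    "research_analysis":  "gpt-4",
}

def pick_model(workload: str) -> str:
    # Unknown workloads default to the cheapest recommended model.
    return MODEL_BY_WORKLOAD.get(workload, "gpt-4o-mini")
```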

2. Strategy Selection

3. Aggressive Caching

Impact:
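As a sketch of what aggressive caching looks like in practice (a hypothetical in-process cache; class and method names are not the product's API):

```python
import hashlib
import time

class ResponseCache:
    """TTL-bounded query→response cache with light fuzzy matching."""

    def __init__(self, ttl_seconds: float = 3600):
        self.ttl = ttl_seconds
        self._store = {}  # normalized key -> (expires_at, response)

    @staticmethod
    def _key(query: str) -> str:
        # "Fuzzy" matching here is just normalization: case, whitespace,
        # and trailing punctuation, so near-identical queries share a hit.
        normalized = " ".join(query.lower().split()).rstrip("?!. ")
        return hashlib.sha256(normalized.encode()).hexdigest()

    def get(self, query: str):
        entry = self._store.get(self._key(query))
        if entry and entry[0] > time.monotonic():
            return entry[1]  # hit: skip embedding, search, and LLM costs
        return None

    def put(self, query: str, response: str) -> None:
        self._store[self._key(query)] = (time.monotonic() + self.ttl, response)
```

Every hit avoids the full embedding + search + LLM spend, so even modest hit rates compound; keep the TTL short for time-sensitive content.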

4. Reduce Token Usage

Limit Response Length:

Reduce Context:

Shorter Memory:

Impact:
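The three levers above (shorter responses, trimmed context, shorter memory) can be sketched as plain helpers (budgets are illustrative, not product defaults):

```python
MAX_RESPONSE_TOKENS = 256  # cap output length; output tokens typically cost more

def trim_context(ranked_chunks: list, max_chars: int = 4000) -> list:
    """Keep best-ranked chunks until the character budget is spent."""
    kept, used = [], 0
    for chunk in ranked_chunks:  # assumes best-first ordering
        if used + len(chunk) > max_chars:
            break  # stop at the first chunk that would overflow the budget
        kept.append(chunk)
        used += len(chunk)
    return kept

def trim_memory(turns: list, keep_last: int = 6) -> list:
    """Send only the most recent conversation turns as memory."""
    return turns[-keep_last:]
```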

5. Smart Routing

Route each query to the cheapest strategy that can answer it acceptably:
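A heuristic router might look like this (thresholds and keywords are assumptions to tune against your own traffic; the strategy names are the ones used in this guide):

```python
def route_strategy(query: str) -> str:
    """Send each query to the cheapest strategy likely to answer it well."""
    words = query.split()
    # Short, direct questions: the cheap Redwood pipeline is usually enough.
    if len(words) <= 8 and query.rstrip().endswith("?"):
        return "redwood"
    # Analytical queries benefit from Cypress reranking despite the cost.
    if any(k in query.lower() for k in ("compare", "why", "explain", "analyze")):
        return "cypress"
    # Everything else: the mid-cost default.
    return "cedar"
```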

6. Batch Processing
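Batching is mostly about amortizing per-request overhead, e.g. embedding documents in groups instead of one call each. A generic sketch:

```python
def batches(items: list, size: int = 100):
    """Yield fixed-size batches; one API call per batch instead of per item."""
    for start in range(0, len(items), size):
        yield items[start:start + size]

# e.g.: for batch in batches(documents, 100): embed(batch)
```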

Cost Monitoring

Usage Dashboard

Set Budgets
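A budget is easiest to reason about as a soft alert threshold plus a hard cutoff. A minimal tracker (real enforcement belongs in your API gateway or billing layer):

```python
class BudgetGuard:
    """Tracks spend against a monthly limit with soft and hard thresholds."""

    def __init__(self, monthly_limit: float, alert_fraction: float = 0.8):
        self.limit = monthly_limit
        self.alert_fraction = alert_fraction
        self.spent = 0.0

    def record(self, cost: float) -> str:
        """Record a request's cost and return the resulting budget state."""
        self.spent += cost
        if self.spent >= self.limit:
            return "block"  # hard limit: reject or queue non-urgent requests
        if self.spent >= self.limit * self.alert_fraction:
            return "alert"  # soft limit: notify, consider degrading quality
        return "ok"
```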

Cost by Component

Cost-Saving Tactics

Tactic 1: Tiered Response Quality

Impact: 40-60% cost reduction versus using the highest-quality tier for every query.
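One way to implement tiering is a static map from user plan to pipeline configuration (every name and number below is illustrative, not a product setting):

```python
TIER_CONFIG = {
    "basic":    {"strategy": "redwood", "model": "gpt-4o-mini", "max_tokens": 256},
    "standard": {"strategy": "cedar",   "model": "gpt-4o-mini", "max_tokens": 512},
    "premium":  {"strategy": "cypress", "model": "gpt-4o",      "max_tokens": 1024},
}

def config_for(plan: str) -> dict:
    # Unknown plans fall back to the cheapest tier.
    return TIER_CONFIG.get(plan, TIER_CONFIG["basic"])
```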

Tactic 2: Query Deduplication
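Deduplication collapses repeated questions within a short window so each distinct query hits the pipeline only once. A sketch using simple text normalization:

```python
from collections import Counter

def collapse_window(queries: list) -> Counter:
    """Count queries by normalized text: each distinct key needs only one
    upstream call, and its count is how many callers share that answer."""
    return Counter(" ".join(q.lower().split()) for q in queries)
```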

Tactic 3: Peak/Off-Peak Pricing

Tactic 4: Lazy Loading

Real-World Examples

Case Study: SaaS Company

Before Optimization:

After Optimization:

Case Study: E-commerce

Optimization:

  • Cached product queries (60% hit rate)

  • Redwood for FAQ (70% of queries)

  • Cedar for complex questions (30%)

  • GPT-3.5-turbo for product info

  • GPT-4o for technical support

Results:

  • Cost: $0.008/query (from $0.025)

  • 68% cost reduction

  • Response time: 1.3s (from 2.1s)

  • Accuracy: 86% (from 89%, acceptable trade-off)

Cost Analysis Tools

Cost Calculator
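A back-of-the-envelope monthly estimate, treating cache hits as (approximately) free and using the kind of per-query figures shown in this guide:

```python
def monthly_cost(queries_per_day: int, cost_per_query: float,
                 cache_hit_rate: float = 0.0, days: int = 30) -> float:
    """Estimated monthly spend in USD; only cache misses are billable."""
    billable = queries_per_day * days * (1 - cache_hit_rate)
    return round(billable * cost_per_query, 2)

# E.g. 1,000 queries/day at the e-commerce case study's $0.008/query
# with its 60% cache hit rate:
print(monthly_cost(1000, 0.008, cache_hit_rate=0.6))  # → 96.0
```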

ROI Calculator
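ROI for an optimization effort is just payback time on the engineering cost. A minimal helper:

```python
def payback_months(monthly_cost_before: float, monthly_cost_after: float,
                   engineering_cost: float):
    """Months until monthly savings cover the one-time optimization work.
    Returns None when there are no savings to recover the cost."""
    savings = monthly_cost_before - monthly_cost_after
    if savings <= 0:
        return None
    return round(engineering_cost / savings, 1)
```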

Best Practices

1. Start Cheap, Scale Up

✅ Begin with Redwood + GPT-3.5-turbo
✅ Monitor accuracy
✅ Upgrade only if needed
❌ Don't start with the most expensive setup

2. Cache Aggressively

✅ Enable caching
✅ Long TTL for stable content
✅ Fuzzy matching
❌ Don't cache time-sensitive data

3. Monitor Continuously

✅ Track cost trends
✅ Set budget alerts
✅ Review monthly
❌ Don't ignore cost creep

4. Optimize Data Processing

✅ Incremental syncs
✅ Process only changes
✅ Schedule during off-peak
❌ Don't reprocess everything

Troubleshooting

Unexpected High Costs

Investigate:

  1. Check query volume (unexpected spike?)

  2. Review token usage (responses too long?)

  3. Check cache hit rate (caching working?)

  4. Verify strategy mix (using expensive strategies?)

  5. Audit model usage (using GPT-4 too much?)

Budget Exceeded

Immediate Actions:

  1. Enable hard budget limit

  2. Switch to cheaper strategies

  3. Reduce max tokens

  4. Increase cache TTL

  5. Queue non-urgent requests

Next Steps
