# Cost Optimization

Reduce AI operational costs while maintaining quality through smart configuration and usage patterns.

## Cost Components

Per-query cost breakdown:

| Component         | Cost Factor               | Per 1K Queries | Optimization                    |
| ----------------- | ------------------------- | -------------- | ------------------------------- |
| **LLM API**       | Tokens (in+out)           | $0.20-$1.50    | Model, max\_tokens, temperature |
| **Embeddings**    | Query embedding           | $0.01          | Cache queries                   |
| **Vector Search** | Pinecone queries          | $0.05          | Cache results, lower top\_k     |
| **Reranking**     | Cross-encoder (Cypress)   | $0.06          | Use Cedar/Redwood               |
| **Ingestion**     | Doc processing (one-time) | Variable       | Incremental syncs only          |

**Typical cost range**: $0.26-$0.90 per query (depending on strategy and model)

## Cost Breakdown Example

**Per 1,000 Queries:**

### Redwood (Cheapest)

```
Embeddings:        $0.01
Vector Search:     $0.05
LLM (GPT-4o-mini): $0.20
─────────────────────────
Total:             $0.26
```

### Cedar (Medium)

```
Embeddings:        $0.01
Rewriting:         $0.10
Vector Search:     $0.05
LLM (GPT-4o-mini): $0.25
─────────────────────────
Total:             $0.41 (+58%)
```

### Cypress (Highest)

```
Embeddings:        $0.01
Query Expansion:   $0.15
Vector Search:     $0.08 (higher volume)
Reranking:         $0.06
Rewriting:         $0.10
LLM (GPT-4o):      $0.50
─────────────────────────
Total:             $0.90 (+246%)
```

## Optimization Strategies

### 1. Model Selection

**Cost-Quality Matrix:**

```
GPT-4:           $$$$  ⭐⭐⭐⭐⭐
GPT-4o:          $$$   ⭐⭐⭐⭐⭐
GPT-4o-mini:     $     ⭐⭐⭐⭐
GPT-3.5-turbo:   $     ⭐⭐⭐
```

**Recommendation:**

* **High-volume, simple**: GPT-3.5-turbo or GPT-4o-mini
* **Complex, critical**: GPT-4o
* **Research/analysis**: GPT-4

### 2. Strategy Selection

```
Annual Cost for 100k queries:

Redwood + GPT-3.5-turbo:    $26
Cedar + GPT-4o-mini:        $41
Cypress + GPT-4o:           $90

Choose based on accuracy requirements.
```

### 3. Aggressive Caching

```typescript
{
  "cache": {
    "enabled": true,
    "ttl": 3600,             // 1 hour (longer = more savings)
    "fuzzyMatching": true,   // Match similar queries
    "minSimilarity": 0.95    // 95% similarity threshold
  }
}
```

**Impact:**

```
Without cache: 100k queries × $0.41 = $41,000
With 40% hit rate: 60k × $0.41 = $24,600
Savings: $16,400/year (40% reduction)
```

### 4. Reduce Token Usage

**Limit Response Length:**

```typescript
maxTokens: 200  // Was 500
```

**Reduce Context:**

```typescript
topK: 5         // Was 10
```

**Shorter Memory:**

```typescript
memoryTurns: 3  // Was 10
```

**Impact:**

```
Before: 2,500 tokens/query × $0.002 = $0.005
After:  1,200 tokens/query × $0.002 = $0.0024
Savings: 52% per query
```

### 5. Smart Routing

Route queries to appropriate strategy:

```python
def route_query(query, history):
    # Simple queries → Cheap strategy
    if is_simple(query) and not history:
        return "REDWOOD"
    
    # Conversational → Medium cost
    elif history:
        return "CEDAR"
    
    # Complex → Worth the cost
    else:
        return "CYPRESS"

# Result: Optimize cost-quality trade-off per query
```

### 6. Batch Processing

```typescript
// Instead of real-time for non-urgent
const queue = createQueue({
  batchSize: 100,
  batchTimeout: 60000  // 1 minute
});

// Process in batches
// Reduce API overhead
// Lower overall cost
```

## Cost Monitoring

### Usage Dashboard

```
Current Month:
├─ Queries: 45,230
├─ Tokens: 68M
├─ Cost: $1,245
├─ Avg Cost/Query: $0.0275
├─ Projected (Month): $1,750
└─ Budget: $2,000 ✅
```

### Set Budgets

```typescript
{
  "budget": {
    "monthly": 2000,         // $2,000/month
    "alerts": {
      "80percent": true,     // Alert at $1,600
      "90percent": true,     // Alert at $1,800
      "exceeded": true       // Alert if over
    },
    "hardLimit": true,       // Stop at budget
    "limitBehavior": "QUEUE" // Queue requests if limit hit
  }
}
```

### Cost by Component

```
January 2024 Breakdown:
├─ LLM Calls:      $1,200 (60%)
├─ Embeddings:     $50 (2.5%)
├─ Vector Search:  $150 (7.5%)
├─ Reranking:      $100 (5%)
├─ Data Processing: $500 (25%)
└─ Total:          $2,000
```

## Cost-Saving Tactics

### Tactic 1: Tiered Response Quality

```typescript
// Critical queries: High quality
if (isCritical(query)) {
  return {
    strategyCode: 'CYPRESS',
    model: 'gpt-4o'
  };
}

// Standard queries: Balanced
else if (isStandard(query)) {
  return {
    strategyCode: 'CEDAR',
    model: 'gpt-4o-mini'
  };
}

// Simple queries: Fast & cheap
else {
  return {
    strategyCode: 'REDWOOD',
    model: 'gpt-3.5-turbo'
  };
}
```

**Impact:** 40-60% cost reduction vs using highest quality for all.

### Tactic 2: Query Deduplication

```typescript
// Detect duplicate/similar queries
const queryHash = hashQuery(prompt);

if (seenRecently(queryHash, last30Minutes)) {
  return cachedResponse;
}
```

### Tactic 3: Peak/Off-Peak Pricing

```typescript
// Use cheaper strategy during high volume
const isPeakHours = getCurrentHour() >= 9 && getCurrentHour() <= 17;

const strategy = isPeakHours ? 'REDWOOD' : 'CEDAR';
```

### Tactic 4: Lazy Loading

```typescript
// Load context progressively
{
  "topK": 3,              // Start with 3 docs
  "expandIfNeeded": true,  // Load more if confidence low
  "confidenceThreshold": 0.85
}
```

## Real-World Examples

### Case Study: SaaS Company

**Before Optimization:**

```
Queries: 100k/month
Strategy: Cypress for all
Model: GPT-4
Cost: $9,000/month
```

**After Optimization:**

```
Strategy Mix:
- 60% Redwood (simple queries)
- 30% Cedar (conversational)
- 10% Cypress (complex)

Model Mix:
- 70% GPT-3.5-turbo
- 30% GPT-4o

Cache Hit Rate: 45%

New Cost: $2,700/month
Savings: $6,300/month (70% reduction)
Accuracy: Maintained at 87%
```

### Case Study: E-commerce

**Optimization:**

* Cached product queries (60% hit rate)
* Redwood for FAQ (70% of queries)
* Cedar for complex questions (30%)
* GPT-3.5-turbo for product info
* GPT-4o for technical support

**Results:**

* Cost: $0.008/query (from $0.025)
* 68% cost reduction
* Response time: 1.3s (from 2.1s)
* Accuracy: 86% (from 89%, acceptable trade-off)

## Cost Analysis Tools

### Cost Calculator

```typescript
function estimateMonthlyCost(params) {
  const {
    queriesPerDay,
    avgTokensPerQuery,
    modelCostPer1MTokens,
    cacheHitRate,
    strategyOverhead
  } = params;
  
  const uncachedQueries = queriesPerDay * (1 - cacheHitRate);
  const tokensPerDay = uncachedQueries * avgTokensPerQuery;
  const costPerDay = (tokensPerDay / 1000000) * modelCostPer1MTokens;
  const monthlyCost = costPerDay * 30 * (1 + strategyOverhead);
  
  return monthlyCost;
}

// Example
estimateMonthlyCost({
  queriesPerDay: 3000,
  avgTokensPerQuery: 2000,
  modelCostPer1MTokens: 0.50,  // GPT-3.5-turbo
  cacheHitRate: 0.40,
  strategyOverhead: 0.0         // Redwood
});
// Result: ~$54/month
```

### ROI Calculator

```typescript
const savings = {
  customerSupportTime: 120,      // hours saved/month
  hourlyRate: 50,
  totalSaved: 120 * 50           // $6,000
};

const costs = {
  twigPlatform: 500,             // Subscription
  apiUsage: 300,                 // API costs
  totalCost: 800
};

const roi = (savings.totalSaved - costs.totalCost) / costs.totalCost;
// ROI: 650% ($5,200 net benefit)
```

## Best Practices

### 1. Start Cheap, Scale Up

✅ Begin with Redwood + GPT-3.5-turbo ✅ Monitor accuracy ✅ Upgrade only if needed ❌ Don't start with most expensive

### 2. Cache Aggressively

✅ Enable caching ✅ Long TTL for stable content ✅ Fuzzy matching ❌ Don't cache time-sensitive data

### 3. Monitor Continuously

✅ Track cost trends ✅ Set budget alerts ✅ Review monthly ❌ Don't ignore cost creep

### 4. Optimize Data Processing

✅ Incremental syncs ✅ Process only changes ✅ Schedule during off-peak ❌ Don't reprocess everything

## Troubleshooting

### Unexpected High Costs

**Investigate:**

1. Check query volume (unexpected spike?)
2. Review token usage (responses too long?)
3. Check cache hit rate (caching working?)
4. Verify strategy mix (using expensive strategies?)
5. Audit model usage (using GPT-4 too much?)

### Budget Exceeded

**Immediate Actions:**

1. Enable hard budget limit
2. Switch to cheaper strategies
3. Reduce max tokens
4. Increase cache TTL
5. Queue non-urgent requests

## Next Steps

* [Performance Tuning](/product/monitoring/performance-tuning.md) - Optimize speed
* [Analytics Dashboard](https://github.com/thrivapp/twig-help-docs/blob/staging/monitoring/analytics-dashboard.md) - Monitor usage
* [RAG Strategies](/product/overview-1.md) - Choose cost-effective strategy
* [Rate Limits](/product/developer-api/rate-limits.md) - Manage usage


---

# Agent Instructions: Querying This Documentation

If you need additional information that is not directly available in this page, you can query the documentation dynamically by asking a question.

Perform an HTTP GET request on the current page URL with the `ask` query parameter:

```
GET https://help.twig.so/product/monitoring/cost-optimization.md?ask=<question>
```

The question should be specific, self-contained, and written in natural language.
The response will contain a direct answer to the question and relevant excerpts and sources from the documentation.

Use this mechanism when the answer is not explicitly present in the current page, you need clarification or additional context, or you want to retrieve related documentation sections.
