# Performance Tuning

Optimize your AI agents for speed, accuracy, and cost-effectiveness.

## Performance Metrics

Track in Analytics dashboard:

| Metric             | Target | Measured How                        |
| ------------------ | ------ | ----------------------------------- |
| **p50 Latency**    | <2s    | 50th percentile response time       |
| **p95 Latency**    | <4s    | 95th percentile (worst-case normal) |
| **p99 Latency**    | <6s    | 99th percentile (outliers)          |
| **Accuracy Rate**  | >85%   | % responses marked accurate         |
| **Cost/Query**     | <$0.01 | Total API costs / query count       |
| **Cache Hit Rate** | >40%   | Cached / total queries              |

**Location**: Dashboard → Analytics → Performance tab

## Optimization Tradeoffs

Cannot maximize all three simultaneously:

**Speed** ⟷ **Accuracy** ⟷ **Cost**

Pick 2:

* Speed + Cost → Redwood + GPT-3.5-turbo (accuracy: \~72%)
* Speed + Accuracy → Cedar + GPT-4 + cache (cost: medium)
* Accuracy + Cost → Cypress + GPT-4o-mini (speed: \~3-5s)

## Speed Optimization

### 1. Choose Faster RAG Strategy

| Strategy    | Avg Latency | Best For         |
| ----------- | ----------- | ---------------- |
| **Redwood** | \~1.2s      | Maximum speed    |
| **Cedar**   | \~2.0s      | Balanced         |
| **Cypress** | \~3.5s      | Maximum accuracy |

**Switch to Redwood when:**

* Questions are clear and direct
* Speed is critical
* High query volume

### 2. Reduce topK

```typescript
// Higher topK = slower
topK: 20  → Response time: 2.5s
topK: 10  → Response time: 1.8s
topK: 5   → Response time: 1.2s
```

**Recommendation:** Start with 5-7, increase only if accuracy suffers.

### 3. Use Faster Model

| Model             | Speed  | Quality   | Cost |
| ----------------- | ------ | --------- | ---- |
| **GPT-3.5-turbo** | Fast   | Good      | Low  |
| **GPT-4o-mini**   | Fast   | Better    | Low  |
| **GPT-4o**        | Medium | Excellent | High |
| **GPT-4**         | Slow   | Excellent | High |

**For speed:** Use GPT-3.5-turbo for simple queries, GPT-4o for complex.

### 4. Enable Caching

```typescript
{
  "cache": {
    "enabled": true,
    "ttl": 300,              // 5 minutes
    "keyBy": ["prompt", "agentId"]
  }
}
```

**Impact:** 50-100ms for cached responses vs 1-3s for uncached.

### 5. Optimize Context

```typescript
// Reduce context size
{
  "topK": 5,               // Fewer documents
  "maxContextTokens": 2000, // Limit context size
  "chunkSize": 300         // Smaller chunks
}
```

### 6. Use Streaming

```typescript
// User sees response immediately
stream: true

// First token: ~500ms
// Complete response: ~2s
// Perceived latency: Much faster
```

## Accuracy Optimization

### 1. Choose Better RAG Strategy

**Cypress > Cedar > Redwood** for accuracy.

### 2. Increase topK

```typescript
topK: 5   → Accuracy: 85%
topK: 10  → Accuracy: 89%
topK: 20  → Accuracy: 91%
```

**Diminishing returns** after topK \~15.

### 3. Use Better Model

**GPT-4o** or **GPT-4** for highest quality.

### 4. Improve Instructions

```typescript
// Detailed system prompt
instructions: `
You are an expert [domain] assistant.

When answering:
1. Always cite specific sources
2. Provide step-by-step explanations
3. Include code examples when relevant
4. Verify facts against documentation
5. Admit uncertainty when appropriate
`
```

### 5. Add High-Quality Data Sources

✅ Official documentation ✅ Verified knowledge base ✅ Recent, updated content ❌ Low-quality, outdated content

### 6. Enable Reranking (Cypress)

Reranking improves precision by 20-30%.

### 7. Use Private Data Only

```typescript
configAIUseOnlyPrivateData: true
```

Prevents hallucination from general knowledge.

## Cost Optimization

### 1. Choose Cost-Effective Model

| Model             | Cost per 1M Tokens |
| ----------------- | ------------------ |
| **GPT-3.5-turbo** | $0.50              |
| **GPT-4o-mini**   | $0.15              |
| **GPT-4o**        | $5.00              |
| **GPT-4**         | $30.00             |

**Recommendation:** GPT-4o-mini for most use cases.

### 2. Reduce Token Usage

```typescript
{
  "maxTokens": 300,        // Limit response length
  "topK": 5,               // Fewer documents
  "memoryTurns": 3,        // Less conversation history
  "temperature": 0.3       // More focused (fewer tokens)
}
```

### 3. Aggressive Caching

```typescript
{
  "cache": {
    "enabled": true,
    "ttl": 3600,           // 1 hour (longer cache)
    "fuzzyMatching": true   // Match similar queries
  }
}
```

### 4. Use Redwood Strategy

Redwood is cheapest (single LLM call, no reranking).

### 5. Batch Operations

Process multiple queries together to reduce overhead.

### 6. Smart Routing

```python
def route_query(query):
    if is_simple(query):
        return "REDWOOD"      # Cheapest
    elif is_complex(query):
        return "CYPRESS"      # Worth the cost
    else:
        return "CEDAR"        # Balanced
```

## Balanced Optimization

### The Performance Triangle

```
        Speed
         /\
        /  \
       /    \
      /      \
     /________\
  Cost      Accuracy
```

You can optimize 2 of 3:

* **Speed + Cost**: Use Redwood, GPT-3.5-turbo
* **Speed + Accuracy**: Use Cedar, GPT-4o, caching
* **Cost + Accuracy**: Use Cypress, efficient models, batch

### Recommended Configurations

**High-Volume Support Bot:**

```yaml
Strategy: REDWOOD
Model: gpt-3.5-turbo
topK: 5
Cache: enabled (1 hour)
Goal: Handle 10k+ queries/day cheaply
```

**Technical Documentation:**

```yaml
Strategy: CEDAR
Model: gpt-4o
topK: 10
Cache: enabled (30 min)
Goal: Balance speed and accuracy
```

**Compliance Assistant:**

```yaml
Strategy: CYPRESS
Model: gpt-4o
topK: 10
Reranking: enabled
Goal: Maximum accuracy, cost secondary
```

## Performance Monitoring

### Key Metrics Dashboard

```
Agent: Customer Support
├─ Avg Response Time: 1.8s (target: <2s) ✅
├─ P95 Response Time: 2.4s (target: <3s) ✅
├─ P99 Response Time: 3.1s ⚠️
├─ Cache Hit Rate: 42%
├─ Avg Tokens: 1,523
├─ Cost per Query: $0.0045
└─ Accuracy Score: 89%
```

### Set Performance Targets

```typescript
{
  "targets": {
    "responseTime": {
      "p50": 1.5,
      "p95": 2.5,
      "p99": 4.0
    },
    "accuracy": 0.85,
    "costPerQuery": 0.01,
    "cacheHitRate": 0.40
  }
}
```

### Alerting

```typescript
{
  "alerts": {
    "responseTimeSlow": {
      "threshold": 3.0,
      "duration": "5m",
      "notify": "team@company.com"
    },
    "accuracyDrop": {
      "threshold": 0.80,
      "compare": "baseline",
      "notify": "team@company.com"
    }
  }
}
```

## A/B Testing

Compare configurations to find optimal settings:

```typescript
// Test A: Baseline
const configA = {
  strategyCode: 'CEDAR',
  topK: 10,
  temperature: 0.7
};

// Test B: Optimized for speed
const configB = {
  strategyCode: 'REDWOOD',
  topK: 5,
  temperature: 0.7
};

// Route 50% traffic to each
// Measure: speed, accuracy, cost
// Deploy winner after 1 week
```

## Continuous Optimization

### Weekly Review

1. Check performance metrics
2. Identify bottlenecks
3. Test optimizations
4. Deploy improvements
5. Measure impact

### Monthly Audit

1. Review all configurations
2. Benchmark against baselines
3. Update targets
4. Plan next optimizations

## Tools & Techniques

### Performance Profiling

```typescript
const startTime = Date.now();

const response = await twig.chat.create({
  prompt,
  agentId,
  profile: true  // Enable profiling
});

console.log('Breakdown:', {
  embedding: response.profile.embeddingTime,
  retrieval: response.profile.retrievalTime,
  llm: response.profile.llmTime,
  total: Date.now() - startTime
});
```

### Load Testing

```bash
# Using Apache Bench
ab -n 1000 -c 10 -H "Authorization: Bearer KEY" \
  -p query.json https://api.twig.so/api/chat

# Results show:
# - Requests per second
# - Average latency
# - P50, P95, P99
```

### Cache Analysis

```typescript
const cacheStats = await twig.cache.stats();

console.log('Hit rate:', cacheStats.hitRate);
console.log('Avg savings:', cacheStats.avgTimeSaved);
console.log('Most cached:', cacheStats.topQueries);
```

## Next Steps

* [Cost Optimization](/product/monitoring/cost-optimization.md) - Reduce expenses
* [Analytics Dashboard](https://github.com/thrivapp/twig-help-docs/blob/staging/monitoring/analytics-dashboard.md) - Monitor metrics
* [RAG Strategies](/product/overview-1.md) - Choose optimal strategy
* [Evaluation Framework](/product/monitoring/evals.md) - Measure quality


---

# Agent Instructions: Querying This Documentation

If you need additional information that is not directly available in this page, you can query the documentation dynamically by asking a question.

Perform an HTTP GET request on the current page URL with the `ask` query parameter:

```
GET https://help.twig.so/product/monitoring/performance-tuning.md?ask=<question>
```

The question should be specific, self-contained, and written in natural language.
The response will contain a direct answer to the question and relevant excerpts and sources from the documentation.

Use this mechanism when the answer is not explicitly present in the current page, you need clarification or additional context, or you want to retrieve related documentation sections.
