Core Concepts & Terminology
Technical reference for RAG terminology and Twig implementation details.
RAG (Retrieval-Augmented Generation)
RAG injects retrieved context into the LLM prompt before generation.
RAG Flow in Twig
Query embedding: Convert user query to 1536-dim vector (OpenAI ada-002)
Vector search: Query Pinecone index, return top-k chunks by cosine similarity (threshold: 0.7)
Context injection: Insert chunks into LLM prompt between system prompt and user query
LLM generation: OpenAI API generates response based on injected context
Citation extraction: Parse response, match claims to source chunks by span overlap
Observable behavior: Responses cite specific documents. If retrieval fails (no chunks above threshold), agent responds "I don't have information about that".
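The five steps above can be sketched end to end. This is a minimal illustration, not the production implementation: `embed`, `vector_search`, and `generate` are placeholders for the OpenAI and Pinecone calls named above.

```python
# Minimal sketch of the RAG flow; `embed`, `vector_search`, and
# `generate` stand in for the OpenAI/Pinecone calls named above.
THRESHOLD = 0.7  # similarity cutoff from step 2

def answer(query, embed, vector_search, generate, system_prompt, top_k=10):
    q_vec = embed(query)                              # 1. query embedding
    hits = vector_search(q_vec, top_k=top_k)          # 2. vector search
    chunks = [c for c, score in hits if score >= THRESHOLD]
    if not chunks:                                    # retrieval failure path
        return "I don't have information about that"
    context = "\n\n".join(chunks)                     # 3. context injection
    prompt = f"{system_prompt}\n\nContext:\n{context}\n\nQuestion: {query}"
    return generate(prompt)                           # 4. LLM generation
```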
Agent
An agent is a configuration record with these fields:
agent_id: Unique identifier (format: agent_abc123)
name: Display name
system_prompt: Instructions prepended to every query
data_source_ids: Array of data sources to query
rag_strategy: redwood|cedar|cypress
model: gpt-4|gpt-3.5-turbo|claude-3-sonnet
temperature: Float 0-2 (default: 0.7)
max_tokens: Integer (default: 500)
Storage: PostgreSQL agents table
Observable behavior: Different agents querying same data sources return different responses based on system prompt and strategy.
Data Source
A data source is an ingestion job configuration:
source_type: file|website|confluence|slack|google_drive|etc.
connection_params: OAuth tokens, API keys, URLs
sync_schedule: hourly|daily|weekly|manual
filters: Include/exclude rules (e.g., file extensions, URL patterns)
Processing stages:
Fetch (download documents)
Parse (extract text)
Chunk (split into 512-token segments with 50-token overlap)
Embed (OpenAI ada-002)
Index (upload vectors to Pinecone)
Status values: pending | processing | active | failed
Observable behavior: Data → [Source Name] shows the chunk count (e.g., "1,234 chunks indexed") and the last sync timestamp.
Vector Embedding
A vector embedding is a 1536-dimensional float array representing text semantics.
Model: OpenAI text-embedding-ada-002
API: POST https://api.openai.com/v1/embeddings
Cost: $0.0001 per 1K tokens
Example:
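The example snippet appears to have been dropped from this page. A sketch of building (but not sending) the request for the endpoint above; the payload field names follow the public embeddings API:

```python
import json
import os
import urllib.request

def embedding_request(text: str, api_key: str) -> urllib.request.Request:
    """Build a POST to the embeddings endpoint documented above."""
    payload = {"model": "text-embedding-ada-002", "input": text}
    return urllib.request.Request(
        "https://api.openai.com/v1/embeddings",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )

req = embedding_request("reset my password", os.environ.get("OPENAI_API_KEY", ""))
# urllib.request.urlopen(req) would return JSON whose
# data[0]["embedding"] is the 1536-float vector.
```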
Distance metric: Cosine similarity (-1 to 1, higher = more similar)
Observable behavior:
"reset password" and "change password" have cosine similarity ~0.85
"reset password" and "pizza delivery" have cosine similarity ~0.10
Semantic Search
Vector search using cosine similarity between query embedding and chunk embeddings.
Algorithm:
Embed query: q_vec = embed("reset my password")
Query Pinecone: results = index.query(q_vec, top_k=10, filter={org_id: "org_123"})
Pinecone returns chunks with similarity scores (0.0-1.0)
Filter chunks with score < 0.7 (configurable threshold)
Retrieval behavior:
Query "How to reset password?" retrieves chunks containing "password recovery", "reset credentials", "forgot password"
Does NOT require exact keyword match
Fails if no chunks score above threshold
Chunking
Document splitting strategy:
Chunk size: 512 tokens (default, configurable: 256-2048)
Overlap: 50 tokens (default, configurable: 0-200)
Splitting: Recursive by paragraph → sentence → token
Example:
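The original example snippet is not shown here. A fixed-window sketch of how the size and overlap parameters interact (the splitter described above is recursive by paragraph → sentence → token; this simplification only illustrates the windowing):

```python
def chunk_tokens(tokens, size=512, overlap=50):
    """Split a token list into fixed-size windows with overlap.
    Illustrative only: the real splitter is recursive, not fixed-window."""
    step = size - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + size])
        if start + size >= len(tokens):   # last window reached the end
            break
    return chunks
```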
Rationale:
Smaller chunks → more precise retrieval, but less context per chunk
Larger chunks → more context, but lower precision
Overlap → prevents concepts split across boundaries
Observable behavior: Data source shows "N chunks indexed" (e.g., 100-page PDF → ~400-600 chunks)
Context Window
Maximum tokens the LLM processes in one request:
GPT-3.5-turbo: 16,384 tokens (~12,000 words)
GPT-4: 8,192 tokens (standard), 32,768 (extended), 128,000 (turbo)
Claude 3.5 Sonnet: 200,000 tokens
Token allocation (typical query):
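The allocation table appears to be missing from this page. A hypothetical budget for a GPT-4 8K window; the exact split is configurable and every number below is illustrative, not a documented default (except max_tokens = 500):

```python
# Hypothetical token budget for a GPT-4 8,192-token context window.
CONTEXT_LIMIT = 8192

allocation = {
    "system_prompt":    500,
    "retrieved_chunks": 5120,   # e.g. 10 chunks x 512 tokens
    "history":          1000,
    "user_query":       100,
    "response_budget":  500,    # max_tokens default
}

used = sum(allocation.values())
headroom = CONTEXT_LIMIT - used   # must stay positive or the API errors
```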
Observable failure: If the total exceeds the limit, the API returns a context-length-exceeded error.
Token
Text unit for LLM processing:
1 token ≈ 4 characters (English)
1 token ≈ 0.75 words (English)
Examples:
"Hello world!" = 3 tokens
"Retrieval-Augmented Generation" = 6 tokens
"https://example.com" = 5 tokens
Pricing (OpenAI):
GPT-4: $0.03/1K input tokens, $0.06/1K output tokens
GPT-3.5-turbo: $0.001/1K input tokens, $0.002/1K output tokens
Observable behavior: Query cost displayed in Analytics (e.g., "$0.0042 per query")
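The per-query figure follows directly from the heuristics and rates above:

```python
# Rough per-query cost from the published per-1K-token rates above.
RATES = {  # USD per 1K tokens: (input, output)
    "gpt-4": (0.03, 0.06),
    "gpt-3.5-turbo": (0.001, 0.002),
}

def estimate_tokens(text: str) -> int:
    """Heuristic from above: ~4 characters per English token."""
    return max(1, len(text) // 4)

def query_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    inp, out = RATES[model]
    return input_tokens / 1000 * inp + output_tokens / 1000 * out
```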
Temperature
Controls randomness in LLM sampling:
0.0: Deterministic (always picks highest probability token)
0.7: Balanced (default)
1.0: High variability
2.0: Maximum randomness
Observable behavior:
Temperature 0.0: Same query returns identical response every time
Temperature 1.0: Same query returns different phrasing each time (content consistent)
Use cases:
0.0-0.3: Factual Q&A, documentation lookup
0.7-1.0: Creative writing, brainstorming
top_k
Number of chunks retrieved from vector DB:
Redwood: top_k = 5-10
Cedar: top_k = 10
Cypress: top_k = 50 (pre-rerank) → 10 (post-rerank)
Configurable: Agent configuration → Advanced Settings → Top K (range: 1-100)
Tradeoff:
Higher top_k → More context, slower retrieval, higher cost
Lower top_k → Faster, cheaper, but may miss relevant chunks
Observable behavior: Sources panel shows exactly top_k chunks (or fewer if threshold filters some out)
Reranking
Two-stage retrieval: fast vector search → precise cross-encoder scoring.
Implementation (Cypress only):
Vector search: Retrieve top_k=50 chunks (cosine similarity)
Reranker API: Score all 50 chunks using bge-reranker-v2-m3 (cross-encoder)
Select top 10 by reranker score
Send to LLM
Reranker model: BAAI/bge-reranker-v2-m3
Latency added: ~200-500ms for 50 chunks
Observable behavior:
Cypress "Sources Used" panel shows higher precision than Redwood
Chunks may have different order than pure vector search would produce
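A sketch of the stage-2 selection, where `scorer` stands in for the reranker API call (it is not a real client):

```python
def rerank(chunks, scorer, keep=10):
    """Stage 2: re-score vector-search candidates with a cross-encoder
    score (here, `scorer` is a placeholder for the reranker API call)
    and keep the best `keep` chunks."""
    scored = [(scorer(c), c) for c in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [c for _, c in scored[:keep]]
```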
RAG Strategies
Redwood (Standard)
Algorithm:
Embed user query
Vector search (top_k=10)
Filter by threshold (0.7)
Inject into LLM prompt
Latency: 1-2s
Accuracy: 72% (internal eval)
Cost: ~$0.002 per query
Use when: Questions are clear, single-hop retrieval sufficient
Cedar (Context-Aware)
Algorithm:
LLM rewrites query using conversation history
Embed rewritten query
Vector search (top_k=10)
Filter by threshold (0.7)
Inject into LLM prompt
Latency: 2-3s
Accuracy: 78% (internal eval)
Cost: ~$0.003 per query (extra LLM call for rewrite)
Use when: Multi-turn conversations, follow-up questions ("What about the other option?")
Observable behavior: Logs show "Rewritten query: [...]" in debug panel
Cypress (Advanced)
Algorithm:
LLM generates 3 query variations
Embed all 3 queries
Vector search each (top_k=50 total, deduplicated)
Rerank with cross-encoder → top 10
Inject into LLM prompt
Latency: 3-5s
Accuracy: 85% (internal eval)
Cost: ~$0.006 per query
Use when: High accuracy required, complex queries, multi-document synthesis
Observable behavior: Sources panel shows "Retrieved via multi-query expansion"
Agentic Workflow
Multi-step reasoning with tool calling (requires Cypress strategy).
Tools available:
search_knowledge_base(query): Recursive retrieval
calculate(expression): Math evaluation
call_api(endpoint, params): Custom API integration
Flow:
LLM decides if tools needed (function calling)
Execute tool, get result
LLM synthesizes final response
Latency: +1-3s per tool call
Enable: Agent Configuration → Advanced → Agentic Mode (toggle)
Observable behavior: Response shows "Used tools: search_knowledge_base, calculate" in debug panel
Session Memory
Conversation history stored per session.
Storage:
Redis cache (key: session:{session_id}:history)
Max 10 turns or 4K tokens (whichever is reached first)
Retention: 30 days
Behavior:
Follow-up questions use previous context (e.g., "What about X?" → knows what "what" refers to)
Session ID in API request: {"session_id": "sess_abc123", "query": "..."}
New session: Omit session_id; a new one is generated
Observable failure: If session expires (>30 days), follow-ups fail. Error: "Session not found"
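The "10 turns or 4K tokens" truncation can be sketched as follows; the token counter is the ~4-chars-per-token heuristic, and the exact trimming order is an assumption:

```python
def truncate_history(turns, max_turns=10, max_tokens=4000,
                     count_tokens=lambda t: len(t) // 4):
    """Keep the most recent turns within both limits, whichever hits first."""
    kept, total = [], 0
    for turn in reversed(turns[-max_turns:]):   # newest first
        cost = count_tokens(turn)
        if total + cost > max_tokens:
            break
        kept.append(turn)
        total += cost
    return list(reversed(kept))                 # restore chronological order
```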
Interaction
A database record for each query-response pair.
Schema:
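The schema snippet is not shown on this page. A plausible shape, inferred from fields referenced elsewhere in this document (every field name below is a guess, not the actual interactions table):

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Interaction:
    """Hypothetical record shape; the real `interactions` schema is not shown."""
    interaction_id: str
    agent_id: str          # e.g. agent_abc123
    session_id: str        # e.g. sess_abc123
    query: str
    response: str
    sources: list = field(default_factory=list)   # cited chunk references
    feedback: Optional[str] = None                # "positive" / "negative" / None
```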
Observable behavior: Inbox shows all interactions, filterable by agent/date/feedback
Citation
Source reference in response.
Format:
Extraction: Regex parsing of response to match numbered citations to chunks
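The format snippet above appears to have been dropped. Assuming a bracketed-number style like "[1]" (an assumption; the actual format is not shown here), the regex extraction might look like:

```python
import re

def extract_citations(response: str):
    """Pull numbered citation markers (assumed "[n]" style) from a response."""
    return [int(n) for n in re.findall(r"\[(\d+)\]", response)]
```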
Link behavior: Click citation → opens source document URL (if available) or shows chunk text in modal
Observable failure: If LLM doesn't format citations correctly, they don't render as links (appears as plain text)
Knowledge Base (KB)
Human-curated article collection (separate from data sources).
Storage: PostgreSQL kb_articles table
Fields: title, content, tags, version, author, status (draft/published)
Generation flow:
Inbox → Select interaction → Click "Generate KB Article"
AI drafts article from interaction
Human edits, approves
Published to KB
Important: KB articles are NOT indexed for retrieval. They are for human reference only.
Observable behavior: KB section shows article list. Editing creates new version (version history tracked).
Inbox
Review queue for agent interactions.
Location: Review → Inbox
Filters:
Agent
Date range
Feedback status (positive/negative/no feedback)
Keyword search
Actions per interaction:
View full query/response/sources
Mark accurate/inaccurate (thumbs up/down)
Edit response (creates KB article draft)
Flag for review
Observable behavior: Counter shows unreviewed interactions (e.g., "245 pending")
Playground
Agent testing interface.
Location: Playground (top nav)
Features:
Agent selector (dropdown)
Query input
Response display with citations
Sources panel (right sidebar): shows chunks retrieved, similarity scores
Debug panel (expandable): shows latency breakdown, token counts, cost
Use cases:
Test before API integration
Compare RAG strategies (switch in agent config, re-run same query)
Debug retrieval (check which chunks returned)
Observable behavior: All queries logged to Inbox with tag "playground"
Evaluation (Evals)
Automated testing framework.
Location: Evaluation → Test Sets
Test set structure:
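The structure snippet is not shown on this page. A hypothetical shape whose field names are illustrative only, chosen to match the metrics listed below (an expected answer for the LLM judge, expected sources for the citation rate):

```python
# Hypothetical test-set shape; field names are illustrative, not documented.
test_set = {
    "name": "password-faq",
    "cases": [
        {
            "question": "How do I reset my password?",
            "expected_answer": "Use the reset link on the sign-in page.",
            "expected_sources": ["auth-guide"],
        },
    ],
}
```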
Metrics computed:
Accuracy: LLM judges if response matches expected (0-1)
Latency: p50, p95, p99 (milliseconds)
Citation rate: % responses with sources
Cost: Total USD for test set
Run: Test Sets → [Your Set] → Select agent → Run Eval
Observable behavior: Results table shows pass/fail per question, aggregate metrics. Historical runs tracked for regression detection.
Private Data Mode
Agent configuration that blocks external LLM knowledge.
Enable: Agent Configuration → Privacy → Private Data Mode (toggle)
Behavior:
System prompt includes: "ONLY use information from provided sources. Never use your training data."
LLM still has base knowledge, but instructed to ignore it
Observable failure: If no relevant chunks retrieved, agent responds "I don't have information about that" (won't hallucinate from training data)
Limitations: Not a technical constraint; it relies on the LLM following instructions. For a hard guarantee, use a fine-tuned model.
Public Agent
Agent shared in Agent Hub (marketplace).
Enable: Agent → Settings → Publish to Hub
Visibility: Other organizations can:
View agent name, description, example queries
Install (creates copy in their org)
Customize copy (can't modify original)
Data isolation: Data sources NOT shared, only agent configuration (prompts, RAG strategy, model)
Observable behavior: Agent Hub shows install count, ratings (1-5 stars)
Tier-Based Retrieval
Data source prioritization (Cypress only).
Configuration: Data Sources → [Source] → Tier (dropdown: 1 or 2)
Retrieval:
Search tier 1 sources (top_k=30)
Search tier 2 sources (top_k=20)
Combine results (50 total)
Rerank (top 10 final)
Use case: Prioritize official docs over community forums, but still include forums if official docs don't have answer
Observable behavior: Sources panel shows tier badge (T1 or T2) per chunk
API Key
Authentication credential for REST API.
Generate: Settings → API Keys → Generate New Key
Format: twigsk_live_abc123def456... (prefix indicates env: twigsk_live_ or twigsk_test_)
Usage:
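The usage snippet is not shown on this page. A sketch of an authenticated request; the bearer-token header is an assumption, and the endpoint URL below is hypothetical:

```python
import json
import urllib.request

def query_request(api_key: str, agent_id: str, query: str) -> urllib.request.Request:
    """Build an authenticated query request (endpoint URL is hypothetical)."""
    payload = {"agent_id": agent_id, "query": query}
    return urllib.request.Request(
        "https://api.twig.example/v1/query",   # placeholder endpoint
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",   # assumed auth scheme
            "Content-Type": "application/json",
        },
    )

req = query_request("twigsk_test_abc123", "agent_abc123", "How do I reset my password?")
```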
Permissions: Read (view data), Write (modify agents/data sources), Execute (run queries), Admin (all)
Rate limit: 100 req/min (Execute scope), 10 req/min (Write scope)
Rotation: Generate new key, update apps, delete old key (zero downtime)
Observable failure: Invalid key returns 401 Unauthorized with JSON: {"error": "Invalid API key"}
Next Steps
Authentication - API key management and SSO setup
Agent Configuration - Detailed agent settings
RAG Strategy Selection - When to use Redwood/Cedar/Cypress