Core Concepts & Terminology

Technical reference for RAG terminology and Twig implementation details.

RAG (Retrieval-Augmented Generation)

RAG injects retrieved context into the LLM prompt before generation.

RAG Flow in Twig

  1. Query embedding: Convert user query to 1536-dim vector (OpenAI ada-002)

  2. Vector search: Query Pinecone index, return top-k chunks by cosine similarity (threshold: 0.7)

  3. Context injection: Insert chunks into LLM prompt between system prompt and user query

  4. LLM generation: OpenAI API generates response based on injected context

  5. Citation extraction: Parse response, match claims to source chunks by span overlap

Observable behavior: Responses cite specific documents. If retrieval fails (no chunks above threshold), agent responds "I don't have information about that".
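The five steps above can be sketched end to end as a minimal pipeline. Everything here is a hedged illustration: `embed`, `vector_search`, and `generate` are stand-in stubs supplied by the caller, not Twig's actual internals.

```python
SIMILARITY_THRESHOLD = 0.7  # chunks scoring below this are discarded (step 2)

def answer(query, embed, vector_search, generate, top_k=10):
    """Minimal RAG pipeline mirroring the five steps above."""
    q_vec = embed(query)                               # 1. query embedding
    chunks = vector_search(q_vec, top_k=top_k)         # 2. vector search
    chunks = [c for c in chunks if c["score"] >= SIMILARITY_THRESHOLD]
    if not chunks:                                     # retrieval-failure path
        return "I don't have information about that"
    context = "\n\n".join(c["text"] for c in chunks)   # 3. context injection
    return generate(context, query)                    # 4. LLM generation

# Stub usage: no chunk clears the threshold, so the fallback fires.
result = answer(
    "reset my password",
    embed=lambda q: [0.0] * 1536,
    vector_search=lambda v, top_k: [{"text": "...", "score": 0.4}],
    generate=lambda ctx, q: "answer",
)
print(result)  # I don't have information about that
```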

Agent

An agent is a configuration record with these fields:

  • agent_id: Unique identifier (format: agent_abc123)

  • name: Display name

  • system_prompt: Instructions prepended to every query

  • data_source_ids: Array of data sources to query

  • rag_strategy: redwood | cedar | cypress

  • model: gpt-4 | gpt-3.5-turbo | claude-3-sonnet

  • temperature: Float 0-2 (default: 0.7)

  • max_tokens: Integer (default: 500)
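A record with the fields above might look like the following sketch. Field names and defaults come from the list; the specific values are illustrative.

```python
agent = {
    "agent_id": "agent_abc123",      # format from the spec above
    "name": "Support Bot",
    "system_prompt": "Answer using the retrieved documentation only.",
    "data_source_ids": ["ds_001", "ds_002"],  # illustrative IDs
    "rag_strategy": "redwood",       # redwood | cedar | cypress
    "model": "gpt-4",                # gpt-4 | gpt-3.5-turbo | claude-3-sonnet
    "temperature": 0.7,              # float 0-2, default 0.7
    "max_tokens": 500,               # default 500
}

assert agent["rag_strategy"] in {"redwood", "cedar", "cypress"}
assert 0.0 <= agent["temperature"] <= 2.0
```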

Storage: PostgreSQL agents table

Observable behavior: Different agents querying the same data sources return different responses, depending on system prompt and strategy.

Data Source

A data source is an ingestion job configuration:

  • source_type: file | website | confluence | slack | google_drive | etc.

  • connection_params: OAuth tokens, API keys, URLs

  • sync_schedule: hourly | daily | weekly | manual

  • filters: Include/exclude rules (e.g., file extensions, URL patterns)

Processing stages:

  1. Fetch (download documents)

  2. Parse (extract text)

  3. Chunk (split into 512-token segments with 50-token overlap)

  4. Embed (OpenAI ada-002)

  5. Index (upload vectors to Pinecone)

Status values: pending | processing | active | failed

Observable behavior: Data → [Source Name] shows the chunk count (e.g., "1,234 chunks indexed") and the last sync timestamp.

Vector Embedding

A vector embedding is a 1536-dimensional float array representing text semantics.

Model: OpenAI text-embedding-ada-002

API: POST https://api.openai.com/v1/embeddings

Cost: $0.0001 per 1K tokens

Example:
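As a sketch, the request body for the endpoint above can be built like this. Field names follow the public OpenAI embeddings API; the HTTP call itself is omitted.

```python
import json

def embedding_request(text):
    """Request body for POST https://api.openai.com/v1/embeddings."""
    return {"model": "text-embedding-ada-002", "input": text}

print(json.dumps(embedding_request("reset my password")))
# {"model": "text-embedding-ada-002", "input": "reset my password"}
```

The response's `data[0].embedding` field carries the 1536-dimensional float array described above.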

Distance metric: Cosine similarity (-1 to 1, higher = more similar)

Observable behavior:

  • "reset password" and "change password" have cosine similarity ~0.85

  • "reset password" and "pizza delivery" have cosine similarity ~0.10
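The similarity numbers above come from cosine similarity, which is straightforward to compute directly; a pure-Python sketch with no vector database involved:

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: -1 to 1, higher = more similar."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

print(cosine_similarity([1.0, 0.0], [1.0, 0.0]))  # 1.0 (identical direction)
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))  # 0.0 (unrelated)
```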

Vector Search

Vector search ranks chunks by cosine similarity between the query embedding and each chunk embedding.

Algorithm:

  1. Embed query: q_vec = embed("reset my password")

  2. Query Pinecone: results = index.query(q_vec, top_k=10, filter={org_id: "org_123"})

  3. Pinecone returns chunks with similarity scores (0.0-1.0)

  4. Discard chunks with score < 0.7 (configurable threshold)
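Conceptually, steps 2-4 reduce to scoring every chunk, ranking, and applying the threshold. A brute-force in-memory sketch (Pinecone does the ranking at scale with an approximate index):

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))

def top_k_search(q_vec, chunks, top_k=10, threshold=0.7):
    """Score every chunk, keep the top_k, drop anything below the threshold."""
    ranked = sorted(chunks, key=lambda c: cosine(q_vec, c["vec"]), reverse=True)
    return [c for c in ranked[:top_k] if cosine(q_vec, c["vec"]) >= threshold]

chunks = [
    {"id": "a", "vec": [1.0, 0.0]},   # aligned with the query
    {"id": "b", "vec": [0.0, 1.0]},   # orthogonal: scores ~0, filtered out
]
print([c["id"] for c in top_k_search([1.0, 0.0], chunks)])  # ['a']
```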

Retrieval behavior:

  • Query "How to reset password?" retrieves chunks containing "password recovery", "reset credentials", "forgot password"

  • Does NOT require exact keyword match

  • Fails if no chunks score above threshold

Chunking

Document splitting strategy:

  • Chunk size: 512 tokens (default, configurable: 256-2048)

  • Overlap: 50 tokens (default, configurable: 0-200)

  • Splitting: Recursive by paragraph → sentence → token

Example:
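A hedged sketch of the split: real chunking counts tokenizer tokens and recurses by paragraph and sentence, but whitespace-separated words are enough to show the 512-size / 50-overlap mechanics from the defaults above.

```python
def chunk(words, size=512, overlap=50):
    """Split a word list into fixed-size segments, each sharing `overlap` words with the previous one."""
    step = size - overlap
    return [words[i:i + size] for i in range(0, max(len(words) - overlap, 1), step)]

doc = [f"w{i}" for i in range(1000)]
parts = chunk(doc)
print(len(parts))                       # 3 chunks for a 1,000-word document
print(parts[0][-50:] == parts[1][:50])  # True: 50-word overlap at the boundary
```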

Rationale:

  • Smaller chunks → more precise retrieval, but less context per chunk

  • Larger chunks → more context, but lower precision

  • Overlap → prevents concepts split across boundaries

Observable behavior: Data source shows "N chunks indexed" (e.g., 100-page PDF → ~400-600 chunks)

Context Window

Maximum tokens the LLM processes in one request:

  • GPT-3.5-turbo: 16,384 tokens (~12,000 words)

  • GPT-4: 8,192 tokens (standard), 32,768 (extended), 128,000 (turbo)

  • Claude 3.5 Sonnet: 200,000 tokens
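The limits above translate into a simple budget check; a sketch using the GPT-4 standard limit (the component token counts below are illustrative):

```python
def fits_context(system, chunks, history, query, response_budget, limit=8192):
    """True if all request components plus the response budget fit the model limit."""
    used = system + chunks + history + query
    return used + response_budget <= limit

# 1,000 system + 5,120 retrieved (10 x 512-token chunks) + 1,500 history + 100 query
print(fits_context(1000, 5120, 1500, 100, response_budget=500))  # False: 8,220 > 8,192
```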

Token allocation (typical query): the system prompt, retrieved chunks, conversation history, and user query share the window, leaving the remainder for the response.

Observable failure: If the total exceeds the limit, the API rejects the request with a context-length error.

Token

Text unit for LLM processing:

  • 1 token ≈ 4 characters (English)

  • 1 token ≈ 0.75 words (English)

Examples:

  • "Hello world!" = 3 tokens

  • "Retrieval-Augmented Generation" = 6 tokens

  • "https://example.com" = 5 tokens

Pricing (OpenAI):

  • GPT-4: $0.03/1K input tokens, $0.06/1K output tokens

  • GPT-3.5-turbo: $0.001/1K input tokens, $0.002/1K output tokens
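The heuristics and prices above are enough for a back-of-envelope estimate. Rates are copied from the list; the character-based token count is approximate by design.

```python
PRICES = {  # USD per 1K tokens (input, output), from the list above
    "gpt-4": (0.03, 0.06),
    "gpt-3.5-turbo": (0.001, 0.002),
}

def estimate_tokens(text):
    """Rough English heuristic: 1 token ≈ 4 characters."""
    return max(1, len(text) // 4)

def query_cost(model, input_tokens, output_tokens):
    inp, out = PRICES[model]
    return input_tokens / 1000 * inp + output_tokens / 1000 * out

print(estimate_tokens("Hello world!"))   # 3, matching the example above
print(query_cost("gpt-4", 1000, 500))    # $0.03 input + $0.03 output
```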

Observable behavior: Query cost displayed in Analytics (e.g., "$0.0042 per query")

Temperature

Controls randomness in LLM sampling:

  • 0.0: Deterministic (always picks highest probability token)

  • 0.7: Balanced (default)

  • 1.0: High variability

  • 2.0: Maximum randomness

Observable behavior:

  • Temperature 0.0: Same query returns identical response every time

  • Temperature 1.0: Same query returns different phrasing each time (content consistent)

Use cases:

  • 0.0-0.3: Factual Q&A, documentation lookup

  • 0.7-1.0: Creative writing, brainstorming

top_k

Number of chunks retrieved from vector DB:

  • Redwood: top_k = 5-10

  • Cedar: top_k = 10

  • Cypress: top_k = 50 (pre-rerank) → 10 (post-rerank)

Configurable: Agent configuration → Advanced Settings → Top K (range: 1-100)

Tradeoff:

  • Higher top_k → More context, slower retrieval, higher cost

  • Lower top_k → Faster, cheaper, but may miss relevant chunks

Observable behavior: Sources panel shows exactly top_k chunks (or fewer if threshold filters some out)

Reranking

Two-stage retrieval: fast vector search → precise cross-encoder scoring.

Implementation (Cypress only):

  1. Vector search: Retrieve top_k=50 chunks (cosine similarity)

  2. Reranker API: Score all 50 chunks using bge-reranker-v2-m3 (cross-encoder)

  3. Select top 10 by reranker score

  4. Send to LLM

Reranker model: BAAI/bge-reranker-v2-m3

Latency added: ~200-500ms for 50 chunks
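The two stages can be sketched with a stub standing in for the cross-encoder (a real cross-encoder like bge-reranker-v2-m3 scores each query/chunk pair jointly; the word-overlap stub below is illustration only):

```python
def rerank(query, candidates, cross_encoder_score, final_k=10):
    """Stage 2: rescore vector-search candidates, keep the best final_k."""
    rescored = sorted(candidates, key=lambda c: cross_encoder_score(query, c), reverse=True)
    return rescored[:final_k]

def stub_score(query, chunk):
    """Stand-in scorer: shared-word count (NOT a real cross-encoder)."""
    return len(set(query.split()) & set(chunk.split()))

candidates = ["reset your password via email", "pizza delivery hours", "password reset steps"]
print(rerank("reset password", candidates, stub_score, final_k=2))
# ['reset your password via email', 'password reset steps']
```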

Observable behavior:

  • Cypress "Sources Used" panel shows higher precision than Redwood

  • Chunks may have different order than pure vector search would produce

RAG Strategies

Redwood (Standard)

Algorithm:

  1. Embed user query

  2. Vector search (top_k=10)

  3. Filter by threshold (0.7)

  4. Inject into LLM prompt

Latency: 1-2s

Accuracy: 72% (internal eval)

Cost: ~$0.002 per query

Use when: Questions are clear, single-hop retrieval sufficient

Cedar (Context-Aware)

Algorithm:

  1. LLM rewrites query using conversation history

  2. Embed rewritten query

  3. Vector search (top_k=10)

  4. Filter by threshold (0.7)

  5. Inject into LLM prompt

Latency: 2-3s

Accuracy: 78% (internal eval)

Cost: ~$0.003 per query (extra LLM call for rewrite)

Use when: Multi-turn conversations, follow-up questions ("What about the other option?")

Observable behavior: Logs show "Rewritten query: [...]" in debug panel

Cypress (Advanced)

Algorithm:

  1. LLM generates 3 query variations

  2. Embed all 3 queries

  3. Vector search each (top_k=50 total, deduplicated)

  4. Rerank with cross-encoder → top 10

  5. Inject into LLM prompt

Latency: 3-5s

Accuracy: 85% (internal eval)

Cost: ~$0.006 per query

Use when: High accuracy required, complex queries, multi-document synthesis

Observable behavior: Sources panel shows "Retrieved via multi-query expansion"
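Steps 2-3 of Cypress hinge on deduplicating the union of the per-variation result lists; a sketch (the chunk IDs are made up):

```python
def dedup_union(result_lists):
    """Merge per-variation result lists, keeping the first occurrence of each chunk ID."""
    seen, merged = set(), []
    for results in result_lists:
        for chunk_id in results:
            if chunk_id not in seen:
                seen.add(chunk_id)
                merged.append(chunk_id)
    return merged

# Three query variations return overlapping chunks:
lists = [["c1", "c2"], ["c2", "c3"], ["c1", "c4"]]
print(dedup_union(lists))  # ['c1', 'c2', 'c3', 'c4']
```

The merged pool (up to 50 chunks) then goes to the reranker described earlier.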

Agentic Workflow

Multi-step reasoning with tool calling (requires Cypress strategy).

Tools available:

  • search_knowledge_base(query): Recursive retrieval

  • calculate(expression): Math evaluation

  • call_api(endpoint, params): Custom API integration

Flow:

  1. LLM decides if tools needed (function calling)

  2. Execute tool, get result

  3. LLM synthesizes final response

Latency: +1-3s per tool call

Enable: Agent Configuration → Advanced → Agentic Mode (toggle)

Observable behavior: Response shows "Used tools: search_knowledge_base, calculate" in debug panel
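The three-step flow amounts to a dispatch loop. A hedged sketch with a toy `calculate` tool; Twig's actual tool interfaces and the LLM's function-call format are not shown here.

```python
def calculate(expression):
    """Toy math tool: handles 'a <op> b' expressions only (illustration)."""
    left, op, right = expression.split()
    ops = {"+": lambda a, b: a + b, "-": lambda a, b: a - b, "*": lambda a, b: a * b}
    return ops[op](float(left), float(right))

TOOLS = {"calculate": calculate}

def run_tool_call(call):
    """Step 2: execute the tool the LLM asked for, return the result."""
    return TOOLS[call["name"]](call["arguments"])

# Step 1 (the LLM emitting a function call) is simulated here;
# step 3 would feed the result back for the final synthesized response.
print(run_tool_call({"name": "calculate", "arguments": "19 * 4"}))  # 76.0
```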

Session Memory

Conversation history stored per session.

Storage:

  • Redis cache (key: session:{session_id}:history)

  • Max 10 turns or 4K tokens (whichever reached first)

  • Retention: 30 days

Behavior:

  • Follow-up questions use previous context (e.g., "What about X?" → knows what "what" refers to)

  • Session ID in API request: {"session_id": "sess_abc123", "query": "..."}

  • New session: Omit session_id, new one generated
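The "10 turns or 4K tokens (whichever reached first)" cap implies trimming from the oldest side; a sketch with per-turn token counts supplied directly for simplicity:

```python
def trim_history(turns, max_turns=10, max_tokens=4000):
    """Keep the most recent turns that satisfy both caps (whichever is hit first)."""
    kept, total = [], 0
    for turn in reversed(turns):          # walk newest-first
        if len(kept) == max_turns or total + turn["tokens"] > max_tokens:
            break
        kept.append(turn)
        total += turn["tokens"]
    return list(reversed(kept))           # restore chronological order

turns = [{"text": f"turn {i}", "tokens": 500} for i in range(12)]
print(len(trim_history(turns)))  # 8: the 4,000-token cap bites before the 10-turn cap
```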

Observable failure: If session expires (>30 days), follow-ups fail. Error: "Session not found"

Interaction

A database record for each query-response pair.

Schema:
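The actual columns aren't listed here; the sketch below infers illustrative field names from the Inbox filters (agent, date, feedback) and the query/response/sources data the record must carry. Treat every name as hypothetical.

```python
# Illustrative only: field names inferred from the Inbox filters, not the real schema.
interaction = {
    "interaction_id": "int_001",          # assumed identifier
    "agent_id": "agent_abc123",
    "query": "How do I reset my password?",
    "response": "Go to Settings ... [1]",
    "sources": ["chunk_42"],              # chunks cited in the response
    "feedback": None,                     # positive | negative | None
    "created_at": "2024-01-01T00:00:00Z",
}
assert interaction["feedback"] in {"positive", "negative", None}
```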

Observable behavior: Inbox shows all interactions, filterable by agent/date/feedback

Citation

Source reference in response.

Format:

Extraction: Regex parsing of response to match numbered citations to chunks
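Assuming citations are emitted as bracketed numbers like [1] (the exact template is not shown above), the regex pass might look like:

```python
import re

def extract_citations(response, chunks):
    """Map bracketed citation numbers in the response to source chunks by index."""
    numbers = {int(n) for n in re.findall(r"\[(\d+)\]", response)}
    return {n: chunks[n - 1] for n in sorted(numbers) if 1 <= n <= len(chunks)}

response = "Reset it from Settings [1]. Admins can force a reset [2]."
chunks = ["chunk about settings", "chunk about admin resets"]
print(extract_citations(response, chunks))
# {1: 'chunk about settings', 2: 'chunk about admin resets'}
```

Numbers outside the chunk range are dropped, which mirrors the failure mode below: a malformed citation simply never maps to a source.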

Link behavior: Click citation → opens source document URL (if available) or shows chunk text in modal

Observable failure: If LLM doesn't format citations correctly, they don't render as links (appears as plain text)

Knowledge Base (KB)

Human-curated article collection (separate from data sources).

Storage: PostgreSQL kb_articles table

Fields: title, content, tags, version, author, status (draft/published)

Generation flow:

  1. Inbox → Select interaction → Click "Generate KB Article"

  2. AI drafts article from interaction

  3. Human edits, approves

  4. Published to KB

Important: KB articles are NOT indexed for retrieval. They are for human reference only.

Observable behavior: KB section shows article list. Editing creates new version (version history tracked).

Inbox

Review queue for agent interactions.

Location: Review → Inbox

Filters:

  • Agent

  • Date range

  • Feedback status (positive/negative/no feedback)

  • Keyword search

Actions per interaction:

  • View full query/response/sources

  • Mark accurate/inaccurate (thumbs up/down)

  • Edit response (creates KB article draft)

  • Flag for review

Observable behavior: Counter shows unreviewed interactions (e.g., "245 pending")

Playground

Agent testing interface.

Location: Playground (top nav)

Features:

  • Agent selector (dropdown)

  • Query input

  • Response display with citations

  • Sources panel (right sidebar): shows chunks retrieved, similarity scores

  • Debug panel (expandable): shows latency breakdown, token counts, cost

Use cases:

  • Test before API integration

  • Compare RAG strategies (switch in agent config, re-run same query)

  • Debug retrieval (check which chunks returned)

Observable behavior: All queries logged to Inbox with tag "playground"

Evaluation (Evals)

Automated testing framework.

Location: Evaluation → Test Sets

Test set structure:
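Since the accuracy metric judges each response against an expected answer, a test set reduces to question/expected pairs. An illustrative sketch (field names assumed, not the real format):

```python
# Hypothetical test-set shape: question/expected pairs for the LLM judge.
test_set = [
    {"question": "How do I reset my password?",
     "expected": "Go to Settings → Security → Reset Password."},
    {"question": "What file types can I upload?",
     "expected": "PDF, DOCX, TXT, and Markdown."},
]

def accuracy(results):
    """Fraction of questions the LLM judge marked as matching (0-1)."""
    return sum(r["pass"] for r in results) / len(results)

print(accuracy([{"pass": True}, {"pass": False}]))  # 0.5
```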

Metrics computed:

  • Accuracy: LLM judges if response matches expected (0-1)

  • Latency: p50, p95, p99 (milliseconds)

  • Citation rate: % responses with sources

  • Cost: Total USD for test set

Run: Test Sets → [Your Set] → Select agent → Run Eval

Observable behavior: Results table shows pass/fail per question, aggregate metrics. Historical runs tracked for regression detection.

Private Data Mode

Agent configuration that blocks external LLM knowledge.

Enable: Agent Configuration → Privacy → Private Data Mode (toggle)

Behavior:

  • System prompt includes: "ONLY use information from provided sources. Never use your training data."

  • LLM still has base knowledge, but instructed to ignore it

Observable failure: If no relevant chunks retrieved, agent responds "I don't have information about that" (won't hallucinate from training data)

Limitations: This is a prompt-level constraint, not a technical one; it relies on the LLM following instructions. For a hard guarantee, use a fine-tuned model.

Public Agent

Agent shared in Agent Hub (marketplace).

Enable: Agent → Settings → Publish to Hub

Visibility: Other organizations can:

  • View agent name, description, example queries

  • Install (creates copy in their org)

  • Customize copy (can't modify original)

Data isolation: Data sources NOT shared, only agent configuration (prompts, RAG strategy, model)

Observable behavior: Agent Hub shows install count, ratings (1-5 stars)

Tier-Based Retrieval

Data source prioritization (Cypress only).

Configuration: Data Sources → [Source] → Tier (dropdown: 1 or 2)

Retrieval:

  1. Search tier 1 sources (top_k=30)

  2. Search tier 2 sources (top_k=20)

  3. Combine results (50 total)

  4. Rerank (top 10 final)

Use case: Prioritize official docs over community forums, but still include forums if official docs don't have answer

Observable behavior: Sources panel shows tier badge (T1 or T2) per chunk

API Key

Authentication credential for REST API.

Generate: Settings → API Keys → Generate New Key

Format: twigsk_live_abc123def456... (prefix indicates env: twigsk_live_ or twigsk_test_)

Usage:
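A sketch of how the key travels with a request. The `Authorization: Bearer` scheme is an assumption; only the key prefixes are documented above.

```python
def auth_headers(api_key):
    """Request headers carrying a Twig API key (Bearer scheme assumed)."""
    assert api_key.startswith(("twigsk_live_", "twigsk_test_")), "unexpected key prefix"
    return {"Authorization": f"Bearer {api_key}", "Content-Type": "application/json"}

print(auth_headers("twigsk_test_abc123def456")["Authorization"])
# Bearer twigsk_test_abc123def456
```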

Permissions: Read (view data), Write (modify agents/data sources), Execute (run queries), Admin (all)

Rate limit: 100 req/min (Execute scope), 10 req/min (Write scope)

Rotation: Generate new key, update apps, delete old key (zero downtime)

Observable failure: Invalid key returns 401 Unauthorized with JSON: {"error": "Invalid API key"}

Next Steps

Authentication - API key management and SSO setup

Agent Configuration - Detailed agent settings

RAG Strategy Selection - When to use Redwood/Cedar/Cypress
