Core Concepts & Terminology

Technical reference for RAG terminology and Twig implementation details.

RAG (Retrieval-Augmented Generation)

RAG injects retrieved context into the LLM prompt before generation.

RAG Flow in Twig

  1. Query embedding: Convert user query to 1536-dim vector (OpenAI ada-002)

  2. Vector search: Query Pinecone index, return top-k chunks by cosine similarity (threshold: 0.7)

  3. Context injection: Insert chunks into LLM prompt between system prompt and user query

  4. LLM generation: OpenAI API generates response based on injected context

  5. Citation extraction: Parse response, match claims to source chunks by span overlap

Observable behavior: Responses cite specific documents. If retrieval fails (no chunks above threshold), agent responds "I don't have information about that".
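The five steps above can be sketched end to end as a minimal pipeline. Everything here is a hedged illustration: `embed`, `vector_search`, and `generate` are stand-in stubs supplied by the caller, not Twig's actual internals.

```python
SIMILARITY_THRESHOLD = 0.7  # chunks scoring below this are discarded (step 2)

def answer(query, embed, vector_search, generate, top_k=10):
    """Minimal RAG pipeline mirroring the five steps above."""
    q_vec = embed(query)                               # 1. query embedding
    chunks = vector_search(q_vec, top_k=top_k)         # 2. vector search
    chunks = [c for c in chunks if c["score"] >= SIMILARITY_THRESHOLD]
    if not chunks:                                     # retrieval-failure path
        return "I don't have information about that"
    context = "\n\n".join(c["text"] for c in chunks)   # 3. context injection
    return generate(context, query)                    # 4. LLM generation

# Stub usage: no chunk clears the threshold, so the fallback fires.
result = answer(
    "reset my password",
    embed=lambda q: [0.0] * 1536,
    vector_search=lambda v, top_k: [{"text": "...", "score": 0.4}],
    generate=lambda ctx, q: "answer",
)
print(result)  # I don't have information about that
```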

Agent

An agent is a configuration record with these fields:

  • agent_id: Unique identifier (format: agent_abc123)

  • name: Display name

  • system_prompt: Instructions prepended to every query

  • data_source_ids: Array of data sources to query

  • rag_strategy: redwood | cedar | cypress

  • model: gpt-4 | gpt-3.5-turbo | claude-3-sonnet

  • temperature: Float 0-2 (default: 0.7)

  • max_tokens: Integer (default: 500)
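A record with the fields above might look like the following sketch. Field names and defaults come from the list; the specific values are illustrative.

```python
agent = {
    "agent_id": "agent_abc123",      # format from the spec above
    "name": "Support Bot",
    "system_prompt": "Answer using the retrieved documentation only.",
    "data_source_ids": ["ds_001", "ds_002"],  # illustrative IDs
    "rag_strategy": "redwood",       # redwood | cedar | cypress
    "model": "gpt-4",                # gpt-4 | gpt-3.5-turbo | claude-3-sonnet
    "temperature": 0.7,              # float 0-2, default 0.7
    "max_tokens": 500,               # default 500
}

assert agent["rag_strategy"] in {"redwood", "cedar", "cypress"}
assert 0.0 <= agent["temperature"] <= 2.0
```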

Storage: PostgreSQL agents table

Observable behavior: Different agents querying the same data sources return different responses, depending on system prompt and strategy.

Data Source

A data source is an ingestion job configuration:

  • source_type: file | website | confluence | slack | google_drive | etc.

  • connection_params: OAuth tokens, API keys, URLs

  • sync_schedule: hourly | daily | weekly | manual

  • filters: Include/exclude rules (e.g., file extensions, URL patterns)

Processing stages:

  1. Fetch (download documents)

  2. Parse (extract text)

  3. Chunk (split into 512-token segments with 50-token overlap)

  4. Embed (OpenAI ada-002)

  5. Index (upload vectors to Pinecone)

Status values: pending | processing | active | failed

Observable behavior: Data → [Source Name] shows the chunk count (e.g., "1,234 chunks indexed") and the last sync timestamp.

Vector Embedding

A vector embedding is a 1536-dimensional float array representing text semantics.

Model: OpenAI text-embedding-ada-002

API: POST https://api.openai.com/v1/embeddings

Cost: $0.0001 per 1K tokens

Example:
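As a sketch, the request body for the endpoint above can be built like this. Field names follow the public OpenAI embeddings API; the HTTP call itself is omitted.

```python
import json

def embedding_request(text):
    """Request body for POST https://api.openai.com/v1/embeddings."""
    return {"model": "text-embedding-ada-002", "input": text}

print(json.dumps(embedding_request("reset my password")))
# {"model": "text-embedding-ada-002", "input": "reset my password"}
```

The response's `data[0].embedding` field carries the 1536-dimensional float array described above.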

Distance metric: Cosine similarity (-1 to 1, higher = more similar)

Observable behavior:

  • "reset password" and "change password" have cosine similarity ~0.85

  • "reset password" and "pizza delivery" have cosine similarity ~0.10
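The similarity numbers above come from cosine similarity, which is straightforward to compute directly; a pure-Python sketch with no vector database involved:

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: -1 to 1, higher = more similar."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

print(cosine_similarity([1.0, 0.0], [1.0, 0.0]))  # 1.0 (identical direction)
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))  # 0.0 (unrelated)
```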

Vector Search

Vector search ranks chunks by cosine similarity between the query embedding and each chunk embedding.

Algorithm:

  1. Embed query: q_vec = embed("reset my password")

  2. Query Pinecone: results = index.query(q_vec, top_k=10, filter={org_id: "org_123"})

  3. Pinecone returns chunks with similarity scores (0.0-1.0)

  4. Discard chunks with score < 0.7 (configurable threshold)
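Conceptually, steps 2-4 reduce to scoring every chunk, ranking, and applying the threshold. A brute-force in-memory sketch (Pinecone does the ranking at scale with an approximate index):

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))

def top_k_search(q_vec, chunks, top_k=10, threshold=0.7):
    """Score every chunk, keep the top_k, drop anything below the threshold."""
    ranked = sorted(chunks, key=lambda c: cosine(q_vec, c["vec"]), reverse=True)
    return [c for c in ranked[:top_k] if cosine(q_vec, c["vec"]) >= threshold]

chunks = [
    {"id": "a", "vec": [1.0, 0.0]},   # aligned with the query
    {"id": "b", "vec": [0.0, 1.0]},   # orthogonal: scores ~0, filtered out
]
print([c["id"] for c in top_k_search([1.0, 0.0], chunks)])  # ['a']
```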

Retrieval behavior:

  • Query "How to reset password?" retrieves chunks containing "password recovery", "reset credentials", "forgot password"

  • Does NOT require exact keyword match

  • Fails if no chunks score above threshold

Chunking

Document splitting strategy:

  • Chunk size: 512 tokens (default, configurable: 256-2048)

  • Overlap: 50 tokens (default, configurable: 0-200)

  • Splitting: Recursive by paragraph → sentence → token

Example:
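A hedged sketch of the split: real chunking counts tokenizer tokens and recurses by paragraph and sentence, but whitespace-separated words are enough to show the 512-size / 50-overlap mechanics from the defaults above.

```python
def chunk(words, size=512, overlap=50):
    """Split a word list into fixed-size segments, each sharing `overlap` words with the previous one."""
    step = size - overlap
    return [words[i:i + size] for i in range(0, max(len(words) - overlap, 1), step)]

doc = [f"w{i}" for i in range(1000)]
parts = chunk(doc)
print(len(parts))                       # 3 chunks for a 1,000-word document
print(parts[0][-50:] == parts[1][:50])  # True: 50-word overlap at the boundary
```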

Rationale:

  • Smaller chunks → more precise retrieval, but less context per chunk

  • Larger chunks → more context, but lower precision

  • Overlap → prevents concepts split across boundaries

Observable behavior: Data source shows "N chunks indexed" (e.g., 100-page PDF → ~400-600 chunks)

Context Window

Maximum tokens the LLM processes in one request:

  • GPT-3.5-turbo: 16,384 tokens (~12,000 words)

  • GPT-4: 8,192 tokens (standard), 32,768 (extended), 128,000 (turbo)

  • Claude 3.5 Sonnet: 200,000 tokens
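The limits above translate into a simple budget check; a sketch using the GPT-4 standard limit (the component token counts below are illustrative):

```python
def fits_context(system, chunks, history, query, response_budget, limit=8192):
    """True if all request components plus the response budget fit the model limit."""
    used = system + chunks + history + query
    return used + response_budget <= limit

# 1,000 system + 5,120 retrieved (10 x 512-token chunks) + 1,500 history + 100 query
print(fits_context(1000, 5120, 1500, 100, response_budget=500))  # False: 8,220 > 8,192
```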

Token allocation (typical query): the system prompt, retrieved chunks, conversation history, and user query share the window, leaving the remainder for the response.

Observable failure: If the total exceeds the limit, the API rejects the request with a context-length error.

Token

Text unit for LLM processing:

  • 1 token ≈ 4 characters (English)

  • 1 token ≈ 0.75 words (English)

Examples:

  • "Hello world!" = 3 tokens

  • "Retrieval-Augmented Generation" = 6 tokens

  • "https://example.com" = 5 tokens

Pricing (OpenAI):

  • GPT-4: $0.03/1K input tokens, $0.06/1K output tokens

  • GPT-3.5-turbo: $0.001/1K input tokens, $0.002/1K output tokens
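The heuristics and prices above are enough for a back-of-envelope estimate. Rates are copied from the list; the character-based token count is approximate by design.

```python
PRICES = {  # USD per 1K tokens (input, output), from the list above
    "gpt-4": (0.03, 0.06),
    "gpt-3.5-turbo": (0.001, 0.002),
}

def estimate_tokens(text):
    """Rough English heuristic: 1 token ≈ 4 characters."""
    return max(1, len(text) // 4)

def query_cost(model, input_tokens, output_tokens):
    inp, out = PRICES[model]
    return input_tokens / 1000 * inp + output_tokens / 1000 * out

print(estimate_tokens("Hello world!"))   # 3, matching the example above
print(query_cost("gpt-4", 1000, 500))    # $0.03 input + $0.03 output
```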

Observable behavior: Query cost displayed in Analytics (e.g., "$0.0042 per query")

Temperature

Controls randomness in LLM sampling:

  • 0.0: Deterministic (always picks highest probability token)

  • 0.7: Balanced (default)

  • 1.0: High variability

  • 2.0: Maximum randomness

Observable behavior:

  • Temperature 0.0: Same query returns identical response every time

  • Temperature 1.0: Same query returns different phrasing each time (content consistent)

Use cases:

  • 0.0-0.3: Factual Q&A, documentation lookup

  • 0.7-1.0: Creative writing, brainstorming

top_k

Number of chunks retrieved from vector DB:

  • Redwood: top_k = 5-10

  • Cedar: top_k = 10

  • Cypress: top_k = 50 (pre-rerank) → 10 (post-rerank)

Configurable: Agent configuration → Advanced Settings → Top K (range: 1-100)

Tradeoff:

  • Higher top_k → More context, slower retrieval, higher cost

  • Lower top_k → Faster, cheaper, but may miss relevant chunks

Observable behavior: Sources panel shows exactly top_k chunks (or fewer if threshold filters some out)

Reranking

Two-stage retrieval: fast vector search → precise cross-encoder scoring.

Implementation (Cypress only):

  1. Vector search: Retrieve top_k=50 chunks (cosine similarity)

  2. Reranker API: Score all 50 chunks using bge-reranker-v2-m3 (cross-encoder)

  3. Select top 10 by reranker score

  4. Send to LLM

Reranker model: BAAI/bge-reranker-v2-m3

Latency added: ~200-500ms for 50 chunks
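The two stages can be sketched with a stub standing in for the cross-encoder (a real cross-encoder like bge-reranker-v2-m3 scores each query/chunk pair jointly; the word-overlap stub below is illustration only):

```python
def rerank(query, candidates, cross_encoder_score, final_k=10):
    """Stage 2: rescore vector-search candidates, keep the best final_k."""
    rescored = sorted(candidates, key=lambda c: cross_encoder_score(query, c), reverse=True)
    return rescored[:final_k]

def stub_score(query, chunk):
    """Stand-in scorer: shared-word count (NOT a real cross-encoder)."""
    return len(set(query.split()) & set(chunk.split()))

candidates = ["reset your password via email", "pizza delivery hours", "password reset steps"]
print(rerank("reset password", candidates, stub_score, final_k=2))
# ['reset your password via email', 'password reset steps']
```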

Observable behavior:

  • Cypress "Sources Used" panel shows higher precision than Redwood

  • Chunks may have different order than pure vector search would produce

RAG Strategies

Redwood (Standard)

Algorithm:

  1. Embed user query

  2. Vector search (top_k=10)

  3. Filter by threshold (0.7)

  4. Inject into LLM prompt

Latency: 1-2s

Accuracy: 72% (internal eval)

Cost: ~$0.002 per query

Use when: Questions are clear, single-hop retrieval sufficient

Cedar (Context-Aware)

Algorithm:

  1. LLM rewrites query using conversation history

  2. Embed rewritten query

  3. Vector search (top_k=10)

  4. Filter by threshold (0.7)

  5. Inject into LLM prompt

Latency: 2-3s

Accuracy: 78% (internal eval)

Cost: ~$0.003 per query (extra LLM call for rewrite)

Use when: Multi-turn conversations, follow-up questions ("What about the other option?")

Observable behavior: Logs show "Rewritten query: [...]" in debug panel

Cypress (Advanced)

Algorithm:

  1. LLM generates 3 query variations

  2. Embed all 3 queries

  3. Vector search each (top_k=50 total, deduplicated)

  4. Rerank with cross-encoder → top 10

  5. Inject into LLM prompt

Latency: 3-5s

Accuracy: 85% (internal eval)

Cost: ~$0.006 per query

Use when: High accuracy required, complex queries, multi-document synthesis

Observable behavior: Sources panel shows "Retrieved via multi-query expansion"
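Steps 2-3 of Cypress hinge on deduplicating the union of the per-variation result lists; a sketch (the chunk IDs are made up):

```python
def dedup_union(result_lists):
    """Merge per-variation result lists, keeping the first occurrence of each chunk ID."""
    seen, merged = set(), []
    for results in result_lists:
        for chunk_id in results:
            if chunk_id not in seen:
                seen.add(chunk_id)
                merged.append(chunk_id)
    return merged

# Three query variations return overlapping chunks:
lists = [["c1", "c2"], ["c2", "c3"], ["c1", "c4"]]
print(dedup_union(lists))  # ['c1', 'c2', 'c3', 'c4']
```

The merged pool (up to 50 chunks) then goes to the reranker described earlier.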

Agentic Workflow

Multi-step reasoning with tool calling (requires Cypress strategy).

Tools available:

  • search_knowledge_base(query): Recursive retrieval

  • calculate(expression): Math evaluation

  • call_api(endpoint, params): Custom API integration

Flow:

  1. LLM decides if tools needed (function calling)

  2. Execute tool, get result

  3. LLM synthesizes final response

Latency: +1-3s per tool call

Enable: Agent Configuration → Advanced → Agentic Mode (toggle)

Observable behavior: Response shows "Used tools: search_knowledge_base, calculate" in debug panel
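The three-step flow amounts to a dispatch loop. A hedged sketch with a toy `calculate` tool; Twig's actual tool interfaces and the LLM's function-call format are not shown here.

```python
def calculate(expression):
    """Toy math tool: handles 'a <op> b' expressions only (illustration)."""
    left, op, right = expression.split()
    ops = {"+": lambda a, b: a + b, "-": lambda a, b: a - b, "*": lambda a, b: a * b}
    return ops[op](float(left), float(right))

TOOLS = {"calculate": calculate}

def run_tool_call(call):
    """Step 2: execute the tool the LLM asked for, return the result."""
    return TOOLS[call["name"]](call["arguments"])

# Step 1 (the LLM emitting a function call) is simulated here;
# step 3 would feed the result back for the final synthesized response.
print(run_tool_call({"name": "calculate", "arguments": "19 * 4"}))  # 76.0
```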

Session Memory

Conversation history stored per session.

Storage:

  • Redis cache (key: session:{session_id}:history)

  • Max 10 turns or 4K tokens (whichever reached first)

  • Retention: 30 days

Behavior:

  • Follow-up questions use previous context (e.g., "What about X?" → knows what "what" refers to)

  • Session ID in API request: {"session_id": "sess_abc123", "query": "..."}

  • New session: Omit session_id, new one generated
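The "10 turns or 4K tokens (whichever reached first)" cap implies trimming from the oldest side; a sketch with per-turn token counts supplied directly for simplicity:

```python
def trim_history(turns, max_turns=10, max_tokens=4000):
    """Keep the most recent turns that satisfy both caps (whichever is hit first)."""
    kept, total = [], 0
    for turn in reversed(turns):          # walk newest-first
        if len(kept) == max_turns or total + turn["tokens"] > max_tokens:
            break
        kept.append(turn)
        total += turn["tokens"]
    return list(reversed(kept))           # restore chronological order

turns = [{"text": f"turn {i}", "tokens": 500} for i in range(12)]
print(len(trim_history(turns)))  # 8: the 4,000-token cap bites before the 10-turn cap
```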

Observable failure: If session expires (>30 days), follow-ups fail. Error: "Session not found"

Interaction

A database record for each query-response pair.

Schema:
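The actual columns aren't listed here; the sketch below infers illustrative field names from the Inbox filters (agent, date, feedback) and the query/response/sources data the record must carry. Treat every name as hypothetical.

```python
# Illustrative only: field names inferred from the Inbox filters, not the real schema.
interaction = {
    "interaction_id": "int_001",          # assumed identifier
    "agent_id": "agent_abc123",
    "query": "How do I reset my password?",
    "response": "Go to Settings ... [1]",
    "sources": ["chunk_42"],              # chunks cited in the response
    "feedback": None,                     # positive | negative | None
    "created_at": "2024-01-01T00:00:00Z",
}
assert interaction["feedback"] in {"positive", "negative", None}
```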

Observable behavior: Inbox shows all interactions, filterable by agent/date/feedback

Citation

Source reference in response.

Format:

Extraction: Regex parsing of response to match numbered citations to chunks
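Assuming citations are emitted as bracketed numbers like [1] (the exact template is not shown above), the regex pass might look like:

```python
import re

def extract_citations(response, chunks):
    """Map bracketed citation numbers in the response to source chunks by index."""
    numbers = {int(n) for n in re.findall(r"\[(\d+)\]", response)}
    return {n: chunks[n - 1] for n in sorted(numbers) if 1 <= n <= len(chunks)}

response = "Reset it from Settings [1]. Admins can force a reset [2]."
chunks = ["chunk about settings", "chunk about admin resets"]
print(extract_citations(response, chunks))
# {1: 'chunk about settings', 2: 'chunk about admin resets'}
```

Numbers outside the chunk range are dropped, which mirrors the failure mode below: a malformed citation simply never maps to a source.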

Link behavior: Click citation → opens source document URL (if available) or shows chunk text in modal

Observable failure: If LLM doesn't format citations correctly, they don't render as links (appears as plain text)

Knowledge Base (KB)

Human-curated article collection (separate from data sources).

Storage: PostgreSQL kb_articles table

Fields: title, content, tags, version, author, status (draft/published)

Generation flow:

  1. Inbox → Select interaction → Click "Generate KB Article"

  2. AI drafts article from interaction

  3. Human edits, approves

  4. Published to KB

Important: KB articles are NOT indexed for retrieval. They are for human reference only.

Observable behavior: KB section shows article list. Editing creates new version (version history tracked).

Inbox

Review queue for agent interactions.

Location: Review → Inbox

Filters:

  • Agent

  • Date range

  • Feedback status (positive/negative/no feedback)

  • Keyword search

Actions per interaction:

  • View full query/response/sources

  • Mark accurate/inaccurate (thumbs up/down)

  • Edit response (creates KB article draft)

  • Flag for review

Observable behavior: Counter shows unreviewed interactions (e.g., "245 pending")

Playground

Agent testing interface.

Location: Playground (top nav)

Features:

  • Agent selector (dropdown)

  • Query input

  • Response display with citations

  • Sources panel (right sidebar): shows chunks retrieved, similarity scores

  • Debug panel (expandable): shows latency breakdown, token counts, cost

Use cases:

  • Test before API integration

  • Compare RAG strategies (switch in agent config, re-run same query)

  • Debug retrieval (check which chunks returned)

Observable behavior: All queries logged to Inbox with tag "playground"

Evaluation (Evals)

Automated testing framework.

Location: Evaluation → Test Sets

Test set structure:
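Since the accuracy metric judges each response against an expected answer, a test set reduces to question/expected pairs. An illustrative sketch (field names assumed, not the real format):

```python
# Hypothetical test-set shape: question/expected pairs for the LLM judge.
test_set = [
    {"question": "How do I reset my password?",
     "expected": "Go to Settings → Security → Reset Password."},
    {"question": "What file types can I upload?",
     "expected": "PDF, DOCX, TXT, and Markdown."},
]

def accuracy(results):
    """Fraction of questions the LLM judge marked as matching (0-1)."""
    return sum(r["pass"] for r in results) / len(results)

print(accuracy([{"pass": True}, {"pass": False}]))  # 0.5
```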

Metrics computed:

  • Accuracy: LLM judges if response matches expected (0-1)

  • Latency: p50, p95, p99 (milliseconds)

  • Citation rate: % responses with sources

  • Cost: Total USD for test set

Run: Test Sets → [Your Set] → Select agent → Run Eval

Observable behavior: Results table shows pass/fail per question, aggregate metrics. Historical runs tracked for regression detection.

Private Data Mode

Agent configuration that blocks external LLM knowledge.

Enable: Agent Configuration → Privacy → Private Data Mode (toggle)

Behavior:

  • System prompt includes: "ONLY use information from provided sources. Never use your training data."

  • LLM still has base knowledge, but instructed to ignore it

Observable failure: If no relevant chunks retrieved, agent responds "I don't have information about that" (won't hallucinate from training data)

Limitations: This is a prompt-level constraint, not a technical one; it relies on the LLM following instructions. For a hard guarantee, use a fine-tuned model.

Public Agent

Agent shared in Agent Hub (marketplace).

Enable: Agent → Settings → Publish to Hub

Visibility: Other organizations can:

  • View agent name, description, example queries

  • Install (creates copy in their org)

  • Customize copy (can't modify original)

Data isolation: Data sources NOT shared, only agent configuration (prompts, RAG strategy, model)

Observable behavior: Agent Hub shows install count, ratings (1-5 stars)

Tier-Based Retrieval

Data source prioritization (Cypress only).

Configuration: Data Sources → [Source] → Tier (dropdown: 1 or 2)

Retrieval:

  1. Search tier 1 sources (top_k=30)

  2. Search tier 2 sources (top_k=20)

  3. Combine results (50 total)

  4. Rerank (top 10 final)

Use case: Prioritize official docs over community forums, but still include forums if official docs don't have answer

Observable behavior: Sources panel shows tier badge (T1 or T2) per chunk

API Key

Authentication credential for REST API.

Generate: Settings → API Keys → Generate New Key

Format: twigsk_live_abc123def456... (prefix indicates env: twigsk_live_ or twigsk_test_)

Usage:
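A sketch of how the key travels with a request. The `Authorization: Bearer` scheme is an assumption; only the key prefixes are documented above.

```python
def auth_headers(api_key):
    """Request headers carrying a Twig API key (Bearer scheme assumed)."""
    assert api_key.startswith(("twigsk_live_", "twigsk_test_")), "unexpected key prefix"
    return {"Authorization": f"Bearer {api_key}", "Content-Type": "application/json"}

print(auth_headers("twigsk_test_abc123def456")["Authorization"])
# Bearer twigsk_test_abc123def456
```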

Permissions: Read (view data), Write (modify agents/data sources), Execute (run queries), Admin (all)

Rate limit: 100 req/min (Execute scope), 10 req/min (Write scope)

Rotation: Generate new key, update apps, delete old key (zero downtime)

Observable failure: Invalid key returns 401 Unauthorized with JSON: {"error": "Invalid API key"}

Next Steps

Authentication - API key management and SSO setup

Agent Configuration - Detailed agent settings

RAG Strategy Selection - When to use Redwood/Cedar/Cypress
