# Core Concepts & Terminology

Technical reference for RAG terminology and Twig implementation details.

## RAG (Retrieval-Augmented Generation)

RAG injects retrieved context into the LLM prompt before generation.

### RAG Flow in Twig

1. **Query embedding**: Convert user query to 1536-dim vector (OpenAI ada-002)
2. **Vector search**: Query Pinecone index, return top-k chunks by cosine similarity (threshold: 0.7)
3. **Context injection**: Insert chunks into LLM prompt between system prompt and user query
4. **LLM generation**: OpenAI API generates response based on injected context
5. **Citation extraction**: Parse response, match claims to source chunks by span overlap

**Observable behavior**: Responses cite specific documents. If retrieval fails (no chunks above threshold), agent responds "I don't have information about that".
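
A minimal sketch of steps 3 and 4 (context injection and generation) against the OpenAI chat completions endpoint. The exact prompt layout and parameters Twig uses are assumptions for illustration:

```python
import requests

def answer(query: str, chunks: list[str], system_prompt: str, api_key: str) -> str:
    # Step 3: inject retrieved chunks between the system prompt and the user query
    context = "\n\n".join(f"[{i + 1}] {c}" for i, c in enumerate(chunks))
    messages = [
        {"role": "system", "content": system_prompt},
        {"role": "system", "content": f"Context from retrieved documents:\n{context}"},
        {"role": "user", "content": query},
    ]
    # Step 4: generate a response grounded in the injected context
    resp = requests.post(
        "https://api.openai.com/v1/chat/completions",
        headers={"Authorization": f"Bearer {api_key}"},
        json={"model": "gpt-4", "messages": messages, "max_tokens": 500},
        timeout=60,
    )
    return resp.json()["choices"][0]["message"]["content"]
```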

## Agent

An agent is a configuration record with these fields:

* **agent\_id**: Unique identifier (format: `agent_abc123`)
* **name**: Display name
* **system\_prompt**: Instructions prepended to every query
* **data\_source\_ids**: Array of data sources to query
* **rag\_strategy**: `redwood` | `cedar` | `cypress`
* **model**: `gpt-4` | `gpt-3.5-turbo` | `claude-3-sonnet`
* **temperature**: Float 0-2 (default: 0.7)
* **max\_tokens**: Integer (default: 500)

**Storage**: PostgreSQL agents table

**Observable behavior**: Different agents querying same data sources return different responses based on system prompt and strategy.
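
A hypothetical agent record using the fields above (the `agent_id` format comes from this page; all other values are illustrative):

```python
agent = {
    "agent_id": "agent_abc123",
    "name": "Support Assistant",
    "system_prompt": "You are a support agent. Answer only from the provided sources.",
    "data_source_ids": ["ds_123", "ds_456"],   # data sources to query (ID format illustrative)
    "rag_strategy": "cedar",                   # redwood | cedar | cypress
    "model": "gpt-4",
    "temperature": 0.7,                        # default
    "max_tokens": 500,                         # default
}
```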

## Data Source

A data source is an ingestion job configuration:

* **source\_type**: `file` | `website` | `confluence` | `slack` | `google_drive` | etc.
* **connection\_params**: OAuth tokens, API keys, URLs
* **sync\_schedule**: `hourly` | `daily` | `weekly` | `manual`
* **filters**: Include/exclude rules (e.g., file extensions, URL patterns)

**Processing stages**:

1. Fetch (download documents)
2. Parse (extract text)
3. Chunk (split into 512-token segments with 50-token overlap)
4. Embed (OpenAI ada-002)
5. Index (upload vectors to Pinecone)

**Status values**: `pending` | `processing` | `active` | `failed`

**Observable behavior**: Data → \[Source Name] → shows chunk count (e.g., "1,234 chunks indexed"). Last sync timestamp displayed.
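
A hypothetical data source record using the fields above (values are illustrative; real connection params hold OAuth tokens or API keys):

```python
data_source = {
    "source_type": "website",
    "connection_params": {"base_url": "https://docs.example.com"},
    "sync_schedule": "daily",                                    # hourly | daily | weekly | manual
    "filters": {"include": ["*.html"], "exclude": ["/blog/*"]},  # include/exclude rules
    "status": "active",                                          # pending | processing | active | failed
}
```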

## Vector Embedding

A vector embedding is a 1536-dimensional float array representing text semantics.

**Model**: OpenAI text-embedding-ada-002\
**API**: `POST https://api.openai.com/v1/embeddings`\
**Cost**: $0.0001 per 1K tokens

**Example**:

```
Input: "reset password"
Output: [0.0123, -0.4567, 0.7890, ..., 0.2345] (1536 floats)
```

**Distance metric**: Cosine similarity (-1 to 1, higher = more similar)

**Observable behavior**:

* "reset password" and "change password" have cosine similarity \~0.85
* "reset password" and "pizza delivery" have cosine similarity \~0.10

## Semantic Search

Vector search using cosine similarity between query embedding and chunk embeddings.

**Algorithm**:

1. Embed query: `q_vec = embed("reset my password")`
2. Query Pinecone: `results = index.query(q_vec, top_k=10, filter={org_id: "org_123"})`
3. Pinecone returns chunks with similarity scores (0.0-1.0)
4. Filter chunks with score < 0.7 (configurable threshold)
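
A minimal sketch of steps 2-4, assuming the Pinecone Python client; the index name and metadata field are illustrative:

```python
from pinecone import Pinecone

pc = Pinecone(api_key="...")
index = pc.Index("twig-chunks")   # hypothetical index name

def retrieve(q_vec: list[float], org_id: str, threshold: float = 0.7):
    results = index.query(
        vector=q_vec,                  # query embedding from step 1
        top_k=10,
        filter={"org_id": org_id},     # scope results to one organization
        include_metadata=True,
    )
    # Step 4: keep only chunks above the similarity threshold
    return [m for m in results.matches if m.score >= threshold]
```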

**Retrieval behavior**:

* Query "How to reset password?" retrieves chunks containing "password recovery", "reset credentials", "forgot password"
* Does NOT require exact keyword match
* Fails if no chunks score above threshold

## Chunking

Document splitting strategy:

* **Chunk size**: 512 tokens (default, configurable: 256-2048)
* **Overlap**: 50 tokens (default, configurable: 0-200)
* **Splitting**: Recursive by paragraph → sentence → token

**Example**:

```
Document (1500 tokens):
├─ Chunk 1: tokens 0-512
├─ Chunk 2: tokens 462-974 (50-token overlap)
├─ Chunk 3: tokens 924-1436
└─ Chunk 4: tokens 1386-1500
```

**Rationale**:

* Smaller chunks → more precise retrieval, but less context per chunk
* Larger chunks → more context, but lower precision
* Overlap → prevents concepts split across boundaries

**Observable behavior**: Data source shows "N chunks indexed" (e.g., 100-page PDF → \~400-600 chunks)
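
A simplified token-window chunker showing how size and overlap interact (the real splitter is recursive by paragraph and sentence, as described above):

```python
def chunk_tokens(tokens: list[int], size: int = 512, overlap: int = 50) -> list[list[int]]:
    chunks, start = [], 0
    while start < len(tokens):
        chunks.append(tokens[start:start + size])
        if start + size >= len(tokens):
            break
        start += size - overlap  # step forward, keeping `overlap` tokens of shared context
    return chunks

# A 1500-token document yields chunks starting at tokens 0, 462, 924, 1386.
```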

## Context Window

Maximum tokens the LLM processes in one request:

* **GPT-3.5-turbo**: 16,384 tokens (\~12,000 words)
* **GPT-4**: 8,192 tokens (standard), 32,768 (extended), 128,000 (turbo)
* **Claude 3 Sonnet**: 200,000 tokens

**Token allocation** (typical query):

```
System prompt: 200 tokens
Retrieved chunks (10 chunks × 512 tokens): 5,120 tokens
Conversation history: 500 tokens
User query: 50 tokens
Reserved for response: 500 tokens
---
Total: 6,370 tokens (fits in GPT-4 8K)
```
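
A rough pre-flight check for the allocation above; the function and its defaults are a sketch, with the 8,192 limit taken from standard GPT-4:

```python
CONTEXT_LIMIT = 8192  # standard GPT-4

def fits(system_tokens: int, chunk_tokens: int, history_tokens: int,
         query_tokens: int, response_tokens: int = 500) -> tuple[bool, int]:
    total = system_tokens + chunk_tokens + history_tokens + query_tokens + response_tokens
    return total <= CONTEXT_LIMIT, total

ok, total = fits(200, 10 * 512, 500, 50)   # -> (True, 6370)
```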

**Observable failure**: If total exceeds limit, API returns error:

```json
{"error": "context_length_exceeded", "max": 8192, "actual": 9500}
```

## Token

Text unit for LLM processing:

* **1 token ≈ 4 characters** (English)
* **1 token ≈ 0.75 words** (English)

**Examples**:

* "Hello world!" = 3 tokens
* "Retrieval-Augmented Generation" = 6 tokens
* "<https://example.com>" = 5 tokens

**Pricing** (OpenAI):

* GPT-4: $0.03/1K input tokens, $0.06/1K output tokens
* GPT-3.5-turbo: $0.001/1K input tokens, $0.002/1K output tokens

**Observable behavior**: Query cost displayed in Analytics (e.g., "$0.0042 per query")

## Temperature

Controls randomness in LLM sampling:

* **0.0**: Deterministic (always picks highest probability token)
* **0.7**: Balanced (default)
* **1.0**: High variability
* **2.0**: Maximum randomness

**Observable behavior**:

* Temperature 0.0: Same query returns identical response every time
* Temperature 1.0: Same query returns different phrasing each time (content consistent)

**Use cases**:

* 0.0-0.3: Factual Q\&A, documentation lookup
* 0.7-1.0: Creative writing, brainstorming

## top\_k

Number of chunks retrieved from vector DB:

* **Redwood**: top\_k = 5-10
* **Cedar**: top\_k = 10
* **Cypress**: top\_k = 50 (pre-rerank) → 10 (post-rerank)

**Configurable**: Agent configuration → Advanced Settings → Top K (range: 1-100)

**Tradeoff**:

* Higher top\_k → More context, slower retrieval, higher cost
* Lower top\_k → Faster, cheaper, but may miss relevant chunks

**Observable behavior**: Sources panel shows exactly top\_k chunks (or fewer if threshold filters some out)

## Reranking

Two-stage retrieval: fast vector search → precise cross-encoder scoring.

**Implementation** (Cypress only):

1. Vector search: Retrieve top\_k=50 chunks (cosine similarity)
2. Reranker API: Score all 50 chunks using `bge-reranker-v2-m3` (cross-encoder)
3. Select top 10 by reranker score
4. Send to LLM

**Reranker model**: `BAAI/bge-reranker-v2-m3`\
**Latency added**: \~200-500ms for 50 chunks
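
One way to score candidates with this model locally, assuming the `sentence-transformers` library (in production Twig calls a reranker API, as noted above):

```python
from sentence_transformers import CrossEncoder

reranker = CrossEncoder("BAAI/bge-reranker-v2-m3")

def rerank(query: str, chunk_texts: list[str], keep: int = 10) -> list[str]:
    # Score every (query, chunk) pair with the cross-encoder
    scores = reranker.predict([(query, text) for text in chunk_texts])
    # Keep the highest-scoring chunks for the LLM prompt
    ranked = sorted(zip(scores, chunk_texts), key=lambda p: p[0], reverse=True)
    return [text for _, text in ranked[:keep]]
```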

**Observable behavior**:

* Cypress "Sources Used" panel shows higher precision than Redwood
* Chunks may have different order than pure vector search would produce

## RAG Strategies

### Redwood (Standard)

**Algorithm**:

1. Embed user query
2. Vector search (top\_k=10)
3. Filter by threshold (0.7)
4. Inject into LLM prompt

**Latency**: 1-2s\
**Accuracy**: 72% (internal eval)\
**Cost**: \~$0.002 per query

**Use when**: Questions are clear, single-hop retrieval sufficient

### Cedar (Context-Aware)

**Algorithm**:

1. LLM rewrites query using conversation history
2. Embed rewritten query
3. Vector search (top\_k=10)
4. Filter by threshold (0.7)
5. Inject into LLM prompt

**Latency**: 2-3s\
**Accuracy**: 78% (internal eval)\
**Cost**: \~$0.003 per query (extra LLM call for rewrite)

**Use when**: Multi-turn conversations, follow-up questions ("What about the other option?")

**Observable behavior**: Logs show "Rewritten query: \[...]" in debug panel
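
A sketch of step 1, the rewrite call; which model performs the rewrite and the prompt wording are assumptions for illustration:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

REWRITE_PROMPT = (
    "Rewrite the user's latest question as a standalone search query, "
    "resolving pronouns and references using the conversation history."
)

def rewrite_query(history: list[dict], query: str) -> str:
    messages = [{"role": "system", "content": REWRITE_PROMPT}]
    messages += history                                   # prior turns from session memory
    messages.append({"role": "user", "content": query})   # e.g. "What about the other option?"
    resp = client.chat.completions.create(model="gpt-3.5-turbo", messages=messages)
    return resp.choices[0].message.content
```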

### Cypress (Advanced)

**Algorithm**:

1. LLM generates 3 query variations
2. Embed all 3 queries
3. Vector search each (top\_k=50 total, deduplicated)
4. Rerank with cross-encoder → top 10
5. Inject into LLM prompt

**Latency**: 3-5s\
**Accuracy**: 85% (internal eval)\
**Cost**: \~$0.006 per query

**Use when**: High accuracy required, complex queries, multi-document synthesis

**Observable behavior**: Sources panel shows "Retrieved via multi-query expansion"

## Agentic Workflow

Multi-step reasoning with tool calling (requires Cypress strategy).

**Tools available**:

* `search_knowledge_base(query)`: Recursive retrieval
* `calculate(expression)`: Math evaluation
* `call_api(endpoint, params)`: Custom API integration

**Flow**:

1. LLM decides if tools needed (function calling)
2. Execute tool, get result
3. LLM synthesizes final response
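
A sketch of how the `search_knowledge_base` tool could be declared for OpenAI function calling; the parameter schema is an assumption:

```python
tools = [
    {
        "type": "function",
        "function": {
            "name": "search_knowledge_base",
            "description": "Search the agent's indexed data sources for relevant chunks.",
            "parameters": {
                "type": "object",
                "properties": {"query": {"type": "string"}},
                "required": ["query"],
            },
        },
    },
]
# Passed as `tools=tools` on the chat completion call: the model emits a tool_call
# when it decides retrieval is needed (step 1), the platform executes it (step 2),
# and the result is fed back for the final response (step 3).
```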

**Latency**: +1-3s per tool call\
**Enable**: Agent Configuration → Advanced → Agentic Mode (toggle)

**Observable behavior**: Response shows "Used tools: search\_knowledge\_base, calculate" in debug panel

## Session Memory

Conversation history stored per session.

**Storage**:

* Redis cache (key: `session:{session_id}:history`)
* Max 10 turns or 4K tokens (whichever is reached first)
* Retention: 30 days
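
A sketch of reading and appending history with `redis-py`, using the key format above; storing turns as a JSON list is an assumption:

```python
import json
import redis

r = redis.Redis()

def load_history(session_id: str) -> list[dict]:
    raw = r.get(f"session:{session_id}:history")
    return json.loads(raw) if raw else []

def append_turn(session_id: str, role: str, content: str) -> None:
    history = load_history(session_id) + [{"role": role, "content": content}]
    r.set(f"session:{session_id}:history", json.dumps(history[-20:]))  # last 10 turns = 20 messages
    r.expire(f"session:{session_id}:history", 30 * 24 * 3600)          # 30-day retention
```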

**Behavior**:

* Follow-up questions use previous context (e.g., "What about X?" → knows what "what" refers to)
* Session ID in API request: `{"session_id": "sess_abc123", "query": "..."}`
* New session: Omit session\_id, new one generated

**Observable failure**: If session expires (>30 days), follow-ups fail. Error: "Session not found"

## Interaction

A database record for each query-response pair.

**Schema**:

```sql
CREATE TABLE interactions (
  id UUID PRIMARY KEY,
  agent_id UUID,
  session_id VARCHAR,
  query TEXT,
  response TEXT,
  chunks_used JSONB,
  latency_ms INT,
  cost_usd DECIMAL,
  feedback TEXT CHECK (feedback IN ('positive', 'negative')),  -- NULL = no feedback yet
  created_at TIMESTAMP
);
```

**Observable behavior**: Inbox shows all interactions, filterable by agent/date/feedback

## Citation

Source reference in response.

**Format**:

```
Answer text [1] more text [2].

Sources:
[1] Document Name, page 5 (chunk_id: chk_abc123)
[2] Another Doc, section 3 (chunk_id: chk_def456)
```

**Extraction**: Regex parsing of response to match numbered citations to chunks
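
A minimal version of that extraction step, assuming chunks are numbered in the order they were injected into the prompt:

```python
import re

def extract_citations(response: str, chunks: list[dict]) -> dict[int, dict]:
    """Map [1], [2], ... markers in the response back to the retrieved chunks."""
    numbers = {int(n) for n in re.findall(r"\[(\d+)\]", response)}
    return {n: chunks[n - 1] for n in sorted(numbers) if n <= len(chunks)}
```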

**Link behavior**: Click citation → opens source document URL (if available) or shows chunk text in modal

**Observable failure**: If LLM doesn't format citations correctly, they don't render as links (appears as plain text)

## Knowledge Base (KB)

Human-curated article collection (separate from data sources).

**Storage**: PostgreSQL `kb_articles` table\
**Fields**: title, content, tags, version, author, status (draft/published)

**Generation flow**:

1. Inbox → Select interaction → Click "Generate KB Article"
2. AI drafts article from interaction
3. Human edits, approves
4. Published to KB

**Important**: KB articles are NOT indexed for retrieval. They are for human reference only.

**Observable behavior**: KB section shows article list. Editing creates new version (version history tracked).

## Inbox

Review queue for agent interactions.

**Location**: Review → Inbox

**Filters**:

* Agent
* Date range
* Feedback status (positive/negative/no feedback)
* Keyword search

**Actions per interaction**:

* View full query/response/sources
* Mark accurate/inaccurate (thumbs up/down)
* Edit response (creates KB article draft)
* Flag for review

**Observable behavior**: Counter shows unreviewed interactions (e.g., "245 pending")

## Playground

Agent testing interface.

**Location**: Playground (top nav)

**Features**:

* Agent selector (dropdown)
* Query input
* Response display with citations
* Sources panel (right sidebar): shows chunks retrieved, similarity scores
* Debug panel (expandable): shows latency breakdown, token counts, cost

**Use cases**:

* Test before API integration
* Compare RAG strategies (switch in agent config, re-run same query)
* Debug retrieval (check which chunks returned)

**Observable behavior**: All queries logged to Inbox with tag "playground"

## Evaluation (Evals)

Automated testing framework.

**Location**: Evaluation → Test Sets

**Test set structure**:

```json
{
  "name": "Product FAQ Eval",
  "questions": [
    {"query": "What is pricing?", "expected": "Starts at $99/mo"},
    {"query": "Free trial?", "expected": "14 days"}
  ]
}
```

**Metrics computed**:

* **Accuracy**: LLM judges if response matches expected (0-1)
* **Latency**: p50, p95, p99 (milliseconds)
* **Citation rate**: % responses with sources
* **Cost**: Total USD for test set

**Run**: Test Sets → \[Your Set] → Select agent → Run Eval

**Observable behavior**: Results table shows pass/fail per question, aggregate metrics. Historical runs tracked for regression detection.

## Private Data Mode

Agent configuration that blocks external LLM knowledge.

**Enable**: Agent Configuration → Privacy → Private Data Mode (toggle)

**Behavior**:

* System prompt includes: "ONLY use information from provided sources. Never use your training data."
* LLM still has its base knowledge but is instructed to ignore it

**Observable failure**: If no relevant chunks retrieved, agent responds "I don't have information about that" (won't hallucinate from training data)

**Limitations**: This is not a hard technical constraint; it relies on the LLM following instructions. For a stronger guarantee, use a fine-tuned model.

## Public Agent

Agent shared in Agent Hub (marketplace).

**Enable**: Agent → Settings → Publish to Hub

**Visibility**: Other organizations can:

* View agent name, description, example queries
* Install (creates copy in their org)
* Customize copy (can't modify original)

**Data isolation**: Data sources NOT shared, only agent configuration (prompts, RAG strategy, model)

**Observable behavior**: Agent Hub shows install count, ratings (1-5 stars)

## Tier-Based Retrieval

Data source prioritization (Cypress only).

**Configuration**: Data Sources → \[Source] → Tier (dropdown: 1 or 2)

**Retrieval**:

1. Search tier 1 sources (top\_k=30)
2. Search tier 2 sources (top\_k=20)
3. Combine results (50 total)
4. Rerank (top 10 final)

**Use case**: Prioritize official docs over community forums, but still include forums if official docs don't have the answer

**Observable behavior**: Sources panel shows tier badge (T1 or T2) per chunk

## API Key

Authentication credential for REST API.

**Generate**: Settings → API Keys → Generate New Key

**Format**: `twigsk_live_abc123def456...` (prefix indicates env: `twigsk_live_` or `twigsk_test_`)

**Usage**:

```bash
curl -H "Authorization: Bearer twigsk_live_abc123..." \
     https://api.twig.so/v1/query
```
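
The same call from Python, assuming the query endpoint accepts a POST with the JSON body shown in the Session Memory section (a query plus optional session_id):

```python
import requests

resp = requests.post(
    "https://api.twig.so/v1/query",
    headers={"Authorization": "Bearer twigsk_live_abc123..."},
    json={"session_id": "sess_abc123", "query": "How do I reset my password?"},
    timeout=30,
)
print(resp.status_code, resp.json())   # 401 + {"error": "Invalid API key"} if the key is bad
```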

**Permissions**: Read (view data), Write (modify agents/data sources), Execute (run queries), Admin (all)

**Rate limit**: 100 req/min (Execute scope), 10 req/min (Write scope)

**Rotation**: Generate new key, update apps, delete old key (zero downtime)

**Observable failure**: Invalid key returns `401 Unauthorized` with JSON: `{"error": "Invalid API key"}`

## Next Steps

[Authentication](/getting-started/authentication.md) - API key management and SSO setup

[Agent Configuration](/product/overview/configuration.md) - Detailed agent settings

RAG Strategy Selection - When to use Redwood/Cedar/Cypress

