Platform Overview & Architecture

Twig is a RAG platform that indexes your data sources and uses vector search to retrieve relevant context for LLM responses.

What Twig Provides

  • Data ingestion: Connectors for Confluence, Slack, Google Drive, websites, files (14+ sources)

  • Processing pipeline: Document chunking (default: 512 tokens), embedding generation (OpenAI ada-002), vector indexing

  • Retrieval engine: Semantic search across embeddings, reranking, multi-query expansion

  • LLM orchestration: Context assembly, prompt templating, response generation (GPT-4, GPT-3.5, Claude)

  • Agent configuration: System prompts, RAG strategy selection, data source filtering, model parameters

  • Deployment: REST API, embeddable widgets, browser extensions, Slack/Zendesk/Outlook apps
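The processing pipeline above splits documents into chunks before embedding them. A minimal sketch of fixed-size chunking with overlap (whitespace tokens stand in for the real tokenizer, and the 64-token overlap is an assumption; only the 512-token default comes from the pipeline description):

```python
def chunk_document(text: str, chunk_size: int = 512, overlap: int = 64) -> list[str]:
    """Split text into fixed-size token chunks with overlap.

    Whitespace tokens approximate the real tokenizer; Twig's actual
    chunker and overlap setting may differ.
    """
    tokens = text.split()
    chunks = []
    step = chunk_size - overlap
    for start in range(0, len(tokens), step):
        chunk = tokens[start:start + chunk_size]
        if chunk:
            chunks.append(" ".join(chunk))
        if start + chunk_size >= len(tokens):
            break
    return chunks

# A 1000-token document yields three overlapping chunks.
pieces = chunk_document(("word " * 1000).strip())
```

Overlap keeps a sentence that straddles a chunk boundary retrievable from either side.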

Core Components

1. Agents

Configuration layer that defines:

  • System prompt (instructions)

  • Data sources to query (can select subset)

  • RAG strategy (Redwood/Cedar/Cypress)

  • Model (GPT-4, GPT-3.5-turbo, Claude)

  • Temperature (0-2, default: 0.7)

  • Max tokens (response length limit)
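Putting the fields above together, an agent definition might look like the following sketch (the key names are assumptions; Twig's actual configuration schema may differ):

```python
# Hypothetical agent configuration mirroring the fields listed above;
# the real key names in Twig's API may differ.
agent_config = {
    "system_prompt": "You are a support assistant. Answer only from the provided context.",
    "data_sources": ["confluence-eng", "product-docs"],  # a subset of connected sources
    "rag_strategy": "cedar",         # redwood | cedar | cypress
    "model": "gpt-4",                # gpt-4 | gpt-3.5-turbo | claude
    "temperature": 0.7,              # 0-2, default 0.7
    "max_tokens": 1024,              # response length limit
}
```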

2. Data Sources

Ingestion connectors:

  • File uploads: PDF, DOCX, TXT, CSV (max 50MB per file)

  • Website crawler: max 10,000 pages per domain

  • OAuth connectors: Confluence, Slack, Google Drive, SharePoint, OneDrive

  • Custom integrations: API endpoints, webhooks

Sync frequency: hourly, daily, weekly, or manual

3. RAG Engine

Query processing:

  • Embedding: Convert query to 1536-dim vector (OpenAI ada-002)

  • Retrieval: Search vector DB, return top-k chunks (k=5-50 based on strategy)

  • Reranking (Cypress only): Re-score with cross-encoder (bge-reranker-v2-m3)

  • Context assembly: Inject chunks into LLM prompt

  • Generation: LLM produces response with citations
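The query-processing steps above can be sketched end to end. This toy version uses 3-dim vectors and a hypothetical in-memory index in place of the 1536-dim ada-002 embeddings and the real vector DB; the prompt template is also an assumption:

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def retrieve(query_vec, index, k=5):
    """Return the top-k chunks by cosine similarity (toy in-memory index)."""
    scored = sorted(index, key=lambda c: cosine(query_vec, c["vec"]), reverse=True)
    return scored[:k]

def assemble_prompt(question, chunks):
    """Inject retrieved chunks into the LLM prompt with numbered citations."""
    context = "\n\n".join(f"[{i + 1}] {c['text']}" for i, c in enumerate(chunks))
    return (f"Answer using only the context below. Cite sources as [n].\n\n"
            f"{context}\n\nQuestion: {question}")

# Toy 3-dim vectors stand in for the 1536-dim ada-002 embeddings.
index = [
    {"text": "Refunds are processed within 5 days.", "vec": [0.9, 0.1, 0.0]},
    {"text": "The office is closed on Fridays.",     "vec": [0.0, 1.0, 0.2]},
]
top = retrieve([1.0, 0.0, 0.1], index, k=1)
prompt = assemble_prompt("How long do refunds take?", top)
```

The numbered context entries are what let the generation step emit citations.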

4. Knowledge Base

Human-reviewed articles:

  • Generated from agent interactions (opt-in)

  • Manually created via UI

  • Versioned (track edits)

  • Tagged for categorization

  • Not used in retrieval (separate from data sources)

5. Analytics

Metrics tracked:

  • Query count (per day/week/month)

  • Average latency (p50, p95, p99)

  • Accuracy rate (from human feedback)

  • Cost per query (tokens * model pricing)

  • Citation rate (% responses with sources)
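The cost-per-query metric is token usage times model pricing. A sketch of the arithmetic (the prices below are illustrative placeholders, not Twig's billed rates, which change with provider pricing):

```python
# Illustrative per-1K-token prices; real model pricing changes over time.
PRICING = {
    "gpt-4":         {"input": 0.03,   "output": 0.06},
    "gpt-3.5-turbo": {"input": 0.0005, "output": 0.0015},
}

def cost_per_query(model: str, input_tokens: int, output_tokens: int) -> float:
    """Tokens x model pricing, split by input (prompt) and output (completion)."""
    p = PRICING[model]
    return (input_tokens / 1000) * p["input"] + (output_tokens / 1000) * p["output"]

# RAG prompts are context-heavy: 3K prompt tokens, 500 completion tokens.
c = cost_per_query("gpt-4", 3000, 500)
```

Note that retrieved context dominates the input-token count, so chunk count (top-k) directly drives cost.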

Architecture Flow

  • Ingestion: data source → chunking (512 tokens) → embedding (ada-002) → vector index

  • Query: user query → embedding → vector search → reranking (Cypress only) → context assembly → LLM generation → response with citations

Key Features

Security

  • SOC 2 Type II: Audited controls for data handling

  • SSO: SAML 2.0, OAuth 2.0 (Google, Microsoft, Okta)

  • RBAC: roles (Admin, Developer, Viewer); permissions (manage agents, API access, view analytics)

  • Encryption: AES-256 at rest (RDS, S3), TLS 1.3 in transit

RAG Strategies

  • Redwood: Vector search, top-k=5-10 chunks, latency 1-2s

  • Cedar: Query rewrite (conversation context), top-k=10, latency 2-3s

  • Cypress: Multi-query expansion, top-k=50 pre-rerank → 10 post-rerank, latency 3-5s

Accuracy (measured on internal eval set):

  • Redwood: 72% correct answers

  • Cedar: 78% correct answers

  • Cypress: 85% correct answers
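Cypress's expand-then-rerank flow can be sketched as follows. Everything here is a toy stand-in: the sub-query templates, the per-sub-query candidate budget, and the scoring are assumptions; the real system uses an LLM for expansion and the bge-reranker-v2-m3 cross-encoder for rescoring:

```python
def expand_query(query):
    """Toy multi-query expansion; the real system presumably uses an LLM."""
    return [query, f"details about {query}", f"how to {query}"]

def rerank(chunks, top_n=10):
    """Stand-in for the cross-encoder: re-score candidates and keep the best."""
    scored = sorted(chunks, key=lambda c: c["cross_score"], reverse=True)
    return scored[:top_n]

def cypress_retrieve(query, search_fn, k_pre=50, k_post=10):
    """Gather ~k_pre candidates across expanded queries, dedupe, rerank to k_post."""
    candidates = {}
    for q in expand_query(query):
        for chunk in search_fn(q, k=k_pre // 3 + 1):  # budget split is an assumption
            candidates[chunk["id"]] = chunk           # dedupe by chunk id
    return rerank(list(candidates.values()), top_n=k_post)

# Toy corpus with precomputed cross-encoder scores.
corpus = [{"id": i, "cross_score": i % 7} for i in range(100)]
top = cypress_retrieve("reset password", lambda q, k: corpus[:k])
```

Casting a wide net (50 candidates) and then rescoring with a stronger model is what buys Cypress its accuracy, at the price of the extra latency noted above.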

Deployment Options

  • REST API: /api/v1/query endpoint (rate limit: 100 req/min)

  • Embeddable widget: iframe or web component, configurable UI

  • Chrome extension: Sidebar panel, keyboard shortcut (Cmd+Shift+K)

  • Slack bot: Mention @TwigBot in channels, DM support

  • Zendesk app: Native sidebar app, ticket context injection

  • Outlook add-in: Email compose assistance
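A hedged sketch of calling the REST API. Only the /api/v1/query path and the 100 req/min limit come from the list above; the base URL, payload field names, and bearer-token auth scheme are assumptions:

```python
import json

API_URL = "https://api.example.com/api/v1/query"  # hypothetical base URL

payload = {
    "agent_id": "agent_123",   # hypothetical identifier
    "query": "How do I rotate my API key?",
    "stream": False,
}
headers = {
    "Authorization": "Bearer <API_KEY>",   # auth scheme is an assumption
    "Content-Type": "application/json",
}
body = json.dumps(payload)
# e.g. requests.post(API_URL, data=body, headers=headers)
# Mind the 100 req/min rate limit when batching queries.
```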

Feedback Loop

  • Thumbs up/down: Captured per response, stored with query ID

  • Inbox: Review queue for flagged responses, edit/approve workflow

  • KB generation: Convert reviewed responses to KB articles (manual trigger)

  • Evals: Run test sets (question + expected answer), track accuracy over time
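The eval loop above (question plus expected answer, tracked over time) can be sketched with a toy grader. Exact substring match is an assumption for illustration; a real eval would use semantic similarity or an LLM judge:

```python
def run_eval(test_set, answer_fn):
    """Run a question/expected-answer test set and return accuracy.

    Substring match is a toy grader; a production eval would use
    semantic similarity or an LLM judge.
    """
    correct = 0
    for case in test_set:
        answer = answer_fn(case["question"])
        if case["expected"].lower() in answer.lower():
            correct += 1
    return correct / len(test_set)

tests = [
    {"question": "How long do refunds take?",  "expected": "5 days"},
    {"question": "When is the office closed?", "expected": "Fridays"},
]
# A stub agent that always gives the refund answer scores 1/2.
accuracy = run_eval(tests, lambda q: "Refunds are processed within 5 days.")
```

Re-running the same test set after each agent or data-source change is what makes the accuracy trend meaningful.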

Technology Stack

Language Models

  • OpenAI: GPT-4 (8K/32K/128K context), GPT-4o, GPT-3.5-turbo (16K context)

  • Anthropic: Claude 3.5 Sonnet, Claude 3 Opus

  • Custom: Bring your own model via API (OpenAI-compatible endpoint)

Vector Database

  • Pinecone: Hosted, serverless pods, 1536 dimensions

  • TigrisDB: Alternative for self-hosted deployments

Embedding Model

  • OpenAI ada-002: 1536 dimensions, $0.0001 per 1K tokens

  • Custom embeddings not yet supported

Infrastructure (Cloud Deployment)

  • Frontend: Next.js 14, React 18, deployed to Vercel

  • Backend API: Node.js, Express, deployed to AWS ECS (Fargate)

  • Database: PostgreSQL 15 on AWS RDS (Multi-AZ)

  • File storage: AWS S3 (encrypted buckets)

  • Queue: AWS SQS for async processing

  • Cache: Redis on ElastiCache (query results, embedding cache)
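The embedding cache in the stack above avoids re-embedding text that has been seen before. A minimal in-process sketch of the pattern (a plain dict stands in for Redis/ElastiCache; the key scheme is an assumption):

```python
import hashlib

class EmbeddingCache:
    """Toy in-process stand-in for the ElastiCache embedding cache:
    hash the text, call the embedding model only on a miss."""

    def __init__(self, embed_fn):
        self.embed_fn = embed_fn
        self.store = {}
        self.hits = 0

    def get(self, text):
        key = hashlib.sha256(text.encode()).hexdigest()
        if key in self.store:
            self.hits += 1
        else:
            self.store[key] = self.embed_fn(text)
        return self.store[key]

calls = []
cache = EmbeddingCache(lambda t: (calls.append(t), [0.0] * 1536)[1])
v1 = cache.get("reset password")
v2 = cache.get("reset password")  # served from cache; no second embedding call
```

Since embedding is both billed per token and a network round-trip, a hit here saves money and latency on repeated queries.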

Use Cases

Customer Support

  • Problem: Support agents search 10+ documents per ticket

  • Solution: Agent retrieves answers from knowledge base in <3s

  • Metrics: 40% reduction in ticket resolution time (measured at 5 customers)

Internal Knowledge Search

  • Problem: Employees can't find onboarding docs, policies, runbooks

  • Solution: Slack bot answers questions from company wiki

  • Metrics: 2000+ queries/month (avg customer), 80% answer accuracy

Sales Q&A

  • Problem: Sales reps need product specs, pricing, competitor info during calls

  • Solution: Browser extension retrieves from sales playbooks

  • Metrics: 15% faster quote generation (measured at 2 customers)

API Documentation

  • Problem: Developers search API docs for endpoint details

  • Solution: Agent indexes OpenAPI specs, retrieves examples

  • Metrics: 60% fewer support tickets about API usage

Next Steps

Quick Start Guide - Create an agent and run your first query

Core Concepts - Understand RAG, embeddings, and retrieval
