Getting Started
Add and configure data sources for ingestion
What is a Data Source
A data source is content that Twig indexes for retrieval. Supported types:
Documentation sites and help centers
File uploads (PDF, DOCX, TXT)
Confluence spaces
Slack channels
Google Drive, SharePoint, OneDrive folders
Zendesk articles
Processing flow: Fetch documents → Parse text → Chunk (512 tokens) → Embed (OpenAI ada-002) → Index (Pinecone)
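To make the chunking step concrete, here is a minimal sketch of fixed-size chunking at 512 tokens. It is an illustration only: splitting on whitespace stands in for a real tokenizer, the function names are hypothetical, and whether Twig overlaps adjacent chunks is not stated in this guide.

```python
from typing import List

CHUNK_SIZE = 512  # tokens per chunk, as listed in the processing flow


def chunk_tokens(tokens: List[str], chunk_size: int = CHUNK_SIZE) -> List[List[str]]:
    """Split a token list into consecutive chunks of at most chunk_size tokens."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [tokens[i:i + chunk_size] for i in range(0, len(tokens), chunk_size)]


def chunk_text(text: str, chunk_size: int = CHUNK_SIZE) -> List[str]:
    """Chunk raw text into strings of at most chunk_size tokens each."""
    # Assumption: whitespace splitting approximates the real tokenizer.
    tokens = text.split()
    return [" ".join(chunk) for chunk in chunk_tokens(tokens, chunk_size)]


if __name__ == "__main__":
    sample = " ".join(f"word{i}" for i in range(1200))
    print(len(chunk_text(sample)))  # 3 (512 + 512 + 176 tokens)
```

A real tokenizer usually produces more tokens than there are whitespace-separated words, so actual chunk counts will tend to be higher than this estimate.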
Navigate to Data Sources
Click Data in left navigation
Or: Home → Data Sources → Manage Data Sources
Expected view: List of existing data sources with status (Active, Processing, Failed)

Data Sources Screen
Columns displayed:
Name: Data source name
Type: WEBSITE, FILE, CONFLUENCE, SLACK, etc.
Status: Active (green), Processing (yellow), Failed (red)
Chunks Indexed: Count (e.g., "1,234 chunks")
Last Sync: Timestamp (e.g., "2 hours ago")
Actions: Process, Edit, Delete buttons
Status meanings:
Active: Ingestion complete, available for retrieval
Processing: Currently chunking/embedding/indexing
Failed: Error during processing (click for error log)

Add a New Data Source
Click Add Data Source button (top right)
Supported Types
Type              Input                Limit          Processing time
Website Sitemap   Sitemap.xml URL      10,000 pages   5-30 min
Website Crawler   Base URL             10,000 pages   10-60 min
File Upload       PDF, DOCX, TXT       50MB per file  1-5 min
Zip Upload        .zip with documents  200MB          5-20 min
Confluence Space  OAuth connection     Unlimited      10-60 min
Slack Workspace   OAuth connection     Last 90 days   10-30 min
Google Drive      OAuth connection     Unlimited      10-60 min
Website Sitemap
Select Website Sitemap from modal
Enter sitemap URL:
https://example.com/sitemap.xml
Click Add
Status changes: "Pending" → "Processing" → "Active"
Expected result: Pages crawled count displayed (e.g., "250 pages → 1,200 chunks")
Common errors:
"Sitemap not found (404)" → Verify URL is accessible
"Rate limit exceeded" → Wait 1 hour, crawler resumes automatically
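Before adding a sitemap, it can help to confirm that the URL resolves and to count the pages it lists. The sketch below uses only the Python standard library. The function names are hypothetical and this is not part of Twig. It does not handle gzipped sitemaps, and for a sitemap index file the count is the number of child sitemaps, not pages.

```python
import urllib.request
import xml.etree.ElementTree as ET
from typing import List, Union

MAX_PAGES = 10_000  # page limit listed for Website Sitemap sources


def extract_page_urls(sitemap_xml: Union[str, bytes]) -> List[str]:
    """Return the text of every <loc> element, ignoring XML namespaces."""
    root = ET.fromstring(sitemap_xml)
    urls = []
    for el in root.iter():
        # Tags look like "{namespace}loc"; compare only the local name.
        if el.tag.rsplit("}", 1)[-1] == "loc" and el.text:
            urls.append(el.text.strip())
    return urls


def check_sitemap(url: str, timeout: float = 10.0) -> int:
    """Fetch a sitemap and return its entry count.

    urlopen raises urllib.error.HTTPError for responses such as 404.
    """
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        urls = extract_page_urls(resp.read())
    if len(urls) > MAX_PAGES:
        raise ValueError(f"{len(urls)} pages exceeds the {MAX_PAGES}-page limit")
    return len(urls)
```

For example, `check_sitemap("https://example.com/sitemap.xml")` returns the entry count, or raises if the sitemap is missing or too large.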
File Upload
Select File Upload
Click Choose Files or drag-and-drop
Select files: PDF, DOCX, TXT (max 50MB each)
Click Upload
Expected result: Each file shows progress bar → "Processing" → "Active"
Supported formats:
PDF: Text-based (not scanned images)
DOCX: Microsoft Word 2007+
TXT: UTF-8 encoding
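The format rules above can be checked locally before uploading. This is a minimal sketch with a hypothetical helper name. It assumes "50MB" means 50 × 1024 × 1024 bytes, and it cannot tell a text-based PDF from a scanned one.

```python
from pathlib import Path
from typing import List

ALLOWED_EXTENSIONS = {".pdf", ".docx", ".txt"}
MAX_FILE_BYTES = 50 * 1024 * 1024  # assumption: "50MB" read as 50 MiB


def upload_problems(path: Path) -> List[str]:
    """Return a list of reasons the file may be rejected (empty if none found)."""
    problems = []
    suffix = path.suffix.lower()
    if suffix not in ALLOWED_EXTENSIONS:
        problems.append(f"unsupported format: {path.suffix or '(none)'}")
    size = path.stat().st_size
    if size > MAX_FILE_BYTES:
        problems.append(f"file is {size} bytes, over the 50MB limit")
    if suffix == ".txt":
        # TXT files must be UTF-8; try decoding the whole file.
        try:
            path.read_bytes().decode("utf-8")
        except UnicodeDecodeError:
            problems.append("TXT file is not valid UTF-8")
    return problems
```

Calling `upload_problems(Path("manual.pdf"))` on a small PDF returns an empty list. Any non-empty result lists what to fix before uploading.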
Confluence Space
Select Confluence
Click Connect to Confluence
Authorize in Confluence OAuth screen
Select spaces to index (checkboxes)
Click Import
Expected result: Space count and page count displayed during processing
Permissions required: Confluence read access for selected spaces
Zip Upload
Select Zip Upload
Upload .zip file (max 200MB)
Twig extracts and processes each file
Expected result: Shows file count (e.g., "50 files extracted → 200 chunks indexed")
Constraints:
Zip must contain only supported file types (PDF, DOCX, TXT)
Nested folders supported (files flattened during extraction)
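The zip constraints can also be checked before uploading. The sketch below is hypothetical and not part of Twig. It flags unsupported file types and an oversized archive. It also flags files in different folders that share a name, as a precaution: this guide does not say how Twig handles name collisions when flattening.

```python
import os
import zipfile
from collections import Counter
from pathlib import PurePosixPath
from typing import List

ALLOWED_EXTENSIONS = {".pdf", ".docx", ".txt"}
MAX_ZIP_BYTES = 200 * 1024 * 1024  # assumption: "200MB" read as 200 MiB


def zip_problems(zip_path: str) -> List[str]:
    """Return a list of reasons the archive may be rejected (empty if none found)."""
    problems = []
    if os.path.getsize(zip_path) > MAX_ZIP_BYTES:
        problems.append("archive is over the 200MB limit")
    with zipfile.ZipFile(zip_path) as zf:
        # Skip directory entries; zip member names always use forward slashes.
        names = [info.filename for info in zf.infolist() if not info.is_dir()]
    for name in names:
        if PurePosixPath(name).suffix.lower() not in ALLOWED_EXTENSIONS:
            problems.append(f"unsupported file type: {name}")
    # Flattening drops folder paths, so identical base names may collide.
    counts = Counter(PurePosixPath(name).name for name in names)
    for base, count in sorted(counts.items()):
        if count > 1:
            problems.append(f"{count} files named {base} would collide after flattening")
    return problems
```

An empty result means the archive passes these local checks. It does not guarantee that every file inside parses successfully.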

How to Verify
Data Sources list shows status "Active" (green)
Chunks count > 0 (e.g., "450 chunks")
Last sync timestamp recent (e.g., "5 minutes ago")
In Playground, query the agent and check that the "Sources Used" panel lists chunks from this data source
Common Mistakes
Symptom: Status stuck at "Processing" for >30 minutes
Cause: Processing worker stalled, or a large dataset (crawler, Confluence, and Google Drive sources can take up to 60 min)
Fix: Refresh page. If still processing after 1 hour, contact support with data source ID.
Symptom: Status "Failed" with error message
Cause: Invalid URL, authentication failure, or unsupported file format
Fix: Click data source name → Logs tab → check error message. Common fixes:
"401 Unauthorized" → Reconnect OAuth (Edit → Reconnect)
"Unsupported format" → Convert file to PDF, DOCX, or TXT
"URL not accessible" → Verify URL works in browser
When This Doesn't Apply
This guide covers standard data source types. For custom integrations (APIs, databases), contact [email protected].