# Confluence Sync Failing

## The Problem

Your Confluence data source shows "Sync Failed" status, and new pages or updates aren't appearing in your AI agent's knowledge base.

### Symptoms

* ❌ Red "Sync Failed" indicator in Data Sources
* ❌ Last successful sync was days/weeks ago
* ❌ Recent Confluence pages don't appear in AI responses
* ❌ Error messages in data source logs
* ❌ "Processing Failed" notifications

### Real-World Example

```
Your support team updated the product documentation in
Confluence yesterday, but your AI agent is still giving
answers based on the old version from 2 weeks ago.

Data Source Status: "Last sync failed 3 days ago"
Error: "Authentication failed: 401 Unauthorized"
```

## Deep Technical Analysis

### Understanding the Confluence API Authentication Model

Confluence authenticates API clients with tokens rather than sessions, so every request is validated on its own. Here is where that validation can fail:

**Token Lifecycle:**

```
Token Creation (Day 0)
→ Stored in Atlassian's auth system
→ Tied to user account + permissions
→ Has expiration metadata (typically 90 days)
→ Can be revoked at any time

Token Validation on Each API Call:
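1. Twig sends the token in the Authorization header
   (Basic email:api_token for Cloud API tokens;
    Bearer {token} for OAuth and personal access tokens)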
2. Confluence checks:
   - Is token in valid tokens table?
   - Is associated user account active?
   - Does user still have space permissions?
   - Has token expired?
3. Token missing, expired, or user inactive → 401 Unauthorized
   Space permission missing → 403 Forbidden
```

**Why This Architecture Causes Problems:**

**1. Token Expiration is Silent**

Confluence doesn't notify you when a token is about to expire. The token simply stops working:

```
Day 89: Token works fine
Day 90: Token expires at midnight (UTC)
Day 91: First API call → 401 error
     → Sync fails
     → No notification sent to token creator
     → Data becomes stale
```
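Because the expiry is silent, an integration can track the token's creation date itself and warn ahead of time. The following is a minimal sketch, not Twig's implementation: it assumes the 90-day lifetime mentioned above and that the creation date was recorded when the token was configured. The names `days_until_expiry`, `should_warn`, and `WARN_DAYS` are hypothetical.

```python
from datetime import date, timedelta

TOKEN_LIFETIME_DAYS = 90  # assumption: the "typically 90 days" figure above
WARN_DAYS = 14            # hypothetical warning window


def days_until_expiry(created: date, today: date,
                      lifetime_days: int = TOKEN_LIFETIME_DAYS) -> int:
    """Days left before the token expires (negative once it has expired)."""
    return (created + timedelta(days=lifetime_days) - today).days


def should_warn(created: date, today: date) -> bool:
    """True once the token is inside the warning window or already expired."""
    return days_until_expiry(created, today) <= WARN_DAYS
```

Running a check like this once per sync turns a surprise 401 on day 91 into a reminder a couple of weeks earlier.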

**2. Permission Changes Aren't Reflected in Token**

Confluence's permission model:

```
Permissions are checked at:
- Space level (can view space?)
- Page level (can view page?)
- Group membership level (in allowed groups?)

Token stores: User ID only, not permissions

Flow:
Admin revokes space access at 10am
→ Token still contains valid user ID
→ Next API call checks permissions in real-time
→ User no longer has access
→ Returns 403 Forbidden
→ But token itself is "valid"
```

This creates a gray area where token authentication succeeds but authorization fails.
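
One way to make this distinction visible is to classify a failed request by its status code before deciding what to tell the user. This is an illustrative sketch only; `classify_sync_failure` and its return labels are hypothetical names, not part of any Confluence or Twig API.

```python
def classify_sync_failure(status_code: int) -> str:
    """Map an HTTP status code to the likely failure type described above."""
    if status_code == 401:
        # Token expired or revoked, or the owning account is inactive.
        return "authentication"
    if status_code == 403:
        # Token is still valid, but the user lost access to the space or page.
        return "authorization"
    return "other"
```

An "authentication" result points to regenerating the token, while "authorization" points to checking the account's space permissions.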

**3. Account Lifecycle Events**

```
Events that invalidate tokens:
- User deactivation (immediate)
- Password change (in some Confluence configs)
- Admin-forced logout (immediate)
- User deletion (immediate)
- Org-wide security policy changes (immediate)

What happens:
- Tokens don't have a "check if user still active" flag
- Each API call must query user status
- If user inactive → 401
- If user re-activated → Token doesn't auto-restore
- Must generate new token
```

### Rate Limiting Architecture

**How Confluence Rate Limits Work:**

```
Confluence API uses token bucket algorithm:

Bucket capacity: 400 requests (Pro plan)
Refill rate: 400 requests/minute
Burst allowance: Can use full capacity instantly

Example flow:
Time 0:00 → Bucket: 400/400 (full)
Time 0:01 → 100 requests sent → Bucket: 300/400
Time 0:02 → 200 requests sent → Bucket: 100/400
Time 0:03 → 150 requests sent → Bucket: 0/400 + 50 rejected (429)
Time 1:00 → Bucket refills → 400/400

The 50 rejected requests must be retried
```
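
The example flow can be reproduced with a small simulation. This sketch mirrors the simplified model above, where the bucket refills in one step at the minute mark; a production token bucket usually refills continuously. The `TokenBucket` class is illustrative and does not represent Atlassian's implementation.

```python
class TokenBucket:
    """Simplified bucket: requests drain tokens, refill() restores them all."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.tokens = capacity

    def request(self, n: int) -> int:
        """Try to send n requests; return how many are rejected (429)."""
        accepted = min(n, self.tokens)
        self.tokens -= accepted
        return n - accepted

    def refill(self) -> None:
        """Restore the bucket to full capacity (the minute boundary)."""
        self.tokens = self.capacity


bucket = TokenBucket(capacity=400)
# Three bursts, as in the flow above: 100, then 200, then 150 requests.
rejected = [bucket.request(n) for n in (100, 200, 150)]
```

After the three bursts, `rejected` holds the per-burst rejection counts and the bucket is empty until `refill()` is called.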

**Why Large Spaces Hit Limits:**

Calculating API calls needed:

```
Space with 10,000 pages:
- Initial fetch: 1 call per page = 10,000 calls
- Each page metadata: 1 call = 10,000 calls  
- Each page content: 1 call = 10,000 calls
- Each page attachments list: 1 call = 10,000 calls
Total: 40,000 API calls for full sync

At 400 req/min: 100 minutes minimum
Any retry due to network: Adds more calls
Multiple spaces: Multiplies the problem

If sync runs hourly: a 100-minute sync can't finish before the
next run starts, so runs overlap and hit the rate limit every time
```
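
The same arithmetic can be wrapped in a helper to estimate sync time for other space sizes. This is a sketch under the assumptions stated above (four calls per page, 400 requests per minute); `full_sync_estimate` is a hypothetical name.

```python
import math


def full_sync_estimate(pages: int, calls_per_page: int = 4,
                       limit_per_min: int = 400) -> tuple[int, int]:
    """Return (total API calls, minimum minutes) for a full sync."""
    total_calls = pages * calls_per_page
    minutes = math.ceil(total_calls / limit_per_min)
    return total_calls, minutes
```

For the 10,000-page space above, `full_sync_estimate(10_000)` gives 40,000 calls and a 100-minute floor, before any retries.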

**The Hidden Rate Limit Factor:**

Confluence has two rate limit layers:

```
Layer 1: Per-token (400/min for Pro)
Layer 2: Per-IP (undocumented, but exists)

If your org has multiple integrations hitting Confluence
from same IP → All share the IP-level limit

This causes:
- Unpredictable 429 errors
- Successful syncs suddenly fail
- No visibility into total org API usage
```
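
Since the shared limit makes 429 errors hard to predict, a client usually has to react to them rather than plan around them. The sketch below computes a wait time using exponential backoff with full jitter, and defers to the standard HTTP `Retry-After` header when the response carries one in seconds. Whether a given response includes that header is an assumption; `retry_delay` and its parameters are hypothetical.

```python
import random
from typing import Optional


def retry_delay(attempt: int, retry_after: Optional[str] = None,
                base: float = 1.0, cap: float = 60.0) -> float:
    """Seconds to wait before retry number `attempt` (0-based)."""
    if retry_after is not None and retry_after.isdigit():
        # The server told us how long to wait; honor it as given.
        return float(retry_after)
    # Exponential backoff, capped, with full jitter to spread out retries.
    backoff = min(cap, base * 2 ** attempt)
    return random.uniform(0, backoff)
```

The jitter matters when several integrations share one IP: without it, they all retry at the same moment and trip the limit again.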

### Incremental Sync Complexity

**Why "Only Sync Changes" is Hard:**

Confluence doesn't expose a simple "what changed since timestamp X" API. Instead:

```
To get changes, must:
1. List all pages (get IDs + last modified timestamp)
2. Compare with local cache of timestamps
3. Fetch only pages where timestamp > last sync
4. Check for deleted pages (separate API call)
5. Check for permission changes (not in API)

Problems:
- Step 1 still requires listing 10,000 pages
- Deleted pages don't show in list (orphaned in your DB)
- Renamed pages appear as new (creates duplicates)
- Moved pages change URL (breaks citations)
```
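
Steps 2 to 4 amount to a diff between the remote listing and the local cache. The sketch below assumes both are available as dictionaries mapping page ID to a last-modified timestamp in one consistent ISO 8601 UTC format, so that string comparison orders them correctly. Keying on page ID rather than title is meant to avoid treating renames as new pages, assuming IDs stay stable across renames. `diff_pages` is a hypothetical helper.

```python
def diff_pages(remote: dict[str, str], cache: dict[str, str]):
    """Compare page ID -> last-modified maps; return (new, changed, deleted)."""
    new = sorted(pid for pid in remote if pid not in cache)
    changed = sorted(pid for pid in remote
                     if pid in cache and remote[pid] > cache[pid])
    # Pages that vanished from the listing would otherwise stay orphaned.
    deleted = sorted(pid for pid in cache if pid not in remote)
    return new, changed, deleted
```

Only the `new` and `changed` IDs need a content fetch, and the `deleted` IDs can be removed from the knowledge base. Listing all pages is still required, so this reduces fetches, not list calls.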

**The Atomicity Problem:**

```
Sync process isn't atomic:
1. Start listing pages → Get 5,000 IDs
2. While processing:
   - User deletes page 2,500
   - User adds new page
3. Complete listing → Get remaining 5,000
4. New page not in list (added between API calls)
5. Deleted page causes 404 on fetch

Result: Incomplete sync, reported as "success"
```
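
A sync loop can at least avoid reporting a misleading "success" by recording pages that disappeared between listing and fetching. The sketch below covers only that 404 case; pages added mid-listing would still have to be picked up by the next run. `SyncResult`, `sync_pages`, and the `fetch` callable (returning `None` for a missing page) are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable, Optional


@dataclass
class SyncResult:
    fetched: list = field(default_factory=list)
    missing: list = field(default_factory=list)

    @property
    def status(self) -> str:
        """'partial' if any listed page could not be fetched."""
        return "partial" if self.missing else "success"


def sync_pages(page_ids: Iterable[str],
               fetch: Callable[[str], Optional[str]]) -> SyncResult:
    """Fetch each listed page; `fetch` returns None when the page is gone."""
    result = SyncResult()
    for pid in page_ids:
        content = fetch(pid)
        if content is None:
            result.missing.append(pid)  # deleted after it was listed
        else:
            result.fetched.append(pid)
    return result
```

Surfacing the "partial" status lets the data source page show that a run completed with gaps instead of hiding them.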

### Workspace Hierarchy Complexity

**Multi-Space Challenges:**

```
Confluence spaces can be:
- Public (anyone in org)
- Restricted (specific groups only)
- Private (specific users only)

Single API token:
- Tied to one user
- User may have access to:
  * Space A: Full access
  * Space B: View only
  * Space C: No access (invisible to API)
  * Space D: Access revoked mid-sync

Problem:
Listing spaces API only returns spaces with access
→ Can't detect when access is removed
→ Old pages stay in knowledge base
→ Stale data problem
```
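
Although the API gives no explicit signal when access is removed, a space that was visible in the previous sync and is absent from the current listing is a strong hint. A minimal sketch, assuming the integration stores the set of space keys seen on each run; `spaces_to_purge` is a hypothetical name.

```python
def spaces_to_purge(previous: set[str], current: set[str]) -> set[str]:
    """Space keys seen last sync but no longer returned by the listing."""
    return previous - current
```

A missing space may have been deleted or had its access revoked. In either case its pages should probably not remain in the knowledge base.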

### Network and Timing Issues

**Confluence Cloud Architecture:**

```
Confluence Cloud uses Atlassian's CDN:
- Requests may hit different edge locations
- Each edge has its own rate limit bucket
- Rate limit state not synchronized across edges

This causes:
Request 1 → Edge A (Sydney) → 200 OK
Request 2 → Edge A → 200 OK  
Request 3 → Edge B (Tokyo) → 429 Rate Limit
           (Edge B doesn't know about Edge A's usage)

Twig sees: Inconsistent 429 errors that don't match its own rate calculations
```

**Confluence Server vs Cloud:**

```
Confluence Server (self-hosted):
- Uses different auth (personal access tokens or basic auth)
- Different rate limits (admin-configured)
- May be behind corporate firewall
- May require VPN
- May have custom SSL certificates

Confluence Cloud:
- OAuth or API tokens
- Fixed rate limits
- Public internet access
- Atlassian-managed SSL

Twig must detect which type and configure accordingly
```
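
One simple detection heuristic is the hostname: Confluence Cloud sites are typically served from `*.atlassian.net`. The sketch below relies on that assumption alone and would misclassify a Cloud site behind a custom domain, so treat it as a first guess rather than a definitive check. `deployment_type` is a hypothetical helper, not Twig's actual detection logic.

```python
from urllib.parse import urlparse


def deployment_type(base_url: str) -> str:
    """Guess 'cloud' or 'server' from the Confluence base URL's hostname."""
    host = urlparse(base_url).hostname or ""
    return "cloud" if host.endswith(".atlassian.net") else "server"
```

For example, `deployment_type("https://example.atlassian.net/wiki")` returns `"cloud"`, while a self-hosted URL such as `https://confluence.corp.example.com` returns `"server"`.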

## How to Solve

**Regenerate the API token using a service account, verify that the account's space permissions still exist, and reduce the sync frequency to match how often your content changes.** See [Data Sources Documentation](/product/data-integrations/confluence.md) for configuration details.

## Why This Problem Showcases System Complexity

This isn't just a "token expired" error. It reveals the architectural complexity of:

1. **Distributed authentication systems** where token validity and user permissions are checked separately
2. **Multi-layered rate limiting** with per-token, per-IP, and per-edge limits
3. **Eventually consistent APIs** where listing and fetching can be out of sync
4. **Permission model synchronization** between Confluence and external systems
5. **Incremental state tracking** without native "changes since X" support

Understanding these nuances is crucial for building reliable integrations with enterprise collaboration platforms.

