# Google Drive Connection Issues

## The Problem

Your Google Drive data source fails to connect, sync stops midway, or only some files appear in your knowledge base even though your Drive holds hundreds of documents.

### Symptoms

* ❌ "Connection Failed" status in Data Sources
* ❌ Only partial document sync (e.g., 50 out of 500 files)
* ❌ "Insufficient permissions" errors
* ❌ Shared drives not appearing
* ❌ Sync works then randomly fails

### Real-World Example

```
Your company has 15 shared drives with 2,000+ documents.
After connecting Google Drive, only 127 documents sync.

Data Source Status: "Partial Sync - Permission Errors"
Error: "Cannot access shared drive: Team Docs"
```

***

## Deep Technical Analysis

### The Google Drive Permission Hierarchy

Google Drive has a fundamentally complex permission model that's designed for consumer file sharing, not programmatic knowledge extraction:

**Permission Layers:**

```
OAuth Scope Level
→ Controls which kinds of API calls are allowed (drive.readonly, drive.metadata, etc.)

User Account Level
→ Which user granted the OAuth token
→ Only sees files that specific user has access to

File/Folder Level
→ "Can view", "Can comment", "Can edit"
→ Inherited from parent folders

Shared Drive Level
→ Separate permission model
→ Roles: "Manager", "Content manager", "Contributor", "Commenter", "Viewer"
→ Requires explicit membership

Organization Level
→ Domain-wide delegation (admin-only)
→ Service account access
```

**The Core Problem:**

When you connect Google Drive with a regular user account (e.g., john@company.com), the Twig integration can **only** access files that John has permission to view. This creates three problems:

1. **Invisible Files**: 1,500 files exist in shared drives that John isn't a member of → never synced
2. **Permission Drift**: John leaves team → loses access to "Marketing" shared drive → those files disappear from knowledge base
3. **Personal vs Shared**: Personal files mixed with company docs → potential PII/personal data ingestion
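To make the core problem concrete, here is a toy model in plain Python (no Google APIs involved) of what a single user's token can see: a file is visible only if the user has a direct or inherited share on it, or is a member of the shared drive that holds it. `DriveFile`, `visible_to`, and the drive and user names are hypothetical illustration data, not Twig internals.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class DriveFile:
    name: str
    shared_drive: str | None = None            # None = stored in someone's My Drive
    shared_with: frozenset[str] = frozenset()  # users with a direct file/folder share


def visible_to(user: str, files: list[DriveFile],
               memberships: dict[str, set[str]]) -> list[str]:
    """Names of the files a token issued to `user` could read (toy model)."""
    visible = []
    for f in files:
        member = (f.shared_drive is not None
                  and user in memberships.get(f.shared_drive, set()))
        if member or user in f.shared_with:
            visible.append(f.name)
    return visible


# Hypothetical data: John is in Marketing only; Engineering is invisible to him.
memberships = {"Marketing": {"john"}, "Engineering": {"priya"}}
files = [
    DriveFile("Launch plan", shared_drive="Marketing"),
    DriveFile("Architecture RFC", shared_drive="Engineering"),
    DriveFile("Q3 notes", shared_with=frozenset({"john"})),
]
print(visible_to("john", files, memberships))  # ['Launch plan', 'Q3 notes']

memberships["Marketing"].discard("john")       # permission drift: John changes teams
print(visible_to("john", files, memberships))  # ['Q3 notes']
```

The same token produces a smaller knowledge base after the membership change, even though no file was deleted.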

### OAuth Scope vs Actual Access

Google's OAuth model creates a second layer of confusion:

```
OAuth Flow:
1. User clicks "Connect Google Drive"
2. Google shows permission screen:
   "Twig wants to: View and download all your Google Drive files"
3. User clicks "Allow"

What Actually Happens:
→ Twig receives token with scope: drive.readonly
→ Token is tied to user: john@company.com
→ API calls use this token

Reality Check:
Token scope = "can read drive files"
Actual access = "can only read files john@company.com can access"

The token doesn't grant new access; it only permits API calls
on files the user can already access.
```
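The consent step is an ordinary OAuth 2.0 authorization request. The sketch below builds one with the standard library to show that the request names a *scope* (what kinds of API calls are allowed) and never a list of files; the resulting token is bound to whichever user signs in. The client ID, redirect URI, and `state` value are placeholders, and this is not Twig's connector code.

```python
from urllib.parse import parse_qs, urlencode, urlparse

GOOGLE_AUTH_ENDPOINT = "https://accounts.google.com/o/oauth2/v2/auth"
DRIVE_READONLY = "https://www.googleapis.com/auth/drive.readonly"


def build_consent_url(client_id: str, redirect_uri: str, state: str) -> str:
    """Authorization-code request for read-only Drive access (sketch)."""
    params = {
        "client_id": client_id,        # placeholder: your OAuth client ID
        "redirect_uri": redirect_uri,  # placeholder: must be registered for the client
        "response_type": "code",
        "scope": DRIVE_READONLY,       # a capability, not a set of files
        "access_type": "offline",      # ask for a refresh token for background syncs
        "prompt": "consent",
        "state": state,                # CSRF protection; verify it on the callback
    }
    return f"{GOOGLE_AUTH_ENDPOINT}?{urlencode(params)}"


url = build_consent_url("example-client-id.apps.googleusercontent.com",
                        "https://example.com/oauth/callback", state="xyz123")
query = parse_qs(urlparse(url).query)
print(query["scope"])  # ['https://www.googleapis.com/auth/drive.readonly']
```

Nothing in the request identifies John's files; which files the token can reach is decided later, per API call, by John's own permissions.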

**Why This Causes Sync Failures:**

Twig attempts to enumerate all files in the organization in three steps:

1. List all shared drives
2. For each shared drive, list all folders
3. For each folder, list all files
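A minimal sketch of that three-step walk, written against the call shapes of the Python client for Drive API v3 (`google-api-python-client`). `service` is assumed to be an already-authorized `build("drive", "v3", credentials=...)` object, and `paginate` and `walk_shared_drives` are hypothetical helper names. Every list call is paginated, so each step follows `nextPageToken` until it runs out.

```python
from __future__ import annotations

from collections import deque
from typing import Callable, Iterator

FOLDER_MIME = "application/vnd.google-apps.folder"


def paginate(fetch: Callable[[str | None], dict], key: str) -> Iterator[dict]:
    """Yield items from a paged Drive list call until nextPageToken runs out."""
    token = None
    while True:
        page = fetch(token)
        yield from page.get(key, [])
        token = page.get("nextPageToken")
        if not token:
            return


def walk_shared_drives(service) -> Iterator[tuple[str, dict]]:
    """Yield (drive name, file) pairs: drives -> folders -> files."""
    # Step 1: list the shared drives this token can see.
    drives = paginate(
        lambda t: service.drives().list(pageSize=100, pageToken=t).execute(),
        "drives")
    for drive in drives:
        # A shared drive's ID doubles as the ID of its top-level folder.
        queue = deque([drive["id"]])
        while queue:
            folder_id = queue.popleft()
            children = paginate(
                lambda t, fid=folder_id, did=drive["id"]: service.files().list(
                    q=f"'{fid}' in parents and trashed = false",
                    corpora="drive", driveId=did,
                    includeItemsFromAllDrives=True, supportsAllDrives=True,
                    fields="nextPageToken, files(id, name, mimeType)",
                    pageSize=1000, pageToken=t).execute(),
                "files")
            for item in children:
                if item["mimeType"] == FOLDER_MIME:
                    queue.append(item["id"])   # Step 2: descend into subfolders
                else:
                    yield drive["name"], item  # Step 3: a file to sync
```

If per-folder requests become the bottleneck, a single `files.list` with `corpora="drive"` and `driveId` but no parent filter returns every item in the drive as a flat list, in far fewer requests.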

But step 1 only returns drives John belongs to, and any request that targets a drive he isn't a member of (for example, "Engineering" after he was removed from it) fails:

```json
{
  "error": {
    "code": 403,
    "message": "The user does not have sufficient permissions for shared drive: 0AHzF..."
  }
}
```

Twig must then decide:

* Skip this shared drive? (lose 300 docs)
* Fail the entire sync? (show error to user)
* Mark as partial sync? (confusing status)
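One way to implement the third option is to treat each shared drive as an independent unit: sync what is reachable, record what is not, and report a status that names the skipped drives instead of a bare "partial". This is a plain-Python sketch; `DriveAccessError`, `SyncReport`, `sync_all`, and `fake_sync` are hypothetical stand-ins for a real connector's error type and per-drive sync.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable


class DriveAccessError(Exception):
    """Hypothetical: raised when the API refuses access to a shared drive."""


@dataclass
class SyncReport:
    synced: dict[str, int] = field(default_factory=dict)   # drive -> docs synced
    skipped: dict[str, str] = field(default_factory=dict)  # drive -> reason

    @property
    def status(self) -> str:
        if not self.skipped:
            return "Synced"
        if not self.synced:
            return "Connection Failed"
        return "Partial Sync - no access to: " + ", ".join(sorted(self.skipped))


def sync_all(drive_names: list[str], sync_drive: Callable[[str], int]) -> SyncReport:
    """Sync each drive independently; one inaccessible drive doesn't stop the rest."""
    report = SyncReport()
    for name in drive_names:
        try:
            report.synced[name] = sync_drive(name)
        except DriveAccessError as exc:
            report.skipped[name] = str(exc)
    return report


def fake_sync(name: str) -> int:
    """Hypothetical per-drive sync: Engineering is off-limits, others hold 100 docs."""
    if name == "Engineering":
        raise DriveAccessError("403: user is not a member of this shared drive")
    return 100


report = sync_all(["Marketing", "Engineering", "Sales"], fake_sync)
print(report.status)  # Partial Sync - no access to: Engineering
```

Naming the missing drives turns a confusing "partial" status into an actionable one: the user knows exactly which memberships to request.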

### The Shared Drive Membership Problem

Shared drives (formerly "Team Drives") have their own membership system:

```
Regular Folder Sharing:
→ Permissions inherited from parent
→ Can share individual files
→ Relatively simple hierarchy

Shared Drive:
→ Completely separate permission boundary
→ Must be explicitly added as member
→ Membership is required just to see that the drive exists
```

**Discovery Problem:**

```
Standard API Call: drive.drives.list()

Returns:
[
  { "id": "0AHzF1", "name": "Marketing" },  // John is member
  { "id": "0BXkK2", "name": "Sales" }       // John is member
]

Missing:
- "Engineering" drive (exists, but John not member)
- "Legal" drive (exists, but John not member)
- "Finance" drive (exists, but John not member)

From the API's perspective: these drives don't exist.
From the organization's perspective: you're missing 3 of 5 shared drives (60%).
```
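One way to measure this gap is to compare what a regular user's token returns with what a Workspace admin sees. `drives.list` accepts `useDomainAdminAccess=True`, which (for a caller who is a domain administrator) returns every shared drive in the domain rather than only the caller's memberships. The sketch assumes two authorized Drive v3 client objects, `user_service` and `admin_service`; `list_drives` and `invisible_drives` are hypothetical names. It compares by drive ID because names can repeat.

```python
from __future__ import annotations


def list_drives(service, as_admin: bool = False) -> dict[str, str]:
    """Map of shared-drive ID -> name visible to the credentials behind `service`."""
    drives: dict[str, str] = {}
    token = None
    while True:
        page = service.drives().list(
            pageSize=100, pageToken=token,
            # True lists every shared drive in the domain; requires a Workspace admin.
            useDomainAdminAccess=as_admin,
        ).execute()
        drives.update({d["id"]: d["name"] for d in page.get("drives", [])})
        token = page.get("nextPageToken")
        if not token:
            return drives


def invisible_drives(user_service, admin_service) -> list[str]:
    """Names of shared drives that exist in the domain but never appear for this user."""
    everything = list_drives(admin_service, as_admin=True)
    mine = list_drives(user_service)
    return sorted(name for drive_id, name in everything.items() if drive_id not in mine)
```

For the example above, the result would be the Engineering, Finance, and Legal drives.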

### The Service Account vs User Account Trade-off

Google offers two authentication methods for programmatic access:

**User Account (OAuth):**

```
Pros:
+ Easy to set up (just click "Connect")
+ Respects user's existing permissions
+ No admin involvement needed

Cons:
- Limited to single user's access
- Permission drift as user changes roles
- Can't access shared drives user isn't in
- Token stops working if the user leaves the company (account suspended or deleted)
```

**Service Account (Domain-Wide Delegation):**

```
Pros:
+ Can impersonate any user in organization
+ Can reach all shared drives (by impersonating members, or by adding one sync account to each drive)
+ Stable, doesn't break when employees leave
+ Better for enterprise knowledge management

Cons:
- Requires Google Workspace admin privileges
- Complex setup (admin console, API scopes, delegation)
- Potential security concern (very broad access)
- Most users can't configure this themselves
```
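For reference, the service-account path looks roughly like this with the `google-auth` and `google-api-python-client` libraries: load the key, then impersonate a Workspace user through the `subject` argument. It only works after an admin has authorized the service account's client ID for the Drive scope under domain-wide delegation in the Admin console. The key path, the impersonated address, and `build_delegated_drive` are placeholders for illustration, not Twig's configuration.

```python
from __future__ import annotations

DRIVE_READONLY = "https://www.googleapis.com/auth/drive.readonly"


def build_delegated_drive(key_path: str, impersonate: str):
    """Drive v3 client that acts as `impersonate` through domain-wide delegation.

    key_path: path to a service-account JSON key (placeholder).
    impersonate: a user in your Workspace domain, e.g. a dedicated sync account.
    """
    # Imported inside the function so this module loads without the libraries.
    from google.oauth2 import service_account
    from googleapiclient.discovery import build

    creds = service_account.Credentials.from_service_account_file(
        key_path, scopes=[DRIVE_READONLY], subject=impersonate)
    return build("drive", "v3", credentials=creds)


# Example call (placeholder values):
# drive = build_delegated_drive("sa-key.json", "kb-sync@company.com")
```

The impersonated account still sees only what that user can see, which is why this setup pairs well with adding that one account to every shared drive.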

**The Catch-22:**

Most Twig users aren't Google Workspace admins, so they can't authorize domain-wide delegation for a service account. They connect through OAuth with their own user account instead, which leads to incomplete knowledge bases.

### File Metadata Propagation Delays

Google Drive's API has eventual consistency for file metadata:

```
Scenario:
1. Document created in shared drive at 10:00 AM
2. Twig runs sync at 10:05 AM
3. Document doesn't appear in API response

Why?
→ Google Drive indexes files asynchronously
→ New files may not appear in list() calls for 5-15 minutes
→ File exists, but not yet in search index
```

**Sync Timing Problem:**

```
Full Sync (1000 files):
→ Takes 3-4 minutes to enumerate all files
→ During sync, user adds 5 new documents
→ These new docs won't appear until NEXT sync
→ But user expects them immediately

Incremental Sync (using the Changes API with a saved page token):
→ Query changes since last sync
→ Changes can take 1-15 minutes to appear
→ User sees "sync completed" but new doc not in agent
→ Appears as "sync failure" from user perspective
```
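The Drive API's incremental path uses page tokens rather than timestamps: `changes.getStartPageToken` returns a token for "now", and `changes.list` returns everything after a saved token, with the last page carrying `newStartPageToken` to store for next time. The sketch isolates that loop; `poll_changes` is a hypothetical name, and `fetch_changes(token)` stands in for the `service.changes().list(...)` call shown in the comments.

```python
from __future__ import annotations

from typing import Callable


def poll_changes(saved_token: str,
                 fetch_changes: Callable[[str], dict]) -> tuple[list[dict], str]:
    """Drain the change feed that starts at saved_token.

    Returns (changes, token_to_save). A page carries nextPageToken while more
    pages exist and newStartPageToken once the feed is caught up.
    """
    changes: list[dict] = []
    token = saved_token
    while True:
        page = fetch_changes(token)
        changes.extend(page.get("changes", []))
        if "newStartPageToken" in page:
            return changes, page["newStartPageToken"]
        token = page["nextPageToken"]


# With google-api-python-client, the first token and the fetch would look like:
#   start = service.changes().getStartPageToken(supportsAllDrives=True).execute()["startPageToken"]
#   fetch = lambda t: service.changes().list(
#       pageToken=t, includeItemsFromAllDrives=True, supportsAllDrives=True,
#       fields="nextPageToken, newStartPageToken, changes(fileId, removed, file(name, mimeType))",
#   ).execute()

# Hypothetical two-page feed for illustration.
pages = {
    "t1": {"changes": [{"fileId": "a"}], "nextPageToken": "t2"},
    "t2": {"changes": [{"fileId": "b", "removed": True}], "newStartPageToken": "t3"},
}
changes, next_token = poll_changes("t1", pages.__getitem__)
print([c["fileId"] for c in changes], next_token)  # ['a', 'b'] t3
```

Because a late-arriving change simply shows up in a later poll, persisting the token only after the changes are processed means nothing is lost, just delayed.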

### API Rate Limiting and Quota Exhaustion

Google Drive API has strict rate limits:

```
Quota Limits (per project, per user; defaults vary, so check your project's quota page):
→ 1,000 queries per 100 seconds per user
→ 10,000 queries per 100 seconds per project

File Enumeration Math:
- List shared drives: 1 request
- List folders in drive (100 folders): 100 requests
- List files in each folder (10 files each): 100 requests
- Download file metadata for each: 1,000 requests
Total: 1,201 requests for 1,000 files

Time Required:
1,201 requests ÷ 10 requests/sec (1,000 per 100 s per user) ≈ 120 seconds minimum
```
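The arithmetic above as a small helper, so you can plug in your own counts. It mirrors the block's counting model (1 request to list drives, one request per folder to find subfolders, one per folder to list its files, one metadata request per file, and 10 requests/second from the per-user quota). `enumeration_cost` is a hypothetical name, and the result is an estimate, not a measurement.

```python
from __future__ import annotations


def enumeration_cost(folders: int, files: int, per_file_metadata: bool = True,
                     requests_per_sec: float = 10.0) -> tuple[int, float]:
    """(requests, minimum seconds) under the counting model above."""
    requests = (
        1                                      # list shared drives
        + folders                              # list subfolders, one request per folder
        + folders                              # list files, one request per folder
        + (files if per_file_metadata else 0)  # one metadata request per file
    )
    return requests, requests / requests_per_sec


print(enumeration_cost(100, 1_000))                           # (1201, 120.1)
print(enumeration_cost(100, 1_000, per_file_metadata=False))  # (201, 20.1)
```

The second line shows why requesting the needed metadata through the listing call's `fields` parameter, instead of one `files.get` per file, matters far more than any other optimization here.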

**The Throttling Cascade:**

```
Large Organization Sync:
→ 5,000 files across 10 shared drives
→ Requires ~6,000 API requests
→ Hits quota limit at request 1,000
→ Google returns 403 (userRateLimitExceeded) or 429 (Too Many Requests)
→ Twig must back off exponentially and retry
→ Sync takes 15-20 minutes instead of the ~10 minutes the quota alone implies
→ User sees "sync hanging" or timeout errors
```
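Google's recommended response to these errors is truncated exponential backoff: wait roughly 1 s, 2 s, 4 s, and so on, plus random jitter, up to a cap, then give up. A stdlib-only sketch; `ApiError`, `is_retryable`, and `with_backoff` are hypothetical stand-ins for a client library's HTTP error and retry wrapper. The retryable 403 reasons are Drive's `userRateLimitExceeded` and `rateLimitExceeded`; a 403 for a permissions problem (such as `insufficientFilePermissions`) is not retried.

```python
from __future__ import annotations

import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")

# Drive's 403 reasons for quota problems (as opposed to permission problems).
RATE_LIMIT_REASONS = {"userRateLimitExceeded", "rateLimitExceeded"}


class ApiError(Exception):
    """Hypothetical stand-in for a client library's HTTP error."""

    def __init__(self, status: int, reason: str = ""):
        super().__init__(f"{status} {reason}".strip())
        self.status = status
        self.reason = reason


def is_retryable(err: ApiError) -> bool:
    if err.status == 429 or err.status >= 500:
        return True
    # A 403 is retryable only when it is a quota error, never a permission error.
    return err.status == 403 and err.reason in RATE_LIMIT_REASONS


def with_backoff(call: Callable[[], T], max_attempts: int = 6, base: float = 1.0,
                 cap: float = 64.0, sleep: Callable[[float], None] = time.sleep) -> T:
    """Run call(), retrying retryable ApiErrors with truncated exponential backoff."""
    attempt = 0
    while True:
        try:
            return call()
        except ApiError as err:
            attempt += 1
            if attempt >= max_attempts or not is_retryable(err):
                raise
            # Waits of ~1 s, 2 s, 4 s, ... (capped), plus up to 1 s of random jitter.
            sleep(min(cap, base * 2 ** (attempt - 1)) + random.uniform(0, 1))


# Simulated responses: two throttling errors, then success.
responses = iter([ApiError(429), ApiError(403, "userRateLimitExceeded"), "ok"])


def flaky() -> str:
    result = next(responses)
    if isinstance(result, ApiError):
        raise result
    return result


waits: list[float] = []
print(with_backoff(flaky, sleep=waits.append))  # ok
```

If you use `google-api-python-client`, its `execute(num_retries=N)` argument already retries some transient and rate-limit responses; a wrapper like this is mainly useful when you want explicit control over which errors count as retryable.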

### File Format Conversion Complexity

Google Drive native formats (Docs, Sheets, Slides) require conversion:

```
Google Doc (stored format):
→ Proprietary Google format
→ Not directly downloadable as text
→ Must use export API: files.export()

Export API:
→ Convert to: PDF, DOCX, RTF, HTML, TXT, etc.
→ Each format has trade-offs:
  - PDF: Preserves formatting but hard to parse
  - HTML: Good structure but messy markup
  - TXT: Clean but loses all formatting
```
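In practice a connector branches on `mimeType`: Google-native files go through `files.export` with a target MIME type, while uploaded files (PDF, DOCX, and so on) are downloaded as-is with `files.get_media`. The MIME type strings below are Google's real ones, but the export target chosen for each is a judgment call; these choices and the `download_plan` helper are assumptions, not what Twig does.

```python
from __future__ import annotations

GOOGLE_APPS_PREFIX = "application/vnd.google-apps."

# Google-native type -> export target. The targets are illustrative choices.
EXPORT_TARGETS = {
    "application/vnd.google-apps.document": "text/html",       # keeps headings and lists
    "application/vnd.google-apps.spreadsheet": "text/csv",     # first sheet only
    "application/vnd.google-apps.presentation": "text/plain",  # slide text only
}


def download_plan(mime_type: str) -> tuple[str, str | None]:
    """Return ("export", target), ("get_media", None), or ("skip", None)."""
    if mime_type in EXPORT_TARGETS:
        return "export", EXPORT_TARGETS[mime_type]
    if mime_type.startswith(GOOGLE_APPS_PREFIX):
        return "skip", None       # folders, forms, shortcuts, ...: no export chosen here
    return "get_media", None      # uploaded file (PDF, DOCX, ...): download bytes as-is


# The matching google-api-python-client calls:
#   service.files().export(fileId=file_id, mimeType=target).execute()
#   service.files().get_media(fileId=file_id, supportsAllDrives=True).execute()
```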

**The Conversion Problem:**

```
API Call Sequence for Each Google Doc:
1. files.list() → get file ID and metadata
2. files.get(fields='mimeType') → check if native format
3. files.export(mimeType='text/html') → convert to HTML
4. Parse HTML, extract text, preserve some structure
5. Chunk the text for embeddings

For 500 Google Docs:
→ 500 × 3 API calls = 1,500 requests
→ Plus rate limiting
→ Plus conversion time
→ Total: 10-15 minutes
```
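Step 2 in that sequence can be dropped: `files.list` returns `mimeType` (and other metadata) when those fields are named in its `fields` parameter, and one page can hold up to 1,000 files. A rough comparison of the per-doc sequence above with that leaner variant, counting requests only (no retries, no conversion time); `export_requests` is a hypothetical estimator.

```python
from __future__ import annotations

import math


def export_requests(docs: int, lean: bool, page_size: int = 1000) -> int:
    """Rough Drive API request count to export `docs` Google-native files."""
    if not lean:
        return docs * 3  # list + get + export per doc, as counted above
    # One files.list page per page_size docs (fields include mimeType), then one export each.
    return math.ceil(docs / page_size) + docs


# Lean listing call (google-api-python-client shape):
#   service.files().list(
#       q="mimeType = 'application/vnd.google-apps.document' and trashed = false",
#       corpora="allDrives", includeItemsFromAllDrives=True, supportsAllDrives=True,
#       fields="nextPageToken, files(id, name, mimeType, modifiedTime)",
#       pageSize=1000, pageToken=token,
#   ).execute()

print(export_requests(500, lean=False))  # 1500
print(export_requests(500, lean=True))   # 501
```

The export call per document remains, so conversion time and rate limiting still dominate; the saving is in the requests that carry no content.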

**Format Ambiguity:**

```
File: "Product Roadmap"
Type: Google Sheets

Question: How to extract for RAG?
- Export as CSV? (loses formatting, only first sheet)
- Export as Excel? (binary format, need parser)
- Export as PDF? (need PDF text extraction; table structure is lost)
- Export as HTML? (messy tables, lots of markup)

No perfect answer → different choices for different files
```
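For Sheets, one common compromise is to export CSV and serialize each row as `header: value` pairs, so every chunk carries its own column context into the embedding. A stdlib sketch under that assumption; `rows_as_text` is a hypothetical name, it inherits CSV export's limitation (first sheet only), and the output format is an illustrative choice rather than a standard.

```python
from __future__ import annotations

import csv
import io


def rows_as_text(csv_text: str, sheet_title: str) -> list[str]:
    """Turn an exported CSV into one self-describing line per data row."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader, [])
    lines = []
    for row in reader:
        pairs = [f"{h}: {v}" for h, v in zip(header, row) if v.strip()]
        if pairs:  # skip blank rows
            lines.append(f"[{sheet_title}] " + "; ".join(pairs))
    return lines


sample = "Feature,Quarter,Owner\nSSO,Q3,Priya\nAudit log,Q4,\n"
for line in rows_as_text(sample, "Product Roadmap"):
    print(line)
# [Product Roadmap] Feature: SSO; Quarter: Q3; Owner: Priya
# [Product Roadmap] Feature: Audit log; Quarter: Q4
```

Repeating the headers costs tokens, but a chunk such as "Owner: Priya" stays meaningful when it is retrieved without the rest of the table.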

***

## How to Solve

**Use a service account with domain-wide delegation + add shared drive membership explicitly + implement exponential backoff for rate limits.** See [Google Drive Integration](/product/data-integrations/google-drive.md) for configuration.


