# Context Assembly Logic Issues

## The Problem

The agent assembles retrieved chunks into context incorrectly (wrong order, missing connections, or redundant information), which leads to confused LLM responses.

### Symptoms

* ❌ Context chunks in illogical order
* ❌ Related chunks separated
* ❌ Duplicate information included
* ❌ Missing transitional context
* ❌ LLM confused by disjointed context

### Real-World Example

```
Retrieved chunks (by similarity score):
1. Chunk A: "API authentication uses OAuth 2.0" (score: 0.90)
2. Chunk B: "Rate limit is 1000/hour" (score: 0.85)
3. Chunk C: "OAuth requires client_id and client_secret" (score: 0.82)
4. Chunk D: "Token expiration is 1 hour" (score: 0.80)

Assembled context (score order):
"API authentication uses OAuth 2.0. Rate limit is 1000/hour.
OAuth requires client_id and client_secret. Token expiration is 1 hour."

Problem:
→ OAuth concept split (A, then B unrelated, then C continues OAuth)
→ Disjointed flow
→ LLM struggles to connect

Better assembly (logical grouping):
"API authentication uses OAuth 2.0. OAuth requires client_id and
client_secret. Token expiration is 1 hour. [Separate topic:] Rate
limit is 1000/hour."
```

***

## Deep Technical Analysis

### Assembly Strategies

**Score-Based (Naive):**

```
Sort by similarity score descending:
→ Highest score first
→ Ignores logical flow

Pros:
+ Simple
+ Most relevant first

Cons:
- May fragment related concepts
- No coherence
```

**Topic-Based Clustering:**

```
Group chunks by topic:
1. Cluster chunks (semantic similarity)
2. Order clusters by relevance
3. Within cluster: logical order

Example:
→ Cluster 1: OAuth (chunks A, C, D)
→ Cluster 2: Rate limits (chunk B)

Assembled:
→ All OAuth together, then rate limits

More coherent
```
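The clustering steps above can be sketched as follows. This is a minimal illustration, not a reference implementation: the `Chunk` class, the greedy threshold clustering, the `0.8` threshold, and the 2-D toy embeddings are all assumptions made for the example. A real pipeline would use model embeddings and possibly a proper clustering algorithm.

```python
from dataclasses import dataclass
from math import sqrt

@dataclass
class Chunk:  # hypothetical container for a retrieved chunk
    id: str
    text: str
    score: float       # retrieval similarity to the query
    embedding: tuple   # toy vector; real systems use model embeddings

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b)))

def cluster_by_topic(chunks, threshold=0.8):
    """Greedy clustering: a chunk joins the first cluster whose seed it resembles."""
    clusters = []
    for chunk in sorted(chunks, key=lambda c: c.score, reverse=True):
        for cluster in clusters:
            if cosine(chunk.embedding, cluster[0].embedding) >= threshold:
                cluster.append(chunk)
                break
        else:
            clusters.append([chunk])  # no similar cluster found: start a new one
    # Order clusters by their seed, which is the best-scoring member.
    clusters.sort(key=lambda cl: cl[0].score, reverse=True)
    return clusters

chunks = [
    Chunk("A", "API authentication uses OAuth 2.0", 0.90, (1.0, 0.1)),
    Chunk("B", "Rate limit is 1000/hour", 0.85, (0.1, 1.0)),
    Chunk("C", "OAuth requires client_id and client_secret", 0.82, (0.9, 0.2)),
    Chunk("D", "Token expiration is 1 hour", 0.80, (0.8, 0.3)),
]
ordered_ids = [c.id for cluster in cluster_by_topic(chunks) for c in cluster]
print(ordered_ids)  # ['A', 'C', 'D', 'B']
```

Within each cluster this sketch keeps score order; swapping in document order or another logical order is a separate choice.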

**Document-Preserving:**

```
Keep chunks from same document together:
→ Doc 1, Chunk 3
→ Doc 1, Chunk 5
→ Doc 1, Chunk 8

Maintains document's narrative flow
Avoids fragmenting explanations
```
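One way to keep chunks from the same document together is sketched below. The tuple shape `(doc_id, chunk_index, score, text)` and the rule of ranking documents by their best chunk score are assumptions for illustration; the placeholder texts stand in for real chunk content.

```python
from collections import defaultdict

def assemble_by_document(hits):
    """hits: list of (doc_id, chunk_index, score, text) tuples (hypothetical shape)."""
    by_doc = defaultdict(list)
    for hit in hits:
        by_doc[hit[0]].append(hit)
    # Rank documents by their best-scoring chunk (an assumed heuristic).
    groups = sorted(by_doc.values(),
                    key=lambda group: max(h[2] for h in group),
                    reverse=True)
    ordered = []
    for group in groups:
        # Restore the chunks' original position inside the document.
        ordered.extend(sorted(group, key=lambda h: h[1]))
    return ordered

hits = [
    ("doc1", 8, 0.91, "chunk 8 text"),
    ("doc2", 2, 0.88, "chunk 2 text"),
    ("doc1", 3, 0.84, "chunk 3 text"),
    ("doc1", 5, 0.80, "chunk 5 text"),
]
positions = [(h[0], h[1]) for h in assemble_by_document(hits)]
print(positions)  # [('doc1', 3), ('doc1', 5), ('doc1', 8), ('doc2', 2)]
```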

### Redundancy Detection

**Semantic Deduplication:**

```
Check similarity between chunks:
→ Chunk B: "Rate limit is 1000/hour"
→ Chunk E: "API allows 1000 requests per hour"

Cosine similarity: 0.94 (very high)
→ Redundant
→ Keep only higher-scored chunk

Reduces context bloat
```
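The deduplication rule above can be sketched as a greedy filter. The dict keys, the toy 2-D embeddings, and the scores of the duplicate chunk are assumptions for the example; the `0.90` cutoff is the threshold this page recommends.

```python
from math import sqrt

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b)))

def deduplicate(chunks, threshold=0.90):
    """Keep a chunk only if it is not too similar to a higher-scored kept chunk.

    chunks: list of dicts with 'text', 'score', 'embedding' keys (hypothetical).
    """
    kept = []
    for chunk in sorted(chunks, key=lambda c: c["score"], reverse=True):
        if all(cosine(chunk["embedding"], k["embedding"]) <= threshold for k in kept):
            kept.append(chunk)
    return kept

chunks = [
    {"text": "API authentication uses OAuth 2.0", "score": 0.90, "embedding": (1.0, 0.1)},
    {"text": "Rate limit is 1000/hour", "score": 0.85, "embedding": (0.1, 1.0)},
    {"text": "API allows 1000 requests per hour", "score": 0.78, "embedding": (0.2, 1.0)},
]
kept_texts = [c["text"] for c in deduplicate(chunks)]
print(kept_texts)
# ['API authentication uses OAuth 2.0', 'Rate limit is 1000/hour']
```

Because chunks are visited in descending score order, the survivor of each near-duplicate pair is always the higher-scored one.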

**Extractive Summarization:**

```
If chunks overlap significantly:
→ Extract unique information from each
→ Combine into single summary chunk

Example:
→ Chunk A: "OAuth 2.0 for auth. Use client_id."
→ Chunk C: "OAuth requires client_id and client_secret."

Combined:
"OAuth 2.0 authentication requires client_id and client_secret."

Denser context
```

### Transition Injection

**Topic Boundaries:**

```
Insert transitions between topics:

"[Authentication:]
API uses OAuth 2.0...

[Rate Limiting:]
API enforces 1000 requests/hour..."

Helps LLM understand topic shifts
Clearer structure
```
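A small sketch of transition injection, assuming the chunks have already been grouped and each group has a topic label. The function name and the `(label, texts)` input shape are hypothetical; how labels are produced (cluster keywords, document section titles, an LLM call) is left open.

```python
def render_with_transitions(clusters):
    """clusters: ordered list of (topic_label, [chunk_text, ...]) pairs."""
    sections = []
    for label, texts in clusters:
        # A bracketed marker signals the topic shift to the LLM.
        sections.append(f"[{label}:]\n" + " ".join(texts))
    return "\n\n".join(sections)

context = render_with_transitions([
    ("Authentication", [
        "API authentication uses OAuth 2.0.",
        "OAuth requires client_id and client_secret.",
        "Token expiration is 1 hour.",
    ]),
    ("Rate Limiting", ["Rate limit is 1000/hour."]),
])
print(context)
```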

**Hierarchical Headers:**

```
From document structure:
→ Section: "API Authentication"
  → Subsection: "OAuth 2.0"
    → Content chunks

Preserve headers:
"# API Authentication
## OAuth 2.0
API uses OAuth 2.0..."

Context hierarchy preserved
```
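Header preservation could look like the sketch below, assuming each chunk carries the header path it was found under. The `(header_path, text)` shape and the "Tokens" subsection are made up for illustration. Headers are emitted only when the path changes, so consecutive chunks from the same section do not repeat them.

```python
def render_with_headers(chunks):
    """chunks: ordered list of (header_path, text); header_path is a tuple of titles."""
    lines = []
    previous = ()
    for path, text in chunks:
        # Count how many leading headers are shared with the previous chunk.
        shared = 0
        while (shared < min(len(path), len(previous))
               and path[shared] == previous[shared]):
            shared += 1
        # Emit only the headers that differ from the previous chunk's path.
        for depth in range(shared, len(path)):
            lines.append("#" * (depth + 1) + " " + path[depth])
        lines.append(text)
        previous = path
    return "\n".join(lines)

context = render_with_headers([
    (("API Authentication", "OAuth 2.0"), "API uses OAuth 2.0."),
    (("API Authentication", "OAuth 2.0"), "OAuth requires client_id and client_secret."),
    (("API Authentication", "Tokens"), "Token expiration is 1 hour."),
])
print(context)
```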

### Context Ordering

**Recency Preference:**

```
If multiple chunks on same topic:
→ Prefer recent over old

Example:
→ Chunk 2022: "Rate limit 100/hour"
→ Chunk 2024: "Rate limit 1000/hour"

Order: 2024 first
→ LLM sees current info first
→ Less likely to cite outdated
```
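A sketch of recency ordering within a topic, assuming each chunk has a topic label and a year of last update in its metadata. The `(topic, year, text)` shape is hypothetical. Topics keep the order in which they first appear; only the chunks inside a topic are reordered.

```python
def prefer_recent(chunks):
    """chunks: list of (topic, year, text). Newest first within each topic."""
    topics = {}
    for topic, year, text in chunks:
        topics.setdefault(topic, []).append((year, text))
    ordered = []
    for items in topics.values():  # dicts preserve insertion order
        for year, text in sorted(items, key=lambda item: item[0], reverse=True):
            ordered.append(text)
    return ordered

ordered = prefer_recent([
    ("rate_limit", 2022, "Rate limit 100/hour"),
    ("rate_limit", 2024, "Rate limit 1000/hour"),
])
print(ordered)  # ['Rate limit 1000/hour', 'Rate limit 100/hour']
```

Ordering alone does not remove the outdated chunk; dropping or annotating superseded chunks would be an additional step.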

**Importance Ranking:**

```
Not just similarity, but importance:
→ Core concepts first
→ Details later

Query: "How to authenticate?"
→ First: Overview of OAuth
→ Then: Specific parameters
→ Then: Troubleshooting

Progressive detail
```
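Progressive detail can be approximated by tagging each chunk with a role and sorting by role before score. The role names, the `ROLE_ORDER` mapping, and the example texts and scores below are assumptions; in practice the role might come from document metadata or a classifier.

```python
# Hypothetical role ranking: lower number means earlier in the context.
ROLE_ORDER = {"overview": 0, "parameters": 1, "troubleshooting": 2}

def order_by_importance(chunks):
    """chunks: list of (role, score, text). Sort by role rank, then score descending."""
    return sorted(
        chunks,
        # Unknown roles sort after all known ones.
        key=lambda c: (ROLE_ORDER.get(c[0], len(ROLE_ORDER)), -c[1]),
    )

chunks = [
    ("troubleshooting", 0.88, "If the token request fails, check the credentials."),
    ("parameters", 0.82, "OAuth requires client_id and client_secret."),
    ("overview", 0.80, "API authentication uses OAuth 2.0."),
]
roles = [c[0] for c in order_by_importance(chunks)]
print(roles)  # ['overview', 'parameters', 'troubleshooting']
```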

***

## How to Solve

* Cluster chunks by topic before assembly.
* Order clusters by relevance.
* Preserve document order within clusters.
* Detect and remove redundant chunks (cosine > 0.90).
* Inject topic transition markers.
* Maintain document hierarchy (sections, subsections).
* Prefer recent chunks over outdated ones.
* Test context assembly quality with an LLM eval (coherence score).

See [Context Assembly](/rag-scenarios-and-solutions/agent/context-assembly.md).


