# Nested Lists Broken

## The Problem

Multi-level nested lists lose their hierarchical structure during chunking, making step-by-step procedures and hierarchical information incomprehensible.

### Symptoms

* ❌ Sub-items separated from parents
* ❌ Indentation levels lost
* ❌ Numbered lists restart incorrectly
* ❌ Cannot determine item relationships
* ❌ Multi-step procedures broken

### Real-World Example

```
Original nested list:

1. Configure API access
   a. Generate API key in dashboard
   b. Store key securely
      i. Use environment variables
      ii. Never commit to git
   c. Test connection
2. Set up webhooks
   - Create webhook endpoint
   - Configure URL in settings
      - Use HTTPS only
      - Add authentication header

Chunk boundary here ↓

Chunk 1:
1. Configure API access
   a. Generate API key in dashboard
   b. Store key securely
      i. Use environment variables

Chunk 2:
      ii. Never commit to git
   c. Test connection
2. Set up webhooks
   - Create webhook endpoint

Lost: "ii" disconnected from parent "b", context unclear
```

***

## Deep Technical Analysis

### List Hierarchy Representation

Nested lists have parent-child relationships:

**Hierarchical Structure:**

```
Level 0: Main items (1, 2, 3)
  Level 1: Sub-items (a, b, c)
    Level 2: Sub-sub-items (i, ii, iii)
      Level 3: Sub-sub-sub-items (α, β, γ)

Relationships:
→ 1.b.ii belongs to 1.b which belongs to 1
→ Cannot understand "ii" without knowing parent context
```

**The Context Loss Problem:**

```
Chunk contains only:
"ii. Never commit to git"

Without parent context:
→ What does "ii" refer to?
→ "Never commit to git" - commit what?
→ Why is this important?

Full context needed:
"1. Configure API access > b. Store key securely > ii. Never commit to git"

Hierarchy provides meaning
```

### Markdown Indentation Detection

Nested lists use indentation:

**Format Variations:**

```markdown
2-space indent:
- Parent
  - Child
    - Grandchild

4-space indent:
- Parent
    - Child
        - Grandchild

Tab indent:
- Parent
	- Child
		- Grandchild
```

**Detection Challenges:**

```
Mixed indentation:
- Parent (0 spaces)
  - Child (2 spaces)
      - Grandchild (6 spaces) ← Inconsistent jump

Or:
- Parent
   - Child (3 spaces) ← Not multiple of 2
      - Grandchild (6 spaces)

Parser must:
→ Detect indent size (2 vs 4 vs tab)
→ Handle inconsistencies
→ Infer hierarchy from relative indents
```

**The Whitespace Ambiguity:**

```
Spaces vs tabs:
- Parent (space-indented)
	- Child (tab-indented)
  - Another child (space-indented)

Is "Another child" at same level as "Child"?
→ Depends on tab width (4 or 8 spaces?)
→ Visual appearance may differ from structure
```

### Numbered List State

Numbered lists have sequential state:

**Numbering Types:**

```
1. First
2. Second
3. Third

vs.

1. First
1. Second  
1. Third  ← Auto-renumbered by renderer

vs.

a. First
b. Second
c. Third

vs.

i. First
ii. Second
iii. Third
```

**State Tracking:**

```
Parser must track:
→ Current number/letter
→ Numbering scheme (1, a, i, I, A)
→ Nesting level
→ Reset points (when does numbering restart?)

Chunk boundary issues:
Chunk 1 ends at: "2. Second item"
Chunk 2 starts at: "3. Third item"

Chunk 2 isolated:
→ "Why does it start at 3?"
→ Missing items 1 and 2
→ Incomplete sequence
```

### Mixed List Types

Lists can mix ordered and unordered:

**Hybrid Structure:**

```markdown
1. Main step
   - Detail point
   - Another detail
2. Next main step
   a. Sub-step A
   b. Sub-step B
      - Implementation note
      - Another note
3. Final step
```

**Type Transitions:**

```
Complexity:
→ Ordered (1, 2, 3)
  → Unordered (-, -)
→ Ordered (a, b)
  → Unordered (-, -)

Each transition must be tracked:
→ What type is parent?
→ What type is current level?
→ Numbering vs bullets

Chunking must preserve:
"Step 2 > Sub-step b > Implementation note (bullet)"
```

### Semantic Meaning of Lists

Different lists serve different purposes:

**Procedural Steps:**

```markdown
1. Start the server
2. Configure settings
3. Test connection
```

**Semantic requirement:** → ORDER MATTERS → Must do step 1 before step 2 → Cannot rearrange

If chunked separately: → Chunk with step 2 is incomplete → User doesn't know prerequisites → Procedure fails

**Feature Lists:**

```markdown
Features:
- Fast performance
- Easy to use
- Highly scalable
```

**Semantic requirement:** → Order doesn't matter → Each item independent → Can present in any order

Chunking impact: → Less critical if split → But: Loses grouping under "Features"

**Decision Trees:**

```markdown
If error occurs:
1. Check API key
   - Valid? Proceed to step 2
   - Invalid? Generate new key
2. Check rate limits
   - Within limits? Proceed to step 3
   - Exceeded? Wait and retry
3. Check server status
```

**Semantic requirement:** → Conditional logic → Decision branches → Cannot isolate one branch

Chunking breaks: → Decision tree structure → Conditional relationships → User can't follow logic

### Multi-Paragraph List Items

List items can contain multiple paragraphs:

**Complex Item Structure:**

```markdown
1. First step: Do something important

   This requires careful attention. Make sure you:
   - Check all prerequisites
   - Verify permissions
   - Back up data

   Once complete, move to step 2.

2. Second step: Do the next thing
```

**Chunking Challenge:**

```
How to keep together:
→ Item number/bullet
→ First paragraph
→ Sub-lists within item
→ Additional paragraphs
→ All belong to same list item

Naive chunker sees:
→ Line: "1. First step: Do something important"
→ Blank line
→ Paragraph: "This requires careful attention..."
→ Sub-list: "- Check all prerequisites"

May think these are separate blocks:
→ List item
→ Regular paragraph (not in list)
→ New list (not a sub-list)

Structure lost
```

### List Continuation After Interruption

Lists can resume after other content:

**Interrupted Lists:**

```markdown
1. First item
2. Second item

> Note: Pay attention to the following steps

3. Third item (continues list!)
4. Fourth item
```

**Parser Challenge:**

```
Is "3. Third item" a new list or continuation?

Markdown interpretation:
→ Blank line + blockquote = list interruption
→ "3. Third item" continues original list

HTML output:
<ol>
  <li>First item</li>
  <li>Second item</li>
</ol>
<blockquote>Note...</blockquote>
<ol start="3">
  <li>Third item</li>
  <li>Fourth item</li>
</ol>

Chunking implications:
→ Items 1-2 in one chunk
→ Items 3-4 in another chunk
→ Must track: This is list continuation, not new list
→ "start=3" attribute matters
```

### Nested List Flattening Strategies

Converting hierarchical lists to linear text:

**Strategy 1: Flatten with Prefixes:**

```
Original:
1. Config
   a. API key
   b. Endpoint
2. Deploy

Flattened:
"1. Config - a. API key"
"1. Config - b. Endpoint"  
"2. Deploy"

Pros: Preserves context
Cons: Repetitive, verbose
```

**Strategy 2: Breadcrumb Style:**

```
"1. Config > a. API key"
"1. Config > b. Endpoint"
"2. Deploy"

Pros: Clear hierarchy, compact
Cons: Non-standard notation
```

**Strategy 3: Natural Language:**

```
"Step 1 (Config): Sub-step a (API key)"
"Step 1 (Config): Sub-step b (Endpoint)"
"Step 2: Deploy"

Pros: Readable, explicit
Cons: Very verbose, more tokens
```

**Strategy 4: Indented Text:**

```
"1. Config
     a. API key
     b. Endpoint
2. Deploy"

Pros: Preserves visual structure
Cons: Indentation may not render correctly in all contexts
```

***

## How to Solve

**Implement indent-level tracking + keep list items with all their children together + flatten nested lists with breadcrumb notation (parent > child) + preserve numbering continuity + add semantic labels ("Step 1a", "Sub-item").** See [List Structure Preservation](/rag-scenarios-and-solutions/chunking/nested-lists.md).


---

# Agent Instructions: Querying This Documentation

If you need additional information that is not directly available in this page, you can query the documentation dynamically by asking a question.

Perform an HTTP GET request on the current page URL with the `ask` query parameter:

```
GET https://help.twig.so/rag-scenarios-and-solutions/chunking/nested-lists.md?ask=<question>
```

The question should be specific, self-contained, and written in natural language.
The response will contain a direct answer to the question and relevant excerpts and sources from the documentation.

Use this mechanism when the answer is not explicitly present in the current page, you need clarification or additional context, or you want to retrieve related documentation sections.
