Footnotes and References Lost

The Problem

Footnotes, endnotes, and citations are separated from their reference markers during chunking. The chunk that holds the marker loses the context the note supplies, and academic or legal citations lose their source.

Symptoms

  • ❌ Reference markers ([1], *, †) appear without their footnotes

  • ❌ Footnotes land in a different chunk from the text that cites them

  • ❌ "See note 5" appears, but note 5 is not in the chunk

  • ❌ Academic citations are incomplete

  • ❌ Legal references are missing their source

Real-World Example

Original document:

The API rate limit¹ is 1000 requests per hour for free users².

────────────────────────
¹ Rate limits reset at midnight UTC
² Enterprise plans have higher limits

Chunk boundary falls between text and footnotes ↓

Chunk 1:
"The API rate limit¹ is 1000 requests per hour for free users²."

Chunk 2:
"¹ Rate limits reset at midnight UTC
² Enterprise plans have higher limits"

User retrieves Chunk 1 and asks: "What does ¹ mean?"
The AI cannot resolve the reference because the footnote is in a different chunk.
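
To make the failure concrete, here is a minimal sketch that checks a chunk for superscript markers whose definitions it does not contain. It assumes, as in the example above, that a footnote definition is a line starting with a superscript digit; `unresolved_markers` is a hypothetical helper, and other notations such as [1] or † are not handled.

```python
import re

# Superscript digits used as footnote markers in the example (¹, ², ...).
# Assumption: only this notation is handled; [1], * and † are ignored here.
SUPERSCRIPTS = "⁰¹²³⁴⁵⁶⁷⁸⁹"
MARKER = re.compile(rf"[{SUPERSCRIPTS}]+")
# A footnote definition is a line that starts with a superscript marker.
DEFINITION = re.compile(rf"^([{SUPERSCRIPTS}]+)\s", re.MULTILINE)


def unresolved_markers(chunk: str) -> set[str]:
    """Return markers referenced in the chunk whose definitions are not in it."""
    defined = set(DEFINITION.findall(chunk))
    # Drop the definition labels so they are not counted as references.
    body = DEFINITION.sub("", chunk)
    return set(MARKER.findall(body)) - defined


chunk_1 = "The API rate limit¹ is 1000 requests per hour for free users²."
chunk_2 = "¹ Rate limits reset at midnight UTC\n² Enterprise plans have higher limits"
print(unresolved_markers(chunk_1))  # both markers are dangling
print(unresolved_markers(chunk_2))  # empty: nothing references these notes
```

A chunker can run a check like this after splitting and treat any non-empty result as a sign that the boundary cut a marker off from its note.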

Deep Technical Analysis

Footnote Types and Formats

Different notation systems:

Numbering Systems:

Placement Variations:

Detection Challenges:
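
A detector has to recognize several notations at once. The following is a minimal sketch, assuming three notation systems taken from the symptoms list: superscript digits, bracketed numbers, and symbol markers such as * and †. `find_markers` and its patterns are illustrative, not a complete grammar; requiring a symbol to follow a word character is one simple heuristic for rejecting look-alikes such as the * in "2 * 3".

```python
import re

# Hypothetical patterns for three notation systems. Symbol markers must
# follow a word character, a simple heuristic that skips "2 * 3".
PATTERNS = {
    "superscript": re.compile(r"[⁰¹²³⁴⁵⁶⁷⁸⁹]+"),
    "bracket": re.compile(r"\[(\d+)\]"),
    "symbol": re.compile(r"(?<=\w)([*†‡§]+)"),
}


def find_markers(text: str) -> list[tuple[int, str, str]]:
    """Return (offset, notation, label) for every marker, in reading order."""
    found = []
    for notation, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            # Use the captured label when the pattern has a group.
            label = match.group(1) if pattern.groups else match.group(0)
            found.append((match.start(), notation, label))
    return sorted(found)


print(find_markers("Rate limit¹ applies [2] to users† only"))
```

The superscript pattern will also fire on exponents such as x², and the bracket pattern on array indices in code samples; these are the kinds of ambiguity a regular expression alone cannot settle.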

Marker-to-Note Matching

Associating references with definitions:

Matching Algorithm:

The Multiple Reference Problem:

Cross-Chapter Footnotes:
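
One way to match markers to notes is to parse the note block into a label-to-text map, then attach to each chunk the notes its markers point at. This sketch assumes superscript labels and one note per line; `parse_notes` and `attach_notes` are hypothetical names. It copies a note into every chunk that references it, one possible answer to the multiple reference problem, and records markers whose definitions were not found (for example, notes defined in another chapter).

```python
import re

SUP = "⁰¹²³⁴⁵⁶⁷⁸⁹"
# Assumption: one note per line, each starting with its superscript label.
NOTE_LINE = re.compile(rf"^([{SUP}]+)[ \t]+(.*)$", re.MULTILINE)
MARKER = re.compile(rf"[{SUP}]+")


def parse_notes(section: str) -> dict[str, str]:
    """Map each footnote label to its definition text."""
    return {label: text.strip() for label, text in NOTE_LINE.findall(section)}


def attach_notes(chunks: list[str], notes: dict[str, str]) -> list[dict]:
    """Pair every chunk with the definitions of the markers it references.

    A note referenced from several chunks is copied into each of them, so
    every chunk can be read on its own.
    """
    result = []
    for chunk in chunks:
        labels = dict.fromkeys(MARKER.findall(chunk))  # deduplicated, in order
        result.append({
            "text": chunk,
            "notes": {label: notes[label] for label in labels if label in notes},
            "missing": [label for label in labels if label not in notes],
        })
    return result


note_map = parse_notes("¹ Rate limits reset at midnight UTC\n² Enterprise plans have higher limits")
chunks = [
    "The API rate limit¹ is 1000 requests per hour for free users².",
    "Limits¹ apply per API key³.",
]
for record in attach_notes(chunks, note_map):
    print(record["notes"], record["missing"])
```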

Academic Citations

Scholarly documents use formal citations:

Citation Formats:

Inline vs Bibliography:

Citation Clustering:
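
For numeric citation styles, a chunk has to carry the bibliography entries it cites even when the bibliography sits at the end of the document. The sketch below assumes IEEE-style bracketed numbers, expands clusters such as [1-3, 5] into individual references, and looks each one up in a bibliography dictionary. Author-year styles are not covered, and `expand_cluster` and `resolve_citations` are hypothetical names.

```python
import re

# Assumption: IEEE-style numeric citations such as [3], [1, 4] or [2-5].
ITEM = r"\d+(?:\s*[-–]\s*\d+)?"  # a single number or a range
CLUSTER = re.compile(rf"\[({ITEM}(?:\s*,\s*{ITEM})*)\]")
RANGE = re.compile(r"(\d+)\s*[-–]\s*(\d+)")


def expand_cluster(inner: str) -> list[int]:
    """Turn the inside of a cluster, e.g. '1-3, 5', into [1, 2, 3, 5]."""
    numbers = []
    for part in inner.split(","):
        part = part.strip()
        span = RANGE.fullmatch(part)
        if span:
            start, end = int(span.group(1)), int(span.group(2))
            numbers.extend(range(start, end + 1))
        else:
            numbers.append(int(part))
    return numbers


def resolve_citations(chunk: str, bibliography: dict[int, str]) -> dict[int, str]:
    """Collect the bibliography entries cited anywhere in a chunk."""
    cited = {}
    for match in CLUSTER.finditer(chunk):
        for number in expand_cluster(match.group(1)):
            if number in bibliography:
                cited[number] = bibliography[number]
    return cited


bibliography = {n: f"Reference entry {n}" for n in range(1, 8)}
print(resolve_citations("Prior work [1-3, 5] and later results [7]", bibliography))
```

The resolved entries can be stored with the chunk, so a retrieved passage still shows what "[1-3, 5]" refers to.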

Legal Citations

Legal documents have specific citation requirements:

Legal Citation Format:

The String Citation:

Abbreviated Citations:
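
In legal writing, "Id." refers to the authority cited immediately before it, so a chunk containing only "Id. at 14." has lost its source. The following sketch assumes each note holds exactly one citation (it does not handle string citations or "supra"); it walks the notes in order and replaces each "Id." with the most recent full citation, keeping any pin cite separately. `resolve_id_citations` is a hypothetical helper, and the case names in the usage example are made up.

```python
import re

# Matches "Id." or "Id. at 14." when it is the whole citation.
ID_CITE = re.compile(r"Id\.(?:\s+at\s+([\w–-]+))?\.?")


def resolve_id_citations(citations: list[str]) -> list[dict]:
    """Replace each "Id." with the full citation that precedes it.

    Assumption: each note holds exactly one citation, so the preceding
    note is always the authority "Id." refers to.
    """
    resolved = []
    source = None
    for cite in (c.strip() for c in citations):
        match = ID_CITE.fullmatch(cite)
        if match is None:
            source = cite  # a full citation becomes the new "Id." target
            resolved.append({"source": cite, "pin": None})
        elif source is None:
            raise ValueError(f"{cite!r} has no preceding citation")
        else:
            resolved.append({"source": source, "pin": match.group(1)})
    return resolved


notes = [
    "Alpha v. Beta, 1 Example Rptr. 10, 12 (2000).",  # hypothetical case
    "Id.",
    "Id. at 14.",
]
for entry in resolve_id_citations(notes):
    print(entry)
```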

Footnote Content Length

Footnotes vary from brief to extensive:

Short Footnotes:

Long Footnotes:

The Inclusion Decision:
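
The inclusion decision can follow the rule given under How to Solve: inline footnotes shorter than 50 words and link longer ones as metadata. This is a minimal sketch assuming superscript markers and a label-to-text map of notes; `apply_footnotes` and its `max_inline_words` parameter are hypothetical. A short note replaces its marker with a parenthetical, while a long note leaves the marker in place and is returned for storage alongside the chunk.

```python
import re

SUP = "⁰¹²³⁴⁵⁶⁷⁸⁹"


def apply_footnotes(
    text: str, notes: dict[str, str], max_inline_words: int = 50
) -> tuple[str, dict[str, str]]:
    """Inline short notes as parentheticals; return long notes for metadata.

    Returns (rewritten text, {label: note} for notes too long to inline).
    """
    linked = {}
    for label, note in notes.items():
        # Match the whole marker so "¹" does not fire inside "¹⁰".
        pattern = re.compile(rf"(?<![{SUP}]){re.escape(label)}(?![{SUP}])")
        if not pattern.search(text):
            continue  # this chunk does not reference the note
        if len(note.split()) < max_inline_words:
            inline = f" ({note})"
            # A function replacement avoids backslash handling in the note.
            text = pattern.sub(lambda _m: inline, text)
        else:
            linked[label] = note  # marker stays; note is stored alongside
    return text, linked


long_note = " ".join(["word"] * 60)  # stand-in for a 60-word note
chunk_text, linked_notes = apply_footnotes(
    "The API rate limit¹ is 1000 requests per hour for free users².",
    {"¹": "Rate limits reset at midnight UTC", "²": long_note},
)
print(chunk_text)
print(list(linked_notes))
```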

Inline Notes vs Margin Notes

Different annotation styles:

Inline Parenthetical:

Margin Notes:

Reference Loops and Nested Notes

Complex referencing structures:

Footnote Referencing Footnote:

Circular References:
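
When one footnote points to another, the chunk needs the whole chain, and circular references must not cause an infinite loop. This sketch assumes cross-references are written as "note N" or "footnote N"; `collect_notes` is a hypothetical helper that walks the links depth-first and uses a visited set to stop at cycles.

```python
import re

# Assumption: cross-references read "note 5" or "footnote 5".
NOTE_REF = re.compile(r"\b(?:foot)?note\s+(\d+)", re.IGNORECASE)


def collect_notes(start: str, notes: dict[str, str]) -> list[str]:
    """Return start plus every note reachable through "see note N" links.

    The visited set stops the walk when notes refer to each other in a loop.
    """
    order, visited, stack = [], set(), [start]
    while stack:
        label = stack.pop()
        if label in visited or label not in notes:
            continue
        visited.add(label)
        order.append(label)
        # Push in reverse so references are explored in reading order.
        stack.extend(reversed(NOTE_REF.findall(notes[label])))
    return order


notes = {
    "1": "Limits reset daily. See note 3.",
    "2": "Applies to the legacy API only.",
    "3": "Details are in footnote 4.",
    "4": "Compare note 1.",  # closes a loop back to note 1
}
print(collect_notes("1", notes))  # ['1', '3', '4']
```

Every label in the returned list would be attached to a chunk that references note 1; note 2 is not reachable and is left out.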

Embedding and Retrieval Impact

Footnotes affect semantic search:

Footnote Content in Embeddings:

Citation Noise:
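
To keep citation markers out of embeddings without losing them, the text can be split into a clean version for the embedding model and a list of the citations that were removed, stored as chunk metadata. The sketch assumes bracketed numeric citations and superscript markers; `split_for_embedding` and the `chunk_record` layout are hypothetical.

```python
import re

# Assumption: citations are bracketed numbers ([3], [1, 4], [2-5]) or
# superscript markers; the space before a bracket is removed too.
CITATION = re.compile(r"\s*\[\d+(?:\s*[-–,]\s*\d+)*\]|[⁰¹²³⁴⁵⁶⁷⁸⁹]+")


def split_for_embedding(text: str) -> tuple[str, list[str]]:
    """Return (text to embed, citations removed from it)."""
    citations = [m.group().strip() for m in CITATION.finditer(text)]
    return CITATION.sub("", text), citations


embed_text, citations = split_for_embedding(
    "Rate limits apply [3, 4] to free users² only."
)
# Embed only the clean text; keep the citations next to it for display.
chunk_record = {"embed_text": embed_text, "citations": citations}
print(chunk_record)
```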


How to Solve

  • Detect footnote markers (superscripts, brackets)

  • Match each marker to its footnote definition at the end of the page or section

  • Inline short footnotes (under 50 words) directly into the text

  • Link long footnotes as metadata

  • Resolve "Id." and other abbreviated citations

  • Strip citation brackets from the text used for embeddings, but store the citations separately

See Footnote Handling.
