# Broken Cross-References

## The Problem

Links and references between documents break during ingestion, causing the AI to cite pages that do not exist or to miss related content it should have followed.

### Symptoms

* ❌ "See section 3.2" appears, but the section is not linked
* ❌ Hyperlinks become plain text
* ❌ "Click here" with no actual link
* ❌ Cross-document references lost
* ❌ Cannot navigate related content

### Real-World Example

```
Source HTML documentation:
"For authentication details, see <a href="/docs/auth">Authentication Guide</a>"

After ingestion:
"For authentication details, see Authentication Guide"
→ Link lost, just plain text

AI response:
"See Authentication Guide for details"
→ User: "Where is Authentication Guide?"
→ No way to navigate
```

***

## Deep Technical Analysis

### Link Extraction Failure

**HTML to Text Conversion:**

```
HTML: <a href="/docs/setup">setup instructions</a>

Naive extraction: "setup instructions"
→ Link URL lost

Better extraction:
"setup instructions (/docs/setup)"
→ Preserve URL in text

Or metadata:
{
  text: "setup instructions",
  link: "/docs/setup",
  link_type: "internal"
}
```
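
The "better extraction" idea above can be sketched with Python's standard-library `html.parser`. This is a minimal illustration, not any particular ingestion tool's implementation: the class name `LinkPreservingExtractor` and the helper `extract` are hypothetical, and it handles only flat `<a href>` tags.

```python
from html.parser import HTMLParser


class LinkPreservingExtractor(HTMLParser):
    """Collects visible text as Markdown and records every <a href> it sees."""

    def __init__(self):
        super().__init__()
        self.parts = []     # pieces of the output text
        self.links = []     # [{"url": ..., "anchor_text": ...}]
        self._href = None   # href of the <a> currently open, if any
        self._anchor = []   # text collected inside the open <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._anchor = []

    def handle_data(self, data):
        # Text inside a link is buffered; everything else is emitted directly.
        if self._href is not None:
            self._anchor.append(data)
        else:
            self.parts.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            text = "".join(self._anchor)
            self.parts.append(f"[{text}]({self._href})")
            self.links.append({"url": self._href, "anchor_text": text})
            self._href = None


def extract(html):
    """Return Markdown text plus link metadata for one HTML fragment."""
    parser = LinkPreservingExtractor()
    parser.feed(html)
    parser.close()
    return {"markdown": "".join(parser.parts), "links": parser.links}
```

Calling `extract('see <a href="/docs/setup">setup instructions</a>')` keeps the URL both inline (as a Markdown link) and in the `links` list, so either storage strategy remains available downstream.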

**Relative vs Absolute URLs:**

```
Relative: href="/docs/auth"
→ Needs base URL to resolve
→ Without base: Broken link

Absolute: href="https://example.com/docs/auth"
→ Self-contained
→ But: May be external (outside knowledge base)

Relative URLs must be normalized to absolute URLs
```
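
A small sketch of that normalization step, using `urllib.parse.urljoin` from the standard library. The function name `normalize_link` and the `kb_hosts` parameter (the set of hostnames considered part of the knowledge base) are assumptions made for illustration.

```python
from urllib.parse import urljoin, urlparse


def normalize_link(href, page_url, kb_hosts):
    """Resolve href against the page it appeared on and classify it.

    page_url is the URL of the source page (the base for relative links).
    kb_hosts is a hypothetical set of hostnames that count as "internal".
    """
    absolute = urljoin(page_url, href)
    host = urlparse(absolute).netloc
    link_type = "internal" if host in kb_hosts else "external"
    return {"url": absolute, "link_type": link_type}
```

This only works if the source page URL is captured at ingestion time; once a chunk is stored without it, a relative `href` can no longer be resolved.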

### Internal Reference Resolution

**Section References:**

```
"See section 3.2 for details"
→ Implicit reference
→ Which document's section 3.2?

Without context:
→ Cannot resolve
→ Link broken

Need: Document structure metadata
```
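
One way to use document structure metadata is to treat a bare "section N.M" as a reference into the same document the chunk came from. The sketch below assumes a `section_index` built during ingestion that maps `(doc_id, section_number)` to a chunk ID; that index, the regex, and the function name are all illustrative.

```python
import re

# Matches "section 3", "Section 3.2", "section 3.2.1", and so on.
SECTION_REF = re.compile(r"\bsection\s+(\d+(?:\.\d+)*)", re.IGNORECASE)


def resolve_section_refs(chunk_text, doc_id, section_index):
    """Find section references in a chunk and look up their target chunks.

    section_index is assumed to map (doc_id, section_number) -> chunk_id.
    Unresolvable references get target_chunk=None instead of being dropped.
    """
    resolved = []
    for match in SECTION_REF.finditer(chunk_text):
        number = match.group(1)
        resolved.append({
            "reference": match.group(0),
            "target_chunk": section_index.get((doc_id, number)),
        })
    return resolved
```

Keeping unresolved references in the output (with `None`) makes it possible to report them later rather than silently losing them.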

**Anchor Links:**

```
"<a href="#troubleshooting">Jump to troubleshooting</a>"
→ Same-page anchor
→ Page context lost after chunking

Chunk 5: "Jump to troubleshooting"
→ Where is "troubleshooting" section?
→ In chunk 12 of same document

Need: Intra-document link mapping
```
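
An intra-document link map can be sketched as a lookup from `(doc_id, anchor)` to the chunk that contains that anchor. The chunk fields used here (`doc_id`, `chunk_id`, `anchors`, `outbound_links`) are an assumed schema for illustration, not a fixed format.

```python
def build_anchor_index(chunks):
    """Map (doc_id, anchor_id) -> chunk_id for every anchor defined in a chunk."""
    index = {}
    for chunk in chunks:
        for anchor in chunk.get("anchors", []):
            index[(chunk["doc_id"], anchor)] = chunk["chunk_id"]
    return index


def resolve_anchor_links(chunks):
    """Attach target_chunk to each same-page link such as "#troubleshooting"."""
    index = build_anchor_index(chunks)
    for chunk in chunks:
        for link in chunk.get("outbound_links", []):
            if link["url"].startswith("#"):
                key = (chunk["doc_id"], link["url"][1:])
                link["target_chunk"] = index.get(key)  # None if anchor not found
    return chunks
```

For this to work, the extractor has to record element `id` attributes (the `anchors` list) while chunking, since they are not part of the visible text.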

### Citation Accuracy

**"See Also" Links:**

```
Documentation: "See also: Rate Limiting, Authentication"
→ Related topics listed

After ingestion:
→ Just plain text
→ No links to those topics

AI can mention them:
→ But cannot provide direct access
→ User must search manually
```
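
If the knowledge base keeps an index of document titles, a plain-text "See also" line can be re-linked after ingestion. The sketch below assumes a hypothetical `title_to_url` dictionary keyed by lowercased title; topics without a match are left as plain text rather than guessed.

```python
def link_see_also(line, title_to_url):
    """Turn "See also: A, B" into Markdown links where a title match exists.

    title_to_url is assumed to map lowercased document titles to URLs.
    """
    prefix, sep, topics = line.partition(":")
    if not sep:
        return line  # not a "label: topics" line, leave untouched
    linked = []
    for topic in topics.split(","):
        title = topic.strip()
        url = title_to_url.get(title.lower())
        linked.append(f"[{title}]({url})" if url else title)
    return f"{prefix}: {', '.join(linked)}"
```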

**Page Numbers:**

```
PDF: "See page 42 for details"
→ Page numbers lost in text extraction
→ PDF converted to continuous text

"See page 42" meaningless without page structure
→ Need: Map page numbers to chunk IDs
```
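
A simple way to keep page references meaningful is to chunk page by page and record which chunks came from which page. This sketch assumes the PDF extractor already returns one text string per page; the function name and the fixed-size splitting are illustrative choices, not a recommended chunking strategy.

```python
def chunk_by_page(pages, max_chars=1000):
    """Split per-page text into chunks and build a page -> chunk IDs map.

    pages is a list of page texts, where index 0 is page 1.
    Chunks never cross a page boundary, so each chunk has exactly one page.
    """
    chunks = []
    page_to_chunks = {}
    for page_number, text in enumerate(pages, start=1):
        for start in range(0, len(text), max_chars):
            chunk_id = len(chunks)
            chunks.append({
                "chunk_id": chunk_id,
                "page": page_number,
                "text": text[start:start + max_chars],
            })
            page_to_chunks.setdefault(page_number, []).append(chunk_id)
    return chunks, page_to_chunks
```

With the map in place, "See page 42" can be resolved by looking up `page_to_chunks.get(42)`. Empty pages produce no chunks and therefore have no entry in the map.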

### Link Preservation Strategies

**Markdown Format:**

```
Store as Markdown with links:
"For details, see [Authentication Guide](/docs/auth)"

Benefits:
→ Links preserved
→ Can render as HTML
→ AI can cite with link

Metadata:
{
  markdown: "...with [link](url)...",
  links: ["/docs/auth"]
}
```

**Hyperlink Metadata:**

```
Each chunk:
{
  text: "...",
  outbound_links: [
    {url: "/docs/auth", anchor_text: "Authentication Guide"},
    {url: "#section-3", anchor_text: "section 3"}
  ],
  inbound_links: [...]
}

Enables:
→ Link graph construction
→ Related content discovery
```
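
Given per-chunk `outbound_links`, the inbound side of the link graph can be derived in one pass. The sketch below assumes a `url_to_chunk` lookup (URL to the chunk that represents it) exists from ingestion; that lookup and the `from_chunk` field name are hypothetical.

```python
from collections import defaultdict


def add_inbound_links(chunks, url_to_chunk):
    """Fill inbound_links on every chunk from the outbound_links of the others.

    url_to_chunk is assumed to map a link URL to the chunk_id it points at.
    Links whose URL is not in the lookup are skipped.
    """
    inbound = defaultdict(list)
    for chunk in chunks:
        for link in chunk.get("outbound_links", []):
            target = url_to_chunk.get(link["url"])
            if target is not None:
                inbound[target].append({
                    "from_chunk": chunk["chunk_id"],
                    "anchor_text": link["anchor_text"],
                })
    for chunk in chunks:
        chunk["inbound_links"] = inbound.get(chunk["chunk_id"], [])
    return chunks
```

Chunks with many inbound links are natural candidates for "related content" suggestions, since other pages already point readers at them.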

***

## How to Solve

**To keep cross-references intact:**

* Preserve links during extraction (convert to Markdown or metadata)
* Resolve relative URLs to absolute
* Extract and store hyperlink metadata with chunks
* Implement a document graph (cross-references)
* Map PDF page numbers to chunk IDs
* Include source URLs in AI citations
* Test link integrity post-ingestion

See [Link Management](/rag-scenarios-and-solutions/data-quality/broken-links.md).
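
The final step, testing link integrity after ingestion, can be as simple as checking every internal link against the set of URLs that were actually ingested. This is a rough sketch under an assumed chunk schema (`outbound_links` entries with `url` and `link_type`); `known_urls` is a hypothetical set of ingested page URLs without fragments.

```python
from urllib.parse import urldefrag


def find_broken_links(chunks, known_urls):
    """Return (chunk_id, url) pairs for internal links with no ingested target.

    Fragments ("#section") are stripped before the lookup, so this checks
    that the page exists, not that the anchor inside it exists.
    """
    broken = []
    for chunk in chunks:
        for link in chunk.get("outbound_links", []):
            if link.get("link_type") != "internal":
                continue  # external links are outside the knowledge base
            page_url = urldefrag(link["url"]).url
            if page_url not in known_urls:
                broken.append((chunk["chunk_id"], link["url"]))
    return broken
```

Running a check like this after each ingestion run gives a concrete list of references the AI could cite but users could not follow.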

