Skip to main content
Version: 2.22

Preprocessors

Module haystack_experimental.components.preprocessors.md_header_level_inferrer

MarkdownHeaderLevelInferrer

Infers and rewrites header levels in Markdown text to normalize hierarchy.

First header → Always becomes level 1 (#) Subsequent headers → Level increases if no content between headers, stays same if content exists Maximum level → Capped at 6 (######)

Usage example

python
from haystack import Document
from haystack_experimental.components.preprocessors import MarkdownHeaderLevelInferrer

# Create a document with uniform header levels
text = "## Title
## Subheader
Section
## Subheader
More Content"
doc = Document(content=text)

# Initialize the inferrer and process the document
inferrer = MarkdownHeaderLevelInferrer()
result = inferrer.run([doc])

# The headers are now normalized with proper hierarchy
print(result["documents"][0].content)
> # Title
## Subheader
Section
## Subheader
More Content

MarkdownHeaderLevelInferrer.__init__

python
def __init__()

Initializes the MarkdownHeaderLevelInferrer.

MarkdownHeaderLevelInferrer.run

python
@component.output_types(documents=list[Document])
def run(documents: list[Document]) -> dict

Infers and rewrites the header levels in the content for documents that use uniform header levels.

Arguments:

  • documents: list of Document objects to process.

Returns:

dict: a dictionary with the key 'documents' containing the processed Document objects.