Chunkr integrates Vision-Language Models (VLMs) to provide enhanced content generation, intelligent table processing, formula extraction, and custom segment analysis using visual understanding.

Overview

VLM processing in Chunkr allows you to:
  • Generate enhanced content using fine-tuned models (strategy: LLM)
  • Add custom LLM-powered analysis to any segment type
  • Process complex tables with better structure understanding
  • Extract mathematical formulas as LaTeX
  • Generate custom descriptions for images and diagrams

Generation Strategies

Auto Strategy

Uses heuristic-based generation; fast and efficient:
{
  "segment_processing": {
    "Text": {
      "strategy": "Auto"
    }
  }
}
Best for: Standard text, lists, simple formatting

LLM Strategy

Uses Chunkr’s fine-tuned Vision-Language Models:
{
  "segment_processing": {
    "Table": {
      "strategy": "LLM"
    }
  }
}
Best for: Tables, formulas, pictures, complex layouts
The LLM strategy uses the page image as context to generate more accurate structured content.

Default VLM Configuration

Some segment types use LLM strategy by default:
Segment Type | Default Strategy | Reason
Table        | LLM              | Better structure understanding
Formula      | LLM              | LaTeX extraction from images
Picture      | LLM              | Visual content description
Page         | LLM              | Full page understanding
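
To override a default, set the strategy explicitly; a minimal sketch forcing tables back to heuristic generation:
{
  "segment_processing": {
    "Table": {
      "strategy": "Auto"
    }
  }
}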

Custom LLM Prompts

Add custom LLM-powered analysis to any segment using the llm field:
Python
import requests

response = requests.post(
    "https://api.chunkr.ai/api/v1/task/parse",
    headers={"Authorization": "YOUR_API_KEY"},
    json={
        "file": "https://example.com/document.pdf",
        "segment_processing": {
            "Picture": {
                "format": "Markdown",
                "strategy": "Auto",
                "llm": "Describe this image in detail, focusing on key visual elements and any text present."
            },
            "Table": {
                "format": "Html",
                "strategy": "LLM",
                "llm": "Extract the data and create a summary of key insights from this table."
            },
            "Text": {
                "format": "Markdown",
                "strategy": "Auto",
                "llm": "Summarize this section in 2-3 sentences."
            }
        }
    }
)
The LLM output is stored in segment.llm and can be included in chunks via embed_sources.

Extended Context

Use the full page image as context for LLM generation:
{
  "segment_processing": {
    "Table": {
      "format": "Html",
      "strategy": "LLM",
      "extended_context": true
    }
  }
}
Default: false
Extended context can improve results but increases processing time and token usage.

Configuring LLM Models

Chunkr supports global LLM configuration to control which models are used for VLM processing.

LLM Processing Configuration

A full configuration:
{
  "llm_processing": {
    "model_id": "gpt-4o",
    "fallback_strategy": "Default",
    "max_completion_tokens": 2048,
    "temperature": 0.0
  }
}
Or specify only the model:
{
  "llm_processing": {
    "model_id": "gpt-4o"
  }
}
model_id selects which model is used; check the documentation for available models.
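
llm_processing sits at the top level of the parse request, alongside options like segment_processing (see the complete example below); a minimal request sketch:
Python
import requests

# Parse request with a global LLM configuration
response = requests.post(
    "https://api.chunkr.ai/api/v1/task/parse",
    headers={"Authorization": "YOUR_API_KEY"},
    json={
        "file": "https://example.com/document.pdf",
        "llm_processing": {
            "model_id": "gpt-4o",
            "max_completion_tokens": 2048,
            "temperature": 0.0
        }
    }
)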

Image Cropping

Control when segment images are cropped and stored:
{
  "crop_image": "Auto"
}
Auto, the default for most segments, crops segment images only when needed for post-processing.
Cropped images are available in segment.image as presigned URLs.
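
Because the URL is presigned, the image can be fetched without an Authorization header. A minimal download sketch for a segment dict from the task output (saving as .png is an assumption; the actual format may differ):
Python
import requests

if segment.get("image"):
    resp = requests.get(segment["image"])  # presigned URL; no auth header needed
    resp.raise_for_status()
    with open("segment.png", "wb") as f:  # .png extension is an assumption
        f.write(resp.content)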

Complete VLM Example

Python
import requests
import time

# Advanced VLM configuration
config = {
    "file": "https://example.com/research-paper.pdf",
    "segmentation_strategy": "LayoutAnalysis",
    "ocr_strategy": "All",
    "high_resolution": true,  # Better image quality for VLM
    "llm_processing": {
        "model_id": "gpt-4o",
        "fallback_strategy": "Default",
        "temperature": 0.0
    },
    "segment_processing": {
        "Table": {
            "format": "Html",
            "strategy": "LLM",
            "crop_image": "All",
            "extended_context": true,
            "llm": "Convert this table to structured HTML and provide a brief summary of the data.",
            "embed_sources": ["Content", "LLM"]
        },
        "Picture": {
            "format": "Markdown",
            "strategy": "Auto",
            "crop_image": "All",
            "llm": "Describe this figure, including any charts, graphs, or diagrams. Explain what it illustrates.",
            "embed_sources": ["Content", "LLM"]
        },
        "Formula": {
            "format": "Markdown",
            "strategy": "LLM",
            "crop_image": "All",
            "llm": "Extract this formula as LaTeX and explain what it represents.",
            "embed_sources": ["Content", "LLM"]
        },
        "Text": {
            "format": "Markdown",
            "strategy": "Auto",
            "llm": "Provide a concise summary of this section.",
            "embed_sources": ["Content"]
        }
    }
}

# Create task
response = requests.post(
    "https://api.chunkr.ai/api/v1/task/parse",
    headers={"Authorization": "YOUR_API_KEY"},
    json=config
)
task = response.json()
task_id = task["task_id"]

# Poll for completion
while True:
    response = requests.get(
        f"https://api.chunkr.ai/api/v1/task/{task_id}",
        headers={"Authorization": "YOUR_API_KEY"}
    )
    task = response.json()
    
    if task["status"] == "Succeeded":
        break
    elif task["status"] == "Failed":
        raise Exception("Task failed")
    
    time.sleep(2)

# Access VLM-processed content
for chunk in task["output"]["chunks"]:
    for segment in chunk["segments"]:
        print(f"\nSegment Type: {segment['segment_type']}")
        print(f"Content: {segment['content'][:200]}...")
        
        # Access custom LLM output if configured
        if segment.get('llm'):
            print(f"LLM Analysis: {segment['llm'][:200]}...")
        
        # Access cropped image if available
        if segment.get('image'):
            print(f"Image URL: {segment['image']}")

Accessing VLM Output

VLM-generated content is available in multiple fields:
Python
for segment in chunk["segments"]:
    # Primary content (based on format and strategy)
    content = segment["content"]  # HTML or Markdown
    
    # Original OCR text
    text = segment["text"]
    
    # Custom LLM output (if configured)
    if "llm" in segment and segment["llm"]:
        llm_analysis = segment["llm"]
    
    # Specific format outputs (backward compatibility)
    html = segment["html"]
    markdown = segment["markdown"]
    
    # Cropped image URL (if available)
    if "image" in segment and segment["image"]:
        image_url = segment["image"]

Embed Sources with VLM

Control which VLM outputs are included in chunk embeddings:
{
    "segment_processing": {
        "Table": {
            "format": "Html",
            "strategy": "LLM",
            "llm": "Summarize key data points from this table.",
            "embed_sources": ["Content", "LLM"]
        }
    }
}
Result: The chunk’s embed field will contain:
  1. The HTML table structure (Content)
  2. The LLM summary (LLM)
Order matters! Sources appear in the embed field in the order specified.
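
Once the task succeeds, the combined text can be read from each chunk; a short sketch using the fields above:
Python
for chunk in task["output"]["chunks"]:
    embed_text = chunk["embed"]  # HTML table first, then the LLM summary
    print(embed_text[:200])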

Use Cases

Complex table extraction:
{
    "Table": {
        "format": "Html",
        "strategy": "LLM",
        "extended_context": true,
        "embed_sources": ["Content"]
    }
}
VLM provides better table structure understanding, especially for complex or merged cells.
Formula to LaTeX:
{
    "Formula": {
        "format": "Markdown",
        "strategy": "LLM",
        "crop_image": "All",
        "llm": "Extract this formula as LaTeX.",
        "embed_sources": ["Content", "LLM"]
    }
}
Convert formula images to LaTeX for better searchability and rendering.
Searchable image descriptions:
{
    "Picture": {
        "format": "Markdown",
        "strategy": "Auto",
        "crop_image": "All",
        "llm": "Describe this image in detail for a text-based search system.",
        "embed_sources": ["LLM"]
    }
}
Make images searchable by generating detailed text descriptions.
Section summaries for retrieval:
{
    "Text": {
        "format": "Markdown",
        "strategy": "Auto",
        "llm": "Summarize this section in 2-3 sentences, highlighting key points.",
        "embed_sources": ["LLM", "Content"]
    }
}
Provide summaries alongside full text for multi-level retrieval.

Performance Considerations

VLM processing increases latency and cost. Use strategically for best results.
Tips (combined into one configuration sketch after this list):
  • Use strategy: LLM only for complex content (tables, formulas, pictures)
  • Use strategy: Auto for simple text segments
  • Set extended_context: false unless you need full page context
  • Configure max_completion_tokens appropriately to control costs
  • Use temperature: 0.0 for consistent, deterministic output
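
A configuration sketch applying these tips (the token limit is illustrative):
{
  "segment_processing": {
    "Table": {
      "strategy": "LLM",
      "extended_context": false
    },
    "Formula": {
      "strategy": "LLM"
    },
    "Text": {
      "strategy": "Auto"
    }
  },
  "llm_processing": {
    "max_completion_tokens": 1024,
    "temperature": 0.0
  }
}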

Error Handling with VLM

Control how VLM errors are handled:
{
  "error_handling": "Continue"
}
Options:
  • Fail: Stop processing on any error (default)
  • Continue: Continue processing despite LLM refusals or failures
Use Continue for fault-tolerant processing when some VLM failures are acceptable.
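
Assuming error_handling sits at the top level of the parse request like the other global options, a fault-tolerant request body would look like:
{
  "file": "https://example.com/document.pdf",
  "error_handling": "Continue",
  "segment_processing": {
    "Table": {
      "strategy": "LLM"
    }
  }
}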

Best Practices

  1. Use VLM strategically - Only apply to segments that benefit from visual understanding
  2. Write clear prompts - Be specific about what you want in custom llm prompts
  3. Enable high resolution - Set high_resolution: true for better VLM input quality
  4. Test and iterate - Experiment with different prompts and configurations
  5. Monitor costs - VLM processing can be expensive at scale
  6. Choose appropriate models - Different models have different strengths and costs
