Quickstart Guide
This guide will help you make your first API request to Chunkr and process a document. We’ll cover both local deployment and using the Cloud API.
This quickstart assumes you have Chunkr running locally. If you haven’t installed it yet, check out the Installation Guide.
Prerequisites
Chunkr is Running
Ensure your Chunkr services are up and running, for example by checking them with docker compose ps. You should see all services in the “Up” state.
LLM Configuration
Make sure you’ve configured at least one LLM in models.yaml. See the Installation Guide for details.
Making Your First Request
Using the Web UI
The easiest way to get started is using the built-in web interface:
Configure Processing
Choose your processing options:
- OCR Strategy: All (process all pages) or Auto (selective)
- Segmentation Strategy: LayoutAnalysis (detailed) or Page (simple)
- High Resolution: Enable for better quality (adds ~7s per page)
Using the API
For programmatic access, use the REST API. Here’s how to process a document:
For local development without authentication, you can omit the Authorization header. Authentication is required when deploying to production.
Using Base64 Encoded Files
You can also upload files directly as base64:
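The original request examples did not survive on this page, so the following is only a minimal sketch of how a request body could be built from either a file URL or base64-encoded bytes. The endpoint URL, the "file" field name, and sending the API key as the raw Authorization value are assumptions, not confirmed by this guide; check the API Reference for the actual names.

```python
import base64
import json
import urllib.request

# Hypothetical endpoint for a local deployment; replace with the real one.
API_URL = "http://localhost:8000/api/v1/task/parse"


def build_headers(api_key=None):
    """Return request headers; Authorization is omitted when no key is given."""
    headers = {"Content-Type": "application/json"}
    if api_key:
        # Assumption: the key is sent as-is; your deployment may expect another scheme.
        headers["Authorization"] = api_key
    return headers


def build_payload(file_url=None, file_bytes=None):
    """Build a JSON-serializable body from exactly one of a URL or raw file bytes."""
    if (file_url is None) == (file_bytes is None):
        raise ValueError("pass exactly one of file_url or file_bytes")
    if file_url is not None:
        return {"file": file_url}  # "file" is an assumed field name
    encoded = base64.b64encode(file_bytes).decode("ascii")
    return {"file": encoded}


def submit_task(payload, api_key=None, url=API_URL):
    """POST the payload and return the decoded JSON response (performs network I/O)."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers=build_headers(api_key),
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Calling build_headers() with no key mirrors the local-development case above, where the Authorization header can be left out.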
Polling for Results
Document processing is asynchronous. Use the task ID to check status and retrieve results:
Understanding the Response
The task response contains rich structured data:
Response Fields Explained
- task_id: Unique identifier for tracking this task
- status: Current state (Starting, Processing, Succeeded, Failed)
- output: Array of pages, each containing segments
- segments: Individual layout elements (Title, Text, Table, Picture, etc.)
- content: Generated HTML or Markdown based on configuration
- text: Raw OCR-extracted text
- bbox: Bounding box coordinates (left, top, width, height)
- segment_type: Element type (Title, SectionHeader, Text, ListItem, Table, Picture, Caption, Formula, Footnote, PageHeader, PageFooter)
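The status values and fields listed above suggest a simple polling loop. The sketch below assumes a caller-supplied fetch_task function (hypothetical, for example a GET request that returns the task as a dict) and assumes that output is a list of pages, each with a "segments" list, as described in the field list. It is an illustration, not the client library's own method.

```python
import time

# Terminal states, taken from the status values listed above.
TERMINAL_STATES = {"Succeeded", "Failed"}


def poll_task(fetch_task, task_id, interval=2.0, timeout=300.0, sleep=time.sleep):
    """Call fetch_task(task_id) until the task reaches a terminal status.

    fetch_task is a hypothetical callable returning the task as a dict.
    Raises TimeoutError if no terminal status is seen within `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while True:
        task = fetch_task(task_id)
        if task.get("status") in TERMINAL_STATES:
            return task
        if time.monotonic() >= deadline:
            raise TimeoutError(f"task {task_id} did not finish within {timeout}s")
        sleep(interval)


def iter_segments(task):
    """Yield each segment dict, assuming output is a list of pages with 'segments'."""
    for page in task.get("output") or []:
        yield from page.get("segments", [])
```

Passing fetch_task and sleep as parameters keeps the loop independent of any particular HTTP client and makes it easy to exercise without a running server.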
Configuration Options
OCR Strategy
Controls how OCR is applied:
- All (default): Process all pages with OCR (~0.5s penalty per page)
- Auto: Selective OCR only where needed; uses existing text layer when available
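As a back-of-the-envelope illustration, the per-page figures quoted in this guide (~0.5s for the All OCR strategy, ~7s for High Resolution) can be combined into a rough overhead estimate. The function name is hypothetical and the figures are approximate, so treat the result as an order-of-magnitude guide only.

```python
# Approximate per-page costs quoted in this guide.
OCR_ALL_SECONDS_PER_PAGE = 0.5   # OCR strategy "All"
HIGH_RES_SECONDS_PER_PAGE = 7.0  # High Resolution enabled


def estimate_added_seconds(pages, ocr_all=True, high_resolution=False):
    """Estimate extra processing time, in seconds, for the chosen options."""
    per_page = 0.0
    if ocr_all:
        per_page += OCR_ALL_SECONDS_PER_PAGE
    if high_resolution:
        per_page += HIGH_RES_SECONDS_PER_PAGE
    return pages * per_page


# A 20-page document with both options enabled: 20 * (0.5 + 7.0) seconds.
print(estimate_added_seconds(20, ocr_all=True, high_resolution=True))  # 150.0
```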
Segmentation Strategy
Controls layout analysis:
- LayoutAnalysis (default): Detect all layout elements with bounding boxes
- Page: Treat each page as a single segment (faster, less detailed)
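One way to keep these options consistent is to validate them before building a request. The sketch below uses the option values named above. The key names ocr_strategy and segmentation_strategy are assumptions; high_resolution is the name used in the Troubleshooting section. Confirm all of them against the API Reference.

```python
# Allowed values, as listed in this guide.
OCR_STRATEGIES = ("All", "Auto")
SEGMENTATION_STRATEGIES = ("LayoutAnalysis", "Page")


def build_config(ocr_strategy="All", segmentation_strategy="LayoutAnalysis",
                 high_resolution=False):
    """Return a configuration dict, rejecting values this guide does not list."""
    if ocr_strategy not in OCR_STRATEGIES:
        raise ValueError(f"ocr_strategy must be one of {OCR_STRATEGIES}")
    if segmentation_strategy not in SEGMENTATION_STRATEGIES:
        raise ValueError(
            f"segmentation_strategy must be one of {SEGMENTATION_STRATEGIES}"
        )
    return {
        "ocr_strategy": ocr_strategy,                    # assumed key name
        "segmentation_strategy": segmentation_strategy,  # assumed key name
        "high_resolution": bool(high_resolution),
    }


# A faster, less detailed configuration: selective OCR, one segment per page.
fast_config = build_config(ocr_strategy="Auto", segmentation_strategy="Page")
```

The defaults mirror the ones marked "(default)" above, so calling build_config() with no arguments reproduces the default behavior described in this section.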
Additional Options
Common Use Cases
RAG/LLM Pipeline
Extract chunks for embedding and retrieval:
Table Extraction
Extract structured tables with LLM enhancement:
Image Description
Generate descriptions for images using VLM:
Next Steps
API Reference
Explore all API endpoints and parameters
Configuration
Learn about advanced configuration options
Installation
Deploy Chunkr to production
Examples
See more code examples and use cases
Troubleshooting
Task fails immediately
Check that:
- Your LLM is configured in models.yaml
- The file URL is accessible or base64 is valid
- All required services are running (docker compose ps)
Processing is very slow
Consider:
- Using the Auto OCR strategy instead of All
- Disabling high_resolution if not needed
- Deploying with GPU support (see Installation)
- Scaling up worker replicas in compose.yaml
LLM processing fails
Verify:
- API key is valid in models.yaml
- LLM endpoint is reachable
- Rate limits aren’t exceeded
- Model supports the OpenAI-compatible format