Chunkr’s document processing API converts PDFs, PowerPoint presentations, Word documents, and images into structured, RAG-ready chunks with layout analysis, OCR, and semantic processing.
Quick Start
Upload a Document
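A minimal sketch of the upload step in this section, using only the Python standard library. Only the /api/v1/task/parse path and the task_id field come from this guide. The host, the Authorization header, and the file and file_name JSON fields are assumptions made for illustration and should be checked against the API reference.

```python
import base64
import json
import urllib.request

# Assumptions: host, auth header, and JSON field names are illustrative only.
BASE_URL = "https://api.chunkr.ai"   # hypothetical host
PARSE_PATH = "/api/v1/task/parse"    # endpoint named in this guide


def build_parse_request(file_bytes: bytes, file_name: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a POST request that uploads one document."""
    payload = {
        "file": base64.b64encode(file_bytes).decode("ascii"),  # assumed field name
        "file_name": file_name,                                 # assumed field name
    }
    return urllib.request.Request(
        url=BASE_URL + PARSE_PATH,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Authorization": api_key, "Content-Type": "application/json"},
        method="POST",
    )


def submit(request: urllib.request.Request) -> str:
    """Send the request and return the task_id from the JSON task object."""
    with urllib.request.urlopen(request) as response:
        task = json.load(response)
    return task["task_id"]
```

Building the request separately from sending it keeps the payload easy to inspect before any network call is made.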
Send a POST request to /api/v1/task/parse with your document. The API returns a task object with a task_id for polling.
Configuration Options
Segmentation Strategy
Controls how the document is analyzed and segmented.
- LayoutAnalysis
- Page
LayoutAnalysis is the default strategy. It analyzes the document layout and detects the following element types:
- Title, SectionHeader, Text, ListItem
- Table, Picture, Caption
- Formula, Footnote
- PageHeader, PageFooter
OCR Strategy
Controls optical character recognition processing.
- All
- Auto
All is the default. It processes all pages with OCR and adds ~0.5 seconds of latency per page.
High Resolution Processing
Enables high-resolution images for better-quality cropping and post-processing.
Advanced Examples
Complete Configuration
Python
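The options described above can be gathered into one configuration object. The sketch below validates the strategy values this guide names (LayoutAnalysis/Page, All/Auto, Fail/Continue) and builds a plain dictionary. The key names segmentation_strategy, ocr_strategy, error_handling, and chunk_processing are assumptions, as is the nesting of target_length. The defaults chosen here for high_resolution and target_length are choices of this sketch, not documented API defaults.

```python
from typing import Any, Dict, Optional

# Allowed values taken from this guide.
SEGMENTATION_STRATEGIES = {"LayoutAnalysis", "Page"}
OCR_STRATEGIES = {"All", "Auto"}
ERROR_HANDLING_MODES = {"Fail", "Continue"}


def _require(value: str, allowed: set, name: str) -> str:
    """Return value unchanged, or raise if it is not an allowed option."""
    if value not in allowed:
        raise ValueError(f"{name} must be one of {sorted(allowed)}, got {value!r}")
    return value


def build_config(
    segmentation_strategy: str = "LayoutAnalysis",
    ocr_strategy: str = "All",
    high_resolution: bool = False,   # sketch default, not a documented one
    error_handling: str = "Fail",
    target_length: int = 512,        # within the 512-1024 range suggested below
    expires_in: Optional[int] = None,
) -> Dict[str, Any]:
    """Assemble a configuration dict; key names are assumed, not confirmed."""
    config: Dict[str, Any] = {
        "segmentation_strategy": _require(
            segmentation_strategy, SEGMENTATION_STRATEGIES, "segmentation_strategy"
        ),
        "ocr_strategy": _require(ocr_strategy, OCR_STRATEGIES, "ocr_strategy"),
        "high_resolution": high_resolution,
        "error_handling": _require(error_handling, ERROR_HANDLING_MODES, "error_handling"),
        "chunk_processing": {"target_length": target_length},  # assumed nesting
    }
    if expires_in is not None:
        config["expires_in"] = expires_in  # passed through as-is; units per the API
    return config
```

Validating locally surfaces a misspelled strategy before the request is sent, rather than as a 400 response.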
Updating Task Configuration
You can update a completed task to reprocess it with different settings. The task must have status Succeeded or Failed to be updated.
Python
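A hedged sketch of an update call that enforces the status rule before building the request. The Succeeded/Failed requirement comes from this guide. The PATCH method, the /api/v1/task/{task_id}/parse path, the host, and the Authorization header are assumptions and may not match the actual endpoint.

```python
import json
import urllib.request

BASE_URL = "https://api.chunkr.ai"          # hypothetical host
UPDATABLE_STATUSES = {"Succeeded", "Failed"}  # from this guide


def build_update_request(task: dict, new_config: dict, api_key: str) -> urllib.request.Request:
    """Build a request that reprocesses a finished task with new settings."""
    if task["status"] not in UPDATABLE_STATUSES:
        raise ValueError(
            f"task {task['task_id']} has status {task['status']!r}; "
            "it must be Succeeded or Failed to be updated"
        )
    return urllib.request.Request(
        url=f"{BASE_URL}/api/v1/task/{task['task_id']}/parse",  # assumed path
        data=json.dumps(new_config).encode("utf-8"),
        headers={"Authorization": api_key, "Content-Type": "application/json"},
        method="PATCH",  # assumed method
    )
```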
Deleting Tasks
Python
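As an illustration only, deleting a task might be a single DELETE request keyed by the task ID. The /api/v1/task/{task_id} path, the host, and the Authorization header are assumptions not confirmed by this guide.

```python
import urllib.request
from urllib.parse import quote

BASE_URL = "https://api.chunkr.ai"  # hypothetical host


def build_delete_request(task_id: str, api_key: str) -> urllib.request.Request:
    """Build a DELETE request for one task (path is an assumption)."""
    safe_id = quote(task_id, safe="")  # escape any characters unsafe in a URL path
    return urllib.request.Request(
        url=f"{BASE_URL}/api/v1/task/{safe_id}",
        headers={"Authorization": api_key},
        method="DELETE",
    )
```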
Canceling Tasks
Cancel a task that hasn’t started processing:
Python
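Because this guide states that only tasks with status Starting can be cancelled, a client can check the status first and skip the call otherwise. The sketch below leaves the transport to a caller-supplied function, since the cancel endpoint is not given here; cancel_if_starting and send_cancel are hypothetical names.

```python
from typing import Callable


def cancel_if_starting(task: dict, send_cancel: Callable[[str], None]) -> bool:
    """Cancel the task only if it has not started processing.

    send_cancel is a hypothetical callable that performs the actual API call
    for a given task_id. Returns True if a cancel was attempted.
    """
    if task["status"] != "Starting":
        return False  # already processing or finished; cancelling would be rejected
    send_cancel(task["task_id"])
    return True
```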
The task must have status Starting to be cancelled.
Error Handling
Error Handling Strategy
Control how errors are handled during processing:
- Fail
- Continue
Fail is the default. It stops processing on any error.
Common Error Responses
| Status Code | Error | Description |
|---|---|---|
| 400 | Bad Request | Invalid configuration or file format |
| 404 | Not Found | Task not found or expired |
| 413 | Payload Too Large | File size exceeds limits |
| 429 | Too Many Requests | Usage limit exceeded |
| 500 | Internal Server Error | Processing failed |
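One way to use the table above in client code is to turn error status codes into a single exception type. ApiError and raise_for_status are hypothetical names, and treating only 429 as retryable is a policy guess of this sketch rather than documented behavior.

```python
# Status codes and descriptions copied from the table above.
ERROR_DESCRIPTIONS = {
    400: "Invalid configuration or file format",
    404: "Task not found or expired",
    413: "File size exceeds limits",
    429: "Usage limit exceeded",
    500: "Processing failed",
}

# Policy choice of this sketch: only usage-limit errors are worth retrying later.
RETRYABLE = {429}


class ApiError(Exception):
    """Hypothetical exception carrying the HTTP status code."""

    def __init__(self, status_code: int, message: str) -> None:
        super().__init__(f"{status_code}: {message}")
        self.status_code = status_code
        self.retryable = status_code in RETRYABLE


def raise_for_status(status_code: int) -> None:
    """Raise ApiError for 4xx/5xx codes; return silently otherwise."""
    if status_code < 400:
        return
    message = ERROR_DESCRIPTIONS.get(status_code, "Unexpected error")
    raise ApiError(status_code, message)
```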
Response Structure
See core/src/routes/task.rs:20-48 for the complete response schema.
Best Practices
Choose appropriate strategies
- Use LayoutAnalysis for complex documents with tables, images, and varied layouts
- Use the Page strategy for simple text-only documents
- Use the Auto OCR strategy to optimize speed when documents have good text layers
Optimize for performance
- Set high_resolution: false for documents without important images
- Use reasonable target_length values (512-1024 tokens)
- Configure expires_in to automatically clean up old tasks
Handle task lifecycle
- Poll tasks with exponential backoff to avoid rate limits
- Store task IDs for later retrieval
- Delete tasks when no longer needed to free resources
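Polling with exponential backoff, as recommended above, can be sketched as follows. fetch_task is a hypothetical callable that returns the current task object as a dict. The terminal statuses Succeeded and Failed are the ones this guide mentions; the API may define others. The delay parameters are arbitrary starting points.

```python
import time
from typing import Callable, Iterator

TERMINAL_STATUSES = {"Succeeded", "Failed"}  # statuses named in this guide


def backoff_delays(
    initial: float = 1.0, factor: float = 2.0, max_delay: float = 30.0
) -> Iterator[float]:
    """Yield delays that grow geometrically and are capped at max_delay."""
    delay = initial
    while True:
        yield min(delay, max_delay)
        delay = min(delay * factor, max_delay)


def poll_task(
    fetch_task: Callable[[], dict],
    timeout: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Call fetch_task until the task reaches a terminal status or time runs out."""
    deadline = time.monotonic() + timeout
    for delay in backoff_delays():
        task = fetch_task()
        if task["status"] in TERMINAL_STATUSES:
            return task
        if time.monotonic() + delay > deadline:
            raise TimeoutError("task did not finish before the timeout")
        sleep(delay)
    raise RuntimeError("unreachable: backoff_delays is infinite")
```

Passing sleep as a parameter lets tests substitute a recorder instead of waiting in real time.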
Next Steps
- Learn about custom chunking strategies
- Configure VLM processing for enhanced content generation
- Review the migration guide for API changes