Welcome to Chunkr

Chunkr is a production-ready service for document layout analysis, OCR, and semantic chunking. Convert PDFs, PPTs, Word docs, and images into RAG/LLM-ready chunks with structured output.

Quickstart

Get up and running with your first API request in minutes

Installation

Set up Chunkr locally with Docker and GPU/CPU support

API Reference

Explore the complete API documentation

GitHub

View the open-source repository and contribute

Key Features

Layout Analysis

Automatically detect and segment layout elements including:

Tables with structure preservation
Images and figures
Headers and sections
Lists and captions
Formulas and equations
Text paragraphs

OCR + Bounding Boxes

Optical Character Recognition with precise bounding box coordinates:

Configurable OCR strategies (All or Auto)
High-resolution image processing
Text layer extraction from native PDFs
Support for scanned documents
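
As a rough illustration of the two OCR strategies, the sketch below assumes that `All` runs OCR on every page and that `Auto` runs OCR only when a page has no usable text layer. The `Page` class and `needs_ocr` function are hypothetical names used for illustration; they are not part of the Chunkr API.

```python
from dataclasses import dataclass
from enum import Enum


class OcrStrategy(Enum):
    ALL = "All"    # run OCR on every page
    AUTO = "Auto"  # assumption: run OCR only when no text layer is present


@dataclass
class Page:
    number: int
    text_layer: str  # text embedded in the PDF; empty for scanned pages


def needs_ocr(page: Page, strategy: OcrStrategy) -> bool:
    """Return True when the page should be sent to the OCR engine."""
    if strategy is OcrStrategy.ALL:
        return True
    # AUTO: fall back to OCR only if the page has no usable text layer.
    return not page.text_layer.strip()


pages = [Page(1, "Quarterly report"), Page(2, "")]
auto_pages = [p.number for p in pages if needs_ocr(p, OcrStrategy.AUTO)]
all_pages = [p.number for p in pages if needs_ocr(p, OcrStrategy.ALL)]
print(auto_pages, all_pages)
```

Under these assumptions, `Auto` skips OCR for native pages and keeps their embedded text, while `All` treats every page as if it were scanned.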

Structured Output

Get your documents in multiple formats:

HTML with semantic markup
Markdown for documentation
JSON with coordinates and metadata
Configurable per segment type
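
To make the three output formats concrete, here is a minimal sketch of one segment rendered as Markdown, HTML, and JSON. The `Segment` and `BoundingBox` classes, their field names, and the `"Title"` label are assumptions made for this example and may not match the schema Chunkr actually returns.

```python
import html
import json
from dataclasses import asdict, dataclass


@dataclass
class BoundingBox:
    left: float
    top: float
    width: float
    height: float


@dataclass
class Segment:
    segment_type: str  # hypothetical labels, e.g. "Title", "Text", "Table"
    content: str
    page_number: int
    bbox: BoundingBox


def to_markdown(seg: Segment) -> str:
    # Titles become level-1 headings; everything else is plain text.
    return f"# {seg.content}" if seg.segment_type == "Title" else seg.content


def to_html(seg: Segment) -> str:
    # Escape the content so characters like "&" stay valid HTML.
    tag = "h1" if seg.segment_type == "Title" else "p"
    return f"<{tag}>{html.escape(seg.content)}</{tag}>"


def to_json(seg: Segment) -> str:
    # asdict() converts the nested BoundingBox as well.
    return json.dumps(asdict(seg), sort_keys=True)


title = Segment("Title", "Q3 Results & Outlook", 1, BoundingBox(72.0, 54.0, 468.0, 32.0))
print(to_markdown(title))
print(to_html(title))
print(to_json(title))
```

Choosing the renderer by `segment_type` is one way to picture "configurable per segment type": a table might be emitted as HTML while body text is emitted as Markdown.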

Vision-Language Model Processing

VLM-powered content enhancement:

Table structure extraction
Image description generation
Content summarization
Semantic chunking for RAG
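
One simple way to think about chunking for RAG is to group consecutive segments until a word budget is reached, and to start a new chunk at every header so sections stay together. The sketch below shows that heuristic only; it is not Chunkr's chunking algorithm, and `chunk_segments`, `target_words`, and the `"Header"` label are hypothetical.

```python
from typing import List, Tuple

Segment = Tuple[str, str]  # (segment_type, text)


def chunk_segments(segments: List[Segment], target_words: int) -> List[List[Segment]]:
    """Greedily group segments, breaking at headers and at the word budget."""
    chunks: List[List[Segment]] = []
    current: List[Segment] = []
    current_words = 0
    for seg_type, text in segments:
        words = len(text.split())
        starts_section = seg_type == "Header"
        too_long = current_words + words > target_words
        # Close the running chunk before a new section or when over budget.
        if current and (starts_section or too_long):
            chunks.append(current)
            current, current_words = [], 0
        current.append((seg_type, text))
        current_words += words
    if current:
        chunks.append(current)
    return chunks


segments = [
    ("Header", "Introduction"),
    ("Text", "Chunkr converts documents into chunks."),
    ("Text", "Each chunk keeps related segments together."),
    ("Header", "Results"),
    ("Text", "Tables and figures stay attached to their section."),
]
chunks = chunk_segments(segments, target_words=10)
for chunk in chunks:
    print([seg_type for seg_type, _ in chunk])
```

With a budget of 10 words, the first header stays with its opening paragraph, the second paragraph overflows into its own chunk, and the "Results" header begins a third.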

Open Source vs Cloud API

The open-source version uses community models and is well suited to development and testing. For production workloads that need higher accuracy and enterprise reliability, use the Chunkr Cloud API.

Feature	Open Source	Cloud API
Layout Analysis	Open-source models	Proprietary in-house models
OCR Accuracy	Community OCR engines	Optimized OCR stack
VLM Processing	Basic open VLMs	Enhanced proprietary VLMs
Excel Support	❌	✅ Native parser
Infrastructure	Self-hosted	Fully managed cloud
Support	Discord community	Dedicated support

The open-source release is licensed under AGPL-3.0. For a commercial license that does not require AGPL compliance, contact mehul@chunkr.ai.

Document Types Supported

Chunkr processes a wide range of document formats:

PDF - Native and scanned PDFs with full layout analysis
PowerPoint - PPT and PPTX presentations
Word - DOC and DOCX documents
Images - PNG, JPG, TIFF, and other image formats

Excel support is available exclusively in the Cloud API.
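
A client could check a file against the list above before uploading it. The following sketch is one way to do that; the extension table is an assumption (the list names formats, not extensions, and "other image formats" are not enumerated), and `classify` is a hypothetical helper rather than a Chunkr function.

```python
from pathlib import Path

# Assumed extension mapping for the formats listed above.
OPEN_SOURCE_FORMATS = {
    ".pdf": "PDF",
    ".ppt": "PowerPoint",
    ".pptx": "PowerPoint",
    ".doc": "Word",
    ".docx": "Word",
    ".png": "Image",
    ".jpg": "Image",
    ".jpeg": "Image",
    ".tif": "Image",
    ".tiff": "Image",
}
# Excel is handled only by the Cloud API.
CLOUD_ONLY_FORMATS = {".xls": "Excel", ".xlsx": "Excel"}


def classify(filename: str) -> str:
    """Map a filename to a document type, flagging Cloud-only formats."""
    ext = Path(filename).suffix.lower()
    if ext in OPEN_SOURCE_FORMATS:
        return OPEN_SOURCE_FORMATS[ext]
    if ext in CLOUD_ONLY_FORMATS:
        return f"{CLOUD_ONLY_FORMATS[ext]} (Cloud API only)"
    return "Not listed"


for name in ("report.PDF", "slides.pptx", "budget.xlsx", "notes.txt"):
    print(name, "->", classify(name))
```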

Architecture Overview

Chunkr is built with a modern microservices architecture:

Server - REST API built with Rust and Actix-Web
Task Queue - Redis-backed job processing
Segmentation - YOLO-based layout detection with GPU acceleration
OCR - DocTR (Document Text Recognition) engine
Storage - MinIO for object storage, PostgreSQL for metadata
Web UI - React-based interface for testing and visualization

All services are containerized and orchestrated with Docker Compose for easy deployment.
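
The flow between these services can be pictured with a small in-memory model. In the sketch below, a standard-library queue stands in for the Redis-backed task queue and plain dictionaries stand in for MinIO and PostgreSQL; the function names, status strings, and stage names are hypothetical and do not reflect Chunkr's actual implementation.

```python
import queue
import uuid
from typing import Dict

task_queue: queue.Queue = queue.Queue()  # stand-in for the Redis-backed queue
objects: Dict[str, bytes] = {}           # stand-in for MinIO object storage
metadata: Dict[str, dict] = {}           # stand-in for PostgreSQL metadata


def submit(document: bytes) -> str:
    """Server side: store the file, record the task, and enqueue it."""
    task_id = str(uuid.uuid4())
    objects[task_id] = document
    metadata[task_id] = {"status": "queued", "stages": []}
    task_queue.put(task_id)
    return task_id


def work_once() -> None:
    """Worker side: take one task and run the pipeline stages in order."""
    task_id = task_queue.get()
    record = metadata[task_id]
    record["status"] = "processing"
    for stage in ("segmentation", "ocr"):
        # A real worker would call the layout model and OCR engine here.
        record["stages"].append(stage)
    record["status"] = "succeeded"
    task_queue.task_done()


task_id = submit(b"%PDF-1.7 ...")
work_once()
print(metadata[task_id])
```

Separating submission from processing in this way is what lets the API return a task identifier immediately while workers process documents in the background.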

Use Cases

RAG Pipelines

Extract and chunk documents for retrieval-augmented generation systems

Document Processing

Automate extraction of structured data from unstructured documents

Content Migration

Convert legacy documents to modern formats (HTML, Markdown)

Search Indexing

Extract text and metadata for full-text search engines

Community & Support

Join our community and get help:

Discord - Join the community
GitHub Issues - Report bugs
Email - mehul@chunkr.ai
Schedule a call - Book 30 minutes

Next Steps

Quickstart

Follow our quickstart guide to make your first API request

Installation

Set up Chunkr locally with our installation guide

Explore the API

Dive into the API reference to learn about all available features
