Quickstart Guide

This guide will help you make your first API request to Chunkr and process a document. We’ll cover both local deployment and using the Cloud API.

This quickstart assumes you have Chunkr running locally. If you haven’t installed it yet, check out the Installation Guide.

Prerequisites

Chunkr is Running

Ensure your Chunkr services are up and running:

docker compose ps

You should see every service listed with a status of “Up”.

Access the Services

Verify you can access:

API: http://localhost:8000
Web UI: http://localhost:5173
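If you prefer to check from a script, the sketch below probes both URLs using only Python's standard library. It counts any HTTP response as proof that the service is listening, including an error status such as 404 on the API root. The helper name `is_reachable` is our own and is not part of Chunkr.

```python
import urllib.error
import urllib.request


def is_reachable(url: str, timeout: float = 3.0) -> bool:
    """Return True if something answers HTTP at `url`, False otherwise."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError as err:
        # The server answered with an error status (e.g. 404), so it is up.
        err.close()
        return True
    except OSError:
        # Connection refused, DNS failure, timeout, etc. (URLError is an OSError).
        return False


if __name__ == "__main__":
    for name, url in [("API", "http://localhost:8000"), ("Web UI", "http://localhost:5173")]:
        status = "reachable" if is_reachable(url) else "NOT reachable"
        print(f"{name}: {url} is {status}")
```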

LLM Configuration

Make sure you’ve configured at least one LLM in models.yaml. See the Installation Guide for details.

Making Your First Request

Using the Web UI

The easiest way to get started is using the built-in web interface:

Open the Web UI

Navigate to http://localhost:5173 in your browser.

Upload a Document

Click the upload area and select a PDF, Word document, PowerPoint file, or image.

Configure Processing

Choose your processing options:

OCR Strategy: All (process all pages) or Auto (selective)
Segmentation Strategy: LayoutAnalysis (detailed) or Page (simple)
High Resolution: Enable for better quality (adds ~7s per page)

View Results

Watch your document process in real time and explore the structured output.

Using the API

For programmatic access, use the REST API. Here’s how to process a document:

curl -X POST http://localhost:8000/api/v1/task/parse \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -d '{
    "file": "https://example.com/document.pdf",
    "ocr_strategy": "Auto",
    "segmentation_strategy": "LayoutAnalysis",
    "high_resolution": true
  }'

For local development without authentication, you can omit the Authorization header. Authentication is required when deploying to production.
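You can send the same request from Python. This sketch uses only the standard library (`urllib`) so it runs without extra packages. It adds the Authorization header only when you pass an API key, which matches the local-versus-production note above. The `build_parse_request` and `submit` helpers are illustrative assumptions, not part of a Chunkr SDK.

```python
import json
import urllib.request
from typing import Optional

BASE_URL = "http://localhost:8000"  # assumed local deployment


def build_parse_request(file: str, api_key: Optional[str] = None, **options) -> urllib.request.Request:
    """Build (but do not send) a POST to /api/v1/task/parse."""
    payload = {"file": file, **options}
    headers = {"Content-Type": "application/json"}
    if api_key:
        # Only needed when authentication is enabled (e.g. in production).
        headers["Authorization"] = f"Bearer {api_key}"
    return urllib.request.Request(
        f"{BASE_URL}/api/v1/task/parse",
        data=json.dumps(payload).encode("utf-8"),
        headers=headers,
        method="POST",
    )


def submit(request: urllib.request.Request) -> dict:
    """Send the request and decode the JSON task response."""
    with urllib.request.urlopen(request, timeout=30) as resp:
        return json.load(resp)


if __name__ == "__main__":
    req = build_parse_request(
        "https://example.com/document.pdf",
        ocr_strategy="Auto",
        segmentation_strategy="LayoutAnalysis",
        high_resolution=True,
    )
    print(req.get_method(), req.full_url)  # call submit(req) to actually send it
```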

Using Base64 Encoded Files

You can also send the file inline as a base64-encoded data URL:

Python

import base64
import requests

# Read and encode file
with open('document.pdf', 'rb') as f:
    file_data = base64.b64encode(f.read()).decode('utf-8')
    file_base64 = f"data:application/pdf;base64,{file_data}"

# Create task with base64 file
response = requests.post(
    "http://localhost:8000/api/v1/task/parse",
    json={
        "file": file_base64,
        "file_name": "document.pdf",
        "ocr_strategy": "Auto",
        "segmentation_strategy": "LayoutAnalysis"
    }
)
response.raise_for_status()  # surface HTTP errors before parsing the body

task = response.json()
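The example above hardcodes `application/pdf`. For the other file types the Web UI accepts (Word, PowerPoint, images), a small helper can derive the MIME type from the file name with Python's `mimetypes` module. This is a sketch. `to_data_url` is a hypothetical helper name, and it falls back to `application/octet-stream` when the type cannot be guessed, which the server may not accept.

```python
import base64
import mimetypes


def to_data_url(data: bytes, file_name: str) -> str:
    """Encode raw file bytes as a base64 data URL, guessing the MIME type from the name."""
    mime_type, _ = mimetypes.guess_type(file_name)
    if mime_type is None:
        mime_type = "application/octet-stream"  # unknown extension
    encoded = base64.b64encode(data).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


if __name__ == "__main__":
    # Demo with in-memory bytes; for a real file use open(path, "rb").read().
    print(to_data_url(b"hello", "notes.txt"))
```

Pass the result as `file` together with `file_name`, exactly as in the request above.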

Polling for Results

Document processing is asynchronous. Use the task ID to check status and retrieve results:

curl http://localhost:8000/api/v1/task/{task_id} \
  -H "Authorization: Bearer YOUR_API_KEY"
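You can also script the polling. The sketch below re-fetches the task until its status is `Succeeded` or `Failed` (see Response Fields Explained), or until a deadline passes. The `wait_for_task` name, the 2-second interval, and the 10-minute timeout are illustrative choices. `fetch` is injectable so you can exercise the loop without a running server.

```python
import json
import time
import urllib.request
from typing import Callable, Optional

BASE_URL = "http://localhost:8000"  # assumed local deployment
TERMINAL_STATES = {"Succeeded", "Failed"}


def fetch_task(task_id: str, api_key: Optional[str] = None) -> dict:
    """GET /api/v1/task/{task_id} and return the decoded JSON."""
    headers = {"Authorization": f"Bearer {api_key}"} if api_key else {}
    req = urllib.request.Request(f"{BASE_URL}/api/v1/task/{task_id}", headers=headers)
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)


def wait_for_task(
    task_id: str,
    fetch: Callable[[str], dict] = fetch_task,
    interval: float = 2.0,
    timeout: float = 600.0,
) -> dict:
    """Poll until the task reaches a terminal status; raise TimeoutError otherwise."""
    deadline = time.monotonic() + timeout
    while True:
        task = fetch(task_id)
        if task.get("status") in TERMINAL_STATES:
            return task
        if time.monotonic() >= deadline:
            raise TimeoutError(f"task {task_id} still {task.get('status')!r} after {timeout}s")
        time.sleep(interval)
```

Typical use is `task = wait_for_task(task["task_id"])`. Check `task["status"]` afterwards, because a `Failed` task is returned rather than raised.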

Understanding the Response

A completed task response looks like this (abridged):

{
  "task_id": "550e8400-e29b-41d4-a716-446655440000",
  "status": "Succeeded",
  "created_at": "2026-03-02T10:00:00Z",
  "finished_at": "2026-03-02T10:00:15Z",
  "file_name": "document.pdf",
  "page_count": 5,
  "output": [
    {
      "page_number": 1,
      "segments": [
        {
          "segment_id": "seg_001",
          "segment_type": "Title",
          "content": "<h1>Document Title</h1>",
          "text": "Document Title",
          "bbox": {
            "left": 100,
            "top": 50,
            "width": 400,
            "height": 60
          },
          "confidence": 0.98
        },
        {
          "segment_id": "seg_002",
          "segment_type": "Text",
          "content": "<p>This is the document content...</p>",
          "text": "This is the document content...",
          "bbox": {...}
        }
      ]
    }
  ]
}

Response Fields Explained

task_id: Unique identifier for tracking this task
status: Current state (Starting, Processing, Succeeded, Failed)
output: Array of pages, each containing segments
segments: Individual layout elements (Title, Text, Table, Picture, etc.)
content: Generated HTML or Markdown based on configuration
text: Raw OCR-extracted text
bbox: Bounding box coordinates (left, top, width, height)
segment_type: Element type (Title, SectionHeader, Text, ListItem, Table, Picture, Caption, Formula, Footnote, PageHeader, PageFooter)
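To see how these fields fit together, the sketch below walks `output` page by page. It also converts each `bbox` from left/top/width/height into corner coordinates (x0, y0, x1, y1), the form many drawing libraries expect. It assumes the response shape shown above, and `iter_segments` is a hypothetical helper.

```python
from typing import Iterator, Tuple


def iter_segments(task: dict) -> Iterator[Tuple[int, str, str, Tuple[float, float, float, float]]]:
    """Yield (page_number, segment_type, text, (x0, y0, x1, y1)) for every segment."""
    for page in task.get("output", []):
        for seg in page.get("segments", []):
            box = seg.get("bbox", {})
            left, top = box.get("left", 0), box.get("top", 0)
            # Convert width/height into the bottom-right corner.
            corners = (left, top, left + box.get("width", 0), top + box.get("height", 0))
            yield page["page_number"], seg["segment_type"], seg.get("text", ""), corners
```

For example, `for page, kind, text, box in iter_segments(task): print(page, kind, box, text[:40])` prints a one-line summary per segment.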

Configuration Options

OCR Strategy

Controls how OCR is applied:

All (default): Process all pages with OCR (~0.5s penalty per page)
Auto: Selective OCR only where needed; uses existing text layer when available

Segmentation Strategy

Controls layout analysis:

LayoutAnalysis (default): Detect all layout elements with bounding boxes
Page: Treat each page as a single segment (faster, less detailed)
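Both strategies are plain strings in the request body, so a typo such as `"auto"` is easy to make. The minimal sketch below checks values against the options listed above before you build the payload. The `parse_options` name is hypothetical, and it only knows the values documented in this guide.

```python
VALID_OCR = {"All", "Auto"}
VALID_SEGMENTATION = {"LayoutAnalysis", "Page"}


def parse_options(ocr_strategy: str = "All", segmentation_strategy: str = "LayoutAnalysis") -> dict:
    """Return the strategy fields for a parse request, rejecting undocumented values."""
    if ocr_strategy not in VALID_OCR:
        raise ValueError(f"ocr_strategy must be one of {sorted(VALID_OCR)}, got {ocr_strategy!r}")
    if segmentation_strategy not in VALID_SEGMENTATION:
        raise ValueError(
            f"segmentation_strategy must be one of {sorted(VALID_SEGMENTATION)}, "
            f"got {segmentation_strategy!r}"
        )
    # Defaults mirror the documented defaults: All and LayoutAnalysis.
    return {"ocr_strategy": ocr_strategy, "segmentation_strategy": segmentation_strategy}
```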

Additional Options

{
  "high_resolution": true,        // Use high-res images (~7s per page)
  "expires_in": 3600,             // Task expiration in seconds
  "error_handling": "Fail",       // "Fail" or "Continue" on errors
  "chunk_processing": {           // Configure semantic chunking
    "target_length": 512
  },
  "segment_processing": {         // Per-segment format configuration
    "table": {
      "format": "Markdown",
      "strategy": "LLM"           // Use LLM for table extraction
    },
    "picture": {
      "format": "Html",
      "strategy": "LLM"           // Generate image descriptions
    }
  }
}

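The // comments in the example above are annotations only. Standard JSON does not allow comments, so remove them before sending a request.

High-resolution processing significantly improves quality but adds ~7 seconds per page. Use it for documents that require precise extraction.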
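To reason about these trade-offs, you can add up the per-page figures quoted in this guide: about 7 s per page for `high_resolution` and about 0.5 s per page for the `All` OCR strategy. The sketch below does only that arithmetic. Real timings depend on hardware, GPU availability, and document content, and the function name is our own.

```python
def estimated_overhead_seconds(pages: int, high_resolution: bool, ocr_strategy: str) -> float:
    """Rough extra processing time, using the per-page figures quoted in this guide."""
    per_page = 0.0
    if high_resolution:
        per_page += 7.0  # ~7 s per page for high-resolution images
    if ocr_strategy == "All":
        per_page += 0.5  # ~0.5 s per page to OCR every page
    return pages * per_page
```

For a 20-page document with both options enabled, this gives 20 × 7.5 = 150 seconds, or roughly two and a half minutes of extra time.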

Common Use Cases

RAG/LLM Pipeline

Extract chunks for embedding and retrieval:

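import requests

file_url = "https://example.com/document.pdf"  # replace with your document

# Process with semantic chunking
response = requests.post(
    "http://localhost:8000/api/v1/task/parse",
    json={
        "file": file_url,
        "chunk_processing": {
            "target_length": 512
        },
        "segment_processing": {
            "text": {"format": "Markdown"}
        }
    }
)
response.raise_for_status()

# wait_for_task is a polling helper you provide: it calls
# GET /api/v1/task/{task_id} until the status is Succeeded or Failed.
task = wait_for_task(response.json()['task_id'])

# Collect segment content for embedding
texts = []
for page in task['output']:
    for segment in page['segments']:
        # Use segment['content'] (HTML/Markdown) or segment['text'] (raw OCR text)
        texts.append(segment['content'])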
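The loop above visits every segment. Page headers and footers often add noise to embeddings, so one option is to skip the `PageHeader` and `PageFooter` segment types and keep the rest in the order they appear in the response. The sketch below assumes that approach. Which types to drop is a judgment call for your corpus, and `texts_for_embedding` is a hypothetical helper, not part of Chunkr.

```python
SKIP_TYPES = {"PageHeader", "PageFooter"}


def texts_for_embedding(task: dict, field: str = "content") -> list:
    """Collect non-empty segment strings in response order, skipping headers and footers."""
    texts = []
    for page in task.get("output", []):
        for segment in page.get("segments", []):
            if segment.get("segment_type") in SKIP_TYPES:
                continue
            value = (segment.get(field) or "").strip()
            if value:
                texts.append(value)
    return texts
```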

Table Extraction

Extract structured tables with LLM enhancement:

import requests

file_url = "https://example.com/document.pdf"  # replace with your document

response = requests.post(
    "http://localhost:8000/api/v1/task/parse",
    json={
        "file": file_url,
        "segment_processing": {
            "table": {
                "format": "Markdown",
                "strategy": "LLM"  # AI-enhanced structure
            }
        }
    }
)

Image Description

Generate descriptions for images using a vision-language model (VLM):

import requests

file_url = "https://example.com/document.pdf"  # replace with your document

response = requests.post(
    "http://localhost:8000/api/v1/task/parse",
    json={
        "file": file_url,
        "segment_processing": {
            "picture": {
                "format": "Html",
                "strategy": "LLM"  # Generate descriptions
            }
        }
    }
)
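After either task finishes, the extracted tables or image descriptions appear in the `content` field of `Table` and `Picture` segments, in the format you requested. The small filter below pulls them out together with their page numbers. It is a sketch, and the helper name is hypothetical.

```python
def segments_of_type(task: dict, segment_type: str) -> list:
    """Return (page_number, content) pairs for every segment of the given type."""
    return [
        (page["page_number"], segment.get("content", ""))
        for page in task.get("output", [])
        for segment in page.get("segments", [])
        if segment.get("segment_type") == segment_type
    ]
```

For example, `segments_of_type(task, "Table")` returns the Markdown tables, and `segments_of_type(task, "Picture")` returns the HTML image descriptions.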

Next Steps

API Reference

Explore all API endpoints and parameters

Configuration

Learn about advanced configuration options

Installation

Deploy Chunkr to production

Examples

See more code examples and use cases

Troubleshooting

Task fails immediately

Check that:

Your LLM is configured in models.yaml
The file URL is reachable from the Chunkr server, or the base64 data URL is valid
All required services are running (docker compose ps)
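For the base64 case, you can rule out a malformed payload locally before submitting. The sketch below checks that the string has the `data:<mime>;base64,` prefix used in the upload example and that the remainder decodes strictly. `check_data_url` is a hypothetical helper. It does not verify that the decoded bytes form a valid document.

```python
import base64
import binascii


def check_data_url(value: str) -> bool:
    """Return True if `value` is a data URL whose base64 payload decodes cleanly."""
    header, sep, payload = value.partition(",")
    if not sep or not header.startswith("data:") or not header.endswith(";base64"):
        return False
    try:
        # validate=True rejects characters outside the base64 alphabet.
        base64.b64decode(payload, validate=True)
    except (binascii.Error, ValueError):
        return False
    return True
```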

Processing is very slow

Consider:

Using Auto OCR strategy instead of All
Disabling high_resolution if not needed
Deploying with GPU support (see Installation)
Scaling up worker replicas in compose.yaml

LLM processing fails

Verify:

The API key in models.yaml is valid
The LLM endpoint is reachable
Rate limits aren’t being exceeded
The model is served through an OpenAI-compatible API
