Installation Guide

Chunkr runs as a collection of Docker services orchestrated with Docker Compose. This guide covers installation for GPU-accelerated deployments, CPU-only systems, and Mac ARM devices.

Prerequisites

1. Install Docker

Install Docker Desktop or Docker Engine for your platform, then verify the installation:
docker --version
docker compose version
2. Install NVIDIA Container Toolkit (GPU Only)

For GPU acceleration, install the NVIDIA Container Toolkit (skip this step for CPU-only and Mac ARM deployments):
# Add the NVIDIA repository
distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | \
  sudo tee /etc/apt/sources.list.d/nvidia-docker.list

# Install nvidia-container-toolkit
sudo apt-get update
sudo apt-get install -y nvidia-container-toolkit

# Restart Docker
sudo systemctl restart docker
For other distributions, see NVIDIA's full installation guide.
3. Verify GPU Access (GPU Only)

Test that Docker can access your GPU:
docker run --rm --gpus all nvidia/cuda:12.0.0-base-ubuntu22.04 nvidia-smi
You should see your GPU information.
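The command exits non-zero when Docker cannot reach a GPU, so it also works as a scripted check (a sketch reusing the CUDA image above):

# Exit status tells us whether the container could run nvidia-smi
if docker run --rm --gpus all nvidia/cuda:12.0.0-base-ubuntu22.04 nvidia-smi > /dev/null 2>&1; then
  echo "GPU visible to Docker"
else
  echo "GPU not available; revisit the NVIDIA Container Toolkit setup"
fi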

Quick Installation

1. Clone the Repository

git clone https://github.com/lumina-ai-inc/chunkr.git
cd chunkr
2. Set Up Environment

# Copy environment template
cp .env.example .env

# Copy LLM models template
cp models.example.yaml models.yaml
3. Configure LLM Models

Edit models.yaml with your LLM configuration. See LLM Configuration below.
4. Start Services

# GPU deployment (recommended: uses NVIDIA GPUs for faster processing)
docker compose up -d

# CPU-only deployment
docker compose -f compose.yaml -f compose.cpu.yaml up -d

# Mac ARM deployment
docker compose -f compose.yaml -f compose.cpu.yaml -f compose.mac.yaml up -d
First startup downloads several GB of models and may take 10-15 minutes.
5. Verify Installation

Check that all services are running:
docker compose ps
All services should show “Up” status. The web UI is then available at http://localhost:5173 and the API at http://localhost:8000 (see Port Mappings below).
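A quick scripted check that both endpoints respond (a sketch; 000 in the output means the port is unreachable):

# Print the HTTP status code returned by each endpoint
curl -s -o /dev/null -w "API:    %{http_code}\n" http://localhost:8000
curl -s -o /dev/null -w "Web UI: %{http_code}\n" http://localhost:5173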

LLM Configuration

Chunkr requires at least one LLM for vision-language model processing. You can configure multiple models with fallbacks. The models.yaml file supports multiple LLM providers with advanced options:
models.yaml
models:
  # OpenAI Configuration
  - id: gpt-4o
    model: gpt-4o
    provider_url: https://api.openai.com/v1/chat/completions
    api_key: "sk-your-openai-key-here"
    default: true
    rate-limit: 200  # requests per minute (optional)

  # Google AI Studio Configuration
  - id: gemini-2.0-flash-lite
    model: gemini-2.0-flash-lite
    provider_url: https://generativelanguage.googleapis.com/v1beta/openai/chat/completions
    api_key: "your-google-ai-key-here"
    fallback: true

  # OpenRouter Configuration
  - id: gemini-pro-1.5
    model: google/gemini-pro-1.5
    provider_url: https://openrouter.ai/api/v1/chat/completions
    api_key: "your-openrouter-key-here"

  # Self-hosted LLM (Ollama, vLLM, etc.)
  - id: local-llm
    model: mistral-7b
    provider_url: http://localhost:11434/v1/chat/completions
    api_key: ""  # Leave empty if not required
  • Exactly one model must have default: true
  • Exactly one model must have fallback: true (this can be the same model as the default)
  • Use id to reference a specific model in API requests
  • rate-limit is optional and caps requests per minute
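A misconfigured models.yaml may only surface as a runtime error, so it can help to verify the default/fallback constraints up front. A minimal sketch using the same python/PyYAML approach as the syntax check under Troubleshooting:

python - <<'EOF'
import yaml

# Load the model list and count the default/fallback flags
models = yaml.safe_load(open("models.yaml"))["models"]
defaults = [m["id"] for m in models if m.get("default")]
fallbacks = [m["id"] for m in models if m.get("fallback")]
assert len(defaults) == 1, f"need exactly one default: true, found {defaults}"
assert len(fallbacks) == 1, f"need exactly one fallback: true, found {fallbacks}"
print("models.yaml constraints OK")
EOF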

Using Environment Variables (Basic)

For simple single-LLM setups, use environment variables in .env:
.env
LLM__KEY=sk-your-api-key-here
LLM__MODEL=gpt-4o
LLM__URL=https://api.openai.com/v1/chat/completions
Environment variables are overridden by models.yaml. If you use models.yaml, remove or comment out the LLM__* variables.
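To confirm no active LLM__* variables remain after switching to models.yaml (a quick check; assumes .env sits in the repository root):

# grep exits non-zero when nothing matches, triggering the message
grep '^LLM__' .env || echo "no active LLM__ variables"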

Common LLM Providers

Any provider exposing an OpenAI-compatible chat completions endpoint can be configured. For example, OpenAI:
- id: gpt-4o
  model: gpt-4o
  provider_url: https://api.openai.com/v1/chat/completions
  api_key: "sk-your-key-here"
  default: true

Service Architecture

Chunkr consists of multiple containerized services:
  • server: Main API server (Rust/Actix-Web) on port 8000
  • task: Background worker pool (30 replicas for GPU, 10 for CPU)
  • web: React-based UI on port 5173
  • postgres: Database for metadata and task state
  • redis: Queue and cache for job processing
  • minio: S3-compatible object storage for files
  • segmentation: YOLO-based layout detection (6 replicas)
    • GPU: Uses NVIDIA GPU acceleration
    • CPU: Optimized for multi-core processing
  • ocr: DocTR OCR engine (3 replicas)
    • GPU: CUDA-accelerated inference
    • CPU: Uses smaller model variant
  • keycloak: Authentication and user management (port 8080)
  • adminer: Database admin UI (port 8082)
  • nginx: Load balancer for processing services
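Individual services from this list can be inspected by name (names follow compose.yaml; note that the scaling section below refers to the processing backends as segmentation-backend and ocr-backend):

# Status of the task worker pool
docker compose ps task

# Tail logs from the API server
docker compose logs -f server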

Port Mappings

Service          Port   Description
Web UI           5173   React application
API              8000   REST API endpoint
Segmentation     8001   Layout detection service
OCR              8002   Text recognition service
Keycloak         8080   Authentication
Adminer          8082   Database UI
PostgreSQL       5432   Database
Redis            6379   Cache/Queue
MinIO            9000   Object storage
MinIO Console    9001   Storage admin UI
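To confirm the mapped web ports respond (a sketch; 000 means the port is unreachable):

# Probe each HTTP-facing port and print its status code
for port in 5173 8000 8080 8082 9001; do
  printf "port %s: %s\n" "$port" "$(curl -s -o /dev/null -w '%{http_code}' "http://localhost:$port")"
done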

GPU vs CPU Performance

Performance comparison for a typical 10-page PDF:
Configuration   Processing Time    Hardware Requirements
GPU             ~20-30 seconds     NVIDIA GPU with 8GB+ VRAM
CPU             ~60-120 seconds    8+ CPU cores, 16GB+ RAM
Mac ARM         ~45-90 seconds     M1/M2/M3 with 16GB+ RAM
GPU acceleration provides 3-4x speedup for segmentation and OCR operations.

Scaling Configuration

Adjusting Worker Replicas

Edit compose.yaml to scale processing:
compose.yaml
services:
  task:
    deploy:
      replicas: 30  # Reduce for less memory usage
  
  segmentation-backend:
    deploy:
      replicas: 6   # Scale based on GPU count
  
  ocr-backend:
    deploy:
      replicas: 3   # Scale based on available resources
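To try a different replica count without editing the file, docker compose also accepts --scale on the command line, which overrides the configured value for that run:

# Run 10 task workers instead of the configured 30
docker compose up -d --scale task=10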

Resource Limits

For production, add resource constraints:
services:
  task:
    deploy:
      replicas: 30
      resources:
        limits:
          memory: 2G
        reservations:
          memory: 1G
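Re-running up applies the new limits by recreating the affected containers; naming the service restricts the operation to it:

# Recreate only the task workers with the updated resource limits
docker compose up -d task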

Stopping and Managing Services

# GPU deployment
docker compose down

# CPU deployment
docker compose -f compose.yaml -f compose.cpu.yaml down

# Mac ARM deployment
docker compose -f compose.yaml -f compose.cpu.yaml -f compose.mac.yaml down

Troubleshooting

Services fail to start

Check the Docker daemon:
sudo systemctl status docker
View startup errors:
docker compose logs
Common issues:
  • Port conflicts (8000, 5173, etc. already in use)
  • Insufficient memory (requires 16GB+ for full stack)
  • Missing .env or models.yaml files
GPU not detected

Verify GPU access:
docker run --rm --gpus all nvidia/cuda:12.0.0-base-ubuntu22.04 nvidia-smi
Check NVIDIA Container Toolkit:
nvidia-ctk --version
Restart Docker after toolkit install:
sudo systemctl restart docker
Out of memory

Reduce worker replicas in compose.yaml:
task:
  deploy:
    replicas: 10  # Down from 30
Use CPU deployment if GPU memory is limited:
docker compose -f compose.yaml -f compose.cpu.yaml up -d
Monitor resource usage:
docker stats
LLM errors

Test the LLM endpoint manually:
curl -X POST https://api.openai.com/v1/chat/completions \
  -H "Authorization: Bearer YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model":"gpt-4o","messages":[{"role":"user","content":"test"}]}'
Check models.yaml syntax:
# Validate YAML
python -c "import yaml; yaml.safe_load(open('models.yaml'))"
View server logs:
docker compose logs -f server
Slow performance on Mac ARM

Make sure you are using the Mac compose override:
docker compose -f compose.yaml -f compose.cpu.yaml -f compose.mac.yaml up -d
Reduce concurrent tasks:
  • Decrease replicas for task, segmentation-backend, and ocr-backend
  • Process documents sequentially instead of in parallel
Allocate more resources to Docker Desktop:
  • Open Docker Desktop → Settings → Resources
  • Increase CPUs to 8+ and Memory to 16GB+

Production Deployment

The default configuration is designed for development. For production:
  1. Enable authentication: Configure Keycloak properly
  2. Use HTTPS: Set up reverse proxy with SSL/TLS
  3. Secure secrets: Use Docker secrets or environment encryption
  4. Configure backups: Back up PostgreSQL and MinIO data
  5. Monitor resources: Set up alerts for CPU, memory, disk usage
  6. Rate limiting: Configure per-model rate limits in models.yaml
  7. Task expiration: Set appropriate expires_in values

Environment Variables Reference

Key configuration options in .env:
# Database
PG__URL=postgresql://postgres:postgres@postgres:5432/chunkr

# Redis
REDIS__URL=redis://redis:6379

# Object Storage
AWS__ENDPOINT=http://minio:9000
AWS__ACCESS_KEY=minioadmin
AWS__SECRET_KEY=minioadmin

# LLM Configuration Path
LLM__MODELS_PATH=./models.yaml

# Worker URLs
WORKER__GENERAL_OCR_URL=http://ocr:8000
WORKER__SEGMENTATION_URL=http://segmentation:8000
WORKER__SERVER_URL=http://localhost:8000

# Authentication
AUTH__KEYCLOAK_URL=http://keycloak:8080
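To see how these variables resolve into the final service definitions, compose can render the merged configuration (add the same -f overrides used for CPU or Mac deployments):

# Print the fully resolved compose file with .env substitution applied
docker compose config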

Next Steps

  • Quickstart: Make your first API request
  • API Reference: Explore the complete API
  • Configuration: Advanced configuration options
  • Examples: Code examples and use cases