Endpoint

POST /api/v1/task/parse

Authentication

This endpoint requires API key authentication via the Authorization header:
Authorization: Bearer YOUR_API_KEY
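
As a minimal sketch, the header above could be assembled in Python as follows. The helper name `auth_headers` and the environment variable `CHUNKR_API_KEY` are hypothetical names chosen for illustration, not part of the API.

```python
import os


def auth_headers(api_key: str) -> dict[str, str]:
    """Build headers for a JSON request authenticated with a Bearer API key."""
    if not api_key:
        raise ValueError("API key is empty")
    return {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }


# CHUNKR_API_KEY is a hypothetical variable name; fall back to the placeholder.
headers = auth_headers(os.environ.get("CHUNKR_API_KEY", "YOUR_API_KEY"))
```

Reading the key from the environment keeps it out of source control; any other secret store would work equally well.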

Request Body

The request body must be JSON with the following schema:
file
string
required
The file to be uploaded. Can be a URL or a base64 encoded file.
file_name
string
The name of the file to be uploaded. If not set, a name will be generated.
ocr_strategy
enum
default:"All"
Controls the Optical Character Recognition (OCR) strategy:
  • All: Processes all pages with OCR. (Latency penalty: ~0.5 seconds per page)
  • Auto: Selectively applies OCR only to pages with missing or low-quality text. When a text layer is present, its bounding boxes are used.
segmentation_strategy
enum
default:"LayoutAnalysis"
Controls the segmentation strategy:
  • LayoutAnalysis: Analyzes pages for layout elements (e.g., Table, Picture, Formula) using bounding boxes. Provides fine-grained segmentation and better chunking.
  • Page: Treats each page as a single segment. Faster processing, but without layout element detection and only simple chunking.
high_resolution
boolean
default:true
Whether to use high-resolution images for cropping and post-processing. (Latency penalty: ~7 seconds per page)
expires_in
integer
The number of seconds until the task is deleted. Expired tasks cannot be updated, polled, or accessed via the web interface.
error_handling
enum
default:"Fail"
Controls how errors are handled during processing:
  • Fail: Stops processing and fails the task when any error occurs
  • Continue: Attempts to continue processing despite non-critical errors (e.g., LLM refusals)
chunk_processing
object
Controls the settings for chunking and post-processing of each chunk.
segment_processing
object
Controls the post-processing of each segment type. Allows generation of HTML and Markdown from Chunkr models.
llm_processing
object
Controls the LLM used for the task.
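
The fields above can be assembled and checked client-side before sending. The sketch below is one possible approach, not an official client: the function names are hypothetical, the enum values and defaults are copied from the field descriptions above, and it assumes that plain standard base64 (without a `data:` prefix) is what the `file` field accepts for inline uploads.

```python
import base64
from pathlib import Path
from typing import Any, Optional

# Allowed values as listed in the field descriptions above.
OCR_STRATEGIES = {"All", "Auto"}
SEGMENTATION_STRATEGIES = {"LayoutAnalysis", "Page"}
ERROR_HANDLING = {"Fail", "Continue"}


def encode_bytes(data: bytes) -> str:
    """Return standard base64 text (assumed format for inline files)."""
    return base64.b64encode(data).decode("ascii")


def encode_file(path: str) -> str:
    """Read a local file and return its base64 encoding."""
    return encode_bytes(Path(path).read_bytes())


def build_parse_payload(
    file: str,
    *,
    file_name: Optional[str] = None,
    ocr_strategy: str = "All",
    segmentation_strategy: str = "LayoutAnalysis",
    high_resolution: bool = True,
    expires_in: Optional[int] = None,
    error_handling: str = "Fail",
) -> dict[str, Any]:
    """Validate enum fields and build the JSON body, omitting unset options."""
    if ocr_strategy not in OCR_STRATEGIES:
        raise ValueError(f"ocr_strategy must be one of {sorted(OCR_STRATEGIES)}")
    if segmentation_strategy not in SEGMENTATION_STRATEGIES:
        raise ValueError(
            f"segmentation_strategy must be one of {sorted(SEGMENTATION_STRATEGIES)}"
        )
    if error_handling not in ERROR_HANDLING:
        raise ValueError(f"error_handling must be one of {sorted(ERROR_HANDLING)}")

    payload: dict[str, Any] = {
        "file": file,  # a URL or base64 text
        "ocr_strategy": ocr_strategy,
        "segmentation_strategy": segmentation_strategy,
        "high_resolution": high_resolution,
        "error_handling": error_handling,
    }
    if file_name is not None:
        payload["file_name"] = file_name
    if expires_in is not None:
        payload["expires_in"] = expires_in
    return payload
```

For a local document, `build_parse_payload(encode_file("report.pdf"), file_name="report.pdf")` would produce a body with the file inlined; `report.pdf` is a placeholder path.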

Response

task_id
string
The unique identifier for the task
status
enum
The status of the task: Starting, Processing, Succeeded, Failed, or Cancelled
created_at
string
The date and time when the task was created and queued (ISO 8601 format)
started_at
string
The date and time when the task started processing (ISO 8601 format)
finished_at
string
The date and time when the task was finished (ISO 8601 format)
expires_at
string
The date and time when the task will expire (ISO 8601 format)
message
string
A message describing the task’s status or any errors that occurred
task_url
string
The presigned URL of the task
configuration
object
The task configuration including all processing settings and the input file URL
output
object
Output data (only present when the task is complete)
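
A response with the fields above could be reduced to a small typed summary, as in the sketch below. The `TaskSummary` class and both helper names are hypothetical. The sketch assumes timestamps arrive as ISO 8601 strings, possibly with a trailing `Z`, and that fields not yet applicable are either absent or null.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Any, Optional


def parse_timestamp(value: Optional[str]) -> Optional[datetime]:
    """Parse an ISO 8601 string; a trailing 'Z' is rewritten as a UTC offset."""
    if value is None:
        return None
    if value.endswith("Z"):
        value = value[:-1] + "+00:00"
    return datetime.fromisoformat(value)


@dataclass
class TaskSummary:
    task_id: str
    status: str
    created_at: Optional[datetime]
    started_at: Optional[datetime]
    finished_at: Optional[datetime]
    expires_at: Optional[datetime]
    message: Optional[str]
    task_url: Optional[str]
    has_output: bool


def summarize_task(body: dict[str, Any]) -> TaskSummary:
    """Pick out the documented response fields from a decoded JSON body."""
    return TaskSummary(
        task_id=body["task_id"],
        status=body["status"],
        created_at=parse_timestamp(body.get("created_at")),
        started_at=parse_timestamp(body.get("started_at")),
        finished_at=parse_timestamp(body.get("finished_at")),
        expires_at=parse_timestamp(body.get("expires_at")),
        message=body.get("message"),
        task_url=body.get("task_url"),
        has_output=body.get("output") is not None,
    )
```

Timestamps with unusual fractional-second precision may need a more tolerant parser on Python versions before 3.11.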

Status Codes

  • 200: Task created successfully
  • 400: Bad request (invalid file, invalid base64 data, unsupported file type)
  • 429: Usage limit exceeded
  • 500: Internal server error
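
One way to surface these codes in client code is to convert anything other than 200 into an exception that carries the documented meaning. This is a sketch only: `ParseRequestError` and `check_status` are hypothetical names, and how to react to each code (for example, whether to retry) is left to the caller.

```python
# Meanings copied from the status code list above.
DOCUMENTED_ERRORS = {
    400: "Bad request (invalid file, invalid base64 data, unsupported file type)",
    429: "Usage limit exceeded",
    500: "Internal server error",
}


class ParseRequestError(Exception):
    """Raised when the parse endpoint returns a status code other than 200."""

    def __init__(self, status_code: int, reason: str) -> None:
        super().__init__(f"{status_code}: {reason}")
        self.status_code = status_code
        self.reason = reason


def check_status(status_code: int) -> None:
    """Return silently on 200; raise ParseRequestError for any other code."""
    if status_code == 200:
        return
    reason = DOCUMENTED_ERRORS.get(status_code, "Undocumented status code")
    raise ParseRequestError(status_code, reason)
```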

Examples

curl -X POST https://api.chunkr.ai/api/v1/task/parse \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "file": "https://example.com/document.pdf",
    "ocr_strategy": "Auto",
    "segmentation_strategy": "LayoutAnalysis",
    "high_resolution": true,
    "chunk_processing": {
      "target_length": 512,
      "ignore_headers_and_footers": true,
      "tokenizer": "Word"
    }
  }'
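
The same request can be sketched with Python's standard library. The helper name `build_parse_request` is hypothetical; the URL, headers, and body mirror the curl command above. The network call is left commented out so the snippet runs without credentials.

```python
import json
import urllib.request

API_URL = "https://api.chunkr.ai/api/v1/task/parse"


def build_parse_request(api_key: str, payload: dict) -> urllib.request.Request:
    """Prepare (but do not send) the POST request for the parse endpoint."""
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


payload = {
    "file": "https://example.com/document.pdf",
    "ocr_strategy": "Auto",
    "segmentation_strategy": "LayoutAnalysis",
    "high_resolution": True,
    "chunk_processing": {
        "target_length": 512,
        "ignore_headers_and_footers": True,
        "tokenizer": "Word",
    },
}
request = build_parse_request("YOUR_API_KEY", payload)

# To actually send it, replace the placeholder key and uncomment:
# with urllib.request.urlopen(request) as response:
#     task = json.load(response)
#     print(task["task_id"], task["status"])
```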

Notes

  • The returned task will typically be in a Starting or Processing state
  • Use the GET /task/ endpoint to poll for completion
  • Tasks have a status progression: Starting → Processing → Succeeded or Failed
  • If a task expires, it cannot be accessed, updated, or viewed via the web interface
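
The polling step in the notes above can be sketched as a loop that stops once the task reaches a final status. Because the path of the GET endpoint is not given in full here, the lookup is passed in as a caller-supplied `fetch_task` function, which is a hypothetical parameter, as are `wait_for_task` and its default interval and timeout. Treating Cancelled as a final status alongside Succeeded and Failed is an assumption.

```python
import time
from typing import Any, Callable

# Assumed set of final statuses; Cancelled is included by assumption.
TERMINAL_STATUSES = {"Succeeded", "Failed", "Cancelled"}


def wait_for_task(
    fetch_task: Callable[[str], dict[str, Any]],
    task_id: str,
    *,
    interval: float = 2.0,
    timeout: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict[str, Any]:
    """Call fetch_task(task_id) repeatedly until the status is final.

    fetch_task must return the decoded task JSON, including "status".
    Raises TimeoutError if no final status is seen within `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while True:
        task = fetch_task(task_id)
        if task["status"] in TERMINAL_STATUSES:
            return task
        if time.monotonic() >= deadline:
            raise TimeoutError(f"task {task_id} still {task['status']} after {timeout}s")
        sleep(interval)


# Usage with a stand-in fetcher that replays a fixed status sequence.
statuses = iter(["Starting", "Processing", "Succeeded"])
result = wait_for_task(
    lambda tid: {"task_id": tid, "status": next(statuses)},
    "example-task",
    interval=0.0,
)
```

Injecting `fetch_task` and `sleep` keeps the loop independent of any particular HTTP client and makes it easy to exercise without network access.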