Document Segmentation Engine Overview
The Document Segmentation Engine analyzes PDF documents and extracts specific page ranges based on semantic descriptions, explicit page ranges, or table of contents entries. It supports multi-stage filtering with optimized processing for large documents.
Engine Inputs
The Document Segmentation Engine Configuration has the following parameters:
- page_description: required. A semantic description of the pages to select. Supports multiple formats:
  - Semantic search: Natural language description of target pages (e.g., “Any page containing financial data”)
  - @PAGERANGE prefix: Explicit page ranges (e.g., @PAGERANGE: 3, 5-15)
  - @TOC prefix: Table of contents entries (e.g., @TOC: Chapter 1, Section 2)
  - Can combine multiple formats in one query
- pdf_file: required. The PDF file to select pages from. Can be a file upload, a URL, or a file ID.
- model: optional. The AI model to use (defaults to the tier 1 image model).
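To make the @PAGERANGE format concrete, here is a minimal Python sketch of how a value such as "3, 5-15" could be expanded into individual page numbers. It is an illustration only, not the engine's implementation: the helper names (parse_page_range, pages_from_description), the dictionary layout of the configuration, and the example URL are all assumptions, and the sketch does not cover @TOC entries or queries that combine several formats.

```python
from __future__ import annotations

PAGERANGE_PREFIX = "@PAGERANGE:"


def parse_page_range(spec: str) -> list[int]:
    """Expand a spec such as '3, 5-15' into a sorted list of page numbers."""
    pages: set[int] = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate stray commas
        if "-" in part:
            start_text, end_text = part.split("-", 1)
            start, end = int(start_text), int(end_text)
            if start > end:
                raise ValueError(f"descending range: {part!r}")
            pages.update(range(start, end + 1))  # ranges are inclusive
        else:
            pages.add(int(part))
    return sorted(pages)


def pages_from_description(page_description: str) -> list[int] | None:
    """Return explicit pages for an @PAGERANGE description, else None.

    None means the description is not a pure @PAGERANGE query (for example
    a semantic search), which this sketch does not try to resolve.
    """
    text = page_description.strip()
    if not text.startswith(PAGERANGE_PREFIX):
        return None
    return parse_page_range(text[len(PAGERANGE_PREFIX):])


# Hypothetical configuration; only the parameter names come from the list above.
config = {
    "page_description": "@PAGERANGE: 3, 5-15",
    "pdf_file": "https://example.com/report.pdf",  # placeholder URL
    # "model" is omitted so the documented default would apply
}

selected = pages_from_description(config["page_description"])
```

Under these assumptions, selected holds page 3 followed by pages 5 through 15, while a semantic description such as “Any page containing financial data” yields None and would be left to the engine to resolve.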