Document Insights Engine Overview
The Document Insights Engine uses AI to extract structured information, insights, and data from PDF documents. It processes documents in batches of pages, supports schema-based extraction and source tracing, and handles multi-page documents efficiently.
Engine Inputs
The Document Insights Engine configuration has the following parameters:
- instructions: required. Instructions for the AI describing what to extract from the PDF. Can also use the alias instruction.
- pdf_files: required. PDF files to extract insights from. Can be file uploads, URLs, or file IDs. Supports multiple files (comma- or newline-separated). Can also use the alias pdf_file.
- batch_size: optional. Number of PDF pages to process in each batch (default: 10, range: 1-50). Larger batches are faster but may reduce extraction quality.
- model: optional. The AI model to use (default: gpt-4.1-2025-04-14).
- temperature: optional. Controls randomness in the output (default: 0.0). Range: 0.0 (deterministic) to 1.0 (most random).
- use_source: optional. Traces the sources of the extracted data (default: False). Increases cost when enabled.
- convert_to_images: optional. Converts PDF pages to images and sends them to the model along with the text (default: True). Improves extraction for documents with significant visual content.
- output_schema: optional. A JSON schema defining the structure of the data to extract. Follows the standard JSON Schema specification.
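To make the defaults, aliases, and ranges above concrete, here is a minimal Python sketch that resolves a parameter set the way the list describes. build_config and page_batches are hypothetical helpers, not part of any Roe AI SDK, and page_batches assumes that a batch is a run of consecutive pages, which this page does not state explicitly.

```python
# Hypothetical helpers mirroring the documented parameters; not a Roe AI SDK.
DEFAULTS = {
    "batch_size": 10,
    "model": "gpt-4.1-2025-04-14",
    "temperature": 0.0,
    "use_source": False,
    "convert_to_images": True,
    "output_schema": None,
}
ALIASES = {"instruction": "instructions", "pdf_file": "pdf_files"}


def build_config(**params):
    """Resolve aliases, apply documented defaults, and check documented ranges."""
    config = dict(DEFAULTS)
    for key, value in params.items():
        # instruction -> instructions, pdf_file -> pdf_files
        config[ALIASES.get(key, key)] = value

    # pdf_files may be a list or a comma/newline-separated string.
    files = config.get("pdf_files") or []
    if isinstance(files, str):
        files = files.replace("\n", ",").split(",")
    config["pdf_files"] = [f.strip() for f in files if f.strip()]

    if not config.get("instructions") or not config["pdf_files"]:
        raise ValueError("instructions and pdf_files are required")
    if not 1 <= config["batch_size"] <= 50:
        raise ValueError("batch_size must be between 1 and 50")
    if not 0.0 <= config["temperature"] <= 1.0:
        raise ValueError("temperature must be between 0.0 and 1.0")
    return config


def page_batches(num_pages, batch_size):
    """Assumption: batches are consecutive 1-indexed page ranges (first, last)."""
    return [
        (start, min(start + batch_size - 1, num_pages))
        for start in range(1, num_pages + 1, batch_size)
    ]


if __name__ == "__main__":
    cfg = build_config(
        instruction="Extract all rows from the ASSETS table in the balance sheet",
        pdf_file="report.pdf, https://example.com/report-q2.pdf",  # hypothetical files
    )
    print(cfg["pdf_files"])  # ['report.pdf', 'https://example.com/report-q2.pdf']
    print(page_batches(23, cfg["batch_size"]))  # [(1, 10), (11, 20), (21, 23)]
```

Under this assumption, a 23-page PDF with the default batch_size of 10 is sent as three batches, which illustrates why larger batches mean fewer model calls.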
Engine Output
The output is a JSON value matching the structure specified in output_schema, if one is defined. Without a schema, the engine returns the extracted content in a default format.
Example Usage
Create an Agent
Click on the “Add Agent” button in the top right corner of the Agents page.
Enter a name and an optional description of your Agent.

Configure the engine
A value that starts with $ is a template string; its value is supplied from the job inputs when you run a job.
- instructions: $instructions
- pdf_files: $pdf_files
- batch_size: 10
- model: gpt-4.1-2025-04-14
- temperature: 0.0
- convert_to_images: True
- output_schema: Copy and paste the JSON schema below:
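As an illustration only, the following Python sketch builds one possible output_schema for the ASSETS table example and prints it as JSON text for pasting. The property names (rows, line_item, period, amount) are assumptions and are not necessarily those of the schema the example refers to. rows_match_schema is a hypothetical, minimal structural check, not a full JSON Schema validator; a library such as the Python jsonschema package could be used for complete validation.

```python
import json

# Hypothetical output_schema for the balance-sheet example; field names are
# illustrative assumptions.
ASSETS_SCHEMA = {
    "type": "object",
    "properties": {
        "rows": {
            "type": "array",
            "description": "One entry per row of the ASSETS table",
            "items": {
                "type": "object",
                "properties": {
                    "line_item": {"type": "string"},
                    "period": {"type": "string"},
                    "amount": {"type": "number"},
                },
                "required": ["line_item", "amount"],
            },
        }
    },
    "required": ["rows"],
}


def rows_match_schema(output):
    """Minimal structural check of an engine output against ASSETS_SCHEMA."""
    rows = output.get("rows")
    if not isinstance(rows, list):
        return False
    return all(
        isinstance(r, dict)
        and isinstance(r.get("line_item"), str)
        # bool is a subclass of int in Python, so exclude it explicitly
        and isinstance(r.get("amount"), (int, float))
        and not isinstance(r.get("amount"), bool)
        for r in rows
    )


if __name__ == "__main__":
    # JSON text that could be pasted into the output_schema field.
    print(json.dumps(ASSETS_SCHEMA, indent=2))
```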

Run a job
Create a new job and provide:
- instructions: “Extract all rows from the ASSETS table in the balance sheet”
- pdf_files: Upload or provide a URL to a PDF document
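As the configuration step indicates, these job inputs supply the values for the $instructions and $pdf_files template strings in the agent configuration. The Python sketch below models that substitution with the standard library's string.Template, which uses the same $name syntax. resolve_config is a hypothetical helper, and this is an analogy rather than a description of how Roe AI resolves templates internally.

```python
from string import Template

# Agent configuration as entered in the "Configure the engine" step.
# String values starting with $ are filled in from the job inputs.
AGENT_CONFIG = {
    "instructions": "$instructions",
    "pdf_files": "$pdf_files",
    "batch_size": 10,
    "model": "gpt-4.1-2025-04-14",
    "temperature": 0.0,
    "convert_to_images": True,
}


def resolve_config(config, job_inputs):
    """Replace $name placeholders in string values with the matching job input.

    Raises KeyError if a placeholder has no corresponding job input.
    """
    resolved = {}
    for key, value in config.items():
        if isinstance(value, str):
            value = Template(value).substitute(job_inputs)
        resolved[key] = value
    return resolved


if __name__ == "__main__":
    job = {
        "instructions": "Extract all rows from the ASSETS table in the balance sheet",
        "pdf_files": "balance_sheet.pdf",  # hypothetical file name
    }
    print(resolve_config(AGENT_CONFIG, job)["instructions"])
```

Strings without a $ placeholder, such as the model name, pass through unchanged, so fixed settings and per-job inputs can live in the same configuration.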


