PDF Extraction Lite
This Engine extracts data from PDF documents into the structure you define, using a lightweight ("lite") extraction mode.
PDF Extraction Lite Engine Inputs
PDF Extraction Lite Configuration
The PDF Extraction Lite Engine Configuration has four parameters:
- instruction: optional. A string used to prompt the Agent during job execution.
- page_filter: optional. A string describing which PDF pages to consider during extraction. If omitted, all pages are considered.
- pdf_file: required. The PDF document to extract from.
- output_schema: optional. Defines the exact structure of the JSON output that the extracted data will populate. Follows the JSON Schema specification.
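To make the required/optional split concrete, here is a minimal Python sketch that checks a configuration represented as a plain dict. The dict representation and the validate_config helper are hypothetical illustrations, not the engine's API; in practice the configuration is entered in the UI.

```python
from typing import Any

# Hypothetical: parameter names mirror the documented configuration,
# but representing it as a dict is an assumption for illustration.
REQUIRED_PARAMS = {"pdf_file"}
OPTIONAL_PARAMS = {"instruction", "page_filter", "output_schema"}


def validate_config(config: dict[str, Any]) -> list[str]:
    """Return a list of problems found in a PDF Extraction Lite configuration."""
    keys = set(config)
    problems = []
    for name in sorted(REQUIRED_PARAMS - keys):
        problems.append(f"missing required parameter: {name}")
    for name in sorted(keys - REQUIRED_PARAMS - OPTIONAL_PARAMS):
        problems.append(f"unknown parameter: {name}")
    # instruction and page_filter are documented as strings.
    for name in ("instruction", "page_filter"):
        if name in config and not isinstance(config[name], str):
            problems.append(f"{name} must be a string")
    # output_schema is a JSON Schema document, i.e. a JSON object.
    if "output_schema" in config and not isinstance(config["output_schema"], dict):
        problems.append("output_schema must be a JSON schema object")
    return problems


print(validate_config({"instruction": "Extract the totals"}))
```

Only pdf_file is mandatory, so a configuration that omits it is the one case the documented rules reject outright.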
See Template Strings for dynamic parameter configuration.
PDF Extraction Lite Output
The output is always a JSON value. If you define an output_schema, the output follows the structure it specifies.
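If you want to confirm on your side that an output matches your output_schema, you can validate it. The sketch below is a hand-rolled checker for a small subset of JSON Schema (type, properties, required, items) using only the Python standard library. It is an illustration, not the engine's own validation; a full validator such as the third-party jsonschema package covers far more of the specification.

```python
from typing import Any

# Maps JSON Schema type names to the Python types that json.loads produces.
_TYPES = {
    "object": dict,
    "array": list,
    "string": str,
    "number": (int, float),
    "integer": int,
    "boolean": bool,
    "null": type(None),
}


def check(value: Any, schema: dict, path: str = "$") -> list[str]:
    """Check value against a subset of JSON Schema: type, properties, required, items."""
    expected = schema.get("type")
    if expected is not None:
        # bool is a subclass of int in Python, so reject it explicitly for numeric types.
        if isinstance(value, bool) and expected in ("number", "integer"):
            return [f"{path}: expected {expected}, got boolean"]
        if not isinstance(value, _TYPES[expected]):
            return [f"{path}: expected {expected}, got {type(value).__name__}"]
    errors = []
    if isinstance(value, dict):
        for name in schema.get("required", []):
            if name not in value:
                errors.append(f"{path}: missing required property {name!r}")
        for name, sub_schema in schema.get("properties", {}).items():
            if name in value:
                errors.extend(check(value[name], sub_schema, f"{path}.{name}"))
    if isinstance(value, list) and "items" in schema:
        for i, item in enumerate(value):
            errors.extend(check(item, schema["items"], f"{path}[{i}]"))
    return errors
```

An empty list means the value conforms to the checked subset; each message carries a JSONPath-like location such as $[0].row_name.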
PDF Extraction Lite Example
Let’s walk through an example using this engine.
Create an Agent
Click on the “Add Agent” button in the top right corner of the Agents page.
Enter a name and an optional description of your Agent.
Select the PDF Extraction Lite Engine
Configure the engine as follows:
- instruction: $instruction
- page_filter: $page_filter
- pdf_file: $pdf_file
- output_schema: Copy and paste the JSON schema below (hit Use Text).
{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "type": "array",
  "items": {
    "type": "object",
    "properties": {
      "row_name": {
        "type": "string",
        "description": "The name of a row"
      },
      "value_1": {
        "type": "string",
        "description": "The monetary value of the row for September 30, 2023"
      },
      "value_2": {
        "type": "string",
        "description": "The monetary value of the row for March 30, 2024"
      },
      "type": {
        "type": "string",
        "description": "Current asset, Non-current asset"
      }
    },
    "description": "A row in the \"Assets\" section"
  },
  "description": "All entries under the \"Assets\" section in the \"CONDENSED CONSOLIDATED BALANCE SHEETS\""
}
You can then click Use Widget to view the JSON schema in the UI.
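The $instruction, $page_filter, and $pdf_file values in this configuration are template strings, so each job supplies the actual values. Assuming they behave like simple named substitution (see Template Strings for the actual rules), Python's string.Template uses the same $name syntax and can illustrate the idea. The resolve helper and the file name below are hypothetical; output_schema is entered directly in this example, so it is left out.

```python
from string import Template

# Hypothetical representation of the configuration entered above: each
# parameter value is a template string naming an Agent input.
engine_config = {
    "instruction": "$instruction",
    "page_filter": "$page_filter",
    "pdf_file": "$pdf_file",
}


def resolve(config: dict[str, str], agent_inputs: dict[str, str]) -> dict[str, str]:
    """Replace each $name placeholder with the Agent input of the same name.

    Template.substitute raises KeyError if a placeholder has no matching input.
    """
    return {name: Template(value).substitute(agent_inputs) for name, value in config.items()}


# Inputs from the walkthrough; the file name is a hypothetical stand-in for the uploaded PDF.
job_inputs = {
    "instruction": "",
    "page_filter": 'Only process the "CONDENSED CONSOLIDATED BALANCE SHEETS (Unaudited)" page.',
    "pdf_file": "balance_sheet.pdf",
}

for name, value in resolve(engine_config, job_inputs).items():
    print(f"{name}: {value!r}")
```

Keeping the configuration as placeholders means the same Agent can be reused across jobs with different PDFs and page filters.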
Create the Agent
Hit the Create button. Now, let’s run it on a PDF through the UI.
View the Agent you just created
Create a new Agent job
Fill in the Agent inputs
Leave instruction empty.
page_filter: Only process the “CONDENSED CONSOLIDATED BALANCE SHEETS (Unaudited)” page.
pdf_file: Download this PDF file
Here are the filled-in Agent inputs:
Run the job
Hit the Create button at the bottom to start the PDF Extraction Lite job.
View the Results
Click View next to the job to see its status and results.
Scroll down the Agent Job Details page to see the job outputs.
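Because the example schema declares value_1 and value_2 as strings, you may want to convert them to numbers before further analysis. The sketch below assumes the extracted strings look like "$1,234" or "(56)", with parentheses marking a negative amount (a common accounting convention); the engine's actual formatting may differ. parse_amount and period_changes are hypothetical helpers, and amounts are kept in whatever unit the balance sheet reports.

```python
from decimal import Decimal, InvalidOperation
from typing import Optional


def parse_amount(text: str) -> Optional[Decimal]:
    """Parse strings like "$1,234" or "(56)"; return None if the text is not a finite number."""
    cleaned = text.replace("$", "").replace(",", "").strip()
    negative = cleaned.startswith("(") and cleaned.endswith(")")
    if negative:
        cleaned = cleaned[1:-1].strip()
    try:
        amount = Decimal(cleaned)
    except InvalidOperation:
        return None
    if not amount.is_finite():  # Decimal accepts "NaN" and "Infinity"; treat them as unparseable
        return None
    return -amount if negative else amount


def period_changes(rows: list[dict]) -> dict[str, Decimal]:
    """Map each row_name to value_2 - value_1 for rows where both values parse."""
    changes = {}
    for row in rows:
        before = parse_amount(row.get("value_1", ""))
        after = parse_amount(row.get("value_2", ""))
        if before is not None and after is not None:
            changes[row["row_name"]] = after - before
    return changes
```

Decimal is used instead of float so that monetary amounts are represented exactly; rows whose values cannot be parsed are skipped rather than guessed.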