Document Insights

Document Insights Engine Inputs

Document Insights Configuration

The Document Insights Engine configuration accepts four parameters:

instruction: optional. A string used to prompt the Agent during job execution.
pdf_file: required. The PDF document to extract from.
Model: required. The model to use.
output_schema: optional. Defines the exact structure of the JSON output that the extracted data will populate. It follows the standard JSON Schema specification.

See Template Strings for dynamic parameter configuration.
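To show how the required and optional parameters fit together, here is a minimal Python sketch that checks a configuration before use. The plain-dict layout and the `check_config` helper are hypothetical and not part of the product's API; only the parameter names and their required/optional status come from the list above.

```python
# Parameter names from the list above; representing the config as a dict is an assumption.
REQUIRED = {"pdf_file", "Model"}
OPTIONAL = {"instruction", "output_schema"}


def check_config(config: dict) -> list[str]:
    """Return a list of missing required parameters in a hypothetical engine config."""
    problems = []
    for name in sorted(REQUIRED):
        # Treat an absent key or an empty value as missing.
        if config.get(name) in (None, ""):
            problems.append(f"missing required parameter: {name}")
    return problems


print(check_config({"pdf_file": "report.pdf", "instruction": "Extract the assets table"}))
# → ['missing required parameter: Model']
```

The sketch deliberately does not reject unlisted keys, since the example below also configures a page_filter parameter.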

Document Insights Output

The output is always a JSON value. If you define an output_schema, the output follows the structure it specifies.
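To check on your side that an output matches the output_schema you defined, you could run it through a JSON Schema validator (for example, the `jsonschema` package for Python). The sketch below is a dependency-free stand-in that handles only the `type`, `properties`, and `items` keywords used in this page's example. It is an illustration under that assumption, not how the engine itself validates output.

```python
from typing import Any

# Checks for the JSON Schema "type" keyword. bool is excluded from numbers
# because Python treats True/False as ints.
_TYPE_CHECKS = {
    "object": lambda v: isinstance(v, dict),
    "array": lambda v: isinstance(v, list),
    "string": lambda v: isinstance(v, str),
    "number": lambda v: isinstance(v, (int, float)) and not isinstance(v, bool),
    "integer": lambda v: isinstance(v, int) and not isinstance(v, bool),
    "boolean": lambda v: isinstance(v, bool),
    "null": lambda v: v is None,
}


def conforms(value: Any, schema: dict) -> bool:
    """Return True if `value` satisfies the `type`, `properties`, and `items`
    keywords of `schema`. All other keywords are ignored in this sketch."""
    expected = schema.get("type")
    # Only single string types are checked; type lists are skipped here.
    if isinstance(expected, str) and expected in _TYPE_CHECKS:
        if not _TYPE_CHECKS[expected](value):
            return False
    if isinstance(value, dict):
        # Properties are optional unless listed in "required", which this sketch skips.
        for key, subschema in schema.get("properties", {}).items():
            if key in value and not conforms(value[key], subschema):
                return False
    if isinstance(value, list) and "items" in schema:
        return all(conforms(item, schema["items"]) for item in value)
    return True


schema = {
    "type": "array",
    "items": {"type": "object", "properties": {"row_name": {"type": "string"}}},
}
print(conforms([{"row_name": "Row A"}], schema))  # → True
print(conforms([{"row_name": 42}], schema))  # → False
```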

Document Insights Example

Let’s walk through an example using this engine.

Create an Agent

Click on the “Add Agent” button in the top right corner of the Agents page.

Enter a name and an optional description of your Agent.

Select the Document Insights Engine

Configure the engine as follows. A value that starts with $ is a template string.

instruction: $instruction
page_filter: $page_filter
pdf_file: $pdf_file
output_schema: Copy and paste the JSON schema below (hit Use Text).

{
  "type": "array",
  "items": {
    "type": "object",
    "properties": {
      "row_name": {
        "type": "string",
        "description": "The name of a row"
      },
      "value_1": {
        "type": "string",
        "description": "The monetary value of the row for September 30, 2023"
      },
      "value_2": {
        "type": "string",
        "description": "The monetary value of the row for March 30, 2024"
      },
      "type": {
        "type": "string",
        "description": "Current asset, Non-current asset"
      }
    },
    "description": "A row in the \"Assets\" section"
  },
  "description": "All entries under the \"Assets\" section in the \"CONDENSED CONSOLIDATED BALANCE SHEETS\""
}

Page 8 of the PDF file (linked below) contains the ASSETS table, and this output_schema extracts every row of that table. Take a moment to compare the schema with the table.

Click Use Widget to view the JSON schema in the UI.

Create the Agent

Hit the Create button. Now, let’s run it on a PDF through the UI.

View the Agent you just created

Create a new Agent job

Fill in the Agent inputs

instruction: leave empty.
page_filter: “Only process the ‘CONDENSED CONSOLIDATED BALANCE SHEETS (Unaudited)’ page.”
pdf_file: Download this PDF file.
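One way to picture how the template strings in the engine configuration relate to these inputs: each configured value that starts with $ is looked up by name among the job inputs. The Python sketch below assumes exactly that substitution rule, a plain-dict representation of both the configuration and the inputs, and a hypothetical `resolve_templates` helper; the real engine may resolve templates differently (see Template Strings).

```python
def resolve_templates(config: dict, job_inputs: dict) -> dict:
    """Replace "$name" values with job_inputs["name"]; leave other values unchanged.

    Assumed rule for illustration: an input left empty resolves to None.
    """
    resolved = {}
    for key, value in config.items():
        if isinstance(value, str) and value.startswith("$"):
            resolved[key] = job_inputs.get(value[1:]) or None
        else:
            resolved[key] = value
    return resolved


# Engine configuration from this example (output_schema omitted for brevity).
config = {
    "instruction": "$instruction",
    "page_filter": "$page_filter",
    "pdf_file": "$pdf_file",
}

# Job inputs from this example; the file name is a hypothetical placeholder.
job_inputs = {
    "instruction": "",
    "page_filter": "Only process the 'CONDENSED CONSOLIDATED BALANCE SHEETS (Unaudited)' page.",
    "pdf_file": "downloaded_report.pdf",
}

print(resolve_templates(config, job_inputs))
```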

Run the job

Hit the Create button at the bottom to start the Document Insights job.

View the Results

Click View on the job to see its status and results.

Scroll down the Agent Job Details page and you’ll see the job outputs.

As expected, the JSON output contains only the ASSETS table information defined in the output_schema.
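Because the output follows the output_schema, downstream code can rely on its shape. For example, the Python sketch below groups the extracted rows by their `type` field. The `group_by_type` helper is hypothetical, and the sample rows are placeholders, not values from the PDF; only the field names come from the schema in this example.

```python
from collections import defaultdict


def group_by_type(rows: list[dict]) -> dict[str, list[str]]:
    """Map each `type` value (e.g. "Current asset") to the row names that carry it."""
    groups: dict[str, list[str]] = defaultdict(list)
    for row in rows:
        # All schema properties are optional, so a row may lack a type.
        groups[row.get("type") or "Unspecified"].append(row.get("row_name", ""))
    return dict(groups)


# Hypothetical placeholder rows shaped like the output_schema in this example.
rows = [
    {"row_name": "Row A", "value_1": "100", "value_2": "120", "type": "Current asset"},
    {"row_name": "Row B", "value_1": "200", "value_2": "210", "type": "Non-current asset"},
    {"row_name": "Row C", "value_1": "50", "value_2": "40", "type": "Current asset"},
]

print(group_by_type(rows))
# → {'Current asset': ['Row A', 'Row C'], 'Non-current asset': ['Row B']}
```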
