Text Extraction
This Engine extracts data from any text source into your desired structure.
Text Extraction Engine Inputs
Text Extraction Configuration
The Text Extraction Engine configuration takes four parameters:
- instruction: optional. A string used to prompt the Agent during job execution.
- text: required. The text input to extract from.
- model: required. The model to use for extraction.
- output_schema: optional. Defines the exact structure of the JSON output that the extracted data populates. Follows the JSON Schema specification.
See Template Strings for dynamic parameter configuration.
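The required/optional split above can be sketched as a small pre-flight check. This is a minimal illustration only: the `check_config` helper and the plain-dictionary layout are assumptions made for this example, not part of the product's API.

```python
# Parameter names come from the list above; the helper itself is hypothetical.
REQUIRED = {"text", "model"}
OPTIONAL = {"instruction", "output_schema"}


def check_config(config: dict) -> list[str]:
    """Return a list of problems found in an engine configuration."""
    problems = []
    # Every required parameter must be present.
    for name in sorted(REQUIRED - set(config)):
        problems.append(f"missing required parameter: {name}")
    # Anything outside the four documented parameters is flagged.
    for name in sorted(set(config) - REQUIRED - OPTIONAL):
        problems.append(f"unknown parameter: {name}")
    return problems


# A configuration with only the two required parameters passes the check.
print(check_config({"text": "Some text to extract from.", "model": "gpt-4o"}))
```

Running the same check on a configuration that lacks `text` or `model` returns one message per missing parameter, which makes the "required" labels easy to verify before a job is created.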
Text Extraction Output
If you define an output_schema, the output is always a JSON value with the structure that schema specifies.
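To make "a JSON value of the structure specified in the output_schema" concrete, the sketch below checks a parsed output against a schema. It is a deliberately small stand-in that understands only the `type` and `properties` keywords, not a full JSON Schema validator; the `conforms` helper, the one-field schema, and the sample output string are all hypothetical.

```python
import json

# Maps JSON Schema type names to the Python types that json.loads produces.
_TYPES = {"string": str, "object": dict, "array": list, "boolean": bool}


def conforms(value, schema: dict) -> bool:
    """Check a parsed JSON value against a tiny subset of JSON Schema."""
    expected = schema.get("type")
    if expected in _TYPES and not isinstance(value, _TYPES[expected]):
        return False
    if isinstance(value, dict):
        # Recurse into each declared property that is present in the value.
        for key, sub_schema in schema.get("properties", {}).items():
            if key in value and not conforms(value[key], sub_schema):
                return False
    return True


# Hypothetical schema and job output, for illustration only.
schema = {"type": "object", "properties": {"title": {"type": "string"}}}
output = json.loads('{"title": "An example title"}')
print(conforms(output, schema))
```

For real validation, a complete JSON Schema implementation is the safer choice; this sketch only shows the relationship between the schema and the output shape.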
Text Extraction Example
Let’s run through an example using this engine together.
Create an Agent
Click on the “Add Agent” button in the top right corner of the Agents page.
Enter a name and an optional description of your Agent.
Select the Text Extraction Engine
Remove the model input from Agent Input Definition
Configure the engine as follows
- instruction: $instruction
- text: $text
- model: gpt-4o
- output_schema: Copy and paste the JSON schema below (hit Use Text), or refer to the image below to define the JSON schema with the UI widget.
{
  "type": "object",
  "properties": {
    "title": {
      "type": "string",
      "description": "Title of the text"
    },
    "author": {
      "type": "string",
      "description": "Author of the text"
    },
    "content": {
      "type": "string",
      "description": "Summary of the content of the text"
    }
  },
  "description": "important information about the text"
}
Defining output_schema using the UI Widget
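The configuration above mixes two kinds of values: `$instruction` and `$text` are placeholders filled from Agent inputs, while `gpt-4o` is fixed, which is why the model input was removed from the Agent Input Definition. The sketch below mimics that behavior with Python's `string.Template`, which happens to use the same `$name` syntax. It assumes template strings perform simple name substitution; the `resolve` helper and the sample input values are hypothetical, and the platform's actual resolution rules are described under Template Strings.

```python
from string import Template

# Engine configuration from the steps above (output_schema omitted for brevity).
engine_config = {
    "instruction": "$instruction",
    "text": "$text",
    "model": "gpt-4o",
}


def resolve(config: dict, agent_inputs: dict) -> dict:
    """Replace $name placeholders with the matching Agent input values."""
    return {
        key: Template(value).substitute(agent_inputs)
        for key, value in config.items()
    }


# Hypothetical Agent inputs for a single job.
resolved = resolve(engine_config, {
    "instruction": "Extract the title, author, and a summary.",
    "text": "Title: Example\nAuthor: Someone\nBody text.",
})
print(resolved["model"])        # fixed value, unchanged by the inputs
print(resolved["instruction"])  # filled from the Agent input
```

In this sketch, a job that omits `instruction` or `text` fails with a `KeyError`, because `substitute` requires a value for every placeholder it finds.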
Create the Agent
Hit the Create button. Now, let’s run it on text input through the UI.
View the Agent you just created
Create a new Agent job
Fill in the Agent inputs
Paste in the following text for the text input field:
Title: The Role of AI in Managing Unstructured Data in Modern Data Warehouses
Author: GPT
In the rapidly evolving field of artificial intelligence, data plays a pivotal role. The ability to extract, classify, and retrieve information from diverse data sources such as documents, webpages, videos, images, and audio is crucial for developing intelligent systems. Advanced AI models, like those developed by Roe AI, enable seamless integration and utilization of unstructured data within data warehouses.
Data warehouses traditionally handle structured data, but the growing volume of unstructured data requires more sophisticated solutions. By leveraging AI-powered SQL, Roe AI provides tools that not only store but also process and analyze unstructured data. This technology allows users to perform complex queries, automate data classification, and enhance retrieval-augmented generation (RAG) processes.
The implications of such technology are vast, impacting various industries from healthcare to finance. For example, in healthcare, AI can extract relevant patient information from medical records, aiding in faster diagnosis and personalized treatment plans. In finance, AI-driven data extraction can streamline regulatory compliance and fraud detection by analyzing large volumes of transactions and communications.
In conclusion, the integration of AI with data warehouses signifies a major advancement in data management and utilization. Companies that adopt these technologies can unlock valuable insights from their unstructured data, driving innovation and efficiency in their operations.
Here are the filled-in Agent inputs:
Run the job
Hit the Create button at the bottom to start the text extraction job.
View the Results
Click View on the corresponding job to see its status and results.
Scroll down the Agent Job Details page and you’ll see the job outputs.