URL Website Extraction
This Engine extracts structured information from the webpage at a given URL.
URL Website Extraction Engine Inputs
URL Website Extraction Configuration
The URL Website Extraction Engine Configuration has four parameters:
- instruction: optional. A string used to prompt the Agent during webpage extraction.
- url: required. The URL of the webpage to extract from.
- model: required. The model to use for extraction.
- output_schema: optional. A JSON Schema that defines the exact structure of the JSON output the extracted data will populate. It follows the standard JSON Schema specification and tells the model exactly what to extract from the webpage.
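As a rough illustration of how these four parameters fit together, the sketch below builds a configuration as a plain Python dictionary and checks that the required fields are present. The dictionary layout, the `validate_config` helper, and the example schema fields (`mission_name`, `operator`) are assumptions made for illustration; they are not taken from the platform's actual API.

```python
import json

# Required and optional parameters, as listed above.
REQUIRED_FIELDS = ("url", "model")
OPTIONAL_FIELDS = ("instruction", "output_schema")


def validate_config(config: dict) -> list:
    """Return a list of problems found in a hypothetical engine config."""
    problems = []
    for field in REQUIRED_FIELDS:
        if not config.get(field):
            problems.append(f"missing required field: {field}")
    for field in config:
        if field not in REQUIRED_FIELDS + OPTIONAL_FIELDS:
            problems.append(f"unknown field: {field}")
    return problems


# Hypothetical schema: the field names are illustrative only.
example_schema = {
    "type": "object",
    "properties": {
        "mission_name": {"type": "string"},
        "operator": {"type": "string"},
    },
    "required": ["mission_name"],
}

config = {
    "url": "https://en.wikipedia.org/wiki/Europa_Clipper",
    "model": "gpt-4o-2024-08-06",
    "output_schema": example_schema,  # optional
}

print(validate_config(config))
print(json.dumps(config["output_schema"], indent=2))
```

Leaving out `instruction` or `output_schema` produces no problems in this sketch, while leaving out `url` or `model` does, which mirrors the required/optional split above.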
See Template Strings for dynamic parameter configuration.
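The `${name}` placeholder style used for dynamic parameters happens to match the syntax of Python's standard `string.Template`, so the idea can be sketched with it. This is only an analogy: how the platform actually resolves template strings is not described here, and the `resolve_parameters` helper is hypothetical.

```python
from string import Template


def resolve_parameters(config: dict, inputs: dict) -> dict:
    """Substitute ${name} placeholders in string values with job inputs."""
    resolved = {}
    for key, value in config.items():
        if isinstance(value, str):
            # safe_substitute leaves unknown placeholders untouched
            # instead of raising KeyError.
            resolved[key] = Template(value).safe_substitute(inputs)
        else:
            resolved[key] = value
    return resolved


# Configuration with placeholders, filled in per job.
config = {
    "instruction": "${instruction}",
    "url": "${url}",
    "model": "gpt-4o-2024-08-06",
}
inputs = {
    "instruction": "",
    "url": "https://en.wikipedia.org/wiki/Europa_Clipper",
}

resolved = resolve_parameters(config, inputs)
print(resolved["url"])
```

The point of the placeholders is that one Agent configuration can be reused across many jobs, with only the inputs changing from run to run.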
URL Website Extraction Output
If you define output_schema, the output is always a JSON value with the structure it specifies.
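To make the relationship between output_schema and the output concrete, here is a minimal sketch that checks a JSON result against a small subset of JSON Schema (primitive `type`, `required`, `properties`, and `items`). The `conforms` helper, the schema, and the sample output are hand-written assumptions, not real engine output; a full validator such as the third-party `jsonschema` package would cover the complete specification.

```python
import json

# Maps JSON Schema type names to Python types (subset only).
TYPE_MAP = {
    "string": str,
    "number": (int, float),
    "integer": int,
    "boolean": bool,
    "array": list,
    "object": dict,
}


def conforms(value, schema: dict) -> bool:
    """Check a parsed JSON value against a small subset of JSON Schema."""
    expected = schema.get("type")
    python_type = TYPE_MAP.get(expected)  # types outside the subset are not checked
    if python_type is not None:
        # bool is a subclass of int in Python, so reject it for numeric types.
        if expected in ("number", "integer") and isinstance(value, bool):
            return False
        if not isinstance(value, python_type):
            return False
    if isinstance(value, dict):
        for name in schema.get("required", []):
            if name not in value:
                return False
        for name, subschema in schema.get("properties", {}).items():
            if name in value and not conforms(value[name], subschema):
                return False
    if isinstance(value, list) and "items" in schema:
        return all(conforms(item, schema["items"]) for item in value)
    return True


# Hypothetical schema and sample output, for illustration only.
schema = {
    "type": "object",
    "properties": {
        "mission_name": {"type": "string"},
        "operator": {"type": "string"},
    },
    "required": ["mission_name"],
}
raw_output = '{"mission_name": "Europa Clipper", "operator": "NASA"}'

print(conforms(json.loads(raw_output), schema))
```

A check like this can be useful downstream of the engine, for example before loading results into a database that expects fixed fields.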
URL Website Extraction Example
Let’s run through an example using this engine together.
Create an Agent
Click on the “Add Agent” button in the top right corner of the Agents page.
Select the URL Website Extraction Engine
Configure the engine as follows:
- instruction: ${instruction}
- url: ${url}
- model: gpt-4o-2024-08-06
- output_schema: Copy and paste the JSON schema below (hit Use Text).
You can then click Use Widget to view the JSON schema in the UI.
Create the Agent
Hit the Create Agent button. Now, let’s run it on a URL through the UI.
View the Agent you just created
Create a new Agent job
Fill in the Agent inputs
Leave instruction empty.
url: https://en.wikipedia.org/wiki/Europa_Clipper
Here are the filled-in Agent inputs:
Run the job
Hit the Create button at the bottom to start the URL Website Extraction job.
View the Results
Click on the job to view its status and results.
You’ll see the results on the right side.