> ## Documentation Index
> Fetch the complete documentation index at: https://docs.roe-ai.com/llms.txt
> Use this file to discover all available pages before exploring further.

# Web Insights

> Extracts insights and structured information from URLs

## Web Insights Engine Overview

The Web Insights Engine extracts structured information, insights, and content from websites by crawling URLs. It supports vision mode for visual analysis, custom extraction instructions, and schema-based output formatting. It can also handle Reddit URLs and Amazon product pages through specialized extraction.

## Engine Inputs

The Web Insights Engine Configuration has the following parameters:

### Core Parameters

* **url**: *required.* The website URL you want to extract information from.
* **instruction**: *required (in AI extraction mode).* Instructions that tell the AI model what to analyze and extract from the webpage.
* **output\_schema**: *required (in AI extraction mode).* JSON schema defining the structure of data to extract. Follows the standard [JSON schema specification](https://json-schema.org/).
* **model**: *optional.* The AI model to use (default: `gpt-5.1-2025-11-13`).
* **vision\_mode**: *optional.* Enable visual analysis of the webpage screenshot in addition to text content (default: False).
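
As a rough illustration of how the core parameters fit together, the sketch below assembles them into a plain Python dictionary. The helper name `build_core_config` and the dictionary layout are assumptions made for this example; they are not part of the Roe AI API, and only the parameter names and defaults come from the list above.

```python
from typing import Any

DEFAULT_MODEL = "gpt-5.1-2025-11-13"  # default model named on this page


def build_core_config(
    url: str,
    instruction: str,
    output_schema: dict[str, Any],
    model: str = DEFAULT_MODEL,
    vision_mode: bool = False,
) -> dict[str, Any]:
    """Hypothetical helper: collect the core parameters in one dict."""
    if not url:
        raise ValueError("url is required")
    return {
        "url": url,
        "instruction": instruction,
        "output_schema": output_schema,
        "model": model,
        "vision_mode": vision_mode,
    }


config = build_core_config(
    url="https://en.wikipedia.org/wiki/Europa_Clipper",
    instruction="Summarize the article.",
    output_schema={
        "type": "object",
        "properties": {"summary": {"type": "string"}},
    },
)
```

Leaving `model` and `vision_mode` unset falls back to the documented defaults.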

### Crawl Configuration

The **crawl\_config** object provides advanced settings for how the website should be crawled:

* **crawling\_only**: *optional.* Skip AI extraction and return raw scraped content (default: False). When enabled, `instruction` and `output_schema` are not required.
* **min\_wait\_time\_sec**: *optional.* Minimum time, in seconds, to wait for the page to load (default: 0, range: 0-60).
* **save\_html**: *optional.* Save the original HTML source code (default: False).
* **save\_markdown**: *optional.* Save a clean, formatted markdown version of the webpage content (default: False).
* **save\_screenshot**: *optional.* Capture a visual screenshot of the webpage (default: False).
* **collect\_web\_analytics**: *optional.* Capture network request/response data in HAR format (default: False).
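
The defaults and constraints listed above can be sketched as a small client-side check. The helpers `make_crawl_config` and `missing_required` are hypothetical names introduced for this example, and the assumption that `crawl_config` is nested under that key in a plain dictionary is ours; the engine performs its own validation.

```python
from typing import Any

# Defaults as listed above; the dict layout itself is an assumption.
CRAWL_DEFAULTS: dict[str, Any] = {
    "crawling_only": False,
    "min_wait_time_sec": 0,
    "save_html": False,
    "save_markdown": False,
    "save_screenshot": False,
    "collect_web_analytics": False,
}


def make_crawl_config(**overrides: Any) -> dict[str, Any]:
    """Merge overrides into the defaults and check the documented range."""
    unknown = set(overrides) - set(CRAWL_DEFAULTS)
    if unknown:
        raise ValueError(f"unknown crawl_config keys: {sorted(unknown)}")
    crawl = {**CRAWL_DEFAULTS, **overrides}
    if not 0 <= crawl["min_wait_time_sec"] <= 60:
        raise ValueError("min_wait_time_sec must be between 0 and 60")
    return crawl


def missing_required(config: dict[str, Any]) -> list[str]:
    """Return the required parameter names that are absent from config."""
    required = ["url"]
    # instruction and output_schema are only required in AI extraction mode.
    if not config.get("crawl_config", {}).get("crawling_only", False):
        required += ["instruction", "output_schema"]
    return [name for name in required if name not in config]


crawl = make_crawl_config(crawling_only=True, save_markdown=True, min_wait_time_sec=5)
print(missing_required({"url": "https://example.com", "crawl_config": crawl}))  # prints []
print(missing_required({"url": "https://example.com"}))  # prints ['instruction', 'output_schema']
```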

See [Template Strings](/agents/input-definition#template-strings) for dynamic parameter configuration.
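
To give a feel for what a template string does, the sketch below uses Python's standard `string.Template`, which happens to share the `${name}` placeholder syntax. This is only an analogy: how the platform actually resolves template strings is described on the linked page, and the variable names `agent_config` and `job_inputs` are assumptions made for the example.

```python
from string import Template

# Engine parameters written as template strings (placeholders, not values).
agent_config = {"url": "${url}", "instruction": "${instruction}"}

# Values supplied when a job is created.
job_inputs = {
    "url": "https://en.wikipedia.org/wiki/Europa_Clipper",
    "instruction": "",
}

# Substitute each placeholder with the matching job input.
resolved = {
    key: Template(value).substitute(job_inputs)
    for key, value in agent_config.items()
}
print(resolved["url"])  # prints https://en.wikipedia.org/wiki/Europa_Clipper
```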

## Engine Output

The output is a JSON value that matches the structure specified in the **output\_schema**. When `crawling_only` is enabled, the engine returns the raw scraped content instead.
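
One way to sanity-check a result on the client side is to confirm that it contains no top-level keys beyond those declared in the schema. The helper `unexpected_keys` below is a hypothetical, standard-library-only sketch, and `sample_output` is an illustrative value rather than actual engine output. It does not replace full JSON Schema validation.

```python
from typing import Any


def unexpected_keys(output: dict[str, Any], schema: dict[str, Any]) -> set[str]:
    """Return top-level keys in output that the schema does not declare."""
    declared = set(schema.get("properties", {}))
    return set(output) - declared


schema = {
    "type": "object",
    "properties": {
        "moon": {"type": "string"},
        "planet": {"type": "string"},
        "summary": {"type": "string"},
    },
}

# Illustrative value only, not a recorded engine response.
sample_output = {"moon": "Europa", "planet": "Jupiter", "summary": "..."}

print(unexpected_keys(sample_output, schema))  # prints set()
```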

## Web Insights Example

Let's run through an example using this engine together.

<Steps>
  <Step title="Create an Agent">
    Click the **Add Agent** button in the
    top right corner of the Agents page.

    <Frame>
      <img src="https://mintcdn.com/roeai/qeWYCF2quzHQHhsD/images/add-agent.png?fit=max&auto=format&n=qeWYCF2quzHQHhsD&q=85&s=b3e1ec9b816ed1e57cb1ecfa53ff4288" width="1920" height="1045" data-path="images/add-agent.png" />
    </Frame>
  </Step>

  <Step title="Select the Web Insights Engine" />

  <Step title="Configure the engine as follows">
    <Info>\$ starts a template string</Info>

    * **url**: \${url}
    * **instruction**: \${instruction}
    * **model**: gpt-5.1-2025-11-13
    * **vision\_mode**: False
    * **output\_schema**: Copy and paste the JSON schema below (hit **Use Text**).

    ```json theme={null}
    {
      "type": "object",
      "properties": {
        "moon": {
          "type": "string",
          "description": "Which moon is the article talking about"
        },
        "planet": {
          "type": "string",
          "description": "Which planet does the moon in question belong to"
        },
        "summary": {
          "type": "string",
          "description": "Summary of the article"
        }
      }
    }
    ```

    You can then click **Use Widget** to view the JSON schema in the UI.

    <Frame>
      <img src="https://mintcdn.com/roeai/KH5NetBx7zzOcueq/images/use-widget.png?fit=max&auto=format&n=KH5NetBx7zzOcueq&q=85&s=0c487196b29eae93000e0c7e185c2ebd" width="1202" height="448" data-path="images/use-widget.png" />
    </Frame>
  </Step>

  <Step title="Create the Agent">
    Hit the **Create Agent** button. Now, let's run it on a URL through the UI.
  </Step>

  <Step title="View the Agent you just created" />

  <Step title="Create a new Agent job">
    <Frame>
      <img src="https://mintcdn.com/roeai/wyVOyeWPONjXHsrt/images/agent-new-job.png?fit=max&auto=format&n=wyVOyeWPONjXHsrt&q=85&s=3517e85ca469e22edd5d35f7861c8763" width="1942" height="674" data-path="images/agent-new-job.png" />
    </Frame>
  </Step>

  <Step title="Fill in the Agent inputs">
    Leave **instruction** empty.

    **url**: [https://en.wikipedia.org/wiki/Europa\_Clipper](https://en.wikipedia.org/wiki/Europa_Clipper)

    Here are the filled-in Agent inputs:

    <Frame>
      <img src="https://mintcdn.com/roeai/KH5NetBx7zzOcueq/images/url-extraction-engine/url-extraction-job-inputs.png?fit=max&auto=format&n=KH5NetBx7zzOcueq&q=85&s=2468bdd2ce64c5606d29a9b9bff8e215" width="1442" height="1622" data-path="images/url-extraction-engine/url-extraction-job-inputs.png" />
    </Frame>
  </Step>

  <Step title="Run the job">
    Hit the **Create** button at the bottom to start the Web Insights
    job.
  </Step>

  <Step title="View the Results">
    Click on the job to view its status and results.

    <Frame>
      <img src="https://mintcdn.com/roeai/KH5NetBx7zzOcueq/images/url-extraction-engine/view-job.png?fit=max&auto=format&n=KH5NetBx7zzOcueq&q=85&s=88fe82a8de5b520c37f686561f2d57d4" width="2876" height="370" data-path="images/url-extraction-engine/view-job.png" />
    </Frame>

    You'll see the results on the right side.

    <Frame>
      <img src="https://mintcdn.com/roeai/KH5NetBx7zzOcueq/images/url-extraction-engine/results.png?fit=max&auto=format&n=KH5NetBx7zzOcueq&q=85&s=0f0337db0df0760a48a73e85ca38d32a" width="3062" height="1936" data-path="images/url-extraction-engine/results.png" />
    </Frame>

    <Info>Notice that, as expected, the **JSON output only contains the information we defined in the output\_schema**.</Info>
  </Step>
</Steps>
