Upload Your Unstructured Data

1. Navigate to the Roe Dataset page

2. Upload a file to the default dataset

For this quickstart, we will use the invoice.pdf file from the invoice-extraction-example dataset. Go to that dataset and download the file, then upload it to the default dataset.

View the default dataset


Upload your files

3. View your uploaded files

Once uploaded, your files are listed here.

View Your Data

Click the Workspace tab to open the ROE SQL workspace. A default table called dataset_default has been created for you, and any new files uploaded to the default dataset are automatically added to this table. To inspect the table, click dataset_default in the table list.

Alternatively, run the following SQL query in a new worksheet tab to view the data:

SELECT * FROM dataset_default;
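
To inspect only a few rows instead of the whole table, a narrower query can help. This is a minimal sketch, not a confirmed schema: it assumes dataset_default exposes the same name and file columns that the dataset_examples query in the Process Your Data section selects.

```sql
-- Assumption: dataset_default has `name` and `file` columns, like dataset_examples.
SELECT name, file
FROM dataset_default
LIMIT 10; -- LIMIT caps the number of rows returned
```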

Process Your Data

Now that your data is uploaded, you can process it with the built-in functions. For example, the following SQL query extracts information from the invoice PDF:

-- You're shown three example worksheets for three different use cases. Take a moment to read through each one!
-- This worksheet extracts information from the invoice PDF as specified by the return format requested. Try running it!

SELECT name, file, extract_from(
  'returns following structure: {
    from: <from company name>,
    recipient: <recipient company name>,
    line_items: [
      {
        name: <item name>
        quantity: <item quantity>
        cost: <item total cost>
      },
      <more line items>
    ],
    subtotal: <amount before tax>
    tax: <tax amount>
    total: <total amount due>
  }', -- extract_from() uses the Default Extraction agent, so you don't need to specify an agent
  file
) FROM dataset_examples WHERE dataset='invoice-extraction-example'; -- You can run on all datasets in the table or a specific one (what we do here)

The extract_from function returns a reference to an asynchronous job. Once the job completes, click the Job ID to view the extracted data in the worksheet.
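
The same call can be pointed at the file you uploaded to the default dataset. The query below is a sketch under one assumption: that dataset_default has name and file columns like dataset_examples. The return-format prompt is shortened here to two fields for illustration.

```sql
-- Assumption: dataset_default has `name` and `file` columns, like dataset_examples.
SELECT name, file, extract_from(
  'returns following structure: {
    from: <from company name>,
    total: <total amount due>
  }', -- same prompt style as the example above, trimmed to two fields
  file
) FROM dataset_default; -- no WHERE clause, so every file in the table is processed
```

As with the example above, each row should return a job reference instead of the extracted values themselves.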

Next Steps

Learn More About ROE VolansDB SQL

ROE VolansDB SQL is based on the ClickHouse dialect. To learn more about ClickHouse SQL, see the official ClickHouse documentation.

Also check out our examples to see how you can use SQL to process your data.

Learn More About ROE VolansDB SQL Functions

extract_from is just one of many functions you can use to process your data with ROE AI Agents. You can learn more about the available functions in the Functions section.