This guide walks you through the complete flow: creating a pipe, configuring an extraction pipeline, uploading a document, and retrieving the results.

Overview

The basic extraction flow is:
  1. Create a pipe with an extraction pipeline
  2. Upload a document (via dashboard, email, or API)
  3. The pipeline processes the document and extracts data
  4. Receive results via callback webhook or view them in the UI

Step 1: Create a pipe

Go to the Pipes page and click New pipe. Give it a descriptive name like “Invoice Extraction”.

Step 2: Build the pipeline

In the Pipeline tab, add these nodes:
  1. Upload trigger: to accept documents from the UI
  2. Extract action: configure with a schema defining the fields you want
  3. Callback output: to receive results at a webhook URL
Connect them in sequence and click Save.

Example schema

For invoice extraction, you might define:
{
  "fields": [
    { "name": "vendor_name", "type": "string", "description": "Name of the vendor or supplier" },
    { "name": "invoice_number", "type": "string", "description": "Invoice or receipt number" },
    { "name": "invoice_date", "type": "date", "description": "Date the invoice was issued" },
    { "name": "total_amount", "type": "number", "description": "Total amount due" },
    { "name": "currency", "type": "string", "description": "Currency code (e.g., USD, EUR)" },
    {
      "name": "line_items",
      "type": "array",
      "description": "Individual line items on the invoice",
      "items": [
        { "name": "description", "type": "string" },
        { "name": "quantity", "type": "number" },
        { "name": "unit_price", "type": "number" },
        { "name": "amount", "type": "number" }
      ]
    }
  ]
}
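Once results come back, it can help to check them against the schema before passing them downstream. The following is a minimal sketch, not part of the product: it assumes the extracted data arrives as a JSON object keyed by field name, with dates as ISO 8601 strings (`YYYY-MM-DD`). Neither assumption is confirmed by this guide, and `find_problems`, `CHECKS`, and the shortened `SCHEMA` are hypothetical names used for illustration.

```python
from datetime import date


def _is_number(value):
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def _is_iso_date(value):
    # Assumption: dates are returned as ISO 8601 strings such as "2024-03-01".
    if not isinstance(value, str):
        return False
    try:
        date.fromisoformat(value)
        return True
    except ValueError:
        return False


# One check per type name used in the example schema.
CHECKS = {
    "string": lambda value: isinstance(value, str),
    "number": _is_number,
    "date": _is_iso_date,
    "array": lambda value: isinstance(value, list),
}


def find_problems(fields, record, prefix=""):
    """Return a list of problems; an empty list means the record fits the schema."""
    problems = []
    for field in fields:
        name, kind = field["name"], field["type"]
        path = prefix + name
        if name not in record:
            problems.append(f"{path}: missing")
            continue
        value = record[name]
        if not CHECKS[kind](value):
            problems.append(f"{path}: expected {kind}")
            continue
        # Recurse into array items when the schema describes them.
        if kind == "array" and "items" in field:
            for index, item in enumerate(value):
                if isinstance(item, dict):
                    problems.extend(
                        find_problems(field["items"], item, f"{path}[{index}].")
                    )
                else:
                    problems.append(f"{path}[{index}]: expected object")
    return problems


# A shortened version of the example schema above.
SCHEMA = {
    "fields": [
        {"name": "vendor_name", "type": "string"},
        {"name": "invoice_date", "type": "date"},
        {"name": "total_amount", "type": "number"},
        {
            "name": "line_items",
            "type": "array",
            "items": [
                {"name": "description", "type": "string"},
                {"name": "quantity", "type": "number"},
            ],
        },
    ]
}

# Hypothetical extraction result, made up for this sketch.
record = {
    "vendor_name": "Acme Supplies",
    "invoice_date": "2024-03-01",
    "total_amount": 120.5,
    "line_items": [{"description": "Widget", "quantity": 2}],
}

print(find_problems(SCHEMA["fields"], record))
```

Returning a list of problems rather than raising on the first one lets you log every mismatch for a document in a single pass.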

Step 3: Activate and upload

Click Activate to enable the pipe, then go to the Files tab and upload a document.

Step 4: View results

Navigate to the Runs page to see the processing results. Click a run to inspect each step’s output, including the extracted data.

Using the API

You can also upload documents programmatically using the HTTP trigger. First, add an HTTP trigger node to your pipeline and create an API key. Then send the document to the trigger, authenticating with that key. The response includes the run ID, which you can use to track processing status. To receive results automatically when processing completes, configure a callback output. See the HTTP trigger guide for detailed API examples.
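On the receiving side, the callback output needs a webhook URL that accepts the results. The sketch below shows one way to stand up such an endpoint using only the Python standard library. The payload shape is an assumption: the `run_id` and `data` keys are hypothetical and are not documented in this guide, so check the actual callback body before relying on them. `parse_callback`, `CallbackHandler`, and `serve` are illustrative names.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def parse_callback(body: bytes) -> dict:
    """Decode a callback body and pull out the run ID and extracted data.

    Assumption: the body is a JSON object with "run_id" and "data" keys.
    These key names are hypothetical.
    """
    payload = json.loads(body.decode("utf-8"))
    if not isinstance(payload, dict):
        raise ValueError("callback body must be a JSON object")
    return {"run_id": payload.get("run_id"), "data": payload.get("data", {})}


class CallbackHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        try:
            result = parse_callback(self.rfile.read(length))
        except ValueError:
            # Covers malformed JSON and undecodable bytes as well.
            self.send_response(400)
            self.end_headers()
            return
        print("run", result["run_id"], "extracted", result["data"])
        # Acknowledge receipt with an empty response.
        self.send_response(204)
        self.end_headers()


def serve(port: int = 8000) -> None:
    """Run the receiver locally until interrupted."""
    HTTPServer(("127.0.0.1", port), CallbackHandler).serve_forever()
```

Calling `serve()` starts the receiver on port 8000 of the local machine. For the pipeline to reach it, the callback output's webhook URL would need to point at a publicly reachable address that forwards to this port.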