The extraction schema defines what structured data the AI extracts from your documents. A well-designed schema produces more accurate and consistent results.

Schema structure

A schema uses JSON Schema Draft-07 format. The root must be an object type with properties defining each field:
{
  "type": "object",
  "properties": {
    "vendor_name": {
      "type": "string",
      "description": "Name of the vendor or supplier"
    },
    "total_amount": {
      "type": "number",
      "description": "Total amount due including tax"
    }
  },
  "required": ["vendor_name", "total_amount"]
}
The required array lists fields that should always be extracted. Fields not listed in required may return null if not found in the document.
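Because any field can come back as null, it can help to check required fields before using a result. The following is a minimal sketch, not part of the product: the helper name missing_required and the sample result are hypothetical, and it assumes the extraction result arrives as a plain dictionary keyed by field name.

```python
from typing import Any

# Schema copied from the example above.
schema = {
    "type": "object",
    "properties": {
        "vendor_name": {"type": "string", "description": "Name of the vendor or supplier"},
        "total_amount": {"type": "number", "description": "Total amount due including tax"},
    },
    "required": ["vendor_name", "total_amount"],
}


def missing_required(schema: dict[str, Any], result: dict[str, Any]) -> list[str]:
    """Return the required field names that are absent or null in the result."""
    return [name for name in schema.get("required", []) if result.get(name) is None]


# Hypothetical extraction result, invented for illustration.
result = {"vendor_name": "Acme Corp", "total_amount": None}
print(missing_required(schema, result))  # ['total_amount']
```

The check compares against None rather than testing truthiness, so a legitimate value such as 0 is not reported as missing.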

Field types

| Type | Description | Example value |
| --- | --- | --- |
| string | Text value | "Acme Corp" |
| number | Numeric value (integer or decimal) | 1250.00 |
| boolean | True or false | true |
| object | Nested object with its own properties | See below |
| array | List of nested objects | See below |
There is no date type. For date fields, use "type": "string" with a description indicating the expected format (e.g., “Date in ISO 8601 format YYYY-MM-DD”).
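Since dates arrive as strings, downstream code usually has to convert them. A minimal sketch of that step, assuming the field description asked for YYYY-MM-DD; the helper name parse_iso_date is hypothetical, and returning None for malformed values is a design choice of this example, not documented behavior:

```python
from datetime import date
from typing import Optional


def parse_iso_date(value: Optional[str]) -> Optional[date]:
    """Convert an extracted 'YYYY-MM-DD' string to a date.

    Returns None when the value is missing or is not a valid ISO 8601 date.
    """
    if value is None:
        return None
    try:
        return date.fromisoformat(value)
    except ValueError:
        return None


print(parse_iso_date("2024-03-15"))  # 2024-03-15
print(parse_iso_date("03/15/2024"))  # None
```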

Nested fields and arrays

For repeating data like line items, use the array type with an items object that defines the structure of each element:
{
  "line_items": {
    "type": "array",
    "description": "Individual line items on the invoice",
    "items": {
      "type": "object",
      "properties": {
        "description": { "type": "string", "description": "Item description" },
        "quantity": { "type": "number", "description": "Quantity ordered" },
        "unit_price": { "type": "number", "description": "Price per unit" },
        "amount": { "type": "number", "description": "Line total" }
      }
    }
  }
}
Note that items is a single schema object (not an array) describing the shape of each array element.
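On the consuming side, each array element has the shape that items describes. A short sketch of reading such a result, assuming it is delivered as a list of dictionaries; the sample rows and the helper name sum_amounts are invented for illustration:

```python
from typing import Any, Optional

# Hypothetical extracted rows; each one follows the "items" schema above.
line_items: list[dict[str, Optional[Any]]] = [
    {"description": "Widget", "quantity": 2, "unit_price": 10.0, "amount": 20.0},
    {"description": "Shipping", "quantity": None, "unit_price": None, "amount": 5.0},
]


def sum_amounts(items: list[dict[str, Optional[Any]]]) -> float:
    """Add up the line totals, skipping rows whose amount is null or absent."""
    return sum(item["amount"] for item in items if item.get("amount") is not None)


print(sum_amounts(line_items))  # 25.0
```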

Table extraction

For documents with tabular data (invoices, purchase orders, statements), use arrays to capture table rows. The AI identifies table structures and maps columns to your defined fields.

Tips for table extraction:
  • Name fields to match common column headers
  • Include a description that mentions the column header name if it differs from the field name
  • Test with documents that have varying table formats
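One way to apply the tips above is to generate the array schema from a mapping of field names to the column headers as they are printed on the document. This is only a sketch: the helper table_schema, the description wording, and the sample headers ("Qty", "Rate") are assumptions made for illustration.

```python
import json


def table_schema(description: str, columns: dict[str, tuple[str, str]]) -> dict:
    """Build an array schema for a table.

    columns maps each field name to (JSON type, column header as printed),
    so every field description mentions the header it corresponds to.
    """
    properties = {
        field: {"type": json_type, "description": f'Value from the "{header}" column'}
        for field, (json_type, header) in columns.items()
    }
    return {
        "type": "array",
        "description": description,
        "items": {"type": "object", "properties": properties},
    }


line_items = table_schema(
    "Individual line items on the invoice",
    {
        "description": ("string", "Description"),
        "quantity": ("number", "Qty"),
        "unit_price": ("number", "Rate"),
    },
)
print(json.dumps({"line_items": line_items}, indent=2))
```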

Best practices

Write descriptive field descriptions

The AI uses each field's description to decide what to extract, so be specific:
// Good
"total": { "type": "number", "description": "Grand total amount including tax and shipping" }

// Less effective
"total": { "type": "number", "description": "Total" }

Use specific field names

Choose field names that clearly indicate the data:
// Good
"invoice_date": { "type": "string", "description": "Invoice issue date (ISO 8601)" }
"due_date": { "type": "string", "description": "Payment due date (ISO 8601)" }

// Ambiguous
"date1": { "type": "string" }
"date2": { "type": "string" }

Add extraction instructions

Use the extract action’s Instructions field to provide context the AI can use:
  • “Dates should be in ISO 8601 format (YYYY-MM-DD)”
  • “If a field is not found in the document, return null”
  • “The total should include tax. If tax is listed separately, add it to the subtotal”

Start simple, iterate

Begin with a small number of high-value fields. Test with real documents, review the results, and gradually add more fields as you confirm accuracy.

Handle missing data

Not every document contains every field. The AI returns null for fields it cannot find. Design your downstream processing to handle missing values gracefully.
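A minimal sketch of what graceful handling might look like downstream, assuming the result is a dictionary in which missing fields are null or absent. The function summarize, its fallback strings, and the needs_review flag are hypothetical choices for this example:

```python
from typing import Any


def summarize(result: dict[str, Any]) -> dict[str, Any]:
    """Build a display record from an extraction result without failing on nulls."""
    total = result.get("total_amount")
    due_date = result.get("due_date")
    return {
        # Fall back to a placeholder when the vendor is null or absent.
        "vendor": result.get("vendor_name") or "(unknown vendor)",
        # Keep a missing total as None instead of guessing a number.
        "total": total,
        # Flag the record for manual review when the total is missing.
        "needs_review": total is None,
        "due_date": due_date if due_date is not None else "not stated",
    }


# Hypothetical result in which two fields were not found.
print(summarize({"vendor_name": "Acme Corp", "total_amount": None, "due_date": None}))
```

Keeping a missing amount as None, rather than substituting 0, avoids silently treating an unread total as a zero balance.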

Example schemas

Invoice

{
  "type": "object",
  "properties": {
    "vendor_name": { "type": "string", "description": "Name of the vendor or company issuing the invoice" },
    "invoice_number": { "type": "string", "description": "Invoice or reference number" },
    "invoice_date": { "type": "string", "description": "Date the invoice was issued (ISO 8601)" },
    "due_date": { "type": "string", "description": "Payment due date (ISO 8601)" },
    "subtotal": { "type": "number", "description": "Subtotal before tax" },
    "tax_amount": { "type": "number", "description": "Total tax amount" },
    "total_amount": { "type": "number", "description": "Grand total including tax" },
    "currency": { "type": "string", "description": "Currency code (e.g., USD, EUR, GBP)" },
    "line_items": {
      "type": "array",
      "description": "Invoice line items",
      "items": {
        "type": "object",
        "properties": {
          "description": { "type": "string", "description": "Item or service description" },
          "quantity": { "type": "number", "description": "Quantity" },
          "unit_price": { "type": "number", "description": "Price per unit" },
          "amount": { "type": "number", "description": "Line total" }
        }
      }
    }
  },
  "required": ["vendor_name", "invoice_number", "total_amount"]
}

Receipt

{
  "type": "object",
  "properties": {
    "merchant_name": { "type": "string", "description": "Name of the store or merchant" },
    "transaction_date": { "type": "string", "description": "Date of purchase (ISO 8601)" },
    "total": { "type": "number", "description": "Total amount paid" },
    "payment_method": { "type": "string", "description": "Payment method used (cash, credit card, etc.)" },
    "items": {
      "type": "array",
      "description": "Purchased items",
      "items": {
        "type": "object",
        "properties": {
          "name": { "type": "string", "description": "Item name" },
          "price": { "type": "number", "description": "Item price" }
        }
      }
    }
  },
  "required": ["merchant_name", "total"]
}