The extract node uses AI to pull structured data from documents according to a schema you define: you list the fields you want, and the AI extracts them from the document content. It is the core action node for document processing in DocPipe.

Configuration

| Field | Type | Required | Description |
|---|---|---|---|
| Engine | select | Yes | Extraction engine: Engine 1 or Engine 2 |
| Schema | schema editor | Yes | Defines the fields to extract (name, type, description) |
| Precision | select | No | Processing precision: Fast, Standard, or Advanced |
| Extraction hints | string | No | Natural language instructions to guide the AI extraction |
| JSON path | string | No | JSONPath expression to extract a subset of the output |
| Chunk size | number | No | Number of pages per chunk for large documents |
| Chunk overlap | number | No | Number of overlapping pages between chunks for context preservation |
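
To illustrate how these fields fit together, the sketch below models an extract node configuration as a plain Python dataclass with a few sanity checks. The class names, attribute names, and validation rules are hypothetical and are not part of any DocPipe API. Only the option values (Engine 1 or Engine 2; Fast, Standard, or Advanced) come from the table above. The rule that chunk overlap must be smaller than chunk size is an assumption.

```python
from dataclasses import dataclass
from typing import Optional

# Option values taken from the configuration table.
ENGINES = {"Engine 1", "Engine 2"}
PRECISIONS = {"Fast", "Standard", "Advanced"}


@dataclass
class SchemaField:
    """One field to extract (hypothetical representation)."""
    name: str
    type: str
    description: str = ""


@dataclass
class ExtractConfig:
    """Hypothetical in-memory model of the extract node settings."""
    engine: str                             # required
    schema: list[SchemaField]               # required
    precision: Optional[str] = None         # optional
    extraction_hints: Optional[str] = None  # optional
    json_path: Optional[str] = None         # optional
    chunk_size: Optional[int] = None        # optional, pages per chunk
    chunk_overlap: Optional[int] = None     # optional, shared pages

    def validate(self) -> list[str]:
        """Return a list of human-readable problems (empty if none)."""
        errors = []
        if self.engine not in ENGINES:
            errors.append(f"unknown engine: {self.engine}")
        if not self.schema:
            errors.append("schema must define at least one field")
        if self.precision is not None and self.precision not in PRECISIONS:
            errors.append(f"unknown precision: {self.precision}")
        if self.chunk_size is not None and self.chunk_size < 1:
            errors.append("chunk_size must be at least 1")
        if self.chunk_overlap is not None:
            if self.chunk_size is None:
                errors.append("chunk_overlap requires chunk_size")
            elif not 0 <= self.chunk_overlap < self.chunk_size:
                # Assumption: overlap must be smaller than the chunk itself.
                errors.append("chunk_overlap must be in [0, chunk_size)")
        return errors
```

A configuration with Engine 2, Standard precision, one schema field, a chunk size of 10, and an overlap of 2 passes these checks, while an unknown engine or an empty schema is reported as a problem.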

Engines

  • Engine 1: Faster, lower cost. Best for simple documents with clear structure.
  • Engine 2: More capable. Best for complex documents, handwriting, or ambiguous layouts.

Chunked extraction

For large documents, you can split processing into smaller chunks. Set a chunk size (number of pages per chunk) and optionally a chunk overlap (pages shared between consecutive chunks to preserve context). Each chunk is extracted independently, and the results are combined automatically.
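
The chunking described above can be sketched as a small function that turns a page count, chunk size, and chunk overlap into page ranges. This is only an illustration under stated assumptions: the function name is hypothetical, pages are numbered from 1, and each chunk is assumed to start `chunk_size - chunk_overlap` pages after the previous one. DocPipe's actual chunk boundaries may differ.

```python
def chunk_page_ranges(
    total_pages: int, chunk_size: int, chunk_overlap: int = 0
) -> list[tuple[int, int]]:
    """Return inclusive (first_page, last_page) ranges, 1-indexed."""
    if total_pages < 0:
        raise ValueError("total_pages must not be negative")
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    # Assumed stride: each new chunk re-reads the last `chunk_overlap` pages.
    step = chunk_size - chunk_overlap
    ranges = []
    start = 1
    while start <= total_pages:
        end = min(start + chunk_size - 1, total_pages)
        ranges.append((start, end))
        if end == total_pages:
            break  # the final page is covered, stop here
        start += step
    return ranges
```

Under these assumptions, a 10-page document with a chunk size of 4 and an overlap of 1 is split into pages 1 to 4, 4 to 7, and 7 to 10, so pages 4 and 7 appear in two chunks each.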

Inputs and outputs

  • Allowed inputs: Trigger nodes, route, parse, review.
  • Output: Structured JSON data matching the configured schema.
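
Because the output is JSON shaped by the schema, a downstream step can check it before using it. The sketch below is a hypothetical check, not a DocPipe feature: the function name, the schema representation (a list of dicts with `name` and `type`), the type names `string`, `number`, and `boolean`, and the treatment of `null` as "value not found" are all assumptions made for illustration.

```python
# Assumed type names; the types DocPipe actually supports may differ.
TYPE_CHECKS = {
    "string": lambda v: isinstance(v, str),
    # bool is a subclass of int in Python, so exclude it explicitly.
    "number": lambda v: isinstance(v, (int, float)) and not isinstance(v, bool),
    "boolean": lambda v: isinstance(v, bool),
}


def check_output(output: dict, schema: list[dict]) -> list[str]:
    """Compare extracted JSON with the schema and list any problems."""
    problems = []
    for field in schema:
        name, ftype = field["name"], field["type"]
        check = TYPE_CHECKS.get(ftype)
        if check is None:
            problems.append(f"unknown type for {name}: {ftype}")
        elif name not in output:
            problems.append(f"missing field: {name}")
        elif output[name] is not None and not check(output[name]):
            # Assumption: None (JSON null) means the value was not found.
            problems.append(f"wrong type for {name}: expected {ftype}")
    return problems
```

An empty result means every schema field is present with a compatible value. Anything else can be sent to a review step or logged.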

Credit cost

| Engine | Precision | Cost per page |
|---|---|---|
| Engine 1 | Fast | 1 credit |
| Engine 1 | Standard | 2 credits |
| Engine 1 | Advanced | 3 credits |
| Engine 2 | Fast | 2 credits |
| Engine 2 | Standard | 3 credits |
| Engine 2 | Advanced | 5 credits |
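
As a worked example of the pricing above, the sketch below estimates the credits for a document from its page count. The function name is hypothetical and the per-page rates are copied from the table. It assumes that cost scales linearly with the number of pages. Whether pages re-processed through chunk overlap are billed again is not stated here, so the sketch ignores overlap.

```python
# Credits per page, keyed by (engine, precision), as listed in the table.
CREDITS_PER_PAGE = {
    ("Engine 1", "Fast"): 1,
    ("Engine 1", "Standard"): 2,
    ("Engine 1", "Advanced"): 3,
    ("Engine 2", "Fast"): 2,
    ("Engine 2", "Standard"): 3,
    ("Engine 2", "Advanced"): 5,
}


def estimate_credits(pages: int, engine: str, precision: str) -> int:
    """Estimate total credits, assuming a flat per-page rate."""
    if pages < 0:
        raise ValueError("pages must not be negative")
    try:
        rate = CREDITS_PER_PAGE[(engine, precision)]
    except KeyError:
        raise ValueError(f"unknown combination: {engine}, {precision}") from None
    return pages * rate
```

For a 40-page document, this gives 40 credits with Engine 1 at Fast precision and 200 credits with Engine 2 at Advanced precision.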

Related pages

  • Schema design: Best practices for designing extraction schemas
  • Review action: Add human review after extraction
  • Upload and extract: Quick guide to uploading and extracting data
  • Parse action: Pre-process documents with OCR before extraction