Document Types
Supported document types and their extraction schemas
DocuRift supports multiple document types, each with specialized extraction models optimized for that specific format. Our machine learning models are trained on millions of real-world documents, enabling high-accuracy extraction across a wide variety of layouts, formats, and languages.
Overview
Choosing the correct document type is crucial for accurate extraction. Each document type has a specialized model that understands the unique structure and fields of that format. For example, an invoice model knows to look for vendor information, line items, totals, and payment terms, while a bill of lading model focuses on shipper details, cargo descriptions, and port information.
When you specify the correct document type, DocuRift:
- Applies the most accurate extraction model for that format
- Returns structured data with fields specific to that document type
- Provides higher confidence scores through targeted extraction
- Reduces errors by understanding the document's context
If you're unsure about the document type, you can use the generic type, which performs OCR and returns raw text. However, for best results, we recommend specifying the exact document type whenever possible.
Supported Document Types
Generic Document
General-purpose OCR for any document
Invoice
Extract billing data from invoices
Bill of Lading
Parse maritime shipping documents
Proof of Delivery
Extract delivery confirmation data
Packing List
Process package manifests
Customs Declaration
Extract customs form data
Freight Invoice
Parse freight billing documents
Delivery Challan
Process delivery challans
Lorry Receipt
Extract transport receipt data
Airway Bill
Parse air cargo documents
Choosing the Right Type
Not sure which document type to use? The table below maps your document's purpose to the type to request and the fields you can expect back:
| If Your Document Is... | Use Type | Typical Fields Extracted |
|------------------------|----------|--------------------------|
| Unknown or mixed format | generic | Raw text, tables |
| Commercial billing | invoice | Vendor, line items, totals, dates |
| Ocean shipping | bill_of_lading | Shipper, consignee, cargo, ports |
| Delivery confirmation | proof_of_delivery | Recipient, signature, timestamp |
| Package contents | packing_list | Items, quantities, weights |
| Import/export forms | customs_declaration | HS codes, values, duties |
| Shipping charges | freight_invoice | Charges, routes, carriers |
| Goods transfer | delivery_challan | Sender, receiver, goods |
| Truck transport | lorry_receipt | Vehicle, driver, consignment |
| Air cargo | airway_bill | AWB number, routing, weights |
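The identifiers in the "Use Type" column can be kept in one place and checked before a request is sent. The following is a minimal Python sketch, assuming these ten strings are the only accepted values and that falling back to generic is acceptable for unrecognized input. `SUPPORTED_TYPES` and `resolve_document_type` are hypothetical names used for illustration, not part of the DocuRift API.

```python
from typing import Optional

# Type identifiers copied from the "Use Type" column of the table above.
SUPPORTED_TYPES = frozenset({
    "generic", "invoice", "bill_of_lading", "proof_of_delivery",
    "packing_list", "customs_declaration", "freight_invoice",
    "delivery_challan", "lorry_receipt", "airway_bill",
})


def resolve_document_type(requested: Optional[str]) -> str:
    """Return a supported type identifier, falling back to "generic".

    Hypothetical helper: accepts labels such as "Bill of Lading" and
    normalizes them to the snake_case identifiers listed in the table.
    """
    if requested is None:
        return "generic"
    candidate = requested.strip().lower().replace(" ", "_")
    return candidate if candidate in SUPPORTED_TYPES else "generic"
```

Falling back to generic mirrors the advice above for unknown documents. A stricter client might raise an error instead, so that a mistyped identifier is not silently downgraded to plain OCR.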
Credit Costs by Type
Different document types require different levels of processing complexity. Here's how credits are calculated:
| Type | Credits per Page | Why |
|------|-----------------|-----|
| Generic | 1 | Basic OCR processing |
| Invoice | 2 | Structured extraction with line items |
| Bill of Lading | 3 | Complex multi-party documents |
| Customs Declaration | 3 | Regulatory compliance fields |
| All Others | 2 | Standard structured extraction |
Credits are charged per page processed. Multi-page documents are charged based on the total number of pages. For example, a 5-page invoice would cost 10 credits (5 pages × 2 credits).
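The per-page pricing can be estimated client-side before submitting a batch. This is a sketch only, assuming the rates listed in this section and the snake_case type identifiers used in API requests. `CREDITS_PER_PAGE` and `estimate_credits` are hypothetical names, and actual billing is determined by DocuRift.

```python
# Per-page rates from the table in this section; types not listed here
# fall under the "All Others" row.
CREDITS_PER_PAGE = {
    "generic": 1,
    "invoice": 2,
    "bill_of_lading": 3,
    "customs_declaration": 3,
}
DEFAULT_CREDITS_PER_PAGE = 2  # "All Others"


def estimate_credits(document_type: str, pages: int) -> int:
    """Estimate the credits charged for a document of the given page count."""
    if pages < 0:
        raise ValueError("pages must be non-negative")
    rate = CREDITS_PER_PAGE.get(document_type, DEFAULT_CREDITS_PER_PAGE)
    return pages * rate


# The worked example from the text: a 5-page invoice.
print(estimate_credits("invoice", 5))  # 10
```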
Example Request
To process a document, specify the file and document type in your API request:
```shell
curl -X POST https://api.docurift.com/v1/documents/process \
  -H "X-API-Key: your_api_key" \
  -F "file=@document.pdf" \
  -F "documentType=invoice"
```
The API will return structured JSON with all extracted fields, confidence scores, and metadata. See each document type's page for the specific fields and schema returned.
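The same request can be approximated from Python using only the standard library. The sketch below builds the multipart/form-data request without sending it. The endpoint, the `X-API-Key` header, and the `file` and `documentType` field names come from the curl example. The function name `build_process_request` and the `application/octet-stream` content type for the file part are assumptions made for illustration.

```python
import urllib.request
import uuid

# Endpoint taken from the curl example.
API_URL = "https://api.docurift.com/v1/documents/process"


def build_process_request(
    file_bytes: bytes, filename: str, document_type: str, api_key: str
) -> urllib.request.Request:
    """Build (but do not send) a multipart POST like the curl example."""
    boundary = uuid.uuid4().hex

    # Text field: documentType
    type_part = (
        f"--{boundary}\r\n"
        'Content-Disposition: form-data; name="documentType"\r\n'
        "\r\n"
        f"{document_type}\r\n"
    ).encode("utf-8")

    # File field: headers first, then the raw bytes.
    file_header = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n"  # assumed content type
        "\r\n"
    ).encode("utf-8")

    closing = f"\r\n--{boundary}--\r\n".encode("utf-8")
    body = type_part + file_header + file_bytes + closing

    return urllib.request.Request(
        API_URL,
        data=body,
        method="POST",
        headers={
            "X-API-Key": api_key,
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )
```

To send the request, pass the returned object to `urllib.request.urlopen` and parse the response body with `json.load`. The shape of that JSON is described on each document type's page and is not assumed here.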
Accuracy and Confidence
Each extracted field includes a confidence score between 0 and 1; higher scores indicate greater certainty in the extraction. We recommend the following thresholds:
- 0.9 and above: High confidence, suitable for automated processing
- 0.7 up to (but not including) 0.9: Medium confidence, consider human review
- Below 0.7: Low confidence, manual verification recommended
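These thresholds translate directly into a routing step. The sketch below is illustrative only: it assumes you have already pulled a field-name-to-confidence mapping out of the response (the actual response schema is documented per document type and is not assumed here), and it treats a score of exactly 0.9 as high confidence. `review_tier` and `triage_fields` are hypothetical names.

```python
from typing import Dict, List


def review_tier(confidence: float) -> str:
    """Map a confidence score in [0, 1] to a handling tier."""
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be between 0 and 1")
    if confidence >= 0.9:
        return "auto"      # high: suitable for automated processing
    if confidence >= 0.7:
        return "review"    # medium: consider human review
    return "manual"        # low: manual verification recommended


def triage_fields(confidences: Dict[str, float]) -> Dict[str, List[str]]:
    """Group field names by tier, preserving input order within each tier."""
    tiers: Dict[str, List[str]] = {"auto": [], "review": [], "manual": []}
    for field, score in confidences.items():
        tiers[review_tier(score)].append(field)
    return tiers
```

For example, `triage_fields({"total": 0.97, "vendor": 0.82, "due_date": 0.4})` places `total` in `auto`, `vendor` in `review`, and `due_date` in `manual`.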
Need a New Document Type?
If you need support for a document type not listed here, contact us. We're always expanding our supported formats based on customer needs. Our team can work with you to build custom extraction models for your specific document types.