Process Document (Async)
Asynchronously process large documents with webhook notifications
Upload and process documents asynchronously. This endpoint returns immediately with a job ID while processing continues in the background. Ideal for large documents and batch processing.
POST /v1/documents/process/async
Overview
The asynchronous processing endpoint is designed for:
- Large documents (10+ pages)
- Batch document processing
- Background processing workflows
- Integration with webhook notifications
For immediate results with smaller documents, use the sync endpoint.
Request
Headers
| Header | Type | Required | Description |
|---|---|---|---|
| X-API-Key | string | Yes | Your DocuRift API key (format: frc_xxxxx) |
| Content-Type | string | Yes | Must be multipart/form-data for file uploads |
Body Parameters
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| file | File | Yes | - | Document file to process. Supported formats: PDF, PNG, JPG, JPEG, WEBP, TIFF. Maximum size: 50 MB |
| documentType | string | No | generic | Type of document, used to optimize extraction |
| webhookUrl | string | No | - | URL that receives a webhook notification when processing completes |
| webhookSecret | string | No | - | Secret for webhook signature verification (HMAC-SHA256) |
| callbackMetadata | object | No | - | Custom metadata to include in the webhook payload |
| priority | string | No | normal | Processing priority: normal or high (high is available on paid plans only) |
| language | string | No | en | Primary language of the document (ISO 639-1 code) |
| extractTables | boolean | No | true | Enable table extraction and structuring |
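The optional parameters travel in the same multipart form as the file. The following is a minimal sketch of assembling them in Python. It assumes that booleans are sent as the strings "true"/"false" and that callbackMetadata is sent as a JSON-encoded string; this reference does not specify either encoding, and build_form_fields is a hypothetical helper, not part of any DocuRift SDK.

```python
import json


def build_form_fields(document_type="generic", webhook_url=None,
                      webhook_secret=None, callback_metadata=None,
                      priority="normal", language="en", extract_tables=True):
    """Build the non-file form fields for POST /v1/documents/process/async."""
    if priority not in ("normal", "high"):
        raise ValueError("priority must be 'normal' or 'high'")
    fields = {
        "documentType": document_type,
        "priority": priority,
        "language": language,
        # Assumption: booleans are sent as lowercase strings.
        "extractTables": "true" if extract_tables else "false",
    }
    if webhook_url:
        fields["webhookUrl"] = webhook_url
    if webhook_secret:
        fields["webhookSecret"] = webhook_secret
    if callback_metadata is not None:
        # Assumption: the object is sent as a JSON-encoded string.
        fields["callbackMetadata"] = json.dumps(callback_metadata)
    return fields


fields = build_form_fields(
    document_type="invoice",
    webhook_url="https://your-app.com/webhooks/docurift",
    callback_metadata={"internalId": "order-12345"},
)
```

The resulting dictionary can be passed as `data=` to `requests.post`, alongside `files={'file': f}`.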
Code Examples
cURL
curl -X POST https://api.docurift.com/v1/documents/process/async \
  -H "X-API-Key: frc_your_api_key_here" \
  -F "file=@large-document.pdf" \
  -F "documentType=invoice" \
  -F "webhookUrl=https://your-app.com/webhooks/docurift" \
  -F "webhookSecret=whsec_your_secret_here"

Python
import os
import time

import requests

API_KEY = os.getenv('DOCURIFT_API_KEY')
API_URL = 'https://api.docurift.com/v1'


def process_document_async(file_path, document_type='generic', webhook_url=None):
    """Submit a document for asynchronous processing."""
    headers = {
        'X-API-Key': API_KEY
    }
    with open(file_path, 'rb') as f:
        files = {'file': f}
        data = {
            'documentType': document_type,
        }
        if webhook_url:
            data['webhookUrl'] = webhook_url
        response = requests.post(
            f'{API_URL}/documents/process/async',
            headers=headers,
            files=files,
            data=data
        )
    response.raise_for_status()
    return response.json()


def poll_document_status(document_id, max_attempts=60, interval=5):
    """Poll for document processing completion."""
    headers = {'X-API-Key': API_KEY}
    for attempt in range(max_attempts):
        response = requests.get(
            f'{API_URL}/documents/{document_id}',
            headers=headers
        )
        response.raise_for_status()
        result = response.json()
        status = result['data']['status']
        if status == 'completed':
            return result
        elif status == 'failed':
            raise Exception(f"Processing failed: {result['data'].get('error')}")
        print(f"Status: {status}, waiting {interval}s...")
        time.sleep(interval)
    raise Exception("Timeout waiting for document processing")


# Example: Submit and poll for results
result = process_document_async('large-invoice.pdf', 'invoice')
document_id = result['data']['id']
print(f"Submitted document: {document_id}")

# Poll for completion
final_result = poll_document_status(document_id)
print(f"Extracted data: {final_result['data']['extractedData']}")

JavaScript (Node.js)
import fs from 'fs';
import FormData from 'form-data';
import fetch from 'node-fetch';

const API_KEY = process.env.DOCURIFT_API_KEY;
const API_URL = 'https://api.docurift.com/v1';

async function processDocumentAsync(filePath, documentType = 'generic', webhookUrl = null) {
  const form = new FormData();
  form.append('file', fs.createReadStream(filePath));
  form.append('documentType', documentType);
  if (webhookUrl) {
    form.append('webhookUrl', webhookUrl);
  }
  const response = await fetch(`${API_URL}/documents/process/async`, {
    method: 'POST',
    headers: {
      'X-API-Key': API_KEY,
      ...form.getHeaders()
    },
    body: form
  });
  if (!response.ok) {
    const error = await response.json();
    throw new Error(error.error.message);
  }
  return response.json();
}

async function pollDocumentStatus(documentId, maxAttempts = 60, interval = 5000) {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const response = await fetch(`${API_URL}/documents/${documentId}`, {
      headers: { 'X-API-Key': API_KEY }
    });
    const result = await response.json();
    const status = result.data.status;
    if (status === 'completed') {
      return result;
    } else if (status === 'failed') {
      throw new Error(`Processing failed: ${result.data.error}`);
    }
    console.log(`Status: ${status}, waiting...`);
    await new Promise(resolve => setTimeout(resolve, interval));
  }
  throw new Error('Timeout waiting for document processing');
}

// Example: Submit and poll
const result = await processDocumentAsync('large-invoice.pdf', 'invoice');
console.log('Submitted document:', result.data.id);
const finalResult = await pollDocumentStatus(result.data.id);
console.log('Extracted data:', finalResult.data.extractedData);

JavaScript (with Webhook)
import crypto from 'crypto';
import express from 'express';

const app = express();
// Keep the raw request bytes. The signature is checked against the body
// exactly as received; re-serializing the parsed JSON may not reproduce it.
app.use(express.json({
  verify: (req, res, buf) => { req.rawBody = buf; }
}));

const WEBHOOK_SECRET = process.env.DOCURIFT_WEBHOOK_SECRET;

// Assumes the signature header carries a hex-encoded HMAC-SHA256 of the raw body.
function verifyWebhookSignature(rawBody, signature) {
  if (!rawBody || typeof signature !== 'string') {
    return false;
  }
  const expectedSignature = crypto
    .createHmac('sha256', WEBHOOK_SECRET)
    .update(rawBody)
    .digest('hex');
  const received = Buffer.from(signature);
  const expected = Buffer.from(expectedSignature);
  // timingSafeEqual throws when the buffers differ in length, so check first
  return received.length === expected.length &&
    crypto.timingSafeEqual(received, expected);
}

app.post('/webhooks/docurift', (req, res) => {
  const signature = req.headers['x-docurift-signature'];
  // Verify webhook authenticity
  if (!verifyWebhookSignature(req.rawBody, signature)) {
    return res.status(401).json({ error: 'Invalid signature' });
  }
  const { event, data } = req.body;
  switch (event) {
    case 'document.completed':
      console.log('Document processed:', data.id);
      console.log('Extracted data:', data.extractedData);
      // Process the extracted data
      break;
    case 'document.failed':
      console.error('Processing failed:', data.id, data.error);
      // Handle failure
      break;
  }
  res.json({ received: true });
});

app.listen(3000);

Response
Success Response (202 Accepted)
{
  "success": true,
  "data": {
    "id": "doc_abc123xyz456",
    "status": "processing",
    "estimatedCompletionTime": "2024-01-26T10:35:00Z",
    "queuePosition": 3,
    "webhookConfigured": true,
    "createdAt": "2024-01-26T10:30:00Z"
  },
  "message": "Document submitted for processing. You will receive a webhook notification when complete."
}

Response Fields
| Field | Type | Description |
|---|---|---|
| id | string | Unique document identifier for polling and retrieval |
| status | string | Current status: queued or processing |
| estimatedCompletionTime | string | Estimated completion time (ISO 8601 timestamp) |
| queuePosition | number | Position in the processing queue (if queued) |
| webhookConfigured | boolean | Whether a webhook URL was provided |
| createdAt | string | ISO 8601 timestamp of when the document was submitted |
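As an illustration of reading these fields, the sketch below parses the 202 response body into a small typed record. SubmittedJob and parse_submission are hypothetical names introduced here, and treating estimatedCompletionTime and queuePosition as optional is an assumption based on the "(if queued)" note rather than a documented guarantee.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


def parse_timestamp(value):
    """Parse an ISO 8601 timestamp such as 2024-01-26T10:30:00Z."""
    # Older Python versions reject a trailing "Z", so normalize it first.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


@dataclass
class SubmittedJob:
    id: str
    status: str
    webhook_configured: bool
    created_at: datetime
    estimated_completion: Optional[datetime] = None
    queue_position: Optional[int] = None


def parse_submission(body):
    """Convert the JSON body of a 202 response into a SubmittedJob."""
    data = body["data"]
    eta = data.get("estimatedCompletionTime")
    return SubmittedJob(
        id=data["id"],
        status=data["status"],
        webhook_configured=data.get("webhookConfigured", False),
        created_at=parse_timestamp(data["createdAt"]),
        estimated_completion=parse_timestamp(eta) if eta else None,
        queue_position=data.get("queuePosition"),
    )


sample = {
    "success": True,
    "data": {
        "id": "doc_abc123xyz456",
        "status": "processing",
        "estimatedCompletionTime": "2024-01-26T10:35:00Z",
        "queuePosition": 3,
        "webhookConfigured": True,
        "createdAt": "2024-01-26T10:30:00Z",
    },
}
job = parse_submission(sample)
# Seconds between submission and the estimated completion time
wait_seconds = (job.estimated_completion - job.created_at).total_seconds()
```

A value like wait_seconds can serve as a first delay before polling, instead of polling immediately after submission.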
Webhook Notification
When processing completes or fails, DocuRift sends a POST request to your webhook URL.
Webhook Payload
{
  "event": "document.completed",
  "timestamp": "2024-01-26T10:32:15Z",
  "data": {
    "id": "doc_abc123xyz456",
    "organizationId": "org_xyz789",
    "fileName": "large-invoice.pdf",
    "documentType": "invoice",
    "status": "completed",
    "pagesProcessed": 15,
    "confidence": 0.94,
    "extractedData": {
      "invoiceNumber": "INV-2024-00456",
      "totalAmount": 15750.00,
      "currency": "USD"
    },
    "metadata": {
      "processingTimeMs": 45230,
      "modelVersion": "v2.1.0"
    },
    "processedAt": "2024-01-26T10:32:15Z"
  },
  "callbackMetadata": {
    "internalId": "order-12345",
    "department": "accounts-payable"
  }
}

Webhook Headers
| Header | Description |
|--------|-------------|
| X-DocuRift-Signature | HMAC-SHA256 signature for verification |
| X-DocuRift-Event | Event type (document.completed, document.failed) |
| X-DocuRift-Timestamp | ISO 8601 timestamp of the event |
| X-DocuRift-Delivery-Id | Unique delivery ID for idempotency |
Webhook Events
| Event | Description |
|-------|-------------|
| document.completed | Document processing finished successfully |
| document.failed | Document processing failed |
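A Python handler can branch on the two events and use callbackMetadata to tie the notification back to an internal record. This is a minimal sketch: summarize_event is a hypothetical helper, internalId is just the sample key from the payload above, and the shape of a document.failed payload (an error field under data) is assumed from the JavaScript example rather than documented here.

```python
def summarize_event(payload):
    """Return (internal_id, outcome) for a parsed webhook payload."""
    event = payload.get("event")
    data = payload.get("data") or {}
    # callbackMetadata sits at the top level of the payload, next to data.
    internal_id = (payload.get("callbackMetadata") or {}).get("internalId")
    if event == "document.completed":
        return internal_id, f"completed {data.get('id')}"
    if event == "document.failed":
        # Assumption: failures carry an error description in data.error.
        return internal_id, f"failed {data.get('id')}: {data.get('error', 'unknown error')}"
    # Unknown events are ignored rather than treated as errors.
    return internal_id, f"ignored {event}"


completed = summarize_event({
    "event": "document.completed",
    "data": {"id": "doc_abc123xyz456", "status": "completed"},
    "callbackMetadata": {"internalId": "order-12345"},
})
failed = summarize_event({
    "event": "document.failed",
    "data": {"id": "doc_abc123xyz456", "error": "unreadable file"},
})
```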
Webhook Retry Policy
If your webhook endpoint returns a non-2xx status code, DocuRift will retry:
- Retry attempts: Up to 5 retries
- Retry schedule: 1 min, 5 min, 30 min, 2 hours, 24 hours
- Retry headers: X-DocuRift-Retry-Count indicates the retry number
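To see how long a single delivery can stay in flight, the schedule can be laid out on a timeline. The sketch below assumes each delay is measured from the previous failed attempt; this reference does not say whether delays are counted that way or from the first attempt, so treat the result as an upper-bound estimate. retry_times is a hypothetical helper.

```python
from datetime import datetime, timedelta, timezone

# Retry schedule from the list above: 1 min, 5 min, 30 min, 2 hours, 24 hours
RETRY_DELAYS = [
    timedelta(minutes=1),
    timedelta(minutes=5),
    timedelta(minutes=30),
    timedelta(hours=2),
    timedelta(hours=24),
]


def retry_times(first_attempt):
    """Return the time of each retry, assuming delays chain from the previous attempt."""
    times = []
    current = first_attempt
    for delay in RETRY_DELAYS:
        current = current + delay
        times.append(current)
    return times


first_attempt = datetime(2024, 1, 26, 10, 32, 15, tzinfo=timezone.utc)
schedule = retry_times(first_attempt)
# Time between the first attempt and the last retry
total_window = schedule[-1] - first_attempt
```

Under that assumption the last retry arrives a little over 26 hours after the first attempt, which suggests keeping delivery IDs for deduplication for at least that long.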
Polling Pattern
If you prefer polling over webhooks, use this pattern:
import os
import time

import requests

# Same configuration as in the Python example above
API_KEY = os.getenv('DOCURIFT_API_KEY')
API_URL = 'https://api.docurift.com/v1'


def wait_for_document(document_id, timeout=300, interval=5):
    """
    Poll for document completion with exponential backoff.

    Args:
        document_id: The document ID to poll
        timeout: Maximum wait time in seconds
        interval: Initial polling interval in seconds
    """
    start_time = time.time()
    current_interval = interval
    while time.time() - start_time < timeout:
        response = requests.get(
            f'{API_URL}/documents/{document_id}',
            headers={'X-API-Key': API_KEY}
        )
        # Surface HTTP errors (401, 429, ...) instead of failing on a missing key
        response.raise_for_status()
        result = response.json()
        status = result['data']['status']
        if status == 'completed':
            return result['data']
        elif status == 'failed':
            raise Exception(result['data'].get('error', 'Unknown error'))
        # Exponential backoff (max 30 seconds)
        time.sleep(min(current_interval, 30))
        current_interval *= 1.5
    raise TimeoutError(f"Document {document_id} did not complete within {timeout}s")

Polling vs Webhooks
Webhooks are recommended for production use as they're more efficient and provide real-time notifications. Use polling only during development or when webhooks aren't feasible.
Error Responses
400 Bad Request
{
  "success": false,
  "error": {
    "code": "INVALID_WEBHOOK_URL",
    "message": "Webhook URL must be HTTPS and publicly accessible"
  }
}

401 Unauthorized
{
  "success": false,
  "error": {
    "code": "INVALID_API_KEY",
    "message": "Invalid API key"
  }
}

402 Payment Required
{
  "success": false,
  "error": {
    "code": "INSUFFICIENT_CREDITS",
    "message": "Insufficient credits for estimated page count"
  }
}

Error Codes Reference
| Code | HTTP Status | Description | Solution |
|------|-------------|-------------|----------|
| INVALID_FILE_TYPE | 400 | Unsupported file format | Use PDF, PNG, JPG, JPEG, WEBP, or TIFF |
| INVALID_WEBHOOK_URL | 400 | Invalid or non-HTTPS webhook | Use HTTPS URL |
| FILE_TOO_LARGE | 413 | File exceeds 50MB limit | Compress or split document |
| INVALID_API_KEY | 401 | API key invalid or expired | Check API key |
| INSUFFICIENT_CREDITS | 402 | Not enough credits | Purchase more credits |
| RATE_LIMIT_EXCEEDED | 429 | Too many requests | Implement backoff |
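Client code can turn the error envelope into an exception that records the code and whether a retry is worthwhile. The sketch below is one way to do that. DocuRiftError and error_from_response are hypothetical names, "UNKNOWN" is a placeholder used when the body carries no code, and treating only RATE_LIMIT_EXCEEDED as retryable is a reading of the "Implement backoff" advice in the table, not a documented rule.

```python
# Assumption: only rate limiting is worth retrying without changing the request.
RETRYABLE_CODES = {"RATE_LIMIT_EXCEEDED"}


class DocuRiftError(Exception):
    """Error built from the API's {"success": false, "error": {...}} envelope."""

    def __init__(self, status, code, message):
        super().__init__(f"{code} ({status}): {message}")
        self.status = status
        self.code = code
        self.message = message

    @property
    def retryable(self):
        return self.code in RETRYABLE_CODES


def error_from_response(status, body):
    """Build a DocuRiftError from an HTTP status and a parsed JSON body."""
    error = (body or {}).get("error") or {}
    return DocuRiftError(
        status,
        error.get("code", "UNKNOWN"),  # Placeholder, not a documented code
        error.get("message", "No message provided"),
    )


credits_error = error_from_response(402, {
    "success": False,
    "error": {
        "code": "INSUFFICIENT_CREDITS",
        "message": "Insufficient credits for estimated page count",
    },
})
rate_error = error_from_response(429, {
    "success": False,
    "error": {"code": "RATE_LIMIT_EXCEEDED", "message": "Too many requests"},
})
```

With `requests`, the helper could be called as `error_from_response(response.status_code, response.json())` whenever `response.ok` is false.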
Best Practices
Webhook Security
import hmac
import hashlib


def verify_webhook(payload: bytes, signature: str, secret: str) -> bool:
    """Verify webhook signature for authenticity."""
    expected = hmac.new(
        secret.encode(),
        payload,
        hashlib.sha256
    ).hexdigest()
    return hmac.compare_digest(expected, signature)

Idempotency
Use the X-DocuRift-Delivery-Id header to ensure idempotent webhook handling:
const processedDeliveries = new Set();

app.post('/webhooks/docurift', async (req, res) => {
  const deliveryId = req.headers['x-docurift-delivery-id'];
  // Skip if already processed
  if (processedDeliveries.has(deliveryId)) {
    return res.json({ status: 'already_processed' });
  }
  // Process the webhook
  await handleWebhook(req.body);
  // Mark as processed
  processedDeliveries.add(deliveryId);
  res.json({ received: true });
});

Webhook Timeout
Webhook endpoints must respond within 30 seconds. For long-running operations, acknowledge immediately and process asynchronously.
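One way to acknowledge immediately is to hand the payload to a background worker and return from the HTTP handler right away. The sketch below uses only the Python standard library and leaves the web framework out. accept_webhook, worker, and the "dlv_1" delivery ID are hypothetical, and the in-memory queue and set stand in for a durable queue and store.

```python
import queue
import threading

jobs = queue.Queue()
seen_deliveries = set()
seen_lock = threading.Lock()
processed = []


def accept_webhook(delivery_id, payload):
    """Called from the HTTP handler: record the delivery, enqueue it, return at once."""
    with seen_lock:
        if delivery_id in seen_deliveries:
            return {"status": "already_processed"}
        seen_deliveries.add(delivery_id)
    jobs.put(payload)
    return {"received": True}


def worker():
    """Drain the queue in the background; the slow work happens here."""
    while True:
        payload = jobs.get()
        try:
            processed.append(payload["data"]["id"])  # Stand-in for slow work
        finally:
            jobs.task_done()


threading.Thread(target=worker, daemon=True).start()

event = {"event": "document.completed", "data": {"id": "doc_abc123xyz456"}}
first = accept_webhook("dlv_1", event)
repeat = accept_webhook("dlv_1", event)
jobs.join()  # Wait for the background work (only needed in this demo)
```

This sketch marks a delivery as seen when it is accepted, not when processing finishes, so a failure in the worker is not covered by DocuRift's retries; a production setup would need its own retry or dead-letter handling for queued work.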
Related Endpoints
- Process Document (Sync) - For immediate results
- Get Document - Retrieve processed document
- List Documents - List all documents