Process Document (Async)

Asynchronously process large documents with webhook notifications

Upload and process documents asynchronously. This endpoint returns immediately with a job ID while processing continues in the background. Ideal for large documents and batch processing.

POST/v1/documents/process/async

Overview

The asynchronous processing endpoint is designed for:

  • Large documents (10+ pages)
  • Batch document processing
  • Background processing workflows
  • Integration with webhook notifications

For immediate results with smaller documents, use the sync endpoint.

Request

Headers

| Header | Required | Type | Description |
|--------|----------|------|-------------|
| X-API-Key | Yes | string | Your DocuRift API key (format: frc_xxxxx) |
| Content-Type | Yes | string | Must be multipart/form-data for file uploads |

Body Parameters

| Parameter | Required | Type | Default | Description |
|-----------|----------|------|---------|-------------|
| file | Yes | File | - | Document file to process. Supported formats: PDF, PNG, JPG, JPEG, WEBP, TIFF. Maximum size: 50MB |
| documentType | No | string | generic | Type of document for optimized extraction |
| webhookUrl | No | string | - | URL to receive a webhook notification when processing completes |
| webhookSecret | No | string | - | Secret for webhook signature verification (HMAC-SHA256) |
| callbackMetadata | No | object | - | Custom metadata to include in the webhook payload |
| priority | No | string | normal | Processing priority: normal, high (paid plans only) |
| language | No | string | en | Primary language of the document (ISO 639-1 code) |
| extractTables | No | boolean | true | Enable table extraction and structuring |
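
The limits on the file parameter can be checked on the client before uploading, which avoids a round trip that would end in INVALID_FILE_TYPE or FILE_TOO_LARGE. The following is a minimal sketch rather than part of the DocuRift API: check_upload is a hypothetical helper, the extension list mirrors the supported formats named above, and reading "50MB" as 50 * 1024 * 1024 bytes is an assumption.

```python
from pathlib import Path

# Taken from the "file" parameter description above.
SUPPORTED_EXTENSIONS = {".pdf", ".png", ".jpg", ".jpeg", ".webp", ".tiff"}
# Assumption: "50MB" is interpreted as 50 MiB.
MAX_FILE_SIZE_BYTES = 50 * 1024 * 1024


def check_upload(path: str, size_bytes: int) -> list[str]:
    """Return a list of problems; an empty list means the file looks acceptable."""
    problems = []
    extension = Path(path).suffix.lower()
    if extension not in SUPPORTED_EXTENSIONS:
        problems.append(f"unsupported file type: {extension or '(none)'}")
    if size_bytes > MAX_FILE_SIZE_BYTES:
        problems.append(
            f"file is {size_bytes} bytes, limit is {MAX_FILE_SIZE_BYTES}"
        )
    return problems
```

In practice the size would come from os.path.getsize(path). The server remains the authority on what it accepts, so treat this only as an early warning.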

Code Examples

cURL

curl -X POST https://api.docurift.com/v1/documents/process/async \
-H "X-API-Key: frc_your_api_key_here" \
-F "file=@large-document.pdf" \
-F "documentType=invoice" \
-F "webhookUrl=https://your-app.com/webhooks/docurift" \
-F "webhookSecret=whsec_your_secret_here"

Python

process_async.py
import requests
import os
import time

API_KEY = os.getenv('DOCURIFT_API_KEY')
API_URL = 'https://api.docurift.com/v1'

def process_document_async(file_path, document_type='generic', webhook_url=None):
  """Submit a document for asynchronous processing."""
  headers = {
      'X-API-Key': API_KEY
  }

  with open(file_path, 'rb') as f:
      files = {'file': f}
      data = {
          'documentType': document_type,
      }

      if webhook_url:
          data['webhookUrl'] = webhook_url

      response = requests.post(
          f'{API_URL}/documents/process/async',
          headers=headers,
          files=files,
          data=data
      )

  response.raise_for_status()
  return response.json()

def poll_document_status(document_id, max_attempts=60, interval=5):
  """Poll for document processing completion."""
  headers = {'X-API-Key': API_KEY}

  for attempt in range(max_attempts):
      response = requests.get(
          f'{API_URL}/documents/{document_id}',
          headers=headers
      )
      response.raise_for_status()
      result = response.json()

      status = result['data']['status']

      if status == 'completed':
          return result
      elif status == 'failed':
          raise Exception(f"Processing failed: {result['data'].get('error')}")

      print(f"Status: {status}, waiting {interval}s...")
      time.sleep(interval)

  raise Exception("Timeout waiting for document processing")

# Example: Submit and poll for results
result = process_document_async('large-invoice.pdf', 'invoice')
document_id = result['data']['id']
print(f"Submitted document: {document_id}")

# Poll for completion
final_result = poll_document_status(document_id)
print(f"Extracted data: {final_result['data']['extractedData']}")

JavaScript (Node.js)

processAsync.js
import fs from 'fs';
import FormData from 'form-data';
import fetch from 'node-fetch';

const API_KEY = process.env.DOCURIFT_API_KEY;
const API_URL = 'https://api.docurift.com/v1';

async function processDocumentAsync(filePath, documentType = 'generic', webhookUrl = null) {
  const form = new FormData();
  form.append('file', fs.createReadStream(filePath));
  form.append('documentType', documentType);

  if (webhookUrl) {
    form.append('webhookUrl', webhookUrl);
  }

  const response = await fetch(`${API_URL}/documents/process/async`, {
    method: 'POST',
    headers: {
      'X-API-Key': API_KEY,
      ...form.getHeaders()
    },
    body: form
  });

  if (!response.ok) {
    const error = await response.json();
    throw new Error(error.error.message);
  }

  return response.json();
}

async function pollDocumentStatus(documentId, maxAttempts = 60, interval = 5000) {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const response = await fetch(`${API_URL}/documents/${documentId}`, {
      headers: { 'X-API-Key': API_KEY }
    });

    // Surface HTTP errors instead of failing later on a missing `data` field
    if (!response.ok) {
      throw new Error(`Status check failed with HTTP ${response.status}`);
    }

    const result = await response.json();
    const status = result.data.status;

    if (status === 'completed') {
      return result;
    } else if (status === 'failed') {
      throw new Error(`Processing failed: ${result.data.error}`);
    }

    console.log(`Status: ${status}, waiting ${interval / 1000}s...`);
    await new Promise(resolve => setTimeout(resolve, interval));
  }

  throw new Error('Timeout waiting for document processing');
}

// Example: Submit and poll
const result = await processDocumentAsync('large-invoice.pdf', 'invoice');
console.log('Submitted document:', result.data.id);

const finalResult = await pollDocumentStatus(result.data.id);
console.log('Extracted data:', finalResult.data.extractedData);

JavaScript (with Webhook)

webhookHandler.js
import crypto from 'crypto';
import express from 'express';

const app = express();

// Keep the raw request bytes. The signature is checked against the body
// exactly as it was sent; re-serializing the parsed JSON may not reproduce it.
app.use(express.json({
  verify: (req, res, buf) => {
    req.rawBody = buf;
  }
}));

const WEBHOOK_SECRET = process.env.DOCURIFT_WEBHOOK_SECRET;

function verifyWebhookSignature(rawBody, signature) {
  // Reject requests with a missing signature header or an empty body
  if (typeof signature !== 'string' || !rawBody) {
    return false;
  }

  const expected = Buffer.from(
    crypto.createHmac('sha256', WEBHOOK_SECRET).update(rawBody).digest('hex')
  );
  const received = Buffer.from(signature);

  // timingSafeEqual throws when the lengths differ, so compare lengths first
  return received.length === expected.length &&
    crypto.timingSafeEqual(received, expected);
}

app.post('/webhooks/docurift', (req, res) => {
  const signature = req.headers['x-docurift-signature'];

  // Verify webhook authenticity
  if (!verifyWebhookSignature(req.rawBody, signature)) {
    return res.status(401).json({ error: 'Invalid signature' });
  }

  const { event, data } = req.body;

  switch (event) {
    case 'document.completed':
      console.log('Document processed:', data.id);
      console.log('Extracted data:', data.extractedData);
      // Process the extracted data
      break;

    case 'document.failed':
      console.error('Processing failed:', data.id, data.error);
      // Handle failure
      break;
  }

  res.json({ received: true });
});

app.listen(3000);

Response

Success Response (202 Accepted)

response.json
{
"success": true,
"data": {
  "id": "doc_abc123xyz456",
  "status": "processing",
  "estimatedCompletionTime": "2024-01-26T10:35:00Z",
  "queuePosition": 3,
  "webhookConfigured": true,
  "createdAt": "2024-01-26T10:30:00Z"
},
"message": "Document submitted for processing. You will receive a webhook notification when complete."
}

Response Fields

| Field | Type | Description |
|-------|------|-------------|
| id | string | Unique document identifier for polling and retrieval |
| status | string | Current status: queued, processing |
| estimatedCompletionTime | string | Estimated ISO 8601 timestamp for completion |
| queuePosition | number | Position in processing queue (if queued) |
| webhookConfigured | boolean | Whether a webhook URL was provided |
| createdAt | string | ISO 8601 timestamp when document was submitted |
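
The two timestamps in the 202 response can be used to decide how long to wait before the first status check. The sketch below uses only the Python standard library; estimated_wait_seconds and parse_timestamp are hypothetical helper names, and the sample body is copied from the success response above.

```python
from datetime import datetime


def parse_timestamp(value: str) -> datetime:
    """Parse an ISO 8601 timestamp such as 2024-01-26T10:35:00Z."""
    # datetime.fromisoformat() only accepts a trailing "Z" on Python 3.11+,
    # so normalize it to an explicit UTC offset first.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def estimated_wait_seconds(response_body: dict) -> float:
    """Seconds between submission and the estimated completion time."""
    data = response_body["data"]
    created = parse_timestamp(data["createdAt"])
    estimated = parse_timestamp(data["estimatedCompletionTime"])
    return (estimated - created).total_seconds()


sample_response = {
    "success": True,
    "data": {
        "id": "doc_abc123xyz456",
        "status": "processing",
        "estimatedCompletionTime": "2024-01-26T10:35:00Z",
        "queuePosition": 3,
        "webhookConfigured": True,
        "createdAt": "2024-01-26T10:30:00Z",
    },
}

print(estimated_wait_seconds(sample_response))  # 300.0
```

The estimate is only a hint, so a client would still poll or wait for the webhook after this delay.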

Webhook Notification

When processing completes or fails, DocuRift sends a POST request to your webhook URL.

Webhook Payload

webhook_payload.json
{
"event": "document.completed",
"timestamp": "2024-01-26T10:32:15Z",
"data": {
  "id": "doc_abc123xyz456",
  "organizationId": "org_xyz789",
  "fileName": "large-invoice.pdf",
  "documentType": "invoice",
  "status": "completed",
  "pagesProcessed": 15,
  "confidence": 0.94,
  "extractedData": {
    "invoiceNumber": "INV-2024-00456",
    "totalAmount": 15750.00,
    "currency": "USD"
  },
  "metadata": {
    "processingTimeMs": 45230,
    "modelVersion": "v2.1.0"
  },
  "processedAt": "2024-01-26T10:32:15Z"
},
"callbackMetadata": {
  "internalId": "order-12345",
  "department": "accounts-payable"
}
}

Webhook Headers

| Header | Description |
|--------|-------------|
| X-DocuRift-Signature | HMAC-SHA256 signature for verification |
| X-DocuRift-Event | Event type (document.completed, document.failed) |
| X-DocuRift-Timestamp | ISO 8601 timestamp of the event |
| X-DocuRift-Delivery-Id | Unique delivery ID for idempotency |

Webhook Events

| Event | Description |
|-------|-------------|
| document.completed | Document processing finished successfully |
| document.failed | Document processing failed |
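
A receiver typically branches on the event field and uses callbackMetadata to tie the notification back to its own records. The following Python sketch illustrates that flow; summarize_event is a hypothetical helper, and the presence of data.error on document.failed payloads is an assumption based on the JavaScript handler earlier on this page.

```python
def summarize_event(payload: dict) -> dict:
    """Reduce a webhook payload to the fields a receiver usually acts on."""
    event = payload["event"]
    data = payload["data"]
    summary = {
        "documentId": data["id"],
        # Echo of whatever was sent as callbackMetadata at submission time.
        "reference": payload.get("callbackMetadata", {}),
    }
    if event == "document.completed":
        summary["ok"] = True
        summary["result"] = data.get("extractedData", {})
    elif event == "document.failed":
        summary["ok"] = False
        # Assumption: failed payloads carry an "error" field under "data".
        summary["result"] = data.get("error")
    else:
        raise ValueError(f"Unexpected event type: {event}")
    return summary


completed = {
    "event": "document.completed",
    "data": {
        "id": "doc_abc123xyz456",
        "extractedData": {"invoiceNumber": "INV-2024-00456"},
    },
    "callbackMetadata": {"internalId": "order-12345"},
}
print(summarize_event(completed)["reference"]["internalId"])  # order-12345
```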

Webhook Retry Policy

If your webhook endpoint returns a non-2xx status code, DocuRift will retry:

  • Retry attempts: Up to 5 retries
  • Retry schedule: 1 min, 5 min, 30 min, 2 hours, 24 hours
  • Retry headers: X-DocuRift-Retry-Count indicates retry number
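
To see how long a failing endpoint keeps receiving retries, the schedule can be turned into concrete timestamps. This sketch assumes each delay is measured from the previous attempt, which the schedule above does not state explicitly; retry_times is a hypothetical helper.

```python
from datetime import datetime, timedelta, timezone

# Retry schedule from the list above: 1 min, 5 min, 30 min, 2 hours, 24 hours.
RETRY_DELAYS = [
    timedelta(minutes=1),
    timedelta(minutes=5),
    timedelta(minutes=30),
    timedelta(hours=2),
    timedelta(hours=24),
]


def retry_times(first_attempt: datetime) -> list[datetime]:
    """Return when each retry would fire.

    Assumption: every delay counts from the previous attempt.
    """
    times = []
    current = first_attempt
    for delay in RETRY_DELAYS:
        current = current + delay
        times.append(current)
    return times


first = datetime(2024, 1, 26, 10, 32, 15, tzinfo=timezone.utc)
for number, when in enumerate(retry_times(first), start=1):
    print(number, when.isoformat())
```

Under that assumption the last retry lands 26 hours and 36 minutes after the first delivery attempt, so an endpoint outage longer than that would lose the notification and polling would be needed to recover the result.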

Polling Pattern

If you prefer polling over webhooks, use this pattern:

polling.py
import os
import time
import requests

API_KEY = os.getenv('DOCURIFT_API_KEY')
API_URL = 'https://api.docurift.com/v1'

def wait_for_document(document_id, timeout=300, interval=5):
  """
  Poll for document completion with exponential backoff.

  Args:
      document_id: The document ID to poll
      timeout: Maximum wait time in seconds
      interval: Initial polling interval in seconds
  """
  start_time = time.time()
  current_interval = interval

  while time.time() - start_time < timeout:
      response = requests.get(
          f'{API_URL}/documents/{document_id}',
          headers={'X-API-Key': API_KEY}
      )
      # Stop on HTTP errors rather than reading a missing 'data' field
      response.raise_for_status()

      result = response.json()
      status = result['data']['status']

      if status == 'completed':
          return result['data']
      elif status == 'failed':
          raise Exception(result['data'].get('error', 'Unknown error'))

      # Exponential backoff: grow the wait by 1.5x each round, capped at 30 seconds
      time.sleep(min(current_interval, 30))
      current_interval *= 1.5

  raise TimeoutError(f"Document {document_id} did not complete within {timeout}s")

Polling vs Webhooks

Webhooks are recommended for production use as they're more efficient and provide real-time notifications. Use polling only during development or when webhooks aren't feasible.

Error Responses

400 Bad Request

error_400.json
{
"success": false,
"error": {
  "code": "INVALID_WEBHOOK_URL",
  "message": "Webhook URL must be HTTPS and publicly accessible"
}
}

401 Unauthorized

error_401.json
{
"success": false,
"error": {
  "code": "INVALID_API_KEY",
  "message": "Invalid API key"
}
}

402 Payment Required

error_402.json
{
"success": false,
"error": {
  "code": "INSUFFICIENT_CREDITS",
  "message": "Insufficient credits for estimated page count"
}
}

Error Codes Reference

| Code | HTTP Status | Description | Solution |
|------|-------------|-------------|----------|
| INVALID_FILE_TYPE | 400 | Unsupported file format | Use PDF, PNG, JPG, WEBP, or TIFF |
| INVALID_WEBHOOK_URL | 400 | Invalid or non-HTTPS webhook | Use HTTPS URL |
| FILE_TOO_LARGE | 413 | File exceeds 50MB limit | Compress or split document |
| INVALID_API_KEY | 401 | API key invalid or expired | Check API key |
| INSUFFICIENT_CREDITS | 402 | Not enough credits | Purchase more credits |
| RATE_LIMIT_EXCEEDED | 429 | Too many requests | Implement backoff |
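
The table above can be turned into a small lookup so that client code reacts consistently to each error code. This is a sketch only: the action names, next_action, and backoff_delay are hypothetical, and the doubling backoff with a 60-second cap is an illustrative choice rather than a documented DocuRift requirement.

```python
# Mapping derived from the Error Codes Reference table above.
ACTIONS = {
    "INVALID_FILE_TYPE": "fix_request",
    "INVALID_WEBHOOK_URL": "fix_request",
    "FILE_TOO_LARGE": "fix_request",
    "INVALID_API_KEY": "check_api_key",
    "INSUFFICIENT_CREDITS": "purchase_credits",
    "RATE_LIMIT_EXCEEDED": "retry_with_backoff",
}


def next_action(error_body: dict) -> str:
    """Map an error response body to a client-side action."""
    error = error_body.get("error") or {}
    return ACTIONS.get(error.get("code"), "unknown")


def backoff_delay(attempt: int, base: float = 1.0, cap: float = 60.0) -> float:
    """Delay in seconds before retry number `attempt` (0-based), doubling each time."""
    return min(cap, base * (2 ** attempt))


error_402 = {
    "success": False,
    "error": {
        "code": "INSUFFICIENT_CREDITS",
        "message": "Insufficient credits for estimated page count",
    },
}
print(next_action(error_402))  # purchase_credits
```

Only RATE_LIMIT_EXCEEDED is treated as retryable here, since every other code in the table calls for a change to the request or the account before trying again.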

Best Practices

Webhook Security

verify_webhook.py
import hmac
import hashlib

def verify_webhook(payload: bytes, signature: str, secret: str) -> bool:
  """Verify webhook signature for authenticity."""
  expected = hmac.new(
      secret.encode(),
      payload,
      hashlib.sha256
  ).hexdigest()

  return hmac.compare_digest(expected, signature)
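
To exercise a verifier like this without waiting for a real delivery, a test can sign a payload locally and feed it back in. The sketch below assumes, as the function above does, that the signature is the hex-encoded HMAC-SHA256 of the raw request body; sign_payload, the secret, and the sample body are hypothetical test fixtures.

```python
import hashlib
import hmac
import json


def sign_payload(payload: bytes, secret: str) -> str:
    """Compute the hex HMAC-SHA256 of the raw body (assumed signature scheme)."""
    return hmac.new(secret.encode(), payload, hashlib.sha256).hexdigest()


def verify_webhook(payload: bytes, signature: str, secret: str) -> bool:
    """Constant-time comparison of the expected and received signatures."""
    return hmac.compare_digest(sign_payload(payload, secret), signature)


test_secret = "whsec_test_secret"
test_body = json.dumps(
    {"event": "document.completed", "data": {"id": "doc_abc123xyz456"}}
).encode()
test_signature = sign_payload(test_body, test_secret)

print(verify_webhook(test_body, test_signature, test_secret))         # True
print(verify_webhook(test_body + b" ", test_signature, test_secret))  # False
```

The second call shows why the raw bytes matter: a single extra byte of whitespace changes the digest, so the body must be verified before any parsing or re-serialization.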

Idempotency

Use the X-DocuRift-Delivery-Id header to ensure idempotent webhook handling:

idempotent_handler.js
// In-memory store for illustration; it is lost on restart and not shared
// between instances, so use a database or cache in production.
const processedDeliveries = new Set();

app.post('/webhooks/docurift', async (req, res) => {
  const deliveryId = req.headers['x-docurift-delivery-id'];

  // Skip if already processed
  if (processedDeliveries.has(deliveryId)) {
    return res.json({ status: 'already_processed' });
  }

  // Process the webhook (handleWebhook is your own application logic)
  await handleWebhook(req.body);

  // Mark as processed
  processedDeliveries.add(deliveryId);

  res.json({ received: true });
});

Webhook Timeout

Webhook endpoints must respond within 30 seconds. For long-running operations, acknowledge immediately and process asynchronously.
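
One way to acknowledge immediately is to hand the payload to a background worker and return before any slow work starts. The sketch below uses a standard-library queue and thread to show the shape of that pattern; receive_webhook, jobs, and processed are hypothetical names, and a production system would more likely use a durable job queue.

```python
import queue
import threading

jobs: queue.Queue = queue.Queue()
processed = []  # stand-in for the results of slow downstream work


def worker() -> None:
    """Drain the queue in the background, one payload at a time."""
    while True:
        payload = jobs.get()
        try:
            # Placeholder for long-running work (database writes, API calls).
            processed.append(payload["data"]["id"])
        finally:
            jobs.task_done()


threading.Thread(target=worker, daemon=True).start()


def receive_webhook(payload: dict) -> dict:
    """Called by the HTTP handler: enqueue the payload and acknowledge at once."""
    jobs.put(payload)
    return {"received": True}
```

The HTTP handler would verify the signature first, call receive_webhook, and send the returned body with a 2xx status. Because the in-memory queue is lost if the process exits, anything that must survive a restart should be persisted before acknowledging.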