Jobs API

Monitor and manage asynchronous processing jobs

The Jobs API lets you monitor the status of asynchronous document processing jobs and retrieve their results. Asynchronous processing is usually the better choice when you process large documents or handle high volumes of files, and the Jobs API is how you track those background tasks.

Asynchronous processing is ideal for production workloads where you need to handle many documents without blocking your application. Instead of waiting for each document to complete, you can submit batches and receive webhook notifications when processing finishes.

Overview

When you submit a document for async processing, a job is created and assigned a unique ID. You can use this ID to track the job's progress. A job starts as pending, moves to processing, and ends in either completed or failed:

  1. pending - Job is queued for processing
  2. processing - Document is being analyzed
  3. completed - Extraction finished successfully
  4. failed - An error occurred during processing

Available Endpoints

Job Lifecycle

┌─────────┐     ┌────────────┐     ┌───────────┐
│ pending │ ──▶ │ processing │ ──▶ │ completed │
└─────────┘     └────────────┘     └───────────┘
                      │
                      ▼
                 ┌────────┐
                 │ failed │
                 └────────┘
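
The lifecycle above can be sketched as a small state table on the client side. This is an illustrative sketch only: the status strings come from the list above, while `JobStatus`, `ALLOWED_TRANSITIONS`, `is_terminal`, and `can_transition` are hypothetical helper names, not part of the API. It also assumes that only the transitions drawn in the diagram are possible.

```python
from enum import Enum


class JobStatus(Enum):
    PENDING = "pending"
    PROCESSING = "processing"
    COMPLETED = "completed"
    FAILED = "failed"


# Transitions drawn in the lifecycle diagram; anything else is treated as invalid.
ALLOWED_TRANSITIONS = {
    JobStatus.PENDING: {JobStatus.PROCESSING},
    JobStatus.PROCESSING: {JobStatus.COMPLETED, JobStatus.FAILED},
    JobStatus.COMPLETED: set(),
    JobStatus.FAILED: set(),
}


def is_terminal(status: str) -> bool:
    """Return True when the status has no outgoing transitions (completed or failed)."""
    return not ALLOWED_TRANSITIONS[JobStatus(status)]


def can_transition(current: str, new: str) -> bool:
    """Return True when the diagram allows moving from `current` to `new`."""
    return JobStatus(new) in ALLOWED_TRANSITIONS[JobStatus(current)]
```

A client can stop tracking a job as soon as `is_terminal` returns True for its status.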

Polling vs Webhooks

While you can poll the job status endpoint, we recommend using Webhooks for better efficiency:

| Method | Pros | Cons |
|--------|------|------|
| Polling | Simple, works everywhere | Wastes API calls, delayed updates |
| Webhooks | Real-time, efficient | Requires endpoint setup |
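
To give a sense of what "endpoint setup" involves, here is a minimal webhook receiver built on Python's standard library. It is a sketch under stated assumptions: the payload fields `job_id` and `status` are guesses, since the webhook schema is not described on this page, and `handle_job_event`, `JobWebhookHandler`, and `serve` are hypothetical names. A real receiver should also verify that the request is authentic.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def handle_job_event(payload: dict) -> str:
    """Decide what to do with a job notification.

    ASSUMPTION: the payload carries "job_id" and "status" keys; the actual
    webhook schema is not specified on this page.
    """
    status = payload.get("status")
    if status == "completed":
        return "fetch_results"
    if status == "failed":
        return "report_failure"
    return "ignore"  # pending, processing, or unknown: nothing to do yet


class JobWebhookHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        try:
            payload = json.loads(self.rfile.read(length) or b"{}")
        except ValueError:  # covers invalid JSON and invalid UTF-8
            payload = None
        if not isinstance(payload, dict):
            self.send_response(400)
            self.end_headers()
            return
        action = handle_job_event(payload)
        print(f"job {payload.get('job_id')}: {action}")
        self.send_response(204)  # Acknowledge quickly; do heavy work elsewhere
        self.end_headers()


def serve(port: int = 8000) -> None:
    """Run the receiver until interrupted. The port is an arbitrary choice."""
    HTTPServer(("", port), JobWebhookHandler).serve_forever()
```

Calling `serve()` starts the receiver locally. Keeping the decision logic in `handle_job_event` separates it from the HTTP plumbing, so it can be tested without a running server.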

Example: Polling for Status

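import time
import requests

def wait_for_completion(job_id, api_key, max_wait=60, poll_interval=2):
    """Poll until the job completes or fails, or until max_wait seconds pass."""
    deadline = time.monotonic() + max_wait
    while time.monotonic() < deadline:
        response = requests.get(
            f"https://api.docurift.com/v1/jobs/{job_id}",
            headers={"X-API-Key": api_key},
            timeout=10,  # Do not hang forever on a stalled connection
        )
        response.raise_for_status()  # Surface HTTP errors instead of a confusing KeyError
        job = response.json()["data"]  # Parse the body once

        if job["status"] == "completed":
            return job
        if job["status"] == "failed":
            raise RuntimeError(f"Job {job_id} failed")

        time.sleep(poll_interval)  # Poll every 2 seconds by default

    raise TimeoutError(f"Job {job_id} did not complete within {max_wait} seconds")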
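
If polling is your only option, one common way to reduce wasted API calls is to lengthen the wait between requests instead of using a fixed two-second interval. The generator below is a sketch of capped exponential backoff. `backoff_delays` and its default values are illustrative assumptions, not limits or recommendations from the API.

```python
import itertools


def backoff_delays(initial=2.0, factor=2.0, max_delay=30.0):
    """Yield an endless sequence of delays that grows by `factor`, capped at `max_delay`."""
    delay = initial
    while True:
        yield min(delay, max_delay)
        delay = min(delay * factor, max_delay)


# First five delays with the default settings
print(list(itertools.islice(backoff_delays(), 5)))  # [2.0, 4.0, 8.0, 16.0, 30.0]
```

In a polling loop, you would take the next value from the generator on each iteration and pass it to `time.sleep`.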