Jobs API
Monitor and manage asynchronous processing jobs
The Jobs API lets you monitor asynchronous document processing jobs and retrieve their results. Asynchronous processing is often the best approach when you process large documents or handle high volumes of files. With the Jobs API you can check the status of any job and query your job history.
Asynchronous processing is ideal for production workloads where you need to handle many documents without blocking your application. Instead of waiting for each document to complete, you can submit batches and receive webhook notifications when processing finishes.
Overview
When you submit a document for async processing, a job is created and assigned a unique ID. You can use this ID to track the job's progress. Jobs transition through these states as they're processed:
- pending - Job is queued for processing
- processing - Document is being analyzed
- completed - Extraction finished successfully
- failed - An error occurred during processing
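The four states can be modeled in client code so that a caller knows when to stop checking a job. The following is a minimal sketch, not part of the API: the names `JobStatus`, `TERMINAL_STATES`, and `is_terminal` are hypothetical, and it assumes that `completed` and `failed` are final states.

```python
from enum import Enum


class JobStatus(str, Enum):
    """The four job states listed above."""

    PENDING = "pending"
    PROCESSING = "processing"
    COMPLETED = "completed"
    FAILED = "failed"


# Assumption: a job in one of these states will not change again.
TERMINAL_STATES = frozenset({JobStatus.COMPLETED, JobStatus.FAILED})


def is_terminal(status: str) -> bool:
    """Return True when no further status checks are needed.

    Raises ValueError if `status` is not one of the four known states.
    """
    return JobStatus(status) in TERMINAL_STATES
```

Raising on an unknown status string, rather than treating it as non-terminal, keeps a client from waiting forever on a state it does not understand.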
Available Endpoints
Get Job Status
Check the current status of a processing job
List Jobs
Query your job history with filters
Job Lifecycle
┌─────────┐     ┌────────────┐     ┌───────────┐
│ pending │ ──▶ │ processing │ ──▶ │ completed │
└─────────┘     └────────────┘     └───────────┘
                      │
                      ▼
                 ┌────────┐
                 │ failed │
                 └────────┘
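The lifecycle can also be written as a transition table, which is useful for sanity-checking a recorded status history. This is an illustrative sketch only: `ALLOWED_TRANSITIONS`, `is_valid_transition`, and `first_unexpected_transition` are hypothetical names, and it assumes that `failed` is reached only from `processing`.

```python
from typing import Optional

# Allowed moves between states, read off the lifecycle diagram.
# Assumption: "failed" is reached from "processing" only.
ALLOWED_TRANSITIONS = {
    "pending": {"processing"},
    "processing": {"completed", "failed"},
    "completed": set(),
    "failed": set(),
}


def is_valid_transition(old: str, new: str) -> bool:
    """Return True if the lifecycle allows a job to move from `old` to `new`."""
    return new in ALLOWED_TRANSITIONS.get(old, set())


def first_unexpected_transition(history: list[str]) -> Optional[tuple[str, str]]:
    """Return the first (old, new) pair that breaks the lifecycle, or None."""
    for old, new in zip(history, history[1:]):
        if not is_valid_transition(old, new):
            return (old, new)
    return None
```

A client that polls may miss intermediate states between two requests, so this check fits a complete event log better than a list of polled statuses.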
Polling vs Webhooks
While you can poll the job status endpoint, we recommend using Webhooks for better efficiency:
| Method | Pros | Cons |
|--------|------|------|
| Polling | Simple, works everywhere | Wastes API calls, delayed updates |
| Webhooks | Real-time, efficient | Requires endpoint setup |
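To make the webhook side of the comparison concrete, here is a rough sketch of the logic a receiving endpoint might run for each delivery. The payload shape is an assumption, not taken from this page: the sketch supposes a JSON body with `job_id` and `status` fields, and `handle_job_event` is a hypothetical helper. Check the Webhooks documentation for the real payload before relying on it.

```python
import json
from typing import Callable


def handle_job_event(
    raw_body: bytes,
    on_completed: Callable[[str], None],
    on_failed: Callable[[str], None],
) -> bool:
    """Dispatch one webhook delivery; return True if the job has finished.

    Assumed (hypothetical) payload: {"job_id": "...", "status": "..."}.
    """
    event = json.loads(raw_body)
    job_id, status = event["job_id"], event["status"]
    if status == "completed":
        on_completed(job_id)
        return True
    if status == "failed":
        on_failed(job_id)
        return True
    return False  # Non-terminal or unrecognized status: nothing to dispatch
```

Keeping the dispatch logic in a plain function, separate from any web framework, makes it easy to test without running a server.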
Example: Polling for Status
import time

import requests


def wait_for_completion(job_id, api_key, max_wait=60):
    """Poll until the job completes, fails, or max_wait seconds elapse."""
    start = time.monotonic()
    while time.monotonic() - start < max_wait:
        response = requests.get(
            f"https://api.docurift.com/v1/jobs/{job_id}",
            headers={"X-API-Key": api_key},
            timeout=10,  # Keep a single request from hanging indefinitely
        )
        response.raise_for_status()  # Surface HTTP errors before parsing the body
        job = response.json()["data"]
        if job["status"] == "completed":
            return job
        if job["status"] == "failed":
            raise RuntimeError(f"Job {job_id} failed")
        time.sleep(2)  # Poll every 2 seconds
    raise TimeoutError(f"Job {job_id} did not complete within {max_wait} seconds")
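A fixed 2-second interval spends many requests on long-running jobs. One common alternative is exponential backoff, sketched below. This is not an official client: `poll_with_backoff` and all of its parameters are hypothetical, and the status lookup is passed in as a callable so the loop can be exercised without network access.

```python
import time
from typing import Callable


def poll_with_backoff(
    fetch_status: Callable[[], str],
    max_wait: float = 60.0,
    initial_delay: float = 1.0,
    max_delay: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Call fetch_status until it returns "completed" or "failed".

    The delay between calls doubles after each attempt, capped at max_delay.
    Raises TimeoutError once max_wait seconds have elapsed.
    """
    deadline = clock() + max_wait
    delay = initial_delay
    while True:
        status = fetch_status()
        if status in ("completed", "failed"):
            return status
        remaining = deadline - clock()
        if remaining <= 0:
            raise TimeoutError("Job did not reach a terminal state in time")
        sleep(min(delay, remaining))  # Never sleep past the deadline
        delay = min(delay * 2, max_delay)
```

In practice, `fetch_status` could wrap the GET request from the polling example and return the `status` field of the response.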