Batch Processing with LLM APIs

Batch processing transforms LLMs from interactive tools into data processing pipelines. When you need to process thousands of documents, generate content at scale, or run evaluations over a large dataset, the batch APIs offered by OpenAI and Anthropic provide a 50% cost discount while handling the throughput management for you. This guide covers when to use batch processing, how to implement it, and how to handle errors.

When to Use Batch Processing

Batch processing is ideal for: offline data enrichment (classifying or extracting information from a corpus of documents); content generation at scale (generating product descriptions, summaries, or metadata for thousands of items); evaluation pipelines (running a benchmark over a test dataset); and dataset construction (generating training examples, annotations, or synthetic data). Batch processing is not suitable for real-time user interactions, latency-sensitive operations, or tasks where each result is needed immediately to determine the next request. The batch API processes requests asynchronously with a 24-hour completion window.

OpenAI Batch API

The OpenAI Batch API accepts a JSONL file where each line is a request object with a custom_id (for matching results to inputs), the method (always POST), the url (e.g., /v1/chat/completions), and the body (identical to a synchronous API request). Upload the file, create a batch with the file ID, poll for completion, then download the output JSONL file. Results are in the same JSONL format with each line containing the custom_id and the full response (or an error). The 50% discount applies to all batch requests. Maximum batch size is 50,000 requests or 100 MB of JSONL, whichever is smaller.
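A minimal sketch of this workflow follows. The JSONL-building helpers are plain Python; the upload/poll/download steps (shown as comments) assume the official openai Python client, an API key in the environment, and a placeholder model name:

```python
import json

def build_batch_line(custom_id, model, messages):
    """One JSONL line for the OpenAI Batch API input file:
    custom_id for matching results, method/url for the endpoint,
    and a body identical to a synchronous chat completion request."""
    return {
        "custom_id": custom_id,
        "method": "POST",
        "url": "/v1/chat/completions",
        "body": {"model": model, "messages": messages},
    }

def write_batch_file(requests, path):
    """Write one JSON object per line, as the batch endpoint expects."""
    with open(path, "w") as f:
        for r in requests:
            f.write(json.dumps(r) + "\n")

# Upload, create, poll, download (sketch; requires OPENAI_API_KEY):
# from openai import OpenAI
# client = OpenAI()
# batch_file = client.files.create(file=open("batch.jsonl", "rb"),
#                                  purpose="batch")
# batch = client.batches.create(input_file_id=batch_file.id,
#                               endpoint="/v1/chat/completions",
#                               completion_window="24h")
# ...poll client.batches.retrieve(batch.id) until status == "completed",
# then read client.files.content(batch.output_file_id)
```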

Anthropic Message Batches

Anthropic's Message Batches API works similarly: submit a list of request objects (up to 10,000 per batch), each with a custom_id and a params object containing the same fields as a synchronous messages.create call. Poll with the batch ID until processing is complete, then retrieve results. The 50% discount applies. Anthropic also supports prompt caching within batch requests — if your batch uses a consistent large system prompt, combine caching and batching to achieve approximately 70-75% cost reduction compared to uncached synchronous requests.

Parallelisation and Rate Limits

Even with the batch API, rate limits apply — you cannot submit an unlimited number of requests in a single batch. For very large jobs (millions of requests), split into multiple batches and submit them sequentially. For time-sensitive batch jobs, running multiple concurrent batches distributes the load and reduces total processing time. If you need results faster than 24 hours, use the synchronous API with rate limit management (token bucket, exponential backoff) to maximise throughput within your rate limits, accepting the 2x higher cost.
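Splitting a large job into per-batch chunks is a one-liner; a sketch, using OpenAI's 50,000-request cap as the default (adjust for other providers):

```python
def chunk_requests(requests, max_per_batch=50_000):
    """Split a large request list into batch-sized chunks that can be
    submitted sequentially or as multiple concurrent batches."""
    return [requests[i:i + max_per_batch]
            for i in range(0, len(requests), max_per_batch)]
```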

Error Handling in Batch Jobs

Batch output files include both successful results and error responses. Always process error responses: log the custom_id, error code, and error message; retry requests that failed with transient errors (rate limits, temporary service errors); for content policy violations, review and modify the prompt before resubmitting, since an unchanged request will fail again; and build a reconciliation step that matches output custom_ids to input records to identify and handle any missing results. Common batch errors include: context length exceeded (reduce the input); content policy violations (review and modify the prompt); and transient service errors (retry with the same request, which usually succeeds on the second attempt). Implement idempotent batch job design so partially completed batches can be safely re-run.
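A sketch of the reconciliation step. The output record shape here is simplified; the real format differs slightly between providers (OpenAI nests the result under a response key and failures under error, Anthropic under a result object), so adapt the field access accordingly:

```python
import json

def reconcile(input_ids, output_jsonl_lines):
    """Match batch output lines back to the submitted custom_ids.
    Returns (succeeded, failed, missing) so that failed and missing
    requests can be resubmitted; re-running only those ids keeps the
    job idempotent."""
    succeeded, failed = {}, {}
    for line in output_jsonl_lines:
        rec = json.loads(line)
        cid = rec["custom_id"]
        if rec.get("error"):
            failed[cid] = rec["error"]
        else:
            succeeded[cid] = rec
    missing = set(input_ids) - succeeded.keys() - failed.keys()
    return succeeded, failed, missing
```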

Monitoring and Cost Tracking

Track batch job progress by polling the batch status endpoint every few minutes. Log the submitted_at, in_progress_at, and completed_at timestamps to understand processing latency. Track per-batch costs by summing the total_tokens across all output responses and multiplying by the model's batch price. For large ongoing batch pipelines, set up a dashboard that shows daily batch volume, average completion time, error rate, and total cost. Compare batch costs against what the equivalent synchronous requests would have cost to quantify the savings and validate that the batch API discount is being applied correctly.
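The cost calculation can be sketched as below. Token-count field names vary by provider (e.g. usage objects with input and output counts), so treat the response shape here as illustrative; prices are the model's standard per-million-token rates:

```python
def batch_cost_usd(responses, input_price_per_mtok, output_price_per_mtok,
                   batch_discount=0.5):
    """Estimate batch spend by summing per-response token counts and
    applying the 50% batch discount to standard per-million-token rates.
    `responses` is a list of dicts with input/output token counts."""
    in_tok = sum(r["input_tokens"] for r in responses)
    out_tok = sum(r["output_tokens"] for r in responses)
    full_price = (in_tok * input_price_per_mtok
                  + out_tok * output_price_per_mtok) / 1_000_000
    return full_price * batch_discount
```

Dividing the result by the discount multiplier recovers the equivalent synchronous cost, which makes the savings comparison at the end of this section a one-line check.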

FAQ

Can I cancel a batch that has already been submitted?
Yes. Both OpenAI and Anthropic support cancellation of in-progress batches. Partial results up to the cancellation point are available in the output file. Cancelled batches are still charged for requests that were processed before cancellation.
What happens if the batch takes longer than 24 hours?
OpenAI targets batch completion within 24 hours for standard batches. If a batch is not complete within that window, it expires and the output file contains results for completed requests only. This is rare, but plan for it by checking the expires_at field and monitoring batch status.
Can I use different models in the same batch?
No. Each batch is associated with a single model. To use multiple models for the same dataset, create separate batches per model.
