Calculate Batch Processing Cost for a Dataset

Processing large datasets through AI APIs calls for careful cost estimation before committing to a run. The batch APIs from OpenAI and Anthropic offer a 50% discount versus real-time pricing but complete within a 24-hour window, making them ideal for offline data processing and unsuitable for user-facing features. This example takes a 10,000-row dataset and calculates total cost, estimated processing time, and per-item cost across real-time and batch modes for several models.

The key variables in batch cost estimation are: average tokens per item (input plus output), the model, whether you qualify for batch API pricing, and whether your prompts share a common cached prefix. Prompt caching (available on GPT-4o and Claude models) cuts input token costs by 50-90% for requests that share a long common prefix such as a system prompt or reference document; it is the single most impactful cost optimization for large-scale batch jobs.

To estimate token counts before running the full batch, sample 100 representative items from your dataset, compute average token counts, and add a 15% margin. Real datasets always contain outliers with far higher token counts than the average; capping max_tokens and handling truncated responses gracefully keeps individual long items from distorting your estimate.
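The sampling step above can be sketched as follows. This is a minimal illustration, not a production tokenizer: the 4-characters-per-token heuristic is an assumption, and for real estimates you would swap in an actual tokenizer (such as tiktoken for OpenAI models).

```python
import random

def estimate_tokens(text: str) -> int:
    # Crude heuristic: ~4 characters per token for English text.
    # Replace with a real tokenizer for production estimates.
    return max(1, len(text) // 4)

def estimate_avg_tokens(rows: list[str], sample_size: int = 100,
                        margin: float = 0.15) -> float:
    """Sample rows, average their token counts, and add a safety margin."""
    sample = random.sample(rows, min(sample_size, len(rows)))
    avg = sum(estimate_tokens(r) for r in sample) / len(sample)
    return avg * (1 + margin)

rows = ["Great product, fast shipping!"] * 10_000  # stand-in dataset
print(f"estimated avg tokens per row: {estimate_avg_tokens(rows):.1f}")
```

The 15% margin absorbs sampling noise; the max_tokens cap mentioned above handles the outliers the margin cannot.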

Example
Dataset: product_reviews.csv
Total rows: 10,000
Task: sentiment analysis and category extraction
Average input tokens per row: 350
Expected output tokens per row: 80
Model options: gpt-4o, gpt-4o-mini, claude-3-5-haiku
Use batch API: yes
Prompt caching eligible: yes (shared 500-token system prompt)
[ open in AI Batch Cost Calculator → ]
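The calculation behind this example can be sketched in a few lines. The per-million-token rates below are placeholder figures for illustration only; check each provider's current pricing page before relying on the numbers.

```python
# Illustrative cost comparison for the example dataset above.
ROWS = 10_000
INPUT_TOKENS = 350    # average input tokens per row
OUTPUT_TOKENS = 80    # expected output tokens per row

# (input $/1M tokens, output $/1M tokens) -- assumed example rates
PRICES = {
    "gpt-4o":           (2.50, 10.00),
    "gpt-4o-mini":      (0.15, 0.60),
    "claude-3-5-haiku": (0.80, 4.00),
}

BATCH_DISCOUNT = 0.5  # batch API: 50% off both input and output

def job_cost(input_rate: float, output_rate: float, batch: bool = False) -> float:
    cost = ROWS * (INPUT_TOKENS * input_rate + OUTPUT_TOKENS * output_rate) / 1_000_000
    return cost * BATCH_DISCOUNT if batch else cost

for model, (inp, out) in PRICES.items():
    rt, b = job_cost(inp, out), job_cost(inp, out, batch=True)
    print(f"{model:17s} real-time ${rt:6.2f}   batch ${b:6.2f}   per item ${b / ROWS:.6f}")
```

With these placeholder rates the batch run of 10,000 rows lands in the single-digit-dollar range for every model listed, which is why per-item cost, not total cost, is usually the figure worth comparing.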

FAQ

When should I use the batch API instead of real-time?
Use the batch API for offline jobs where 24-hour latency is acceptable: dataset labeling, bulk content generation, nightly report generation, and one-time data processing tasks. Never use batch for user-facing features that need immediate responses.
How much does prompt caching save?
Prompt caching on OpenAI bills cached input tokens at 50% of the normal input rate; cached tokens are discounted, not free. Anthropic's caching uses different rates: cache writes cost roughly 25% more than base input, while cache reads cost about 90% less. For jobs with a fixed system prompt, the saving scales with the fraction of total input tokens the cached prefix represents.
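The "fraction of total input tokens" point can be made concrete with a small sketch. The 850-token total below assumes the example's 350 per-row tokens sit alongside the 500-token cached system prompt; the 0.5 cached rate matches OpenAI-style caching, and you would lower it to roughly 0.1 for Claude cache reads.

```python
def cached_input_multiplier(cached_tokens: int, total_tokens: int,
                            cached_rate: float = 0.5) -> float:
    """Effective input-cost multiplier when a prefix of the input is cached."""
    uncached = total_tokens - cached_tokens
    return (uncached + cached_tokens * cached_rate) / total_tokens

# Example: 500-token cached system prompt out of 850 total input tokens.
print(round(cached_input_multiplier(500, 850), 3))  # → 0.706
```

Here caching the 500-token prefix cuts input cost by roughly 29%, and the effect compounds with the 50% batch discount since the two apply to different line items.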
What happens if a batch job partially fails?
Both OpenAI and Anthropic batch APIs process each item independently. Failed items are reported in the output file with error details. You can re-run only the failed items using the error output as a new batch input.
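A retry batch can be built mechanically from the output file. The sketch below assumes OpenAI-style batch JSONL records with custom_id and error fields; adapt the field names for other providers.

```python
import json

def failed_ids(output_lines: list[str]) -> set[str]:
    """Collect custom_ids of items whose result carries an error."""
    return {
        rec["custom_id"]
        for rec in map(json.loads, output_lines)
        if rec.get("error") is not None
    }

def build_retry_batch(original_requests: list[dict],
                      output_lines: list[str]) -> list[dict]:
    """Keep only the original request lines whose custom_id failed."""
    retry = failed_ids(output_lines)
    return [req for req in original_requests if req["custom_id"] in retry]

# Toy data: two requests, one failed.
requests = [{"custom_id": "row-1", "body": {}}, {"custom_id": "row-2", "body": {}}]
results = [
    json.dumps({"custom_id": "row-1", "error": None}),
    json.dumps({"custom_id": "row-2", "error": {"message": "rate_limited"}}),
]
print([r["custom_id"] for r in build_retry_batch(requests, results)])  # → ['row-2']
```

Because each item is processed independently, the retry batch is just a filtered copy of the original input file, so the retry costs only what the failed items cost.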

Related Examples