Estimate Claude API Cost for Document Processing
Claude pricing has two dimensions that differ from OpenAI's: prompt caching and the Batch API. Prompt caching stores the KV representations of a request prefix and charges a much lower rate when subsequent requests share that prefix. The Batch API processes requests asynchronously within a 24-hour window at a 50% discount. Stacked together, the two can cut document processing costs by more than half compared to standard real-time pricing, and by 75-90% when most of each request's input shares a cached prefix.

This example estimates the cost of processing 5,000 contracts against a shared 1,000-token legal analysis system prompt. With caching enabled and the Batch API used for overnight processing, the cost drops dramatically compared to naive real-time processing. The estimator breaks the total down into three components: the system prompt, the document tokens, and output generation.

A note on Claude's pricing model: input tokens are priced at three different rates depending on whether they are cache writes (the first time a prefix is stored), cache reads (subsequent hits on the same prefix), or uncached input. Output tokens are priced the same regardless of caching. Understanding this three-tier input pricing is essential for accurate cost projection in high-volume Claude deployments.
- Task: legal contract analysis
- Model: claude-3-5-sonnet-20241022
- Number of documents: 5,000
- Average document tokens: 8,500
- System prompt tokens: 1,000 (cacheable)
- Output tokens per document: 600
- Processing mode: batch (24-hour window)
- Prompt caching: enabled
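The three-tier input pricing and batch discount described above can be sketched as a small estimator. The rates below are assumptions (USD per million tokens, roughly matching Claude 3.5 Sonnet's published pricing at the time of writing); check the current pricing page before relying on them, and note that whether the batch discount stacks on cache rates is also assumed here.

```python
# Assumed rates in USD per million tokens for claude-3-5-sonnet-20241022.
RATES = {
    "input": 3.00,        # uncached input tokens
    "cache_write": 3.75,  # first-time prefix storage (25% premium over input)
    "cache_read": 0.30,   # subsequent hits on the same prefix (90% off input)
    "output": 15.00,
}
BATCH_DISCOUNT = 0.5  # Batch API: 50% off, assumed to apply to all tiers


def estimate_cost(n_docs, doc_tokens, prompt_tokens, output_tokens,
                  use_cache=True, use_batch=True):
    """Return a per-component cost breakdown in USD."""
    mtok = 1_000_000
    if use_cache:
        # One cache write for the shared system prompt, then cache reads
        # for every remaining request.
        prompt_cost = (prompt_tokens * RATES["cache_write"]
                       + (n_docs - 1) * prompt_tokens * RATES["cache_read"]) / mtok
    else:
        prompt_cost = n_docs * prompt_tokens * RATES["input"] / mtok
    # Each document is unique, so its tokens are always uncached input.
    doc_cost = n_docs * doc_tokens * RATES["input"] / mtok
    out_cost = n_docs * output_tokens * RATES["output"] / mtok
    scale = BATCH_DISCOUNT if use_batch else 1.0
    return {"prompt": prompt_cost * scale,
            "documents": doc_cost * scale,
            "output": out_cost * scale}


breakdown = estimate_cost(5000, 8500, 1000, 600)
print(breakdown, "total:", round(sum(breakdown.values()), 2))
```

With these assumed rates, the scenario above comes to roughly $87 with caching and batch enabled, versus roughly $187.50 for naive real-time processing without caching, and the breakdown makes it obvious that the unique document tokens, not the system prompt, dominate the bill.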
FAQ
- How much does prompt caching save on Claude?
- Cache reads cost $0.30/MTok versus $3.00/MTok for full input on Claude 3.5 Sonnet — a 90% reduction for cached tokens. If your system prompt is 1,000 tokens and documents average 8,000 tokens, the cached prompt covers about 11% of input tokens and trims total input cost by roughly 10% per request.
- How long does Claude cache persist?
- Claude prompt caches persist for 5 minutes after the last access, and each cache hit resets the timer. For batch jobs, consider warming the cache with a single request that writes the shared prefix before submitting the batch, so subsequent requests are more likely to be served as cache reads.
- Can I combine the Batch API and prompt caching?
- Yes. The Batch API supports prompt caching. Requests in the same batch that share a common prefix benefit from cache hits on top of the 50% batch discount. This combination is ideal for high-volume document processing pipelines.
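The stacked effect of the two discounts in steady state (system prompt already cached) can be checked with a few lines of arithmetic. Rates are the same assumed values as before, in USD per million tokens.

```python
# Steady-state per-request cost with batch + cache stacked, versus naive
# real-time pricing (assumed rates in USD per million tokens).
rates = {"input": 3.00, "cache_read": 0.30, "output": 15.00}
prompt, doc, out = 1000, 8500, 600  # tokens per request

naive = (prompt + doc) * rates["input"] + out * rates["output"]
optimized = 0.5 * (prompt * rates["cache_read"]      # cached system prompt
                   + doc * rates["input"]            # unique document tokens
                   + out * rates["output"])          # 50% batch discount on top
print(f"${naive/1e6:.4f} -> ${optimized/1e6:.4f} per document")
# -> $0.0375 -> $0.0174 per document
```

That is about a 54% reduction for this workload; the headline 75-90% figures require a much larger share of each request's input to be cacheable.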
Related Examples
- The Anthropic Messages API is the primary interface for all Claude models. Unlik...
- Process a Long Document with Claude 200K Context: Claude 3.5 Sonnet and Haiku both offer a 200,000-token context window, equivalen...
- Estimate API Cost for a Chat Conversation: Budgeting for LLM API usage requires understanding both input and output token p...
- Calculate Batch Processing Cost for a Dataset: Processing large datasets through AI APIs requires careful cost estimation befor...