Use Gemini Context Caching for Repeated Queries
Gemini context caching lets you store the computed KV representations of a large document or system instruction server-side and reference that cached content across multiple requests at a fraction of the full processing cost. For workflows where you run many queries against the same document, such as a legal contract, a codebase, or a research paper, context caching reduces input token costs by up to 75%. This example shows how to create a cached context and reference it in subsequent requests.

The caching workflow has three steps:
1. Create a cached content object with the cachedContents endpoint, specifying the content to cache and a time-to-live (TTL).
2. Receive a cached content name in the response.
3. Reference that name in subsequent generateContent requests via the cachedContent field.

The cache is stored server-side; during the TTL, every subsequent request pays only the lower cache-read price for the cached tokens. Context caching has a minimum size of 4,096 tokens and a minimum duration of 1 hour. The cost model is: you pay the full input price once to create the cache, an hourly storage fee while the cache is active, and a reduced per-request rate for cache reads. The break-even point is approximately 3-4 requests for a 1-hour cache, making caching economical for any workflow with more than a few queries per document.
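The cost model can be sketched as a short break-even calculation. The $1.25-per-million-token input price below is an assumed figure for illustration; the $1.00-per-million-token-hour storage fee and the 25% read rate are the figures quoted in the FAQ of this guide:

```python
# Break-even sketch for Gemini context caching (illustrative, not official pricing).
INPUT_PRICE = 1.25      # $ per million input tokens (assumption for illustration)
STORAGE_PRICE = 1.00    # $ per million cached tokens per hour (as quoted below)
READ_RATE = 0.25        # cache reads cost 25% of the standard input price

def cost_without_cache(tokens_m: float, n_queries: int) -> float:
    """Every query re-sends the full document at the standard input price."""
    return n_queries * tokens_m * INPUT_PRICE

def cost_with_cache(tokens_m: float, n_queries: int, hours: float = 1.0) -> float:
    """One full-price cache write, hourly storage, then discounted reads."""
    create = tokens_m * INPUT_PRICE
    storage = tokens_m * STORAGE_PRICE * hours
    reads = n_queries * tokens_m * INPUT_PRICE * READ_RATE
    return create + storage + reads

def break_even(tokens_m: float, hours: float = 1.0) -> int:
    """Smallest query count at which caching beats re-sending the document."""
    n = 1
    while cost_with_cache(tokens_m, n, hours) >= cost_without_cache(tokens_m, n):
        n += 1
    return n

print(break_even(0.05))  # 50,000-token contract, 1-hour cache → 3
```

Note that the break-even count does not depend on the document size: both the one-time overhead and the per-query saving scale linearly with the cached token count, so only the price ratios matter.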
# Step 1: Create cached context
POST /v1beta/cachedContents
{
  "model": "models/gemini-1.5-pro",
  "systemInstruction": {"parts": [{"text": "You are a legal document analyst."}]},
  "contents": [{"role": "user", "parts": [{"text": "[Full 50,000-token contract text here]"}]}],
  "ttl": "3600s"
}
# Step 2: Use cached context (reference name from step 1)
POST /v1beta/models/gemini-1.5-pro:generateContent
{
  "cachedContent": "cachedContents/abc123",
  "contents": [{"role": "user", "parts": [{"text": "What are the termination clauses?"}]}]
}

FAQ
- How much does Gemini context caching cost?
- Creating a cache costs the standard input price. Storage costs $1.00 per million tokens per hour. Cache reads cost 25% of the standard input price. For 10+ queries per cached document, caching is almost always cost-positive.
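As a concrete check of these numbers, consider 10 queries against one 50,000-token document over a 1-hour cache. The $1.25-per-million-token input price is an assumption for illustration; the storage fee and read rate are the ones stated above:

```python
# Worked cost comparison: 10 queries over one 50,000-token cached document.
# Input price of $1.25/M tokens is assumed; storage ($1.00/M tokens/hour)
# and the 25% read rate match the figures in this FAQ.
tokens_m = 50_000 / 1_000_000   # document size in millions of tokens
input_price = 1.25              # $ per million input tokens (assumed)
queries = 10

uncached = queries * tokens_m * input_price
cached = (tokens_m * input_price                      # one full-price cache write
          + tokens_m * 1.00 * 1                       # one hour of storage
          + queries * tokens_m * input_price * 0.25)  # ten discounted reads

print(f"uncached: ${uncached:.4f}")
print(f"cached:   ${cached:.4f}")
print(f"saving:   {1 - cached / uncached:.0%}")
```

Under these assumed prices the cached workflow costs roughly $0.27 versus $0.63 uncached, a saving of about 57%, which is why 10+ queries per document is comfortably past break-even.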
- What is the minimum content size for caching?
- The minimum cache size is 4,096 tokens. Content smaller than this cannot be cached. The practical sweet spot is 10,000+ tokens where the per-request savings clearly outweigh the storage cost.
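A pre-flight check against the 4,096-token floor can be sketched with the common rough heuristic of about 4 characters per token. This is an approximation for planning only, not the API's actual tokenizer:

```python
# Rough cacheability check using the ~4-characters-per-token heuristic.
# This estimates token count; the API's tokenizer gives the authoritative number.
MIN_CACHE_TOKENS = 4096

def is_cacheable(text: str, chars_per_token: float = 4.0) -> bool:
    """Estimate tokens from character count and compare to the cache minimum."""
    estimated_tokens = len(text) / chars_per_token
    return estimated_tokens >= MIN_CACHE_TOKENS

short_doc = "x" * 8_000    # ~2,000 estimated tokens: below the floor
long_doc = "x" * 200_000   # ~50,000 estimated tokens: well into the sweet spot
print(is_cacheable(short_doc), is_cacheable(long_doc))  # False True
```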
- Does context caching work with multimodal content?
- Yes. You can cache video, images, and audio alongside text in a single cached context object. This is especially valuable for video analysis workflows where the same recording is analyzed with many different questions.
Related Examples
- Estimate Gemini API Cost for a Document Pipeline
- Process a Long Document with Claude 200K Context
- Calculate Batch Processing Cost for a Dataset