Use Gemini Context Caching for Repeated Queries

Gemini context caching lets you store the computed KV representations of a large document or system instruction server-side and reference that cached content across multiple requests at a fraction of the full processing cost. For workflows that run many queries against the same document (a legal contract, a codebase, a research paper), context caching reduces input token costs by up to 75%. This example shows how to create a cached context and reference it in subsequent requests.

The caching workflow has three steps:

1. Create a cached content object with the cachedContents endpoint, specifying the content to cache and a time-to-live (TTL).
2. Receive a cached content name in the response.
3. Reference that name in subsequent generateContent requests via the cachedContent field.

The cache is stored server-side; during the TTL, every subsequent request pays only the lower cache-read price for the cached tokens. Context caching has a minimum size of 4,096 tokens and a default TTL of one hour (configurable via the ttl field). The cost model: you pay the full input price once to create the cache, an hourly storage fee while the cache is active, and a reduced per-request rate for cache reads. The break-even point is approximately 3-4 requests for a 1-hour cache, making caching economical for any workflow with more than a few queries per document.
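The break-even arithmetic can be sketched numerically. The prices below are illustrative assumptions for the sake of the calculation, not official Gemini rates:

```python
# Sketch of the context-caching break-even arithmetic.
# All prices are illustrative assumptions, not official Gemini rates.
INPUT_PRICE = 1.25        # $ per million input tokens (assumed)
READ_FRACTION = 0.25      # cache reads cost 25% of the input price
STORAGE_PER_HOUR = 1.00   # $ per million cached tokens per hour (assumed)

def uncached_cost(n_requests: int, cached_mtok: float) -> float:
    """Cost of resending the full document with every request."""
    return n_requests * cached_mtok * INPUT_PRICE

def cached_cost(n_requests: int, cached_mtok: float, hours: float = 1.0) -> float:
    """Cost of creating the cache once, storing it, and reading it per request."""
    creation = cached_mtok * INPUT_PRICE
    storage = cached_mtok * STORAGE_PER_HOUR * hours
    reads = n_requests * cached_mtok * INPUT_PRICE * READ_FRACTION
    return creation + storage + reads

def breakeven(cached_mtok: float = 0.05, hours: float = 1.0) -> int:
    """Smallest request count at which caching beats resending the document."""
    n = 1
    while cached_cost(n, cached_mtok, hours) >= uncached_cost(n, cached_mtok):
        n += 1
    return n

print(breakeven())  # 3 requests for a 1-hour, 50k-token cache at these prices
```

Under these assumed prices a 50,000-token document breaks even at the third request, consistent with the 3-4 request estimate above; larger query volumes only widen the gap.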

Example
# Step 1: Create cached context
POST /v1beta/cachedContents
{
  "model": "models/gemini-1.5-pro",
  "systemInstruction": {"parts": [{"text": "You are a legal document analyst."}]},
  "contents": [{"role": "user", "parts": [{"text": "[Full 50,000-token contract text here]"}]}],
  "ttl": "3600s"
}

# Step 2: Use cached context (reference name from step 1)
POST /v1beta/models/gemini-1.5-pro:generateContent
{
  "cachedContent": "cachedContents/abc123",
  "contents": [{"role": "user", "parts": [{"text": "What are the termination clauses?"}]}]
}
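As a rough end-to-end sketch, the two requests above can be issued from Python with the requests library. The base URL and the "key" query parameter follow the public Gemini REST convention, and the payloads mirror the JSON bodies shown above; treat the exact auth mechanism as an assumption and check the current API reference:

```python
import requests

# Assumed REST conventions: v1beta base URL, API key as a "key" query parameter.
BASE = "https://generativelanguage.googleapis.com/v1beta"

def cache_request_body(document_text: str, ttl_seconds: int = 3600) -> dict:
    """Step 1 payload: the content to cache plus a TTL."""
    return {
        "model": "models/gemini-1.5-pro",
        "systemInstruction": {
            "parts": [{"text": "You are a legal document analyst."}]
        },
        "contents": [{"role": "user", "parts": [{"text": document_text}]}],
        "ttl": f"{ttl_seconds}s",
    }

def query_request_body(cache_name: str, question: str) -> dict:
    """Step 3 payload: reference the cache name returned by step 1."""
    return {
        "cachedContent": cache_name,
        "contents": [{"role": "user", "parts": [{"text": question}]}],
    }

def run(document_text: str, question: str, api_key: str) -> dict:
    """Create the cache, then run one query against it."""
    created = requests.post(f"{BASE}/cachedContents",
                            params={"key": api_key},
                            json=cache_request_body(document_text))
    created.raise_for_status()
    name = created.json()["name"]  # e.g. "cachedContents/abc123"
    answer = requests.post(f"{BASE}/models/gemini-1.5-pro:generateContent",
                           params={"key": api_key},
                           json=query_request_body(name, question))
    answer.raise_for_status()
    return answer.json()
```

Keeping the payload builders separate from the network calls makes the request shapes easy to inspect; each follow-up question is just another call to the generateContent endpoint reusing the same cache name.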

FAQ

How much does Gemini context caching cost?
Creating a cache is billed at the standard input price. Storage is billed per million tokens per hour at a model-dependent rate. Cache reads cost 25% of the standard input price. For 10+ queries per cached document, caching is almost always cost-positive.
What is the minimum content size for caching?
The minimum cache size is 4,096 tokens. Content smaller than this cannot be cached. The practical sweet spot is 10,000+ tokens where the per-request savings clearly outweigh the storage cost.
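A quick way to sanity-check the minimum before creating a cache is a character-count heuristic. The ~4 characters per token ratio used here is a rough rule of thumb for English text, not an exact tokenizer:

```python
MIN_CACHE_TOKENS = 4096  # minimum cacheable size

def estimate_tokens(text: str) -> int:
    """Rough heuristic: ~4 characters per token for English text."""
    return len(text) // 4

def is_cacheable(text: str) -> bool:
    """True if the text plausibly clears the minimum cache size."""
    return estimate_tokens(text) >= MIN_CACHE_TOKENS

print(is_cacheable("short clause"))         # False: far below 4,096 tokens
print(is_cacheable("lorem ipsum " * 5000))  # True: ~15,000 estimated tokens
```

For an exact count, the API's token-counting endpoint is authoritative; the heuristic is only useful as a cheap pre-flight check.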
Does context caching work with multimodal content?
Yes. You can cache video, images, and audio alongside text in a single cached context object. This is especially valuable for video analysis workflows where the same recording is analyzed with many different questions.

Related Examples