How does Gemini's countTokens endpoint work?

POST to /v1beta/models/{model}:countTokens with the same contents array as your generateContent request. The response returns a totalTokens integer representing the exact token count that would be charged for the request.

Does Gemini count system instructions separately?

No. The countTokens response returns a single totalTokens count that includes system instructions, all content turns, and any tool definitions. There is no breakdown by message type in the current API.

How many tokens does an image use in Gemini?

Gemini 1.5 models use 258 tokens per image for standard-resolution inputs. High-resolution images may use more tokens. Videos are billed at 263 tokens per second of video content.

Count Tokens for a Gemini Request

The Gemini API provides a countTokens endpoint that returns the exact token count for a request — including system instructions, conversation history, and user messages — without consuming your quota or generating a response. This is especially valuable for Gemini given its massive 1M–2M token context windows, where fitting within limits requires precise measurement. This example shows a countTokens request and explains how to interpret the totalTokens response field. Gemini uses the SentencePiece tokenizer, which produces different token counts than GPT's tiktoken or Claude's tokenizer. For multilingual content, SentencePiece typically produces more efficient tokenization for languages beyond English. For code, the counts are similar across tokenizers. Always use Gemini's own countTokens endpoint rather than estimating from another model's token count. For vision-capable requests (Gemini 1.5 Pro and Flash), images are also counted — each image uses approximately 258 tokens for a standard-resolution image. The countTokens endpoint accepts image content in the same format as the generateContent endpoint, so you can get an accurate total count for multimodal requests before submitting them.

Example

Count tokens for this Gemini request:

System instruction: You are a helpful assistant that answers questions about software architecture and design patterns.

User message: Explain the difference between the Repository pattern and the Active Record pattern in software development. When should I use each one? Provide a concrete example in Python for both patterns.

Model: gemini-1.5-flash

[ open in Gemini Token Counter → ]

FAQ

How does Gemini's countTokens endpoint work?: POST to /v1beta/models/{model}:countTokens with the same contents array as your generateContent request. The response returns a totalTokens integer representing the exact token count that would be charged for the request.
Does Gemini count system instructions separately?: No. The countTokens response returns a single totalTokens count that includes system instructions, all content turns, and any tool definitions. There is no breakdown by message type in the current API.
How many tokens does an image use in Gemini?: Gemini 1.5 models use 258 tokens per image for standard-resolution inputs. High-resolution images may use more tokens. Videos are billed at 263 tokens per second of video content.

Related Examples

Build a Gemini Generate Content Request

The Google Gemini API uses the generateContent endpoint with a structure that di...

Estimate Gemini API Cost for a Document Pipeline

Google Gemini pricing has a tiered structure based on prompt length: requests un...

Count Tokens in a Paragraph

Token counting is the foundation of every cost and context window calculation wh...

Count Tokens for GPT-4o with tiktoken

OpenAI models use the tiktoken library with BPE (Byte Pair Encoding) to tokenize...