Build a Gemini Generate Content Request
The Google Gemini API uses the generateContent endpoint with a structure that differs meaningfully from both OpenAI and Anthropic: the system instruction goes in a dedicated systemInstruction field, user and model turns go in the contents array using "user" and "model" roles (not "assistant"), and generation parameters live in a generationConfig object. This example shows a complete Gemini 1.5 Pro request for a text summarization task.

Gemini 1.5 Pro offers a 2,000,000-token context window, by far the largest of any production LLM API. This makes it uniquely suited to tasks involving very long documents, entire codebases, or lengthy research papers that exceed even Claude's 200K window. The 1.5 Flash model offers 1,000,000 tokens at significantly lower cost, making million-token context processing economically viable for the first time.

The generationConfig object controls output behavior: temperature (0.0–2.0, default 1.0), topP, topK, maxOutputTokens, and stopSequences. Gemini also supports responseMimeType: "application/json" for JSON output mode, similar to OpenAI's json_object mode. For structured extraction tasks, combine JSON mode with a clear schema in the system instruction.
POST https://generativelanguage.googleapis.com/v1beta/models/gemini-1.5-pro:generateContent

{
  "systemInstruction": {
    "parts": [{"text": "You are a concise technical writer. Summarize documents in plain English without jargon."}]
  },
  "contents": [
    {
      "role": "user",
      "parts": [{"text": "Summarize the key points of the following technical documentation in 3-5 bullet points:\n\nREADME: This library provides a unified interface for connecting to multiple vector databases including Pinecone, Weaviate, Qdrant, and Chroma. It abstracts the client configuration, authentication, and query formats into a consistent API. Supported operations include upsert, query, delete, and list. The library handles connection pooling, retry logic, and rate limiting automatically."}]
    }
  ],
  "generationConfig": {
    "temperature": 0.3,
    "maxOutputTokens": 512
  }
}

FAQ
- What is the Gemini equivalent of the system prompt?
- The systemInstruction field at the top level of the request. It uses the same parts array format as content messages. Unlike OpenAI where system is a message role, Gemini treats systemInstruction as a separate, higher-priority field.
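The placement differences are easiest to see by assembling the request body in code. Below is a minimal sketch; the helper name build_gemini_request is illustrative (not part of any Gemini SDK), and note that with the REST API the model name goes in the URL path rather than in the body.

```python
import json

def build_gemini_request(system_text, user_text, temperature=0.3,
                         max_tokens=512, json_mode=False):
    """Assemble a generateContent request body (illustrative helper).

    The model name is not part of this body: in the REST API it appears
    in the URL path, e.g. .../models/gemini-1.5-pro:generateContent.
    """
    config = {"temperature": temperature, "maxOutputTokens": max_tokens}
    if json_mode:
        # JSON output mode, analogous to OpenAI's json_object response format
        config["responseMimeType"] = "application/json"
    return {
        # System prompt lives in its own top-level field, not in contents
        "systemInstruction": {"parts": [{"text": system_text}]},
        "contents": [{"role": "user", "parts": [{"text": user_text}]}],
        "generationConfig": config,
    }

body = build_gemini_request(
    "You are a concise technical writer.",
    "Summarize this README in 3-5 bullet points.",
    json_mode=True,
)
print(json.dumps(body, indent=2))
```

POSTing this dict as JSON to the endpoint (with your API key) yields the completion; the structure above is the part that differs from OpenAI, where the system prompt would instead be the first message in the messages array.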
- What is the difference between Gemini 1.5 Pro and 1.5 Flash?
- Pro offers a 2,000,000-token context window and Flash 1,000,000. Flash is optimized for speed and cost (significantly cheaper per token than Pro), while Pro is better for complex reasoning, nuanced writing, and tasks requiring strong instruction following. Flash is excellent for summarization and extraction.
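When choosing between the two for long-document work, a quick feasibility check against the context windows stated above can help. This sketch uses a rough 4-characters-per-token heuristic for English text, which is an approximation, not a tokenizer; use the countTokens endpoint for exact counts.

```python
# Context windows as stated above; values in tokens.
CONTEXT_WINDOW = {"gemini-1.5-pro": 2_000_000, "gemini-1.5-flash": 1_000_000}

def fits_in_context(text: str, model: str, reply_budget: int = 8_192) -> bool:
    """Rough check that a document plus a reply budget fits the model's window.

    len(text) // 4 is a crude English-text token estimate; verify with
    the countTokens endpoint before relying on it.
    """
    estimated_tokens = len(text) // 4
    return estimated_tokens + reply_budget <= CONTEXT_WINDOW[model]

doc = "x" * 6_000_000  # roughly 1.5M estimated tokens
print(fits_in_context(doc, "gemini-1.5-pro"))    # True
print(fits_in_context(doc, "gemini-1.5-flash"))  # False
```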
- Does Gemini support function calling like OpenAI?
- Yes. Gemini supports function declarations in the tools array, using a functionDeclarations key. The model returns a functionCall part in the response, and you return the result via a functionResponse part in a subsequent user message.
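That round trip can be sketched as plain request bodies. The get_weather function below is hypothetical, and the field names follow the functionDeclarations format as described above; treat the exact shape as a sketch to check against the current API reference.

```python
# Request body declaring one tool (get_weather is a made-up example function)
request = {
    "contents": [
        {"role": "user", "parts": [{"text": "What's the weather in Paris?"}]}
    ],
    "tools": [{
        "functionDeclarations": [{
            "name": "get_weather",
            "description": "Look up current weather for a city",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        }]
    }],
}

# The model answers with a functionCall part naming the function and its
# arguments. After appending that model turn and executing the function
# locally, you send the result back as a functionResponse part in a
# subsequent user turn:
follow_up = {
    "role": "user",
    "parts": [{
        "functionResponse": {
            "name": "get_weather",
            "response": {"city": "Paris", "temp_c": 18},
        }
    }],
}
request["contents"].append(follow_up)
print(len(request["contents"]))  # → 2
```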
Related Examples
- Count tokens with the Gemini countTokens endpoint
- Estimate Gemini API Cost for a Document Pipeline
- Build an OpenAI Chat Completion Request
- Build an Anthropic Messages API Request