Calculate Context Window Usage for a System Prompt

Every LLM request draws from a fixed context window budget measured in tokens. Your system prompt, conversation history, and expected response must all fit within that budget — exceed it and the model either truncates your input silently or returns an error. This example shows a realistic system prompt for a customer support assistant and calculates exactly what fraction of common model context windows it occupies.

System prompts are often the largest single consumer of context in production deployments. A detailed persona, a long set of instructions, and examples of correct responses can easily total 1,000–4,000 tokens before the user has typed a single word.

The context calculator shows the token count alongside the percentage of capacity consumed for GPT-4o (128K), Claude 3.5 Sonnet (200K), and Gemini 1.5 Pro (1M), giving you an immediate sense of headroom for conversation history. As a rule of thumb: reserve at least 20% of the context window for the assistant's response, and plan conversation history pruning strategies before you need them in production. This example's system prompt leaves ample room for multi-turn conversations on most modern models.
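The percentage math above is easy to reproduce yourself. Here is a minimal sketch using the rough "about 4 characters per token" heuristic for English text; the `estimate_tokens` and `context_usage` helpers are illustrative names, and for exact counts you would use the target model's own tokenizer instead.

```python
# Context windows for the models mentioned above, in tokens.
CONTEXT_WINDOWS = {
    "GPT-4o": 128_000,
    "Claude 3.5 Sonnet": 200_000,
    "Gemini 1.5 Pro": 1_000_000,
}

def estimate_tokens(text: str) -> int:
    # Rough heuristic: ~4 characters per token for English prose.
    # Real token counts vary by model and require the model's tokenizer.
    return max(1, len(text) // 4)

def context_usage(prompt: str) -> dict[str, float]:
    # Percentage of each model's context window the prompt occupies.
    tokens = estimate_tokens(prompt)
    return {model: 100 * tokens / window for model, window in CONTEXT_WINDOWS.items()}
```

Running `context_usage` on a 4,000-character prompt (~1,000 estimated tokens) shows it occupies well under 1% of a 128K window, which is the kind of headroom check the calculator automates.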

Example
You are a helpful customer support assistant for Acme Software. Your role is to assist users with billing questions, account management, and technical issues related to Acme's suite of cloud products.

Guidelines:
- Always greet the user by name if they have provided it
- Be concise and professional, avoiding jargon where possible
- For billing issues, ask for the account email and last 4 digits of the payment method
- Never reveal internal pricing tiers, staff names, or escalation paths
- If you cannot resolve an issue, offer to create a support ticket
- Respond in the same language the user writes in

You have access to the following actions: lookup_account, create_ticket, check_service_status

FAQ

What happens when the context window fills up?
Most APIs either truncate the oldest messages silently or return an error. You should implement a pruning strategy — such as summarizing older turns — before the window fills to avoid losing important context.
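One simple pruning strategy is to keep the system prompt fixed, reserve a response budget, and drop the oldest conversation turns until the rest fits. The sketch below assumes the ~4 chars/token heuristic and a hypothetical `prune_history` helper; a production version might summarize dropped turns rather than discard them.

```python
def prune_history(system_prompt, messages, budget_tokens, reserve=0.2,
                  est=lambda t: max(1, len(t) // 4)):
    # Token budget left for history: total window, minus the response
    # reserve, minus the (always-kept) system prompt.
    limit = int(budget_tokens * (1 - reserve)) - est(system_prompt)
    kept, used = [], 0
    # Walk from newest to oldest so the most recent turns survive.
    for msg in reversed(messages):
        cost = est(msg["content"])
        if used + cost > limit:
            break  # everything older than this is dropped
        kept.append(msg)
        used += cost
    return list(reversed(kept))  # restore chronological order
```

The newest-first walk is the key design choice: when the budget runs out, it is the oldest turns that fall away, matching what most chat APIs do implicitly — except here you control the cutoff instead of discovering it via silent truncation.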
Does the system prompt count against the context window?
Yes. The system prompt is tokenized and included in the context window along with the conversation history. It is counted as input tokens and billed accordingly.
How do I reduce system prompt token usage?
Remove redundant instructions, use bullet points instead of paragraphs, and move static reference data (like product lists) to retrieval rather than inline inclusion. Even small reductions multiply across millions of requests.
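To make the retrieval point concrete: instead of pasting a full product catalog into every system prompt, you can select only the entries relevant to the current query and inject those. The sketch below uses naive keyword overlap and invented names (`PRODUCTS`, `retrieve_relevant`) purely for illustration; real deployments would typically use embedding-based search.

```python
# Hypothetical static reference data that would otherwise sit inline
# in the system prompt on every request.
PRODUCTS = {
    "acme-cloud-db": "Managed database service",
    "acme-cloud-fn": "Serverless functions",
    "acme-cloud-cdn": "Content delivery network",
}

def retrieve_relevant(query: str, catalog: dict[str, str], k: int = 2) -> list[str]:
    # Score each entry by keyword overlap with the query.
    # Naive on purpose; embedding search is the usual production choice.
    words = set(query.lower().split())
    def overlap(item):
        name, desc = item
        entry_words = set((name + " " + desc).lower().replace("-", " ").split())
        return len(words & entry_words)
    scored = sorted(catalog.items(), key=overlap, reverse=True)
    return [f"{name}: {desc}" for name, desc in scored[:k]]
```

Only the top-k lines returned here are appended to the prompt, so token cost scales with the query's needs rather than the catalog's size — the "small reductions multiply across millions of requests" point in practice.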
