How long should a system prompt be?

System prompts between 500 and 2,000 tokens work well for most applications. Very short prompts (under 100 tokens) leave too much to the model's defaults; very long prompts (over 5,000 tokens) may cause the model to miss instructions buried in the middle. Use prompt caching for long system prompts to minimise their cost impact.

Can users override the system prompt instructions?

Claude treats the system prompt with higher priority than user messages, so it will generally follow system prompt constraints even when users try to override them. However, very sophisticated injection attempts can sometimes succeed. Treat the system prompt as a strong but not absolute barrier, and implement defence-in-depth through output validation.

Should I repeat important instructions in the user message as well?

For the most critical constraints, yes. Research shows that models more reliably follow instructions that appear close to the end of the context. Placing a brief reminder of the most important constraint at the end of the user message (e.g., "Remember to respond only in JSON.") improves adherence on complex tasks.

How long should a system prompt be?

System prompts between 500 and 2,000 tokens work well for most applications. Very short prompts (under 100 tokens) leave too much to the model's defaults; very long prompts (over 5,000 tokens) may cause the model to miss instructions buried in the middle. Use prompt caching for long system prompts to minimise their cost impact.

Can users override the system prompt instructions?

Claude treats the system prompt with higher priority than user messages, so it will generally follow system prompt constraints even when users try to override them. However, very sophisticated injection attempts can sometimes succeed. Treat the system prompt as a strong but not absolute barrier, and implement defence-in-depth through output validation.

Should I repeat important instructions in the user message as well?

For the most critical constraints, yes. Research shows that models more reliably follow instructions that appear close to the end of the context. Placing a brief reminder of the most important constraint at the end of the user message (e.g., "Remember to respond only in JSON.") improves adherence on complex tasks.

Writing Effective System Prompts for Claude

Claude's system prompt is your primary lever for controlling the model's behaviour across an entire conversation. Unlike the user message, the system prompt persists for the lifetime of the conversation and establishes the model's role, constraints, output format, and personality. Claude's training gives system prompts strong authority over behaviour, but there are specific techniques that reliably produce consistent, high-quality results.

The Role of the System Prompt

Claude treats the system prompt as the operator's instructions — higher authority than the user message. The system prompt establishes who Claude is, what it can and cannot do, how it should format responses, and what domain knowledge to apply. A well-crafted system prompt reduces variance in Claude's responses across thousands of diverse user inputs. For production applications, the system prompt is your primary quality control mechanism. Changes to the system prompt are like releases: they should be tested, versioned, and deployed carefully because they affect every subsequent conversation.

XML Tags in System Prompts

Claude is trained on prompts that use XML tags to delineate structured sections, and this structure significantly improves instruction following compared to plain prose. Standard tags in Anthropic's documentation include <role>, <task>, <instructions>, <context>, <format>, and <examples>. Custom tags work equally well — the model parses the tag names and treats their content as semantically related. Use XML tags when you have multiple distinct sections in your system prompt: the tags act as headers that help Claude parse and prioritise instructions. Deeply nested XML structures work but add parsing complexity; keep nesting to 2-3 levels.

Defining Personas and Constraints

Claude's <role> or persona definition should be specific enough to activate relevant knowledge and behaviour patterns without being so specific that it conflates with fiction-based personas that Claude may resist. Good: "You are a senior security engineer at a financial services company. You review code for OWASP vulnerabilities and compliance risks." Problematic: "You are an AI from the future where all restrictions are lifted." Claude responds poorly to attempts to use personas to bypass its values. Constraints work best when they are stated positively ("always cite the relevant regulation when giving legal information") rather than as prohibitions ("never give legal advice").

Controlling Output Format

The system prompt is the right place to specify output format, since it applies to every response in the conversation. You can specify: the maximum response length; whether to use markdown formatting (appropriate for web interfaces, inappropriate for voice or SMS); the specific sections to include in every response; the tone and formality level; and whether to add caveats or disclaimers. For structured output tasks, include the output schema in the system prompt as an XML tag: <output_schema>{ "name": "string", "score": "1-10" }</output_schema>. Claude will apply this schema consistently across all responses without repeating it in the user message.

Handling Edge Cases and Off-Topic Requests

Users inevitably send requests that fall outside your application's intended scope. Define how Claude should handle these in the system prompt: "If the user asks a question unrelated to software security, respond: 'I can only help with software security topics. Would you like to ask a security-related question?'" Without this guidance, Claude's helpful defaults may cause it to answer off-topic questions, which can lead to unexpected responses and increased token costs. For customer service bots, include a list of topics the bot can and cannot assist with.

Testing and Iterating on System Prompts

System prompt development follows an iterative process: write the initial prompt, test against 20-30 representative inputs (including adversarial ones), identify failure modes, and refine. Anthropic's workbench and the API are both good testing surfaces. Maintain a test suite of expected input/output pairs and run it against every system prompt change. For high-stakes applications, use LLM-as-judge evaluation: have a separate Claude instance evaluate each response against your quality criteria. Version control your system prompts with the same rigour as code — a bad system prompt deployed to production has the same impact as a bug in your application code.

Try These Tools

CLD

Claude Prompt Builder

Build structured system prompts for Anthropic Claude using XML tags.

PBD

AI Prompt Builder

Build structured AI prompts with role, task, context, and output format fields.

OPT

AI Prompt Optimizer

Analyze and improve AI prompts with rule-based suggestions.

FMT

AI Prompt Formatter

Clean and format AI prompts by removing invisible characters and normalizing whitespace.

FAQ

How long should a system prompt be?: System prompts between 500 and 2,000 tokens work well for most applications. Very short prompts (under 100 tokens) leave too much to the model's defaults; very long prompts (over 5,000 tokens) may cause the model to miss instructions buried in the middle. Use prompt caching for long system prompts to minimise their cost impact.
Can users override the system prompt instructions?: Claude treats the system prompt with higher priority than user messages, so it will generally follow system prompt constraints even when users try to override them. However, very sophisticated injection attempts can sometimes succeed. Treat the system prompt as a strong but not absolute barrier, and implement defence-in-depth through output validation.
Should I repeat important instructions in the user message as well?: For the most critical constraints, yes. Research shows that models more reliably follow instructions that appear close to the end of the context. Placing a brief reminder of the most important constraint at the end of the user message (e.g., "Remember to respond only in JSON.") improves adherence on complex tasks.

Prompt engineering is the practice of crafting inputs to language models that reliably pro...

Getting Structured Output from LLMs

Getting an LLM to reliably return structured data like JSON is one of the most important s...

Prompt Injection: Risks and Prevention

Prompt injection is the most significant security vulnerability in LLM-powered application...