Do I need to add special tokens myself when using Ollama?

No. When you use Ollama's messages API format, Ollama applies the chat template automatically. You only need to manually format with special tokens if you are calling the raw generate endpoint or running inference directly with the Hugging Face library.

How does Llama 3 system prompt performance compare to GPT-4?

Llama 3.1 70B follows system prompts reliably for most instruction types, approaching GPT-4o quality. Llama 3.1 8B is more prone to ignoring constraints under adversarial user inputs. Both benefit from concise, specific instructions over long, vague system prompts.

What is the maximum system prompt length for Llama 3?

Technically up to the full 128K context window, but practically 200-600 tokens is the sweet spot. Very long system prompts can cause Llama 3 to ignore instructions from earlier in the prompt, especially when the context fills with conversation history.

Write a System Prompt for Llama 3

Llama 3 uses a structured chat template with special tokens that must be applied correctly for the model to follow instructions reliably. Unlike GPT and Claude where the API handles template formatting transparently, local deployments of Llama 3 (via Ollama, vLLM, or direct inference) require the developer to either use the Hugging Face apply_chat_template method or manually format prompts with the correct special tokens. This example shows the complete formatted prompt for a code review assistant. The system prompt for Llama 3 should be concise and directive, similar to Claude's style. Llama 3 was instruction-tuned with a focus on following direct instructions without elaborate preambles. Unlike GPT models where longer, more elaborate system prompts generally improve results, Llama 3 performs best with tight, action-oriented system prompts under 500 tokens. Avoid vague quality instructions like "be helpful and thorough" — use specific behavioral constraints instead. For Llama 3 specifically, a few formatting tips improve instruction following: use numbered lists for multi-step instructions, avoid nested bullet points deeper than two levels, and put the most important constraint last (Llama tends to prioritize later instructions over earlier ones when they conflict). Test your system prompt with adversarial user inputs before production deployment, as open-source models without safety fine-tuning may be more susceptible to instruction override.

Example

<|begin_of_text|><|start_header_id|>system<|end_header_id|>
You are a senior software engineer conducting code reviews.

Review scope:
1. Identify bugs and logic errors first
2. Note security vulnerabilities (SQL injection, XSS, insecure dependencies)
3. Comment on performance bottlenecks for functions over 20 lines
4. Skip style comments unless they affect readability significantly

Format: Use code blocks for all code references. End each review with a 1-line summary rating: APPROVE / REQUEST CHANGES / NEEDS DISCUSSION.<|eot_id|>
<|start_header_id|>user<|end_header_id|>

[ open in LLaMA Prompt Builder → ]

FAQ

Do I need to add special tokens myself when using Ollama?: No. When you use Ollama's messages API format, Ollama applies the chat template automatically. You only need to manually format with special tokens if you are calling the raw generate endpoint or running inference directly with the Hugging Face library.
How does Llama 3 system prompt performance compare to GPT-4?: Llama 3.1 70B follows system prompts reliably for most instruction types, approaching GPT-4o quality. Llama 3.1 8B is more prone to ignoring constraints under adversarial user inputs. Both benefit from concise, specific instructions over long, vague system prompts.
What is the maximum system prompt length for Llama 3?: Technically up to the full 128K context window, but practically 200-600 tokens is the sweet spot. Very long system prompts can cause Llama 3 to ignore instructions from earlier in the prompt, especially when the context fills with conversation history.

Related Examples

Run Llama with the Ollama API

Ollama is the most popular tool for running Llama and other open-source models l...

Count Tokens for a Llama 3 Prompt

Llama 3 uses a custom tokenizer based on tiktoken's BPE algorithm with a vocabul...

Write a Claude System Prompt for a Research Assistant

Claude's system prompt is the highest-priority instruction channel in the Anthro...

Build an OpenAI System Prompt for a Coding Assistant

A well-structured system prompt is the most important factor in GPT model output...