Count Tokens for a Llama 3 Prompt

Llama 3 uses a custom tokenizer built on tiktoken's BPE implementation, with a vocabulary of 128,256 tokens. That is larger than GPT-4's cl100k_base vocabulary of roughly 100,000 tokens, and four times the 32,000-token vocabulary of earlier Llama versions. The larger vocabulary lets Llama 3 tokenize many common words and subwords more efficiently than its predecessors, and most plain English text tokenizes at roughly the same rate as with GPT models. This example counts tokens for a typical Llama 3 chat prompt, including the special tokens that the chat template adds.

The Llama 3 chat format uses special tokens to delimit system, user, and assistant turns. The chat template adds <|begin_of_text|> at the start, <|start_header_id|>role<|end_header_id|> before each turn, and <|eot_id|> at the end of each turn. These special tokens do not appear in the messages you write, but they are counted, and billed, in hosted deployments. The token counter includes them in its total, so the number you see is the true billable token count for chat-format requests.

For Llama 3.1 deployed via hosted APIs (Meta AI, Together AI, Groq, Fireworks), the context window is 128,000 tokens. Local deployment with Ollama uses a much smaller default (2,048 tokens, though this can vary by version), which can be extended with the num_ctx option. Always check the context window limit for your specific deployment environment.
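The chat template described above can be sketched in a few lines of Python. This is an illustrative sketch, not the official implementation: the function name render_llama3_chat and its parameters are hypothetical, and the blank line after each header is an assumption based on Meta's published prompt format. In practice the tokenizer's own chat template should be treated as the source of truth.

```python
# Special tokens named in the description above.
BEGIN_OF_TEXT = "<|begin_of_text|>"
START_HEADER = "<|start_header_id|>"
END_HEADER = "<|end_header_id|>"
END_OF_TURN = "<|eot_id|>"


def render_llama3_chat(messages, add_generation_prompt=True):
    """Render [{"role": ..., "content": ...}] into Llama 3 chat text.

    Hypothetical helper for illustration only.
    """
    parts = [BEGIN_OF_TEXT]
    for message in messages:
        # Assumption: a blank line separates the role header from the content.
        parts.append(f"{START_HEADER}{message['role']}{END_HEADER}\n\n")
        parts.append(f"{message['content'].strip()}{END_OF_TURN}")
    if add_generation_prompt:
        # An open assistant header asks the model to write the next turn.
        parts.append(f"{START_HEADER}assistant{END_HEADER}\n\n")
    return "".join(parts)


if __name__ == "__main__":
    chat = [
        {"role": "system", "content": "You are a helpful software engineering assistant."},
        {"role": "user", "content": "Explain the difference between a mutex and a semaphore."},
    ]
    print(render_llama3_chat(chat))
```

Every turn costs three special tokens (two for the header, one for the end of turn) on top of the tokens for the role name and the content itself, which is why short multi-turn chats carry more overhead than their visible text suggests.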

Example
Count tokens for this Llama 3.1 chat prompt (including special tokens):

<|begin_of_text|><|start_header_id|>system<|end_header_id|>
You are a helpful software engineering assistant.<|eot_id|>
<|start_header_id|>user<|end_header_id|>
Explain the difference between a mutex and a semaphore in concurrent programming.<|eot_id|>
<|start_header_id|>assistant<|end_header_id|>
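Before running the prompt through a tokenizer, it can help to tally how many special tokens it contains, since each one counts as a single token. The sketch below does that with a regular expression; the pattern and the helper name count_special_tokens are assumptions made for illustration, and the tally covers only the structural tokens, not the ordinary text between them.

```python
import re
from collections import Counter

# Assumption: special tokens look like <|lowercase_name|>, as in the prompt above.
SPECIAL_TOKEN_PATTERN = re.compile(r"<\|[a-z_]+\|>")

EXAMPLE_PROMPT = (
    "<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n"
    "You are a helpful software engineering assistant.<|eot_id|>\n"
    "<|start_header_id|>user<|end_header_id|>\n"
    "Explain the difference between a mutex and a semaphore "
    "in concurrent programming.<|eot_id|>\n"
    "<|start_header_id|>assistant<|end_header_id|>\n"
)


def count_special_tokens(text):
    """Return a Counter mapping each special token to its occurrences."""
    return Counter(SPECIAL_TOKEN_PATTERN.findall(text))


if __name__ == "__main__":
    counts = count_special_tokens(EXAMPLE_PROMPT)
    for token, n in counts.items():
        print(f"{token}: {n}")
    print("special tokens in total:", sum(counts.values()))
```

For this prompt the tally is one <|begin_of_text|>, three header pairs, and two <|eot_id|> tokens, nine special tokens in all. The final assistant header has no <|eot_id|> because the model is expected to produce that turn.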

FAQ

Does Llama 3 use the same tokenizer as GPT models?
No. Llama 3 uses a custom BPE tokenizer with a 128,256-token vocabulary, compared to roughly 100,000 tokens in the cl100k_base vocabulary used by GPT-4. The larger vocabulary gives more words and common subwords a single-token representation, which can make Llama 3 slightly more token-efficient than GPT-4 on English text.
What are Llama 3 special tokens?
The Llama 3 chat format uses <|begin_of_text|>, <|start_header_id|>, <|end_header_id|>, and <|eot_id|> as structural tokens. These are added by the chat template and count toward the token limit. The <|eot_id|> (end of turn) token is particularly important because it also serves as the stop token: generation ends when the model emits it.
What is the Llama 3.1 context window?
Llama 3.1 models support a 128,000-token context window, matching GPT-4o. The original Llama 3 8B and 70B models had an 8,192-token context window. Always check which version and configuration your deployment uses.
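Once you have a token count, checking it against the context window is simple arithmetic: the prompt tokens plus the tokens you allow for the reply must fit within the limit. The sketch below uses the window sizes quoted in this FAQ; the dictionary keys and function names are hypothetical, and your deployment's actual limit may differ.

```python
# Context window sizes as quoted above; keys are illustrative labels only.
CONTEXT_WINDOWS = {
    "llama-3": 8_192,
    "llama-3.1": 128_000,
}


def fits_in_context(prompt_tokens, max_new_tokens, context_window):
    """True if the prompt plus the requested reply length fits the window."""
    return prompt_tokens + max_new_tokens <= context_window


def remaining_budget(prompt_tokens, context_window):
    """Tokens left for the reply after the prompt, never below zero."""
    return max(context_window - prompt_tokens, 0)


if __name__ == "__main__":
    prompt_tokens = 8_000  # e.g. the number reported by a token counter
    for name, window in CONTEXT_WINDOWS.items():
        ok = fits_in_context(prompt_tokens, 500, window)
        print(name, "fits:", ok, "| remaining:", remaining_budget(prompt_tokens, window))
```

An 8,000-token prompt with a 500-token reply overflows the original 8,192-token window but fits comfortably in the 128,000-token window of Llama 3.1.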

Related Examples