Build an OpenAI Chat Completion Request
The Chat Completions API is the primary interface for GPT models and the foundation of most OpenAI-powered applications. A well-formed request includes a model identifier, a messages array with at least one message (typically a user message), and optional parameters that control output behavior. The example below shows a complete request for a code-explanation task, with a system prompt, a user message, and temperature and max_tokens tuned for concise technical responses.

The messages array follows a strict role protocol: system sets the persona and instructions, user carries the human turn, and assistant carries previous AI responses in multi-turn conversations. Order matters: the system message should come first, followed by alternating user and assistant messages. Inserting a second system message mid-conversation is not standard and may behave differently across models.

Key parameters:
- temperature (0.0–2.0) controls randomness. Use 0.0–0.3 for factual tasks and code, 0.7–1.0 for creative writing.
- max_tokens caps the response length in tokens (not words).
- top_p is an alternative randomness control; adjust either temperature or top_p, not both.
- model must match an available model ID. A bare alias such as "gpt-4o" resolves to a current snapshot; pin a dated version (e.g., "gpt-4o-2024-08-06") for reproducible behavior.
{
  "model": "gpt-4o",
  "messages": [
    {
      "role": "system",
      "content": "You are a senior software engineer. Explain code clearly and concisely."
    },
    {
      "role": "user",
      "content": "Explain what this function does:\n\nconst debounce = (fn, delay) => {\n let timer;\n return (...args) => {\n clearTimeout(timer);\n timer = setTimeout(() => fn(...args), delay);\n };\n};"
    }
  ],
  "temperature": 0.2,
  "max_tokens": 300
}
FAQ
Q: What is the difference between temperature and top_p?
A: Both control output randomness. Temperature rescales the probability distribution over next tokens; top_p (nucleus sampling) truncates the distribution to the smallest set of tokens whose cumulative probability reaches p. OpenAI recommends adjusting one or the other, not both.
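The truncation step behind top_p can be illustrated with a toy sketch over a made-up distribution (illustrative only; the API applies this internally over the model's full vocabulary):

```python
def nucleus_filter(probs, top_p):
    """Keep the smallest set of highest-probability tokens whose
    cumulative probability reaches top_p (nucleus sampling)."""
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    kept, cumulative = [], 0.0
    for token, p in ranked:
        kept.append(token)
        cumulative += p
        if cumulative >= top_p:
            break
    return kept

# Toy next-token distribution (made up for illustration)
probs = {"the": 0.5, "a": 0.3, "an": 0.15, "zebra": 0.05}
print(nucleus_filter(probs, 0.9))  # the low-probability tail ("zebra") is cut
```

A lower top_p keeps fewer candidates, which is why it acts as a randomness control much like lowering temperature.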
Q: How do I format a multi-turn conversation?
A: Include previous messages in the messages array, alternating between user and assistant roles. The API is stateless — you must send the full conversation history with every request to maintain context.
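A minimal sketch of maintaining that history client-side (the helper name is ours, not part of any SDK):

```python
def append_turn(history, user_text, assistant_text):
    """Record one completed exchange so it can be resent with the next request."""
    history.append({"role": "user", "content": user_text})
    history.append({"role": "assistant", "content": assistant_text})
    return history

history = [{"role": "system", "content": "You are a helpful assistant."}]
append_turn(history, "What is a closure?",
            "A closure is a function that captures variables from its enclosing scope.")

# The next request sends the full history plus the new user turn.
next_messages = history + [{"role": "user", "content": "Show me an example."}]
print([m["role"] for m in next_messages])
# ['system', 'user', 'assistant', 'user']
```

Because the full history is resent each time, long conversations grow in token cost; trimming or summarizing old turns is a common mitigation.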
Q: What model should I use for most tasks?
A: GPT-4o is the strongest general-purpose choice for complex tasks. GPT-4o-mini delivers near-GPT-4o quality at a fraction of the cost for simpler tasks. Use the o1 or o3 reasoning models for reasoning-intensive work such as math, competitive programming, and multi-step planning.
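Putting the pieces together, the request shown at the top of the page can be assembled and sent with only the Python standard library. A minimal sketch, assuming an OPENAI_API_KEY environment variable (the helper names are ours; error handling and retries omitted):

```python
import json
import os
import urllib.request

def build_chat_request(system_prompt, user_message, *, model="gpt-4o",
                       temperature=0.2, max_tokens=300):
    """Assemble a Chat Completions payload like the JSON example above."""
    return {
        "model": model,
        "messages": [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_message},
        ],
        "temperature": temperature,
        "max_tokens": max_tokens,
    }

payload = build_chat_request(
    "You are a senior software engineer. Explain code clearly and concisely.",
    "Explain what this function does: const add = (a, b) => a + b;",
)

def send(payload):
    """POST the payload to the Chat Completions endpoint."""
    req = urllib.request.Request(
        "https://api.openai.com/v1/chat/completions",
        data=json.dumps(payload).encode(),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

if __name__ == "__main__" and "OPENAI_API_KEY" in os.environ:
    reply = send(payload)
    # The assistant's text lives at choices[0].message.content
    print(reply["choices"][0]["message"]["content"])
```

In production you would typically use the official openai SDK instead, but the wire format is exactly the JSON shown above.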