Repair Malformed JSON from an AI Response

LLMs frequently return invalid JSON even when instructed to produce valid JSON. Common failure modes include trailing commas after the last object property or array element, single-quoted strings instead of double-quoted ones, unquoted keys, Python-style None and True/False instead of null and true/false, and JSON that is cut off when the response hits the max_tokens limit. This example shows a realistic malformed AI response and how the repair tool fixes each issue automatically.

The repair tool applies a sequence of targeted transformations: it converts single quotes to double quotes while preserving apostrophes in string values, removes trailing commas before closing brackets and braces, replaces None/True/False with their JSON equivalents, and attempts to close unclosed brackets and braces when truncation is detected. The transformations run in a fixed order because fixing one issue (like single quotes) can expose the next (like an unquoted key).

For production AI pipelines, run JSON repair as the first step of response parsing, before JSON.parse(). If the repaired text still fails to parse, fall back to extracting partial data with a lenient parser, or prompt the model again and include the specific parse error. Never let the pipeline crash because the model returned imperfect JSON.
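The "repair first, then parse, never crash" advice can be sketched as a small wrapper. This is a minimal illustration, not the repair tool's own code: `parseModelJson` and `naiveRepair` are hypothetical names, and `repair` stands for whatever string-to-string repair routine the pipeline actually uses.

```javascript
// Hypothetical wrapper: `repair` is any function (string) => string.
// It is passed in as a parameter; no specific library API is assumed.
function parseModelJson(raw, repair) {
  try {
    // Repair runs first, then the strict parser validates the result.
    return { ok: true, value: JSON.parse(repair(raw)) };
  } catch (err) {
    // Report the failure instead of throwing, so the caller can fall back
    // to a lenient parser or re-prompt the model with the error message.
    const message = err instanceof Error ? err.message : String(err);
    return { ok: false, error: message, raw };
  }
}

// Naive stand-in repair for the demo: strips trailing commas only, and
// would also touch commas inside string values. Not production quality.
const naiveRepair = (s) => s.replace(/,(\s*[}\]])/g, "$1");

const result = parseModelJson('{"tags": ["ai", "python",],}', naiveRepair);
if (result.ok) {
  console.log(result.value.tags.length); // 2
} else {
  console.error("Re-prompt the model with:", result.error);
}
```

Returning a result object rather than throwing keeps the decision about retries or partial extraction with the caller, which is one way to keep a single bad response from stopping the pipeline.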

Example

{
  'name': 'Alice Johnson',
  'age': 32,
  'active': True,
  'score': None,
  'tags': ['ai', 'developer', 'python',],
  'address': {
    'city': 'San Francisco',
    'zip': '94105',
  },
}
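The transformations described above can be sketched against this input. The following is a simplified, assumption-laden illustration, not the repair tool's actual implementation: `repairJson` and `PYTHON_LITERALS` are hypothetical names, unquoted keys are not handled, and an unescaped apostrophe inside a single-quoted string would still be misread as the end of the string. It scans character by character so that literals and commas inside string values are left alone.

```javascript
// Sketch only: a hypothetical repairJson(), not the repair tool's real code.
const PYTHON_LITERALS = new Map([
  ["None", "null"],
  ["True", "true"],
  ["False", "false"],
]);

function repairJson(text) {
  let out = "";
  const closers = []; // closing characters still owed for open { and [
  let i = 0;

  while (i < text.length) {
    const ch = text[i];

    if (ch === '"' || ch === "'") {
      // String literal: copy the body and always emit double-quote delimiters.
      const quote = ch;
      let body = "";
      i++;
      while (i < text.length && text[i] !== quote) {
        if (text[i] === "\\" && i + 1 < text.length) {
          // Keep escapes as-is, except \' which JSON does not allow.
          body += text[i + 1] === "'" ? "'" : text[i] + text[i + 1];
          i += 2;
        } else {
          // A bare " inside a single-quoted string must be escaped.
          body += text[i] === '"' ? '\\"' : text[i];
          i++;
        }
      }
      i++; // skip the closing quote (absent if the text was truncated)
      out += '"' + body + '"';
    } else if (ch === "{" || ch === "[") {
      closers.push(ch === "{" ? "}" : "]");
      out += ch;
      i++;
    } else if (ch === "}" || ch === "]") {
      out = out.replace(/,(\s*)$/, "$1") + ch; // drop a trailing comma
      closers.pop();
      i++;
    } else if (/[A-Za-z_]/.test(ch)) {
      // Bare word outside a string: map Python literals, keep anything else.
      let j = i;
      while (j < text.length && /[A-Za-z0-9_]/.test(text[j])) j++;
      const word = text.slice(i, j);
      out += PYTHON_LITERALS.get(word) ?? word;
      i = j;
    } else {
      out += ch;
      i++;
    }
  }

  // Truncation: drop a dangling comma, then close whatever is still open.
  out = out.replace(/,(\s*)$/, "$1");
  while (closers.length > 0) out += closers.pop();
  return out;
}

const example = `{
  'name': 'Alice Johnson',
  'active': True,
  'score': None,
  'tags': ['ai', 'developer', 'python',],
}`;
const data = JSON.parse(repairJson(example));
console.log(data.name, data.active, data.score); // Alice Johnson true null
```

Closing open brackets is best effort: input cut off right after a key or a colon will still fail to parse, so the result should always go through JSON.parse() inside error handling.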

FAQ

Why do LLMs return invalid JSON? Models learn from training data that includes Python dicts, JavaScript objects, and other JSON-like formats that use single quotes, trailing commas, or unquoted keys. The model can blend these conventions into its output, especially when the prompt does not explicitly require strict JSON.
What is the most common JSON error from AI responses? Trailing commas are the most frequent issue, followed by single-quoted strings. Using OpenAI's JSON mode or the structured output features from OpenAI or Anthropic greatly reduces (but does not eliminate) these errors; a response cut off at the max_tokens limit, for example, can still be incomplete.
Can I prevent this by using structured outputs? Largely, but not completely. Structured output modes (OpenAI's json_object and json_schema modes, Anthropic's tool use) are designed to produce valid JSON at the API level, but they have limitations: json_schema mode and tool use require defining a schema upfront, and they may not support every JSON Schema feature.
