Clean and Format Raw LLM Markdown Output
LLMs frequently produce markdown that is technically valid but visually inconsistent: headings that skip levels (# followed immediately by ###), bullet lists that mix hyphens and asterisks, multiple blank lines between sections, bold text that alternates between **word** and __word__, and trailing whitespace on every line. Feeding this output directly to a renderer produces acceptable results, but storing it or re-using it in a pipeline causes subtle issues when downstream tools compare strings or apply further transformations.

The output formatter normalises all of these inconsistencies to a canonical form: headings follow a strict hierarchy, list markers are unified, blank lines between sections are reduced to exactly one, and inline formatting uses a consistent style. The normalised output is semantically identical but tokenises more efficiently (useful if it re-enters a prompt as context) and diffs cleanly in version control.

This example shows a realistic LLM response to a "write a project README" prompt, with the typical inconsistencies that appear when a model mixes training data from many different markdown styles. Run it through the formatter to see the before/after diff and which normalisation rules triggered.
# Project Overview

This project is a **REST API** built with Node.js.

## Features

- Authentication
* Rate limiting
- Logging

### Installation

Run the following:

```
npm install
```

### Usage

See the docs folder for details.
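The line-level rules described above can be sketched in a few lines of Python. This is an illustrative sketch, not the actual formatter: the function name `normalize_markdown` is hypothetical, and it deliberately omits the harder rules (heading-hierarchy repair and code-fence handling).

```python
import re

def normalize_markdown(text: str) -> str:
    """Minimal sketch of the line-level normalisation rules."""
    lines = []
    for line in text.splitlines():
        line = line.rstrip()                             # drop trailing whitespace
        line = re.sub(r"^(\s*)[*+]\s+", r"\1- ", line)   # unify list markers to "-"
        line = re.sub(r"__(.+?)__", r"**\1**", line)     # unify bold to ** style
        lines.append(line)
    text = "\n".join(lines)
    text = re.sub(r"\n{3,}", "\n\n", text)               # at most one blank line
    return text
```

Applied to the sample README above, a pass like this would rewrite `* Rate limiting` to `- Rate limiting` and collapse any runs of blank lines, leaving the wording untouched.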
FAQ
- Why does LLM output have inconsistent markdown?
- LLMs are trained on diverse markdown sources with different style conventions. Without strict output format instructions, the model mixes conventions from its training data, producing syntactically valid but stylistically inconsistent markdown.
- Does formatting change the meaning of the content?
- No. The formatter only adjusts whitespace, list markers, and heading levels to ensure consistent style. The actual text content, code blocks, and links are preserved unchanged.
- Should I format output before or after storing it?
- Format before storing. Normalised content diffs cleanly, is easier to search, and produces consistent results when rendered. Storing raw output and formatting on render means every reader pays the transformation cost.
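The preservation guarantee mentioned in the FAQ hinges on detecting fenced code blocks before any rule fires, so that code is never rewritten. A minimal fence-aware pass might look like this; it is a hedged sketch assuming triple-backtick fences, with a hypothetical name, not the formatter's real implementation.

```python
import re

def normalize_outside_fences(text: str) -> str:
    """Normalise list markers and whitespace, but leave fenced code untouched."""
    out, in_fence = [], False
    for line in text.splitlines():
        if line.lstrip().startswith("```"):
            in_fence = not in_fence       # entering or leaving a code block
            out.append(line)
        elif in_fence:
            out.append(line)              # code passes through byte-for-byte
        else:
            line = re.sub(r"^(\s*)[*+]\s+", r"\1- ", line.rstrip())
            out.append(line)
    return "\n".join(out)
```

Everything between the fences, including trailing whitespace and asterisk "list markers" that are really code, is emitted unchanged; only the prose outside is normalised.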
Related Examples
- Normalize Text with Smart Quotes and Em Dashes for LLM Input
- Repair Malformed JSON from an AI Response
- Clean and Format a Messy Prompt