URL-Encode Special Characters

URL percent-encoding (also called URL encoding) is a basic security and correctness practice for anyone building web applications. When user-supplied values are appended to URLs without encoding, the result can be query string corruption (an & in a value is interpreted as a parameter separator), path manipulation (a / in a value is treated as a path separator and can point the URL at a different resource), and, in some contexts, injection vulnerabilities. The solution is to encode every user-controlled value systematically before assembling the URL. This collection shows the most commonly encountered encoding cases.

Space encoding has two variants that cause confusion. %20 is the universal percent-encoding and works correctly in any part of a URL (paths, query strings, fragments). The + character represents a space only in the application/x-www-form-urlencoded format used by HTML form submissions in query strings. In a URL path or fragment, + is a literal plus sign, not a space, so using it there to mean a space is an error.

Special characters in query values require careful attention. The & ampersand separates query parameters (?name=Alice&role=admin), so an & inside a parameter value (like "John & Jane") must be encoded as %26. Similarly, the = sign separates keys from values, so an = inside a key or value should be encoded as %3D. Missing these encodings creates query string parsing bugs that may silently truncate parameters or merge parameter values.

Email addresses in query strings are particularly tricky. The @ sign (U+0040) is encoded as %40, and the + sign commonly used in email addresses (user+tag@example.com) must be encoded as %2B so that it is not interpreted as a space. Unencoded email addresses are a common source of bugs in email confirmation and notification links.

Path segments: a file name like "my document.pdf" contains a space that must be encoded as %20 in the URL path. If the file name itself contained forward slashes, those would need to be encoded as %2F to prevent them from being interpreted as path segment separators.

Emoji and non-ASCII encoding: characters outside the ASCII range (code points above U+007F) are first converted to their multi-byte UTF-8 representation, and then each byte is percent-encoded. The wave emoji 👋 (U+1F44B) encodes to the four UTF-8 bytes F0 9F 91 8B, becoming %F0%9F%91%8B in the URL. This is why emoji in URLs look like a string of hex percent sequences.

The unreserved characters that never need encoding are A-Z, a-z, 0-9, hyphen (-), underscore (_), period (.), and tilde (~). All other characters should be encoded when used as data values, even if many browsers accept them unencoded.

Implementation: in JavaScript, encodeURIComponent() encodes every character except A-Z, a-z, 0-9, and the punctuation marks - _ . ! ~ * ' ( ). Use it to encode individual query parameter names and values. Do not use encodeURI() for parameter values: it is meant for whole URLs and leaves characters such as &, =, +, and ? unencoded, even though they have special meaning in query strings.
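The advice about encodeURIComponent() can be sketched as follows. This is a minimal illustration, not a complete URL builder; the helper name buildQuery is hypothetical, and the sample values are the ones used in the examples on this page.

```javascript
// Hypothetical helper: builds a query string from key/value pairs,
// passing every key and value through encodeURIComponent().
function buildQuery(params) {
  return Object.entries(params)
    .map(([key, value]) => `${encodeURIComponent(key)}=${encodeURIComponent(value)}`)
    .join("&");
}

const query = buildQuery({ name: "John & Jane", email: "user+tag@example.com" });
console.log(query);
// name=John%20%26%20Jane&email=user%2Btag%40example.com

// encodeURI() leaves & untouched, so the value would split into two parameters.
console.log(encodeURI("John & Jane"));
// John%20&%20Jane
```

Encoding each component separately, and only then joining with & and =, keeps the separators unambiguous.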

Example
# Space
Hello World → Hello%20World

# Special characters in query values
name=John & Jane → name=John%20%26%20Jane
email=user+tag@example.com → email=user%2Btag%40example.com

# Path with slashes
/files/my document.pdf → /files/my%20document.pdf

# Emoji (UTF-8 multi-byte)
Hello 👋 → Hello%20%F0%9F%91%8B

# Full URL example
https://example.com/search?q=C++ programming&lang=en → https://example.com/search?q=C%2B%2B%20programming&lang=en
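
The path and full-URL examples above could be reproduced in JavaScript roughly like this. It is a sketch under stated assumptions: encodePath is a hypothetical helper, and the file name "reports/2024.pdf" is an invented value used only to show a slash inside a single segment.

```javascript
// Hypothetical helper: encodes each path segment on its own, then joins with "/".
// A slash inside a segment becomes %2F instead of splitting the segment.
function encodePath(segments) {
  return "/" + segments.map((segment) => encodeURIComponent(segment)).join("/");
}

console.log(encodePath(["files", "my document.pdf"]));
// /files/my%20document.pdf

console.log(encodePath(["files", "reports/2024.pdf"]));
// /files/reports%2F2024.pdf

// Full URL: only the user-supplied search term is encoded.
const term = "C++ programming";
const fullUrl = `https://example.com/search?q=${encodeURIComponent(term)}&lang=en`;
console.log(fullUrl);
// https://example.com/search?q=C%2B%2B%20programming&lang=en
```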

FAQ

What is the difference between %20 and + for spaces?
%20 is the universal percent-encoding for a space that works in any part of a URL. The + character represents a space only in HTML form query strings (application/x-www-form-urlencoded). Use %20 in path segments.
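
A short JavaScript sketch of the two space encodings, assuming a runtime that provides the standard URLSearchParams class (modern browsers and Node.js):

```javascript
const value = "Hello World";

// encodeURIComponent() always writes a space as %20.
const percentForm = encodeURIComponent(value);
console.log(percentForm); // Hello%20World

// URLSearchParams serializes with application/x-www-form-urlencoded rules,
// so the same space is written as +.
const formForm = new URLSearchParams({ q: value }).toString();
console.log(formForm); // q=Hello+World

// Decoding is asymmetric: decodeURIComponent() keeps + as a literal plus,
// while URLSearchParams turns it back into a space.
const literalPlus = decodeURIComponent("Hello+World");
const formDecoded = new URLSearchParams("q=Hello+World").get("q");
console.log(literalPlus); // Hello+World
console.log(formDecoded); // Hello World
```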
Which characters are safe in URLs without encoding?
Unreserved characters that never need encoding: A-Z, a-z, 0-9, hyphen (-), underscore (_), period (.), and tilde (~). All other characters should be percent-encoded when used as data values.
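
In JavaScript, encodeURIComponent() leaves ! ' ( ) * unencoded in addition to the unreserved set listed above. If only the unreserved characters should pass through, one possible approach is to post-process its output; encodeStrict is a hypothetical name for this sketch.

```javascript
// Hypothetical helper: percent-encodes everything except
// A-Z, a-z, 0-9, hyphen, underscore, period, and tilde.
function encodeStrict(value) {
  return encodeURIComponent(value).replace(
    /[!'()*]/g,
    (ch) => "%" + ch.charCodeAt(0).toString(16).toUpperCase()
  );
}

console.log(encodeStrict("a-b_c.d~e"));
// a-b_c.d~e  (unreserved characters are unchanged)

console.log(encodeStrict("it's (a) test!*"));
// it%27s%20%28a%29%20test%21%2A
```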
How are non-Latin characters encoded in URLs?
Non-ASCII characters are first encoded as UTF-8 bytes, then each byte is percent-encoded. The emoji 👋 is U+1F44B, encoded as the four UTF-8 bytes F0 9F 91 8B, becoming %F0%9F%91%8B.
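
The two steps (UTF-8 bytes, then one %XX per byte) can be made explicit in JavaScript. This sketch assumes the standard TextEncoder class is available; percentEncodeUtf8 is a hypothetical name, and unlike encodeURIComponent() it encodes every byte, including ASCII letters.

```javascript
// Hypothetical helper: converts text to UTF-8 bytes and writes each byte as %XX.
function percentEncodeUtf8(text) {
  const bytes = new TextEncoder().encode(text);
  return Array.from(
    bytes,
    (byte) => "%" + byte.toString(16).toUpperCase().padStart(2, "0")
  ).join("");
}

console.log(percentEncodeUtf8("👋"));
// %F0%9F%91%8B

// encodeURIComponent() performs the same two steps for non-ASCII characters.
console.log(encodeURIComponent("Hello 👋"));
// Hello%20%F0%9F%91%8B
```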
