URL-Encode Special Characters
URL percent-encoding (also called URL encoding) is a security and correctness practice that anyone building web applications needs to understand. When user-supplied values are appended to URLs without encoding, the result can be query string corruption (an & in a value is interpreted as a parameter separator), path traversal (a / in a value points at a different path), and, in some contexts, injection vulnerabilities. The solution is to systematically encode every user-controlled value before assembling the URL. This collection shows the most commonly encountered encoding cases.

Space encoding has two variants that cause confusion. %20 is the universal percent-encoding and works in any part of a URL (paths, query strings, fragments). The + character represents a space only in the application/x-www-form-urlencoded format that HTML form submissions use in query strings. Using + for a space in a URL path or fragment is an error: there, + is a literal plus sign, not a space.

Special characters in query values require careful attention. The & ampersand separates query parameters (?name=Alice&role=admin), so an & inside a parameter value (like "John & Jane") must be encoded as %26. Similarly, the = sign separates keys from values, so an = inside a key or value must be encoded as %3D. Missing these encodings creates query string parsing bugs that may silently truncate a value or turn part of it into a separate parameter.

Email addresses in query strings are particularly tricky. The @ sign (U+0040) is encoded as %40, and the + sign commonly used in email addresses (user+tag@example.com) must be encoded as %2B so it isn't interpreted as a space. Unencoded email addresses are a common source of bugs in email confirmation and notification links.

Path segments with slashes: a file name like "my document.pdf" contains a space that must be encoded as %20 in the URL path. If the name also contained forward slashes, those would need to be encoded as %2F to prevent them from being interpreted as path segment separators.

Emoji and non-ASCII encoding: characters outside ASCII (code points above U+007F) are first encoded as UTF-8, which takes two to four bytes per character, and each byte is then percent-encoded. The wave emoji 👋 (U+1F44B) encodes to the four UTF-8 bytes F0 9F 91 8B, becoming %F0%9F%91%8B in the URL. This is why emoji in URLs look like a string of hex percent sequences.

The unreserved characters that never need encoding are A-Z, a-z, 0-9, hyphen (-), underscore (_), period (.), and tilde (~). All other characters should be percent-encoded when used as data values, even if many browsers accept them unencoded.

Implementation: in JavaScript, encodeURIComponent() encodes every character except A-Z, a-z, 0-9, the four unreserved punctuation marks, and ! * ' ( ). Use it for encoding individual query parameter values. Never use the weaker encodeURI() for parameter values: it leaves characters such as &, =, +, and ? unencoded, and these have special meaning in query strings.
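As a minimal sketch of the implementation advice above, the following encodes each path segment and each query key and value separately before joining them. The helper name buildUrl, its parameters, and the sample file name are hypothetical and chosen only for illustration; encodeURIComponent and encodeURI are the standard JavaScript built-ins.

```javascript
// Hypothetical helper: assemble a URL from untrusted path segments and query parameters.
function buildUrl(origin, segments, params) {
  // Encode each segment on its own so a "/" or space inside it cannot change the path.
  const path = segments.map((segment) => encodeURIComponent(segment)).join("/");
  // Encode keys and values separately so "&" and "=" inside them stay data.
  const query = Object.entries(params)
    .map(([key, value]) => `${encodeURIComponent(key)}=${encodeURIComponent(value)}`)
    .join("&");
  return query ? `${origin}/${path}?${query}` : `${origin}/${path}`;
}

const url = buildUrl("https://example.com", ["files", "a/b report.pdf"], {
  name: "John & Jane",
  email: "user+tag@example.com",
});
console.log(url);
// https://example.com/files/a%2Fb%20report.pdf?name=John%20%26%20Jane&email=user%2Btag%40example.com

// encodeURI leaves "&" and "=" alone, so the value would be split into two parameters.
const weak = encodeURI("name=John & Jane");
console.log(weak);
// name=John%20&%20Jane
```

Encoding the pieces before joining them is the point of the sketch: the separators the code inserts (/, ?, &, =) are the only unencoded ones in the result.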
# Space
Hello World → Hello%20World

# Special characters in query values
name=John & Jane → name=John%20%26%20Jane
email=user+tag@example.com → email=user%2Btag%40example.com

# Path with slashes
/files/my document.pdf → /files/my%20document.pdf

# Emoji (UTF-8 multi-byte)
Hello 👋 → Hello%20%F0%9F%91%8B

# Full URL example
https://example.com/search?q=C++ programming&lang=en → https://example.com/search?q=C%2B%2B%20programming&lang=en
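The emoji example above can be reproduced by hand to show the two steps involved: convert the text to UTF-8 bytes, then percent-encode every byte that is not an unreserved character. This is an illustrative sketch, not a replacement for the built-ins; the function name percentEncode is hypothetical. Unlike encodeURIComponent, it also encodes ! * ' ( ).

```javascript
// Unreserved characters: A-Z, a-z, 0-9, hyphen, period, underscore, tilde.
const UNRESERVED = /^[A-Za-z0-9\-._~]$/;

// Hypothetical strict encoder: percent-encodes every byte outside the unreserved set.
function percentEncode(text) {
  const bytes = new TextEncoder().encode(text); // TextEncoder always produces UTF-8
  let out = "";
  for (const byte of bytes) {
    const ch = String.fromCharCode(byte);
    if (byte < 0x80 && UNRESERVED.test(ch)) {
      out += ch; // safe to copy through unchanged
    } else {
      out += "%" + byte.toString(16).toUpperCase().padStart(2, "0");
    }
  }
  return out;
}

console.log(percentEncode("Hello 👋"));    // Hello%20%F0%9F%91%8B
console.log(percentEncode("John & Jane")); // John%20%26%20Jane
console.log(percentEncode("it's (ok)!"));  // it%27s%20%28ok%29%21
```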
FAQ
- What is the difference between %20 and + for spaces?
- %20 is the universal percent-encoding for a space that works in any part of a URL. The + character represents a space only in HTML form query strings (application/x-www-form-urlencoded). Use %20 in path segments.
- Which characters are safe in URLs without encoding?
- Unreserved characters that never need encoding: A-Z, a-z, 0-9, hyphen (-), underscore (_), period (.), and tilde (~). All other characters should be percent-encoded when used as data values.
- How are non-Latin characters encoded in URLs?
- Non-ASCII characters are first encoded as UTF-8 bytes, then each byte is percent-encoded. The emoji 👋 is U+1F44B, encoded as the four UTF-8 bytes F0 9F 91 8B, becoming %F0%9F%91%8B.
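The %20 versus + distinction from the first answer can be checked in JavaScript. The sketch below assumes a runtime that provides the standard URLSearchParams class (browsers and Node.js), which follows the application/x-www-form-urlencoded rules; the variable names are illustrative.

```javascript
// Form-style serialization: space becomes "+", a literal "+" becomes %2B.
const formStyle = new URLSearchParams({ q: "C++ programming" }).toString();
console.log(formStyle); // q=C%2B%2B+programming

// Percent-encoding: space becomes %20, valid in any part of a URL.
const percentStyle = encodeURIComponent("C++ programming");
console.log(percentStyle); // C%2B%2B%20programming

// Decoding differs too: a form-style parser turns "+" into a space...
const parsedPlus = new URLSearchParams("q=a+b").get("q");
console.log(parsedPlus); // a b

// ...while decodeURIComponent treats "+" as a literal plus sign.
const decodedPlus = decodeURIComponent("a+b");
console.log(decodedPlus); // a+b

// An encoded plus survives form-style parsing as a plus.
const parsedEncoded = new URLSearchParams("q=a%2Bb").get("q");
console.log(parsedEncoded); // a+b
```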