Unicode and Emoji Encoding
Unicode assigns a unique code point (written U+XXXX, ranging from U+0000 to U+10FFFF) to characters from the world's writing systems, plus thousands of symbols and emoji. In source code and databases you may encounter characters in their native form, as Unicode escapes, or as UTF-8 byte sequences. This collection shows the relationship between the visual character, its Unicode code point, its UTF-8 encoding, and its JavaScript/Python escape representation. The Unicode encoder converts any text to its escape sequences and back, which is useful for debugging encoding issues in APIs and databases.
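As a rough illustration of the text-to-escapes round trip described above, here is a minimal Python sketch built on the standard `unicode_escape` codec. The helper names `to_escapes` and `from_escapes` are hypothetical, chosen for this example, and are not the encoder tool itself. The sketch assumes the escaped form is pure ASCII, which is what the codec produces.

```python
def to_escapes(text: str) -> str:
    """Return an ASCII-only string with non-ASCII characters written as Python escapes."""
    # unicode_escape emits \xNN, \uNNNN or \UNNNNNNNN depending on the code point.
    return text.encode("unicode_escape").decode("ascii")


def from_escapes(escaped: str) -> str:
    """Reverse to_escapes; assumes the input contains only ASCII characters."""
    return escaped.encode("ascii").decode("unicode_escape")


sample = "caf\u00e9 \u2192 \U0001F600"  # "café → 😀"
escaped = to_escapes(sample)
print(escaped)

# The round trip should give back the original text.
assert from_escapes(escaped) == sample
```

Note that this codec writes Python-style escapes, so a code point above U+FFFF comes out as a single `\UNNNNNNNN` escape rather than a JavaScript surrogate pair.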
# Emoji with Unicode code points
😀 = U+1F600 = \uD83D\uDE00 (JS surrogate pair) = \U0001F600 (Python)
❤️ = U+2764 U+FE0F = \u2764\uFE0F
🌍 = U+1F30D = \uD83C\uDF0D
✅ = U+2705 = \u2705

# Common symbols
© = U+00A9 = \u00A9
→ = U+2192 = \u2192
∞ = U+221E = \u221E
° = U+00B0 = \u00B0

# Latin extended
é = U+00E9 = \u00E9 (UTF-8: 0xC3 0xA9)
ñ = U+00F1 = \u00F1 (UTF-8: 0xC3 0xB1)
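The rows above can be reproduced programmatically. The following Python sketch derives the code points, UTF-8 bytes, and JavaScript-style escapes for a piece of text. The function names `js_escape` and `describe` are hypothetical helpers introduced for illustration; the JavaScript form is obtained by encoding to UTF-16 and printing each 16-bit code unit.

```python
def js_escape(text: str) -> str:
    """Write each UTF-16 code unit as \\uXXXX, as it would appear in a JavaScript string literal."""
    data = text.encode("utf-16-be")  # big-endian, no byte order mark
    units = [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]
    return "".join(f"\\u{unit:04X}" for unit in units)


def describe(text: str) -> dict:
    """Collect the representations listed in the table for one visual character."""
    return {
        "code_points": " ".join(f"U+{ord(ch):04X}" for ch in text),
        "utf8": " ".join(f"0x{b:02X}" for b in text.encode("utf-8")),
        "js": js_escape(text),
    }


# Escapes are used here so the sample data does not depend on the file's own encoding.
for ch in ["\U0001F600", "\u2764\uFE0F", "\u00E9"]:
    print(ch, describe(ch))
```

For code points above U+FFFF, the UTF-16 encoding step is what produces the two-unit surrogate pair shown in the emoji rows.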
FAQ
- What is the difference between Unicode and UTF-8?
- Unicode is the standard that assigns code points to characters. UTF-8 is one of several encodings that define how those code points are stored as bytes. UTF-8 uses 1–4 bytes per code point and is backward-compatible with ASCII.
- Why do some emoji show as two characters in JavaScript?
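- JavaScript strings are sequences of UTF-16 code units. Code points above U+FFFF (including most emoji) require two 16-bit code units, called a surrogate pair, so the length property of such an emoji is 2 even though it displays as one character. Emoji built from several code points, such as ❤️ (U+2764 U+FE0F), also report a length greater than 1.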
- How do I safely store emoji in a MySQL database?
- Use the utf8mb4 character set in MySQL. The older utf8 charset (an alias for utf8mb3) stores at most 3 bytes per character, so it cannot hold code points above U+FFFF, which need 4 bytes in UTF-8 and include most emoji. Set both the column charset and the connection charset to utf8mb4.
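The answers above can be checked with a short Python sketch. It reports, for a few sample characters, the UTF-8 byte count, the number of UTF-16 code units (what a JavaScript length check would report), and whether the text contains a 4-byte code point. The helpers `utf16_length` and `needs_utf8mb4` are hypothetical names for this example, and the 4-byte test simply assumes that any code point above U+FFFF cannot fit in a 3-byte charset.

```python
def utf16_length(text: str) -> int:
    """Number of UTF-16 code units in text (each unit is 2 bytes)."""
    return len(text.encode("utf-16-le")) // 2


def needs_utf8mb4(text: str) -> bool:
    """True if any code point is above U+FFFF, i.e. takes 4 bytes in UTF-8."""
    return any(ord(ch) > 0xFFFF for ch in text)


# A (ASCII), é (Latin extended), ✅ (3-byte emoji), 😀 (4-byte emoji)
for ch in ["A", "\u00E9", "\u2705", "\U0001F600"]:
    print(
        f"U+{ord(ch):04X}",
        "utf8_bytes:", len(ch.encode("utf-8")),
        "utf16_units:", utf16_length(ch),
        "needs_utf8mb4:", needs_utf8mb4(ch),
    )
```

A check like this could run before an insert to flag text that a 3-byte column would reject or truncate, though in practice switching the column and connection to utf8mb4 is the simpler fix.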