
Unicode and Emoji Encoding

Unicode assigns a unique code point, written U+XXXX, to each character it covers: the letters and signs of the world's major writing systems plus thousands of symbols and emoji. In source code and databases you may encounter a character in its native form, as a Unicode escape, or as a UTF-8 byte sequence. This collection shows how the visual character, its code point, its UTF-8 bytes, and its JavaScript and Python escape forms relate to one another. The Unicode encoder converts any text to its escape sequences and back, which is useful for debugging encoding issues in APIs and databases.

Example
# Emoji with Unicode code points
😀 = U+1F600 = \uD83D\uDE00 (JS surrogate pair) = \U0001F600 (Python)
❤️ = U+2764 U+FE0F = \u2764\uFE0F (two code points: heart + variation selector-16)
🌍 = U+1F30D = \uD83C\uDF0D
✅ = U+2705 = \u2705

# Common symbols
© = U+00A9 = \u00A9
→ = U+2192 = \u2192
∞ = U+221E = \u221E
° = U+00B0 = \u00B0

# Latin extended
é = U+00E9 = \u00E9 (UTF-8: 0xC3 0xA9)
ñ = U+00F1 = \u00F1 (UTF-8: 0xC3 0xB1)
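
The rows above can be reproduced programmatically. The following is a minimal Python sketch, not the encoder's actual implementation. The helper names `to_escape`, `from_escape`, and `utf8_hex` are hypothetical and rely only on the standard library.

```python
def to_escape(text: str) -> str:
    """Escape every non-ASCII code point using Python escape notation."""
    parts = []
    for ch in text:
        cp = ord(ch)
        if ch == "\\":
            parts.append("\\\\")             # double literal backslashes so decoding is unambiguous
        elif cp < 0x80:
            parts.append(ch)                 # plain ASCII stays readable
        elif cp <= 0xFFFF:
            parts.append(f"\\u{cp:04X}")     # BMP code point: four hex digits
        else:
            parts.append(f"\\U{cp:08X}")     # above U+FFFF: eight hex digits
    return "".join(parts)


def from_escape(escaped: str) -> str:
    """Reverse to_escape by decoding the ASCII escape text."""
    return escaped.encode("ascii").decode("unicode_escape")


def utf8_hex(text: str) -> str:
    """Return the UTF-8 bytes of text as space-separated hex values."""
    return " ".join(f"0x{b:02X}" for b in text.encode("utf-8"))


# Samples are written as escapes so the source file itself stays ASCII.
for sample in ["\u00E9", "\u2192", "\U0001F600"]:
    print(to_escape(sample), "UTF-8:", utf8_hex(sample))
```

Note that this sketch emits Python-style `\UXXXXXXXX` for code points above U+FFFF. JavaScript would instead need a surrogate pair or the `\u{...}` form.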

FAQ

What is the difference between Unicode and UTF-8?
Unicode is the standard that assigns code points to characters. UTF-8 is one of several encodings that define how those code points are stored as bytes. UTF-8 uses 1 to 4 bytes per code point and is backward-compatible with ASCII: the first 128 code points encode to the same single bytes.
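
As a rough illustration of the split between code point and encoding, this Python sketch measures how many bytes UTF-8 needs for one sample from each length class, then encodes a single code point three different ways. The sample names and variable names are chosen here for illustration only.

```python
def utf8_length(ch: str) -> int:
    """Number of bytes UTF-8 uses for a single code point."""
    return len(ch.encode("utf-8"))


# One sample per UTF-8 length class, written as escapes to keep the source ASCII.
samples = {
    "A": "A",                   # U+0041, ASCII range
    "e-acute": "\u00E9",        # U+00E9
    "arrow": "\u2192",          # U+2192
    "grinning": "\U0001F600",   # U+1F600
}
lengths = {name: utf8_length(ch) for name, ch in samples.items()}

# ASCII compatibility: an ASCII character has identical bytes in both encodings.
ascii_compatible = "A".encode("utf-8") == "A".encode("ascii")

# The same code point (U+00E9) stored under three different encodings.
e_acute_encodings = {
    enc: "\u00E9".encode(enc) for enc in ("utf-8", "utf-16-be", "utf-32-be")
}

print(lengths)
print(ascii_compatible)
print(e_acute_encodings)
```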
Why do some emoji show as two characters in JavaScript?
JavaScript strings are sequences of UTF-16 code units. Code points above U+FFFF, which include most emoji, need two 16-bit code units called a surrogate pair. As a result, '😀'.length is 2 in JavaScript even though the emoji displays as a single character.
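
The surrogate pair arithmetic can be sketched in Python, which counts code points rather than UTF-16 code units. The function names below are hypothetical. `utf16_length` is meant to approximate what JavaScript's `length` would report for the same text.

```python
def surrogate_pair(code_point: int) -> tuple[int, int]:
    """Split a code point above U+FFFF into its UTF-16 high and low surrogates."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("only code points U+10000..U+10FFFF use surrogate pairs")
    offset = code_point - 0x10000        # leaves a 20-bit value
    high = 0xD800 + (offset >> 10)       # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)      # bottom 10 bits
    return high, low


def utf16_length(text: str) -> int:
    """Count UTF-16 code units (each code unit is 2 bytes)."""
    return len(text.encode("utf-16-le")) // 2


grinning = "\U0001F600"
high, low = surrogate_pair(ord(grinning))
print(f"\\u{high:04X}\\u{low:04X}")            # escape form of the pair
print(len(grinning), utf16_length(grinning))   # code points vs. UTF-16 code units
```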
How do I safely store emoji in a MySQL database?
Use the utf8mb4 character set in MySQL. The older utf8 charset (an alias for utf8mb3) stores at most 3 bytes per character, so it cannot hold code points above U+FFFF. That excludes most emoji, which need 4 bytes in UTF-8. Set the column and connection charset to utf8mb4.
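
To find out whether existing text would be affected, a pre-flight check along these lines may help. It is only a sketch with hypothetical function names. It does not connect to MySQL, and simply flags code points above U+FFFF, which are the ones that take 4 bytes in UTF-8.

```python
def needs_utf8mb4(text: str) -> bool:
    """True if text contains any code point that takes 4 bytes in UTF-8."""
    return any(ord(ch) > 0xFFFF for ch in text)


def four_byte_code_points(text: str) -> list[str]:
    """List the offending code points in U+ notation, in order of appearance."""
    return [f"U+{ord(ch):04X}" for ch in text if ord(ch) > 0xFFFF]


print(needs_utf8mb4("hi \U0001F600"))                 # contains U+1F600
print(needs_utf8mb4("\u2705 \u00E9"))                 # only 2- and 3-byte characters
print(four_byte_code_points("a\U0001F30Db\U0001F600"))
```

Emoji in the Basic Multilingual Plane, such as ✅ (U+2705), fit in 3 bytes and would pass this check. Migrating to utf8mb4 is still the safer default.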
