Charset Tools - Character Set & Unicode Tools and Conversions

Convert character to Unicode code point (number)

This tool shows Unicode details about any character (letter), including its code point in decimal and hexadecimal, and its HTML and URL encoding syntax.
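As a rough illustration of the kind of details the tool reports, here is a minimal Python sketch. It is not how the tool itself is implemented, and the helper name `char_details` is hypothetical. It assumes the HTML form is a numeric character reference (`&#...;`) and the URL form is UTF-8 percent-encoding, which is what `urllib.parse.quote` produces by default.

```python
from urllib.parse import quote


def char_details(ch: str) -> dict:
    """Hypothetical helper: Unicode details for a single character."""
    if len(ch) != 1:
        raise ValueError("expected exactly one character (code point)")
    cp = ord(ch)  # code point as an integer
    return {
        "decimal": cp,
        "hex": f"U+{cp:04X}",        # U+ notation, at least four hex digits
        "html_decimal": f"&#{cp};",  # HTML numeric character reference
        "html_hex": f"&#x{cp:X};",   # same reference written in hexadecimal
        "url": quote(ch),            # percent-encoded UTF-8 bytes
    }


print(char_details("€"))
# {'decimal': 8364, 'hex': 'U+20AC', 'html_decimal': '&#8364;', 'html_hex': '&#x20AC;', 'url': '%E2%82%AC'}
```

Python strings are sequences of code points, so "😀" has length 1 here, while an emoji built from several code points (for example one with a skin-tone modifier) would be rejected by the length check.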

Convert code point (number) to character
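The reverse conversion can be sketched with Python's built-in `chr()`. The helper `codepoint_to_char` and the input formats it accepts ("U+20AC", "0x20AC" or decimal "8364") are assumptions made for illustration.

```python
def codepoint_to_char(value: str) -> str:
    """Hypothetical helper: turn "U+20AC", "0x20AC" or "8364" into a character."""
    s = value.strip()
    if s[:2].upper() in ("U+", "0X"):
        cp = int(s[2:], 16)  # hexadecimal input
    else:
        cp = int(s)          # decimal input
    # Surrogates (U+D800 to U+DFFF) are reserved for UTF-16 and are not characters
    if 0xD800 <= cp <= 0xDFFF:
        raise ValueError(f"U+{cp:04X} is a surrogate, not a character")
    return chr(cp)  # chr() raises ValueError outside 0 to 0x10FFFF


print(codepoint_to_char("U+20AC"), codepoint_to_char("8364"))
```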

What is a character set?

A character set or charset is the list of characters a computer can show or use, for example letters, numbers, punctuation, symbols and emoji. The character set used in a document or application determines which characters can be displayed. ASCII and Unicode are examples of character sets.
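To illustrate how a character set limits which characters are available, this Python sketch lists the characters in a string that fall outside ASCII, whose 128 characters use code points 0 to 127. The function name `outside_ascii` is hypothetical.

```python
def outside_ascii(text: str) -> list[str]:
    """Hypothetical helper: characters in text that ASCII cannot represent."""
    # ASCII defines 128 characters, code points 0 to 127
    return [ch for ch in text if ord(ch) > 127]


print(outside_ascii("Price: 5€ 😀"))  # ['€', '😀']
```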

A character encoding is the rule that tells the computer how to turn those characters into bytes for storage or transfer. Common encodings include 7-bit ASCII, ISO-8859-1, Windows-1252, UTF-8 and UTF-16. UTF-8 is the most widely used encoding, especially on the web, and it can represent every Unicode character, so it covers text in virtually every writing system.
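The difference between a character set and an encoding shows up when the same character is turned into bytes. This Python sketch (the helper `encode_all` is hypothetical) encodes the euro sign with several of the encodings named above, using Python's codec names `ascii`, `iso-8859-1`, `cp1252`, `utf-8` and `utf-16-be`; `None` marks an encoding that has no byte sequence for it. The big-endian `utf-16-be` codec is chosen so that no byte order mark is added.

```python
from typing import Optional


def encode_all(text: str, encodings: list[str]) -> dict[str, Optional[str]]:
    """Hypothetical helper: hex bytes of text in each encoding, or None."""
    result: dict[str, Optional[str]] = {}
    for enc in encodings:
        try:
            result[enc] = text.encode(enc).hex(" ")
        except UnicodeEncodeError:
            result[enc] = None  # this encoding has no bytes for the text
    return result


for enc, hex_bytes in encode_all("€", ["ascii", "iso-8859-1", "cp1252", "utf-8", "utf-16-be"]).items():
    print(f"{enc:>10}: {hex_bytes}")
```

Windows-1252 stores "€" as the single byte 0x80, ISO-8859-1 and ASCII cannot represent it at all, and UTF-8 needs three bytes for it.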

What is Unicode?

Unicode is a widely used standard that assigns a unique number (a code point) to every character. This lets text be displayed and exchanged consistently across different devices and programs.

UTF-8 and UTF-16 are two common character encodings that implement Unicode. They differ in how they turn code points into bytes: UTF-8 uses 1 to 4 bytes per character (1 byte for ASCII characters), while UTF-16 uses 2 bytes for code points up to U+FFFF and 4 bytes (a "surrogate pair") for code points above that, such as most emoji. Despite these differences, both encodings can represent every Unicode character.
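A quick way to check these byte counts is to encode a few characters and measure the results. In this Python sketch, `encoded_sizes` is a hypothetical helper, and the `utf-16-le` codec is used so that Python does not prepend a 2-byte byte order mark.

```python
def encoded_sizes(ch: str) -> tuple[int, int]:
    """Hypothetical helper: (UTF-8 bytes, UTF-16 bytes) for one character."""
    # "utf-16-le" avoids the 2-byte byte order mark that plain "utf-16" adds
    return len(ch.encode("utf-8")), len(ch.encode("utf-16-le"))


for ch in ("A", "€", "😀"):
    utf8_len, utf16_len = encoded_sizes(ch)
    print(f"U+{ord(ch):04X}: UTF-8 {utf8_len} bytes, UTF-16 {utf16_len} bytes")

# Above U+FFFF, UTF-16 needs a surrogate pair (two 16-bit units)
print("😀".encode("utf-16-be").hex(" "))  # d8 3d de 00
```

This reports 1 and 2 bytes for "A", 3 and 2 bytes for "€", and 4 and 4 bytes for "😀".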

What is a code point?

A code point is a unique number assigned to each character in a character set. It serves as an identifier for that character.

For example, in Unicode the code point for the letter "A" is U+0041 (65 in decimal), the code point for the euro sign "€" is U+20AC (8364 in decimal), and the code point for the emoji "😀" is U+1F600 (128512 in decimal).

Code points are usually written in hexadecimal with at least four digits and prefixed with "U+" to indicate that they belong to the Unicode standard.
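The "U+" notation and the decimal values in the examples above can be reproduced with a short Python sketch; the helper name `u_notation` is hypothetical.

```python
def u_notation(ch: str) -> str:
    """Hypothetical helper: format a character's code point in U+ notation."""
    # Uppercase hex, zero-padded to at least four digits (U+0041, not U+41);
    # code points above U+FFFF simply use five or six digits.
    return f"U+{ord(ch):04X}"


for ch in ("A", "€", "😀"):
    print(ch, u_notation(ch), ord(ch))

# Going the other way: parse the hex digits after "U+" back to decimal
print(int("1F600", 16))  # 128512
```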

What is a Unicode plane?

A Unicode plane is a range of code points in the Unicode character set. The Unicode standard divides its code points into 17 planes (numbered 0 to 16), each containing 65,536 code points, for a total of 1,114,112 code points (U+0000 to U+10FFFF). The first plane (plane 0) is called the Basic Multilingual Plane (BMP) and contains the most commonly used characters. Planes are further divided into blocks, which are groups of related characters. For example, the "Emoticons" block (U+1F600 to U+1F64F, in plane 1) contains smiley-face emoji such as "😀".
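Because every plane holds exactly 65,536 (0x10000) code points, a character's plane number is its code point divided by 65,536, rounded down. A minimal Python sketch, with `plane_of` and `PLANE_SIZE` as hypothetical names:

```python
PLANE_SIZE = 0x10000  # 65,536 code points per plane


def plane_of(ch: str) -> int:
    """Hypothetical helper: Unicode plane number (0 to 16) of a character."""
    return ord(ch) // PLANE_SIZE  # same as ord(ch) >> 16


for ch in ("A", "€", "😀"):
    plane = plane_of(ch)
    label = " (Basic Multilingual Plane)" if plane == 0 else ""
    print(f"U+{ord(ch):04X} is in plane {plane}{label}")

print(17 * PLANE_SIZE)  # 1114112 code points, U+0000 to U+10FFFF
```

The emoji lands in plane 1 (the Supplementary Multilingual Plane); characters outside plane 0 are exactly the ones UTF-16 stores as 4 bytes.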

What is a font?

A font is a collection of characters that share a common design. A popular example is Arial, which includes letters, numbers, punctuation marks, and symbols. A font defines the visual appearance of these characters, such as their shape, size, and style. Each visual representation of a character in a font is called a glyph.
