
The Unicode Consortium and ISO/IEC JTC 1/SC 2/WG 2 jointly maintain the list of characters in the Universal Coded Character Set. The Universal Coded Character Set, most commonly called the Universal Character Set (abbr. UCS, official designation: ISO/IEC 10646), is an international standard that maps characters (discrete symbols used in natural language, mathematics, music, and other domains) to unique machine-readable data values. This mapping lets software from different vendors interoperate and interchange UCS-encoded text strings. Because it is a universal map, it can represent multiple languages at the same time. This avoids the confusion of using multiple legacy character encodings, where the same sequence of codes can have several interpretations depending on the character encoding in use, producing mojibake if the wrong one is chosen.
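The ambiguity of legacy encodings can be illustrated with a short sketch. It assumes Python's built-in codecs; the sample character and the two legacy encodings (Latin-1 and Windows-1251) are chosen only for illustration.

```python
# One byte sequence, three readings. Only the first is what the writer meant.
data = "\u00e9".encode("utf-8")        # U+00E9 (e with acute) becomes bytes C3 A9

as_utf8 = data.decode("utf-8")         # correct: the original single character
as_latin1 = data.decode("latin-1")     # mojibake: two Latin-1 characters
as_cp1251 = data.decode("cp1251")      # mojibake: a Cyrillic letter plus a sign

for label, text in [("utf-8", as_utf8), ("latin-1", as_latin1), ("cp1251", as_cp1251)]:
    # Print the code points so the result is readable on any terminal.
    print(label, [f"U+{ord(ch):04X}" for ch in text])
```

The bytes never change; only the assumed encoding does, which is why text must travel with an agreed encoding to be interpreted reliably.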

UCS has a potential capacity of over 1 million characters. Each UCS character is abstractly represented by a code point, an integer between 0 and 1,114,111 that represents the character within the internal logic of text processing software. The code space therefore holds 1,114,112 code points (2²⁰ + 2¹⁶ = 17 × 2¹⁶ = 0x110000). As of Unicode 15.1, released in September 2023, 293,792 (26%) of these code points are allocated, 149,878 (13%) have been assigned characters, 137,468 (12%) are reserved for private use, 2,048 are used to enable the mechanism of surrogates, and 66 are designated as noncharacters, leaving the remaining 820,320 (74%) unallocated. The number of encoded characters is made up as follows:

149,641 graphical characters (some of which do not have a visible glyph, but are still counted as graphical)
237 special purpose characters for control and formatting.
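The arithmetic behind these figures can be checked with a short sketch. The totals are the ones quoted above for Unicode 15.1. The code point ranges used for surrogates, private use, and noncharacters are not given in this article; they are taken from the Unicode Standard and are stated here as an assumption of the example.

```python
PLANES = 17
PLANE_SIZE = 2**16
CODE_SPACE = PLANES * PLANE_SIZE        # total number of code points
MAX_CODE_POINT = CODE_SPACE - 1         # 0x10FFFF

# Figures quoted in the text for Unicode 15.1.
ALLOCATED = 293_792
UNALLOCATED = 820_320
ASSIGNED = 149_878
GRAPHICAL = 149_641
SPECIAL_PURPOSE = 237

# Derived from ranges in the Unicode Standard (assumption, not from the text).
SURROGATES = 0xDFFF - 0xD800 + 1
# One private-use area in plane 0, plus planes 15 and 16 minus their
# last two code points, which are noncharacters.
PRIVATE_USE = (0xF8FF - 0xE000 + 1) + 2 * (PLANE_SIZE - 2)
# U+FDD0..U+FDEF, plus the last two code points of each of the 17 planes.
NONCHARACTERS = (0xFDEF - 0xFDD0 + 1) + 2 * PLANES


def percent(count: int) -> int:
    """Share of the whole code space, rounded to a whole percent."""
    return round(100 * count / CODE_SPACE)


print(hex(CODE_SPACE), SURROGATES, PRIVATE_USE, NONCHARACTERS)
print(percent(ALLOCATED), percent(ASSIGNED), percent(PRIVATE_USE), percent(UNALLOCATED))
```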

ISO maintains the basic mapping of characters from character name to code point. The terms character and code point are often used interchangeably. When a distinction is made, a code point refers to the integer assigned to the character: what one might think of as its address. A character in ISO/IEC 10646 is the combination of the code point and its name. Unicode adds many other useful properties to the character set, such as block, category, script, and directionality.
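The relationship between a character, its code point, and its name can be sketched with Python's standard `unicodedata` module. The helper name `describe` is hypothetical, and the results depend on the Unicode version bundled with the interpreter.

```python
import unicodedata


def describe(ch: str) -> str:
    """Format a character as 'U+XXXX NAME' (hypothetical helper)."""
    code_point = ord(ch)                 # the integer "address" of the character
    name = unicodedata.name(ch)          # the character name shared with ISO/IEC 10646
    return f"U+{code_point:04X} {name}"


print(describe("A"))

# The mapping also works in the other direction, from name to character.
same_char = unicodedata.lookup("LATIN CAPITAL LETTER A")
print(same_char == chr(0x41))
```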

In addition to the UCS, the supplementary Unicode Standard (not a joint project with ISO, but rather a publication of the Unicode Consortium) provides other implementation details, such as:

mappings between UCS and other character sets
different collations of characters and character strings for different languages
an algorithm for laying out bidirectional text ("the BiDi algorithm"), where text on the same line may shift between left-to-right ("LTR") and right-to-left ("RTL")
a case-folding algorithm
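As an illustration of the last item, the sketch below uses Python's built-in `str.casefold()`, which the Python documentation describes as following the Unicode case-folding rules. The helper `caseless_equal` is hypothetical, and it deliberately omits normalization, which a more complete caseless comparison would also need.

```python
def caseless_equal(a: str, b: str) -> bool:
    """Compare two strings after case folding (hypothetical helper)."""
    return a.casefold() == b.casefold()


word = "Stra\u00dfe"                     # contains U+00DF LATIN SMALL LETTER SHARP S

print(word.lower() == "strasse")         # lowercasing keeps the sharp s
print(caseless_equal(word, "STRASSE"))   # case folding expands it to "ss"
```

Case folding is meant for comparison only; the folded string is not intended for display.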

Computer software end users enter these characters into programs through various input methods, for example, physical keyboards or virtual character palettes.

The UCS can be divided in various ways, such as by plane, block, character category, or character property.^[1]
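Division by plane follows directly from the code point value: each plane spans 2¹⁶ code points, so the plane number is the code point shifted right by 16 bits. A minimal sketch, with the hypothetical helper name `plane_of`:

```python
def plane_of(code_point: int) -> int:
    """Return the plane (0 to 16) that contains the given code point."""
    if not 0 <= code_point <= 0x10FFFF:
        raise ValueError(f"not a UCS code point: {code_point:#x}")
    return code_point >> 16              # each plane holds 0x10000 code points


for cp in (0x0041, 0x1F600, 0x10FFFF):
    print(f"U+{cp:04X} is in plane {plane_of(cp)}")
```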

	Baseline advance	No baseline advance
Allow line-break (Separators)	Space U+0020	Zero Width Space U+200B
Inhibit line-break (Joiners)	No-Break Space U+00A0	Word Joiner U+2060
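The four characters in this table can be inspected with a short sketch. Python's standard library does not implement the Unicode line-breaking algorithm, so the wrapping step is only an approximation: `textwrap` breaks at ASCII whitespace only, which happens to leave a no-break space intact. The helper names are hypothetical.

```python
import textwrap
import unicodedata

TABLE_CHARACTERS = ["\u0020", "\u200b", "\u00a0", "\u2060"]


def summarize(ch: str) -> tuple:
    """Return (name, general category) for a character."""
    return unicodedata.name(ch), unicodedata.category(ch)


def wrap_words(text: str, width: int) -> list:
    """Wrap at ASCII whitespace only, never splitting inside a word."""
    return textwrap.wrap(text, width=width, break_long_words=False)


for ch in TABLE_CHARACTERS:
    name, category = summarize(ch)
    # "Zs" marks a space separator; "Cf" marks an invisible format character.
    print(f"U+{ord(ch):04X} {name} ({category})")

print(wrap_words("10 km away", 4))           # ordinary space: break allowed
print(wrap_words("10\u00a0km away", 4))      # no-break space: "10 km" stays together
```

The two characters with a baseline advance are space separators, while the two without are format characters, which matches the columns of the table.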

Property	Example	Details
Name	LATIN CAPITAL LETTER A	This is a permanent name assigned by the joint cooperation of Unicode and the ISO UCS. A few known poorly chosen names exist and are acknowledged (e.g. U+FE18 PRESENTATION FORM FOR VERTICAL RIGHT WHITE LENTICULAR BRAKCET, which is misspelled – should be BRACKET) but will not be changed, in order to ensure specification stability.^[16]
Code Point	U+0041	The Unicode code point is a number permanently assigned along with the "Name" property and included in the companion UCS. By convention, the code point is written as a hexadecimal number with the prefix "U+".
Representative Glyph	^[17]	The representative glyphs are provided in code charts.^[18]
General Category	Uppercase_Letter	The general category^[19] is expressed as a two-letter sequence such as "Lu" for uppercase letter or "Nd" for decimal digit number.
Combining Class	Not_Reordered (0)	Since diacritics and other combining marks can be expressed with multiple characters in Unicode, the "Combining Class" property allows characters to be differentiated by the type of combining character they represent. The combining class can be expressed as an integer between 0 and 255 or as a named value. The integer values allow the combining marks to be reordered into a canonical order, so that equivalent strings compare as equal.
Bidirectional Category	Left_To_Right	Indicates the type of character for applying the Unicode bidirectional algorithm.
Bidirectional Mirrored	no	Indicates the character's glyph must be reversed or mirrored within the bidirectional algorithm. Mirrored glyphs can be provided by font makers, extracted from other characters related through the "Bidirectional Mirroring Glyph" property or synthesized by the text rendering system.
Bidirectional Mirroring Glyph	N/A	This property indicates the code point of another character whose glyph can serve as the mirrored glyph for the present character when mirroring within the bidirectional algorithm.
Decimal Digit Value	NaN	For numerals, these three properties indicate the numeric value of the character. Decimal digits have all three properties set to the same value. Presentational rich text compatibility characters and other Arabic-Indic non-decimal digits typically have only the latter two properties set to the numeric value of the character, while numerals unrelated to Arabic-Indic digits, such as Roman numerals or Hangzhou/Suzhou numerals, typically have only the "Numeric Value" indicated.
Digit Value	NaN
Numeric Value	NaN
Ideographic	False	Indicates the character is a CJK ideograph: a logograph in the Han script.^[20]
Default Ignorable	False	Indicates the character is ignorable for implementations and that no glyph, last resort glyph, or replacement character need be displayed.
Deprecated	False	Unicode never removes characters from the repertoire, but on occasion Unicode has deprecated a small number of characters.
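Several of the properties in this table can be queried with Python's standard `unicodedata` module, as in the sketch below. The function `properties` and its dictionary keys are hypothetical names chosen to mirror the table rows; the values returned depend on the Unicode version bundled with the interpreter, and the module does not expose every property listed.

```python
import unicodedata


def properties(ch: str) -> dict:
    """Collect a subset of Unicode properties for one character."""
    return {
        "name": unicodedata.name(ch, None),
        "code_point": f"U+{ord(ch):04X}",
        "general_category": unicodedata.category(ch),
        "combining_class": unicodedata.combining(ch),
        "bidi_category": unicodedata.bidirectional(ch),
        "bidi_mirrored": bool(unicodedata.mirrored(ch)),
        # The three numeric properties; None means "not set".
        "decimal": unicodedata.decimal(ch, None),
        "digit": unicodedata.digit(ch, None),
        "numeric": unicodedata.numeric(ch, None),
    }


# A letter, a decimal digit, a superscript digit, a Roman numeral,
# a mirrored bracket, and a combining mark.
for sample in ("A", "5", "\u00b2", "\u2167", "(", "\u0301"):
    print(properties(sample))

# The misspelled name mentioned in the table is kept verbatim for stability.
print(unicodedata.name("\ufe18"))
```

The samples show the pattern described for the numeric properties: the decimal digit has all three set, the superscript digit only the latter two, and the Roman numeral only the numeric value.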
