A CCSID (coded character set identifier) is a 16-bit number that represents a particular encoding of a specific code page. For example, Unicode is a code page with several character encoding schemes (referred to as "transformation forms"), including UTF-8, UTF-16 and UTF-32. Any one of these may or may not be accompanied by a CCSID number indicating that it is the encoding in use.
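Because a CCSID is just a number standing for an encoding, software that receives one typically looks it up to find a decoder. The following is a minimal Python sketch of that idea. The table `CCSID_TO_CODEC` and the function `decode_with_ccsid` are hypothetical, cover only three well-known CCSIDs (37 for US EBCDIC, 1200 for big-endian UTF-16, 1208 for UTF-8), and assume that Python's built-in codecs are close enough to IBM's definitions for illustration; they are not guaranteed to match byte for byte.

```python
# Hypothetical, deliberately partial lookup table from CCSID numbers to
# Python codec names. The Python codecs only approximate IBM's definitions.
CCSID_TO_CODEC: dict[int, str] = {
    37: "cp037",        # EBCDIC, US/Canada
    1200: "utf_16_be",  # UTF-16, big-endian
    1208: "utf_8",      # UTF-8
}


def decode_with_ccsid(data: bytes, ccsid: int) -> str:
    """Decode bytes using the encoding that the given CCSID identifies."""
    try:
        codec_name = CCSID_TO_CODEC[ccsid]
    except KeyError:
        raise ValueError(f"no codec mapping for CCSID {ccsid}") from None
    return data.decode(codec_name)


# The same text has different bytes depending on which CCSID applies.
print(decode_with_ccsid(b"\xc8\x85\x93\x93\x96", 37))  # EBCDIC bytes
print(decode_with_ccsid(b"Hello", 1208))               # UTF-8 bytes
```

Both calls print `Hello`: without the CCSID, the receiver could not tell which of the two byte sequences it should expect.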

Difference between a code page and a CCSID

The terms code page and CCSID are often used interchangeably, even though they are not synonymous. A code page may be only part of what makes up a CCSID. The following definitions from IBM help to illustrate this point:

Examples

The following examples show how some CCSIDs are made up of other CCSIDs.

CCSID 932[4]
Character set | Code page | CCSID | Encoding scheme
01122         | 00897     | 897   | SBCS
00370         | 00301     | 301   | DBCS

CCSID 942[5]
Character set | Code page | CCSID | Encoding scheme
01172         | 01041     | 1041  | SBCS
00370         | 00301     | 301   | DBCS

CCSID 5028[6]
Character set | Code page | CCSID | Encoding scheme
01170         | 00897     | 4993  | SBCS
00370         | 00301     | 301   | DBCS

All three of these Shift-JIS variant CCSIDs are multi-byte character sets (MBCS) made up of a single-byte character set (SBCS) part and a double-byte character set (DBCS) part. The DBCS part is the same in all three: CCSID 301 (code page 00301). The SBCS part differs in each. CCSID 932 uses the original code page 897, which is CCSID 897. CCSID 5028 uses an updated version of code page 897, identified as CCSID 4993, which pairs the same code page with a different character set (01170 instead of 01122). CCSID 942 uses a different SBCS altogether, CCSID 1041.
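One way to see which parts these CCSIDs share is to model each one as a pair of SBCS and DBCS components and compare them field by field. The Python sketch below does this with the values from the tables above; the `Component` class, the `MIXED_CCSIDS` table and the `compare` function are hypothetical names chosen for illustration, and the leading zeros of the IBM identifiers are dropped because Python does not allow them in integer literals.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Component:
    """One SBCS or DBCS part of a mixed CCSID (hypothetical model)."""
    character_set: int
    code_page: int
    ccsid: int
    scheme: str  # "SBCS" or "DBCS"


# (SBCS part, DBCS part) for each Shift-JIS variant, from the tables above.
MIXED_CCSIDS: dict[int, tuple[Component, Component]] = {
    932: (Component(1122, 897, 897, "SBCS"), Component(370, 301, 301, "DBCS")),
    942: (Component(1172, 1041, 1041, "SBCS"), Component(370, 301, 301, "DBCS")),
    5028: (Component(1170, 897, 4993, "SBCS"), Component(370, 301, 301, "DBCS")),
}


def compare(a: int, b: int) -> dict[str, bool]:
    """Report which components two mixed CCSIDs have in common."""
    sbcs_a, dbcs_a = MIXED_CCSIDS[a]
    sbcs_b, dbcs_b = MIXED_CCSIDS[b]
    return {
        "same_dbcs": dbcs_a == dbcs_b,  # dataclass equality compares all fields
        "same_sbcs_ccsid": sbcs_a.ccsid == sbcs_b.ccsid,
        "same_sbcs_code_page": sbcs_a.code_page == sbcs_b.code_page,
    }


print(compare(932, 5028))
print(compare(932, 942))
```

Comparing 932 with 5028 reports the same DBCS and the same SBCS code page but a different SBCS CCSID; comparing 932 with 942 reports only the DBCS as shared.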

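Note also that CCSIDs 5028 and 4993 each differ by 4096 (1000 in hexadecimal) from their predecessors with the same code page identifier, CCSIDs 932 and 897 respectively. Adding 4096 in this way is a common convention in CDRA (Character Data Representation Architecture) for denoting an updated CCSID.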
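The 4096 offset is easiest to see in hexadecimal, where it only changes the leading digit. The short Python sketch below checks the two pairs from the examples; the function `is_offset_variant` is a hypothetical helper that assumes the offset is exactly 0x1000, as in these examples, and is not a general rule for identifying related CCSIDs.

```python
VARIANT_OFFSET = 0x1000  # 4096, the offset seen in the examples above


def is_offset_variant(new: int, old: int) -> bool:
    """True if `new` equals `old` plus exactly 0x1000 (hypothetical check)."""
    return new - old == VARIANT_OFFSET


for old, new in [(897, 4993), (932, 5028)]:
    # Zero-padded hex makes the shared low digits easy to compare.
    print(f"{old} = {old:#06x}, {new} = {new:#06x}, diff = {new - old}")
# 897 = 0x0381, 4993 = 0x1381, diff = 4096
# 932 = 0x03a4, 5028 = 0x13a4, diff = 4096
```

In both pairs the last three hexadecimal digits are unchanged and only the leading digit moves from 0 to 1.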

There are a few reasons for this complexity:

References

  1. "IBM Terminology—Terms C". IBM. Retrieved 2013-01-25.
  2. "Character Data Representation Architecture". IBM. Appendix A. Encoding Schemes. Retrieved 2019-06-29.
  3. "Character Data Representation Architecture". IBM. Chapter 3. CDRA Identifiers, section "Long-Form Identification". Retrieved 2019-06-29.
  4. "Japanese PC Data Mixed including 1880 UDC". Globalization. IBM. Archived from the original on 2012-02-20. Retrieved 2011-11-29.
  5. "Japanese PC Data Mixed including 1880 UDC, Extended SBCS". Globalization. IBM. Archived from the original on 2014-12-01. Retrieved 2011-11-29.
  6. "Japanese PC Data Mixed including 1880 UDC (Katakana - PC common set for SBCS)". Globalization. IBM. Archived from the original on 2014-11-29. Retrieved 2011-11-29.
  7. "Us-en_software_HP". 2020-11-09.