DECwindows Motif supports two Simplified Chinese codesets: DEC Hanzi and GB18030.
The ASCII, GB2312-80 and extended GB character sets are combined to form the DEC Hanzi codeset.
DEC Hanzi (Simplified Chinese, denoted as dechanzi) uses a 2-byte representation for the symbols and ideographic characters defined in the GB2312-80 character set and for extended GB characters. To distinguish these 2-byte codes from 1-byte ASCII codes, the most significant bit (MSB) of the first byte is always set. The MSB of the second byte is set for GB2312-80 and clear for extended GB, as shown in Figure 2-1.
Figure 2-1: MSB of each byte in the DEC Hanzi codeset
Character Set | First Byte MSB | Second Byte MSB |
---|---|---|
ASCII | 0 | (no second byte) |
GB2312-80 | 1 | 1 |
Extended GB | 1 | 0 |
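Because the MSB settings alone identify each unit, a DEC Hanzi byte string can be split into characters without a lookup table. The following Python sketch is illustrative only: `split_dec_hanzi` is a hypothetical helper, not part of any DECwindows Motif API, and it assumes well-formed input, checking only MSBs and not row or column ranges.

```python
def split_dec_hanzi(data: bytes) -> list[tuple[str, bytes]]:
    """Split a DEC Hanzi byte string into (character set, bytes) units.

    Hypothetical helper. Uses only the MSB rule:
    - first byte MSB clear -> 1-byte ASCII code
    - first byte MSB set   -> 2-byte code; the second byte's MSB selects
      GB2312-80 (set) or extended GB (clear)
    Row and column ranges are not validated.
    """
    units = []
    i = 0
    while i < len(data):
        first = data[i]
        if not (first & 0x80):
            units.append(("ASCII", data[i:i + 1]))
            i += 1
            continue
        if i + 1 >= len(data):
            raise ValueError(f"truncated 2-byte code at offset {i}")
        second = data[i + 1]
        charset = "GB2312-80" if second & 0x80 else "extended GB"
        units.append((charset, data[i:i + 2]))
        i += 2
    return units


# "A", then GB2312-80 code B0A1, then extended GB code B021.
print(split_dec_hanzi(b"A\xb0\xa1\xb0\x21"))
```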
The first byte of a 2-byte code determines its row number, while the second byte determines its column number.
The following formulas give the code of a GB2312-80 character or an extended GB character in terms of its row and column numbers. A0 and 20 are hexadecimal values; the row and column numbers are decimal.
GB2312-80 character:
First byte = A0 + row number
Second byte = A0 + column number
Extended GB character:
First byte = A0 + row number
Second byte = 20 + column number
For example, if a character is positioned at the first column of the 16th row on the GB2312-80 code plane, its encoding value is calculated as follows:
First byte = A0 (hex) + 16 = B0 (hex)
Second byte = A0 (hex) + 01 = A1 (hex)
The resulting encoded value is B0A1.
Similarly, if a character is positioned at the first column of the 16th row on the extended GB code plane, its encoding value is calculated as follows:
First byte = A0 (hex) + 16 = B0 (hex)
Second byte = 20 (hex) + 01 = 21 (hex)
The resulting encoded value is B021.
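The formulas and both worked examples can be checked with a short Python sketch. `encode_dec_hanzi` and `decode_dec_hanzi` are hypothetical helper names, and the 1 to 94 bound on row and column numbers is an assumption based on the 94 x 94 layout of GB2312-80 rather than something stated in this section.

```python
FIRST_BYTE_OFFSET = 0xA0       # first byte = A0 (hex) + row number
GB2312_SECOND_OFFSET = 0xA0    # GB2312-80: second byte = A0 (hex) + column number
EXTENDED_SECOND_OFFSET = 0x20  # extended GB: second byte = 20 (hex) + column number


def encode_dec_hanzi(row: int, column: int, extended: bool = False) -> bytes:
    """Return the 2-byte DEC Hanzi code for a decimal (row, column) position.

    The 1..94 bound is an assumption (94 x 94 layout of GB2312-80).
    """
    if not (1 <= row <= 94 and 1 <= column <= 94):
        raise ValueError("row and column must be between 1 and 94")
    second_offset = EXTENDED_SECOND_OFFSET if extended else GB2312_SECOND_OFFSET
    return bytes([FIRST_BYTE_OFFSET + row, second_offset + column])


def decode_dec_hanzi(code: bytes) -> tuple[int, int, bool]:
    """Return (row, column, extended) for a 2-byte DEC Hanzi code."""
    first, second = code
    extended = not (second & 0x80)  # MSB clear in the second byte -> extended GB
    second_offset = EXTENDED_SECOND_OFFSET if extended else GB2312_SECOND_OFFSET
    return first - FIRST_BYTE_OFFSET, second - second_offset, extended


print(encode_dec_hanzi(16, 1).hex().upper())                 # B0A1
print(encode_dec_hanzi(16, 1, extended=True).hex().upper())  # B021
```

The decoder reuses the MSB rule from Figure 2-1 to decide which second-byte offset to subtract.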
Figure 2-2 illustrates the division of a 2-byte code space and the position of the Chinese character sets.
Figure 2-2: Position of the Chinese character sets in the 2-byte code space
Code Plane | First Byte (hex) | Second Byte (hex) |
---|---|---|
Extended GB | A0 to FF | 20 to 7F |
GB2312-80 | A0 to FF | A0 to FF |
The GB18030 codeset provides 1-byte, 2-byte, and 4-byte encoding with the following structure:
Number of Bytes | Byte Ranges | Code Points |
---|---|---|
1 byte | 0x00 to 0x7F | 128 |
2 bytes | Byte 1: 0x81 to 0xFE; byte 2: 0x40 to 0xFE (except 0x7F) | 23,940 |
4 bytes | Byte 1: 0x81 to 0xFE; byte 2: 0x30 to 0x39; byte 3: 0x81 to 0xFE; byte 4: 0x30 to 0x39 | 1,587,600 |
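The byte ranges in the table are enough to tell how long a GB18030 sequence is from its first two bytes, and multiplying the number of allowed values per byte reproduces the code point counts. The Python sketch below uses hypothetical function names and checks byte structure only, not whether a code point is actually assigned.

```python
LEAD_MIN, LEAD_MAX = 0x81, 0xFE    # byte 1 of 2-byte and 4-byte codes, byte 3 of 4-byte codes
DIGIT_MIN, DIGIT_MAX = 0x30, 0x39  # bytes 2 and 4 of 4-byte codes


def gb18030_code_point_counts() -> dict[int, int]:
    """Count code points per sequence length from the table's byte ranges."""
    lead = LEAD_MAX - LEAD_MIN + 1     # 126 values
    trail = (0xFE - 0x40 + 1) - 1      # 191 values minus 0x7F -> 190
    digit = DIGIT_MAX - DIGIT_MIN + 1  # 10 values
    return {1: 0x7F + 1, 2: lead * trail, 4: lead * digit * lead * digit}


def gb18030_sequence_length(data: bytes, i: int = 0) -> int:
    """Return 1, 2, or 4: the length of the GB18030 sequence starting at data[i].

    Raises ValueError for bytes outside the table's ranges or truncated input.
    """
    b1 = data[i]
    if b1 <= 0x7F:
        return 1
    if not LEAD_MIN <= b1 <= LEAD_MAX:
        raise ValueError(f"invalid first byte 0x{b1:02X} at offset {i}")
    if i + 1 >= len(data):
        raise ValueError(f"truncated sequence at offset {i}")
    b2 = data[i + 1]
    if DIGIT_MIN <= b2 <= DIGIT_MAX:
        # A digit (0x30 to 0x39) in byte 2 marks a 4-byte sequence.
        if i + 3 >= len(data):
            raise ValueError(f"truncated 4-byte sequence at offset {i}")
        b3, b4 = data[i + 2], data[i + 3]
        if LEAD_MIN <= b3 <= LEAD_MAX and DIGIT_MIN <= b4 <= DIGIT_MAX:
            return 4
        raise ValueError(f"malformed 4-byte sequence at offset {i}")
    if 0x40 <= b2 <= 0xFE and b2 != 0x7F:
        return 2
    raise ValueError(f"invalid second byte 0x{b2:02X} at offset {i}")


print(gb18030_code_point_counts())  # {1: 128, 2: 23940, 4: 1587600}
```

The second byte is what separates the two multibyte forms: 0x30 to 0x39 never appears as the second byte of a 2-byte code, so a decoder can commit to a length after reading two bytes.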
GB18030 1-byte code supports ASCII characters.
GB18030 2-byte code supports all the CJK (Chinese, Japanese, and Korean) characters in the Unicode Standard, Version 2.1.
GB18030 4-byte code supports Unicode Version 3.0 additions. The 4-byte code also leaves a large number of unassigned code points available for future use.
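Python's standard library includes a `gb18030` codec that can be used to observe the three code lengths. This is an exploratory sketch, not part of DECwindows Motif; the sample characters are arbitrary choices: `A` (ASCII), U+4E2D (a CJK ideograph already present in Unicode 2.1), and U+3400 (the first character of CJK Unified Ideographs Extension A, added in Unicode 3.0).

```python
samples = ["A", "\u4e2d", "\u3400"]

for ch in samples:
    encoded = ch.encode("gb18030")
    # Round-trip check: the codec should decode its own output back to ch.
    assert encoded.decode("gb18030") == ch
    print(f"U+{ord(ch):04X}: {encoded.hex().upper()}, {len(encoded)} byte(s)")
```

U+4E2D encodes as D6D0, which matches the GB2312-80 formula with row 54 and column 48 (A0 + 54 = D6 and A0 + 48 = D0, in hex).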