VARIABLE len1 mask1 len2 mask2 len3 mask3 len4 mask4 mask
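As an illustration of the field order, the nine arguments can be read positionally. The following Python sketch is not how the system's locale tools read the line; the function name parse_variable is hypothetical, and the sample line is the ja_JP.eucJP one quoted in this page.

```python
def parse_variable(line):
    """Split a VARIABLE line into (lens, masks, mask).

    Hypothetical helper for illustration only.
    lens  = (len1, len2, len3, len4)
    masks = (mask1, mask2, mask3, mask4)
    mask  = the trailing global mask
    """
    fields = line.split()
    if len(fields) != 10 or fields[0] != "VARIABLE":
        raise ValueError("expected VARIABLE followed by nine arguments")
    # int(x, 0) accepts plain decimal and 0x-prefixed hexadecimal.
    nums = [int(f, 0) for f in fields[1:]]
    lens = tuple(nums[0:8:2])   # positions 0, 2, 4, 6
    masks = tuple(nums[1:8:2])  # positions 1, 3, 5, 7
    mask = nums[8]
    return lens, masks, mask


lens, masks, mask = parse_variable(
    "VARIABLE 1 0x0000 2 0x8080 2 0x0080 3 0x8000 0x8080"
)
```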
The wchar_t encoding of EUC multibyte characters is dependent on the len and mask arguments. First, the bytes are moved into a wchar_t as follows:
byte[0] << ((lenN-1) * 8) | byte[1] << ((lenN-2) * 8) | ... | byte[lenN-1]
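The result is then ANDed with ~mask and ORed with maskN. The two codesets introduced by a leading single-shift byte (0x8e for len3/mask3, 0x8f for len4/mask4) are special in that the leading byte is first removed and the lenN argument is reduced by 1. In the traditional EUC numbering, which counts from 0, these are codesets 2 and 3.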
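The steps above can be sketched in Python. This is a minimal illustration, not the C library's implementation. The function name euc_to_wchar is hypothetical, and the rule used to pick the codeset from the leading byte (high bit clear, high bit set, 0x8e, 0x8f) is the usual EUC convention, assumed here because this page does not spell it out. The lens, masks and mask values are those of the ja_JP.eucJP locale.

```python
SS2, SS3 = 0x8E, 0x8F  # single-shift bytes that introduce the last two codesets


def euc_to_wchar(data, lens, masks, mask):
    """Return the wchar_t value (as an int) for one EUC multibyte character.

    data  : bytes holding exactly one character
    lens  : (len1, len2, len3, len4)
    masks : (mask1, mask2, mask3, mask4)
    mask  : global mask
    """
    lead = data[0]
    # Assumption: the codeset is selected by the leading byte.
    if lead == SS2:
        n = 2              # len3 / mask3
    elif lead == SS3:
        n = 3              # len4 / mask4
    elif lead & 0x80:
        n = 1              # len2 / mask2
    else:
        n = 0              # len1 / mask1
    if len(data) != lens[n]:
        raise ValueError("wrong byte count for this codeset")
    # For the single-shift codesets the leading byte is removed first,
    # which also shortens the length by 1.
    body = data[1:] if n >= 2 else data
    wc = 0
    for b in body:
        wc = (wc << 8) | b     # byte[0] ends up in the highest position
    return (wc & ~mask) | masks[n]


EUCJP_LENS = (1, 2, 2, 3)
EUCJP_MASKS = (0x0000, 0x8080, 0x0080, 0x8000)
EUCJP_MASK = 0x8080
```

For instance, the two-byte sequence 0xa4 0xa2 keeps both high bits, while the three-byte sequence 0x8f 0xb0 0xa1 loses its leading byte and ends up with only the 0x8000 bit of the global mask set.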
For example, the ja_JP.eucJP locale has the following VARIABLE line:
VARIABLE 1 0x0000 2 0x8080 2 0x0080 3 0x8000 0x8080
Codeset 1 consists of the values 0x0000 - 0x007f.
Codeset 2 consists of the values that have both bits of 0x8080 set.
Codeset 3 consists of the values 0x0080 - 0x00ff.
Codeset 4 consists of the values 0x8000 - 0xff7f, excluding the values that have the 0x0080 bit set.
Notice that the global mask is set to 0x8080; this implies that the codeset can be determined from those 2 bits.
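A short Python sketch of that observation: masking a wchar_t value with the global mask and comparing the result against each maskN recovers the codeset. The function name codeset_of is hypothetical, the codesets are numbered 1 to 4 as in the list above, and the constants are the ja_JP.eucJP values.

```python
EUCJP_MASKS = (0x0000, 0x8080, 0x0080, 0x8000)  # mask1 .. mask4
EUCJP_MASK = 0x8080                             # global mask


def codeset_of(wc, masks=EUCJP_MASKS, mask=EUCJP_MASK):
    """Return the codeset number (1-4) of a wchar_t value, or None."""
    tag = wc & mask            # keep only the bits named by the global mask
    for number, m in enumerate(masks, start=1):
        if tag == m:
            return number
    return None
```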
EUC (5) | September 9, 2019 |