JIS X 0208
JIS X 0208 is a 2-byte character set specified as a Japanese Industrial Standard, containing 6879 graphic characters suitable for writing text, place names, personal names, and so forth in the Japanese language. The official title of the current standard is 7-bit and 8-bit double byte coded KANJI sets for information interchange. It was originally established as JIS C 6226 in 1978 and was revised in 1983, 1990, and 1997. IBM calls it Code page 952, and calls the 1978 version Code page 955.
Scope of use and compatibility
The character set that JIS X 0208 establishes is intended primarily for information interchange between data processing systems and the devices connected to them, or between data communication systems. It can be used for data processing and text processing.
Partial implementations of the character set are not considered compatible. The drafting committee of the first standard took care to separate characters between level 1 and level 2, and the second standard shuffled some variant characters between the levels. From this it is conjectured that, at least at the time of the first and second standards, Japanese computer systems implementing only the non-kanji characters and level 1 were at one time considered for development. However, such implementations have never been specified as compatible, though examples such as the early NEC PC-9801 did exist.
Even though JIS X 0208:1997 contains provisions concerning compatibility, it is at present generally considered that the standard neither certifies compatibility nor serves as an official manufacturing standard amounting to a declaration of self-compatibility. Consequently, JIS X 0208-"compatible" products are de facto not considered to exist. Terms such as "conformant" and "support" appear in JIS X 0208, but their semantics vary from person to person.
Code charts
Lead byte
The first encoding byte (the lead byte) corresponds to the row number plus 0x20, or 32 in decimal. Hence, the code set starting with 0x21 has a row number of 1, and its cell 1 has a continuation byte of 0x21, and so forth.
For lead bytes used for characters other than kanji, links are provided to charts on this page listing the characters encoded under that lead byte. For lead bytes used for kanji, links are provided to the appropriate section of Wiktionary's kanji index.
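The offset described above can be sketched in a few lines of Python. This is only an illustration of the arithmetic for the 7-bit form; the function name `jis7_to_kuten` is hypothetical and not taken from the standard.

```python
def jis7_to_kuten(lead: int, trail: int) -> tuple[int, int]:
    """Convert a 7-bit JIS X 0208 byte pair to (row, cell).

    Hypothetical helper: both bytes are assumed to lie in 0x21-0x7E,
    and each number is simply the byte value minus 0x20.
    """
    for name, value in (("lead", lead), ("trail", trail)):
        if not 0x21 <= value <= 0x7E:
            raise ValueError(f"{name} byte {value:#04x} is outside 0x21-0x7E")
    return lead - 0x20, trail - 0x20


print(jis7_to_kuten(0x21, 0x21))  # (1, 1)
print(jis7_to_kuten(0x24, 0x22))  # (4, 2)
```

The function only undoes the offset; it does not check whether a character is actually assigned to the resulting row and cell.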
Non-Kanji rows
Character set 0x21 (row number 1, special characters)
Some vendors use slightly different Unicode mapping for this set than the one below. For example, Microsoft maps kuten 1-29 to U+2015, whereas Apple maps it to U+2014. Similarly, Microsoft maps kuten 1-61 to U+FF0D, and Apple maps it to U+2212. Unicode mapping of the wave dash also differs between vendors. See the cells with footnotes below.
ASCII and JISCII punctuation may use alternative mappings to the Halfwidth and Fullwidth Forms block if used in an encoding which combines JIS X 0208 with ASCII or with JIS X 0201, such as Shift JIS, EUC-JP or ISO 2022-JP.
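As a rough illustration of why such differences matter to a converter, the two vendor-specific cells named above can be held in a small per-vendor table and compared. The table layout and the names `VENDOR_MAPPINGS` and `differing_cells` are assumptions made for this sketch; only the four code points come from the text, and the wave dash is left out because its mappings are not given here.

```python
# Kuten (row, cell) -> Unicode code point, per vendor.
# Only the two cells named in the text are included.
VENDOR_MAPPINGS = {
    "microsoft": {(1, 29): 0x2015, (1, 61): 0xFF0D},
    "apple": {(1, 29): 0x2014, (1, 61): 0x2212},
}


def differing_cells(a: str, b: str) -> list[tuple[int, int]]:
    """Return the cells that vendors a and b map to different code points."""
    table_a, table_b = VENDOR_MAPPINGS[a], VENDOR_MAPPINGS[b]
    shared = table_a.keys() & table_b.keys()
    return sorted(k for k in shared if table_a[k] != table_b[k])


for row, cell in differing_cells("microsoft", "apple"):
    ms = VENDOR_MAPPINGS["microsoft"][(row, cell)]
    ap = VENDOR_MAPPINGS["apple"][(row, cell)]
    print(f"{row}-{cell:02d}: Microsoft U+{ms:04X}, Apple U+{ap:04X}")
```

Text converted to Unicode with one vendor's table and back with the other's will not round-trip for these cells.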
Character set 0x22 (row number 2, special characters)
Most of the characters in this set were added in 1983, except for characters 0x2221-0x222E, which were included in the original 1978 version of the standard.
Character set 0x23 (row number 3, digits and Roman)
This set includes a subset of the ISO 646 invariant set, minus punctuation and symbols, comprising western Arabic numerals and both cases of the Basic Latin alphabet. Characters in this set may use alternative Unicode mappings to the Halfwidth and Fullwidth Forms block if used in an encoding which combines JIS X 0208 with ASCII or with JIS X 0201, such as EUC-JP, Shift JIS or ISO 2022-JP.
Compare row 3 of KPS 9566, which this row exactly matches. Compare and contrast row 3 of KS X 1001 and of GB 2312, which include their entire national variants of ISO 646 in this row, rather than only the alphanumeric subset.
Character set 0x24 (row number 4, Hiragana)
This row contains Japanese Hiragana.
Compare row 4 of GB 2312, which matches this row. Compare and contrast row 10 of KPS 9566 and of KS X 1001, which use the same layout, but in a different row.
Character set 0x25 (row number 5, Katakana)
This row contains Japanese Katakana.
Compare row 5 of GB 2312, which matches this row. Compare and contrast row 11 of KPS 9566 and of KS X 1001, which use the same layout, but in a different row. Contrast the considerably different Katakana layout used by JIS X 0201.
Character set 0x26 (row number 6, Greek)
This row contains basic support for the modern Greek alphabet, without diacritics or the final sigma.
Compare row 6 of GB 2312 and GB 12345 and row 6 of KPS 9566, which include the same Greek letters in the same layout, although GB 12345 adds vertical presentation forms and KPS 9566 adds Roman numerals. Compare and contrast row 5 of KS X 1001, which offsets the Greek letters to include the Roman numerals first.
Character set 0x27 (row number 7, Cyrillic)
This row contains the modern Russian alphabet and is not necessarily sufficient for representing other forms of the Cyrillic script.
Compare row 7 of GB 2312, which matches this row. Compare and contrast row 12 of KS X 1001 and row 5 of KPS 9566, which use the same layout.
Character set 0x28 (row number 8, box drawing)
All characters in this set were added in 1983, and were not present in the original 1978 revision of the standard.
Extension character set 0x2D (row number 13, NEC special characters)
Rows 9 through 15 of the JIS X 0208 standard are left empty. However, the following layout for row 13, first introduced by NEC, is a common extension. It is used by Windows-932, by the PostScript variant of MacJapanese, and by JIS X 0213. Unlike the other extensions made by Windows-932/WHATWG and by JIS X 0213, the two match in this row rather than colliding, so decoding of most of this row is better supported than the other extensions made by JIS X 0213.
Kanji rows
Code structure
In order to represent code points, column/line numbers are used for one-byte codes and kuten numbers are used for two-byte codes. To identify a character independently of any particular code, character names are used.
Single byte codes
Almost all JIS X 0208 graphic character codes are represented with two bytes of at least seven bits each. However, every control character, as well as the plain space (although not the ideographic space), is represented with a one-byte code. In order to represent the bit combination of a one-byte code, two decimal numbers, a column number and a line number, are used. Three high-order bits out of seven or four high-order bits out of eight, counting from zero to seven or from zero to fifteen respectively, form the column number. Four low-order bits counting from zero to fifteen form the line number. Each decimal number corresponds to one hexadecimal digit. For example, the bit combination corresponding to the graphic character "space" is 010 0000 as a 7-bit number, and 0010 0000 as an 8-bit number. In column/line notation, this is represented as 2/0. Other representations of the same single-byte code include 0x20 as hexadecimal, or 32 as a single decimal number.
Code points and code numbers
The double-byte codes are laid out in 94 numbered groups, each called a row. Every row contains 94 numbered codes, each called a cell. This makes a total of 8836 possible code points; these are laid out in the standard in a 94-line, 94-column code table.
A row number and a cell number form a kuten point, which is used to represent double-byte code points. A code number or kuten number is expressed in the form "row-cell", the row and cell numbers being separated by a hyphen. For example, the character "亜" has a code point at row 16, cell 1, so its code number is represented as "16-01".
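The two notations introduced so far, column/line for single-byte codes and kuten for double-byte codes, can be sketched as small formatting helpers. The function names are hypothetical; the padding choice (cell padded to two digits, row left unpadded) simply follows the examples in this article and is not claimed to be prescribed by the standard.

```python
def column_line(byte: int) -> str:
    """Format a single-byte code in column/line notation, e.g. 0x20 -> '2/0'."""
    if not 0 <= byte <= 0xFF:
        raise ValueError("expected a value that fits in one byte")
    # High-order bits form the column, the four low-order bits the line.
    return f"{byte >> 4}/{byte & 0x0F}"


def kuten(row: int, cell: int) -> str:
    """Format a double-byte code point as a 'row-cell' code number."""
    if not (1 <= row <= 94 and 1 <= cell <= 94):
        raise ValueError("row and cell must both be in 1-94")
    return f"{row}-{cell:02d}"


print(column_line(0x20))  # 2/0
print(kuten(16, 1))       # 16-01
print(94 * 94)            # 8836
```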
In 7-bit JIS X 0208, both bytes must be from the 94-byte range of 0x21 through 0x7E, exactly corresponding to the range used for 7-bit ASCII printing characters, not counting the space. Accordingly, the encoded bytes are obtained by adding 0x20 to each number. For instance, the above example of 16-01 would be represented by the bytes 0x30 0x21. The 8-bit EUC-JP instead uses the range 0xA1 through 0xFE, whereas other encodings such as Shift JIS use more complicated transforms. Shift JIS includes more encoding space than is needed for JIS X 0208 itself; some Shift JIS specific extensions to JIS X 0208 make use of row numbers above 94.
This structure is also used in the Mainland Chinese GB 2312, where it is natively known as 区位, and the South Korean KS C 5601, where the ku and ten are respectively known as hang and yol. The later JIS X 0213 extends this structure by having more than one plane of rows, which is also the structure used by CNS 11643, and related to the structure used by CCCII.
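The three byte-level forms mentioned above can be sketched as follows. The 7-bit offset comes directly from the text; the EUC-JP offset of 0xA0 is inferred from the stated 0xA1 through 0xFE range; the Shift JIS arithmetic is the commonly described transform and is not spelled out in this article, so it should be read as an assumption. All function names are hypothetical, and only rows 1 to 94 are handled.

```python
def _check(row: int, cell: int) -> None:
    if not (1 <= row <= 94 and 1 <= cell <= 94):
        raise ValueError(f"kuten {row}-{cell} is outside 1-94")


def kuten_to_jis7(row: int, cell: int) -> bytes:
    """7-bit form: add 0x20 to each number (range 0x21-0x7E)."""
    _check(row, cell)
    return bytes([row + 0x20, cell + 0x20])


def kuten_to_euc_jp(row: int, cell: int) -> bytes:
    """EUC-JP form: add 0xA0 to each number (range 0xA1-0xFE)."""
    _check(row, cell)
    return bytes([row + 0xA0, cell + 0xA0])


def kuten_to_shift_jis(row: int, cell: int) -> bytes:
    """Shift JIS form (assumed transform): two rows share one lead byte."""
    _check(row, cell)
    # Lead bytes run 0x81-0x9F for rows 1-62 and 0xE0-0xEF for rows 63-94.
    lead = (row + 0x101) // 2 if row <= 62 else (row + 0x181) // 2
    if row % 2 == 1:
        # Odd rows use trail bytes 0x40-0x7E and 0x80-0x9E (0x7F is skipped).
        trail = cell + 0x3F if cell <= 63 else cell + 0x40
    else:
        # Even rows use trail bytes 0x9F-0xFC.
        trail = cell + 0x9E
    return bytes([lead, trail])


print(kuten_to_jis7(16, 1).hex(" "))       # 30 21
print(kuten_to_euc_jp(16, 1).hex(" "))     # b0 a1
print(kuten_to_shift_jis(16, 1).hex(" "))  # 88 9f
```

In a real ISO 2022-JP stream the 7-bit pair would additionally be surrounded by escape sequences that switch character sets, which this sketch leaves out.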
Unassigned code points
Among the 2-byte codes, rows 9 to 15 and 85 to 94 are unassigned code points; that is, they are code points with no characters assigned to them. Some cells in other rows are also essentially unassigned code points.
These empty areas contain code points that should basically not be used. Except when there is prior agreement among the relevant parties, characters for information interchange should not be assigned to the unassigned code points.
Even when assigning characters to unassigned code points, graphic characters defined in the standard should not be assigned to them, and the same character should not be assigned to multiple unassigned code points; characters should not be duplicated in the set.
Furthermore, when assigning characters to unassigned code points, it is necessary to be cautious of unification with regard to kanji glyphs. For example, row 25 cell 66 corresponds to the kanji meaning "high" or "expensive"; both the form with a component resembling the "mouth" character in the middle and the less common form with a ladder-like construction in the same location are subsumed into the same code point. Consequently, limiting point 25-66 to the "mouth" form and assigning the latter "ladder" form to an unassigned code point would technically be in violation of the standard.
In practice, however, several vendor-specific Shift JIS variants, including Windows-932 and MacJapanese, encode vendor extensions in unallocated rows of the encoding space for JIS X 0208. Also, most of the codes unassigned in JIS X 0208 are assigned by the newer JIS X 0213 standard.
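The row-level rule given at the start of this section (rows 9 to 15 and 85 to 94 are unassigned) can be sketched as a simple check. The names `UNASSIGNED_ROWS` and `is_unassigned_row` are hypothetical, and the sketch deliberately ignores the individual unassigned cells in other rows, which would need a full table of the standard.

```python
# Rows with no characters assigned in JIS X 0208 itself, per the text.
UNASSIGNED_ROWS = frozenset(range(9, 16)) | frozenset(range(85, 95))


def is_unassigned_row(row: int) -> bool:
    """Return True if the whole row is unassigned in plain JIS X 0208."""
    if not 1 <= row <= 94:
        raise ValueError("row must be in 1-94")
    return row in UNASSIGNED_ROWS


print(is_unassigned_row(13))  # True
print(is_unassigned_row(16))  # False
print(len(UNASSIGNED_ROWS))   # 17
```

A decoder for a vendor variant or for JIS X 0213 would treat some of these rows differently, since those encodings assign characters there.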