Extended Unix Code

Extended Unix Code (EUC) is a multibyte character encoding system used primarily for Japanese, Korean, and simplified Chinese.
The most commonly used EUC codes are variable-length encodings in which a character from an ISO/IEC 646 compliant coded character set (such as ASCII) takes one byte, and a character from a 94×94 coded character set (such as GB 2312) takes two bytes. The EUC-CN form of GB 2312 and EUC-KR are examples of such two-byte EUC codes. EUC-JP includes characters represented by up to three bytes, including an initial shift code, whereas a single character in EUC-TW can take up to four bytes.
Modern applications are more likely to use UTF-8, which supports all of the characters of the EUC codes and more, and is generally more portable, with fewer vendor deviations and errors. EUC is, however, still very popular, especially EUC-KR in South Korea.

Encoding structure

The structure of EUC is based on the ISO/IEC 2022 standard, which specifies a system of graphical character sets that can be represented with a sequence of the 94 7-bit bytes 0x21–7E, or alternatively 0xA1–FE if an eighth bit is available. Depending on whether one, two, or three bytes are used per character, this allows for sets of 94, 8836 (94²), or 830584 (94³) graphical characters. Although initially 0x20 and 0x7F were always the space and delete characters and 0xA0 and 0xFF were unused, later editions of the standard allowed the use of the bytes 0xA0 and 0xFF within sets under certain circumstances, allowing the inclusion of 96-character sets. The ranges 0x00–1F and 0x80–9F are used for C0 and C1 control codes.
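The set sizes quoted above follow from the number of byte values available at each position. A quick check, with variable names chosen only for illustration:

```python
# Number of byte values per position in a 94-set:
# 0x21-0x7E in 7-bit form, or 0xA1-0xFE when the eighth bit is available.
GL_VALUES = 0x7E - 0x21 + 1
GR_VALUES = 0xFE - 0xA1 + 1

# Capacity of one-, two-, and three-byte sets built from those values.
set_sizes = {n: GL_VALUES ** n for n in (1, 2, 3)}

print(GL_VALUES, GR_VALUES, set_sizes)
```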
EUC is a family of 8-bit profiles of ISO/IEC 2022, as opposed to 7-bit profiles such as ISO-2022-JP. As such, only ISO 2022 compliant character sets can have EUC forms. Up to four coded character sets (code sets 0 through 3) can be represented with the EUC scheme. The G0 set (code set 0) is set to an ISO/IEC 646 compliant coded character set, such as ASCII or a national variant, and invoked over GL. If ASCII is used, this makes the code an extended ASCII encoding; the most common deviation from ASCII is that 0x5C is often used to represent a yen sign in EUC-JP and a won sign in EUC-KR.
The other code sets are invoked over GR. Hence, to get the EUC form of a character, the most significant bit of each coding byte is set (equivalent to adding 0x80 to each 7-bit byte); this allows software to easily distinguish whether a particular byte in a character string belongs to the ISO/IEC 646 code or the extended code. Characters in code sets 2 and 3 are prefixed with the control codes SS2 (0x8E) and SS3 (0x8F) respectively, and invoked over GR. Besides the initial shift code, any byte outside of the range 0xA0–0xFF appearing in a character from code sets 1 through 3 is not a valid EUC code.
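The byte-level rules above can be sketched as a small splitter that walks a packed EUC byte string and labels each unit with its code set. This is a minimal illustration, not a reference decoder: the function name `split_packed_euc` is hypothetical, and the default unit lengths after SS2 and SS3 are an assumption chosen to match EUC-JP (other EUC codes use different lengths).

```python
SS2, SS3 = 0x8E, 0x8F  # single-shift codes that prefix code sets 2 and 3


def split_packed_euc(data: bytes, set2_len: int = 1, set3_len: int = 2):
    """Split packed-format EUC bytes into (code_set, unit) pairs.

    set2_len / set3_len: number of GR bytes that follow SS2 / SS3.
    The defaults are an assumption matching EUC-JP; other EUC codes differ.
    C1 control bytes other than SS2/SS3 are passed through with code_set None.
    """
    units = []
    i = 0
    while i < len(data):
        lead = data[i]
        if lead < 0x80:          # GL byte: code set 0 (or a C0 control)
            code_set, shift, length = 0, 0, 1
        elif lead == SS2:
            code_set, shift, length = 2, 1, 1 + set2_len
        elif lead == SS3:
            code_set, shift, length = 3, 1, 1 + set3_len
        elif lead < 0xA0:        # any other C1 control code
            code_set, shift, length = None, 0, 1
        else:                    # GR byte: code set 1, two bytes
            code_set, shift, length = 1, 0, 2
        unit = data[i:i + length]
        if len(unit) < length:
            raise ValueError(f"truncated sequence at offset {i}")
        # Besides the shift code, every byte of code sets 1-3 must be in 0xA0-0xFF.
        if code_set in (1, 2, 3) and any(b < 0xA0 for b in unit[shift:]):
            raise ValueError(f"non-GR byte in code set {code_set} at offset {i}")
        units.append((code_set, unit))
        i += length
    return units


# Example input: "A", one code set 1 pair, one SS2 unit, one SS3 unit.
sample = b"A\xa4\xa2\x8e\xb1\x8f\xb0\xa1"
for code_set, unit in split_packed_euc(sample):
    print(code_set, unit.hex(" "))
```

Because a GL byte can never appear inside a multibyte unit, the splitter needs to inspect only the lead byte to decide how many bytes to consume.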
The EUC code itself does not make use of the announcement and designation sequences from ISO/IEC 2022. However, the code specification is equivalent to the following sequence of four announcement sequences, with meanings breaking down as follows.

Individual sequence	Hexadecimal	Feature of EUC denoted
`ESC SP C`	`1B 20 43`	ISO-8
`ESC SP Z`	`1B 20 5A`	G2 accessed using SS2
ESC SP

The variable-length encoding described above is sometimes referred to as the EUC packed format, which is the encoding format usually labeled as EUC. However, internal processing of EUC data may make use of a fixed-length transformation format called the EUC complete two-byte format. This represents:
* Code set 0 as two bytes in the range 0x21–0x7E.
* Code set 1 as two bytes in the range 0xA0–0xFF.
* Code set 2 as a byte in the range 0x21–0x7E followed by a byte in the range 0xA0–0xFF.
* Code set 3 as a byte in the range 0xA0–0xFF followed by a byte in the range 0x21–0x7E.
Initial bytes of 0x00 and 0x80 are used in cases where the code set uses only one byte. There is also a four-byte fixed-length format. These fixed-length encoding formats are suited to internal processing and are not usually encountered in interchange.
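The four high-bit patterns of the complete two-byte format can be sketched as a lookup table plus a small conversion routine. This is an illustrative reading of the description above rather than a vendor implementation: the names `HIGH_BITS` and `to_complete_two_byte` are hypothetical, and the padding of one-byte code sets with 0x00 or 0x80 is an interpretation of the "initial bytes" sentence.

```python
# High-bit pattern (first byte, second byte) for each code set,
# following the complete two-byte format described above.
HIGH_BITS = {0: (0x00, 0x00), 1: (0x80, 0x80), 2: (0x00, 0x80), 3: (0x80, 0x00)}


def to_complete_two_byte(code_set: int, coding_bytes: bytes) -> bytes:
    """Map one packed-format character to the complete two-byte format.

    coding_bytes holds the character's one or two coding bytes WITHOUT
    any SS2/SS3 prefix. Treating a one-byte code set as "pad with an
    initial 0x00 or 0x80" is an assumption based on the text above.
    """
    if len(coding_bytes) not in (1, 2):
        raise ValueError("only one- and two-byte code sets fit this format")
    first_hi, second_hi = HIGH_BITS[code_set]
    low = [b & 0x7F for b in coding_bytes]  # strip the packed-format high bits
    if len(low) == 1:
        # One-byte code set: the initial byte carries only the high-bit marker.
        return bytes([first_hi, low[0] | second_hi])
    return bytes([low[0] | first_hi, low[1] | second_hi])


print(to_complete_two_byte(0, b"A").hex(" "))
print(to_complete_two_byte(3, b"\xb0\xa1").hex(" "))
```

Since the code set is carried entirely by the two high bits, no shift codes are needed and every character occupies the same width, which is what makes the format convenient for internal processing.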
EUC-JP is registered with the IANA in both formats, the packed format as "EUC-JP" or "csEUCPkdFmtJapanese" and the fixed-width format as "csEUCFixWidJapanese". Only the packed format is included in the WHATWG Encoding Standard used by HTML5.

EUC-CN

EUC-CN is the usual encoded form of the GB 2312 standard for simplified Chinese characters. Unlike the case of Japanese JIS X 0208 and ISO-2022-JP, GB 2312 is not normally used in a 7-bit code version, although a variant form called HZ was sometimes used on USENET.
* An ASCII character is represented in its usual encoding.
* A character from GB 2312 is represented by two bytes, both from the range 0xA1–0xFE.

748 code

An encoding related to EUC-CN is the "748" code used in the WITS typesetting system developed by Beijing's Founder Technology. The 748 code contains all of GB 2312, but is not ISO 2022-compliant and therefore not a true EUC code. The non-GB 2312 portion of the 748 code contains traditional and Hong Kong characters and other glyphs used in newspaper typesetting.

IBM code pages 1380, 1381, 1382 and 1383

IBM code page 1381 comprises the single-byte code page 1115 and the double-byte code page 1380, which encodes GB 2312 the same way as EUC-CN, but deviates from the EUC structure by extending the lead byte range back to 0x8C, adding 31 IBM-selected characters in 0x8CE0 through 0x8CFE and adding 1880 user-defined characters with lead bytes 0x8D through 0xA0.
IBM code page 1383 comprises the single-byte code page 367 and the double-byte code page 1382, which differs from code page 1380 by conforming to the EUC structure, adding the 31 IBM-selected characters in 0xFEE0 through 0xFEFE instead, and including only 1360 user-defined characters, interspersed in the positions not used by GB 2312.
The alternative CCSID 5479 is used for the pure EUC-CN code page: it uses CCSID 9574 as its double-byte set, which uses CPGID 1382 but excludes the IBM-selected and user-defined characters.

GBK and GB 18030

GBK is an extension to GB 2312. It defines an extended form of the EUC-CN encoding capable of representing a larger array of CJK characters sourced largely from Unicode, including traditional Chinese characters and characters used only in Japanese. It is not, however, a true EUC code, because ASCII bytes may appear as trail bytes, due to a larger encoding space being required. Variants of GBK are implemented by Windows code page 936 and by IBM's code page 1386.
The Unicode-based character encoding GB 18030 defines an extension of GBK capable of encoding the entirety of Unicode. However, Unicode encoded as GB 18030 is a variable-length encoding which may use up to four bytes per character, due to an even larger encoding space being required. Being an extension of GBK, it is a superset of EUC-CN but is not itself a true EUC code. Being a Unicode encoding, its repertoire is identical to that of other Unicode transformation formats such as UTF-8.

Mac OS Chinese Simplified

Other EUC-CN variants deviating from the EUC mechanism include the classic Mac OS Chinese Simplified script. It uses the bytes 0x80, 0x81, 0x82, 0xA0, 0xFD, 0xFE, and 0xFF for the U with umlaut, two special font metric characters, the non-breaking space, the copyright sign, the trademark sign and the ellipsis respectively. This differs from both EUC and GBK in what is regarded as a single-byte character versus the first byte of a two-byte character. This use of 0xA0, 0xFD, 0xFE and 0xFF matches Apple's Shift_JIS variant.
Besides these changes to the lead byte range, the other distinctive feature of the double-byte portion of Mac OS Chinese Simplified is the inclusion of two extensions to the basic GB 2312-80 set in rows 6 and 8. These are considered "standard extensions to GB 2312", neither of which is proprietary to Apple: the row 8 extension was taken from GB 6345.1, both extensions are included by GB/T 12345, and both extensions are included by GB 18030.
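The structural difference between EUC-CN and its non-EUC extensions such as GBK can be checked mechanically. The sketch below uses Python's built-in `gb2312` codec (which produces the EUC-CN form) and `gbk` codec; the helper name `is_euc_cn_structured` and the sample characters are chosen here purely for illustration.

```python
def is_euc_cn_structured(data: bytes) -> bool:
    """True if data consists only of GL bytes and two-byte units
    whose bytes both lie in 0xA1-0xFE, as EUC-CN requires."""
    i = 0
    while i < len(data):
        if data[i] < 0x80:
            i += 1
        elif (i + 1 < len(data)
              and 0xA1 <= data[i] <= 0xFE
              and 0xA1 <= data[i + 1] <= 0xFE):
            i += 2
        else:
            return False
    return True


simplified = "\u4e2d\u6587"   # two simplified characters present in GB 2312
traditional = "\u9f8d"        # a traditional character absent from GB 2312

euc_cn_bytes = simplified.encode("gb2312")
gbk_bytes = traditional.encode("gbk")

print(euc_cn_bytes.hex(" "), is_euc_cn_structured(euc_cn_bytes))
print(gbk_bytes.hex(" "), is_euc_cn_structured(gbk_bytes))

# Clearing the high bit of each EUC-CN byte recovers the 7-bit GB 2312 code.
print(bytes(b & 0x7F for b in euc_cn_bytes).hex(" "))
```

A decoder that assumes the EUC structure would misread GBK-only characters, which is why GBK and GB 18030 are treated as supersets of EUC-CN rather than as EUC codes.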

EUC-JP

EUC-JP is a variable-length encoding used to represent the elements of three Japanese character set standards, namely JIS X 0208, JIS X 0212, and JIS X 0201. Other names for this encoding include Unixized JIS and AT&T JIS. Since January 2025, less than 0.1% of all web pages have used EUC-JP, while 2.3% of websites written in Japanese use this second-most popular encoding. It is called Code page 954 by IBM. Microsoft has two code page numbers for this encoding.
This encoding scheme allows the easy mixing of 7-bit ASCII and 8-bit Japanese without the need for the escape characters employed by ISO-2022-JP, which is based on the same character set standards, and without ASCII bytes appearing as trail bytes.
A related and partially compatible encoding, called EUC-JISx0213 or EUC-JIS-2004, encodes JIS X 0201 and JIS X 0213.
Compared to EUC-CN or EUC-KR, EUC-JP did not become as widely adopted on PC and Macintosh systems in Japan, which used Shift JIS or its extensions, although it became heavily used by Unix or Unix-like operating systems. Therefore, whether Japanese websites use EUC-JP or Shift_JIS often depends on what OS the author uses.
Characters are encoded as follows:
* As an EUC/ISO 2022 compliant encoding, the C0 control characters, space, and DEL are represented as in ASCII.
* A graphical character from ASCII is represented as its usual one-byte representation, in the range 0x21–0x7E. While some variants of EUC-JP encode the lower half of JIS X 0201 here, most encode ASCII, including the W3C/WHATWG Encoding Standard used by HTML5, and so does EUC-JIS-2004. While this means that 0x5C is typically mapped to Unicode as U+005C REVERSE SOLIDUS, U+005C may be displayed as a yen sign by certain Japanese-locale fonts, e.g. on Microsoft Windows, for compatibility with the lower half of JIS X 0201.
* A character from JIS X 0208 is represented by two bytes, both in the range 0xA1–0xFE. This differs from the ISO-2022-JP representation by having the high bit set. This code set may also contain vendor extensions in some EUC-JP variants. In EUC-JIS-2004, the first plane of JIS X 0213 is encoded here, which is effectively a superset of standard JIS X 0208.
* A character from the upper half of JIS X 0201 is represented by two bytes, the first being 0x8E, the second being the usual representation in the range 0xA1–0xDF. This set may contain IBM vendor extensions in some variants.
* A character from JIS X 0212 is represented in EUC-JP by three bytes, the first being 0x8F, the following two being in the range 0xA1–0xFE, i.e. with the high bit set. In addition to standard JIS X 0212, code set 3 of some EUC-JP variants may also contain extensions in rows 83 and 84 to represent characters from IBM's Shift JIS extensions which lack standard JIS X 0212 mappings, which may be coded in either of two layouts, one defined by IBM themselves and one defined by the OSF. In EUC-JIS-2004, the second plane of JIS X 0213 is encoded here, which does not collide with the allocated rows in standard JIS X 0212. Some implementations of EUC-JIS-2004, such as the one used by Python, allow both JIS X 0212 and JIS X 0213 plane 2 characters in this set.
Vendor extensions to EUC-JP were often allocated within the individual code sets, as opposed to using invalid EUC sequences. However, some vendor-specific encodings are partially compatible with EUC-JP, due to encoding JIS X 0208 over GR, but do not follow the packed EUC structure. Often, these do not include use of the single shifts from EUC-JP, and are thus not straight extensions of EUC-JP, with the exception of Super DEC Kanji.

DEC Kanji

Digital Equipment Corporation defines two variants of EUC-JP only partly conforming to the EUC packed format, but also bearing some resemblance to the complete two-byte format. The overall format of the "DEC Kanji" encoding mostly corresponds to fixed-length EUC; however, code set 0 is not required to be left-padded with null bytes.
JIS X 0208 is, as usual, used for code set 1; code set 2 is absent; code set 3 is encoded like the two-byte fixed-width format, but used for two-byte user-defined characters rather than being specified for JIS X 0212. In the basic "DEC Kanji" encoding, only the first 31 rows of code set 3 are used for user-defined characters: rows 32 through 94 are reserved, similarly to the unused rows in code set 1.
The "Super DEC Kanji" encoding accepts codes both from the "DEC Kanji" encoding and from packed-format EUC, for a total of five code sets. It also allows the entire user-defined code set, and the unused rows at the ends of the JIS X 0208 and JIS X 0212 code sets, to be used for user-defined characters.

HP-16

Hewlett-Packard defines an encoding referred to as "HP-16". This accompanies their "HP-15" encoding, which is a variant of Shift JIS. HP-16 encodes JIS X 0208 using the same bytes as in EUC-JP, but does not use the single shift codes, and adds three user-defined regions which do not follow the packed-format EUC structure:
* Lead bytes 0xA1–C2, trail bytes 0x21–7E
* Lead bytes 0xC3–E3, trail bytes 0x21–3F
* Lead bytes 0xC3–E1, trail bytes 0x40–64

IKIS

The IKIS encoding used by Data General resembles EUC-JP without single shifts, i.e. with only code sets 0 and 1. Half-width katakana are instead included in row 8 of JIS X 0208. JIS X 0208 rows 9 through 12 are used for user-defined characters.

Adaptations of EUC-JP for EBCDIC

KEIS is an EBCDIC encoding used by Hitachi, with double-byte characters included using shifting sequences, making it a stateful encoding. Specifically, one shifting sequence switches to single-byte mode and another switches to double-byte mode. However, JIS X 0208 characters are encoded using the same byte sequences used to encode them in EUC-JP. This results in duplicate encodings for the ideographic space: 0x4040 per the DBCS-Host code structure, and 0xA1A1 as in EUC-JP.
This differs from IBM's DBCS-Host encoding for Japanese, the layout of which builds on versions which predate JIS X 0208 altogether. The lead byte range is extended back to 0x59, out of which the lead bytes 0x81–A0 are designated for user-defined characters, and the remainder are used for corporate-defined characters, including both kanji and non-kanji.
JEF is an EBCDIC encoding used on Fujitsu FACOM mainframes, contrasting with FMR used on Fujitsu PCs. Like KEIS, JEF is a stateful encoding, switching to a double-byte DBCS-Host mode using shifting sequences. Also similarly to KEIS, JIS X 0208 codes are represented the same as in EUC-JP. The lead byte range is extended back to 0x41, with 0x80–0xA0 designated for user definition; lead bytes 0x41–0x7F are assigned row numbers 101 through 163 for kuten purposes, although row 162 is unused. Rows 101 through 148 are used for extended kanji, while rows 149 through 163 are used for extended non-kanji.

EUC-KR

EUC-KR is a variable-length encoding to represent Korean text using two coded character sets: KS X 1001 and either the Korean variant of ISO/IEC 646 or ASCII, depending on variant. A Korean national standard stipulates the encoding, which came to be dubbed EUC-KR. A character drawn from KS X 1001 is encoded as two bytes in GR and a character from the ISO/IEC 646 variant or ASCII takes one byte in GL.
It is usually referred to as Wansung in the Republic of Korea. IBM refers to the double-byte component as Code page 971, and to EUC-KR with ASCII as Code page 970. It is implemented as Code page 20949 and Code page 51949 by Microsoft.
Less than 0.06% of all web pages globally declare using EUC-KR, but 4.0% of South Korean web pages use EUC-KR. Including extensions, it is the most widely used legacy character encoding in Korea on all three major platforms, but its use has been very slowly shifting to UTF-8 as the latter gains popularity, especially on Linux and macOS. As with most other encodings, UTF-8 is now preferred for new use, solving problems with consistency between platforms and vendors.
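As a rough illustration of the two-byte GR encoding described in this section, the sketch below uses Python's built-in `euc_kr` codec and recovers the KS X 1001 row and cell of a character by subtracting 0xA0 from each byte. The helper name `ks_x_1001_position` is hypothetical, and it is only meant for characters that the codec encodes as exactly two bytes.

```python
def ks_x_1001_position(ch: str):
    """Return the (row, cell) of a KS X 1001 character, derived from its
    two-byte EUC-KR form. Raises ValueError for anything that does not
    encode to exactly two bytes (e.g. ASCII)."""
    raw = ch.encode("euc_kr")
    if len(raw) != 2:
        raise ValueError("not a two-byte KS X 1001 character")
    return raw[0] - 0xA0, raw[1] - 0xA0


hangul = "\ud55c\uae00"               # the word "Hangul" in Hangul
encoded = hangul.encode("euc_kr")
print(encoded.hex(" "))
print([ks_x_1001_position(ch) for ch in hangul])

# ASCII stays in GL while KS X 1001 characters occupy GR byte pairs.
mixed = ("A" + hangul).encode("euc_kr")
print(["GL" if b < 0x80 else "GR" for b in mixed])
```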

Unified Hangul Code

A common extension of EUC-KR is the Unified Hangul Code, which is the default Korean codepage on Microsoft Windows. It is given the code page number 949 by Microsoft, and 1261 or 1363 by IBM. IBM's code page 949 is a different, unrelated, EUC-KR extension.
Unified Hangul Code extends EUC-KR by using codes that do not conform to the EUC structure to incorporate additional syllable blocks, completing the coverage of the composed syllable blocks available in Johab and Unicode. The W3C/WHATWG Encoding Standard used by HTML5 incorporates the Unified Hangul Code extensions into its definition of EUC-KR.

Mac OS Korean (HangulTalk)

Other encodings incorporating EUC-KR as a subset include the Mac OS Korean script, which was used by HangulTalk, the Korean localization of the classic Mac OS. It was developed by Elex Computer, who were at the time the authorised distributor of Apple Macintosh computers in South Korea.
HangulTalk adds extension characters with lead bytes between 0xA1 and 0xAD, both in unused space within the EUC-KR GR plane, and using non-EUC codes outside of it. Some of these characters are font-style-independent stylized dingbats. Many of these characters do not have exact Unicode mappings, and Apple software maps these cases variously to combining sequences, to approximate mappings with an appended private-use character as a modifier for round-trip purposes, or to private-use characters.
Apple also uses certain single-byte codes outside of the EUC-KR plane for additional characters: 0x80 for a required space, 0x81 for a won sign, 0x82 for an en dash, 0x83 for a copyright sign, 0x84 for a wide underscore and 0xFF for an ellipsis. Although none of these additional single-byte codes are within the lead byte range of plain EUC-KR, some are within the lead byte range of Unified Hangul Code.

EUC-KP

Similarly to KS X 1001, the North Korean KPS 9566 standard is typically used in EUC form; in these contexts, it is sometimes referred to as EUC-KP.
More recent editions of the standard extend the EUC representation with characters using non-EUC two-byte codes, in a similar manner to Unified Hangul Code.

EUC-TH

Although certain single-byte encodings such as the ISO/IEC 8859 series technically conform to the EUC structure, they are rarely labeled as EUC. However, the label EUC-TH is used on Solaris for TIS-620.

EUC-TW

EUC-TW is a variable-length encoding that supports ASCII and 16 planes of CNS 11643, each of which is 94×94. It is a rarely used encoding for traditional Chinese characters as used in Taiwan. Variants of Big5 are much more common than EUC-TW, although Big5 only encodes the first two planes of CNS 11643 hanzi, while UTF-8 is becoming more common.
* As an EUC/ISO 2022 encoding, the C0 control characters, ASCII space, and DEL are encoded as in ASCII.
* A graphical character from ASCII is encoded in GL as its usual single-byte representation.
* A character from CNS 11643 plane 1 is encoded as two bytes in GR.
* A character in planes 1 through 16 of CNS 11643 is encoded as four bytes:
  * The first byte is always 0x8E.
  * The second byte indicates the plane, the number of which is obtained by subtracting 0xA0 from that byte.
  * The third and fourth bytes are in GR.
Note that plane 1 of CNS 11643 is encoded twice: as code set 1 and as a part of code set 2.
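The EUC-TW byte layout described above can be sketched directly from a CNS 11643 plane, row, and cell number. Python ships no EUC-TW codec, so this is a hand-written illustration; the function names `euc_tw_encode` and `euc_tw_plane` and the `force_four_byte` parameter are hypothetical.

```python
SS2 = 0x8E  # first byte of every four-byte EUC-TW sequence


def euc_tw_encode(plane: int, row: int, cell: int,
                  force_four_byte: bool = False) -> bytes:
    """Build the EUC-TW bytes for a CNS 11643 position (1-based numbers).

    Plane 1 uses the two-byte code set 1 form unless force_four_byte is
    set, in which case its duplicate four-byte code set 2 form is used.
    """
    if not (1 <= plane <= 16 and 1 <= row <= 94 and 1 <= cell <= 94):
        raise ValueError("plane must be 1-16; row and cell must be 1-94")
    pair = bytes([0xA0 + row, 0xA0 + cell])      # both bytes land in GR
    if plane == 1 and not force_four_byte:
        return pair
    return bytes([SS2, 0xA0 + plane]) + pair     # plane byte = 0xA0 + plane


def euc_tw_plane(unit: bytes) -> int:
    """Recover the plane number from a two- or four-byte EUC-TW unit."""
    if len(unit) == 2:
        return 1
    if len(unit) == 4 and unit[0] == SS2:
        return unit[1] - 0xA0
    raise ValueError("not a two- or four-byte EUC-TW unit")


print(euc_tw_encode(1, 1, 1).hex(" "))
print(euc_tw_encode(1, 1, 1, force_four_byte=True).hex(" "))
print(euc_tw_encode(2, 1, 1).hex(" "))
```

The two plane 1 outputs show the duplicate encoding noted above: the same character position has both a two-byte and a four-byte form.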