Code page

In computing, a code page is a character encoding: a specific association of a set of printable characters and control characters with unique numbers. Typically each number represents the binary value in a single byte.
The term "code page" originated from IBM's EBCDIC-based mainframe systems, but Microsoft, SAP, and Oracle Corporation are among the vendors that use this term. The majority of vendors identify their own character sets by a name. When there are many character sets, identifying them by number is a convenient way to distinguish them. Originally, the code page numbers referred to the page numbers in the IBM standard character set manual, a condition which has not held for a long time. Vendors that use a code page system allocate their own code page number to a character encoding, even if it is better known by another name; for example, UTF-8 has been assigned page numbers 1208 at IBM, 65001 at Microsoft, and 4110 at SAP.
Hewlett-Packard uses a similar concept in its HP-UX operating system and its Printer Command Language protocol for printers. The terminology, however, is different: What others call a character set, HP calls a symbol set, and what IBM or Microsoft call a code page, HP calls a symbol set code. HP developed a series of symbol sets, each with an associated symbol set code, to encode both its own character sets and other vendors’ character sets.
The multitude of character sets leads many vendors to recommend Unicode.

The code page numbering system

IBM introduced the concept of systematically assigning a small, but globally unique, 16-bit number to each character encoding that a computer system or collection of computer systems might encounter. The IBM origin of the numbering scheme is reflected in the fact that the smallest numbers are assigned to variations of IBM's EBCDIC encoding and slightly larger numbers refer to variations of IBM's extended ASCII encoding as used in its PC hardware.
With the release of PC DOS version 3.3, IBM introduced the code page numbering system to regular PC users, as the code page numbers were used in new commands to allow the character encoding used by all parts of the OS to be set in a systematic way.
After IBM and Microsoft ceased to cooperate in the 1990s, the two companies maintained the list of assigned code page numbers independently from each other, resulting in some conflicting assignments. At least one third-party vendor also has its own different list of numeric assignments. IBM's current assignments are listed in their CCSID repository, while Microsoft's assignments are documented on MSDN. Additionally, a list of the names and approximate IANA abbreviations for the installed code pages on any given Windows machine can be found in the Registry on that machine.
Most well-known code pages, excluding those for the CJK languages and Vietnamese, fit all their code-points into eight bits and do not involve anything more than mapping each code-point to a single character; furthermore, techniques such as combining characters, complex scripts, etc., are not involved.
The text mode of standard PC graphics hardware is built around using an 8-bit code page, though it is possible to use two at once with some color depth sacrifice, and up to eight may be stored in the display adapter for easy switching. There was a selection of third-party code page fonts that could be loaded into such hardware. However, it is now commonplace for operating system vendors to provide their own character encoding and rendering systems that run in a graphics mode and bypass this hardware limitation entirely. The system of referring to character encodings by a code page number nevertheless remains applicable, as an efficient alternative to string identifiers such as those specified by the IETF and IANA for use in various protocols such as e-mail and web pages.

Relationship to ASCII

The majority of code pages in current use are supersets of ASCII, a 7-bit code representing 128 control codes and printable characters. In the distant past, 8-bit implementations of the ASCII code set the top bit to zero or used it as a parity bit in network data transmissions. When the top bit was made available for representing character data, a total of 256 characters and control codes could be represented. Most vendors used this extended range to encode characters used by various languages and graphical elements that allowed the imitation of primitive graphics on text-only output devices. No formal standard existed for these "extended ASCII character sets" and vendors referred to the variants as code pages, as IBM had always done for variants of EBCDIC encodings.
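The split between the shared 7-bit range and the vendor-specific upper range can be illustrated with a short Python sketch. It assumes that Python's bundled codecs cp437, cp1252 and iso8859_7 are adequate stand-ins for the corresponding code pages; the sample bytes are chosen only for illustration.

```python
# Bytes below 0x80 are plain ASCII; bytes from 0x80 upward depend on the code page.
ascii_bytes = b"Code page"
high_bytes = b"\xc9\xcd\xbb \xe9"

# Python codec names standing in for code page 437, Windows-1252 and ISO 8859-7.
codec_names = ["cp437", "cp1252", "iso8859_7"]

# The 7-bit part decodes identically under every ASCII superset.
ascii_decoded = {name: ascii_bytes.decode(name) for name in codec_names}

# The 8-bit part yields different characters under each code page;
# under cp437 the first three bytes are box-drawing characters.
high_decoded = {name: high_bytes.decode(name) for name in codec_names}

for name in codec_names:
    print(name, ascii_decoded[name], high_decoded[name])
```

Under code page 437 the upper-range bytes include box-drawing characters, which is the kind of graphical element the paragraph above refers to.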

Relationship to Unicode

Unicode is an effort to include all characters from all currently and historically used human languages into a single character enumeration, removing the need to distinguish between different code pages when handling digitally stored text. Unicode tries to retain backwards compatibility with many legacy code pages, copying some code pages 1:1 in the design process. An explicit design goal of Unicode was to allow round-trip conversion between all common legacy code pages, although this goal has not always been achieved.
Some vendors, namely IBM and Microsoft, have anachronistically assigned code page numbers to Unicode encodings. This convention allows code page numbers to be used as metadata to identify the correct decoding algorithm when encountering binary stored data.
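Round-trip conversion can be checked mechanically. The following is a minimal sketch, assuming Python's cp437 and latin-1 codecs as examples of legacy code pages that define all 256 byte values; the helper name round_trips is made up for this illustration.

```python
# Every possible single-byte value, 0x00 through 0xFF.
ALL_BYTES = bytes(range(256))


def round_trips(codec_name: str) -> bool:
    """Return True if decoding to Unicode and re-encoding restores every byte."""
    text = ALL_BYTES.decode(codec_name)
    return text.encode(codec_name) == ALL_BYTES


# Both codecs define all 256 byte values, so the check runs without errors.
for name in ("cp437", "latin-1"):
    print(name, round_trips(name))
```

A code page with unassigned byte values would raise a decoding error in this sketch rather than return False, so it is only suitable for fully populated single-byte code pages.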

IBM code pages

EBCDIC-based code pages

These code pages are used by IBM in its EBCDIC character sets for mainframe computers.

1 – USA WP, Original
2 – USA
3 – USA Accounting, Version A
4 – USA
5 – USA
6 – Latin America
7 – Germany F.R. / Austria
8 – Germany F.R.
9 – France, Belgium
10 – Canada
11 – Canada
12 – Italy
13 – Netherlands
14 – Spain
15 – Switzerland
16 – Switzerland
17 – Switzerland
18 – Sweden / Finland
19 – Sweden / Finland WP, version 2
20 – Denmark/Norway
21 – Brazil
22 – Portugal
23 – United Kingdom
24 – United Kingdom
25 – Japan
26 – Japan
27 – Greece
29 – Iceland
30 – Turkey
31 – South Africa
32 – Czechoslovakia
33 – Czechoslovakia
34 – Czechoslovakia
35 – Romania
36 – Romania
37 – USA/Canada - CECP
37-2 – The real 3279 APL codepage, as used by C/370. This is very close to 1047, except for caret and not-sign inverted. It is not officially recognized by IBM, even though SHARE has pointed out its existence.
38 – USA ASCII
39 – United Kingdom / Israel
40 – United Kingdom
251 – China
252 – Poland
254 – Hungary
256 – International #1
257 – International #2
258 – International #3
259 – Symbols, Set 7
260 – Canadian French - 116
264 – Print Train & Text processing extended
273 – Germany F.R./Austria - CECP
274 – Old Belgium Code Page
275 – Brazil - CECP
276 – Canada - 94
277 – Denmark, Norway - CECP
278 – Finland, Sweden - CECP
279 – French - 94
280 – Italy - CECP
281 – Japan - CECP
282 – Portugal - CECP
283 – Spain - 190
284 – Spain/Latin America - CECP
285 – United Kingdom - CECP
286 – Austria / Germany F.R. Alternate
287 – Denmark / Norway Alternate
288 – Finland / Sweden Alternate
289 – Spain Alternate
290 – Japanese Extended
293 – APL
297 – France
298 – Japan
300 – Japan DBCS
310 – Graphic Escape APL/TN
320 – Hungary
321 – Yugoslavia
322 – Turkey
330 – International #4
340 – EBCDIC, OCR
351 – GDDM default
352 – Printing and publishing option
353 – BCDIC-A
354 – BCDIC-B
355 – PTTC/BCD standard option
357 – PTTC/BCD H option
358 – PTTC/BCD Correspondence option
359 – PTTC/BCD Monocase option
360 – PTTC/BCD Duocase option
361 – EBCDIC Publishing International
363 – Symbols, set 8
382 – EBCDIC Publishing Austria, Germany F.R. Alternate
383 – EBCDIC Publishing Belgium
384 – EBCDIC Publishing Brazil
385 – EBCDIC Publishing Canada
386 – EBCDIC Publishing Denmark, Norway
387 – EBCDIC Publishing Finland, Sweden
388 – EBCDIC Publishing France
389 – EBCDIC Publishing Italy
390 – EBCDIC Publishing Japan
391 – EBCDIC Publishing Portugal
392 – EBCDIC Publishing Spain, Philippines
393 – EBCDIC Publishing Latin America
394 – EBCDIC Publishing China, UK, Ireland
395 – EBCDIC Publishing Australia, New Zealand, USA, Canada
396 – BookMaster Specials
410 – Cyrillic
420 – Arabic
421 – Maghreb/French
423 – Greek
424 – Hebrew
425 – Arabic / Latin for OS/390 Open Edition
435 – Teletext Isomorphic
500 – International #5
803 – Hebrew Character Set A
829 – Host Math Symbols- Publishing
830 – Math Format
831 – Portugal
833 – Korean Extended
834 – Korean Hangul
835 – Traditional Chinese DBCS
836 – Simplified Chinese Extended
837 – Simplified Chinese DBCS
838 – Thai with Low Marks & Accented Characters
839 – Thai DBCS
870 – Latin 2
871 – Iceland
875 – Greek
880 – Cyrillic
881 – United States - 5080 Graphics System
882 – United Kingdom - 5080 Graphics System
883 – Sweden - 5080 Graphics System
884 – Germany - 5080 Graphics System
885 – France - 5080 Graphics System
886 – Italy - 5080 Graphics System
887 – Japan - 5080 Graphics System
888 – France AZERTY - 5080 Graphics System
889 – Thailand
890 – Yugoslavia
892 – EBCDIC, OCR A
893 – EBCDIC, OCR B
905 – Latin 3
918 – Urdu Bilingual
924 – Latin 9
930 – Japan MIX
931 – Japan MIX
933 – Korea MIX
935 – Simplified Chinese MIX
937 – Traditional Chinese MIX
939 – Japan MIX
1001 – MICR
1002 – EBCDIC DCF Release 2 Compatibility
1003 – EBCDIC DCF, US Text subset
1005 – EBCDIC Isomorphic Text Communication
1007 – EBCDIC Arabic
1024 – EBCDIC T.61
1025 – Cyrillic, Multilingual
1026 – EBCDIC Turkey
1027 – Japanese Extended
1028 – EBCDIC Publishing Hebrew
1030 – Japanese Extended
1031 – Japanese Extended
1032 – MICR, E13-B Combined
1033 – MICR, CMC-7 Combined
1037 – Korea - 5080/6090 Graphics System
1039 – GML Compatibility
1047 – Latin 1/Open Systems
1068 – DCF Compatibility
1069 – Latin 4
1070 – USA / Canada Version 0
1071 – Germany F.R. / Austria
1072 – Belgium
1073 – Brazil
1074 – Denmark, Norway
1075 – Finland, Sweden
1076 – Italy
1077 – Japan
1078 – Portugal
1079 – Spain / Latin America Version 0
1080 – United Kingdom
1081 – France Version 0
1082 – Israel
1083 – Israel
1084 – International#5 Version 0
1085 – Iceland
1087 – Symbol Set
1091 – Modified Symbols, Set 7
1093 – IBM Logo
1097 – Farsi Bilingual
1110 – Latin 2
1112 – Baltic Multilingual
1113 – Latin 6
1122 – Estonia
1123 – Cyrillic, Ukraine
1130 – Vietnamese
1132 – Lao EBCDIC
1136 – Hitachi Katakana
1137 – Devanagari EBCDIC
1140 – USA, Canada, etc. ECECP
1141 – Austria, Germany ECECP
1142 – Denmark, Norway ECECP
1143 – Finland, Sweden ECECP
1144 – Italy ECECP
1145 – Spain, Latin America ECECP
1146 – UK ECECP
1147 – France ECECP with euro
1148 – International ECECP with euro
1149 – Icelandic ECECP with euro
1150 – Korean Extended with box characters
1151 – Simplified Chinese Extended with box characters
1152 – Traditional Chinese Extended with box characters
1153 – Latin 2 Multilingual with euro
1154 – Cyrillic, Multilingual with euro
1155 – Turkey with euro
1156 – Baltic Multi with euro
1157 – Estonia with euro
1158 – Cyrillic, Ukraine with euro
1159 – T-Chinese EBCDIC
1160 – Thai with Low Marks & Accented Characters with euro
1164 – Vietnamese with euro
1165 – Latin 2/Open Systems
1166 – Cyrillic Kazakh
1175 – Turkey with euro and lira
1278 – EBCDIC Adobe Standard Encoding
1279 – Hitachi Japanese Katakana Host
1300 – Generic Bar Code/OCR-B
1301 – Zip + 4 POSTNET Bar Code
1302 – Facing Identification Marks
1303 – EBCDIC Bar Code
1364 – Korea MIX
1371 – Traditional Chinese MIX
1376 – Traditional Chinese DBCS Host extension for HKSCS
1377 – Mixed Host HKSCS Growing
1378 – Traditional Chinese DBCS Host extension for HKSCS and Simplified Chinese
1379 – Mixed Host HKSCS and Simplified Chinese Growing
1388 – Simplified Chinese MIX
1390 – Simplified Chinese MIX Japan MIX
1399 – Japan MIX

DOS code pages

These code pages are used by IBM in its PC DOS operating system. These code pages were originally embedded directly in the text mode hardware of the graphic adapters used with the IBM PC and its clones, including the original MDA and CGA adapters whose character sets could only be changed by physically replacing a ROM chip that contained the font. The interface of those adapters was typically limited to single byte character sets with only 256 characters in each font/encoding.

301 – IBM-PC Japan DBCS
437 – Original IBM PC hardware code page
720 – Arabic
737 – Greek
775 – Latin-7
808 – Russian with euro
848 – Ukrainian with euro
849 – Belarusian with euro
850 – Latin-1
851 – Greek
852 – Latin-2
853 – Latin-3
855 – Cyrillic
856 – Hebrew
857 – Latin-5
858 – Latin-1 with euro symbol
859 – Latin-9
860 – Portuguese
861 – Icelandic
862 – Hebrew
863 – Canadian French
864 – Arabic
865 – Danish/Norwegian
866 – Belarusian, Russian, Ukrainian
867 – Hebrew + euro
868 – Urdu
869 – Greek
872 – Cyrillic with euro
874 – Thai with Low Tone Marks & Ancient Chars
876 – OCR A
877 – OCR B
878 – KOI8-R
891 – Korean PC SBCS
898 – IBM-PC WP Multilingual
899 – IBM-PC Symbol
903 – Simplified Chinese PC SBCS
904 – Traditional Chinese PC SBCS
906 – International Set #5 3812/3820
907 – ASCII APL
909 – IBM-PC APL2 Extended
910 – IBM-PC APL2
911 – IBM-PC Japan #1
926 – Korean PC DBCS
927 – Traditional Chinese PC DBCS
928 – Simplified Chinese PC DBCS
929 – Thai PC DBCS
932 – IBM-PC Japan MIX
934 – IBM-PC Korea MIX
936 – IBM-PC Simplified Chinese MIX
938 – IBM-PC Traditional Chinese MIX
942 – IBM-PC Japan MIX
943 – IBM-PC Japan OPEN
944 – IBM-PC Korea MIX
946 – IBM-PC Simplified Chinese
948 – IBM-PC Traditional Chinese
949 – Korean
951 – Korean DBCS
1034 – Printer Application - Shipping Label, Set #2
1040 – Korean Extended
1041 – Japanese Extended
1042 – Simplified Chinese Extended
1043 – Traditional Chinese Extended
1044 – Printer Application - Shipping Label, Set #1
1086 – IBM-PC Japan #1
1088 – Revised Korean
1092 – IBM-PC Modified Symbols
1098 – Farsi
1108 – DITROFF Base Compatibility
1109 – DITROFF Specials Compatibility
1115 – IBM-PC People's Republic of China
1116 – Estonian
1117 – Latvian
1118 – Lithuanian
1119 – Lithuanian and Russian
1125 – Cyrillic, Ukrainian
1127 – IBM-PC Arabic / French
1131 – IBM-PC Data, Cyrillic, Belarusian
1139 – Japan Alphanumeric Katakana
1161 – Thai with Low Tone Marks & Ancient Chars with euro
1167 – KOI8-RU
1168 – KOI8-U
1370 – Traditional Chinese MIX
1380 – IBM-PC Simplified Chinese GB PC-DATA
1381 – IBM-PC Simplified Chinese
1393 – Japanese JIS X 0213 DBCS
1394 – IBM-PC Japan

When dealing with older hardware, protocols and file formats, it is often necessary to support these code pages, but newer encoding systems, in particular Unicode, are encouraged for new designs.
DOS code pages are typically stored in .CPI files.

IBM AIX code pages

These code pages are used by IBM in its AIX operating system. They emulate several character sets, namely those designed according to ISO standards for use on systems such as UNIX-like operating systems.

367 – 7-bit US-ASCII
371 – 7-bit US-ASCII APL
806 – ISCII
813 – ISO 8859-7
819 – ISO 8859-1
895 – 7-bit Japan Latin
896 – 7-bit Japan Katakana Extended
901 – ISO 8859-13 with euro
902 – ISO Estonian with euro
912 – ISO 8859-2
913 – ISO 8859-3
914 – ISO 8859-4
915 – ISO 8859-5
916 – ISO 8859-8
919 – ISO 8859-10
920 – ISO 8859-9
921 – ISO 8859-13
922 – ISO Estonian
923 – ISO 8859-15
952 – EUC Japanese for JIS X 0208
953 – EUC Japanese for JIS X 0212
954 – EUC Japanese
955 – TCP Japanese, JIS X 0208-1978
956 – TCP Japanese
957 – TCP Japanese
958 – TCP Japanese
959 – TCP Japanese
960 – Traditional Chinese DBCS-EUC SICGCC Primary Set
961 – Traditional Chinese DBCS-EUC SICGCC Full Set + IBM Select + UDC
963 – Traditional Chinese TCP, CNS 11643 plane 2 only
964 – EUC Traditional Chinese
965 – TCP Traditional Chinese
970 – EUC Korean
971 – EUC Korean DBCS
1006 – ISO 8-bit Urdu
1008 – ISO 8-bit Arabic
1009 – 7-bit ISO IRV
1010 – 7-bit France
1011 – 7-bit Germany F.R.
1012 – 7-bit Italy
1013 – 7-bit United Kingdom
1014 – 7-bit Spain
1015 – 7-bit Portugal
1016 – 7-bit Norway
1017 – 7-bit Denmark
1018 – 7-bit Finland/Sweden
1019 – 7-bit Netherlands
1029 – Arabic Extended
1036 – CCITT T.61
1046 – Arabic Extended
1089 – ISO 8859-6
1111 – Variant of ISO 8859-2
1124 – ISO Ukrainian, similar to ISO 8859-5
1129 – ISO Vietnamese
1133 – ISO Lao
1163 – ISO Vietnamese with euro
1350 – EUC Japanese
1382 – EUC Simplified Chinese
1383 – EUC Simplified Chinese

Code page 819 is identical to Latin-1 (ISO/IEC 8859-1) and, with slightly modified commands, permits MS-DOS machines to use that encoding. It was used with IBM AS/400 minicomputers.

IBM OS/2 code pages

These code pages are used by IBM in its OS/2 operating system.

1004 – Latin-1 Extended, Desk Top Publishing/Windows

Windows emulation code pages

These code pages are used by IBM when emulating the Microsoft Windows character sets. Most of these code pages have the same number as Microsoft code pages, although they are not exactly identical. Some code pages, though, are new from IBM, not devised by Microsoft.

897 – IBM-PC SBCS Japanese
941 – IBM-PC Japanese DBCS for Open environment
947 – IBM-PC DBCS for
950 – Traditional Chinese MIX
1114 – IBM-PC SBCS
1126 – IBM-PC Korean SBCS
1162 – Windows Thai
1169 – Windows Cyrillic Asian
1174 – Windows Kazakh
1250 – Windows Central Europe
1251 – Windows Cyrillic
1252 – Windows Western
1253 – Windows Greek
1254 – Windows Turkish
1255 – Windows Hebrew
1256 – Windows Arabic
1257 – Windows Baltic
1258 – Windows Vietnamese
1360 – Korean JOHAB DBCS
1361 – Korean
1362 – Korean Hangul DBCS
1363 – Windows Korean
1372 – IBM-PC MS T Chinese Big5 encoding
1373 – Windows Traditional Chinese
1374 – IBM-PC DB Big5 encoding extension for HKSCS
1375 – Mixed Big5 encoding extension for HKSCS
1385 – IBM-PC Simplified Chinese DBCS
1386 – IBM-PC Simplified Chinese GBK
1391 – Simplified Chinese 4 Byte
1392 – IBM-PC Simplified Chinese MIX

Macintosh emulation code pages

These code pages are used by IBM when emulating the Apple Macintosh character sets.

1275 – Apple Roman
1280 – Apple Greek
1281 – Apple Turkish
1282 – Apple Central European
1283 – Apple Cyrillic
1284 – Apple Croatian
1285 – Apple Romanian
1286 – Apple Icelandic

Adobe emulation code pages

These code pages are used by IBM when emulating the Adobe character sets.

1038 – Adobe Symbol Encoding
1276 – Adobe Standard Encoding
1277 – Adobe Latin 1

HP emulation code pages

These code pages are used by IBM when emulating the HP character sets.

1050 – HP Roman Extension
1051 – HP Roman-8
1052 – HP Gothic Legal
1053 – HP Gothic-1
1054 – HP ASCII
1055 – HP PC-Line
1056 – HP Line Draw
1057 – HP PC-8
1058 – HP PC-8DN
1351 – Japanese DBCS HP character set
5039 – Japanese MIX

DEC emulation code pages

These code pages are used by IBM when emulating the DEC character sets.

1020 – 7-bit Canadian NRC Set
1021 – 7-bit Switzerland NRC Set
1023 – 7-bit Spanish NRC Set
1090 – Special Characters and Line Drawing Set
1100 – DEC Multinational
1101 – 7-bit British NRC Set
1102 – 7-bit Dutch NRC Set
1103 – 7-bit Finnish NRC Set
1104 – 7-bit French NRC Set
1105 – 7-bit Norwegian/Danish NRC Set
1106 – 7-bit Swedish NRC Set
1107 – 7-bit Norwegian/Danish NRC Alternate
1287 – DEC Greek
1288 – DEC Turkish

IBM Unicode code pages

1200 – UTF-16BE Unicode with IBM Private Use Area
1201 – UTF-16BE Unicode
1202 – UTF-16LE Unicode with IBM PUA
1203 – UTF-16LE Unicode
1208 – UTF-8 Unicode with IBM PUA
1209 – UTF-8 Unicode
1400 – ISO 10646 UCS-BMP
1401 – ISO 10646 UCS-SMP
1402 – ISO 10646 UCS-SIP
1414 – ISO 10646 UCS-SSP
1445 – IBM AFP PUA No. 1
1446 – ISO 10646 UCS-PUP15
1447 – ISO 10646 UCS-PUP16
1448 – UCS-BMP
1449 – IBM default PUA

Microsoft code pages

Windows code pages

These code pages are used by Microsoft in its own Windows operating system. Microsoft defined a number of code pages known as the ANSI code pages. Code page 1252 is built on ISO 8859-1 but uses the range 0x80-0x9F for extra printable characters rather than the C1 control codes from ISO 6429 mentioned by ISO 8859-1. Some of the others are based in part on other parts of ISO 8859 but often rearranged to make them closer to 1252.

42 – Windows Symbol
874 – Windows Thai
1250 – Windows Central Europe
1251 – Windows Cyrillic
1252 – Windows Western
1253 – Windows Greek
1254 – Windows Turkish
1255 – Windows Hebrew
1256 – Windows Arabic
1257 – Windows Baltic
1258 – Windows Vietnamese
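The difference between code page 1252 and ISO 8859-1 in the range 0x80-0x9F can be seen in a short Python sketch. It assumes Python's cp1252 and latin-1 codecs as stand-ins for the two encodings; the sample bytes are illustrative.

```python
import unicodedata

# Three bytes from the 0x80-0x9F range surrounded by ASCII text.
sample = b"\x93quoted\x94 \x80 5"

# Windows-1252 assigns printable characters to these bytes.
as_cp1252 = sample.decode("cp1252")

# ISO 8859-1 leaves the range to the C1 control codes.
as_latin1 = sample.decode("latin-1")

print(as_cp1252)
print([unicodedata.category(ch) for ch in as_latin1[:1]])
```

The first decoding yields curly quotation marks and a euro sign, while the second yields control characters (Unicode category Cc) in the same positions.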

Microsoft recommends new applications use UTF-8 or UCS-2/UTF-16 instead of these code pages.

DBCS code pages

These code pages represent DBCS character encodings for various CJK languages. In Microsoft operating systems, these are used as both the "OEM" and "Windows" code page for the applicable locale.

932 – Supports Japanese Shift-JIS
936 – Supports Simplified Chinese GB2312 or GBK
949 – Supports Korean Unified Hangul Code
950 – Supports Traditional Chinese Big5
951 – Supports Traditional Chinese Big5 with HKSCS
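In these double-byte encodings ASCII characters still occupy one byte, while CJK characters occupy two. The sketch below illustrates this, assuming that Python's bundled cp932, cp936, cp949 and cp950 codecs are close enough to the Microsoft code pages of the same number for the purpose; the sample strings and the helper name are illustrative.

```python
# One ASCII letter followed by two CJK characters for each code page.
samples = {
    "cp932": "A日本",
    "cp936": "A中文",
    "cp949": "A한글",
    "cp950": "A中文",
}


def bytes_per_character(text: str, codec_name: str) -> list[int]:
    """Return the encoded length in bytes of each character of text."""
    return [len(ch.encode(codec_name)) for ch in text]


widths = {name: bytes_per_character(text, name) for name, text in samples.items()}

for name, per_char in widths.items():
    print(name, per_char)
```

Because the byte count per character varies, the byte length of a string in these code pages cannot be used as its character count.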

MS-DOS code pages

These code pages are used by Microsoft in its MS-DOS operating system. Microsoft refers to these as the OEM code pages because they were defined by the original equipment manufacturers who licensed MS-DOS for distribution with their hardware, not by Microsoft or a standards organization. Most of these code pages have the same number as the equivalent IBM code pages, although some are not exactly identical.

708 – Arabic
720 – Arabic
737 – Greek
850 – Latin-1
851 – Greek
852 – Latin-2
855 – Cyrillic
857 – Latin-5
858 – Latin-1 with euro symbol
859 – Latin-9
860 – Portuguese
861 – Icelandic
862 – Hebrew
863 – Canadian French
864 – Arabic
865 – Danish/Norwegian
866 – Belarusian, Russian, Ukrainian
869 – Greek

Macintosh emulation code pages

These code pages are used by Microsoft when emulating the Apple Macintosh character sets.

10000 – Apple Macintosh Roman
10001 – Apple Japanese
10002 – Apple Traditional Chinese
10003 – Apple Korean
10004 – Apple Arabic
10005 – Apple Hebrew
10006 – Apple Greek
10007 – Apple Macintosh Cyrillic
10008 – Apple Simplified Chinese
10010 – Apple Romanian
10017 – Apple Ukrainian
10021 – Apple Thai
10029 – Apple Macintosh Central Europe
10079 – Apple Icelandic
10081 – Apple Turkish
10082 – Apple Croatian

Various other Microsoft code pages

The following code page numbers are specific to Microsoft Windows. IBM may use different numbers for these code pages. They emulate several character sets, namely those designed according to ISO standards for use on systems such as UNIX-like operating systems.

20000 – Traditional Chinese CNS
20001 – Traditional Chinese TCA
20002 – Traditional Chinese ETEN
20003 – Traditional Chinese IBM5500
20004 – Traditional Chinese TeleText
20005 – Traditional Chinese Wang
20105 – 7-bit IA5 IRV
20106 – 7-bit IA5 German
20107 – 7-bit IA5 Swedish
20108 – 7-bit IA5 Norwegian
20127 – 7-bit US-ASCII
20261 – CCITT T.61
20269 – ISO 6937
20273
20277
20278
20284
20285
20290 – Japanese language in EBCDIC
20297
20420
20423
20424
20833
20838
20866 – KOI8-R
20871
20880 – EBCDIC Cyrillic
20905
20924
20932 – EUC-JP
20936
20949
21025 – EBCDIC Cyrillic
21027
21866 – KOI8-U
28591 – ISO-8859-1
28592 – ISO-8859-2
28593 – ISO-8859-3
28594 – ISO-8859-4
28595 – ISO-8859-5
28596 – ISO-8859-6
28597 – ISO-8859-7
28598 – ISO-8859-8
28599 – ISO-8859-9
28600 – ISO-8859-10
28601 – ISO-8859-11
28602 – not used
28603 – ISO-8859-13
28604 – ISO-8859-14
28605 – ISO-8859-15
28606 – ISO-8859-16
38596 – ISO-8859-6
38598 – ISO-8859-8

Microsoft Unicode code pages

1200 – UTF-16LE Unicode
1201 – UTF-16BE Unicode
12000 – UTF-32LE Unicode
12001 – UTF-32BE Unicode
65000 – UTF-7 Unicode
65001 – UTF-8 Unicode
65520 – Empty Unicode Plane
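A code page number stored alongside binary data can select the decoding algorithm. The following is a minimal sketch for the Microsoft numbers listed above, assuming that the Python codec names in the table are suitable equivalents; the table and function names are invented for this illustration.

```python
# Microsoft code page numbers for Unicode encodings, mapped to Python codec names.
# The codec names are assumptions made for this sketch.
MS_UNICODE_CODE_PAGES = {
    1200: "utf-16-le",
    1201: "utf-16-be",
    12000: "utf-32-le",
    12001: "utf-32-be",
    65000: "utf-7",
    65001: "utf-8",
}


def decode_with_code_page(data: bytes, code_page: int) -> str:
    """Decode data using the Unicode encoding identified by a Microsoft code page number."""
    try:
        codec_name = MS_UNICODE_CODE_PAGES[code_page]
    except KeyError:
        raise ValueError(f"not a known Unicode code page: {code_page}") from None
    return data.decode(codec_name)


print(decode_with_code_page(b"\xe9\x00", 1200))
print(decode_with_code_page(b"\xc3\xa9", 65001))
```

The table is vendor-specific: in the IBM list earlier in this article, 1200 denotes UTF-16BE rather than UTF-16LE, so the same number cannot be interpreted without knowing whose assignments apply.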

HP Symbol Sets

HP developed a series of Symbol Sets to encode either its own character sets or other vendors’ character sets. They are normally 7-bit character sets which, when moved to the higher part and associated with the ASCII character set, make up 8-bit character sets.

HP's own Symbol Sets

Symbol Set 0E — HP Roman Extension — 7-bit character set with accented letters
Symbol Set 0G — HP 7-bit German
Symbol Set 0L — HP 7-bit PC Line
Symbol Set 0M — HP Math-7
Symbol Set 0T — HP Thai-8
Symbol Set 1S — HP 7-bit Spanish
Symbol Set 1U — HP 7-bit Gothic Legal
Symbol Set 4Q — HP Line Draw
Symbol Set 4U — HP Roman-9 — Roman-8 + €
Symbol Set 7J — HP Desktop
Symbol Set 7S — HP 7-bit European Spanish
Symbol Set 8E — HP East-8
Symbol Set 8G — HP Greek-8
Symbol Set 8H — HP Hebrew-8
Symbol Set 8I — MS LineDraw
Symbol Set 8K — HP Kana-8
Symbol Set 8L — HP LineDraw
Symbol Set 8M — HP Math-8
Symbol Set 8R — HP Cyrillic-8
Symbol Set 8S — HP 7-bit Latin American Spanish
Symbol Set 8T — HP Turkish-8
Symbol Set 8U — HP Roman-8
Symbol Set 8V — HP Arabic-8
Symbol Set 9K — HP Korean-8
Symbol Set 9T — PC 8T
Symbol Set 9V — Latin / Arabic for Windows
Symbol Set 11U — PC 8D/N
Symbol Set 14G — PC-8 Greek Alternate
Symbol Set 18K —
Symbol Set 18T —
Symbol Set 19C —
Symbol Set 19K —

Symbol Sets from other vendors

Symbol Set 0D — ISO 60: 7-bit Norwegian
Symbol Set 0F — ISO 25: 7-bit French
Symbol Set 0H — HP 7-bit Hebrew — Practically the same as Israeli Standard SI 960
Symbol Set 0I — ISO 15: 7-bit Italian
Symbol Set 0K — ISO 14: 7-bit Japanese Katakana
Symbol Set 0N — ISO 8859-1 Latin 1
Symbol Set 0R — ISO 8859-5 Latin/Cyrillic
Symbol Set 0S — ISO 11: 7-bit Swedish
Symbol Set 0U — ISO 6: 7-bit U.S.
Symbol Set 0V — Arabic
Symbol Set 1D — ISO 61: 7-bit Norwegian
Symbol Set 1E — ISO 4: 7-bit U. K.
Symbol Set 1F — ISO 69: 7-bit French
Symbol Set 1G — ISO 21: 7-bit German
Symbol Set 1K — ISO 13: 7-bit Japanese Latin
Symbol Set 1T — Windows Thai
Symbol Set 2K — ISO 57: 7-bit Simplified Chinese Latin
Symbol Set 2N — ISO 8859-2 Latin 2
Symbol Set 2S — ISO 17: 7-bit Spanish
Symbol Set 2U — ISO 2: 7-bit International Reference Version
Symbol Set 3N — ISO 8859-3 Latin 3
Symbol Set 3R — PC-866 Russia
Symbol Set 3S — ISO 10: 7-bit Swedish
Symbol Set 4N — ISO 8859-4 Latin 4
Symbol Set 4S — ISO 16: 7-bit Portuguese
Symbol Set 5M — PS Math Symbol
Symbol Set 5N — ISO 8859-9 Latin 5
Symbol Set 5S — ISO 84: 7-bit Portuguese
Symbol Set 5T — Windows 3.1 Latin-5
Symbol Set 6J — Microsoft Publishing
Symbol Set 6M — Ventura Math
Symbol Set 6N — ISO 8859-10 Latin 6
Symbol Set 6S — ISO 85: 7-bit Spanish
Symbol Set 7H — ISO 8859-8 Latin/Hebrew
Symbol Set 9E — Windows 3.1 Latin 2
Symbol Set 9G — Windows 98 Greek
Symbol Set 9J — PC 1004
Symbol Set 9L — Ventura ITC Zapf Dingbats
Symbol Set 9N — ISO 8859-15 Latin 9
Symbol Set 9R — Windows 98 Cyrillic
Symbol Set 9U — Windows 3.0
Symbol Set 10G — PC-851 Latin/Greek
Symbol Set 10J — PS Text
Symbol Set 10L — PS ITC Zapf Dingbats
Symbol Set 10N — ISO 8859-5 Latin/Cyrillic
Symbol Set 10R — PC-855 Cyrillic
Symbol Set 10T — Teletex
Symbol Set 10U — PC-8
Symbol Set 10V — CP-864
Symbol Set 11G — CP-869
Symbol Set 11J — PS ISO Latin-1
Symbol Set 11N — ISO 8859-6 Latin/Arabic
Symbol Set 12G — PC Latin/Greek
Symbol Set 12J — MC Text
Symbol Set 12N — ISO 8859-7 Latin/Greek
Symbol Set 12R — PC Gost
Symbol Set 12U — PC-850 Latin 1
Symbol Set 13J — Ventura International
Symbol Set 13R — PC Bulgarian
Symbol Set 13U — PC-858 Latin 1 + €
Symbol Set 14J — Ventura U. S.
Symbol Set 14L — Windows Dingbats
Symbol Set 14P — ABICOMP International
Symbol Set 14R — PC Ukrainian
Symbol Set 15H — PC-862 Israel
Symbol Set 16U — PC-857 Latin 5
Symbol Set 17U — PC-852 Latin 2
Symbol Set 18N — UTF-8
Symbol Set 18U — PC-853 Latin 3
Symbol Set 19L — Windows 98 Baltic
Symbol Set 19M — Windows Symbol
Symbol Set 19U — Windows 3.1 Latin 1
Symbol Set 20U — PC-860 Portugal
Symbol Set 21U — PC-861 Iceland
Symbol Set 23U — PC-863 Canada - French
Symbol Set 24Q — PC-Polish Mazowia
Symbol Set 25U — PC-865 Denmark/Norway
Symbol Set 26U — PC-775 Latin 7
Symbol Set 27Q — PC-8 PC Nova
Symbol Set 27U — PC Latvian Russian
Symbol Set 28U — PC Lithuanian/Russian
Symbol Set 29U — PC-772 Lithuanian/Russian

Code pages from other vendors

These code pages are independent assignments by third-party vendors. Since the original IBM PC code page was not really designed for international use, several partially compatible country or region specific variants emerged.
These code page number assignments are official neither at IBM nor at Microsoft, and almost none of them is listed as a usable character set by IANA. The numbers assigned to these code pages are arbitrary and may clash with registered numbers in use by IBM or Microsoft. Some of them may predate codepage switching being added in DOS 3.3.

100 – DOS Hebrew hardware fontpage
111 – DOS Greek
112 – DOS Turkish
113 – DOS Yugoslavian
151 – DOS Nafitha Arabic
152 – DOS Nafitha Arabic
161 – DOS Arabic
162 – DOS Arabic with vowel diacritics
163 – DOS Arabic and French
164 – DOS Arabic and French with vowel diacritics
165 – DOS Arabic
166 – IBM Arabic PC
190 – DEC DOS German
210 – DEC DOS Greek
220 – DEC DOS Spanish
489 – Czechoslovakian
620 – DOS Polish (Mazovia)
667 – DOS Polish (Mazovia)
668 – DOS Polish
706 – MS-DOS Server Arabic Sakhr
707 – MS-DOS Arabic Sakhr
709 – MS-DOS Arabic
710 – MS-DOS Arabic
711 – MS-DOS Arabic Nafitha Enhanced
714 – MS-DOS Arabic Sakr
715 – MS-DOS Arabic APTEC
721 – MS-DOS Arabic Nafitha International
768 – Arabic Al-Arabi
770 – DOS Estonian, Latvian, Lithuanian
771 – DOS Lithuanian/Cyrillic — KBL
772 – DOS Lithuanian/Cyrillic
773 – DOS Latin-7 — KBL
774 – DOS Lithuanian
775 – DOS Latin-7 Baltic Rim
776 – DOS Lithuanian
777 – DOS Accented Lithuanian — KBL
778 – DOS Accented Lithuanian
790 – DOS Polish (Mazovia) with curly quotation marks
854 – Spanish
881 – Latin 1
882 – Latin 2
883 – Latin 3
884 – Latin 4
885 – Latin 5
895 – Czech (Kamenický)
896 – DOS Polish (Mazovia)
900 – DOS Russian
928 – Greek; same as Greek National Standard ELOT 928
966 – Saudi Arabian
972 – Hebrew
991 – DOS Polish (Mazovia)
999 – DOS Serbo-Croatian I; also known as PC Nova and CroSCII; lower part is JUSI.B1.002, upper part is code page 437; supports Slovenian and Serbo-Croatian
1001 – Arabic
1261 – Windows Korean IBM-1261 LMBCS-17, similar to 1363
1270 – Windows Sámi
1300 – ANSI
2001 – Lithuanian KBL; same as code page 771
3001 – Estonian 1; same as code page 1116
3002 – Estonian 2; same as code page 922
3011 – Latvian 1; same as code page 437-Latvian
3012 – Latvian-2; same as code page 866-Latvian
3021 – Bulgarian; same as MIK
3031 – Hebrew; same as code page 862
3041 – Maltese; same as ISO 646 Maltese
3840 – IBM-Russian; nearly the same as CP 866
3841 – Gost-Russian; GOST 13052 plus characters for Central Asian languages
3843 – Polish; same as Mazovia
3844 – CS2; same as Kamenický
3845 – Hungarian; same as CWI
3846 – Turkish; same as PC-8 Turkish + old Turkish Lira sign at code point A8
3847 – Brazil-ABNT; same as the Brazilian National Standard NBR-9614:1986
3848 – Brazil-ABICOMP; same as ABICOMP
3850 – Standard KU; variation of the Kasetsart University encoding for Thai
3860 – Rajvitee KU; variation of the Kasetsart University encoding for Thai
3861 – Microwiz KU; variation of the Kasetsart University encoding for Thai
3863 – STD988 TIS; variation of the TIS 620 encoding for Thai
3864 – Popular TIS; variation of the TIS 620 encoding for Thai
3865 – Newsic TIS; variation of the TIS 620 encoding for Thai
28799 – FOCAL; same as FOCAL character set
28800 – HP RPL; same as RPL
– CWI-2 supports Hungarian
– MIK supports Bulgarian
– DOS Serbo-Croatian II; supports Slovenian and Serbo-Croatian
– Russian Alternative code page; this is the origin for IBM CP 866

List of code page assignments

List of known code page assignments:

ID	Names	Description	Origin	Platform	DOS	OS/2	Windows	Mac	Else	Encoding	Comment
0		Reserved	IBM, Microsoft		3.3+	1.0+					Internal OS use
437	CP437, IBM437	PC US	IBM	IBM PC	3.3+	1.0+				8-bit SBCS
57344 - 61439		Private use derivations	IBM								Private use code page derivations
65280 - 65533		Private use definitions	IBM								Private use code page definitions
65534		Reserved	IBM, Microsoft								Internal OS use
65535		Reserved	IBM, Microsoft		3.3+	1.0+					Internal OS use
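The reserved and private-use ranges in the table can be expressed as a small classifier. This is a sketch only: the function name and the returned labels are made up for illustration, and every identifier outside the ranges shown in the table is simply reported as "assignable".

```python
def classify_code_page_id(cp_id: int) -> str:
    """Classify a 16-bit code page identifier using the ranges from the table above."""
    if not 0 <= cp_id <= 0xFFFF:
        raise ValueError("code page identifiers are 16-bit numbers")
    if cp_id in (0, 65534, 65535):
        return "reserved"
    if 57344 <= cp_id <= 61439:
        return "private use derivation"
    if 65280 <= cp_id <= 65533:
        return "private use definition"
    # Anything else may carry a vendor assignment, such as 437.
    return "assignable"


for cp_id in (0, 437, 57344, 65280, 65535):
    print(cp_id, classify_code_page_id(cp_id))
```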

Criticism

Many older character encodings suffer from several problems. Some vendors insufficiently document the meaning of all code point values in their code pages, which decreases the reliability of handling textual data consistently through various computer systems. Some vendors add proprietary extensions to established code pages, to add or change certain code point values: for example, byte 0x5C in Shift JIS can represent either a backslash or a yen sign depending on the platform. Finally, in order to support several languages in a program that does not use Unicode, the code page used for each string/document needs to be stored.
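The need to store the code page with each string can be sketched as follows. The TaggedText class is hypothetical, and Python codec names stand in for code page numbers; the example shows the same three bytes producing different text depending on the stored tag.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaggedText:
    """Raw bytes plus the encoding needed to interpret them (hypothetical type)."""

    raw: bytes
    codec: str  # Python codec name standing in for a code page number

    def to_unicode(self) -> str:
        return self.raw.decode(self.codec)


# Identical bytes, different meaning depending on the stored code page.
greek = TaggedText(b"\xe1\xe2\xe3", "iso8859_7")
cyrillic = TaggedText(b"\xe1\xe2\xe3", "cp1251")

print(greek.to_unicode())
print(cyrillic.to_unicode())
```

Without the tag, the bytes alone do not reveal whether Greek or Cyrillic text was intended.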
Applications may also mislabel text in Windows-1252 as ISO-8859-1. The only difference between these code pages is that the code point values in the range 0x80–0x9F, used by ISO-8859-1 for control characters, are instead used as additional printable characters in Windows-1252, notably for quotation marks, the euro sign and the trademark symbol among others. Browsers on non-Windows platforms would tend to show empty boxes or question marks for these characters, making the text hard to read. Most browsers fixed this by ignoring the declared character set and interpreting the text as Windows-1252 so that it looks acceptable. In HTML5, treating ISO-8859-1 as Windows-1252 is even codified as a W3C standard. Although browsers were typically programmed to deal with this behaviour, this was not always true of other software. Consequently, when receiving a file transfer from a Windows system, non-Windows platforms would either ignore these characters or treat them as standard control characters and attempt to take the specified control action accordingly.
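The mislabelling problem can be reproduced with a short Python sketch. It assumes Python's cp1252 and latin-1 codecs as implementations of Windows-1252 and ISO-8859-1, and the sample bytes are made up for illustration. Only byte values that Python's cp1252 codec defines are used, since a few values in the 0x80–0x9F range are unassigned there.

```python
import unicodedata

# Illustrative bytes as a Windows program might write them:
# curly quotes (0x93, 0x94), euro sign (0x80), trademark sign (0x99).
data = bytes([0x93, 0x48, 0x69, 0x94, 0x20, 0x80, 0x35, 0x20, 0x99])

as_cp1252 = data.decode("cp1252")   # intended reading
as_latin1 = data.decode("latin-1")  # reading under the wrong label

# Under ISO-8859-1 the four high bytes become C1 control characters
# (Unicode general category "Cc") instead of printable symbols.
c1_controls = [ch for ch in as_latin1 if unicodedata.category(ch) == "Cc"]

print([f"U+{ord(ch):04X}" for ch in as_cp1252])
print([f"U+{ord(ch):04X}" for ch in c1_controls])
```

With the correct label the high bytes decode to U+201C, U+201D, U+20AC and U+2122. With the ISO-8859-1 label they decode to U+0093, U+0094, U+0080 and U+0099, which most software either drops or treats as control codes.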
Due to Unicode's extensive documentation, vast repertoire of characters and stability policy of characters, the problems listed above are rarely a concern for Unicode. UTF-8 has replaced the code-page method in terms of popularity on the Internet.

Private code pages

When, early in the history of personal computers, users did not find their character encoding requirements met, private or local code pages were created using terminate-and-stay-resident utilities or by re-programming BIOS EPROMs. In some cases, unofficial code page numbers were invented.
When more diverse character set support became available, most of those code pages fell into disuse, with some exceptions such as the Kamenický or KEYBCS2 encoding for the Czech and Slovak alphabets. Another example is the Iran System encoding standard, which was created by the Iran System corporation for Persian language support. This standard was used in Iran in DOS-based programs and became obsolete after the introduction of Microsoft code page 1256. However, some Windows and DOS programs using this encoding are still in use, and some Windows fonts with this encoding exist.
In order to overcome such problems, the IBM Character Data Representation Architecture level 2 specifically reserves ranges of code page IDs for user-definable and private-use assignments. Whenever such code page IDs are used, the user must not assume that the same functionality and appearance can be reproduced in another system configuration or on another device or system unless the user takes care of this specifically.
The code page range 57344-61439 is officially reserved for user-definable code pages, whereas the range 65280-65533 is reserved for any user-definable "private use" assignments.
For example, a non-registered custom variant of code page 437 or 28591 could become 57781 or 61359, respectively, in order to avoid potential conflicts with other assignments while maintaining the internal numerical logic that sometimes exists in the assignments of the original code pages. An assignment in the private range, such as 65280, could be used for an unregistered private code page not based on an existing code page; for a device-specific code page, such as a printer font, that just needs a logical handle to become addressable by the system; for a frequently changing download font; or for a code page number with a symbolic meaning in the local environment.
The code page IDs 0, 65534 and 65535 are reserved for internal use by operating systems such as DOS and must not be assigned to any specific code pages.
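The reserved and private-use ranges described above can be summarised in a small Python sketch. The function name classify_code_page_id and the category labels are hypothetical, chosen only for illustration; the numeric boundaries are the ones given in this section, and the 16-bit limit follows from the numbering system itself.

```python
def classify_code_page_id(cpid: int) -> str:
    """Classify a 16-bit code page ID using the ranges given in this section.

    Hypothetical helper for illustration; not part of any vendor API.
    """
    if not 0 <= cpid <= 0xFFFF:
        raise ValueError("code page IDs are 16-bit numbers")
    if cpid in (0, 65534, 65535):
        return "reserved for internal OS use"
    if 57344 <= cpid <= 61439:
        return "private-use derivation"
    if 65280 <= cpid <= 65533:
        return "private-use definition"
    return "outside the reserved and private-use ranges"


# The examples from the text: custom variants of 437 and 28591,
# and a purely private assignment.
for cpid in (437, 57781, 61359, 65280, 65535):
    print(cpid, "->", classify_code_page_id(cpid))
```

A check like this could be used to reject attempts to register a private code page under a reserved ID, or to warn that text tagged with a private-use ID may not be reproducible on another system.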