PETSCII
PETSCII, also known as CBM ASCII, is the character set used in Commodore Business Machines' 8-bit home computers.
This character set was first used on the PET in 1977 and subsequently on the CBM-II, VIC-20, Commodore 64, Commodore 16, Commodore 116, Plus/4, and Commodore 128. The Amiga personal computer family, however, uses standard ISO/IEC 8859-1 instead.
History
The character set was largely designed by Leonard Tramiel and PET designer Chuck Peddle. The graphic characters of PETSCII were one of the extensions Commodore specified for Commodore BASIC when laying out desired changes to Microsoft's existing 6502 BASIC to Microsoft's Ric Weiland in 1977. The VIC-20 used the same font as the PET, pixel for pixel, although the characters appeared wider due to the VIC's 22-column screen. The Commodore 64, however, used a slightly redesigned, heavy upper-case font, essentially a thicker version of the PET's, in order to avoid color artifacts created by the machine's higher-resolution screen. The C64's lowercase characters are identical to the lowercase characters in the font of the Atari 8-bit computers.
Peddle claims the inclusion of card suit symbols was spurred by the demand that it should be easy to write card games on the PET.
Specifications
"Unshifted" PETSCII is based on the 1963 version of ASCII. It has only uppercase letters, an up-arrow instead of caret at 0x5E and a left-arrow instead of an underscore at 0x5F. In all versions except the original Commodore PET, it also has a British pound sign instead of the backslash at 0x5C. Other characters added in ASCII-1967 do not exist in PETSCII. Codes 0xA0-0xDF are allotted to CBM-specific block graphics characters—horizontal and vertical lines, hatches, shades, triangles, circles and card suits.PETSCII also has a "shifted" mode, which changes the uppercase letters at 0x41-0x5A to lowercase, and changes the graphics at 0xC1-0xDA to uppercase letters. Upper- and lower-case are swapped from where ASCII has them. The mode is toggled by holding one of the SHIFT keys and then pressing and releasing the Commodore key. The shift can be done by POKEing location 59468 with the value 14 to select the alternative set or 12 to revert to standard. On the Commodore 64, the sets are alternated by flipping bit 2 of the byte 53272. On some models of PET, this can also be achieved via special control code
PRINT CHR$ which adjust the line spacing as well as changing the character set; the POKE method is still available and does not alter the line spacing.Included in PETSCII are cursor and screen control codes, such as
, , , and . The control codes appeared in program listings as reverse-video graphic characters, although some computer magazines, in their efforts to provide more clearly readable listings, pretty-printed the codes using their actual names in curly braces, like the above examples. This is unambiguous as PETSCII has no curly brace characters.Different mappings are used for storing characters and displaying characters. For example, to display the characters "@ABC" on screen by directly writing into the screen memory, one would POKE the decimal values 0, 1, 2, and 3 rather than 64, 65, 66, and 67.
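The difference between stored codes and screen codes can be sketched in Python. Only the "@ABC" case (64-67 becoming 0-3) comes from the text above; the remaining ranges follow the commonly documented Commodore 64 conversion and should be read as an assumption, as should the hypothetical function name `petscii_to_screen_code`.

```python
def petscii_to_screen_code(code: int) -> int:
    """Convert a printable PETSCII code to a screen-memory code.

    Assumption: the range offsets are the commonly documented
    Commodore 64 conversion; other machines may differ.
    """
    if not 0 <= code <= 0xFF:
        raise ValueError("PETSCII codes are single bytes")
    if code < 0x20 or 0x80 <= code < 0xA0:
        # Control codes have no glyph of their own in screen memory.
        raise ValueError("control codes have no screen code")
    if code < 0x40:
        return code            # 0x20-0x3F: unchanged
    if code < 0x60:
        return code - 0x40     # 0x40-0x5F: "@ABC..." becomes 0, 1, 2, ...
    if code < 0x80:
        return code - 0x20     # 0x60-0x7F
    if code < 0xC0:
        return code - 0x40     # 0xA0-0xBF
    if code < 0xFF:
        return code - 0x80     # 0xC0-0xFE
    return 0x5E                # 0xFF


# The "@ABC" example from the text: PETSCII 64-67 become 0-3.
print([petscii_to_screen_code(c) for c in (64, 65, 66, 67)])
```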
The keyboard by default provides access to the lower half of the code page. Pressing Shift and a key gives the corresponding upper half code point. Some PETSCII code points cannot be printed and are only used for keyboard input.
Character set
The tables below represent the "interchange" PETSCII encoding, as used by CHR$.
Control characters are defined in the ranges 0x00-0x1F and 0x80-0x9F, although which control characters are defined and what they are defined as varies between systems. The tables below exclude control characters; their encoding is discussed in § Control characters.
The ranges 0x60-0x7F and 0xE0-0xFF are duplicate ranges, although what they duplicate varies between systems. On the Commodore PET, they duplicate 0x20-0x3F and 0xA0-0xBF, respectively; on the Commodore VIC-20, 64, 16, and 128 they duplicate 0xC0-0xDF and 0xA0-0xBF, respectively. While these characters are visually duplicates, they are semantically different; for example, on the Commodore PET, code points 0x2C and 0x6C both produce a comma character, but only 0x2C functions as a delimiter between input fields.
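The duplicate ranges described above can be folded onto the codes they mirror. The following Python sketch uses only the ranges stated in this section; the function name `fold_duplicate` and the `system` labels are hypothetical. Because the duplicates are semantically different, such folding is suitable for comparing glyphs only, not for interpreting input.

```python
def fold_duplicate(code: int, system: str = "c64") -> int:
    """Map a code in a duplicate range to the code it visually mirrors.

    system: "pet" for the Commodore PET; any other value is treated as
    the VIC-20 / 64 / 16 / 128 behaviour (labels are illustrative).
    """
    if not 0 <= code <= 0xFF:
        raise ValueError("PETSCII codes are single bytes")
    if 0x60 <= code <= 0x7F:
        # PET: mirrors 0x20-0x3F; other systems: mirrors 0xC0-0xDF.
        base = 0x20 if system == "pet" else 0xC0
        return base + (code - 0x60)
    if 0xE0 <= code <= 0xFF:
        # All listed systems: mirrors 0xA0-0xBF.
        return 0xA0 + (code - 0xE0)
    return code


# On the PET, 0x6C looks like the comma at 0x2C.
print(hex(fold_duplicate(0x6C, "pet")))
```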
Graphic characters are mostly identical across systems, with the exceptions of 0x5C, 0xDE, and the range 0x60-0x7F. Additionally, in the Commodore PET 2001's shifted character set, uppercase and lowercase letters are swapped relative to other systems.
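As an illustration of the letter placement in the shifted set, the sketch below converts ASCII text to shifted PETSCII bytes, with a flag for the swapped PET 2001 layout. It relies only on the ranges given in this article (lowercase at 0x41-0x5A, uppercase at 0xC1-0xDA on most systems). The function name and the decision to reject ASCII characters with no counterpart at the same code are assumptions made for this example.

```python
def ascii_to_shifted_petscii(text: str, pet_2001: bool = False) -> bytes:
    """Encode ASCII text for the shifted PETSCII character set.

    Standard layout: lowercase at 0x41-0x5A, uppercase at 0xC1-0xDA.
    With pet_2001=True the two letter ranges are swapped.
    """
    out = bytearray()
    for ch in text:
        code = ord(ch)
        if "A" <= ch <= "Z" or "a" <= ch <= "z":
            index = ord(ch.upper()) - ord("A")
            # True when this letter belongs in the 0x41-0x5A range.
            in_low_range = ch.islower() != pet_2001
            out.append((0x41 if in_low_range else 0xC1) + index)
        elif 0x20 <= code <= 0x40 or ch in "[]":
            out.append(code)   # same position as in ASCII
        else:
            # e.g. backslash, caret, underscore, braces: different or absent
            raise ValueError(f"no direct PETSCII equivalent for {ch!r}")
    return bytes(out)


print(ascii_to_shifted_petscii("Hi").hex())
```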
Unicode equivalents
PETSCII characters are represented in the Unicode standard in various blocks:
- Basic Latin
- Latin-1 Supplement
- Greek and Coptic
- General Punctuation
- Arrows
- Box Drawing
- Block Elements
- Geometric Shapes
- Miscellaneous Symbols
- Dingbats
- Symbols for Legacy Computing
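A small part of this mapping can be sketched from facts stated earlier in the article: unshifted codes 0x20-0x5F follow ASCII except for the pound sign at 0x5C (backslash on the original PET), the up-arrow at 0x5E and the left-arrow at 0x5F. The Python function below is a hypothetical helper covering only that range; the graphics characters are deliberately left out.

```python
def unshifted_petscii_to_unicode(code: int, original_pet: bool = False) -> str:
    """Map an unshifted PETSCII code in 0x20-0x5F to a Unicode character."""
    if not 0x20 <= code <= 0x5F:
        raise ValueError("only 0x20-0x5F is covered by this sketch")
    if code == 0x5C:
        # Backslash on the original PET, pound sign (Latin-1 Supplement) elsewhere.
        return "\\" if original_pet else "\u00a3"
    if code == 0x5E:
        return "\u2191"   # UPWARDS ARROW (Arrows block)
    if code == 0x5F:
        return "\u2190"   # LEFTWARDS ARROW (Arrows block)
    return chr(code)      # identical to ASCII (Basic Latin)


print("".join(unshifted_petscii_to_unicode(c) for c in (0x41, 0x5C, 0x5E, 0x5F)))
```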
Standard
Unshifted
Shifted
Commodore PET
Unshifted
Shifted
Control characters
While the graphic characters were mostly shared between Commodore systems, the control characters frequently varied. The following table describes what the control characters represent on the Commodore PET 2001, Commodore PET 8032, VIC-20, Commodore 64, Commodore 16, and Commodore 128.
The colors of the VIC-20 and C64/128 are listed in the VIC-II article.
Base 128
Out of PETSCII's first 192 codes, there are 128 graphic characters: 32–127 and 160–191. This permits "base128"-style encodings in DATA statements, or perhaps between PETSCII-speaking machines. This can also include control characters, which are visible when quoted, although which control characters are defined varies between systems.
The primary application for a "Base 128" encoding is in DATA statements in Commodore BASIC. Binary data can be stored with relatively low overhead, since each character encodes seven bits of data. On a standard 80-character line, typically four characters are used for the line number and two characters for the abbreviated DATA statement. Since the comma and colon are significant to BASIC, a quote character is also needed, leaving 73 characters for data. At seven bits per character, one DATA line could store 511 bits of binary data out of the 640 bits the line occupies, for about 79.8% efficiency. If three-digit line numbers are used, efficiency increases to about 80.9%. If two-digit line numbers are used, efficiency is about 82.0%.
| Line Numbers | Data Chars per Line | Bits per Line | Efficiency | Max. Lines | Max. Total Data Bytes |
| 1-9 | 76 | 532 | 83.1% | 9 | 598 |
| 10-99 | 75 | 525 | 82.0% | 90 | 5,906 |
| 100-999 | 74 | 518 | 80.9% | 900 | 58,275 |
| 1000-9999 | 73 | 511 | 79.8% | 9,000 | 574,875 |
| 10000-65535 | 72 | 504 | 78.8% | 55,536 | 3,498,768 |
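The figures in the table follow from simple arithmetic, reproduced here as a Python sketch. It assumes, as the text does, an 80-character line, two characters for the abbreviated DATA keyword, one quote character, and seven bits per data character; efficiency is taken as data bits divided by the 640 bits of a full line. The function name is hypothetical.

```python
LINE_WIDTH = 80        # characters per BASIC line, as assumed in the text
BITS_PER_CHAR = 7      # one "Base 128" character carries seven bits
FIXED_OVERHEAD = 3     # abbreviated DATA (2 characters) + opening quote (1)


def data_line_stats(digits: int, line_count: int):
    """Return (data chars, bits per line, efficiency, total whole bytes)."""
    data_chars = LINE_WIDTH - digits - FIXED_OVERHEAD
    bits = data_chars * BITS_PER_CHAR
    efficiency = bits / (LINE_WIDTH * 8)
    total_bytes = line_count * bits // 8   # fractional bytes are dropped
    return data_chars, bits, efficiency, total_bytes


for digits, count in ((1, 9), (2, 90), (3, 900), (4, 9000), (5, 55536)):
    chars, bits, eff, total = data_line_stats(digits, count)
    print(digits, chars, bits, f"{eff:.1%}", total)
```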
For storing binary data in Commodore BASIC, two- or three-digit line numbers typically offer the best balance between per-line efficiency and total capacity.
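A minimal sketch of such an encoding, in Python rather than BASIC, packs bytes into seven-bit groups and maps each group onto one of the 128 graphic codes (32-127 and 160-191). The bit order, the zero padding of the final group and the function names are choices made for this example, not an established format. The sketch also ignores that code 34 (the quote character) would need special handling inside a quoted DATA string.

```python
# The 128 graphic codes among PETSCII's first 192 codes: 96 + 32.
ALPHABET = list(range(32, 128)) + list(range(160, 192))
INDEX = {code: value for value, code in enumerate(ALPHABET)}


def encode_base128(data: bytes) -> list:
    """Pack bytes into 7-bit groups (most significant bit first)."""
    acc, nbits, out = 0, 0, []
    for byte in data:
        acc = (acc << 8) | byte
        nbits += 8
        while nbits >= 7:
            nbits -= 7
            out.append(ALPHABET[(acc >> nbits) & 0x7F])
        acc &= (1 << nbits) - 1          # keep only the unconsumed bits
    if nbits:
        # Pad the last group with zero bits on the right.
        out.append(ALPHABET[(acc << (7 - nbits)) & 0x7F])
    return out


def decode_base128(codes, length: int) -> bytes:
    """Inverse of encode_base128; length is the original byte count."""
    acc, nbits, out = 0, 0, bytearray()
    for code in codes:
        acc = (acc << 7) | INDEX[code]
        nbits += 7
        if nbits >= 8:
            nbits -= 8
            out.append((acc >> nbits) & 0xFF)
            acc &= (1 << nbits) - 1
    return bytes(out[:length])


payload = bytes([0xFF, 0x00, 0x80])
codes = encode_base128(payload)
print(codes, decode_base128(codes, len(payload)) == payload)
```

Seven input bytes fill exactly eight characters, which is where the seven-bits-per-character figure comes from.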