Control character


In computing and telecommunications, a control character or non-printing character is a code point in a character set that does not represent a written character or symbol. Control characters are used as in-band signaling to cause effects other than the addition of a symbol to the text. All other characters are mainly graphic characters, also known as printing characters, except perhaps for "space" characters. In the ASCII standard there are 33 control characters, such as code 7, BEL, which might ring a bell.

History

Procedural signs in Morse code are a form of control character.
A form of control character was introduced in the 1870 Baudot code: NUL and DEL.
The 1901 Murray code added the carriage return and line feed, and other versions of the Baudot code included other control characters.
The bell character, which rang a bell to alert operators, was also an early teletype control character.
Some control characters have also been called "format effectors".

In ASCII

ASCII defined quite a few control characters. Early terminals had very primitive mechanical or electrical controls that made any kind of state-remembering API quite expensive to implement, so a different code was required for each function. All entries in the ASCII table below code 32 (decimal) are control characters (the C0 set), including CR and LF, which are used to separate lines of text. Code 127 (decimal), DEL, is also a control character.
Extended ASCII sets defined by ECMA-35 and ISO 8859 added the codes 128 through 159 (decimal) as control characters. This was primarily done so that if the high bit was stripped, a printing character would not be changed into a C0 control code. This second set is called the C1 set.
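The high-bit argument can be checked directly. The following Python sketch is illustrative only; the helper names `is_c0`, `is_c1`, and `strip_high_bit` are hypothetical and are introduced here just for the example.

```python
def is_c0(code: int) -> bool:
    """True for the C0 control codes 0 through 31."""
    return 0 <= code <= 31


def is_c1(code: int) -> bool:
    """True for the C1 control codes 128 through 159."""
    return 128 <= code <= 159


def strip_high_bit(code: int) -> int:
    """Clear bit 7, as a 7-bit channel would."""
    return code & 0x7F


# Every C1 code lands on a C0 code when the high bit is lost ...
assert all(is_c0(strip_high_bit(c)) for c in range(128, 160))
# ... so none of the remaining 8-bit codes (160 to 255) can turn into one.
assert not any(is_c0(strip_high_bit(c)) for c in range(160, 256))
```

Because the 32 codes that would collide with C0 are themselves reserved as controls, a stripped printing character in the upper half always lands in the range 32 to 127.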
IBM's EBCDIC character set contains 65 control codes, including all of the ASCII C0 control codes plus additional codes that were not added to Unicode. There were also a number of attempts to define alternative sets of 32 control codes; none of these were transferred to Unicode either.
Only a small subset of the control characters are still in use for anything resembling their original purpose:
  • 0x00 (null, NUL), originally intended to be an ignored character, but now used by many programming languages including C to mark the end of a string.
  • 0x04 (EOT), the End Of File character on Unix terminals.
  • 0x07 (bell, BEL), which may cause the device to emit a warning such as a bell or beep sound or the screen flashing.
  • 0x08 (backspace, BS), which may overprint the previous character.
  • 0x09 (horizontal tab, HT), which moves the printing position right to the next tab stop.
  • 0x0A (line feed, LF), which moves the print head down one line. Used as the end-of-line marker in Unix-like systems.
  • 0x0B (vertical tab, VT), vertical tabulation.
  • 0x0C (form feed, FF), which causes a printer to eject paper to the top of the next page, or a video terminal to clear the screen.
  • 0x0D (carriage return, CR), which moves the printing position to the start of the line, allowing overprinting. Used as the end-of-line marker in Classic Mac OS, OS-9, and FLEX. A CR+LF pair is used by CP/M-80 and its derivatives, including DOS and Windows.
  • 0x1B (escape, ESC), which introduces an escape sequence.
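Most of the codes in this list have their own escape sequence in string literals, and the three end-of-line conventions are a common source of trouble when text moves between systems. The Python sketch below shows both; `normalize_newlines` is a hypothetical helper written for this example, not a standard library function.

```python
def normalize_newlines(text: str) -> str:
    """Convert CR+LF pairs and lone CRs to a single LF."""
    # Replace the two-character pair first so it is not counted as two breaks.
    return text.replace("\r\n", "\n").replace("\r", "\n")


# Python string escapes for NUL, BEL, BS, HT, LF, VT, FF, CR and ESC.
codes = [ord(c) for c in "\0\a\b\t\n\v\f\r\x1b"]
assert codes == [0x00, 0x07, 0x08, 0x09, 0x0A, 0x0B, 0x0C, 0x0D, 0x1B]

# One line ending of each style: DOS/Windows, Classic Mac OS, Unix.
assert normalize_newlines("a\r\nb\rc\nd") == "a\nb\nc\nd"
```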
Control characters may do something when the user inputs them, such as Ctrl+C to interrupt the running process, and Ctrl+Z to end a typed-in file on Windows. These uses usually have little to do with their ASCII definition. Modern systems often describe shortcuts as though they are control characters, but the code number is not even used to implement them.

In Unicode

The 65 control codes of ASCII and the C1 set were carried over to Unicode. "Control-characters" are U+0000 to U+001F, U+007F, and U+0080 to U+009F. Their General Category is "Cc". The Cc control characters have no Name in Unicode, but are given labels such as "<control-001A>" instead.
Unicode added more characters that could be considered controls, but it makes a distinction between these "Formatting characters" and the 65 control characters. These are General Category "Cf" instead of "Cc".
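These categories can be inspected with Python's standard `unicodedata` module. The sketch below is only a quick check of the statements above, using the Unicode data bundled with the interpreter; the variable name `cc` is arbitrary.

```python
import sys
import unicodedata

# Collect every code point whose General Category is "Cc".
cc = [cp for cp in range(sys.maxunicode + 1)
      if unicodedata.category(chr(cp)) == "Cc"]

# Exactly the C0 range, DEL, and the C1 range: 65 code points in total.
assert cc == list(range(0x00, 0x20)) + [0x7F] + list(range(0x80, 0xA0))

# Control characters have no Name, so the lookup falls back to the default.
assert unicodedata.name("\x07", None) is None

# A formatting character such as U+200E LEFT-TO-RIGHT MARK is "Cf" instead.
assert unicodedata.category("\u200e") == "Cf"
```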

Display

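There are a number of techniques to display non-printing characters, which may be illustrated with the bell character in ASCII encoding:
  • Code point: decimal 7, hexadecimal 0x07.
  • An abbreviation, often three capital letters: BEL.
  • A special character condensing the abbreviation: Unicode U+2407 (␇), "symbol for bell".
  • Caret notation, a caret followed by the capital letter whose code is 64 higher: ^G.
  • An escape sequence, as in C character string codes: \a, \007, \x07.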
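One display technique is to substitute each control code with its visible counterpart from the Unicode Control Pictures block (U+2400 to U+2421), in which U+2407 is the symbol for bell. A minimal Python sketch follows; `to_control_pictures` is a hypothetical helper name, and the C1 codes are passed through unchanged because the block has no pictures for them.

```python
def to_control_pictures(text: str) -> str:
    """Replace C0 controls and DEL with Unicode Control Pictures."""
    out = []
    for ch in text:
        code = ord(ch)
        if code < 0x20:
            # U+2400..U+241F mirror the C0 codes 0x00..0x1F one for one.
            out.append(chr(0x2400 + code))
        elif code == 0x7F:
            # U+2421 is the symbol for delete.
            out.append("\u2421")
        else:
            out.append(ch)
    return "".join(out)


assert to_control_pictures("ding\a") == "ding\u2407"
```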

How control characters map to keyboards

ASCII-based keyboards have a key labelled "Control", "Ctrl", or "Cntl" which is used much like a shift key, being pressed in combination with another letter or symbol key. In one implementation, the control key generates the code 64 places below the code for the uppercase letter it is pressed in combination with. The other implementation takes the ASCII code produced by the key and bitwise ANDs it with 0x1F, forcing bits 5 to 7 to zero. In either case, this produces one of the 32 ASCII control codes between 0 and 31. For example, pressing "control" and the letter "g" produces the code 7. The NUL character (code 0) is represented by Ctrl-@, "@" being the code immediately before "A" in the ASCII character set. For convenience, some terminals accept Ctrl-Space as an alias for Ctrl-@. Neither approach produces the DEL character (code 127), because of its special location in the table; Ctrl-? is sometimes used for this character.
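The two implementations can be compared side by side. This Python sketch is a simplified model rather than the behaviour of any particular keyboard or terminal; the function names `ctrl_subtract` and `ctrl_mask` are hypothetical.

```python
def ctrl_subtract(key: str) -> int:
    """Code 64 places below the uppercase form of the key."""
    return ord(key.upper()) - 64


def ctrl_mask(key: str) -> int:
    """ASCII code of the key ANDed with 0x1F (bits 5 to 7 cleared)."""
    return ord(key) & 0x1F


# Both approaches agree for letters and for the symbols from "@" to "_".
for key in "gG@[cz":
    assert ctrl_subtract(key) == ctrl_mask(key)

assert ctrl_mask("g") == 7    # Ctrl-G, the bell character
assert ctrl_mask("@") == 0    # Ctrl-@, NUL
assert ctrl_mask(" ") == 0    # with the mask, space also yields NUL
assert ctrl_mask("?") == 31   # neither approach yields DEL (127) for "?"
assert ctrl_subtract("?") == -1
```

The mask variant is indifferent to the case of the letter without needing an explicit conversion, which matches the observation that shift and caps lock do not affect the result.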
When the control key is held down, letter keys produce the same control characters regardless of the state of the shift or caps lock keys. In other words, it does not matter whether the key would have produced an upper-case or a lower-case letter. The interpretation of the control key with the space, graphics character, and digit keys varies between systems. Some will produce the same character code as if the control key were not held down. Other systems translate these keys into control characters when the control key is held down. The interpretation of the control key with non-ASCII keys also varies between systems.
Control characters are often rendered into a printable form known as caret notation by printing a caret and then the ASCII character that has a value of the control character plus 64. Control characters generated using letter keys are thus displayed with the upper-case form of the letter. For example, ^G represents code 7, which is generated by pressing the G key when the control key is held down.
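Caret notation is straightforward to generate. The Python sketch below is one possible way to do it; the name `caret_notation` is hypothetical. It uses XOR with 64 instead of addition, which is an assumption made here: for codes 0 to 31 the result is identical to adding 64, and it also maps DEL (127) to the conventional ^?.

```python
def caret_notation(text: str) -> str:
    """Render C0 controls and DEL as a caret plus a printable character."""
    out = []
    for ch in text:
        code = ord(ch)
        if code < 32 or code == 127:
            # For 0..31 this equals code + 64; for 127 it gives 63, "?".
            out.append("^" + chr(code ^ 0x40))
        else:
            out.append(ch)
    return "".join(out)


assert caret_notation("\x07") == "^G"
assert caret_notation("\x00\x1b\x7f") == "^@^[^?"
```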
Keyboards also typically have a few single keys which produce control character codes. For example, the key labelled "Backspace" typically produces code 8, "Tab" code 9, and "Enter" or "Return" code 13.
Many keyboards include keys that do not correspond to any ASCII printable or control character, for example cursor control arrows and word processing functions. The associated keypresses are communicated to computer programs by one of four methods: appropriating otherwise unused control characters; using some encoding other than ASCII; using multi-character control sequences; or using an additional mechanism outside of generating characters. "Dumb" computer terminals typically use control sequences. Keyboards attached to stand-alone personal computers made in the 1980s typically use one of the first two methods. Modern computer keyboards generate scancodes that identify the specific physical keys that are pressed; computer software then determines how to handle the keys that are pressed, including any of the four methods described above.

The design purpose

The control characters were designed to fall into a few groups: printing and display control, data structuring, transmission control, and miscellaneous.

Printing and display control

Printing control characters were first used to control the physical mechanism of printers, the earliest output device. An early example of this idea was the use of Figures and Letters in Baudot code to shift between two code pages. A later, but still early, example was the out-of-band ASA carriage control characters. Later, control characters were integrated into the stream of data to be printed.
The carriage return character, when sent to such a device, causes it to move the printing position to the edge of the paper at which writing begins.
The line feed character causes the device to put the printing position on the next line. It may, depending on the device and its configuration, also move the printing position to the start of the next line.
The vertical and horizontal tab characters cause the output device to move the printing position to the next tab stop in the direction of reading.
The form feed character starts a new sheet of paper, and may or may not move to the start of the first line.
The backspace character moves the printing position one character space backwards. On printers, including hard-copy terminals, this is most often used so the printer can overprint characters to make other, not normally available, characters. On video terminals and other electronic output devices, there are often software configuration choices that allow a destructive backspace, which erases, or a non-destructive one, which does not.
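The movements described above can be sketched as a small simulation of a printing position on a grid. This is an illustrative model only, not how any particular device works: the function name `render`, the 8-column tab stops, the pure line feed that keeps the column, and the non-destructive backspace whose overprint simply replaces the earlier character are all assumptions made for the example. Vertical tab and form feed are left out.

```python
def render(text: str, tab_width: int = 8) -> list[str]:
    """Apply CR, LF, BS and HT to a printing position and return the lines."""
    lines = [[]]          # one list of characters per output line
    row, col = 0, 0       # current printing position
    for ch in text:
        if ch == "\r":    # carriage return: back to the start of the line
            col = 0
        elif ch == "\n":  # line feed: down one line, column unchanged
            row += 1
            if row == len(lines):
                lines.append([])
        elif ch == "\b":  # backspace: one position left, nothing erased
            col = max(0, col - 1)
        elif ch == "\t":  # horizontal tab: advance to the next tab stop
            col = (col // tab_width + 1) * tab_width
        else:             # printing character: place it and advance
            line = lines[row]
            if len(line) <= col:
                line.extend(" " * (col - len(line) + 1))
            line[col] = ch
            col += 1
    return ["".join(line) for line in lines]


assert render("ab\ncd") == ["ab", "  cd"]    # LF alone keeps the column
assert render("ab\r\ncd") == ["ab", "cd"]    # CR+LF starts a fresh line
assert render("abc\b\bX") == ["aXc"]         # backspace, then overprint
assert render("abc\rX") == ["Xbc"]           # CR allows overprinting
```

A hard-copy terminal would show both overprinted characters on top of each other; replacing the earlier character, as this sketch does, is closer to what a video terminal displays.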
The shift in and shift out characters selected alternate character sets, fonts, underlining, or other printing modes. Escape sequences were often used to do the same thing.
With the advent of computer terminals that did not physically print on paper and so offered more flexibility regarding screen placement, erasure, and so forth, printing control codes were adapted. Form feeds, for example, usually cleared the screen, there being no new paper page to move to. More complex escape sequences were developed to take advantage of the flexibility of the new terminals, and indeed of newer printers. The concept of a control character had always been somewhat limiting, and was extremely so when used with new, much more flexible, hardware. Control sequences could match the new flexibility and power and became the standard method. However, there were, and remain, a large variety of standard sequences to choose from.