Cangjie input method


The Cangjie input method is a system for entering Chinese characters into a computer using a standard computer keyboard. In filenames and elsewhere, the name Cangjie is sometimes abbreviated as cj.
The input method was invented in 1976 by Chu Bong-Foo and named after Cangjie, the mythological inventor of the Chinese writing system, at the suggestion of Chiang Wei-kuo, the former Defense Minister of Taiwan. Chu Bong-Foo released the patent for Cangjie in 1982 because he believed that the method should belong to the Chinese cultural heritage. As a result, Cangjie is freely available and is included on virtually every computer system that supports traditional Chinese characters, and it has been extended to support the simplified Chinese character set.
Image: A Chinese keyboard in Shek Tong Tsui Municipal Services Building, Hong Kong, with Cangjie hints printed on the lower-left corners of the keys.
Cangjie was the first Chinese input method to use the QWERTY keyboard. Chu saw that the QWERTY keyboard had become an international standard and therefore believed that Chinese-language input had to be based on it. Earlier methods used large keyboards with 40 to 2,400 keys, with the exception of the Four-Corner Method, which uses only the number keys.
Unlike the Pinyin input method, Cangjie is based on the graphological aspect of the characters: each graphical unit, called a "radical", is represented by a basic character component. There are 24 such components, each mapped to a particular letter key on a standard QWERTY keyboard, and an additional "difficult character" function is mapped to the X key. The keys are divided into four groups to make them easier to learn and memorize. A character's code is assigned by breaking the character into its constituent "radicals".

Overview

Keys and radicals

The basic character components in Cangjie are called radicals or letters. There are 24 radicals but 26 keys; the 24 radicals are associated with roughly 76 auxiliary shapes, many of which are rotated or transposed versions of components of the basic shapes. For instance, the letter A can represent itself (日), the slightly wider 曰, or a 90° rotation of itself.
The 24 keys are placed in four groups:
  • Philosophical Group – corresponds to the letters 'A' to 'G' and represents the sun, the moon, and the five elements
  • Strokes Group – corresponds to the letters 'H' to 'N' and represents the brief and subtle strokes
  • Body-Related Group – corresponds to the letters 'O' to 'R' and represents various parts of the human anatomy
  • Shapes Group – corresponds to the letters 'S' to 'Y' and represents complex and enclosed character forms
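The four groups can be expressed as a small lookup table. This is a minimal sketch that assumes the standard key-to-radical assignment (A = 日 sun, B = 月 moon, C to G = 金木水火土 for the five elements, and so on), which this section does not itself list. X and Z are left out because they are not radicals; the names `RADICALS`, `group_of` and `describe` are hypothetical.

```python
# Standard Cangjie radical for each letter key (24 radicals).
# X (the "difficult character" key) and Z are not radicals, so they are omitted.
RADICALS = {
    "A": "日", "B": "月", "C": "金", "D": "木", "E": "水", "F": "火", "G": "土",
    "H": "竹", "I": "戈", "J": "十", "K": "大", "L": "中", "M": "一", "N": "弓",
    "O": "人", "P": "心", "Q": "手", "R": "口",
    "S": "尸", "T": "廿", "U": "山", "V": "女", "W": "田", "Y": "卜",
}

# The four mnemonic groups as inclusive letter ranges.
GROUPS = [
    ("Philosophical", "A", "G"),
    ("Strokes", "H", "N"),
    ("Body-Related", "O", "R"),
    ("Shapes", "S", "Y"),
]


def group_of(key: str) -> str:
    """Return the mnemonic group of a radical key such as 'A' or 'q'."""
    key = key.upper()
    if key not in RADICALS:
        raise ValueError(f"{key!r} is not a Cangjie radical key")
    for name, first, last in GROUPS:
        if first <= key <= last:
            return name
    raise AssertionError("unreachable: every radical key lies in some group")


def describe(code: str) -> list[str]:
    """Spell out a Cangjie code, e.g. 'DUP', as its radicals."""
    return [RADICALS[c] for c in code.upper()]
```

For example, `describe("DUP")` spells the code of 想 as 木, 山, 心, and `group_of("X")` raises `ValueError` because X is a function key rather than a radical.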
The auxiliary shapes of each Cangjie radical have changed slightly across different versions of the Cangjie method, which is one reason why different versions are not completely compatible.
Chu also gave some letters alternate names, based on their characteristics, as a mnemonic. The names form a rhyme, with one line per group, to help learners memorize the letters.

Keyboard layout

Basic rules

There are several general decomposition rules that define how to analyze a character to arrive at a Cangjie code, as follows:
  • Order of decomposition – left to right, top to bottom, and outside to inside.
  • Geometrically connected forms – identify the components and break up the character, e.g. 想 → 相 + 心.
    • First component – usually the uppermost or leftmost part, following the order of decomposition, e.g. 相.
    • Body – everything except the first component, e.g. 心.
  • Number of codes – take at most 5 codes.
    • For non-geometrically connected forms, take at most 4 codes.
    • For geometrically connected forms, take at most 5 codes: 2 from the first component and 3 from the body.
      • If the first component has more than 2 codes, take the first and the last.
      • If the body has more than 3 codes, consider breaking it up further.
        • If it can be broken up into second and third components, take the first and last codes from the second component and the last code from the third.
        • If it cannot be broken up further, take the first, second and last codes.
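A minimal sketch of the code-count rules for a character split into a first component and a body. It assumes the full code of each component is already known (recognizing the shapes themselves is out of scope) and handles only compound characters, not forms limited to 4 codes. For a body split into two components it takes the first and last codes of the second and the last code of the third, as in the worked example in the Examples subsection; this reproduces the standard codes 想 = DUP and 謝 = YRHHI. The function names are hypothetical.

```python
def first_and_last(codes: str) -> str:
    """Keep up to 2 codes as-is; otherwise keep only the first and last code."""
    return codes if len(codes) <= 2 else codes[0] + codes[-1]


def body_codes(parts: list[str]) -> str:
    """Reduce the body of a character to at most 3 codes.

    `parts` lists the full code of each sub-component of the body in
    decomposition order; a one-element list means the body cannot be
    broken up further.
    """
    whole = "".join(parts)
    if len(whole) <= 3:
        return whole  # short enough: nothing is omitted
    if len(parts) > 2:
        raise ValueError("this sketch handles at most two body components")
    if len(parts) == 2:
        second, third = parts
        # Assumed rule: first and last codes of the second component,
        # then the last code of the third.
        return first_and_last(second) + third[-1]
    # Body that cannot be broken up: first, second and last codes.
    return whole[0] + whole[1] + whole[-1]


def compound_code(first: str, body: list[str]) -> str:
    """Code of a character split into a first component and a body (max 5 codes)."""
    return first_and_last(first) + body_codes(body)
```

For 想, the first component 相 is DBU and the body 心 is P, so `compound_code("DBU", ["P"])` keeps D and U from 相 and gives DUP. For 謝, `compound_code("YMMR", ["HXH", "DI"])` takes YR from 言, HH from 身 and I from 寸.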
The rules are subject to various principles:
  • Conciseness – if multiple ways of decomposition are possible, the shorter decomposition is considered to be correct.
  • Completeness – if multiple ways of decomposition with the same length of code are possible, the one that identifies a more complex form first is correct.
  • Reflection of the form of the radical – the decomposition should reflect the shape of the radical, meaning that using the same code two or more times should be avoided where possible, and the character should not be "cut" at a corner of a form.
  • Omission of codes
    • Partial omission – when a complete decomposition has more codes than are permitted, the extra codes are ignored.
    • Omission in enclosed forms – when part of the character being decomposed is an enclosed form, only the shape of the enclosure is decomposed and the enclosed forms are omitted.
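The conciseness and completeness principles amount to an ordering over candidate decompositions. This sketch assumes each candidate has already been reduced to its final code and carries a numeric score for how complex its first identified form is; the `Candidate` type and that score are hypothetical, since the principles do not say how complexity is measured.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """One possible decomposition of a character (hypothetical structure)."""
    code: str                   # resulting Cangjie code, e.g. "DUP"
    first_form_complexity: int  # assumed score for the first form identified


def pick_decomposition(candidates: list[Candidate]) -> Candidate:
    """Choose a decomposition by conciseness, then completeness."""
    return min(
        candidates,
        # Shorter code wins (conciseness); among codes of equal length, the
        # candidate whose first form is more complex wins (completeness).
        # If both keys tie, min() returns the first candidate listed.
        key=lambda c: (len(c.code), -c.first_form_complexity),
    )
```

The principle of reflecting the radical's form is not modeled here; in practice it would rule out some candidates before this ordering is applied.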

    Examples

  • This character is geometrically connected, consisting of a single vertical structure, so we take the first, second, and last Cangjie codes from top to bottom.
  • Its Cangjie code is therefore made up of those three codes, corresponding to the basic shapes in this example.
  • This character consists of geometrically unconnected parts arranged horizontally. For the initial decomposition, we treat it as two parts.
  • The first part is geometrically unconnected from top to bottom; we take its first and last codes.
  • The second part is again geometrically unconnected and arranged horizontally, so it is itself made of two parts.
    • For the first of these two parts, we take the first and last codes. Both are slants and therefore H, giving HH.
    • For the second of these two parts, we take only the last code. This part is geometrically unconnected and consists of two parts: the outer form and the dot in the middle. The dot is I, so the last code is I.
  • The Cangjie code is thus the two codes of the first part followed by HHI.
  • The next example is identical to the one just above except for its first part, from which we again take the first and last codes.
  • Repeating the same steps as in the example above gives the remaining codes HHI.

    Exceptions

Some forms are always decomposed in the same way, whether or not the rules would otherwise decompose them that way. The number of such exceptions is small.
Some forms cannot be decomposed. They are represented by X, the "difficult character" key on a Cangjie keyboard.

Early development

Initially, the Cangjie input method was not intended to produce a character in any character set. Instead, it was part of an integrated system consisting of the Cangjie input rules and a Cangjie controller board. This controller board contains character generator firmware, which dynamically generates Chinese characters from Cangjie codes when characters are output, using the hi-res graphics mode of the Apple II. In the preface of the Cangjie user's manual, Chu Bong-Foo wrote in 1982:
Image: Demonstration of the character generator Mingzhu's ability to generate characters from their codes. The first character denotes a kind of soup in Xuzhou cuisine.
In this early system, when the user types "yk", for example, to get the Chinese character 文, the Cangjie codes are not converted to any character encoding; the actual string "yk" is stored. The Cangjie code of each character is the encoding of that character.
A particular "feature" of this early system is that, if random lowercase words are sent to it, the character generator attempts to construct Chinese characters according to the Cangjie decomposition rules, sometimes producing strange, unknown characters. This unintended feature, "automatic generation of characters", is described in the manual and is responsible for more than 10,000 of the 15,000 characters that the system can handle. The name Cangjie, evocative of the creation of new characters, was indeed apt for this early version of Cangjie.
The presence of the integrated character generator also explains why the "X" key was historically necessary: it is used to disambiguate decomposition collisions. Because characters are "chosen" when the codes are "output", every character that can be displayed must have a unique Cangjie decomposition. It would be neither sensible nor practical for the system to offer a choice of candidate characters when a random text file is displayed, since the user would not know which candidate is correct.
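The requirement that every displayable character have a unique code can be checked mechanically. This sketch assumes text is stored as space-separated lowercase Cangjie codes and uses a lookup table in place of the original character generator, which built glyphs on the fly. The table entries are standard Cangjie codes, but the storage format and the function names `find_collisions` and `render` are assumptions.

```python
# Stand-in for the character generator: standard Cangjie codes for a few characters.
CODE_TABLE = {"A": "日", "B": "月", "AB": "明", "YK": "文"}


def find_collisions(entries: dict[str, str]) -> dict[str, list[str]]:
    """Return every code shared by more than one character.

    `entries` maps each character to its Cangjie code.
    """
    by_code: dict[str, list[str]] = {}
    for char, code in entries.items():
        by_code.setdefault(code.upper(), []).append(char)
    return {code: chars for code, chars in by_code.items() if len(chars) > 1}


def render(stored: str, table: dict[str, str]) -> str:
    """Turn stored codes into characters only at display time.

    Unknown codes become "?" here; the real generator would instead try
    to build a glyph from the code.
    """
    return "".join(table.get(code.upper(), "?") for code in stored.split())


# Inverting the table gives character -> code; it must have no collisions,
# or render() could not know which character a shared code stands for.
assert not find_collisions({char: code for code, char in CODE_TABLE.items()})
```

With this table, `render("yk ab", CODE_TABLE)` displays 文明. If two characters shared a code, `find_collisions` would list both under that code, and one of them would need a distinct decomposition, which is the role the X key plays.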

Issues

Steep learning curve

Cangjie was designed to be an easy-to-use system that would help promote Chinese computing. However, many users find Cangjie difficult to learn and use, and many of these difficulties stem from poor instruction.
  • To type with Cangjie, one needs to know both the names of the radicals and their auxiliary shapes. It is common to find tables of the Cangjie radicals and their auxiliary shapes taped onto computer users' monitors.
  • One must also be familiar with the decomposition rules; without this knowledge, typing the intended characters becomes more difficult.
  • The user cannot type a character that they have forgotten how to write.
With enough practice, users can overcome these problems. Typical touch-typists can type Chinese at 25 characters per minute or more using Cangjie, even if they still have difficulty remembering the auxiliary shapes or the decomposition rules. Experienced Cangjie typists can reportedly reach speeds from 60 to over 200 characters per minute.
According to Chen Minzheng, based on his teaching experience at Longtian Elementary School in Taitung in 1990, the children's average typing speed was 90 words per minute, and some exceeded 130 words per minute.