Chinese character orders


Chinese character order, also called Chinese character indexing, collation, or sorting, is the way in which a set of Chinese characters is arranged into a sequence for convenient information retrieval. The term may also refer to the sequence so produced.
English dictionaries and indexes are normally arranged in alphabetical order for quick lookup, but Chinese is written in tens of thousands of different characters, not just dozens of letters in an alphabet, and that makes the sorting job much more challenging.
The orders or sorting methods of Chinese dictionaries are traditionally divided into three categories:
  • Form-based orders, including stroke-based orders and component-based orders; the latter include radical-based orders, among others
  • Sound-based orders, including Pinyin-based order and Bopomofo-based order
  • Meaning-based orders
In modern Chinese, people also use frequency orders, where words or characters are sorted by their frequencies of use in a text corpus. There is also computer-based sorting and lookup.
Chinese dictionaries include character dictionaries and word dictionaries. Chinese word orders are based on character orders. Single-character words are arranged by character sorting directly, and multi-character words can be sorted character by character in a similar way.
The following sections give a general introduction to the orders and sorting methods currently in use, focusing on those that are more popular and effective.

Form-based orders

In this category of orders, Chinese characters are sorted according to various features of their forms or shapes. There are two subcategories of form-based orders: stroke-based orders and component-based orders.

Stroke-based orders

In stroke-based orders, Chinese characters are sorted by different features of strokes, including stroke counts, stroke forms, stroke orders, stroke combinations, stroke positions, etc.

Stroke-count order

In this order, Chinese characters are sorted by stroke count in ascending order: a character with fewer strokes is placed before those with more strokes. For example, the distinct characters in "漢字筆畫, 汉字笔画" are sorted into "汉(5) 字(6) 画(8) 笔(10) 筆(12) 畫(12) 漢(14)", where stroke counts are given in brackets. 筆 and 畫 both have 12 strokes, so stroke count alone does not determine their relative order.
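As a minimal sketch of stroke-count sorting, the Python snippet below sorts the example characters with a small hand-entered stroke-count table. The table `STROKE_COUNT` and the function name are assumptions made for illustration; a real implementation would read stroke counts from a character database.

```python
# Hand-entered stroke counts for the example characters (illustrative only).
STROKE_COUNT = {"漢": 14, "字": 6, "筆": 12, "畫": 12, "汉": 5, "笔": 10, "画": 8}

def stroke_count_order(chars):
    """Sort characters by ascending stroke count.

    sorted() is stable, so characters with equal counts keep their input order.
    """
    return sorted(chars, key=STROKE_COUNT.__getitem__)

result = "".join(stroke_count_order("漢字筆畫汉笔画"))
print(result)
```

Because 筆 and 畫 tie at 12 strokes, their relative position here depends only on the input order, which is why a secondary rule is needed for a unique ordering.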

Stroke-count-stroke-order sorting

This is a combination of stroke-count sorting and stroke-order sorting. Characters are first arranged by stroke count in ascending order. Stroke-order sorting is then used to sort characters with the same number of strokes. These characters are first arranged by their first strokes according to an order of stroke groups, such as "heng, shu, pie, dian, zhe" or "dian, heng, shu, pie, zhe". If the first strokes belong to the same group, the characters are sorted by their second strokes in a similar way, and so on. In the example of the previous section, both 筆 and 畫 have 12 strokes. 筆 starts with stroke ㇓ of the pie group, and 畫 starts with ㇕ of the zhe group; pie comes before zhe in the group order, so 筆 comes before 畫. Hence the distinct characters in "汉字笔画, 漢字筆畫" are finally sorted into "汉字画笔筆畫漢", where each character has a unique position.
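The two-level comparison can be sketched in Python with a tuple key: stroke count first, then the group ranks of the strokes in writing order. The table below is hand-entered and, as a simplifying assumption, records only the first stroke of each character, which is enough to separate the characters in this example; the names `GROUP_RANK`, `STROKES` and `count_then_order_key` are hypothetical.

```python
# Rank of each stroke group under the "heng, shu, pie, dian, zhe" order.
GROUP_RANK = {"heng": 0, "shu": 1, "pie": 2, "dian": 3, "zhe": 4}

# char -> (stroke count, stroke groups in writing order).
# Assumption: only the first stroke is listed, to keep the table short.
STROKES = {
    "汉": (5, ["dian"]),
    "字": (6, ["dian"]),
    "画": (8, ["heng"]),
    "笔": (10, ["pie"]),
    "筆": (12, ["pie"]),
    "畫": (12, ["zhe"]),
    "漢": (14, ["dian"]),
}

def count_then_order_key(char):
    """Key: stroke count, then the stroke-group ranks compared stroke by stroke."""
    count, groups = STROKES[char]
    return (count, tuple(GROUP_RANK[g] for g in groups))

# 畫 precedes 筆 in the input, so a correct result cannot come from stability alone.
result = "".join(sorted("漢畫筆字汉笔画", key=count_then_order_key))
print(result)
```

Python compares tuples element by element, which mirrors the rule "compare first strokes, then second strokes, and so on".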

GB stroke-based order

GB Stroke-Based Order, in full the GB13000.1 Character Set Chinese Character Order, is a standard released by the National Language Commission of China in 1999. It is an enhanced version of stroke-count-stroke-order sorting. According to this standard, characters are first sorted by stroke count and then by stroke order. Characters with the same stroke count and stroke order are then sorted by primary-secondary stroke order. For example, 子 and 孑 both have 3 strokes and the same five-group stroke order, but according to the rule of primary-secondary stroke order, the primary stroke ㇐ comes before the secondary stroke ㇀, so 子 comes before 孑. If two characters have the same stroke count, stroke order and primary-secondary strokes, they are sorted by the mode of stroke combination: separation precedes connection, and connection precedes intersection. For example, 八 comes before 人, which comes before 乂. The standard contains further rules for resolving the remaining ties.
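The stroke-combination rule can be illustrated with a short Python sketch. It models only three of the levels (stroke count, stroke order, combination mode) and omits the primary-secondary level; the data table is hand-entered, and it assumes that the right-falling stroke ㇏ counts as a member of the dian group. This is an illustration of the layered comparison, not an implementation of the standard.

```python
GROUP_RANK = {"heng": 0, "shu": 1, "pie": 2, "dian": 3, "zhe": 4}

# Separation precedes connection, and connection precedes intersection.
COMBINATION_RANK = {"separate": 0, "connected": 1, "intersecting": 2}

# char -> (stroke count, stroke groups, combination mode); illustrative data.
ENTRIES = {
    "乂": (2, ("pie", "dian"), "intersecting"),
    "人": (2, ("pie", "dian"), "connected"),
    "八": (2, ("pie", "dian"), "separate"),
}

def gb_like_key(char):
    """Layered key: count, then stroke order, then stroke-combination mode."""
    count, groups, mode = ENTRIES[char]
    return (count, tuple(GROUP_RANK[g] for g in groups), COMBINATION_RANK[mode])

result = "".join(sorted("乂人八", key=gb_like_key))
print(result)
```

All three characters tie on the first two levels, so the combination mode alone decides their order.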

YES order

YES is a simplified stroke-based sorting method that dispenses with stroke counting and grouping without compromising accuracy. It has been applied to the indexing of all the characters in the Xinhua Zidian and Xiandai Hanyu Cidian. In this joint index, a reader can look up a Chinese character to find its pinyin and Unicode, in addition to its page numbers in the two dictionaries.

Component-based orders

In this category, characters are sorted by one or more components.

Radical-based orders

A radical is a common component shared by a group of characters. It usually lies in the upper part or on the left side of a character and helps to express its meaning. For example, 花, 草 and 菜 all have the radical 艹, which indicates that they are related to plants; 推, 拉 and 打 share the radical 扌 and denote actions normally involving the hands. In radical-based order, all the characters sharing a radical are put under that radical to form a radical family or section. Families are arranged by their leading radicals in stroke-based order, and characters inside a family are also sorted by their strokes.
In many contemporary dictionaries, including Xinhua Zidian, Xiandai Hanyu Cidian and Oxford Chinese Dictionary, the radical-based character lookup system consists of three indexes or tables: a radical index, a character lookup index, and an index of characters with radicals difficult to find, all sorted in stroke-based order. To look up a character in a dictionary, first identify its radical. Count the strokes of the radical and find it in the radical index, which is in stroke-based order. The page number given to its right points to the radical family in the character lookup index. Then count the number of strokes in the remaining part of the character and find the target character within the family, which is also in stroke-based order. The page number to its right is the page of the character's entry in the main body of the dictionary. Characters whose radicals are difficult to identify can be looked up in the Index of Characters with Radicals Difficult to Find, also in stroke-based order.
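The two-step lookup structure described above can be sketched in Python as a small index grouped by radical and ordered by residual stroke count. The data is hand-entered for the six example characters, and the relative order of 扌 and 艹 (both three strokes) in `RADICAL_ORDER` is an assumption made for this sketch; all names are hypothetical.

```python
# Hypothetical excerpt of a radical index, already in its stroke-based order.
RADICAL_ORDER = ["扌", "艹"]

# char -> (radical, strokes in the remaining part of the character)
CHARS = {
    "花": ("艹", 4), "草": ("艹", 6), "菜": ("艹", 8),
    "推": ("扌", 8), "拉": ("扌", 5), "打": ("扌", 2),
}

def radical_key(char):
    """Key: position of the radical in the index, then residual stroke count."""
    radical, residual = CHARS[char]
    return (RADICAL_ORDER.index(radical), residual)

def build_index(chars):
    """Group characters into radical families, each sorted by residual strokes."""
    index = {}
    for char in sorted(chars, key=radical_key):
        index.setdefault(CHARS[char][0], []).append(char)
    return index

index = build_index("花推草拉菜打")
for radical, family in index.items():
    print(radical, "".join(family))
```

A printed dictionary adds page numbers to both levels; here the dictionary keys and list positions play that role.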
The first radical system in history was created by the Chinese scholar Xu Shen in his Shuowen Jiezi dictionary almost two thousand years ago, in the Eastern Han Dynasty. The dictionary, which is still available today, uses a total of 540 radicals. Another milestone is the Kangxi radical system, employed in the Kangxi Dictionary of 1716 in the era of Emperor Kangxi, which reduced the number of radicals to 214. The Kangxi radical sorting method is still in use in China, Japan and Korea. It is also used by the Unicode collation algorithm to sort CJK Unified Ideographs. The latest standard radical table of the Chinese Mainland is the Table of Indexing Chinese Character Components, which lists 201 radicals.

Four-corner order

Chinese characters are written in the form of a square block. The Four-Corner Method assigns a 4-digit code to a character, each digit representing one corner of the block. The four corner digits appear in the sequence "upper-left, upper-right, lower-left, lower-right". For example, the code of the character 顏 is 0128, where the first digit 0 represents the upper-left component 亠, 1 represents the upper-right 一, 2 the lower-left ㇓, and 8 the lower-right 八.
A fifth digit can be added to represent an extra part above the lower-right corner to gain higher sorting accuracy. For example, the extended code of the character 佳 is 24214, where the fifth digit 4 represents the component 十 above the final 一 in the lower-right corner.
When a set of characters is encoded in four-corner codes, the characters are sorted into a four-corner order by the first four digits in ascending order.
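A minimal Python sketch of four-corner sorting follows, using only the two codes quoted above. Codes are kept as strings so that a leading zero such as the one in 0128 is preserved. Using the optional fifth digit as a tie-breaker is an assumption of this sketch, and the names are hypothetical.

```python
# Four-corner codes quoted in the text; the fifth digit, when present, is optional.
FOUR_CORNER = {"顏": "0128", "佳": "24214"}

def four_corner_key(char):
    """Key: the four corner digits, then the optional fifth digit as tie-breaker."""
    code = FOUR_CORNER[char]
    return (code[:4], code[4:])

result = sorted(["佳", "顏"], key=four_corner_key)
print(result)
```

Comparing the digit strings lexicographically gives the same result as comparing the numbers, because the first component always has exactly four digits.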

Cangjie-code order

In this method, Chinese characters are arranged alphabetically by the codes assigned to them in the Cangjie input method. The Cangjie code of a character is a string of English letters, each representing a selected Cangjie component of the character. For example, the Cangjie codes of the characters in 漢字排檢法 are 漢 (ETLO), 字 (JND), 排 (QLMY), 檢 (DOMO) and 法 (EGI), so the characters are sorted into the Cangjie-code order 檢 (DOMO), 法 (EGI), 漢 (ETLO), 字 (JND), 排 (QLMY).
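Sorting by Cangjie code reduces to an ordinary alphabetical sort on the code strings, as the following Python sketch shows. The code table is hand-entered and should be treated as an assumption, since codes can differ between versions of the Cangjie method.

```python
# Hand-entered Cangjie codes for the example characters (assumed values).
CANGJIE = {"漢": "ETLO", "字": "JND", "排": "QLMY", "檢": "DOMO", "法": "EGI"}

def cangjie_order(chars):
    """Sort characters alphabetically by their Cangjie code strings."""
    return sorted(chars, key=CANGJIE.__getitem__)

result = "".join(cangjie_order("漢字排檢法"))
print(result)
```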
Compared with sound-based orders, form-based orders are usually more complicated, but they have two advantages: a character or word can be looked up without knowing its pronunciation, and large character sets can be collated effectively without support from other sorting methods.

Sound-based orders

There are two sound representation systems currently in use for Standard Chinese, i.e., pinyin and bopomofo. Accordingly, we have two methods of sound-based sorting for Standard Chinese.

Pinyin-based order

In this method, Chinese characters are sorted alphabetically by their Pinyin. For example, 汉字拼音排序法 is sorted into "法 (fǎ) 汉 (hàn) 排 (pái) 拼 (pīn) 序 (xù) 音 (yīn) 字 (zì)", with pinyin in brackets. Pinyin syllables with the same letters are ordered by tone in the sequence "tone 1, tone 2, tone 3, tone 4, tone 5 (the neutral tone)", as in "妈 (mā), 麻 (má), 马 (mǎ), 骂 (mà), 吗 (ma)". Characters of the same sound, i.e., with the same Pinyin letters and tone, are normally sorted by a stroke-based method.
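As a sketch of this rule in Python, each character can be mapped to a pair of toneless pinyin letters and a tone number, and sorted on that pair. The table is hand-entered for the example characters; it does not handle ü or the stroke-based tie-break for identical sounds, and the names are hypothetical.

```python
# char -> (pinyin letters without tone marks, tone number; 5 = neutral tone)
PINYIN = {
    "汉": ("han", 4), "字": ("zi", 4), "拼": ("pin", 1), "音": ("yin", 1),
    "排": ("pai", 2), "序": ("xu", 4), "法": ("fa", 3),
    "妈": ("ma", 1), "麻": ("ma", 2), "马": ("ma", 3), "骂": ("ma", 4), "吗": ("ma", 5),
}

def pinyin_order(chars):
    """Sort by pinyin letters first, then by tone number."""
    return sorted(chars, key=PINYIN.__getitem__)

by_letters = "".join(pinyin_order("汉字拼音排序法"))
by_tone = "".join(pinyin_order("吗骂马麻妈"))
print(by_letters)
print(by_tone)
```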
Words of multiple characters can be sorted in two different ways. One is to sort character by character: words are ordered by their first characters, and if the first characters are the same, by the second character, and so on. For example, "归并 (guībìng), 归还 (guīhuán), 规划 (guīhuà), 鬼话 (guǐhuà), 桂花 (guìhuā)". This method is used in Xiandai Hanyu Cidian. The other method is to sort by the pinyin letters of the whole word, and then by tones when the letters of two words are the same. For example, "归并 (guībìng), 规划 (guīhuà), 鬼话 (guǐhuà), 桂花 (guìhuā), 归还 (guīhuán)". This method is used in the ABC Chinese–English Dictionary.
Pinyin-based sorting is very convenient for looking up characters or words whose pronunciation and Pinyin spelling are known, but it cannot be used to find words whose pronunciation is unknown.
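The difference between the two word-sorting methods can be sketched in Python with two key functions. Each character is described by hand-entered pinyin letters, a tone number and a stroke count; using the stroke count to separate homophonous characters such as 归 and 规 is an assumption based on the stroke-based tie-break mentioned earlier. All names are hypothetical.

```python
# word -> list of (pinyin letters, tone, stroke count), one tuple per character
WORDS = {
    "归并": [("gui", 1, 5), ("bing", 4, 6)],
    "归还": [("gui", 1, 5), ("huan", 2, 7)],
    "规划": [("gui", 1, 8), ("hua", 4, 6)],
    "鬼话": [("gui", 3, 9), ("hua", 4, 8)],
    "桂花": [("gui", 4, 10), ("hua", 1, 7)],
}

def char_by_char_key(word):
    """Compare words one character at a time (letters, tone, stroke count)."""
    return tuple(WORDS[word])

def whole_word_key(word):
    """Compare the joined pinyin letters of the word, then the tone sequence."""
    letters = "".join(syllable[0] for syllable in WORDS[word])
    tones = tuple(syllable[1] for syllable in WORDS[word])
    return (letters, tones)

words = ["桂花", "归还", "鬼话", "规划", "归并"]
by_character = sorted(words, key=char_by_char_key)
by_word = sorted(words, key=whole_word_key)
print(by_character)
print(by_word)
```

The character-by-character key keeps all words beginning with 归 together, while the whole-word key places 归还 (guihuan) after the three words spelled guihua.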