Chinese character description languages


Several systems have been proposed for describing the internal structure of Chinese characters, including their strokes, components, and stroke order, and the location of each within the character's ideal square. This information is useful for identifying variants of characters that Unicode and ISO/IEC 10646 unify into a single code point. It can also provide an alternative representation for rare characters that do not yet have a standardized Unicode encoding. Many of these systems aim to cover regular script and to expose the character's internal structure. That structure can be used to look characters up by indexing their internal make-up and to cross-reference similar characters.
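A minimal sketch of component-based look-up follows. It assumes each character has already been decomposed into its immediate components, and it builds an inverted index that maps each component to the characters containing it. The decomposition table `DECOMP` and the helpers `build_index`, `lookup` and `similar` are hypothetical and are not taken from any of the systems described in this article.

```python
from collections import defaultdict

# Hypothetical decompositions into immediate components (illustrative data only).
DECOMP = {
    "仁": ["亻", "二"],
    "休": ["亻", "木"],
    "林": ["木", "木"],
    "村": ["木", "寸"],
}


def build_index(decomp):
    """Map each component to the set of characters that contain it."""
    index = defaultdict(set)
    for char, parts in decomp.items():
        for part in parts:
            index[part].add(char)
    return index


def lookup(index, *components):
    """Return the characters that contain every given component."""
    if not components:
        return set()
    # .get avoids inserting empty entries into the defaultdict.
    sets = [index.get(c, set()) for c in components]
    return set.intersection(*sets)


def similar(index, decomp, char):
    """Return other characters sharing at least one component with char."""
    related = set()
    for part in decomp.get(char, []):
        related |= index.get(part, set())
    related.discard(char)
    return related


index = build_index(DECOMP)
print(lookup(index, "亻", "木"))  # {'休'}
print(sorted(lookup(index, "木")))
print(sorted(similar(index, DECOMP, "仁")))
```

Intersecting the per-component sets finds characters built from all of the given parts. Taking the union instead, as `similar` does, gives a loose "shares a component" cross-reference.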

CDL

Character Description Language (CDL) is an XML-based declarative language co-created by Tom Bishop and Richard Cook for the Wenlin Institute. It defines characters by the arrangement of their components, and these components need not reflect the semantic or etymological history of the character. Each component is fitted into the portion of the whole character's square allotted to it. In this way, a set of fewer than 50 strokes suffices to construct approximately 1,000 components, which may in turn describe tens of thousands of characters.
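The sketch below shows how component reuse can work in a description of this kind. It assumes a 128 × 128 design square and uses XML modeled loosely on CDL. The `DEFS` table, the `comp` and `stroke` element and attribute names, the stroke label `h` and all coordinates are illustrative assumptions, not Wenlin's actual schema or data. Each component's coordinates are given relative to a full square and are then mapped into the box its parent allots to it. This lets a component defined once be reused at any size and position.

```python
import xml.etree.ElementTree as ET

SQUARE = 128  # assumed size of the design square, in both axes

# Hypothetical CDL-style definitions; names, attributes and coordinates
# are illustrative only, not Wenlin's actual data.
DEFS = {
    "仁": '<cdl char="仁">'
          '<comp char="亻" points="0,0 40,128"/>'
          '<comp char="二" points="48,16 128,112"/>'
          '</cdl>',
    "二": '<cdl char="二">'
          '<stroke type="h" points="16,32 112,32"/>'
          '<stroke type="h" points="0,100 128,100"/>'
          '</cdl>',
}


def read_points(elem):
    """Parse 'x0,y0 x1,y1' into (x0, y0, x1, y1)."""
    first, second = elem.get("points").split()
    x0, y0 = map(int, first.split(","))
    x1, y1 = map(int, second.split(","))
    return x0, y0, x1, y1


def place(inner, outer):
    """Map coordinates given in a full SQUARE into the outer box."""
    ox0, oy0, ox1, oy1 = outer
    sx = (ox1 - ox0) / SQUARE
    sy = (oy1 - oy0) / SQUARE
    x0, y0, x1, y1 = inner
    return (ox0 + x0 * sx, oy0 + y0 * sy, ox0 + x1 * sx, oy0 + y1 * sy)


def expand(char, outer=(0, 0, SQUARE, SQUARE)):
    """Flatten char into (label, coordinates) leaves in absolute coordinates."""
    if char not in DEFS:
        # No definition available: treat the component as an opaque primitive.
        return [(char, outer)]
    root = ET.fromstring(DEFS[char])
    leaves = []
    for child in root:
        coords = place(read_points(child), outer)
        if child.tag == "comp":
            # Recurse: the sub-component is drawn inside its allotted box.
            leaves.extend(expand(child.get("char"), coords))
        else:
            # A stroke: keep its type and its mapped start and end points.
            leaves.append((child.get("type"), coords))
    return leaves


for label, coords in expand("仁"):
    print(label, coords)
```

`二` is defined only once, as two strokes. Any other (hypothetical) definition that places `二` in a different box would reuse the same strokes, scaled into that box.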

Ideographic Description Sequences

Chapter 18 of The Unicode Standard defines the "Ideographic Description Sequences" syntax, which describes characters in featural terms as arrangements of components that have their own code points. Sixteen special characters in the range U+2FF0..U+2FFF act as prefix operators that combine other characters or sequences to form larger characters.
Character   Code point   Full Unicode name
⿰          U+2FF0       Ideographic description character left to right
⿱          U+2FF1       Ideographic description character above to below
⿲          U+2FF2       Ideographic description character left to middle and right
⿳          U+2FF3       Ideographic description character above to middle and below
⿴          U+2FF4       Ideographic description character full surround
⿵          U+2FF5       Ideographic description character surround from above
⿶          U+2FF6       Ideographic description character surround from below
⿷          U+2FF7       Ideographic description character surround from left
⿼          U+2FFC       Ideographic description character surround from right
⿸          U+2FF8       Ideographic description character surround from upper left
⿹          U+2FF9       Ideographic description character surround from upper right
⿺          U+2FFA       Ideographic description character surround from lower left
⿽          U+2FFD       Ideographic description character surround from lower right
⿻          U+2FFB       Ideographic description character overlaid
⿾          U+2FFE       Ideographic description character horizontal reflection
⿿          U+2FFF       Ideographic description character rotation
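Every operator takes a fixed number of operands: three for U+2FF2 and U+2FF3, one for the reflection and rotation operators U+2FFE and U+2FFF, and two for the rest of the block. Because of this, a prefix sequence can be parsed into a tree without any brackets. The recursive-descent parser below is a minimal sketch. It treats every character outside U+2FF0..U+2FFF as a leaf component and does not handle the related characters located in other blocks. The function names are illustrative.

```python
# Operator arities for U+2FF0..U+2FFF (written as escapes to avoid font issues).
TRINARY = {"\u2ff2", "\u2ff3"}   # left-middle-right, above-middle-below
UNARY = {"\u2ffe", "\u2fff"}     # horizontal reflection, rotation
BINARY = {chr(cp) for cp in range(0x2FF0, 0x3000)} - TRINARY - UNARY


def arity(ch):
    """Number of operands ch takes; 0 means ch is a component, not an operator."""
    if ch in TRINARY:
        return 3
    if ch in UNARY:
        return 1
    if ch in BINARY:
        return 2
    return 0


def parse(ids):
    """Parse a prefix IDS string into nested tuples (operator, operand, ...)."""
    pos = 0

    def node():
        nonlocal pos
        if pos >= len(ids):
            raise ValueError("sequence ends before all operands are supplied")
        ch = ids[pos]
        pos += 1
        n = arity(ch)
        if n == 0:
            return ch
        # Operands follow the operator directly, each a complete sub-sequence.
        return (ch, *(node() for _ in range(n)))

    tree = node()
    if pos != len(ids):
        raise ValueError("extra characters after a complete sequence")
    return tree


print(parse("⿰亻二"))      # 仁
print(parse("⿱⿰木木土"))  # 埜: 林 above 土
print(parse("⿲彳圭亍"))    # 街: three side-by-side parts
```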

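Two related characters are located in other Unicode blocks. U+31EF, in the CJK Strokes block, is an ideographic description character that denotes subtraction. U+303E 〾, the ideographic variation indicator, is not officially an ideographic description character, but is sometimes used in ideographic description sequences.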
Character   Code point   Block                         Name
〾          U+303E       CJK Symbols and Punctuation   Ideographic variation indicator
㇯          U+31EF       CJK Strokes                   Ideographic description character subtraction

These sequences are useful for describing to the reader a character that is not directly printable, either because it is absent from a given font or because it is absent from the Unicode Standard altogether. For example, the sawndip character
