Chinese character components


In written Chinese, components are building blocks of characters, composed of strokes.
In most cases, a component consists of more than one stroke, and is smaller than the whole of the character. For example, the character consists of two components: and. These can be further decomposed: can be analyzed as the sequence of strokes, and as the sequence.
There are two methods for Chinese character component analysis: hierarchical dividing and plane dividing. Hierarchical dividing separates a character layer by layer, from larger to smaller components, until the primitive components are reached. Plane dividing separates out the primitive components in a single step.
The structure of a Chinese character is the pattern or rule by which the character is formed from its components. Chinese character structures include the single-component, left-right, up-down and surrounding structures.

Analysis

Chinese characters may be analyzed in terms of smaller components. This analysis is generally based on graphical forms, without considering aspects like pronunciation and meaning.
Component analysis is very helpful for learning Chinese characters. For example:
  • →+
  • →+
  • →+
Component analysis makes characters easier to learn. If a student learns first, the knowledge will help with the learning or review of,, and. Learning by components is more efficient than breaking every character down into strokes. Component analysis is also used in Chinese character encoding for computer input.
There are two methods for dividing Chinese characters: hierarchical dividing and plane dividing. Hierarchical dividing separates a character layer by layer, from larger to smaller components, until the primitive components are reached. Plane dividing separates out the primitive components in a single step. Hierarchical dividing can display the external structure of a character, while plane dividing can be regarded as omitting the higher levels of division and directly writing out the final result, the primitive components.
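The difference between the two methods can be sketched in Python. This is a minimal illustration, not a standard implementation: the `Component` class and both function names are hypothetical, and the example decomposition of 想 into 相 and 心 (with 相 further divided into 木 and 目) is an assumption chosen for illustration, with 木, 目 and 心 treated as primitive.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Component:
    """A node in a (hypothetical) component tree of one character."""
    glyph: str
    parts: list[Component] = field(default_factory=list)


def hierarchical_divide(node: Component, level: int = 0):
    """Yield (level, glyph) for the character and every component below it.

    Level 0 is the whole character; each division adds one level.
    The traversal is depth-first, so a component's parts follow it directly.
    """
    yield level, node.glyph
    for part in node.parts:
        yield from hierarchical_divide(part, level + 1)


def plane_divide(node: Component) -> list[str]:
    """Return only the primitive components (the leaves), in order."""
    if not node.parts:
        return [node.glyph]
    leaves: list[str] = []
    for part in node.parts:
        leaves.extend(plane_divide(part))
    return leaves


# Illustrative decomposition (an assumption, not taken from a standard).
xiang = Component("想", [
    Component("相", [Component("木"), Component("目")]),
    Component("心"),
])

for level, glyph in hierarchical_divide(xiang):
    print("  " * level + glyph)

print(plane_divide(xiang))
```

Hierarchical dividing keeps the intermediate component 相, while plane dividing reports only the leaves of the same tree.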

Rules for division

The rules for hierarchical dividing include:
  • A separation ditch (gap) is an obvious boundary along which the character is split into components.
  • If there is only one separation ditch, split into two components along the separation ditch. For example: →+, →+.
  • When there is more than one separation ditch, divide along the longer one first. For example: → +labels=no →+, to get the hierarchical structure of with two layers of components.
  • When several separation ditches are parallel and equal in length, divide along all of them. For example: →++.
  • Intersecting stroke groups are not divided, for example, and are primitive components.
  • The lower bound of dividing is generally greater than single strokes, and components with only two strokes, such as "labels=no", are not to be separated.
  • Hierarchical analysis should conform to the basic structure of Chinese characters. For example: the outermost layer of "" is in a left-right structure, so the left and right separation is employed first: →+, followed by the inside-outside division, although the latter's L-shaped separation ditch may be longer.
  • A character containing multi-level components is divided from larger to smaller units, yielding first-level components, second-level components, third-level components, and so on.
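The rules for choosing where to cut at a single layer can be sketched as follows. This is only an illustration under stated assumptions: the `Gap` record and the `gaps_to_cut` function are hypothetical, gap lengths are assumed to be measured already, and the rule that the basic structure of the character takes priority over gap length is not modelled. The handling of equally long gaps that are not parallel is also an assumption, since the rules above do not cover that case.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gap:
    """A separation ditch inside a character (hypothetical model)."""
    orientation: str  # "vertical" or "horizontal"
    length: float


def gaps_to_cut(gaps: list[Gap]) -> list[Gap]:
    """Choose the gap or gaps to divide along at the current layer.

    - No gap: nothing to cut, the component is treated as primitive.
    - One gap: cut along it.
    - Several gaps: cut along the longest one first.
    - Several parallel gaps tied for longest: cut along all of them.
    """
    if not gaps:
        return []
    longest = max(gap.length for gap in gaps)
    candidates = [gap for gap in gaps if gap.length == longest]
    # Assumption: if equally long gaps are not parallel, keep only those
    # sharing the orientation of the first candidate.
    orientation = candidates[0].orientation
    return [gap for gap in candidates if gap.orientation == orientation]


# A longer vertical gap wins over a shorter horizontal one.
print(gaps_to_cut([Gap("vertical", 10), Gap("horizontal", 4)]))
# Two parallel gaps of equal length are both cut.
print(gaps_to_cut([Gap("vertical", 10), Gap("vertical", 10)]))
```

Applying the function again to each resulting part would give the next layer of the hierarchy.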

An example

The hierarchical analysis of character in bracketed representation:
)+), 5 layers of components.
or in tree structure:

/ \

/ \

/ \ / \

/ \ / \

/ \

The level to which a Chinese character is analyzed or divided depends on the application.
In plane analysis, only components on the tree-leaves are presented, i.e.,
:,,,,,,,.

Analysis data of the Cihai

The following is the analysis data of Cihai, with a character set of 16,339 traditional and simplified Chinese characters.
Component level    Different components    Total components
1                  3,061                   32,065
2                  1,302                   34,296
3                  539                     16,777
4                  195                     3,872
5                  48                      396
6                  12                      184
7                  3                       6
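Per-level counts of this kind can be computed from a table of decompositions. The sketch below is hypothetical and does not reproduce the Cihai analysis. It assumes that each component is counted only at the level where it is divided out, and the small `DECOMPOSITION` table is an illustrative assumption rather than data from any standard.

```python
from collections import defaultdict

# Illustrative decompositions (assumptions); glyphs without an entry
# are treated as primitive components.
DECOMPOSITION = {
    "想": ["相", "心"],
    "相": ["木", "目"],
    "林": ["木", "木"],
    "好": ["女", "子"],
}


def components_by_level(glyph, decomposition, level=1):
    """Yield (level, component) for every component divided out of glyph."""
    for part in decomposition.get(glyph, []):
        yield level, part
        yield from components_by_level(part, decomposition, level + 1)


def level_statistics(characters, decomposition):
    """Map each level to (number of different components, total components)."""
    distinct = defaultdict(set)
    total = defaultdict(int)
    for char in characters:
        for level, part in components_by_level(char, decomposition):
            distinct[level].add(part)
            total[level] += 1
    return {level: (len(distinct[level]), total[level]) for level in sorted(total)}


stats = level_statistics(["想", "林", "好"], DECOMPOSITION)
for level, (different, total) in stats.items():
    print(level, different, total)
```

For the three sample characters, level 1 holds six components of five different kinds (木 occurs twice in 林), and level 2 holds the two parts of 相.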

In most cases, a component is larger than a stroke and smaller than the whole character. A single stroke counts as a component only when it occupies a relatively independent position that is usually occupied by a multi-stroke component. For example: the top stroke in character, the bottom in, the left in, the right ㇟ in, the central ㇔ in, and the outer ㇆ in. In the special case of one-stroke characters, such as and, a single stroke is at once a component and a character.

Classification of components

Character components and non-character components

A component that can independently form a character is a character component, or a component of independent character formation. For example, component labels=no forms the character labels=no on its own and is a component in the characters labels=no, labels=no and labels=no; component labels=no is likewise a character by itself and a component in labels=no, labels=no and labels=no.
A component that cannot independently form a character is a non-character component, or a component of dependent character formation. For example, component labels=no in the characters labels=no, labels=no and labels=no, and component labels=no in labels=no, labels=no and labels=no. Neither labels=no nor labels=no is a character in modern Chinese.

Primitive components and compound components

A component that cannot be divided into smaller components by the rules is a primitive component, or basic component. Primitive components are the final-level components of hierarchical dividing. For example, components labels=no and labels=no in character labels=no, and labels=no in character labels=no.
A component composed of two or more primitive components is a compound component. For example, component labels=no in character labels=no, labels=no and labels=no, and component labels=no in labels=no, labels=no and labels=no.
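The two classifications above are independent of each other and can be expressed as two simple lookups. The sketch below is a hypothetical illustration: the `classify` function, the small character set and the decomposition table are assumptions chosen for the example, not part of any standard.

```python
def classify(component, decomposition, characters):
    """Return (formation class, makeup class) for one component.

    - formation: whether the component can stand alone as a character
    - makeup: whether the rules divide it into smaller components
    """
    if component in characters:
        formation = "character component"
    else:
        formation = "non-character component"
    if component in decomposition:
        makeup = "compound component"
    else:
        makeup = "primitive component"
    return formation, makeup


# Illustrative data (assumptions): 相 is divided into 木 and 目,
# and 氵 does not stand alone as a character in modern Chinese.
decomposition = {"相": ["木", "目"]}
characters = {"相", "木", "目", "心"}

for glyph in ["相", "木", "氵"]:
    print(glyph, classify(glyph, decomposition, characters))
```

In this sample, 相 is both a character component and a compound component, 木 is a character component that is primitive, and 氵 is a non-character component that is primitive.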

Hierarchy of components

A component divided out at the first level is called a level-one component, a component divided out at the second level is called a level-two component, and so on. A component divided out at the final level is called a final-level component, i.e., primitive component. For example, in the example of character labels=no,
labels=no
/ \
labels=no labels=no
/ \
labels=no
/ \ / \
labels=no labels=no labels=no labels=no
/ \ / \
labels=no labels=no labels=no labels=no
/ \
labels=no labels=no
where the leaf components labels=no, labels=no, labels=no, labels=no, labels=no, labels=no, labels=no and labels=no are final-level components or primitive components.

Single-stroke components and multi-stroke components

A component formed by one stroke is called a single-stroke component. For example,
stroke ㇐ in character labels=no, stroke ㇑ in character labels=no, stroke ㇓ in character labels=no, stroke ㇔ in character labels=no, stroke ㇆ in character labels=no.
A component formed by more than one stroke is called a multi-stroke component. For example,
component labels=no in character labels=no, labels=no in character labels=no, and labels=no of labels=no.

Primitive components

Among the 16,339 traditional, simplified and unsimplified characters in the Cihai, there are 675 primitive components; among the 11,834 characters that remain when the traditional characters that have been simplified are excluded, there are 648 primitive components.
In the Chinese Character Information Dictionary, a total of 623 primitive components have been divided out of the 7,785 standard characters of mainland China.
Serial number    Component    Characters composed    Frequency
1                labels=no    2,409                  20.3579%
2                labels=no    1,279                  10.8089%
3                labels=no    812                    6.8622%
4                labels=no    791                    6.6841%
5                labels=no    774                    6.5404%
6                labels=no    766                    6.4736%
7                labels=no    691                    5.8391%
8                labels=no    679                    5.7383%
9                labels=no    642                    5.4252%
10               labels=no    597                    5.0457%


Component standards

Chinese character components are widely used in keyboard-based encoding input methods for Chinese characters. Different input methods separate components in different ways, so norms or standards for Chinese character components are needed.
"Chinese Character Component Standard of GB13000.1 Character Set for Information Processing" is a standard released on February 1, 1997, by the National Language Commission of China. It includes a "List of Chinese Character Primitive Components". The list contains 560 primitive components. All the 20,902 CJK Chinese characters in the GB13000.1 character set can be formed with these components. This standard is mainly for Chinese information processing.
Another important standard is the "Specification of Common Modern Chinese Character Components and Component Names" formulated by the National Language Commission in 2009. It includes a list of 514 primitive components of commonly used characters and component names. This standard is mainly for Chinese character education and dictionary collation.