Chinese character classification

Chinese characters are generally logographs, but can be further categorised based on the manner of their creation or derivation. Some characters may be analysed structurally as compounds created from smaller components, while some are not decomposable in this way. A small number of characters originate as pictographs and ideographs, but the vast majority are what are called phono-semantic compounds, which combine a component indicating meaning with a component hinting at pronunciation.
A traditional six-fold classification scheme was originally popularized in the 2nd century CE, and remained the dominant lens for analysis for almost two millennia, but with the benefit of a greater body of historical evidence, recent scholarship has variously challenged and discarded those categories. In older literature, Chinese characters are often referred to as "ideographs", inheriting a historical misconception of Egyptian hieroglyphs.

Overview

Chinese characters have been used in several different writing systems throughout history. The concept of a writing system includes both the written symbols themselves, called graphemes—which may include characters, numerals, or punctuation—as well as the rules by which they are used to record language. Chinese characters are logographs, which are graphemes that represent units of meaning in a language. Specifically, characters represent the smallest units of meaning in a language, which are referred to as morphemes. Morphemes in Chinese—and therefore the characters used to write them—are nearly always a single syllable in length. In some special cases, characters may denote non-morphemic syllables as well; due to this, written Chinese is often characterised as morphosyllabic. Logographs may be contrasted with letters in an alphabet, which generally represent phonemes, the distinct units of sound used by speakers of a language. Despite their origins in picture-writing, Chinese characters are no longer ideographs capable of representing ideas directly; their comprehension relies on the reader's knowledge of the particular language being written.
The areas where Chinese characters were historically used—sometimes collectively termed the Sinosphere—have a long tradition of lexicography attempting to explain and refine their use; for most of history, analysis revolved around a model first popularized in the 2nd-century Shuowen Jiezi dictionary. More recent models have analysed the methods used to create characters, how characters are structured, and how they function in a given writing system.

Structural analysis

Most characters can be analysed structurally as compounds made of smaller components, which are often independent characters in their own right, adjusted to occupy a given position in the compound. Components within a character may serve a specific function: phonetic components provide a hint for the character's pronunciation, and semantic components indicate some element of the character's meaning. Components that serve neither function may be classified as pure signs with no particular meaning, other than their presence distinguishing one character from another.
A straightforward structural classification scheme may consist of three pure classes of semantographs, phonographs and signs (having only semantic, phonetic, and form components respectively), as well as classes corresponding to each combination of component types. Of the characters that are frequently used in Standard Chinese, pure semantographs are estimated to be the rarest, accounting for about 5% of the lexicon, followed by pure signs with 18%, and semantic–form and phonetic–form compounds together accounting for 19%. The remaining 58% are phono-semantic compounds.
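The scheme described above can be sketched as a lookup from the set of component functions present in a character to a class label. This is only an illustrative sketch: the names Role, CLASS_LABELS, ESTIMATED_SHARES and classify are assumptions introduced here, as is the label for a character combining all three component types, which the text does not name. The percentages are the estimates quoted in the paragraph above.

```python
from enum import Enum


class Role(Enum):
    """Function a component serves within a character."""
    SEMANTIC = "semantic"
    PHONETIC = "phonetic"
    FORM = "form"  # pure sign: neither semantic nor phonetic


# Class labels keyed by the set of roles present in a character.
# The three-way label is a hypothetical name used only in this sketch.
CLASS_LABELS = {
    frozenset({Role.SEMANTIC}): "semantograph",
    frozenset({Role.PHONETIC}): "phonograph",
    frozenset({Role.FORM}): "sign",
    frozenset({Role.SEMANTIC, Role.PHONETIC}): "phono-semantic compound",
    frozenset({Role.SEMANTIC, Role.FORM}): "semantic-form compound",
    frozenset({Role.PHONETIC, Role.FORM}): "phonetic-form compound",
    frozenset({Role.SEMANTIC, Role.PHONETIC, Role.FORM}):
        "semantic-phonetic-form compound",
}

# Estimated shares (percent) among frequently used characters,
# as quoted in the text; the two form compounds are reported together.
ESTIMATED_SHARES = {
    "semantograph": 5,
    "sign": 18,
    "semantic-form or phonetic-form compound": 19,
    "phono-semantic compound": 58,
}


def classify(roles):
    """Return the structural class for an iterable of component roles."""
    role_set = frozenset(roles)
    if role_set not in CLASS_LABELS:
        raise ValueError("expected one or more Role values")
    return CLASS_LABELS[role_set]
```

Because the lookup key is a set, a character built from two semantic components still classifies as a pure semantograph; only the kinds of function present matter, not the number of components.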
The Chinese palaeographer Qiu Xigui presents three principles of character function adapted from earlier proposals by Tang Lan and Chen Mengjia: semantographs, describing all characters whose forms are wholly related to their meaning, regardless of the method by which the meaning was originally depicted; phonographs, which include a phonetic component; and loangraphs, encompassing existing characters that have been borrowed to write other words. Qiu also acknowledges the existence of character classes that fall outside of these principles, such as pure signs.

Semantographs

Pictographs

Most of the oldest characters are pictographs, representational pictures of physical objects. Examples include 日 'sun', 月 'moon', and 木 'tree'. Over time, the forms of pictographs have been simplified in order to make them easier to write. As a result, it is often no longer evident what thing was originally being depicted by a pictograph; without knowing the context of its origin in picture-writing, it may be interpreted instead as a pure sign. However, if its use in compounds still reflects a pictograph's original meaning, as with 日 in 晴 'clear sky', it can still be analysed as a semantic component.

Indicatives

Indicatives depict an abstract idea with an iconic form, including iconic modification of pictographs. In the examples below, the numerals representing small numbers are represented by a corresponding number of strokes, and directions are represented by a graphical indication above or below a line. Parts of a tree are communicated by marking the corresponding part of the pictograph meaning 'tree'.

Character	一	二	三	上	下	本	末
Pinyin	yī	èr	sān	shàng	xià	běn	mò
Gloss	'one'	'two'	'three'	'up'	'below'	'root'	'apex'

Compound ideographs

Compound ideographs, also called associative compounds, logical aggregates, or syssemantographs, combine two or more pictographic or ideographic characters to suggest the meaning of the word to be represented.
Xu Shen gave two examples:

武 'military', formed from 戈 'dagger-axe' and 止 'foot'
信 'truthful', formed from 人 'person' and 言 'speech'

Other characters commonly explained as compound ideographs include:

林 'grove', composed of two trees
森 'forest', composed of three trees
休 'rest', depicting a man by a tree
采 'harvest', depicting a hand on a bush
看 'look', depicting a hand above an eye
莫 'sunset', depicting the sun disappearing into the grass, originally written as 茻 'thick grass' enclosing 日 (later written 暮)

Many characters formerly classed as compound ideographs are now believed to have been misidentified. For example, Xu's example 信, representing the word xìn 'truthful', is usually considered a phono-semantic compound, with 人 (rén) as phonetic and 言 'speech' as a signific. In many cases, reduction of a character has obscured its original phono-semantic nature. For example, the character 明 'bright' is often presented as a compound of 日 'sun' and 月 'moon'. However, this form is probably a simplification of an attested alternative form 朙, which can be viewed as a phono-semantic compound.
Peter A. Boodberg and William G. Boltz have argued that no ancient characters were compound ideographs. Boltz accounts for the remaining cases by suggesting that some characters could represent multiple unrelated words with different pronunciations, as in Sumerian cuneiform and Egyptian hieroglyphs, and that the compound characters are actually phono-semantic compounds based on an alternative reading that has since been lost. For example, the character 安 (ān) 'peace' is often cited as a compound of 宀 'roof' with 女 'woman'. Boltz speculates that the character 女 could represent both the word nǚ 'woman' and the word ān 'settled', and that the 宀 signific was later added to disambiguate the latter usage. In support of this second reading, he points to other characters with the same 女 component that had similar pronunciations in Old Chinese: 妟 'tranquil', 奻 'to quarrel' and 姦 'licentious'. Other scholars reject these arguments for alternative readings and consider other explanations of the data more likely, for example viewing 妟 as a reduced form of 晏, which can be analysed as a phono-semantic compound with 安 as phonetic. They consider the characters 奻 and 姦 to be implausible phonetic compounds, both because the proposed phonetic and semantic elements are identical and because the widely differing initial consonants of the readings involved would not normally be accepted in a phonetic compound. Notably, Christopher Button has shown how more sophisticated palaeographical and phonological analyses can account for the examples of Boodberg and Boltz without relying on polyphony.
While compound ideographs are a limited source of Chinese characters, they account for many of the characters created in Japan (kokuji) to represent native words. Examples include:

働 'to work', formed from 人 'person' and 動 'move'
峠 'mountain pass', formed from 山 'mountain', 上 'up' and 下 'down'

As Japanese creations, such characters had no Chinese or Sino-Japanese readings, but a few have been assigned invented Sino-Japanese readings. For example, the common character 働 has been given the reading dō, taken from 動, and has even been borrowed into modern written Chinese with the reading dòng.

Loangraphs

The phenomenon of existing characters being adapted to write other words with similar pronunciations was necessary in the initial development of Chinese writing, and has continued throughout its history. Some loangraphs are introduced to represent words previously lacking another written form; this is often the case with abstract grammatical particles such as 之 and 其. For example, the character 來 was originally a pictograph of a wheat plant, with the meaning 'wheat'. As this was pronounced similarly to the Old Chinese word for 'to come', 來 was loaned to write this verb. Eventually, 'to come' became established as the default reading, and a new character, 麥, was devised for 'wheat'. When a character is used as a rebus this way, it is called a jiajiezi (假借字), translatable as 'phonetic loan character' or 'rebus character'.
The process of characters being borrowed as loangraphs should not be conflated with the distinct process of semantic extension, where a word acquires additional senses, which often remain written with the same character. As both processes often result in a single character form being used to write several distinct meanings, loangraphs are often misidentified as being the result of semantic extension, and vice versa.
As with Egyptian hieroglyphs and cuneiform, early Chinese characters were used as rebuses to express abstract meanings that were not easily depicted. Thus, many characters represented more than one word. In some cases the extended use would take over completely, and a new character would be created for the original meaning, usually by modifying the original character with a determinative. For instance, 又 originally meant 'right hand', but was borrowed to write the abstract adverb yòu 'again'. Modern usage is exclusively the latter sense, while 右, which adds the 口 'mouth' radical, represents the sense meaning 'right'. This process of graphical disambiguation is a common source of phono-semantic compound characters.
Loangraphs are also used to write words borrowed from other languages, such as the Buddhist terminology introduced to China in antiquity, as well as contemporary non-Chinese words and names. For example, each character in the name 加拿大 (Jiānádà, 'Canada') is often used as a loangraph for its respective syllable. However, the barrier between a character's pronunciation and meaning is never total: when transcribing into Chinese, loangraphs are often chosen deliberately so as to create certain connotations. This is regularly done with corporate brand names: for example, Coca-Cola's Chinese name is 可口可乐 (Kěkǒu Kělè, 'delicious and enjoyable').

Character	Rebus	Original	New character
四	'four'	'nostrils'	泗
枼	'flat', 'thin'	'leaf'	葉
北	'north'	'back'	背
要	'to want'	'waist'	腰
少	'few'	'sand'	沙 and 砂
永	'forever'	'swim'	泳

While the word jiajie has been used since the Han dynasty, the related term tongjia is first attested during the Ming dynasty. The two terms are commonly used as synonyms, but there is a distinction: a jiajiezi is a phonetic loan character for a word that did not originally have a character, such as using 東 'a bag tied at both ends' for dōng 'east', whereas a tongjia is an interchangeable character used for an existing homophonous character, such as using 蚤 'flea' for 早 'early'.
According to Bernhard Karlgren, "One of the most dangerous stumbling-blocks in the interpretation of pre-Han texts is the frequent occurrence of loan characters."