Japanese phonology


Japanese phonology is the system of sounds used in the pronunciation of the Japanese language. Unless otherwise noted, this article describes the standard variety of Japanese based on the Tokyo dialect.
There is no overall consensus on the number of contrastive individual sounds. Common approaches recognize at least 12 distinct consonants and 5 distinct vowels. Phonetic length is contrastive for both vowels and consonants, and the total length of Japanese words can be measured in a unit of timing called the mora. Only limited types of consonant clusters are permitted. There is a pitch accent system where the position or absence of a pitch drop may determine the meaning of a word.
Japanese phonology has been affected by the presence of several layers of vocabulary in the language. In addition to native Japanese vocabulary, Japanese has a large amount of Chinese-based vocabulary and loanwords from other languages. Different layers of vocabulary allow different possible sound sequences.

Lexical strata

Many generalizations about the sound system of Japanese have exceptions when recent loanwords are taken into account. For example, the consonant generally does not occur at the start of native or Chinese-derived words, but it occurs freely in this position in mimetic and foreign words. Because of exceptions like this, discussions of Japanese phonology often refer to layers, or "strata," of vocabulary. The following four strata may be distinguished:

Yamato

Called or in Japanese, this category consists of inherited native vocabulary. Morphemes in this category show a number of restrictions on structure that may be violated by vocabulary in other layers.

Mimetic

Japanese possesses a variety of mimetic words that make use of sound symbolism to serve an expressive function. Like Yamato vocabulary, these words are also of native origin, and can be considered to belong to the same overarching group. However, words of this type show some phonological peculiarities that cause some theorists to regard them as a separate layer of Japanese vocabulary.

Sino-Japanese

Called in Japanese, words in this stratum originate from several waves of large-scale borrowing from Chinese that occurred from the 6th to the 14th centuries AD. They comprise 60% of dictionary entries and 20% of ordinary spoken Japanese, ranging from formal vocabulary to everyday words. Most Sino-Japanese words are composed of more than one Sino-Japanese morpheme. Sino-Japanese morphemes have a limited phonological shape: each has a length of at most two moras, which has been argued to reflect a restriction in size to a single prosodic foot. These morphemes represent the Japanese phonetic adaptation of Middle Chinese monosyllabic morphemes, each generally represented in writing by a single Chinese character, taken into Japanese as kanji. Japanese writers also repurposed kanji to represent native vocabulary; as a result, there is a distinction between Sino-Japanese readings of kanji, called On'yomi, and native readings, called Kun'yomi.
The moraic nasal is relatively common in Sino-Japanese, and contact with Middle Chinese is often described as being responsible for the presence of in Japanese, although also came to exist in native Japanese words as a result of sound changes.

Foreign

Called in Japanese, this layer of vocabulary consists of non-Sino-Japanese words of foreign origin, mostly borrowed from Western languages after the 16th century; many of them entered the language in the 20th century. In words of this stratum, a number of consonant-vowel sequences that did not previously exist in Japanese are tolerated, which has led to the introduction of new spelling conventions and complicates the phonemic analysis of these consonant sounds in Japanese.

Consonants

Different linguists analyze the Japanese inventory of consonant phonemes in significantly different ways. Smith recognizes only 12 underlying consonants, whereas recognizes 16, equivalent to Smith's 12 plus the following 4, and recognizes 21, equivalent to Smith's 12 plus the following 9. Consonants inside parentheses in the table can be analyzed as allophones of other phonemes, at least in native words. In loanwords, sometimes occur phonemically.
In some analyses, the glides/semivowels are not interpreted as consonant phonemes. In non-loanword vocabulary, they generally occur only in the sequences and, which are sometimes analyzed as rising diphthongs rather than as consonant-vowel sequences. analyzes the glides as non-syllabic variants of the high vowel phonemes, arguing that the use of vs. may be predictable if both phonological and morphological context is taken into account.

Phonetic notes

Details of articulation

  • are variously described as lamino-alveolar, apico-alveolar or apico-dental, or simply dental or denti-alveolar.
  • are lamino-alveolar.
  • are lamino-alveolopalatal. The affricates are sometimes transcribed broadly as . The palatalized allophone of before or is also lamino-alveolopalatal or prepalatal, and so can be transcribed as, or more broadly as. reports its place of articulation as dentoalveolar or alveolar.
  • is traditionally described as a velar approximant or labialized velar approximant or something between the two, or as the semivocalic equivalent of with little to no rounding, while a 2020 real-time MRI study found it is better described as a bilabial approximant.
  • is before and , and before , coarticulated with the labial compression of that vowel. When not preceded by a pause, it often may be breathy-voiced rather than voiceless.
  • Realization of the liquid phoneme varies greatly depending on environment and dialect. The prototypical and most common pronunciation is an apical tap, either alveolar or postalveolar. Utterance-initially and after, the tap is typically articulated in such a way that the tip of the tongue is at first momentarily in light contact with the alveolar ridge before being released rapidly by airflow. This sound is described variably as a tap, a "variant of ", "a kind of weak plosive", and "an affricate with short friction, ". The apical alveolar or postalveolar lateral approximant is a common variant in all conditions, particularly utterance-initially and before. According to, utterance-initially and intervocalically, the lateral variant is better described as a tap rather than an approximant. The retroflex lateral approximant is also found before. In Tokyo's Shitamachi dialect, the alveolar trill is a variant marked with vulgarity. Other reported variants include the alveolar approximant, the alveolar stop, the retroflex flap, the lateral fricative, and the retroflex stop.

Voice onset time

At the start of a word, the voiceless stops are slightly aspirated—less so than English stops, but more than those in Spanish. Word-medial seem to be unaspirated on average. Phonetic studies in the 1980s observed an effect of accent as well as word position, with longer voice onset time in accented syllables than in unaccented syllables.
A 2019 study of young adult speakers found that after a pause, word-initial may be pronounced as plosives with zero or low positive voice onset time; while significantly less aspirated on average than word-initial, some overlap in voice onset time was observed. A secondary cue to the distinction between and in word-initial position is a pitch offset on the following vowel: vowels after word-initial start out with a higher pitch compared to vowels after, even when the latter are phonetically devoiced. Word-medial are normally fully voiced, but may become non-plosives through lenition.

Lenition

The phonemes have weakened non-plosive pronunciations that can be broadly transcribed as voiced fricatives, although they may be realized instead as voiced approximants. There is no context where the non-plosive pronunciations are consistently used, but they occur most often between vowels.
These weakened pronunciations can occur after a vowel in the middle of a word, or when a word starting with follows a vowel-final word with no intervening pause. found that, as with the pronunciation of as vs., the use of plosive vs. non-plosive realizations of is closely correlated with the time available to a speaker to articulate the consonant, which is affected by speech rate as well as the identity of the preceding sound. All three show a high rate of plosive pronunciations after or after a pause; after, plosive pronunciations occur at high rates for and, but less frequently for, probably because word-medial after is often pronounced instead as a velar nasal. Across contexts, generally has a higher rate of plosive realizations than and.

Moraic consonants

Certain consonant sounds are called "moraic" because they count for a mora, a unit of timing or prosodic length. The phonemic analysis of moraic consonants is disputed. One approach, particularly popular among Japanese scholars, analyzes moraic consonants as the phonetic realization of special "mora phonemes": a mora nasal, called the hatsuon, and a mora obstruent consonant, called the sokuon. The pronunciation of these sounds varies depending on context: because of this, they may be analyzed as "placeless" phonemes with no phonologically specified place of articulation. A competing approach rejects the transcriptions and and the identification of moraic consonants as their own phonemes, treating them instead as the syllable-final realizations of other consonant phonemes.
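As a rough illustration of how moraic consonants contribute to word length, the following Python sketch counts moras in a word written in kana. It relies on an orthographic assumption that is not stated in this article: every kana character, including the hatsuon (ん), the sokuon (っ) and the long-vowel mark (ー), counts as one mora, except the small kana, which merge with the preceding character. The names SMALL_KANA and count_moras are hypothetical, and input is assumed to consist of kana only.

```python
# Small kana that combine with the preceding kana into a single mora
# (assumption: orthographic convention, not taken from the article).
SMALL_KANA = set("ゃゅょャュョぁぃぅぇぉァィゥェォゎヮ")


def count_moras(kana: str) -> int:
    """Count moras in a string assumed to contain only kana.

    The hatsuon (ん/ン), the sokuon (っ/ッ) and the long-vowel mark (ー)
    are ordinary characters here, so each adds one mora.
    """
    return sum(1 for ch in kana if ch not in SMALL_KANA)


if __name__ == "__main__":
    for word in ["にっぽん", "しんぶん", "とうきょう", "コンピューター"]:
        print(word, count_moras(word))
```

A character-based count of this kind only approximates the phonological unit; it says nothing about which of the competing phonemic analyses of the moraic consonants is correct.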

Moraic nasal

The moraic nasal or mora nasal can be interpreted as a syllable-final nasal consonant. Aside from certain marginal exceptions, it is found only after a vowel, which is phonetically nasalized in this context. It can be followed by a consonant, a vowel, or the end of a word.
Its pronunciation varies depending on the sound that follows it.
  • Before a plosive, affricate, nasal, or liquid, it is pronounced as a nasal consonant assimilated to the place of the following consonant.
  • Before a vowel, approximant, or voiceless fricative, it is a nasalized vowel or moraic semivowel that can be broadly transcribed as . This pronunciation may also occur before the voiced fricatives, although more often, they are pronounced as affricates when preceded by the moraic nasal.
At the end of an utterance, the moraic nasal is pronounced as a nasal segment with a variable place of articulation and variable degree of constriction. Its pronunciation in this position is traditionally described and transcribed as uvular, sometimes with the qualification that it is, or approaches, velar after front vowels. Some descriptions state that it may have incomplete occlusion and can potentially be realized as a nasalized vowel, as in intervocalic position. Instrumental studies in the 2010s showed that there is considerable variability in its pronunciation and that it often involves a lip closure or constriction. A study of real-time MRI data collected between 2017 and 2019 found that the pronunciation of the moraic nasal in utterance-final position most often involves vocal tract closure with a tongue position that can range from uvular to alveolar: it is assimilated to the position of the preceding vowel, but the range of overlap observed between similar vowel pairs suggests this assimilation is not a categorical allophonic rule, but a gradient phonetic process. 5% of the utterance-final samples of the moraic nasal were realized as nasalized vowels with no closure: in this case, appreciable tongue raising was observed only when the preceding vowel was.
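The context-dependent pronunciation described above can be summarized as a small lookup. The Python sketch below is only a schematic restatement of the prose, not a phonetic model: the function name, the context labels and the returned descriptions are all hypothetical, None stands for the end of an utterance, and the voiced-fricative case is simplified to the more common outcome, in which the fricative is pronounced as an affricate and the nasal is taken to assimilate to it.

```python
from typing import Optional

# Broad descriptive labels (hypothetical wording, not IPA transcriptions).
ASSIMILATED_NASAL = "nasal consonant at the place of the following consonant"
NASALIZED_VOCOID = "nasalized vowel or moraic semivowel"
FINAL_NASAL = "nasal segment with variable place and degree of constriction"

ASSIMILATION_CONTEXTS = {"plosive", "affricate", "nasal", "liquid"}
VOCOID_CONTEXTS = {"vowel", "approximant", "voiceless fricative"}


def moraic_nasal_realization(following: Optional[str]) -> str:
    """Return a broad description of the moraic nasal before `following`.

    `following` is the class of the next sound, or None at the end of
    an utterance.
    """
    if following is None:
        return FINAL_NASAL
    if following in ASSIMILATION_CONTEXTS:
        return ASSIMILATED_NASAL
    if following in VOCOID_CONTEXTS:
        return NASALIZED_VOCOID
    if following == "voiced fricative":
        # Simplification: only the more common outcome is returned, where
        # the fricative surfaces as an affricate; the nasalized-vowel
        # variant mentioned in the text is not modeled.
        return ASSIMILATED_NASAL
    raise ValueError(f"unknown context: {following!r}")


if __name__ == "__main__":
    for context in ["plosive", "vowel", "voiced fricative", None]:
        print(context, "->", moraic_nasal_realization(context))
```

A categorical table like this cannot capture the gradient variation reported in the instrumental studies, particularly in utterance-final position.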
There are a variety of competing phonemic analyses of the moraic nasal. It may be transcribed with the non-IPA symbol and analyzed as a "placeless" nasal. Some analysts do not categorize it as a phonological consonant. Alternatively, it may be analyzed as a uvular nasal, based on the traditional description of its pronunciation before a pause. It is sometimes analyzed as a syllable-final allophone of the coronal nasal consonant, but this requires treating syllable or mora boundaries as potentially distinctive, because there is a clear contrast in pronunciation between the moraic nasal and non-moraic before a vowel or before :
Alternatively, in an analysis that treats syllabification as distinctive, the moraic nasal can be interpreted as an archiphoneme, since there is no contrast in syllable-final position between and.
Thus, depending on the analysis, a word like, pronounced phonetically as, could be phonemically transcribed as,, or.