Romanization


In linguistics, romanization or romanisation is the conversion of text from a different writing system to the Roman (Latin) script, or a system for doing so. Methods of romanization include transliteration, for representing written text, transcription, for representing the spoken word, and combinations of both. Transcription methods can be subdivided into phonemic transcription, which records the phonemes of speech (the units of sound that distinguish meaning), and stricter phonetic transcription, which records speech sounds with precision.

Methods

There are many consistent or standardized romanization systems, and they can be classified by their characteristics. A particular system's characteristics may make it better suited to various, sometimes contradictory, applications, including document retrieval, linguistic analysis, easy readability, and faithful representation of pronunciation.
  • Source, or donor language – A system may be tailored to romanize text from a particular language, from a group of languages, or from any language written in a particular script. A language-specific system typically preserves features of that language, such as pronunciation, while a general system may be better for cataloguing international texts.
  • Target, or receiver language – Most systems are intended for an audience that speaks or reads a particular language.
  • Simplicity – Because the basic Latin alphabet has fewer letters than many other writing systems, digraphs, diacritics, or special characters must be used to represent every source character in Latin script. This affects how easily the romanized text can be created, stored and transmitted digitally, reproduced, and read.
  • Reversibility – Whether the original text can be restored from the romanized text. Some reversible systems also allow a simplified version that is irreversible.
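Reversibility can be checked mechanically, at least for short inputs. The following is a minimal sketch, assuming a romanization that replaces each source character independently; it searches every source string up to a fixed length for two that produce the same Latin output. The three-letter Cyrillic table is hypothetical and chosen only to show that giving every letter a distinct romanization is not enough: if one letter's output equals the concatenation of two others' outputs, the romanized text cannot be split back unambiguously.

```python
from __future__ import annotations

from itertools import product


def find_ambiguity(mapping: dict[str, str], max_len: int = 4) -> tuple[str, str] | None:
    """Return two distinct source strings that romanize identically, or None.

    Assumes each source character is replaced independently by mapping[char].
    The search is exhaustive only up to max_len characters, so None means
    "no collision found in this range", not a proof of reversibility.
    """
    seen: dict[str, str] = {}  # romanized output -> first source string producing it
    symbols = sorted(mapping)
    for length in range(1, max_len + 1):
        for combo in product(symbols, repeat=length):
            source = "".join(combo)
            target = "".join(mapping[ch] for ch in combo)
            if target in seen:
                return seen[target], source
            seen[target] = source
    return None


# Hypothetical table: every letter has a distinct romanization, yet "я" -> "ya"
# collides with the two-letter sequence "йа" -> "y" + "a".
toy = {"а": "a", "й": "y", "я": "ya"}
print(find_ambiguity(toy))  # ('я', 'йа')
```

Real systems avoid such collisions by design, for example with a separator character between letters; a bounded search like this only catches collisions within the tested length.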

Transliteration

If the romanization attempts to transliterate the original script, the guiding principle is a one-to-one mapping of characters in the source language into the target script, with less emphasis on how the result sounds when pronounced according to the reader's language. For example, the Nihon-shiki romanization of Japanese allows the informed reader to reconstruct the original Japanese kana syllables with 100% accuracy, but requires additional knowledge for correct pronunciation.
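The one-to-one principle can be illustrated with a small excerpt of the Nihon-shiki table. This is a toy sketch, not a complete implementation: it covers only a few rows of the kana chart and leaves out ん, small kana, and long vowels. Because no two kana in the excerpt share a spelling, and every syllable is either a bare vowel or a consonant plus a vowel, the romanized text can be parsed back into kana by trying the longer match first.

```python
# Excerpt of the Nihon-shiki table (rows a, k, s, t, z, d, h only).
NIHON_SHIKI = {
    "あ": "a", "い": "i", "う": "u", "え": "e", "お": "o",
    "か": "ka", "き": "ki", "く": "ku", "け": "ke", "こ": "ko",
    "さ": "sa", "し": "si", "す": "su", "せ": "se", "そ": "so",
    "た": "ta", "ち": "ti", "つ": "tu", "て": "te", "と": "to",
    "ざ": "za", "じ": "zi", "ず": "zu", "ぜ": "ze", "ぞ": "zo",
    "だ": "da", "ぢ": "di", "づ": "du", "で": "de", "ど": "do",
    "は": "ha", "ひ": "hi", "ふ": "hu", "へ": "he", "ほ": "ho",
}
# One-to-one: inverting the table loses nothing.
KANA_FOR = {roman: kana for kana, roman in NIHON_SHIKI.items()}
assert len(KANA_FOR) == len(NIHON_SHIKI)


def to_nihon_shiki(kana: str) -> str:
    """Transliterate kana character by character."""
    return "".join(NIHON_SHIKI[ch] for ch in kana)


def to_kana(roman: str) -> str:
    """Restore kana from the romanization by longest-match parsing."""
    out, i = [], 0
    while i < len(roman):
        # Try a two-letter syllable first, then a bare vowel.
        for chunk in (roman[i:i + 2], roman[i:i + 1]):
            if chunk in KANA_FOR:
                out.append(KANA_FOR[chunk])
                i += len(chunk)
                break
        else:
            raise ValueError(f"no syllable matches at {roman[i:]!r}")
    return "".join(out)


word = "つづく"               # written tsuzuku in Hepburn
roman = to_nihon_shiki(word)  # "tuduku"
assert to_kana(roman) == word
```

The round trip succeeds because Nihon-shiki keeps づ (du) distinct from ず (zu); Hepburn writes both as zu, so the same reverse lookup would not be possible there.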

Transcription

Phonemic

Most romanizations are intended to let a casual reader who is unfamiliar with the original script pronounce the source language reasonably accurately. Such romanizations follow the principle of phonemic transcription and try to render the significant sounds of the original as faithfully as possible in the target language. The popular Hepburn romanization of Japanese is an example of a transcriptive romanization designed for English speakers.
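The difference in aim shows up in the spelling of individual kana. The sketch below compares a hand-picked excerpt of Hepburn and Nihon-shiki spellings; it is illustrative and omits most of the chart. Hepburn writes し, ち, つ and ふ as an English reader would pronounce them, and because じ and ぢ are both pronounced ji, and ず and づ both zu, in standard Japanese, it spells each pair identically. The helper reports such merged spellings, which are the price of the sound-based approach: the original kana can no longer be recovered from the romanization alone.

```python
from collections import defaultdict

# Excerpt of kana -> (Nihon-shiki, Hepburn) spellings.
SPELLINGS = {
    "し": ("si", "shi"),
    "じ": ("zi", "ji"),
    "ぢ": ("di", "ji"),
    "す": ("su", "su"),
    "ず": ("zu", "zu"),
    "づ": ("du", "zu"),
    "ち": ("ti", "chi"),
    "つ": ("tu", "tsu"),
    "ふ": ("hu", "fu"),
}


def merged_spellings(system: int) -> dict[str, list[str]]:
    """Map each romanized spelling shared by several kana to those kana.

    system is an index into the tuples above: 0 = Nihon-shiki, 1 = Hepburn.
    """
    kana_by_spelling = defaultdict(list)
    for kana, spellings in SPELLINGS.items():
        kana_by_spelling[spellings[system]].append(kana)
    return {roman: kana for roman, kana in kana_by_spelling.items() if len(kana) > 1}


print(merged_spellings(0))  # {}
print(merged_spellings(1))  # {'ji': ['じ', 'ぢ'], 'zu': ['ず', 'づ']}
```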

Phonetic

A phonetic conversion goes one step further and attempts to depict all phones in the source language, sacrificing legibility if necessary by using characters or conventions not found in the target script. In practice such a representation almost never tries to represent every possible allophone—especially those that occur naturally due to coarticulation effects—and instead limits itself to the most significant allophonic distinctions. The International Phonetic Alphabet is the most common system of phonetic transcription.

Compromise

For most language pairs, building a usable romanization involves a trade-off between the two extremes. Pure transcription is generally impossible, because the source language usually contains sounds and distinctions that the target language lacks but that must still be shown for the romanized form to be comprehensible. Furthermore, because of diachronic and synchronic variation, no written language represents a spoken language with perfect accuracy, and the way a script is read aloud can differ greatly between languages. In modern times the chain of transcription usually runs from the spoken foreign language to the written foreign language, then to the written native language, and finally to the spoken native language. Shortening this chain, that is, removing one or both of the written steps, usually leads to more accurate pronunciation. In general, outside a limited audience of scholars, romanizations lean towards transcription. For example, for the Japanese martial art 柔術, the Nihon-shiki romanization zyûzyutu lets someone who knows Japanese reconstruct the kana syllables, but most English readers would find it easier to guess the pronunciation from the Hepburn form, jūjutsu.
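The 柔術 example can be reproduced with a tiny converter. This sketch is hypothetical and deliberately minimal: it knows only the three kana spellings needed for じゅうじゅつ (the kana reading of 柔術) and a single rule, namely that a う lengthening a preceding u is written with a circumflex in Nihon-shiki and a macron in Hepburn, matching the two forms quoted in the paragraph.

```python
NIHON_SHIKI, HEPBURN = 0, 1

# Only the spellings needed for じゅうじゅつ; each entry is (Nihon-shiki, Hepburn).
SYLLABLES = {
    "じゅ": ("zyu", "ju"),
    "つ": ("tu", "tsu"),
    "う": ("u", "u"),
}
LONG_U = ("\u00fb", "\u016b")  # û (circumflex) for Nihon-shiki, ū (macron) for Hepburn


def romanize(kana: str, system: int) -> str:
    parts: list[str] = []
    i = 0
    while i < len(kana):
        # Prefer two-character units such as じゅ (ji + small yu) over single kana.
        for chunk in (kana[i:i + 2], kana[i:i + 1]):
            if chunk in SYLLABLES:
                break
        else:
            raise ValueError(f"kana not in this excerpt: {kana[i]!r}")
        if chunk == "う" and parts and parts[-1].endswith("u"):
            # う after a u-syllable lengthens the vowel instead of adding a new one.
            parts[-1] = parts[-1][:-1] + LONG_U[system]
        else:
            parts.append(SYLLABLES[chunk][system])
        i += len(chunk)
    return "".join(parts)


print(romanize("じゅうじゅつ", NIHON_SHIKI))  # zyûzyutu
print(romanize("じゅうじゅつ", HEPBURN))      # jūjutsu
```

A full converter would need many more rules (other long vowels, ん, the small っ), which this excerpt leaves out; the point is only that the same kana yield a reconstructable spelling in one system and a pronounceable one in the other.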

Romanization of specific writing systems

Arabic

The Arabic script is used to write Arabic, Persian, Urdu, Pashto and Sindhi as well as numerous other languages in the Muslim world, particularly African and Asian languages without alphabets of their own. Romanization standards include the following:

Arabic

UnicodeFinalMedialInitialIsolatedIPADMG ALA-LC BGN/PCGN EI UN UN Pronunciation
U+064EـَـَاَاَaaaaaaA as in cat
U+064FـُـُاُاُooouooO as in go
U+0648 U+064FـوـوooouooO as in go
U+0650ـِـِاِاِeieeeeE as in ten
U+064E U+0627ـَاـَاآآāāāāāāO as in hot
U+0622ـآـآآآā, ʾāā, ʼāāāāāO as in hot
U+064E U+06CCـَیāááāáāO as in hot
U+06CC U+0670ـیٰāááāāāO as in hot
U+064F U+0648ـُوـُواُواُوūūūu, ōūuU as in actual
U+0650 U+06CCـیـیـایـایīīīi, ēīiY as in happy
U+064E U+0648ـَوـَواَواَوauawowow, awowowO as in go
U+064E U+06CCـَیـَیـاَیـاَیaiayeyey, ayeyeyAy as in play
U+064E U+06CCـیِ–e, –ye–i, –yi–e, –ye–e, –ye–e, –ye–e, –yeYe as in yes
U+06C0ـهٔ–ye–ʼi–ye–ye–ye–yeYe as in yes


Armenian

Georgian

Georgian letterIPANational system
BGN/PCGN
ISO 9984
ALA-LC
Unofficial systemKartvelo translitNGR2
aaaaaaa
bbbbbbb
ggggggg
ddddddd
eeeeeee
vvvvvvv
zzzzzzz
eyēēéej
tT or ttt / t̊
iiiiiii
kkkkǩ
lllllll
mmmmmmm
nnnnnnn
jyyjĩ
ooooooo
pppp
zhzhžžJ, zh or jž
rrrrrrr
sssssss
tttt
wwŭ
uuuuuuu
pp or fpp / p̊
kq or kq or kk / k̊
ghghġg, gh or Rg, gh or R
qqqyqq
shshššsh or Sšx
chchʼč̕čʻch or Cč
tstsʼc or tscc
dzdzjżdz or Zʒ
tsʼtsccw, c or tsʃ
chʼchččW, ch or tchʃ̌
khkhxxx or kh x
jjǰjj-j
hhhhhhh
ōōȯ


Notes: