Romanization
In linguistics, romanization or romanisation is the conversion of text from a different writing system to the Roman (Latin) script, or a system for doing so. Methods of romanization include transliteration, for representing written text; transcription, for representing the spoken word; and combinations of both. Transcription methods can be subdivided into phonemic transcription, which records the phonemes (the sound units that distinguish meaning in speech), and stricter phonetic transcription, which records speech sounds with precision.
Methods
There are many consistent or standardized romanization systems, and they can be classified by their characteristics. A particular system's characteristics may make it better suited to various, sometimes contradictory, applications, including document retrieval, linguistic analysis, easy readability, and faithful representation of pronunciation.
- Source, or donor language – A system may be tailored to romanize text from a particular language, a group of languages, or any language in a particular writing system. A language-specific system typically preserves language features such as pronunciation, while a general one may be better for cataloguing international texts.
- Target, or receiver language – Most systems are intended for an audience that speaks or reads a particular language.
- Simplicity – Since the basic Latin alphabet has fewer letters than many other writing systems, digraphs, diacritics, or special characters must be used to represent all of their letters in Latin script. This affects the ease of creation, digital storage and transmission, reproduction, and reading of the romanized text.
- Reversibility – Whether or not the original can be restored from the converted text. Some reversible systems allow for an irreversible simplified version.
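Reversibility can be checked mechanically for a letter-by-letter table. The following is a minimal sketch, not a standard procedure: the four Georgian consonants and their values are assumed from the National system column of the Georgian table later in this article, and `is_injective`, `invert`, and the apostrophe-dropping `SIMPLIFIED` table are hypothetical names introduced for illustration. Distinct outputs are necessary for reversibility but not sufficient, because multi-letter outputs can still collide once they are concatenated.

```python
# Four Georgian consonants; values assumed from the "National system" column.
NATIONAL = {"თ": "t", "ტ": "t\u02bc", "ქ": "k", "კ": "k\u02bc"}

# Hypothetical simplified variant: drop the apostrophe (U+02BC) marking ejectives.
SIMPLIFIED = {src: out.replace("\u02bc", "") for src, out in NATIONAL.items()}


def is_injective(table):
    """Return True if no two source letters share a romanized output."""
    return len(set(table.values())) == len(table)


def invert(table):
    """Build the reverse table; only meaningful when the table is injective."""
    return {out: src for src, out in table.items()}


print(is_injective(NATIONAL))    # the full table keeps the four letters distinct
print(is_injective(SIMPLIFIED))  # the simplified table merges თ/ტ and ქ/კ
```

The simplified table is easier to type and read, which is the trade-off between the Simplicity and Reversibility criteria above.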
Transliteration
Transcription
Phonemic
Most romanizations are intended to enable the casual reader who is unfamiliar with the original script to pronounce the source language reasonably accurately. Such romanizations follow the principle of phonemic transcription and attempt to render the significant sounds of the original as faithfully as possible in the target language. The popular Hepburn romanization of Japanese is an example of a transcriptive romanization designed for English speakers.
Phonetic
A phonetic conversion goes one step further and attempts to depict all phones in the source language, sacrificing legibility if necessary by using characters or conventions not found in the target script. In practice such a representation almost never tries to represent every possible allophone (especially those that occur naturally due to coarticulation effects) and instead limits itself to the most significant allophonic distinctions. The International Phonetic Alphabet is the most common system of phonetic transcription.
Compromise
For most language pairs, building a usable romanization involves a trade-off between the two extremes. Pure transcriptions are generally not possible, as the source language usually contains sounds and distinctions that are not found in the target language but must be shown for the romanized form to be comprehensible. Furthermore, owing to diachronic and synchronic variance, no written language represents any spoken language with perfect accuracy, and the vocal interpretation of a script may vary greatly among languages. In modern times the chain of transcription usually runs: spoken foreign language, written foreign language, written native language, spoken native language. Reducing the number of those steps, i.e. removing one or both steps of writing, usually leads to more accurate oral articulation. In general, outside a limited audience of scholars, romanizations tend to lean more towards transcription. As an example, consider the Japanese martial art 柔術: the Nihon-shiki romanization zyûzyutu may allow someone who knows Japanese to reconstruct the kana syllables, but most native English speakers, or rather readers, would find it easier to guess the pronunciation from the Hepburn version, jūjutsu.
Romanization of specific writing systems
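Before turning to individual scripts, the 柔術 comparison from the Compromise discussion can be made concrete. The sketch below assumes the kana reading じゅうじゅつ and uses two toy tables that cover only the kana in that word; the function name `romanize` and the rule of writing every doubled "u" as a long vowel are simplifications chosen for this illustration, not a full implementation of either system.

```python
# Toy tables covering only the kana in じゅうじゅつ, the reading of 柔術.
NIHON_SHIKI = {"じゅ": "zyu", "う": "u", "つ": "tu"}
HEPBURN = {"じゅ": "ju", "う": "u", "つ": "tsu"}


def romanize(kana, table, long_u):
    """Map kana to Latin letters, then write a doubled u as one long vowel."""
    pieces = []
    i = 0
    while i < len(kana):
        digraph = kana[i:i + 2]
        if len(digraph) == 2 and digraph in table:
            pieces.append(table[digraph])  # two-kana unit such as じゅ
            i += 2
        else:
            pieces.append(table[kana[i]])  # KeyError for kana outside the toy table
            i += 1
    return "".join(pieces).replace("uu", long_u)


reading = "じゅうじゅつ"
print(romanize(reading, NIHON_SHIKI, "\u00fb"))  # zyûzyutu (circumflex)
print(romanize(reading, HEPBURN, "\u016b"))      # jūjutsu (macron)
```

The Nihon-shiki table stays close to the kana grid (zyu, tu), while the Hepburn table substitutes spellings an English reader is more likely to pronounce correctly (ju, tsu).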
Arabic
The Arabic script is used to write Arabic, Persian, Urdu, Pashto and Sindhi as well as numerous other languages in the Muslim world, particularly African and Asian languages without alphabets of their own. Romanization standards include the following:
Arabic
- Deutsche Morgenländische Gesellschaft: Adopted by the International Convention of Orientalist Scholars in Rome. It is the basis for the very influential Hans Wehr dictionary.
- BS 4280: Developed by the British Standards Institution
- SATTS: A one-for-one substitution system, a legacy from the Morse code era
- UNGEGN
- DIN 31635: Developed by the Deutsches Institut für Normung
- ISO 233: Transliteration
- Qalam: A system that focuses upon preserving the spelling, rather than the pronunciation, and uses mixed case
- ISO 233-2: Simplified transliteration
- Buckwalter transliteration: Developed at Xerox by Tim Buckwalter; does not require unusual diacritics
- ALA-LC
- Arabic chat alphabet
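A system that avoids diacritics, such as the Buckwalter transliteration listed above, can be applied as a plain character-for-character lookup. The following is a sketch under stated assumptions: the seven mappings are a partial subset of the Buckwalter table as commonly published and should be checked against a reference copy before use, and the names `to_buckwalter` and `from_buckwalter` are hypothetical.

```python
# Partial Buckwalter-style table (assumed values): only the letters used below.
BUCKWALTER = {
    "\u0643": "k",  # kaf
    "\u062a": "t",  # teh
    "\u0627": "A",  # alef
    "\u0628": "b",  # beh
    "\u0633": "s",  # seen
    "\u0644": "l",  # lam
    "\u0645": "m",  # meem
}
REVERSE = {latin: arabic for arabic, latin in BUCKWALTER.items()}


def to_buckwalter(text):
    """Replace each Arabic letter with its single ASCII counterpart."""
    return "".join(BUCKWALTER[ch] for ch in text)


def from_buckwalter(text):
    """Undo to_buckwalter; possible because every output is one unique character."""
    return "".join(REVERSE[ch] for ch in text)


word = "\u0643\u062a\u0627\u0628"  # the Arabic word for "book"
ascii_form = to_buckwalter(word)
print(ascii_form)                           # ktAb
print(from_buckwalter(ascii_form) == word)  # True
```

Because short vowels are normally unwritten in Arabic, the output preserves the spelling rather than the pronunciation, which is the transliteration end of the spectrum described earlier.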
Persian
| Unicode | Final | Medial | Initial | Isolated | IPA | DMG | ALA-LC | BGN/PCGN | EI | UN | UN | Pronunciation |
| U+064E | ـَ | ـَ | اَ | اَ | | a | a | a | a | a | a | A as in cat |
| U+064F | ـُ | ـُ | اُ | اُ | | o | o | o | u | o | o | O as in go |
| U+0648 U+064F | ـو | ـو | — | — | | o | o | o | u | o | o | O as in go |
| U+0650 | ـِ | ـِ | اِ | اِ | | e | i | e | e | e | e | E as in ten |
| U+064E U+0627 | ـَا | ـَا | آ | آ | | ā | ā | ā | ā | ā | ā | O as in hot |
| U+0622 | ـآ | ـآ | آ | آ | | ā, ʾā | ā, ʼā | ā | ā | ā | ā | O as in hot |
| U+064E U+06CC | ـَی | — | — | — | | ā | á | á | ā | á | ā | O as in hot |
| U+06CC U+0670 | ـیٰ | — | — | — | | ā | á | á | ā | ā | ā | O as in hot |
| U+064F U+0648 | ـُو | ـُو | اُو | اُو | | ū | ū | ū | u, ō | ū | u | U as in actual |
| U+0650 U+06CC | ـی | ـیـ | ایـ | ای | | ī | ī | ī | i, ē | ī | i | Y as in happy |
| U+064E U+0648 | ـَو | ـَو | اَو | اَو | | au | aw | ow | ow, aw | ow | ow | O as in go |
| U+064E U+06CC | ـَی | ـَیـ | اَیـ | اَی | | ai | ay | ey | ey, ay | ey | ey | Ay as in play |
| U+064E U+06CC | ـیِ | — | — | — | | –e, –ye | –i, –yi | –e, –ye | –e, –ye | –e, –ye | –e, –ye | Ye as in yes |
| U+06C0 | ـهٔ | — | — | — | | –ye | –ʼi | –ye | –ye | –ye | –ye | Ye as in yes |
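The Unicode column of the table identifies each vowel sign or letter sequence by code point. As a small sketch of how such a cell can be turned back into the characters it names, the helpers below parse the "U+XXXX" notation and look up the standard character names; `decode` and `describe` are hypothetical names, and only Python's standard `unicodedata` module is used.

```python
import unicodedata


def decode(cell):
    """Turn a cell such as 'U+064E U+0627' into the characters it names."""
    return "".join(chr(int(token[2:], 16)) for token in cell.split())


def describe(cell):
    """List the Unicode character name of each code point in the cell."""
    return [unicodedata.name(ch) for ch in decode(cell)]


print(describe("U+064E U+0627"))  # ['ARABIC FATHA', 'ARABIC LETTER ALEF']
print(describe("U+06CC"))         # ['ARABIC LETTER FARSI YEH']
```

Working from code points rather than from the rendered glyphs avoids confusion between the contextual forms shown in the Final, Medial, Initial, and Isolated columns, which all share the same underlying characters.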
Armenian
Georgian
| Georgian letter | IPA | National system | BGN/PCGN | ISO 9984 | ALA-LC | Unofficial system | Kartvelo translit | NGR2 |
| ა | | a | a | a | a | a | a | a |
| ბ | | b | b | b | b | b | b | b |
| გ | | g | g | g | g | g | g | g |
| დ | | d | d | d | d | d | d | d |
| ე | | e | e | e | e | e | e | e |
| ვ | | v | v | v | v | v | v | v |
| ზ | | z | z | z | z | z | z | z |
| ჱ | ey | ē | ē | é | ej | ẽ | ||
| თ | | t | tʼ | t̕ | tʻ | T or t | t | t / t̊ |
| ი | | i | i | i | i | i | i | i |
| კ | | kʼ | k | k | k | k | ǩ | k̉ |
| ლ | | l | l | l | l | l | l | l |
| მ | | m | m | m | m | m | m | m |
| ნ | | n | n | n | n | n | n | n |
| ჲ | j | y | y | j | ĩ | |||
| ო | | o | o | o | o | o | o | o |
| პ | | pʼ | p | p | p | p | p̌ | p̉ |
| ჟ | | zh | zh | ž | ž | J, zh or j | ž | g̃ |
| რ | | r | r | r | r | r | r | r |
| ს | | s | s | s | s | s | s | s |
| ტ | | tʼ | t | t | t | t | t̆ | t̉ |
| ჳ | w | w | ŭ | f̃ | ||||
| უ | | u | u | u | u | u | u | u |
| ფ | | p | pʼ | p̕ | pʻ | p or f | p | p / p̊ |
| ქ | | k | kʼ | k̕ | kʻ | q or k | q or k | k / k̊ |
| ღ | | gh | gh | ḡ | ġ | g, gh or R | g, gh or R | q̃ |
| ყ | | qʼ | q | q | q | y | q | q |
| შ | | sh | sh | š | š | sh or S | š | x |
| ჩ | | ch | chʼ | č̕ | čʻ | ch or C | č | c̃ |
| ც | | ts | tsʼ | c̕ | cʻ | c or ts | c | c |
| ძ | | dz | dz | j | ż | dz or Z | ʒ | d̃ |
| წ | | tsʼ | ts | c | c | w, c or ts | ʃ | c̉ |
| ჭ | | chʼ | ch | č | č | W, ch or tch | ʃ̌ | j̉ |
| ხ | | kh | kh | x | x | x or kh | x | k̃ |
| ჴ | qʼ | ẖ | x̣ | q̌ | q̊ | |||
| ჯ | | j | j | ǰ | j | j | - | j |
| ჰ | | h | h | h | h | h | h | h |
| ჵ | ō | ō | ȯ | h̃ |
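To show how the choice of column changes the result for the same word, the sketch below romanizes საქართველო (the Georgian name of Georgia) with two partial tables. The values are assumed from the National system and ALA-LC columns as read from the table above, only the letters in this one word are covered, and the function name `romanize` is hypothetical.

```python
# Partial tables (assumed from the table above): letters of one word only.
NATIONAL = {
    "ს": "s", "ა": "a", "ქ": "k", "რ": "r", "თ": "t",
    "ვ": "v", "ე": "e", "ლ": "l", "ო": "o",
}
# ALA-LC differs here only in marking the aspirated stops with U+02BB.
ALA_LC = {**NATIONAL, "ქ": "k\u02bb", "თ": "t\u02bb"}


def romanize(word, table):
    """Replace each letter found in the table; leave anything else unchanged."""
    return "".join(table.get(ch, ch) for ch in word)


word = "საქართველო"
print(romanize(word, NATIONAL))  # sakartvelo
print(romanize(word, ALA_LC))    # sakʻartʻvelo
```

The National system output needs no special characters, while the ALA-LC output keeps the aspirated stops visibly distinct from the ejectives, which matters if the original spelling has to be recovered.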