Romanization of Arabic


The romanization of Arabic is the systematic rendering of written and spoken Arabic in the Latin script. Romanized Arabic is used for various purposes, among them transcription of names and titles, cataloging Arabic language works, language education when used instead of or alongside the Arabic script, and representation of the language in scientific publications by linguists. These formal systems, which often make use of diacritics and non-standard Latin characters, are used in academic settings for the benefit of non-speakers, contrasting with informal means of written communication used by speakers such as the Latin-based Arabic chat alphabet.
Different systems and strategies have been developed to address the inherent problems of rendering various Arabic varieties in the Latin script. Examples of such problems are the symbols for Arabic phonemes that do not exist in English or other European languages; the means of representing the Arabic definite article, which is always spelled the same way in written Arabic but has numerous pronunciations in the spoken language depending on context; and the representation of short vowels.

Method

Romanization is often termed "transliteration", but this is not technically correct. Transliteration is the direct representation of foreign letters using Latin symbols, while most systems for romanizing Arabic are actually transcription systems, which represent the sound of the language, since short vowels and geminate consonants, for example, do not usually appear in Arabic writing. As an example, the rendering Qaṭar of قطر is a transcription, indicating the pronunciation; a transliteration would be qṭr.

Romanization standards and systems

Principal standards and systems are:

Early Romanization

Early Romanization of the Arabic language was standardized in the various bilingual Arabic-European dictionaries of the 16th–19th centuries:
  • Pedro de Alcalá, Vocabulista, 1505. A Spanish-Arabic glossary using a systematic transcription.
  • Valentin Schindler, Lexicon Pentaglotton: Hebraicum, Chaldicum, Syriacum, Talmudico-Rabbinicum, et Arabicum, 1612. Arabic lemmas were printed in Hebrew characters.
  • Franciscus Raphelengius, Leiden 1613. The first printed dictionary of the Arabic language in Arabic characters.
  • Jacobus Golius, Lexicon Arabico-Latinum, Leiden 1653. The dominant Arabic dictionary in Europe for almost two centuries.
  • Georg Freytag, Lexicon Arabico-Latinum, praesertim ex Djeuharii Firuzubadiique et aliorum libris confectum I–IV, Halle 1830–1837.
  • Edward William Lane, Arabic–English Lexicon, 8 vols, London-Edinburgh 1863–1893. Highly influential, but incomplete.

    Mixed digraphic and diacritical

  • BGN/PCGN romanization.
  • UNGEGN. United Nations Group of Experts on Geographical Names, or "Variant A of the Amended Beirut System". Adopted from BGN/PCGN.
  • * IGN System 1973 or "Variant B of the Amended Beirut System", which conforms to French orthography and is preferred to Variant A in French-speaking countries, such as those of the Maghreb and Lebanon.
  • * romanization is different from UNGEGN in two ways: ظ is d͟h instead of z̧; the cedilla is replaced by a sub-macron in all the characters with the cedilla.
  • ALA-LC, from the American Library Association and the Library of Congress. This romanization is close to the romanization of the Deutsche Morgenländische Gesellschaft and Hans Wehr, which is used internationally in scientific publications by Arabists.
  • * IJMES, used by the International Journal of Middle East Studies, very similar to ALA-LC.
  • * EI, Encyclopaedia of Islam.

    Fully diacritical

  • DMG, adopted by the International Convention of Orientalist Scholars in Rome.
  • * DIN 31635, developed by the German Institute for Standardization.
  • * Hans Wehr transliteration, a modification to DIN 31635.
  • * EALL, Encyclopedia of Arabic Language and Linguistics.
  • * Spanish romanization, identical to DMG/DIN with the exception of three letters: ǧ > ŷ, ḫ > j, ġ > g.
  • ISO 233, letter-to-letter; vowels are transliterated only if they are shown with diacritics, otherwise they are omitted.
  • * ISO 233-2, simplified transliteration; vowels are always shown.
  • BS 4280, developed by the British Standards Institution.

    ASCII-based

  • ArabTeX has been modelled closely after the transliteration standards ISO/R 233 and DIN 31635.
  • Buckwalter Transliteration, developed at ALPNET by Tim Buckwalter; does not require diacritics.
  • Arabic chat alphabet: an ad hoc solution for conveniently entering Arabic using a Latin keyboard.

    Comparison table

  • Hans Wehr transliteration does not capitalize the first letter at the beginning of sentences nor in proper names.
  • The chat table is only a demonstration and is based on the spoken varieties, which vary considerably from Literary Arabic, on which the IPA table and the rest of the transliterations are based.
  • Review hamzah for its various forms.
  • Neither standard defines which code point to use for hamzah and ʻayn. Appropriate Unicode points would be the modifier letter apostrophe ⟨ʼ⟩ and the modifier letter turned comma ⟨ʻ⟩ or modifier letter reversed comma ⟨ʽ⟩, all of which Unicode defines as letters. Often the right and left single quotation marks ⟨’⟩, ⟨‘⟩ are used instead, but Unicode defines those as punctuation marks, and they can cause compatibility issues. The glottal stop in these romanizations is not written word-initially.
  • In the Encyclopaedia of Islam, digraphs are underlined, that is t͟h, d͟j, k͟h, d͟h, s͟h, g͟h. By contrast, the sequences ـتـهـ, ـكـهـ, ـدهـ, ـسهـ may be romanized with a middle dot as t·h, k·h, d·h, s·h respectively in BGN/PCGN, and with the prime symbol as tʹh, kʹh, dʹh, sʹh respectively in ALA-LC.
  • In the original German edition of his dictionary Wehr used ǧ, ḫ, ġ for j, ḵ, ḡ respectively. The variant presented in the table is from the English translation of the dictionary.
  • BGN/PCGN allows use of underdots instead of cedilla.
  • Fāʼ and qāf are traditionally written in Northwestern Africa as ڢ and ڧـ ـڧـ ـٯ, respectively; the latter's dot is added only in initial or medial position.
  • In Egypt, Sudan, and sometimes in other regions, the standard form for final yāʼ is only ى in handwriting and print, for both final /-iː/ and final /-aː/. When it stands for the latter pronunciation, ى is called ألف ليّنة alif layyinah, 'flexible alif'.
  • The sun and moon letters and hamzat waṣl pronunciation rules apply, although it is acceptable to ignore them. The UN system and ALA-LC prefer lowercase a and hyphens: al-Baṣrah, ar-Riyāḍ; BGN/PCGN prefers uppercase A and no hyphens: Al Baṣrah, Ar Riyāḍ.
  • The EALL suggests ẓ "in proper names".
  • BGN/PCGN, UNGEGN, ALA-LC, and DIN 31635 use a normal g for ج when romanizing Egyptian names or toponyms that are expected to be pronounced with [ɡ].
  • BGN/PCGN, UNGEGN, ALA-LC, and DIN 31635 use the French-based for in Francophone Arabic speaking countries in names and toponyms.
  • Nunation is ignored in all romanizations in names and toponyms.
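The note above on code points for hamzah and ʻayn can be checked against the Unicode character database. The following is a minimal sketch using Python's standard unicodedata module; the list name CANDIDATES and the helper describe are illustrative assumptions, not part of any romanization standard.

```python
import unicodedata

# Characters mentioned in the note: three modifier letters and the two
# quotation marks that are often used in their place.
CANDIDATES = ["\u02bc", "\u02bb", "\u02bd", "\u2019", "\u2018"]

def describe(ch: str) -> tuple:
    """Return (code point, Unicode name, general category) for one character."""
    return (f"U+{ord(ch):04X}", unicodedata.name(ch), unicodedata.category(ch))

def is_letter(ch: str) -> bool:
    """True if the general category starts with 'L' (a letter of some kind)."""
    return unicodedata.category(ch).startswith("L")

for ch in CANDIDATES:
    print(*describe(ch), "letter" if is_letter(ch) else "punctuation or other")
```

The three modifier letters report the category Lm (modifier letter), while the quotation marks report Pf and Pi (final and initial punctuation). This difference is why software that splits text into words may break a romanized word at a quotation mark but not at a modifier letter.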

    Romanization issues

Any romanization system has to make a number of decisions which are dependent on its intended field of application.

Vowels

One basic problem is that written Arabic is normally unvocalized; i.e., many of the vowels are not written out, and must be supplied by a reader familiar with the language. Hence unvocalized Arabic writing does not give a reader unfamiliar with the language sufficient information for accurate pronunciation. As a result, a pure transliteration, e.g., rendering قطر as qṭr, is meaningless to an untrained reader. For this reason, transcriptions are generally used that add vowels, e.g. qaṭar. However, unvocalized systems correspond exactly to written Arabic, unlike vocalized systems such as Arabic chat, which some claim detract from one's ability to spell.
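The difference can be illustrated with a small sketch. The mapping below is hypothetical and deliberately tiny: it covers only the three letters of قطر and the three short-vowel marks, using DIN 31635-style Latin symbols, and the function name romanize is an assumption made for this example.

```python
# Hypothetical, deliberately tiny mapping: only the letters and vowel marks
# needed for this one word, with DIN 31635-style Latin symbols.
LETTERS = {
    "\u0642": "q",       # qaf
    "\u0637": "\u1e6d",  # emphatic t, written as t with dot below
    "\u0631": "r",       # ra
}
SHORT_VOWELS = {
    "\u064e": "a",  # fathah
    "\u064f": "u",  # dammah
    "\u0650": "i",  # kasrah
}

def romanize(text: str) -> str:
    """Map each Arabic code point to Latin; unknown characters pass through."""
    table = {**LETTERS, **SHORT_VOWELS}
    return "".join(table.get(ch, ch) for ch in text)

unvocalized = "\u0642\u0637\u0631"            # the word as normally written
vocalized = "\u0642\u064e\u0637\u064e\u0631"  # with fathah on the first two letters

print(romanize(unvocalized))
print(romanize(vocalized))
```

The same letter-by-letter procedure yields qṭr for the unvocalized spelling and qaṭar for the vocalized one: the vowels appear in the output only when the vowel marks are present in the input, which is rarely the case in ordinary Arabic text.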

Transliteration vs. transcription

Most uses of romanization call for transcription rather than transliteration: instead of transliterating each written letter, they try to reproduce the sound of the words according to the orthography rules of the target language: Qaṭar. This applies equally to scientific and popular applications. A pure transliteration would need to omit vowels, making the result difficult to interpret except for a subset of trained readers fluent in Arabic. Even if vowels are added, a transliteration system would still need to distinguish between multiple ways of spelling the same sound in the Arabic script, e.g. alif ا vs. alif maqṣūrah ى for the sound /aː/, and the six different ways of writing the glottal stop. This sort of detail is needlessly confusing, except in a very few situations.
Most issues related to the romanization of Arabic are about transliterating vs. transcribing; others, about what should be romanized:
  • Some transliterations ignore assimilation of the definite article al- before the "sun letters", and may be easily misread by non-Arabic speakers. For instance, "the light" النور an-nūr would be more literally transliterated along the lines of alnnūr. In the transcription an-nūr, a hyphen is added and the unpronounced /l/ removed for the convenience of the uninformed non-Arabic speaker, who would otherwise pronounce an /l/, perhaps not understanding that /n/ in nūr is geminated. Alternatively, if the shaddah is not transliterated, a strictly literal transliteration would be alnūr, which presents similar problems for the uninformed non-Arabic speaker.
  • A transliteration should render the "closed tāʼ" (tāʼ marbūṭah, ة) faithfully. Many transcriptions render the sound /a/ as a or ah, and as t when it denotes /at/.
  • *ISO 233 has a unique symbol, ẗ.
  • "Restricted alif" (alif maqṣūrah, ى) should be transliterated with an acute accent, á, differentiating it from regular alif ا, but it is transcribed in many schemes like alif, ā, because it stands for /aː/.
  • Nunation: what is true elsewhere is also true for nunation. Transliteration renders what is seen, transcription what is heard; in the Arabic script, nunation is written with diacritics rather than letters, or is omitted.
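The treatment of the definite article described in the first item of this list can be sketched in code. The example below assumes a romanization in which every consonant is a single character (DIN 31635 style), so that the first character of the noun identifies its first consonant; the function name attach_article and the assimilate switch are illustrative assumptions, and the sketch does not handle digraph-based systems or empty input.

```python
import unicodedata

# The 14 sun letters in a single-character, DIN 31635-style romanization.
SUN_LETTERS = frozenset(unicodedata.normalize("NFC", "tṯdḏrzsšṣḍṭẓln"))

def attach_article(noun: str, assimilate: bool = True) -> str:
    """Prefix a romanized noun with the definite article.

    With assimilate=True the l of the article is replaced by the noun's
    first consonant when that consonant is a sun letter (transcription).
    With assimilate=False the article is always written al- (closer to
    the written form).
    """
    noun = unicodedata.normalize("NFC", noun)
    first = noun[0].lower()
    if assimilate and first in SUN_LETTERS:
        return f"a{first}-{noun}"
    return f"al-{noun}"

print(attach_article("nūr"))                    # sun letter n
print(attach_article("nūr", assimilate=False))  # article kept as written
print(attach_article("qamar"))                  # moon letter q
```

For nūr the assimilated form is an-nūr and the unassimilated form is al-nūr, while qamar, which begins with a moon letter, gives al-qamar in both modes.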
A transcription may reflect the language as spoken, typically rendering names, for example, as pronounced by the people of Baghdad, or the official standard as spoken by a preacher in the mosque or a TV newsreader. A transcription is free to add phonological or morphological information. Transcriptions will also vary depending on the writing conventions of the target language; compare English Omar Khayyam with German Omar Chajjam, both for عمر خيام.
A transliteration is ideally fully reversible: a machine should be able to transliterate it back into Arabic. A transliteration may be considered flawed for any one of the following reasons:
  • A "loose" transliteration is ambiguous, rendering several Arabic phonemes with an identical transliteration, or using digraphs for a single phoneme that may be confused with two adjacent consonants. The ALA-LC romanization system resolves the second problem by using the prime symbol ʹ to separate two consonants when they do not form a digraph; for example, أَكْرَمَتْها akramatʹhā, in which the t and h are two distinct consonantal sounds. The BGN/PCGN romanization uses the middle dot in the same way.
  • Symbols representing phonemes may be considered too similar, for example the marks used for ʻayn and hamzah;
  • ASCII transliterations using capital letters to disambiguate phonemes are easy to type, but may be considered unaesthetic.
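The reversibility requirement and the digraph problem from the first item can be made concrete with a toy scheme. The sketch below is hypothetical: it covers only five letters, uses the digraphs th and sh, and borrows the ALA-LC idea of a prime separator; the names TO_LATIN, romanize, and deromanize are assumptions made for this example and do not reproduce any standard in full.

```python
# Hypothetical toy scheme covering five letters only.
TO_LATIN = {
    "\u062a": "t",   # ta
    "\u0647": "h",   # ha
    "\u062b": "th",  # tha (one letter, written as a digraph)
    "\u0633": "s",   # sin
    "\u0634": "sh",  # shin (one letter, written as a digraph)
}
FROM_LATIN = {latin: arabic for arabic, latin in TO_LATIN.items()}
DIGRAPHS = {"th", "sh"}
PRIME = "\u02b9"  # modifier letter prime, used here as the separator

def romanize(text: str, separate: bool = True) -> str:
    """Romanize letter by letter, optionally separating accidental digraphs."""
    out = []
    for ch in text:
        latin = TO_LATIN[ch]
        # Two separate letters that would read as a digraph get a prime.
        if separate and out and out[-1] + latin in DIGRAPHS:
            out.append(PRIME)
        out.append(latin)
    return "".join(out)

def deromanize(text: str) -> str:
    """Convert back to Arabic, preferring the longest Latin match."""
    out, i = [], 0
    while i < len(text):
        if text[i] == PRIME:
            i += 1
            continue
        for size in (2, 1):
            chunk = text[i:i + size]
            if chunk in FROM_LATIN:
                out.append(FROM_LATIN[chunk])
                i += len(chunk)
                break
        else:
            raise ValueError(f"cannot interpret {text[i]!r}")
    return "".join(out)

word = "\u062a\u0647"  # ta followed by ha: two separate consonants
print(deromanize(romanize(word)) == word)                  # with separator
print(deromanize(romanize(word, separate=False)) == word)  # without separator
```

With the separator the round trip returns the original two letters; without it, the output th is read back as the single letter ث, so the transliteration is no longer reversible.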
A fully accurate transcription may not be necessary for native Arabic speakers, as they would be able to pronounce names and sentences correctly anyway, but it can be very useful for those who are familiar with the Roman alphabet but not fully familiar with spoken Arabic. An accurate transliteration serves as a valuable stepping stone for learning, pronouncing correctly, and distinguishing phonemes. It is a useful tool for anyone who is familiar with the sounds of Arabic but not fully conversant in the language.
One criticism is that a fully accurate system requires special training, which most readers lack, to pronounce names correctly, and that in the absence of a universal romanization system non-native speakers will not pronounce them correctly anyway. The precision is lost if special characters are not reproduced or if the reader is not familiar with Arabic pronunciation.