Hindi–Urdu transliteration


Hindi–Urdu is the lingua franca of modern-day Northern India and Pakistan. Modern Standard Hindi is officially registered in India as a standard written using the Devanagari script, and Standard Urdu is officially registered in Pakistan as a standard written using an extended Perso-Arabic script.
Hindi–Urdu transliteration is the process of converting text written in the Devanagari script into the Perso-Arabic script, or vice versa. It focuses on representing the phonemes shared by the two writing systems, or on using another writing system, primarily the Latin alphabet, in their place. Transliteration is possible in principle because Hindi and Urdu share a common Hindustani phonology. Today, Hindustani is seen as a unifying language, an idea first proposed by Mahatma Gandhi to resolve the Hindi–Urdu controversy.
Technically, a direct one-to-one script mapping or a rule-based lossless transliteration of Hindi–Urdu is not possible. The main reason is that Hindi is written in an abugida, in which each consonant carries an inherent vowel, whereas Urdu is written in an abjad, which usually leaves short vowels unwritten. A further constraint is that several similar Perso-Arabic letters map onto a single Devanagari letter. However, dictionary-based mapping attempts have achieved very high accuracy, producing near-perfect transliterations. For literary texts, transliteration alone will not suffice, because formal Hindi draws more on Sanskrit vocabulary whereas formal Urdu draws more on Persian and Arabic vocabulary; such cases require a system that combines transliteration with translation.
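The many-to-one problem can be made concrete with a minimal Python sketch. The `DEVANAGARI_TO_URDU` table and the `urdu_candidates` function are hypothetical and cover only a few letters from the consonant table; vowel signs are ignored. For a Devanagari word written with bare consonants, the sketch lists every Urdu consonant spelling the letters allow, which is the set a dictionary-based system would have to choose from.

```python
from itertools import product

# Hypothetical subset of the consonant table: one Devanagari letter can stand
# for several Perso-Arabic letters, so the Hindi -> Urdu direction branches.
DEVANAGARI_TO_URDU = {
    "स": ["س", "ص", "ث"],
    "त": ["ت", "ط"],
    "ह": ["ہ", "ح"],
    "र": ["ر"],
    "म": ["م"],
}


def urdu_candidates(consonants: str) -> list[str]:
    """List every Urdu consonant skeleton for a string of bare Devanagari consonants.

    Vowel signs are not handled; each input character must be a key above.
    """
    options = [DEVANAGARI_TO_URDU[ch] for ch in consonants]
    return ["".join(combo) for combo in product(*options)]


# हसरत (hasrat) written as bare consonants: 2 * 3 * 1 * 2 = 12 spellings.
candidates = urdu_candidates("हसरत")
print(len(candidates))
print("حسرت" in candidates)
```

Even this four-letter word yields twelve candidate spellings, only one of which (حسرت) is standard; a word list is what lets a dictionary-based system pick it.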
In addition to Hindi–Urdu, there have been attempts to design Indo-Pakistani transliteration systems for other digraphic languages such as Sindhi, Punjabi, Saraiki and Kashmiri.

Consonants

Hindustani has a rich set of consonants in its full alphabet because of its mixed vocabulary, which derives from Old Hindi with loanwords from Persian and Arabic. These three sources belong to three different language families: Indo-Aryan, Iranian and Semitic, respectively.
The following table provides an approximate one-to-one mapping of Hindi–Urdu consonants, intended especially for computational use. This direct script conversion will not yield correct spellings, but rather text that readers of either script can read. Hindi–Urdu transliteration schemes can also be used for Punjabi, to convert Gurmukhi to Shahmukhi, since Shahmukhi is a superset of the Urdu alphabet and Gurmukhi can easily be converted to Devanagari. In addition, a Fort William College document gives an equivalent for the Hindustani 'ع' sound.
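The point that Gurmukhi converts easily to Devanagari can be sketched in Python under one assumption: that the Unicode Gurmukhi block (U+0A00..U+0A7F) is laid out in parallel with the Devanagari block (U+0900..U+097F), so that most letters, vowel signs and digits differ by a fixed offset of 0x100. The function `gurmukhi_to_devanagari` is hypothetical, and signs without a parallel are simply passed through.

```python
import unicodedata

# Assumption: Gurmukhi (U+0A00..U+0A7F) mirrors Devanagari (U+0900..U+097F)
# at a fixed offset for most letters, vowel signs, the nukta and digits.
OFFSET = 0x0A00 - 0x0900

# Gurmukhi-only signs that do not line up with the offset.
SPECIAL = {
    "\u0a70": "\u0902",  # tippi -> anusvara (nasalization)
}


def gurmukhi_to_devanagari(text: str) -> str:
    """Convert Gurmukhi text to Devanagari one code point at a time."""
    out = []
    for ch in unicodedata.normalize("NFC", text):
        cp = ord(ch)
        if ch in SPECIAL:
            out.append(SPECIAL[ch])
        elif 0x0A01 <= cp <= 0x0A5E or 0x0A66 <= cp <= 0x0A6F:
            # Letters, vowel signs, nukta, virama and digits share the layout.
            out.append(chr(cp - OFFSET))
        else:
            # Spaces, Latin text and unmapped signs (e.g. addak) pass through.
            out.append(ch)
    return "".join(out)


print(gurmukhi_to_devanagari("ਪੰਜਾਬ"))  # पंजाब
```

The tippi has no counterpart at the same offset, so the sketch maps it to the anusvara explicitly; a fuller converter would need more special cases, for example the addak (ੱ), which doubles the following consonant.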
Perso-Arabic | Latin | Devanagari | Comments
ک | k | क
کھ | kh | ख
ق | q | क़ | The nuqta is sometimes dropped in colloquial Hindi, giving क
خ | k͟h | ख़ | The nuqta is sometimes dropped in colloquial Hindi, giving ख
گ | g | ग
غ | ġ | ग़ | The nuqta is sometimes dropped in colloquial Hindi, giving ग
گھ | gh | घ
چ | c | च
چھ | ch | छ
ج | j | ज
جھ | jh | झ
ز | z | ज़ | The nuqta is sometimes dropped in colloquial Hindi, giving ज
ذ |  | ज़
ض | ż | ज़
ظ |  | ज़
ژ | ž | झ़ | Used in direct Persian loanwords
ٹ | ṭ | ट
ٹھ | ṭh | ठ
ڈ | ḍ | ड
ڈھ | ḍh | ढ
ڑ | ṛ | ड़ | Colloquially, ṛ is often confused with ḍ and vice versa
ڑھ | ṛh | ढ़ | Colloquially, ṛh is often confused with ḍh and vice versa
ت | t | त
تھ | th | थ
ط |  | त
د | d | द
دھ | dh | ध
ن | n | न
پ | p | प
پھ | ph | फ
ف | f | फ़ | The nuqta is sometimes dropped in colloquial Hindi, giving फ
ب | b | ब
بھ | bh | भ
م | m | म
ی | y | य
ر | r | र
ل | l | ल
و | v | व | و is transcribed as /v/ in Indo-Iranian words
و | w | व़ | و is transcribed as /w/ in Arabic words
ش | ś | श
س | s | स
ص |  | स
ث |  | स
ہ | h | ह
ح |  | ह
ۃ
ھ | h | ्ह | ھ is normally used only to mark aspirated consonants; standalone use is generally considered an error and read as ہ
ع | a' | अ़ | Sometimes a glottal stop, sometimes silent
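As a minimal Python sketch of how a converter might apply part of this table in the Urdu-to-Devanagari direction, the dictionaries and function names below are hypothetical, cover only a subset of the rows (ۃ and ع are left out, and و is always mapped to व), and handle consonants only; vowels pass through unchanged. Two rules from the comments are encoded: a consonant followed by ھ becomes the dedicated aspirated letter where one exists, or takes a virama plus ह otherwise, and a standalone ھ is read as ہ. A `fold_nuqta` helper mimics the colloquial habit of dropping the nuqta.

```python
import unicodedata

# Hypothetical subset of the consonant table above (Perso-Arabic -> Devanagari).
BASE = {
    "ک": "क", "گ": "ग", "چ": "च", "ج": "ज", "ٹ": "ट", "ڈ": "ड", "ڑ": "ड़",
    "ت": "त", "د": "द", "پ": "प", "ب": "ब", "ق": "क़", "خ": "ख़", "غ": "ग़",
    "ز": "ज़", "ذ": "ज़", "ض": "ज़", "ظ": "ज़", "ف": "फ़", "ط": "त",
    "ن": "न", "م": "म", "ی": "य", "ر": "र", "ل": "ल", "و": "व",
    "ش": "श", "س": "स", "ص": "स", "ث": "स", "ہ": "ह", "ح": "ह",
}
# Consonants that have a dedicated aspirated Devanagari letter before ھ.
ASPIRATED = {
    "ک": "ख", "گ": "घ", "چ": "छ", "ج": "झ", "ٹ": "ठ", "ڈ": "ढ", "ڑ": "ढ़",
    "ت": "थ", "د": "ध", "پ": "फ", "ب": "भ",
}
DO_CHASHMI_HE = "\u06be"    # ھ
VIRAMA_HA = "\u094d\u0939"  # ्ह


def urdu_consonants_to_devanagari(text: str) -> str:
    """Map Urdu consonants to Devanagari; vowels and unknown characters pass through."""
    out = []
    i = 0
    while i < len(text):
        ch = text[i]
        nxt = text[i + 1] if i + 1 < len(text) else ""
        if ch in ASPIRATED and nxt == DO_CHASHMI_HE:
            out.append(ASPIRATED[ch])  # e.g. گھ -> घ
            i += 2
            continue
        if ch == DO_CHASHMI_HE:
            prev = text[i - 1] if i > 0 else ""
            # After another consonant, ھ marks aspiration (e.g. مھ -> म्ह);
            # on its own it is read as ہ.
            out.append(VIRAMA_HA if prev in BASE else "\u0939")
        else:
            out.append(BASE.get(ch, ch))
        i += 1
    # NFC keeps nuqta letters in one canonical (decomposed) form.
    return unicodedata.normalize("NFC", "".join(out))


def fold_nuqta(text: str) -> str:
    """Drop every nuqta, mimicking colloquial Hindi spellings such as ज for ज़."""
    return unicodedata.normalize("NFD", text).replace("\u093c", "")
```

Because Devanagari consonants carry an inherent a, a consonant-only Urdu word such as گھر (ghar) already comes out as the correct घर; words whose vowels are written with Urdu vowel letters still need dictionary support.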

Sanskrit consonants

The following consonants are mostly used in words that are directly borrowed or adapted from Sanskrit.
Perso-Arabic | Latin | Devanagari | Remarks
ن٘
ݩ | ñ | ञ | ݩ was introduced to write Gojri
ݨ |  |  | ݨ was introduced in Shahmukhi
لؕ |  |  | Rarely used in Shahmukhi
ݜ |  |  | ݜ was introduced to write Shina
ڔّ

Implosive consonants

These implosive consonants occur mainly in languages such as Sindhi and Saraiki.
Perso-Arabic | Latin | Devanagari
ڳ |  | ॻ
ڄ |  | ॼ
ݙ/ڏ |  | ॾ
ٻ |  | ॿ

Punctuation and symbols

Script | Period | Question mark | Comma | Semicolon | Slash | Percent | End of verse
Perso-Arabic | ۔ | ؟ | ، | ؛ | ؍ | ٪ | ۝
Modern Devanagari | । | ? | , | ; | / | % | ॥
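Because each column pairs single characters, converting punctuation is a plain character translation. The Python sketch that follows uses the standard `str.maketrans` and `str.translate`; it assumes the danda (।) and double danda (॥) for the Devanagari period and end-of-verse marks, and the function names are hypothetical.

```python
# Perso-Arabic -> Devanagari punctuation, following the table above.
# Assumption: । (danda) and ॥ (double danda) serve as the Devanagari
# period and end-of-verse marks.
PUNCT_PAIRS = {
    "\u06d4": "\u0964",  # ۔ Urdu full stop -> । danda
    "\u061f": "?",       # ؟ Arabic question mark
    "\u060c": ",",       # ، Arabic comma
    "\u061b": ";",       # ؛ Arabic semicolon
    "\u060d": "/",       # ؍ Arabic date separator, listed as the slash
    "\u066a": "%",       # ٪ Arabic percent sign
    "\u06dd": "\u0965",  # ۝ end of ayah -> ॥ double danda
}

TO_DEVANAGARI = str.maketrans(PUNCT_PAIRS)
TO_PERSO_ARABIC = str.maketrans({dev: pa for pa, dev in PUNCT_PAIRS.items()})


def punct_to_devanagari(text: str) -> str:
    """Replace Perso-Arabic punctuation with its Devanagari/ASCII counterpart."""
    return text.translate(TO_DEVANAGARI)


def punct_to_perso_arabic(text: str) -> str:
    """Replace Devanagari/ASCII punctuation with its Perso-Arabic counterpart."""
    return text.translate(TO_PERSO_ARABIC)
```

The reverse direction rewrites every ASCII `?`, `,`, `;`, `/` and `%`, so it is best applied only to running text, not to URLs, dates or code embedded in it.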

Sample text

The following is an excerpt from the Hindustani poem Tarānah-e-Hindi written by Muhammad Iqbal.
Devanagari:
सारे जहाँ से अच्छा,
हिन्दुसिताँ हमारा।
हम बुलबुलें हैं इसकी,
यह गुलसिताँ हमारा॥

Latin:
sāre jahā̃ se acchā,
hindusitā̃ hamārā.
ham bulbulẽ ha͠i iskī,
yah gulsitā̃ hamārā..

English translation:
Better than the entire world,
is our India.
We are its nightingales,
and it our garden abode.
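The Latin version of the sample can be approximated from the Devanagari with a small romanizer. This Python sketch is hypothetical: its tables roughly follow the Latin used in this article, it supplies the inherent vowel a after a consonant, and it applies only final schwa deletion (dropping that a at the end of a word). The nuqta is simply dropped.

```python
import unicodedata

# Hypothetical romanization tables, roughly following this article's Latin.
CONSONANTS = {
    "क": "k", "ख": "kh", "ग": "g", "घ": "gh", "च": "c", "छ": "ch",
    "ज": "j", "झ": "jh", "ट": "ṭ", "ठ": "ṭh", "ड": "ḍ", "ढ": "ḍh",
    "ण": "ṇ", "त": "t", "थ": "th", "द": "d", "ध": "dh", "न": "n",
    "प": "p", "फ": "ph", "ब": "b", "भ": "bh", "म": "m", "य": "y",
    "र": "r", "ल": "l", "व": "v", "श": "ś", "ष": "ṣ", "स": "s", "ह": "h",
}
VOWELS = {"अ": "a", "आ": "ā", "इ": "i", "ई": "ī", "उ": "u", "ऊ": "ū",
          "ए": "e", "ऐ": "ai", "ओ": "o", "औ": "au"}
MATRAS = {"ा": "ā", "ि": "i", "ी": "ī", "ु": "u", "ू": "ū",
          "े": "e", "ै": "ai", "ो": "o", "ौ": "au"}
NASALS = {"ं": "ṃ", "ँ": "\u0303"}  # anusvara, chandrabindu (combining tilde)
PUNCT = {"।": ".", "॥": ".."}
VIRAMA = "\u094d"
NUQTA = "\u093c"


def romanize_word(word: str) -> str:
    # Nuqta letters are decomposed and the nuqta dropped, so ज़ is read as ज.
    chars = unicodedata.normalize("NFD", word).replace(NUQTA, "")
    out = []
    for i, ch in enumerate(chars):
        nxt = chars[i + 1] if i + 1 < len(chars) else ""
        if ch in CONSONANTS:
            out.append(CONSONANTS[ch])
            # Keep the inherent a before another consonant or a nasal sign;
            # drop it before a vowel sign, a virama, punctuation or the word
            # end (final schwa deletion). Medial schwa deletion is NOT handled.
            if nxt in CONSONANTS or nxt in NASALS:
                out.append("a")
        elif ch in MATRAS:
            out.append(MATRAS[ch])
        elif ch in VOWELS:
            out.append(VOWELS[ch])
        elif ch in NASALS:
            out.append(NASALS[ch])
        elif ch in PUNCT:
            out.append(PUNCT[ch])
        elif ch != VIRAMA:
            out.append(ch)  # other characters pass through unchanged
    return "".join(out)


def romanize(text: str) -> str:
    """Romanize whitespace-separated Devanagari words."""
    return " ".join(romanize_word(w) for w in text.split())
```

On the sample, this sketch reproduces words such as sāre, acchā and hamārā, but it renders इसकी as isakī rather than iskī, because deciding which medial inherent vowels are silent needs further rules or a dictionary.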