Arabic script in Unicode
Many scripts in Unicode, such as Arabic, have orthographic rules that require certain sequences of letterforms to be joined into ligatures. In English, the common ampersand developed from a ligature in which the handwritten Latin letters e and t were combined.
In Unicode, the Arabic script is contained in the following blocks:
- Arabic
- Arabic Supplement
- Arabic Extended-B
- Arabic Extended-A
- Arabic Presentation Forms-A
- Arabic Presentation Forms-B
- Rumi Numeral Symbols
- Arabic Extended-C
- Indic Siyaq Numbers
- Ottoman Siyaq Numbers
- Arabic Mathematical Alphabetic Symbols
The Arabic Supplement range encodes letter variants mostly used for writing African languages.
The Arabic Extended-B and Arabic Extended-A ranges encode additional Qur'anic annotations and letter variants used for various non-Arabic languages.
The Arabic Presentation Forms-A range encodes contextual forms and ligatures of letter variants needed for Persian, Urdu, Sindhi and Central Asian languages.
The Arabic Presentation Forms-B range encodes spacing forms of Arabic diacritics, and more contextual letter forms.
The presentation forms are present only for compatibility with older standards, and are not currently needed for coding text.
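Because the presentation forms exist only for compatibility, text that contains them can usually be folded back to the ordinary letters. The following is a minimal sketch using Python's standard `unicodedata` module and NFKC normalization; the helper name `fold_presentation_forms` is hypothetical. Note that NFKC also rewrites other compatibility characters (for example full-width Latin letters), so it may be broader than wanted in some pipelines.

```python
import unicodedata


def fold_presentation_forms(text: str) -> str:
    """Apply NFKC, which maps compatibility characters such as Arabic
    presentation forms to their ordinary equivalents."""
    return unicodedata.normalize("NFKC", text)


# The four presentation forms of BEH from Arabic Presentation Forms-B:
# isolated (U+FE8F), final (U+FE90), initial (U+FE91), medial (U+FE92).
beh_forms = "\uFE8F\uFE90\uFE91\uFE92"
folded = fold_presentation_forms(beh_forms)

# Each presentation form folds to the plain letter BEH (U+0628).
print([f"U+{ord(ch):04X}" for ch in folded])
```

After folding, the shaping engine and font are responsible for choosing the correct contextual glyph at display time.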
The Arabic Mathematical Alphabetic Symbols block encodes characters used in Arabic mathematical expressions.
The Indic Siyaq Numbers block contains a specialized subset of Arabic script that was used for accounting in India under the Mughal Empire from at least the 17th century through the middle of the 20th century.
The Ottoman Siyaq Numbers block contains a specialized subset of Arabic script, also known as Siyakat numbers, used for accounting in Ottoman Turkish documents.
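The block list above can be turned into a simple lookup from code point to block name. This is a sketch only: the code point ranges are not given in this article and are supplied here as assumptions that should be checked against the `Blocks.txt` file of the Unicode version in use. The names `ARABIC_BLOCKS` and `arabic_block` are hypothetical.

```python
from typing import Optional

# (first code point, last code point, block name).
# Ranges are assumptions; verify them against Blocks.txt.
ARABIC_BLOCKS = [
    (0x0600, 0x06FF, "Arabic"),
    (0x0750, 0x077F, "Arabic Supplement"),
    (0x0870, 0x089F, "Arabic Extended-B"),
    (0x08A0, 0x08FF, "Arabic Extended-A"),
    (0xFB50, 0xFDFF, "Arabic Presentation Forms-A"),
    (0xFE70, 0xFEFF, "Arabic Presentation Forms-B"),
    (0x10E60, 0x10E7F, "Rumi Numeral Symbols"),
    (0x10EC0, 0x10EFF, "Arabic Extended-C"),
    (0x1EC70, 0x1ECBF, "Indic Siyaq Numbers"),
    (0x1ED00, 0x1ED4F, "Ottoman Siyaq Numbers"),
    (0x1EE00, 0x1EEFF, "Arabic Mathematical Alphabetic Symbols"),
]


def arabic_block(ch: str) -> Optional[str]:
    """Return the Arabic-script block containing the character, or None."""
    cp = ord(ch)
    for first, last, name in ARABIC_BLOCKS:
        if first <= cp <= last:
            return name
    return None


for sample in ("\u0628", "\uFE8F", "A"):
    print(f"U+{ord(sample):04X}", arabic_block(sample))
```

A linear scan is adequate for eleven ranges; a sorted list with `bisect` would only matter for much larger tables.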
Contextual forms
Below is a demonstration for the basic alphabet used in Modern Standard Arabic, illustrating how Arabic letters are expected to appear in different contexts. Code points listed as contextual forms "should not be used in general interchange". Unicode has other methods of encoding the difference if necessary, such as the zero-width joiner.
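As an illustration of the zero-width joiner approach, the sketch below requests a particular contextual form of a letter by placing U+200D ZERO WIDTH JOINER on the side or sides where a join should be simulated, instead of using a presentation-form code point. The function name `request_form` is hypothetical, the actual rendering depends on the font and shaping engine, and only dual-joining letters such as BEH have all four forms.

```python
ZWJ = "\u200D"  # ZERO WIDTH JOINER


def request_form(letter: str, form: str) -> str:
    """Wrap a base Arabic letter with ZWJ so a shaping engine is asked
    to display the given contextual form."""
    if form == "isolated":
        return letter              # no neighbours, no joiners
    if form == "initial":
        return letter + ZWJ        # joins to the following side only
    if form == "medial":
        return ZWJ + letter + ZWJ  # joins on both sides
    if form == "final":
        return ZWJ + letter        # joins to the preceding side only
    raise ValueError(f"unknown form: {form!r}")


beh = "\u0628"  # ARABIC LETTER BEH
for form in ("isolated", "final", "initial", "medial"):
    shaped = request_form(beh, form)
    print(form, [f"U+{ord(ch):04X}" for ch in shaped])
```

The stored text still contains only the ordinary letter plus invisible format characters, so searching and normalization continue to see U+0628.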
Punctuation and ornaments
Only the Arabic question mark ⟨؟⟩ and the Arabic comma ⟨،⟩ are used in regular Arabic script typing, and the Arabic comma is often replaced by the Latin-script comma ⟨,⟩, which is also used as the decimal separator when Eastern Arabic numerals are used.
Word ligatures
Arabic Presentation Forms-A has a few characters defined as "word ligatures" for terms frequently used in formulaic expressions in Arabic. They are rarely used outside professional liturgical typing; likewise, the rial sign is normally written out in full rather than with the ligature.
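To see what a word ligature stands for, its compatibility decomposition can be read from the Unicode Character Database. The sketch below uses `unicodedata.decomposition` from Python's standard library on U+FDFC RIAL SIGN; the helper name `ligature_letters` is hypothetical, and the result depends on the Unicode data shipped with the Python build.

```python
import unicodedata


def ligature_letters(ch: str) -> list:
    """Return the letters listed in the character's decomposition
    mapping, skipping the formatting tag such as '<isolated>'."""
    fields = unicodedata.decomposition(ch).split()
    return [chr(int(f, 16)) for f in fields if not f.startswith("<")]


rial = "\uFDFC"
letters = ligature_letters(rial)

print(unicodedata.name(rial))
print([f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in letters])
```

Writing the letters out individually, as the decomposition does, corresponds to the "written fully" practice described above.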