Indo-Aryan languages


The Indo-Aryan languages are a branch of the Indo-Iranian languages in the Indo-European language family. As of 2024, there are more than 1.5 billion speakers, primarily concentrated east of the Indus River in South Asia, spread across eastern Pakistan, northern India, southern Nepal, Bangladesh, Sri Lanka, and the Maldives. Outside the Indian subcontinent, large immigrant and expatriate Indo-Aryan–speaking communities live in Northwestern Europe, Western Asia, North America, the Caribbean, Southeast Africa, Polynesia and Australia, along with several million speakers of Romani languages primarily concentrated in Southeastern Europe. There are over 200 known Indo-Aryan languages.
Modern Indo-Aryan languages descend from Old Indo-Aryan languages, such as early Vedic Sanskrit, through Middle Indo-Aryan languages. The largest such languages in terms of native speakers are Hindustani, Bengali, Punjabi, Marathi, and Gujarati. A 2005 estimate placed the total number of native speakers of the Indo-Aryan languages at nearly 900 million people. Other estimates are higher, suggesting a figure of 1.5 billion speakers of Indo-Aryan languages.

Classification

Theories

The Indo-Aryan family as a whole is thought to represent a dialect continuum, where languages are often transitional towards neighbouring varieties. Because of this, the division into languages vs. dialects is in many cases somewhat arbitrary. The classification of the Indo-Aryan languages is controversial, with many transitional areas that are assigned to different branches depending on classification. There are concerns that a tree model is insufficient for explaining the development of New Indo-Aryan, with some scholars suggesting the wave model.

Subgroups

The following table of proposals is expanded from , and also includes subsequent classification proposals. The table lists only some modern Indo-Aryan languages.
In 2016, Anton I. Kogan conducted a lexicostatistical study of the New Indo-Aryan languages based on a 100-word Swadesh list, using techniques developed by the glottochronologist and comparative linguist Sergei Starostin. That grouping system is notable for Kogan's exclusion of Dardic from Indo-Aryan on the basis of his previous studies, which showed that Dardic's lexical similarity to Indo-Aryan is low and differs negligibly from its similarity to Iranian. He also calculated Sinhala–Dhivehi to be the most divergent Indo-Aryan branch. Nevertheless, the modern consensus of Indo-Aryan linguists tends towards the inclusion of Dardic based on morphological and grammatical features.
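The basic idea behind a lexicostatistical comparison can be sketched in a few lines of Python. This is only an illustration of the general technique (the share of Swadesh-list meanings for which two languages use cognate words), not Kogan's or Starostin's actual procedure. The language names, the meanings chosen, and the cognate-class labels "A", "B" and "C" are invented placeholders, not real data.

```python
from itertools import combinations

# Hypothetical cognate-class codes for a few Swadesh-list meanings.
# Two languages share a cognate for a meaning when their labels match.
# All names and labels here are invented placeholders, not real data.
COGNATE_CLASSES = {
    "lang_1": {"water": "A", "fire": "A", "hand": "A", "eye": "A", "name": "A"},
    "lang_2": {"water": "A", "fire": "A", "hand": "A", "eye": "B", "name": "A"},
    "lang_3": {"water": "B", "fire": "A", "hand": "C", "eye": "B", "name": None},
}


def shared_cognate_ratio(a, b):
    """Fraction of meanings attested in both lists whose cognate class matches."""
    comparable = [
        meaning
        for meaning in a
        if meaning in b and a[meaning] is not None and b[meaning] is not None
    ]
    if not comparable:
        raise ValueError("no comparable meanings")
    matches = sum(1 for meaning in comparable if a[meaning] == b[meaning])
    return matches / len(comparable)


def similarity_matrix(data):
    """Pairwise shared-cognate ratios for every pair of languages."""
    return {
        (x, y): shared_cognate_ratio(data[x], data[y])
        for x, y in combinations(sorted(data), 2)
    }


if __name__ == "__main__":
    for pair, ratio in similarity_matrix(COGNATE_CLASSES).items():
        print(pair, f"{ratio:.2f}")
```

Meanings with a missing entry are skipped rather than counted as mismatches, so a gap in a word list does not lower the score. A real study would use the full 100-item list and expert cognate judgements, and would typically feed the resulting percentages into a clustering or dating step.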

Inner–Outer hypothesis

The Inner–Outer hypothesis argues for a core and periphery of Indo-Aryan languages, with Outer Indo-Aryan representing an older stratum of Old Indo-Aryan that has been mixed to varying degrees with the newer stratum that is Inner Indo-Aryan. It is a contentious proposal with a long history, with varying degrees of claimed phonological and morphological evidence. Since its proposal by Rudolf Hoernlé in 1880 and refinement by George Grierson, it has undergone numerous revisions and a great deal of debate, with the most recent iteration by Franklin Southworth and Claus Peter Zoller based on robust linguistic evidence. Sceptics of the theory include Suniti Kumar Chatterji and Colin P. Masica.

Groups

The below classification follows, and.

Dardic

The Dardic languages are a group of Indo-Aryan languages largely spoken in the northwestern extremities of the Indian subcontinent. Dardic was first formulated by George Abraham Grierson in his Linguistic Survey of India, but he did not consider it to be a subfamily of Indo-Aryan. The validity of Dardic as a genetic grouping has been questioned by recent scholarship: Southworth, for example, says "the viability of Dardic as a genuine subgroup of Indo-Aryan is doubtful" and "the similarities among [Dardic languages] may result from subsequent convergence".
The Dardic languages are thought to be transitional with Punjabi and Pahari, as well as with non-Indo-Aryan Nuristani, and are noted for their relatively conservative features in the context of Proto-Indo-Aryan.
The Northern Indo-Aryan languages, also known as the Pahari languages, are spoken throughout the Himalayan regions of the subcontinent.
Northwestern Indo-Aryan languages are spoken in the northwestern region of India and the eastern region of Pakistan. Punjabi is spoken predominantly in the Punjab region and is the official language of the northern Indian state of Punjab, in addition to being the most widely spoken language in Pakistan. Sindhi and its variants are spoken natively in the Pakistani province of Sindh and neighbouring regions. Northwestern languages are ultimately thought to be descended from Shauraseni Prakrit, with influence from Persian and Arabic.
Western Indo-Aryan languages are spoken in central and western India, in states such as Madhya Pradesh and Rajasthan, in addition to contiguous regions in Pakistan. Gujarati is the official language of Gujarat, and is spoken by over 50 million people. In Europe, various Romani languages are spoken by the Romani people, an itinerant community who historically migrated from India. The Western Indo-Aryan languages are thought to have diverged from their northwestern counterparts, although they have a common antecedent in Shauraseni Prakrit.
Within India, Central Indo-Aryan languages are spoken primarily in the western Gangetic plains, including Delhi and parts of the Central Highlands, where they are often transitional with neighbouring lects. Many of these languages, including Braj and Awadhi, have rich literary and poetic traditions. Urdu, a Persianised derivative of Dehlavi descended from Shauraseni Prakrit, is the official language of Pakistan and also has strong historical connections to India, where it has also been granted official status. Hindi, a standardised and Sanskritised register of Dehlavi, is the official language of the Government of India. Together with Urdu, it is the third most-spoken language in the world.
The Eastern Indo-Aryan languages, also known as Magadhan languages, are spoken throughout the eastern subcontinent, alongside other regions surrounding the northwestern Himalayan corridor. Bengali is the seventh most-spoken language in the world, and has a strong literary tradition; the national anthems of India and Bangladesh are written in Bengali. Assamese and Odia are the official languages of Assam and Odisha, respectively. The Eastern Indo-Aryan languages descend from Magadhan Apabhraṃśa and ultimately from Magadhi Prakrit. Eastern Indo-Aryan languages display many morphosyntactic features similar to those of Munda languages, which are largely absent in western Indo-Aryan languages. It is suggested that "proto-Munda" languages may have once dominated the eastern Indo-Gangetic Plain, and were then absorbed by Indo-Aryan languages at an early date as Indo-Aryan spread east.
Marathi-Konkani languages are ultimately descended from Maharashtri Prakrit, whereas Insular Indo-Aryan languages are descended from Elu Prakrit and possess several characteristics that markedly distinguish them from most of their mainland Indo-Aryan counterparts. Insular Indo-Aryan languages started developing independently and diverging from the continental Indo-Aryan languages from around the 5th century BCE.