Indo-European languages

The Indo-European languages are a language family native to the northern Indian subcontinent, most of Europe, and the Iranian plateau, with additional native branches in regions such as parts of Central Asia, the southern Indian subcontinent, and Armenia. Historically, Indo-European languages were also spoken in Anatolia and northwestern China. Some European languages of this family (English, French, Portuguese, Italian, Russian, Spanish, and Dutch) have expanded through colonialism in the modern period and are now spoken across several continents. The Indo-European family is divided into several branches or sub-families, including Albanian, Armenian, Balto-Slavic, Celtic, Germanic, Hellenic, Indo-Iranian, and Italic, all of which contain present-day living languages, as well as many more extinct branches.
Today the individual Indo-European languages with the most native speakers are English, Spanish, Portuguese, Russian, Hindi, Bengali, Punjabi, French, and German; many others spoken by smaller groups are in danger of extinction. Over 3.4 billion people speak an Indo-European language as a first language—by far the most of any language family. There are about 446 living Indo-European languages, according to an estimate by Ethnologue, of which 313 belong to the Indo-Iranian branch.
All Indo-European languages are descended from a single prehistoric language, linguistically reconstructed as Proto-Indo-European, spoken sometime during the Neolithic or early Bronze Age. The geographical location where it was spoken, the Proto-Indo-European homeland, has been the object of many competing hypotheses; the academic consensus supports the Kurgan hypothesis, which places the homeland in the Pontic–Caspian steppe in what is now Ukraine and southern Russia, associated with the Yamnaya culture and other related archaeological cultures during the 4th and early 3rd millennia BC. By the time the first written records appeared, Indo-European had already evolved into numerous languages spoken across much of Europe, South Asia, and part of Western Asia. Written evidence of Indo-European appeared during the Bronze Age in the form of Mycenaean Greek and the Anatolian languages Hittite and Luwian. The oldest records are isolated Hittite words and names interspersed in texts otherwise written in the unrelated Akkadian language, found at the Assyrian colony of Kültepe in eastern Anatolia and dating to the 20th century BC. Although no written records of the Proto-Indo-European population itself survive, some aspects of their culture and religion can be reconstructed from later evidence in the daughter cultures. The Indo-European family is significant to the field of historical linguistics because it has the second-longest recorded history of any known family, after the Afroasiatic family, to which Egyptian and the Semitic languages belong. The analysis of the family relationships between the Indo-European languages, and the reconstruction of their common source, was central to the development of the methodology of historical linguistics as an academic discipline in the 19th century.
The Indo-European language family is not considered by the current academic consensus in the field of linguistics to have any genetic relationships with other language families, although several disputed hypotheses propose such relations.

History of Indo-European linguistics

During the 16th century, European visitors to the Indian subcontinent began to notice similarities among Indo-Aryan, Iranian, and European languages. In 1583, English Jesuit missionary and Konkani scholar Thomas Stephens wrote a letter from Goa to his brother—published in the 20th century—in which he noted similarities between North Indian languages and Greek and Latin.
Another account was made by Filippo Sassetti, a merchant born in Florence in 1540, who travelled to the Indian subcontinent. Writing in 1585, he noted some word similarities between Sanskrit and Italian. However, neither Stephens' nor Sassetti's observations led to further scholarly inquiry.
In 1647, Dutch linguist and scholar Marcus Zuerius van Boxhorn noted the similarity among certain Asian and European languages and theorized that they were derived from a primitive common language that he called Scythian. He included in his hypothesis Dutch, Albanian, Greek, Latin, Persian, and German, later adding Slavic, Celtic, and Baltic languages. However, Van Boxhorn's suggestions did not become widely known and did not stimulate further research.
Ottoman Turkish traveller Evliya Çelebi visited Vienna in 1665–1666 as part of a diplomatic mission and noted a few similarities between words in German and in Persian.
Gaston Coeurdoux and others made observations of the same type. Coeurdoux made a thorough comparison of Sanskrit, Latin, and Greek conjugations in the late 1760s to suggest a relationship among them. Meanwhile, Mikhail Lomonosov compared different language groups, including Slavic, Baltic, Iranian, Finnish, Chinese, "Hottentot", and others, noting that related languages must have separated in antiquity from common ancestors.
The hypothesis reappeared in 1786, when Sir William Jones, in a lecture to the Asiatic Society of Bengal, pointed out the striking similarities among three of the oldest languages known in his time: Latin, Greek, and Sanskrit, to which he tentatively added Gothic, Celtic, and Persian, though his classification contained some inaccuracies and omissions. In one of the most famous statements in linguistics, Jones conjectured the existence of an earlier ancestor language, which he called "a common source" but did not name.
Thomas Young first used the term "Indo-European" in 1813, deriving it from the geographical extremes of the language family: from Western Europe to North India. A synonym is Indo-Germanic, specifying the family's southeasternmost and northwesternmost branches. This term first appeared in French in 1810 in the work of Conrad Malte-Brun; in most languages it is now dated or less common than Indo-European, although in German indogermanisch remains the standard scientific term. A number of other synonymous terms have also been used.
Franz Bopp wrote "On the conjugational system of the Sanskrit language compared with that of Greek, Latin, Persian and Germanic" in 1816, and between 1833 and 1852 he wrote Comparative Grammar. This marks the beginning of Indo-European studies as an academic discipline. The classical phase of Indo-European comparative linguistics leads from this work to August Schleicher's 1861 Compendium and up to Karl Brugmann's Grundriss, published in the 1880s. Brugmann's neogrammarian reevaluation of the field and Ferdinand de Saussure's development of the laryngeal theory may be considered the beginning of "modern" Indo-European studies. The generation of Indo-Europeanists active in the last third of the 20th century developed a better understanding of morphology and of ablaut in the wake of Jerzy Kuryłowicz's 1956 Apophony in Indo-European. In 1927, Kuryłowicz had written about the existence of the Hittite consonant ḫ, a discovery that supported de Saussure's 1879 proposal of coefficients sonantiques, elements that de Saussure had reconstructed to account for vowel length alternations in Indo-European languages. This led to the so-called laryngeal theory, a major step forward in Indo-European linguistics and a confirmation of de Saussure's proposal.

Classification

The various subgroups of the Indo-European language family include ten major branches, listed below in alphabetical order:

Albanian, attested from the 13th century; Proto-Albanian evolved from an ancient Paleo-Balkan language, traditionally thought to be Illyrian, or otherwise a totally unattested Balkan Indo-European language that was closely related to Illyrian and Messapic.
Anatolian, extinct by Late Antiquity, spoken in Anatolia, attested in isolated Luwian/Hittite terms mentioned in Semitic Old Assyrian texts from the 20th and 19th centuries BC, and in Hittite texts from about 1650 BC. Among these is the Anitta text in the Hittite language, dated to 1700 BC, which is also the oldest known text in an Indo-European language.
Armenian, attested from the early 5th century AD. It evolved from the Proto-Armenian language which, according to the Armenian hypothesis, developed in situ from the Proto-Indo-European language of the 3rd millennium BC.
Balto-Slavic, believed by most Indo-Europeanists to form a phylogenetic unit, while a minority ascribes similarities to prolonged language-contact.
* Slavic, attested from the 9th century AD, earliest texts in Old Church Slavonic. Slavic languages include Bulgarian, Russian, Polish, Czech, Slovak, Silesian, Kashubian, Macedonian, Serbo-Croatian, Sorbian, Slovenian, Ukrainian, Belarusian, and Rusyn.
* Baltic, attested from the 14th century; although attested relatively recently, the Baltic languages retain many archaic features attributed to Proto-Indo-European. Living examples are Lithuanian and Latvian.
Celtic, attested since the 6th century BC; Lepontic inscriptions date as early as the 6th century BC; Celtiberian from the 2nd century BC; Primitive Irish Ogham inscriptions from the 4th or 5th century AD, earliest inscriptions in Old Welsh from the 7th century AD. Modern Celtic languages include Welsh, Cornish, Breton, Scottish Gaelic, Irish and Manx.
Germanic, earliest attestations in runic inscriptions from around the 2nd century AD, earliest coherent texts in Gothic, 4th century AD. Old English manuscript tradition from about the 8th century AD. Includes English, Frisian, German, Dutch, Scots, Danish, Swedish, Norwegian, Afrikaans, Yiddish, Low German, Icelandic, Elfdalian, and Faroese.
Hellenic; fragmentary records in Mycenaean Greek from between 1450 and 1350 BC have been found. Homeric texts date to the 8th century BC.
Indo-Iranian, descended from Proto-Indo-Iranian.
* Indo-Aryan, attested from around 1400 BC in Hittite texts from Anatolia, showing traces of Indo-Aryan words. Epigraphically from the 3rd century BC in the form of Prakrit. The Rigveda is assumed to preserve intact records via oral tradition dating from c. the mid-2nd millennium BC in the form of Vedic Sanskrit. Includes a wide range of modern languages from North India, Eastern Pakistan and Bangladesh, including Hindustani, Bengali, Odia, Assamese, Punjabi, Kashmiri, Gujarati, Marathi, Sindhi and Nepali, as well as Sinhala of Sri Lanka and Dhivehi of the Maldives and Minicoy.
* Iranian or Iranic, attested from roughly 1000 BC in the form of Avestan. Epigraphically from 520 BC in the form of Old Persian. Includes Persian, Pashto, Kurdish, Balochi, Luri, Tajik, and Ossetian.
* Nuristani, attested since the 20th century, are among the newest Indo-European languages to be studied. Includes Katë, Prasun, Ashkun, Nuristani Kalasha, Tregami, and Zemiaki.
Italic, attested from the 7th century BC. Includes the ancient Osco-Umbrian languages, Faliscan, as well as Latin and its descendants, the Romance languages, such as Italian and French.
Tocharian, with proposed links to the Afanasevo culture of Southern Siberia. Extant in two dialects, attested during roughly the 6th–9th centuries AD. Marginalized by the Old Turkic Uyghur Khaganate and probably extinct by the 10th century.

In addition to the classical ten branches listed above, several extinct and little-known languages and language-groups have existed or are proposed to have existed:

Ancient Belgian: hypothetical language associated with the proposed Nordwestblock cultural area. Speculated to be connected to Italic or Venetic, and to have certain phonological features in common with Lusitanian.
Cimmerian: possibly Iranic, Thracian, or Celtic
Dacian: possibly very close to Thracian
Elymian: poorly attested language spoken by the Elymians, one of the three indigenous tribes of Sicily. Indo-European affiliation widely accepted, possibly related to Italic or Anatolian.
Illyrian: possibly related to Albanian, Messapian, or both
Liburnian: evidence too scant to determine its affiliation with certainty
Ligurian: possibly close to or part of Celtic.
Lusitanian: possibly related to Celtic, Ligurian, or Italic
Ancient Macedonian: proposed relationship to Greek.
Messapic: not conclusively deciphered, often considered to be related to Albanian as the available fragmentary linguistic evidence shows common characteristic innovations and a number of significant lexical correspondences between the two languages
Paionian: extinct language once spoken north of Macedon
Phrygian: language of the ancient Phrygians. Very likely, but not certainly, a sister group to Hellenic.
Sicel: an ancient language spoken by the Sicels, one of the three indigenous tribes of Sicily. Proposed relationship to Latin or Proto-Illyrian at an earlier stage.
Sorothaptic: proposed, pre-Celtic, Iberian language
Thracian: possibly including Dacian
Venetic: shares several similarities with Latin and the Italic languages, but also has some affinities with other IE languages, especially Germanic and Celtic.

Membership of languages in the Indo-European language family is determined by genealogical relationships, meaning that all members are presumed descendants of a common ancestor, Proto-Indo-European. Membership in the various branches, groups, and subgroups of Indo-European is also genealogical, but here the defining factors are shared innovations among various languages, suggesting a common ancestor that split off from other Indo-European groups. For example, what makes the Germanic languages a branch of Indo-European is that much of their structure and phonology can be stated in rules that apply to all of them. Many of their common features are presumed innovations that took place in Proto-Germanic, the source of all the Germanic languages.
In the 21st century, several attempts have been made to model the phylogeny of Indo-European languages using Bayesian methodologies similar to those applied to problems in biological phylogeny. Although there are differences in absolute timing between the various analyses, there is much commonality between them, including the result that the first known language groups to diverge were the Anatolian and Tocharian language families, in that order.

Tree versus wave model

The "tree model" is considered an appropriate representation of the genealogical history of a language family if communities do not remain in contact after their languages have started to diverge. In this case, subgroups defined by shared innovations form a nested pattern. The tree model is not appropriate in cases where languages remain in contact as they diversify; in such cases subgroups may overlap, and the wave model is a more accurate representation. Most approaches to Indo-European subgrouping to date have assumed that the tree model is by-and-large valid for Indo-European; however, there is also a long tradition of wave-model approaches.
In addition to genealogical changes, many of the early changes in Indo-European languages can be attributed to language contact. It has been asserted, for example, that many of the more striking features shared by Italic languages might well be areal features. More certainly, very similar-looking alterations in the systems of long vowels in the West Germanic languages greatly postdate any plausible proto-language innovation. In a similar vein, there are many similar innovations in Germanic and Balto-Slavic that are far more likely to be areal features than inheritances from a common proto-language, such as the uniform development of a high vowel before the PIE syllabic resonants *ṛ, *ḷ, *ṃ, *ṇ, unique to these two groups among IE languages, which is in agreement with the wave model. The Balkan sprachbund even features areal convergence among members of very different branches.
An extension to the Ringe-Warnow model of language evolution suggests that early IE featured limited contact between distinct lineages, with only the Germanic subfamily exhibiting less treelike behaviour, as it acquired some characteristics from its neighbours early in its evolution. The internal diversification of West Germanic in particular has been described as radically non-treelike.

Proposed subgroupings

Specialists have postulated the existence of higher-order subgroups such as Italo-Celtic, Graeco-Armenian, Graeco-Aryan or Graeco-Armeno-Aryan, and Balto-Slavo-Germanic. However, unlike the ten traditional branches, these are all controversial to a greater or lesser degree.
The Italo-Celtic subgroup was at one point uncontroversial, considered by Antoine Meillet to be even better established than Balto-Slavic. The main lines of evidence included the genitive suffix -ī; the superlative suffix -m̥mo; the change of /p/ to /kʷ/ before another /kʷ/ in the same word; and the subjunctive morpheme -ā-. This evidence was prominently challenged by Calvert Watkins, while Michael Weiss has argued for the subgroup.
Evidence for a relationship between Greek and Armenian includes the regular change of the second laryngeal to a at the beginnings of words, as well as terms for "woman" and "sheep". Greek and Indo-Iranian share innovations mainly in verbal morphology and patterns of nominal derivation. Relations have also been proposed between Phrygian and Greek, and between Thracian and Armenian. Some fundamental shared features, like the aorist having the perfect active particle -s fixed to the stem, link this group closer to Anatolian languages and Tocharian. Shared features with Balto-Slavic languages, on the other hand, might be due to later contacts.
The Indo-Hittite hypothesis proposes that the Indo-European language family consists of two main branches: one represented by the Anatolian languages and another branch encompassing all other Indo-European languages. Features that separate Anatolian from all other branches of Indo-European have been interpreted alternately as archaic debris or as innovations due to prolonged isolation. Points proffered in favour of the Indo-Hittite hypothesis are the Indo-European agricultural terminology in Anatolia and the preservation of laryngeals. However, in general this hypothesis is considered to attribute too much weight to the Anatolian evidence. According to another view, the Anatolian subgroup left the Indo-European parent language comparatively late, approximately at the same time as Indo-Iranian and later than the Greek or Armenian divisions. A third view, especially prevalent in the so-called French school of Indo-European studies, holds that extant similarities in non-satem languages in general, including Anatolian, might be due to their peripheral location in the Indo-European language-area and to early separation, rather than indicating a special ancestral relationship. Hans J. Holm, based on lexical calculations, arrives at a picture roughly replicating the general scholarly opinion and refuting the Indo-Hittite hypothesis.

Satem and centum languages

The division of the Indo-European languages into satem and centum groups was put forward by Peter von Bradke in 1890, although Karl Brugmann had proposed a similar type of division in 1886. In the satem languages, which include the Balto-Slavic and Indo-Iranian branches, as well as Albanian and Armenian, the reconstructed Proto-Indo-European palatovelars remained distinct and were fricativized, while the labiovelars merged with the 'plain velars'. In the centum languages, the palatovelars merged with the plain velars, while the labiovelars remained distinct. The results of these alternative developments are exemplified by the words for "hundred" in Avestan (satəm) and Latin (centum): the initial palatovelar developed into a fricative in the former but became an ordinary velar in the latter.
Rather than being a genealogical separation, the centum–satem division is commonly seen as resulting from innovative changes that spread across PIE dialect-branches over a particular geographical area; the centum–satem isogloss intersects a number of other isoglosses that mark distinctions between features in the early IE branches. It may be that the centum branches in fact reflect the original state of affairs in PIE, and only the satem branches shared a set of innovations, which affected all but the peripheral areas of the PIE dialect continuum. Kortlandt proposes that the ancestors of Balts and Slavs took part in satemization before being drawn later into the western Indo-European sphere.

Proposed external relations

From the beginning of Indo-European studies, there have been attempts to link the Indo-European languages genealogically to other languages and language families. No such theory has gained majority support, and many linguists are sceptical or agnostic about such proposals.
Proposals linking the Indo-European languages with a single language family include:

Indo-Uralic, joining Indo-European with Uralic
Pontic, postulated by John Colarusso, which joins Indo-European with Northwest Caucasian

Other proposed families include:

Nostratic, comprising all or some of the Eurasiatic languages and the Kartvelian, Dravidian—or wider, Elamo-Dravidian—and Afroasiatic language families
Eurasiatic, a theory by Joseph Greenberg, comprising the Uralic, Altaic and various Paleosiberian families and possibly others

Nostratic and Eurasiatic, in turn, have been included in wider groupings, such as Borean, a language family separately proposed by Harold C. Fleming and Sergei Starostin that encompasses almost all of the world's natural languages with the exception of those native to sub-Saharan Africa, New Guinea, Australia, and the Andaman Islands.

Evolution

Proto-Indo-European

The proposed Proto-Indo-European language is the reconstructed common ancestor of the Indo-European languages, spoken by the Proto-Indo-Europeans. During the 1960s, knowledge of Anatolian became certain enough to establish its relationship to PIE. Using the method of internal reconstruction, an earlier stage, called Pre-Proto-Indo-European, has been proposed.
PIE was an inflected language, in which the grammatical relationships between words were signalled through inflectional morphemes, usually endings. The roots of PIE are basic morphemes carrying a lexical meaning. By addition of suffixes, they form stems, and by addition of endings, these form grammatically inflected words, such as nouns or verbs. The reconstructed Indo-European verb system is complex and, like the noun, exhibits a system of ablaut.

Diversification

The diversification of the parent language into the attested branches of daughter languages is historically unattested. By contrast, the timeline of the evolution of the various daughter languages is mostly undisputed.
Using a mathematical analysis borrowed from evolutionary biology, Donald Ringe and Tandy Warnow proposed the following evolutionary tree of Indo-European branches:

Pre-Anatolian before 3500 BC
Pre-Tocharian
Pre-Italic and Pre-Celtic before 2500 BC
Pre-Armenian and Pre-Greek after 2500 BC
Proto-Indo-Iranian
Pre-Germanic and Pre-Balto-Slavic; Proto-Germanic

David Anthony proposes the following sequence:

Pre-Anatolian
Pre-Tocharian
Pre-Germanic
Pre-Italic and Pre-Celtic
Pre-Armenian
Pre-Balto-Slavic
Pre-Greek
Proto-Indo-Iranian; split into Iranian and Old Indic

From 1500 BC the following sequence may be given:

1500–1000 BC:
*The Nordic Bronze Age of Scandinavia developed pre-Proto-Germanic, and the Proto-Celtic Urnfield and Hallstatt cultures emerged in Central Europe, introducing the Iron Age.
*Migration of the Proto-Italic speakers into the Italian peninsula.
*Migration of Aryans to India followed by the redaction of the Rigveda; rise of the Vedic civilization and beginning of Iron Age in the Punjab.
*The Mycenaean civilization gave way to the Greek Dark Ages.
*Hittite went extinct.
*Iranian speakers started migrating southwards to Greater Iran.
*Balto-Slavic split into ancestors of modern Baltic and Slavic.
1000–500 BC:
*The Celtic languages spread over Central and Western Europe, including Britain.
*Baltic languages were spoken in a large area from present-day Poland to Moscow.
*Pre-Proto-Germanic gave rise to Proto-Germanic in southern Scandinavia.
*Homer and the beginning of Classical Antiquity.
*The Vedic civilization gave way to the Mahajanapadas as the Indo-Aryan tongue spread eastwards, giving rise to the Greater Magadha cultural sphere, where Mahavira preached Jainism and Siddhartha Gautama preached Buddhism.
*Zoroaster composed the Gathas; the Achaemenid Empire rose, replacing the Elamites and Babylonia.
*Separation of Proto-Italic into Osco-Umbrian, Latin-Faliscan, and possibly Venetic and Siculian.
*A variety of Paleo-Balkan languages besides Greek were spoken in Southern Europe, including Thracian, Dacian and Illyrian, and in Anatolia.
*Development of Prakrits across the northern Indian subcontinent, as well as migration of Indo-Aryan speakers to Sri Lanka and the Maldives.
500–1 BC, Classical Antiquity:
*Spread of Greek and Latin throughout the Mediterranean and, during the Hellenistic period, to Central Asia and the Hindukush.
*The Magadhan power and influence rose in ancient India, especially with the conquests of the Nandan and Mauryan empires.
*Germanic speakers started migrating southwards to occupy formerly Celtic territories.
*Scythian cultures extended from Eastern Europe to Northwest China.
1 BC–AD 500; Late Antiquity, Gupta period:
*Attestation of Armenian. Proto-Slavic.
*The Roman Empire and then the Germanic migrations marginalized the Celtic languages to the British Isles.
*Sogdian, an eastern Iranian language, became the lingua franca of the Silk Road in Central Asia leading to China, due to the proliferation of Sogdian merchants there.
*Greek settlements and Byzantine rule made the last Anatolian languages extinct.
*Turkic languages started replacing Scythian languages.
500–1000, Early Middle Ages:
*The Viking Age formed an Old Norse koine spanning Scandinavia, the British Isles and Iceland.
*Phrygian became extinct.
*The Islamic conquests and the Turkic expansion resulted in the Arabization and Turkification of significant areas where Indo-European languages were spoken, and Persian developed under Islamic rule and extended into Afghanistan and Tajikistan.
*Due to further Turkic migrations, Tocharian became fully extinct while Scythian languages were overwhelmingly replaced.
*Slavic languages spread over wide areas in central, eastern and southeastern Europe, largely replacing Romance in the Balkans—with the exception of Romanian—and whatever was left of the Paleo-Balkan languages—with the exception of Albanian.
*The Pannonian Basin was taken by the Magyars from the western Slavs.
1000–1500, Late Middle Ages:
*Attestation of Albanian and Baltic.
*Modern dialects of Indo-European languages started emerging.
1500–2000, early modern period to present:
*Colonialism resulted in the spread of Indo-European languages to every habitable continent, most notably Romance and West Germanic, with Russian spreading to Central Asia and North Asia.

Key languages for reconstruction

In reconstructing the history of the Indo-European languages and the form of the Proto-Indo-European language, some languages have been of particular importance. These generally include the ancient Indo-European languages that are both well-attested and documented at an early date, although some languages from later periods are important if they are particularly linguistically conservative, most notably, Lithuanian. Early poetry is of special significance because of the rigid poetic meter normally employed, which makes it possible to reconstruct a number of features, e.g. vowel length, that were either unwritten or corrupted in the process of transmission down to the earliest extant written manuscripts.
Most notably:

Vedic Sanskrit. This language is unique in that its source documents were all composed orally, and were passed down through oral tradition for c. 2,000 years before being written down. The oldest documents are all in poetic form; the oldest and most important of all is the Rigveda. The oldest inscriptions in the language of the Rigveda are found in northern Syria, where the Mitanni kingdom was located.
Ancient Greek. Mycenaean Greek is the oldest recorded form, but its value is lessened by the limited material, restricted subject matter, and highly ambiguous writing system. More important is Ancient Greek, documented extensively beginning with the two Homeric poems.
Hittite. This is the earliest recorded of all Indo-European languages, and highly divergent from the others due to the early separation of the Anatolian languages from the remainder. It possesses some highly archaic features found fragmentarily, if at all, in other languages. It appears to have undergone many early phonological and grammatical changes which, combined with the ambiguities of its writing system, hinder its usefulness somewhat.

Other primary sources:

Latin, attested in a large amount of poetic and prose material in the Classical period and in more limited Old Latin material from earlier periods.
Gothic, along with the combined witness of the other old Germanic languages: most importantly, Old English, Old High German and Old Norse.
Old Avestan and Younger Avestan. Documentation is sparse but nonetheless quite important due to its highly archaic nature.
Modern Lithuanian, with limited records in Old Lithuanian.
Old Church Slavonic.

Other secondary sources, due to poor attestation:

Luwian, Lycian, Lydian and other Anatolian languages.
Oscan, Umbrian and other Old Italic languages.
Old Persian.
Old Prussian; more archaic than Lithuanian.

Other secondary sources, due to extensive phonological changes and relatively limited attestation:

Old Irish.
Tocharian, which underwent large phonetic shifts and mergers in the proto-language and has an almost entirely reworked declension system.
Classical Armenian.
Albanian.

Sound changes

As speakers of Proto-Indo-European dispersed, the language's sound system diverged as well, changing according to various sound laws evidenced in the daughter languages.
PIE is normally reconstructed with a complex system of 15 stop consonants, including an unusual three-way phonation or voicing distinction between voiceless, voiced and "voiced aspirated", i.e. breathy voiced, stops, and a three-way distinction among velar consonants—k-type sounds—between palatal ḱ ǵ ǵh, plain velar k g gh and labiovelar kʷ gʷ gʷh. The correctness of the terms palatal and plain velar is disputed. All daughter languages have reduced the number of distinctions among these sounds, often in divergent ways.
None of the daughter-language families, except possibly Anatolian (particularly Luwian), reflect the plain velar stops differently from the other two series, and there is even some dispute as to whether this series existed in PIE. The major distinction between centum and satem languages corresponds to the outcome of the PIE plain velars:

The central satem languages—Indo-Iranian, Balto-Slavic, Albanian, and Armenian—reflect both plain velar and labiovelar stops as plain velars, often with secondary palatalization before a front vowel. The palatal stops are palatalized and often appear as sibilants, usually distinct from the secondarily palatalized stops.
The peripheral centum languages (Germanic, Italic, Celtic, Greek, Anatolian and Tocharian) reflect both palatal and plain velar stops as plain velars, while the labiovelars continue unchanged, often with later reduction into plain labial or velar consonants.
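The two outcomes above amount to two different mergers of the same three-way contrast. As a purely schematic illustration, the Python sketch below encodes them as lookup tables. It represents each series by its voiceless member and writes the satem palatal outcome as a placeholder s, although the actual reflex varies by branch (Avestan s, Sanskrit ś, Lithuanian š). The names OUTCOMES, reflex and surviving_contrasts are hypothetical.

```python
# Schematic model of the centum/satem split in the PIE velar series.
# Assumptions (illustrative only): each series is represented by its
# voiceless member, and the satem palatal outcome is written as a
# placeholder "s", although the actual reflex differs by branch.

PALATAL, PLAIN, LABIOVELAR = "ḱ", "k", "kʷ"

# Outcome of each PIE series in the two groups.
OUTCOMES = {
    # centum: palatals merge with plain velars; labiovelars stay distinct
    "centum": {PALATAL: "k", PLAIN: "k", LABIOVELAR: "kʷ"},
    # satem: labiovelars merge with plain velars; palatals become sibilants
    "satem": {PALATAL: "s", PLAIN: "k", LABIOVELAR: "k"},
}


def reflex(group: str, velar: str) -> str:
    """Return the schematic reflex of a PIE velar in a centum or satem language."""
    return OUTCOMES[group][velar]


def surviving_contrasts(group: str) -> int:
    """Count how many distinct outcomes remain from the three PIE series."""
    return len(set(OUTCOMES[group].values()))


if __name__ == "__main__":
    # The initial palatal of PIE *ḱm̥tóm 'hundred':
    print(reflex("centum", PALATAL))  # k, cf. Latin centum (with initial /k/)
    print(reflex("satem", PALATAL))   # s, cf. Avestan satəm
    print(surviving_contrasts("centum"), surviving_contrasts("satem"))  # 2 2
```

In both groups only two of the three PIE series remain distinct, which is why the full three-way system is reconstructed by comparing the groups: a PIE plain velar is inferred where both centum and satem languages show k.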
The three-way PIE distinction between voiceless, voiced and voiced aspirated stops is considered extremely unusual from the perspective of linguistic typology—particularly in the existence of voiced aspirated stops without a corresponding series of voiceless aspirated stops. None of the various daughter-language families continue it unchanged, with numerous resolutions to the unstable PIE situation:

The Indo-Aryan languages preserve the three series unchanged and have evolved a fourth series of voiceless aspirated consonants.
The Iranian languages probably passed through the same stage, subsequently changing the aspirated stops into fricatives.
Greek converted the voiced aspirates into voiceless aspirates.
Italic probably passed through the same stage, and reflects the voiced aspirates as f or h, or sometimes plain voiced stops in Latin.
Celtic, Balto-Slavic, Anatolian, and Albanian merge the voiced aspirates into plain voiced stops.
Germanic and Armenian change all three series in a chain shift, e.g. with bh b p becoming b p f, known as Grimm's law in Germanic.

Among the other changes affecting consonants are:

The Ruki sound law, in which s becomes ʃ (š) after r, u, k, i in the satem languages.
Loss of prevocalic p in Proto-Celtic.
Development of prevocalic s to h in Proto-Greek, with later loss of h between vowels.
Verner's law in Proto-Germanic.
Grassmann's law, the dissimilation of aspirates, independently in Proto-Greek and Proto-Indo-Iranian.

The outcomes of PIE consonants in the daughter languages most important for reconstruction depend on the phonetic environment in which each consonant occurs. These environments are abbreviated as follows:

C- At the beginning of a word.
-C- Between vowels.
-C At the end of a word.
`-C- Following an unstressed vowel.
-C- Between vowels, or between a vowel and r, l.
C^T Before a stop.
C^T− After an obstruent.
C^(T) Before or after an obstruent.
C^H Before an original laryngeal.
C^E Before a front vowel.
C^E' Before secondary front vowels.
C^e Before e.
C^(u) Before or after a u.
C^(O) Before or after an o or u.
C^n− After n.
C^R Before a sonorant.
C^(R) Before or after a sonorant.
C^(r),l,u− Before r or l, or after r or u.
C^ruki− After r, u, k, i.
C^..Ch Before an aspirated consonant in the next syllable.
C^E..Ch Before a front vowel as well as before an aspirated consonant in the next syllable.
C^(u)..Ch Before or after a u as well as before an aspirated consonant in the next syllable.

Comparison of conjugations

A comparison of the thematic present indicative of the PIE verbal root bʰer- (the source of the English verb to bear) with its reflexes in various early attested IE languages and their modern descendants or relatives shows that all these languages had an inflectional verb system in their early stages.

Person	Proto-Indo-European
I	bʰéroh₂
You (singular)	bʰéresi
He/She/It	bʰéreti
We two	bʰérowos
You two	bʰéreth₁es
They two	bʰéretes
We	bʰéromos
You (plural)	bʰérete
They	bʰéronti

While similarities remain visible between the modern descendants and relatives of these ancient languages, the differences have increased over time. Some IE languages have moved from synthetic verb systems to largely periphrastic systems. Some of these verbs have undergone a change in meaning as well.

In Modern Irish, beir usually carries the meaning "to bear" only in the sense of bearing a child; its common meanings are "to catch" and "to grab". Apart from the first person, the comparative forms are dialectal or obsolete. The second- and third-person forms are instead typically conjugated periphrastically by adding a pronoun after the verb: beireann tú, beireann sé/sí, beireann sibh, beireann siad.
The Hindustani verb bʰarnā, the continuation of the Sanskrit verb, can have a variety of meanings, but the most common is "to fill". The comparative forms are etymologically derived from the present indicative and now have the meaning of a future subjunctive. The loss of the present indicative in Hindustani is roughly compensated for by the periphrastic habitual indicative construction, using the habitual participle and an auxiliary: ma͠i bʰartā hū̃, tū bʰartā hai, vah bʰartā hai, ham bʰarte ha͠i, tum bʰarte ho, ve bʰarte ha͠i.
The Gothic forms are a close approximation of what the early West Germanic forms would have looked like. The descendant of Proto-Germanic *beraną survives in German only in the compound gebären, meaning "to bear (a child)".
The Latin verb ferre is irregular and not representative of a normal thematic verb. In most Romance languages, such as Portuguese, other verbs now mean "to carry", and ferre was borrowed and nativized only in compounds such as sofrer "to suffer" and conferir "to confer".
In Modern Greek, phero φέρω "to bear" is still used in specific contexts and is most common in compounds such as αναφέρω, διαφέρω, εισφέρω, εκφέρω, καταφέρω, προφέρω, προαναφέρω, προσφέρω etc. The form that is very common today is pherno φέρνω, meaning "to bring". Additionally, the perfective form of pherno, used for the subjunctive mood and for the future tense, is also phero.
The dual forms are archaic in standard Lithuanian, and are now used only in some dialects, e.g. Samogitian.
Among modern Slavic languages, only Slovene continues to have a dual number in the standard variety.

Present distribution

Today, Indo-European languages are spoken by over 3.4 billion native speakers across all inhabited continents, by far the largest number for any recognized language family. Of the 20 languages with the largest numbers of speakers according to Ethnologue, ten are Indo-European: English, Hindustani, Spanish, Bengali, French, Russian, Portuguese, German, Persian and Punjabi, each with 100 million speakers or more. Additionally, hundreds of millions of people worldwide study Indo-European languages as second or third languages, including in cultures with completely different language families and historical backgrounds; there are around 600 million learners of English, for example.
The success of the language family, including the large number of speakers and the vast portions of the Earth that they inhabit, is due to several factors. The ancient Indo-European migrations and the widespread dissemination of Indo-European culture throughout Eurasia, both by the Proto-Indo-Europeans themselves and by their daughter cultures (including the Indo-Aryans, Iranian peoples, Celts, Greeks, Romans, Germanic peoples, and Slavs), gave these peoples' branches of the language family an early dominant foothold in virtually all of Eurasia except for swathes of the Near East and of North and East Asia, replacing many of the previously spoken pre-Indo-European languages of this extensive area. Semitic languages remain dominant in much of the Middle East and North Africa, and Caucasian languages in much of the Caucasus region. Similarly, in Europe and the Urals the Uralic languages, such as Hungarian, Finnish and Estonian, remain, as does Basque, a pre-Indo-European isolate.
Long before they became aware of their common linguistic origin, diverse groups of Indo-European speakers continued to dominate culturally, and often to replace, the indigenous languages of the western two-thirds of Eurasia. By the beginning of the Common Era, Indo-European peoples controlled almost the entirety of this area: the Celts held western and central Europe, the Romans southern Europe, the Germanic peoples northern Europe, the Slavs eastern Europe, the Iranian peoples most of western and central Asia and parts of eastern Europe, and the Indo-Aryan peoples the Indian subcontinent, with the Tocharians inhabiting the Indo-European frontier in western China. By the medieval period, only the Semitic, Dravidian, Caucasian, and Uralic languages and the language isolate Basque remained of the indigenous languages of Europe and the western half of Asia.
Alongside medieval invasions by Eurasian nomads, a group to which the Proto-Indo-Europeans had once belonged, Indo-European expansion reached another peak in the early modern period with the dramatic increase in the population of the Indian subcontinent and European expansionism throughout the globe during the Age of Discovery, as well as the continued replacement and assimilation of surrounding non-Indo-European languages and peoples due to increased state centralization and nationalism. These trends compounded throughout the modern period through global population growth and European colonization of the Western Hemisphere and Oceania, leading to an explosion in the number of Indo-European speakers and in the territories they inhabit.
Due to colonization and the modern dominance of Indo-European languages in the fields of politics, global science, technology, education, finance, and sports, many modern countries whose populations largely speak non-Indo-European languages have Indo-European languages as official languages, and the majority of the global population speaks at least one Indo-European language. The overwhelming majority of languages used on the Internet are Indo-European, with English continuing to lead the group; English in general has in many respects become the lingua franca of global communication.

Databases

An online collection of introductory videos on ancient Indo-European languages, produced by the University of Göttingen.