Proto-Indo-European language

Proto-Indo-European is the reconstructed common ancestor of the Indo-European language family. No direct record of Proto-Indo-European has been discovered; its proposed features have been derived by linguistic reconstruction from documented Indo-European languages. Far more work has gone into reconstructing PIE than any other proto-language. The majority of linguistic work during the 19th century was devoted to the reconstruction of PIE and its daughter languages, and many of the modern techniques of linguistic reconstruction were developed as a result.
PIE is hypothesized to have been spoken as a single language from approximately 4500 BCE to 2500 BCE during the Late Neolithic to Early Bronze Age, though estimates vary by more than a thousand years. According to the prevailing Kurgan hypothesis, the original homeland of the Proto-Indo-Europeans may have been in the Pontic–Caspian steppe of eastern Europe. The linguistic reconstruction of PIE has provided insight into the pastoral culture and patriarchal religion of its speakers. As speakers of Proto-Indo-European became isolated from each other through the Indo-European migrations, the regional dialects of Proto-Indo-European spoken by the various groups diverged, as each dialect underwent shifts in pronunciation, morphology, and vocabulary. Over many centuries, these dialects transformed into the known ancient Indo-European languages. From there, further linguistic divergence led to the evolution of their current descendants, the modern Indo-European languages.
PIE is believed to have had an elaborate system of morphology that included inflectional suffixes as well as ablaut and accent. PIE nominals and pronouns had a complex system of declension, and verbs similarly had a complex system of conjugation. The PIE phonology, particles, numerals, and copula are also well-reconstructed. Asterisks are used by linguists as a conventional mark of reconstructed words, such as *wódr̥, *ḱwn̥tós, or *tréyes; these forms are the reconstructed ancestors of the modern English words water, hound, and three, respectively.

Development of the hypothesis

No direct evidence of the Proto-Indo-European language exists; scholars have reconstructed PIE from its present-day descendants using the comparative method. For example, compare the pairs of words in Italian and English: piede and foot, padre and father, pesce and fish. Since there is a consistent correspondence of the initial consonants that emerges far too frequently to be unrelated coincidence, one can infer that these languages stem from a common parent language. Detailed analysis suggests a system of sound laws to describe the phonetic and phonological changes from the hypothetical ancestral words to the modern ones. These laws have become so detailed and reliable as to support the Neogrammarian hypothesis: the Indo-European sound laws apply without exception.
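
The correspondence check described above can be sketched in a few lines of Python. This is only a minimal illustration, not an implementation of the comparative method: it tallies initial letters for the three word pairs quoted in the text, and the names `PAIRS` and `initial_correspondences` are invented for this example.

```python
from collections import Counter

# Italian/English word pairs quoted in the text.
PAIRS = [("piede", "foot"), ("padre", "father"), ("pesce", "fish")]


def initial_correspondences(pairs):
    """Count how often each pair of initial letters co-occurs."""
    return Counter((a[0], b[0]) for a, b in pairs)


counts = initial_correspondences(PAIRS)
print(counts.most_common())  # [(('p', 'f'), 3)]
```

A real analysis would compare phonemes rather than spellings and would need far more word pairs before a correspondence could be called systematic.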
William Jones, an Anglo-Welsh philologist and puisne judge in Bengal, caused an academic sensation when in 1786 he postulated the common ancestry of Sanskrit, Greek, Latin, Gothic, the Celtic languages, and Old Persian, but he was not the first to state such a hypothesis. In the 16th century, European visitors to the Indian subcontinent became aware of similarities between Indo-Iranian languages and European languages, and as early as 1653, Marcus Zuerius van Boxhorn had published a proposal for a proto-language for the following language families: Germanic, Romance, Greek, Baltic, Slavic, Celtic, and Iranian. In a memoir sent to the Académie des Inscriptions et Belles-Lettres in 1767, Gaston-Laurent Coeurdoux, a French Jesuit who spent most of his life in India, had specifically demonstrated the analogy between Sanskrit and European languages. According to current academic consensus, Jones's famous work of 1786 was less accurate than his predecessors', as he erroneously included Egyptian, Japanese and Chinese in the Indo-European languages, while omitting Hindi.
In 1818, Danish linguist Rasmus Christian Rask elaborated the set of correspondences in his prize essay Undersøgelse om det gamle Nordiske eller Islandske Sprogs Oprindelse, where he argued that Old Norse was related to the Germanic languages, and had even suggested a relation to the Baltic, Slavic, Greek, Latin and Romance languages. In 1816, Franz Bopp published On the System of Conjugation in Sanskrit, in which he investigated the common origin of Sanskrit, Persian, Greek, Latin, and German. In 1833, he began publishing the Comparative Grammar of Sanskrit, Zend, Greek, Latin, Lithuanian, Old Slavic, Gothic, and German.
In 1822, Jacob Grimm formulated what became known as Grimm's law as a general rule in his Deutsche Grammatik. Grimm showed correlations between the Germanic and other Indo-European languages and demonstrated that sound change systematically transforms all words of a language. From the 1870s, the Neogrammarians proposed that sound laws have no exceptions, as illustrated by Verner's law, published in 1876, which resolved apparent exceptions to Grimm's law by exploring the role of accent in language change.
August Schleicher's A Compendium of the Comparative Grammar of the Indo-European, Sanskrit, Greek and Latin Languages represented an early attempt to reconstruct the Proto-Indo-European language.
By the early 1900s, Indo-Europeanists had developed well-defined descriptions of PIE which scholars still accept today. Later, the discovery of the Anatolian and Tocharian languages added to the corpus of descendant languages. A subtle new principle won wide acceptance: the laryngeal theory, which explained irregularities in the reconstruction of Proto-Indo-European phonology as the effects of hypothetical sounds that had been lost in every language documented before the excavation of cuneiform tablets written in Anatolian languages. This theory was first proposed by Ferdinand de Saussure in 1879 on the basis of internal reconstruction only, and progressively won general acceptance after Jerzy Kuryłowicz's discovery of consonantal reflexes of these reconstructed sounds in Hittite.
Julius Pokorny's Indogermanisches etymologisches Wörterbuch gave a detailed, though conservative, overview of the lexical knowledge accumulated by 1959. Jerzy Kuryłowicz's 1956 Apophonie gave a better understanding of Indo-European ablaut. From the 1960s, knowledge of Anatolian became robust enough to establish its relationship to PIE.
In The Oxford Introduction to Proto-Indo-European and the Proto-Indo-European World, Mallory and Adams illustrate the resemblance with the following examples of cognate forms:

Historical and geographical setting

Scholars have proposed multiple hypotheses about when, where, and by whom PIE was spoken. The Kurgan hypothesis, first put forward in 1956 by Marija Gimbutas, has become the most popular.
Other theories include the Anatolian hypothesis, which posits that PIE spread out from Anatolia with agriculture beginning 7500–6000 BCE, the Armenian hypothesis, the Paleolithic continuity paradigm, and the indigenous Aryans theory. The last two of these theories are not regarded as credible within academia. Out of all the theories for a PIE homeland, the Kurgan and Anatolian hypotheses are the ones most widely accepted, and also the ones most debated against each other. Following the publication of several studies on ancient DNA in 2015, Colin Renfrew, the original author and proponent of the Anatolian hypothesis, has accepted the reality of migrations of populations speaking one or several Indo-European languages from the Pontic steppe towards Northwestern Europe.

Descendants

The antiquity of the earliest attestation of each Indo-European group is: 2000–1500 BCE for Anatolian; 1500–1000 BCE for Indo-Aryan and Greek; 1000–500 BCE for Iranic, Celtic, Italic, Phrygian, Illyric, Messapic, South Picene, and Venetic; 500–1 BCE for Thracian and Ancient Macedonian; 1–500 CE for Germanic, Armenian, Lusitanian, and Tocharian; 500–1000 CE for Slavic; 1500–2000 CE for Albanian and Baltic.
The table lists the main Indo-European language families, comprising the languages descended from Proto-Indo-European.

Clade	Proto-language	Description	Historical languages	Modern descendants
Anatolian	Proto-Anatolian	All now extinct, the best attested being the Hittite language.	Hittite, Luwian, Palaic, Lycian, Lydian, Carian, Pisidian, Sidetic, Milyan	There are no living descendants of Proto-Anatolian.
Tocharian	Proto-Tocharian	An extinct branch known from manuscripts dating from the 6th to the 8th century AD and found in northwest China.	Tocharian A, Tocharian B	There are no living descendants of Proto-Tocharian.
Italic	Proto-Italic	This included many languages, but only descendants of Latin survive.	Latin, Faliscan, Umbrian, Oscan, African Romance, Dalmatian, Volscian, Marsi, Pre-Samnite, Paeligni, Sabine	Portuguese, Galician, Spanish, Ladino, Catalan, Occitan, French, Italian, Friulian, Romansh, Romanian, Aromanian, Sardinian, Corsican, Venetian, Latin, Picard, Mirandese, Aragonese, Walloon, Piedmontese, Lombard, Neapolitan, Sicilian, Emilian-Romagnol, Ligurian, Ladin
Celtic	Proto-Celtic	Once spoken across Europe and Anatolia, but now mostly confined to Europe's northwestern edge.	Gaulish, Lepontic, Noric, Pictish, Cumbric, Old Irish, Middle Welsh, Gallaecian, Galatian, Celtiberian	Irish, Scottish Gaelic, Welsh, Breton, Cornish, Manx
Germanic	Proto-Germanic	Branched into three subfamilies: West Germanic, East Germanic, and North Germanic.	Old English, Old Norse, Gothic, Old High German, Old Saxon, Vandalic, Burgundian, Crimean Gothic, Norn, Greenlandic Norse	English, German, Afrikaans, Dutch, Yiddish, Norwegian, Danish, Swedish, Frisian, Icelandic, Faroese, Luxembourgish, Scots, Limburgish, Wymysorys, Elfdalian
Balto-Slavic	Proto-Balto-Slavic	Branched into the Baltic languages and the Slavic languages.	Old Prussian, Old Church Slavonic, Sudovian, Semigallian, Selonian, Skalvian, Galindian, Polabian, Knaanic	Baltic: Latvian, Latgalian and Lithuanian; Slavic: Russian, Ukrainian, Belarusian, Polish, Czech, Slovak, Sorbian, Serbo-Croatian, Bulgarian, Slovenian, Macedonian, Kashubian, Rusyn
Indo-Iranian	Proto-Indo-Iranian	Branched into the Indo-Aryan, Iranian and Nuristani languages.	Vedic Sanskrit, Pali, Prakrit languages; Old Persian, Parthian, Old Azeri, Median, Elu, Sogdian, Saka, Avestan, Bactrian	Indo-Aryan: Hindustani, Marathi, Sylheti, Bengali, Assamese, Odia, Konkani, Gujarati, Nepali, Dogri, Romani, Sindhi, Maithili, Sinhala, Dhivehi, Punjabi, Kashmiri, Sanskrit; Iranian: Persian, Pashto, Balochi, Kurdish, Zaza, Ossetian, Luri, Talyshi, Tati, Gilaki, Mazandarani, Semnani, Yaghnobi; Nuristani: Katë, Prasun, Ashkun, Nuristani Kalasha, Tregami, Zemiaki
Armenian	Proto-Armenian	Armenian is the only surviving representative of the Armenian branch of the Indo-European language family.	Classical Armenian	Armenian
Hellenic	Proto-Greek	Modern Greek and Tsakonian are the only surviving varieties of Greek.	Ancient Greek, Ancient Macedonian	Greek, Tsakonian
Albanian	Proto-Albanian	Albanian is the only surviving representative of the Albanoid branch of the Indo-European language family.	Illyrian; Daco-Thracian	Albanian

Commonly proposed subgroups of Indo-European languages include Italo-Celtic, Graeco-Aryan, Graeco-Armenian, Graeco-Phrygian, Daco-Thracian, and Thraco-Illyrian.
There are numerous lexical similarities between the Proto-Indo-European and Proto-Kartvelian languages due to early language contact, as well as some morphological similarities—notably the Indo-European ablaut, which is remarkably similar to the root ablaut system reconstructible for Proto-Kartvelian.

Marginally attested languages

The Lusitanian language was a marginally attested language spoken in areas near the border between present-day Portugal and Spain. The Venetic and Liburnian languages known from the North Adriatic region are sometimes classified as Italic.
Albanian and Greek are the only surviving Indo-European descendants of a Paleo-Balkan language area, named for their occurrence in or in the vicinity of the Balkan peninsula. Most of the other languages of this area—including Illyrian, Thracian, and Dacian—do not appear to be members of any other subfamilies of PIE, but are so poorly attested that proper classification of them is not possible. Forming an exception, Phrygian is sufficiently well-attested to allow proposals of a particularly close affiliation with Greek, and a Graeco-Phrygian branch of Indo-European is becoming increasingly accepted.

Phonology

Proto-Indo-European phonology has been reconstructed in some detail. Notable features of the most widely accepted reconstruction include:

three series of stop consonants reconstructed as voiceless, voiced, and breathy voiced;
sonorant consonants that could be used syllabically;
three so-called laryngeal consonants, whose exact pronunciation is not well-established but which are believed to have existed in part based on their detectable effects on adjacent sounds;
the fricative *s;
a vowel system in which *e and *o were the most frequently occurring vowels. The existence of *a as a separate phoneme is debated.

Notation

Vowels

The vowels in commonly used notation are:

Type	front	back
Mid	e, ē	o, ō
Low		-

Consonants

The corresponding consonants in commonly used notation are:
All sonorants can appear in syllabic position. The syllabic allophones of *y and *w are realized as the surface vowels *i and *u respectively.
[z] is an allophone of *s when next to a voiced consonant.
[ŋ] is an allophone of *n before velar consonants.
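
The two allophonic rules above can be sketched as a small function over a list of segments. This is a rough illustration under stated assumptions: the allophones are written z and ŋ, "voiced consonant" is narrowed to the voiced stops, "velar" is taken to cover the palatovelar and labiovelar series too, and the segment sets and the name `surface_form` are invented for this example. The form *nisdós is a commonly cited reconstruction used here only for illustration.

```python
# Assumed segment classes for this sketch (not taken from the text).
VOICED_STOPS = {"b", "d", "ǵ", "g", "gʷ", "bʰ", "dʰ", "ǵʰ", "gʰ", "gʷʰ"}
VELARS = {"ḱ", "ǵ", "ǵʰ", "k", "g", "gʰ", "kʷ", "gʷ", "gʷʰ"}


def surface_form(segments):
    """Apply the two allophonic rules to a list of phoneme strings."""
    out = []
    for i, seg in enumerate(segments):
        prev_seg = segments[i - 1] if i > 0 else None
        next_seg = segments[i + 1] if i + 1 < len(segments) else None
        if seg == "s" and (prev_seg in VOICED_STOPS or next_seg in VOICED_STOPS):
            out.append("z")  # s is voiced next to a voiced stop
        elif seg == "n" and next_seg in VELARS:
            out.append("ŋ")  # n assimilates to a following velar
        else:
            out.append(seg)
    return out


print(surface_form(["n", "i", "s", "d", "ó", "s"]))  # ['n', 'i', 'z', 'd', 'ó', 's']
print(surface_form(["p", "é", "n", "kʷ", "e"]))      # ['p', 'é', 'ŋ', 'kʷ', 'e']
```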

Accent

The Proto-Indo-European accent is reconstructed today as having had variable lexical stress, which could appear on any syllable and whose position often varied among different members of a paradigm. Stressed syllables received a higher pitch and it is often said that PIE had a pitch accent. The location of the stress is associated with ablaut variations, especially between full-grade vowels and zero-grade, but not entirely predictable from it.
The accent is best preserved in Vedic Sanskrit and Ancient Greek, and indirectly attested in a number of phenomena in other IE languages, such as Verner's Law in the Germanic branch. Sources for Indo-European accentuation are also the Balto-Slavic accentual system and plene spelling in Hittite cuneiform. To account for mismatches between the accent of Vedic Sanskrit and Ancient Greek, as well as a few other phenomena, a few historical linguists prefer to reconstruct PIE as a tone language where each morpheme had an inherent tone; the sequence of tones in a word then evolved, according to that hypothesis, into the placement of lexical stress in different ways in different IE branches.

Morphology

Proto-Indo-European, like its earliest attested descendants, was a highly inflected, fusional language. Suffixation and ablaut were the main methods of marking inflection, both for nominals and verbs. The subject of a sentence was in the nominative case and agreed in number and person with the verb, which was additionally marked for voice, tense, aspect, and mood.

Root

Proto-Indo-European nominals and verbs were primarily composed of roots – affix-lacking morphemes that carried the core lexical meaning of a word. They were used to derive related words. As a rule, roots were monosyllabic, and had the structure (s)(C)CVC(C), where the symbols C stand for consonants, V stands for a variable vowel, and optional components are in parentheses. All roots ended in a consonant and, although less certain, they appear to have started with a consonant as well.
A root plus a suffix formed a word stem, and a word stem plus an inflectional ending formed a word. Proto-Indo-European was a fusional language, in which inflectional morphemes signaled the grammatical relationships between words. This dependence on inflectional morphemes means that roots in PIE, unlike those in English, were rarely used without affixes.
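
The root shape can be checked mechanically. The sketch below assumes the commonly cited template (s)(C)CVC(C), takes a root as a list of already separated segments, and treats anything outside a small assumed vowel set as a consonant. The name `fits_root_template` and the vowel set are invented for this example, and the last two test inputs are made-up sequences rather than reconstructed roots.

```python
import re

VOWELS = {"e", "o", "a", "ē", "ō", "ā"}  # assumed vowel inventory


def fits_root_template(segments):
    """Check a root, given as a list of segments, against (s)(C)CVC(C)."""
    skeleton = "".join("V" if seg in VOWELS else "C" for seg in segments)
    match = re.fullmatch(r"(C{1,3})V(C{1,2})", skeleton)
    if match is None:
        return False
    onset = match.group(1)
    # A three-consonant onset is only allowed when it starts with s.
    return len(onset) < 3 or segments[0] == "s"


print(fits_root_template(["t", "o", "m", "h₁"]))      # True  (CVCC)
print(fits_root_template(["k", "r", "e", "t"]))       # True  (CCVC)
print(fits_root_template(["e", "s"]))                 # False (no initial consonant)
print(fits_root_template(["p", "t", "k", "e", "t"]))  # False (three consonants without s)
```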

Ablaut

Many morphemes in Proto-Indo-European had short e as their inherent vowel; the Indo-European ablaut is the change of this short e to short o, long e, long o, or no vowel. The forms are referred to as the "ablaut grades" of the morpheme—the e-grade, o-grade, zero-grade, etc. This variation in vowels occurred both within inflectional morphology and derivational morphology.
Categories that PIE distinguished through ablaut were often also identifiable by contrasting endings, but the loss of these endings in some later Indo-European languages has led them to use ablaut alone to identify grammatical categories, as in the Modern English words sing, sang, sung.
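
The five grades can be generated mechanically from a morpheme with a single short e. The sketch below is a simplified illustration: the function name `ablaut_grades` and the grade labels are chosen for this example, accent is ignored, and so is the resyllabification that a zero grade can trigger. The suffix `ter` is used because related shapes (-tḗr, -tōr, -tr-) occur in the "father" forms cited elsewhere in this article.

```python
def ablaut_grades(morpheme):
    """Return the five ablaut grades of a morpheme containing one short e."""
    if morpheme.count("e") != 1:
        raise ValueError("expected exactly one short e")
    return {
        "e-grade": morpheme,
        "o-grade": morpheme.replace("e", "o"),
        "lengthened e-grade": morpheme.replace("e", "ē"),
        "lengthened o-grade": morpheme.replace("e", "ō"),
        "zero-grade": morpheme.replace("e", ""),
    }


grades = ablaut_grades("ter")
print(grades["o-grade"])     # tor
print(grades["zero-grade"])  # tr
```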

Noun

Proto-Indo-European nouns were probably declined for eight or nine cases:

nominative: marks the subject of a verb. Words that follow a linking verb and restate the subject of that verb also use the nominative case. The nominative is the dictionary form of the noun.
accusative: used for the direct object of a transitive verb.
genitive: marks a noun as modifying another noun.
dative: used to indicate the indirect object of a transitive verb, such as Jacob in Maria gave Jacob a drink.
instrumental: marks the instrument or means by, or with, which the subject achieves or accomplishes an action. It may be either a physical object or an abstract concept.
ablative: used to express motion away from something.
locative: expresses location, corresponding vaguely to the English prepositions in, on, at, and by.
vocative: used for a word that identifies an addressee. A vocative is a noun of address where the identity of the party spoken to is set forth expressly within a sentence. For example, in the sentence, "I don't know, John", John is a noun of address, indicating the party being addressed.
allative: used as a type of locative case that expresses movement towards something. It was preserved in Anatolian, and fossilized traces of it have been found in Greek. It is also present in Tocharian. Its PIE shape is uncertain, with candidates including *-h₂(e), *-(e)h₂, or *-a.

Late Proto-Indo-European had three grammatical genders:

masculine
feminine
neuter

This system is probably derived from an older two-gender system, attested in Anatolian languages: common and neuter gender. The feminine gender only arose in the later period of the language. Neuter nouns collapsed the nominative, vocative and accusative into a single form, the plural of which used a special collective suffix *-h₂. This same collective suffix, in the extended forms *-eh₂ and *-ih₂, came to be used to form feminine nouns from masculines.
All nominals distinguished three numbers:

singular
dual
plural

These numbers were also distinguished in verbs, requiring agreement with their subject nominal.

Pronoun

Proto-Indo-European pronouns are difficult to reconstruct, owing to their variety in later languages. PIE had personal pronouns in the first and second grammatical person, but not the third person, where demonstrative pronouns were used instead. The personal pronouns had their own unique forms and endings, and some had two distinct stems; this is most obvious in the first person singular where the two stems are still preserved in English I and me. There were also two varieties for the accusative, genitive and dative cases, a stressed and an enclitic form.

Verb

Proto-Indo-European verbs, like the nouns, exhibited an ablaut system.
The most basic categorisation for the reconstructed Indo-European verb is grammatical aspect. Verbs are classed as:

stative: verbs that depict a state of being
imperfective: verbs depicting ongoing, habitual or repeated action
perfective: verbs depicting a completed action or actions viewed as an entire process.

Verbs had at least four grammatical moods:

indicative: indicates that something is a statement of fact; in other words, to express what the speaker considers to be a known state of affairs, as in declarative sentences.
imperative: forms commands or requests, including the giving of prohibition or permission, or any other kind of advice or exhortation.
subjunctive: used to express various states of unreality such as wish, emotion, possibility, judgment, opinion, obligation, or action that has not yet occurred
optative: indicates a wish or hope. It is similar to the cohortative mood and is closely related to the subjunctive mood.

Verbs had two grammatical voices:

active: used in a clause whose subject expresses the main verb's agent.
mediopassive: for the middle voice and the passive voice.

Verbs had three grammatical persons: first, second and third.
Verbs had three grammatical numbers:

singular
dual: referring to precisely two of the entities identified by the noun or pronoun.
plural: a number other than singular or dual.

Verbs were probably marked by a highly developed system of participles, one for each combination of tense and voice, and an assorted array of verbal nouns and adjectival formations.
The following table shows a possible reconstruction of the PIE verb endings from Sihler, which largely represents the current consensus among Indo-Europeanists.

Numbers

Proto-Indo-European numerals are generally reconstructed as follows:

Number	Sihler
one	*óynos/*óywos/*óykos; *sḗm, *sm̥-
two	*dwóh₁, *dwi-
three	*tréyes, *tri-
four	*kʷetwóres, *kʷtwr̥-
five	*pénkʷe
six	*séḱs; originally perhaps *wéḱs, with *s- under the influence of *septḿ̥
seven	*septḿ̥
eight	*oḱtṓ or *h₃eḱtṓ
nine	*h₁néwn̥
ten	*déḱm̥

Rather than specifically 100, *ḱm̥tóm may originally have meant "a large number".

Particle

Proto-Indo-European particles were probably used both as adverbs and as postpositions. These postpositions became prepositions in most daughter languages.
Reconstructed particles include, for example, *upo "under, below"; the negators *ne, *mē; the conjunctions *kʷe "and", *wē "or" and others; and an interjection, *wai!, expressing woe or agony.

Derivational morphology

Proto-Indo-European employed various means of deriving words from other words, or directly from verb roots.

Internal derivation

Internal derivation was a process that derived new words through changes in accent and ablaut alone. It was not as productive as external derivation, but is firmly established by the evidence of various later languages.

Possessive adjectives

Possessive or associated adjectives were probably created from nouns through internal derivation. Such words could be used directly as adjectives, or they could be turned back into a noun without any change in morphology, indicating someone or something characterised by the adjective. They were probably also used as the second elements in compounds. If the first element was a noun, this created an adjective that resembled a present participle in meaning, e.g. "having much rice" or "cutting trees". When turned back into nouns, such compounds were Bahuvrihis or semantically resembled agent nouns.
In thematic stems, creating a possessive adjective seems to have involved shifting the accent one syllable to the right, for example:

*tómh₁-o-s "slice" > *tomh₁-ó-s "cutting" > *dr-u-tomh₁-ó-s "cutting trees".
*wólh₁-o-s "wish" > *wolh₁-ó-s "having wishes".
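
The accent shift in these thematic examples can be reproduced with a short function. The sketch assumes that the accent is written as an acute, that each vowel letter counts as one syllable (syllabic sonorants are ignored), and that the form contains a vowel to the right of the accented one. The name `shift_accent_right` is invented for this example.

```python
import unicodedata

ACUTE = "\u0301"        # combining acute accent
VOWELS = set("aeiou")   # base vowel letters; syllabic sonorants are ignored


def shift_accent_right(form):
    """Move the acute accent to the next vowel to the right."""
    chars = list(unicodedata.normalize("NFD", form))
    if ACUTE not in chars:
        raise ValueError("form has no acute accent")
    i = chars.index(ACUTE)
    del chars[i]
    # After the deletion, index i holds the character that followed the accent.
    for j in range(i, len(chars)):
        if chars[j] in VOWELS:
            k = j + 1
            # Skip marks already attached to that vowel (e.g. a macron).
            while k < len(chars) and unicodedata.combining(chars[k]):
                k += 1
            chars.insert(k, ACUTE)
            return unicodedata.normalize("NFC", "".join(chars))
    raise ValueError("no vowel to the right of the accent")


print(shift_accent_right("tómh₁-o-s"))  # tomh₁-ó-s
print(shift_accent_right("wólh₁-o-s"))  # wolh₁-ó-s
```

Working on the decomposed (NFD) form keeps the accent separate from the vowel letter, so the same code handles vowels that also carry a macron.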

In athematic stems, there was a change in the accent/ablaut class. The reconstructed four classes followed an ordering in which a derivation would shift the class one to the right:
acrostatic → proterokinetic → hysterokinetic → amphikinetic
The reason for this particular ordering of the classes in derivation is not known. Some examples:

Acrostatic *krót-u-s ~ *krét-u-s "strength".
Hysterokinetic *ph₂-tḗr ~ *ph₂-tr-és "father" > amphikinetic *h₁su-péh₂-tōr ~ *h₁su-ph₂-tr-és "having a good father".

Vṛddhi

A vṛddhi derivation, named after the Sanskrit grammatical term, signified "of, belonging to, descended from". It was characterised by "upgrading" the root grade, from zero to full or from full to lengthened. When upgrading from zero to full grade, the vowel could sometimes be inserted in an unexpected location, creating a different stem from the original full grade.
Examples:

full grade *swéḱuro-s "father-in-law" > lengthened grade *swēḱuró-s "relating to one's father-in-law".
full grade *dyḗw-s > zero grade *diw-és "sky" > new full grade *deyw-o-s "god, sky god". Note the difference in vowel placement: *dyew- in the full-grade stem of the original noun, but *deyw- in the vṛddhi derivative.

Nominalization

Adjectives with accent on the thematic vowel could be turned into nouns by moving the accent back onto the root. A zero grade root could remain so, or be "upgraded" to full grade like in a vṛddhi derivative. Some examples:

PIE *ǵn̥h₁-tó-s > *ǵénh₁-to-.
Ancient Greek λευκός > λεῦκος.
Sanskrit कृ॒ष्ण > कृष्ण॑स्.

This kind of derivation is likely related to the possessive adjectives, and can be seen as essentially the reverse of it.

Syntax

The syntax of the older Indo-European languages has been studied in earnest since at least the late nineteenth century, by such scholars as Hermann Hirt and Berthold Delbrück. In the second half of the twentieth century, interest in the topic increased and led to reconstructions of Proto-Indo-European syntax.
Since all the early attested IE languages were inflectional, PIE is thought to have relied primarily on morphological markers, rather than word order, to signal syntactic relationships within sentences. Still, a default word order is thought to have existed in PIE. In 1892, Jacob Wackernagel reconstructed PIE's word order as subject–verb–object, based on evidence in Vedic Sanskrit.
Winfred P. Lehmann, on the other hand, reconstructs PIE as a subject–object–verb language. He posits that the presence of person marking in PIE verbs motivated a shift from OV to VO order in later dialects. Many of the descendant languages have VO order: modern Greek, Romance and Albanian prefer SVO, Insular Celtic has VSO as the default order, and even the Anatolian languages show some signs of this word order shift. Tocharian and Indo-Iranian, meanwhile, retained the conservative OV order. Lehmann attributes the context-dependent order preferences in Baltic, Slavic and Germanic to outside influences. Donald Ringe, however, attributes these to internal developments instead.
Paul Friedrich disagrees with Lehmann's analysis. He reconstructs PIE with the following syntax:

basic SVO word order
adjectives before nouns
head nouns before genitives
prepositions rather than postpositions
no dominant order in comparative constructions
main clauses before relative clauses

Friedrich notes that even among those Indo-European languages with basic OV word order, none of them are rigidly OV. He also notes that these non-rigid OV languages mainly occur in parts of the IE area that overlap with OV languages from other families, whereas VO is predominant in the central parts of the IE area. For these reasons, among others, he argues for a VO common ancestor.
Hans Henrich Hock reports that the SVO hypothesis still has some adherents, but the "broad consensus" among PIE scholars is that PIE would have been an SOV language. The SOV default word order with other orders used to express emphasis is attested in Old Indo-Aryan, Old Iranian, Old Latin and Hittite, while traces of it can be found in the enclitic personal pronouns of the Tocharian languages.