Albanian language

Albanian is an Indo-European language spoken by the Albanians in the Balkans and the Albanian diaspora in the Americas, Europe and Oceania. With about 7.5 million speakers, it comprises an independent branch within the Indo-European languages and is not closely related to any other language.
First attested in the 15th century, it is the last Indo-European branch to appear in written records. This is one of the reasons why its still-unknown origin has long been a matter of dispute among linguists and historians. Albanian is considered to be the descendant of one of the Paleo-Balkan languages of antiquity. For more historical and geographical reasons than specifically linguistic ones, there are various modern historians and linguists who believe that the Albanian language may have descended from a southern Illyrian dialect spoken in much the same region in classical times. Alternative hypotheses hold that Albanian may have descended from Thracian or Daco-Moesian, other ancient languages spoken farther east than Illyrian.
Not enough is known of these languages to completely prove or disprove the various hypotheses.
The two main Albanian dialects, Gheg and Tosk, are primarily distinguished by phonological differences and are mutually intelligible, with Gheg spoken to the north and Tosk spoken to the south of the Shkumbin river. Their characteristics in the treatment of both native words and loanwords indicate the dialectal split occurred after Christianisation of the region and at the time of the Slavic migration to the Balkans, with the historic boundary between Gheg and Tosk being the Shkumbin which straddled the Jireček line. Standard Albanian is a standardised form of spoken Albanian based on the Tosk dialect. It is the official language of Albania and Kosovo and a co-official language in North Macedonia as well as a minority language of Italy, Montenegro, Romania and Serbia.
Centuries-old communities speaking Albanian dialects can be found scattered in Croatia, Greece, Italy as well as in Romania, Turkey and Ukraine. Two varieties of the Tosk dialect, Arvanitika in Greece and Arbëresh in Southern Italy, have preserved archaic elements of the language.

Geographic distribution

The language is spoken by approximately 6 million people in the Balkans, primarily in Albania, Kosovo, North Macedonia, Serbia, Montenegro and Greece. However, due to old communities in Italy and the large Albanian diaspora, the worldwide total of speakers is higher, at approximately 7.5 million.


The Albanian language is the official language of Albania and Kosovo and co-official in North Macedonia. Albanian is a recognised minority language in Croatia, Italy, Montenegro, Romania and Serbia. It is also spoken by a minority in Greece, specifically in the Thesprotia and Preveza regional units and in a few villages in the Ioannina and Florina regional units, as well as by 450,000 Albanian immigrants in Greece.
Albanian is the third most spoken language in Italy, owing to substantial Albanian immigration. Italy also has a historical Albanian minority of about 500,000, scattered across Southern Italy, known as the Arbëreshë. Approximately 1 million Albanians from Kosovo are dispersed throughout Germany, Switzerland and Austria, mainly as immigrants who left Kosovo during the 1990s. In Switzerland, Albanian is the sixth most spoken language, with 176,293 native speakers.
Albanian became an official language in North Macedonia on 15 January 2019.


There are large numbers of Albanian speakers in the United States, Argentina, Chile, Uruguay and Canada. Some of the first ethnic Albanians to arrive in the United States were Arbëreshë. The Arbëreshë have a strong sense of identity and are unique in that they speak an archaic dialect of Tosk Albanian called Arbëresh.
In North America, there are approximately 250,000 Albanian speakers. Albanian is spoken in the eastern United States in cities such as New York City, Boston, Chicago, Philadelphia and Detroit, as well as in parts of the states of New Jersey, Ohio and Connecticut. Greater New Orleans has a large Arbëresh community. Wherever there are Italians, there are often a few Arbëreshë among them; Arbëreshë Americans are therefore often indistinguishable from Italian Americans, having been assimilated into the Italian American community.
In Argentina, there are nearly 40,000 Albanian speakers, mostly in Buenos Aires.

Asia and Oceania

Approximately 1.3 million people of Albanian ancestry live in Turkey, with more than 500,000 recognizing their ancestry, language and culture. Other estimates, however, place the number of people in Turkey with Albanian ancestry and/or background as high as 5 million. The vast majority of this population is assimilated and no longer possesses fluency in the Albanian language, though a vibrant Albanian community maintains its distinct identity in Istanbul to this day.
In Egypt there are around 18,000 Albanians, mostly Tosk speakers. Many are descendants of the Janissaries of Muhammad Ali Pasha, an Albanian who became Wāli and self-declared Khedive of Egypt and Sudan. In addition to the dynasty that he established, a large part of the former Egyptian and Sudanese aristocracy was of Albanian origin. In addition to the recent emigrants, there are older diasporic communities around the world.
Albanian is also spoken by Albanian diaspora communities residing in Australia and New Zealand.


The Albanian language has two distinct dialects: Tosk, which is spoken in the south, and Gheg, which is spoken in the north. Standard Albanian is based on the Tosk dialect. The Shkumbin River is the rough dividing line between the two dialects.
Gheg is divided into four sub-dialects: Northwest Gheg, Northeast Gheg, Central Gheg, and Southern Gheg. It is primarily spoken in northern Albania and throughout Montenegro, Kosovo and northwestern North Macedonia. One fairly divergent dialect is the Upper Reka dialect, which is nevertheless classified as Central Gheg. There is also a diaspora dialect in Croatia, the Arbanasi dialect.
Tosk is divided into five sub-dialects: Northern Tosk, Labërisht, Cham (Çam), Arvanitika, and Arbëresh. Tosk is spoken in southern Albania, southwestern North Macedonia and northern and southern Greece. Cham Albanian is spoken in north-western Greece, while Arvanitika is spoken by the Arvanites in southern Greece. In addition, Arbëresh is spoken by the Arbëreshë people, descendants of 15th- and 16th-century migrants who settled in southern Italy, in small communities in the regions of Sicily and Calabria.


The Albanian language has been written using many different alphabets since the earliest records from the 15th century. The history of Albanian language orthography is closely related to the cultural orientation and knowledge of certain foreign languages among Albanian writers. The earliest written Albanian records come from the Gheg area in makeshift spellings based on Italian or Greek. Originally, the Tosk dialect was written in the Greek alphabet and the Gheg dialect was written in the Latin script. Both dialects had also been written in the Ottoman Turkish version of the Arabic script, Cyrillic, and some local alphabets. More specifically, the writers from northern Albania and under the influence of the Catholic Church used Latin letters, those in southern Albania and under the influence of the Greek Orthodox church used Greek letters, while others throughout Albania and under the influence of Islam used Arabic letters. There were initial attempts to create an original Albanian alphabet during the 1750–1850 period. These attempts intensified after the League of Prizren and culminated with the Congress of Manastir, held by Albanian intellectuals from 14 to 22 November 1908 in Manastir, which decided which alphabet to use and what the standardized spelling would be for standard Albanian. This is how the literary language remains. The alphabet is the Latin alphabet with the addition of the letters <ë> and <ç> and nine digraphs: dh, th, xh, gj, nj, ll, rr, zh and sh.
According to Robert Elsie:
The hundred years between 1750 and 1850 were an age of astounding orthographic diversity in Albania. In this period, the Albanian language was put to writing in at least ten different alphabets – most certainly a record for European languages.... the diverse forms in which this old Balkan language was recorded, from the earliest documents to the beginning of the twentieth century... consist of adaptations of the Latin, Greek, Arabic, and Cyrillic alphabets and a number of locally invented writing systems. Most of the latter alphabets have now been forgotten and are unknown, even to the Albanians themselves.


The Albanian language occupies an independent branch of the Indo-European language tree. In 1854, Albanian was demonstrated to be an Indo-European language by the philologist Franz Bopp. Albanian was formerly compared by a few Indo-European linguists with Germanic and Balto-Slavic, all of which share a number of isoglosses with Albanian. Other linguists linked the Albanian language with Latin, Greek and Armenian, while placing Germanic and Balto-Slavic in another branch of Indo-European.


The first written mention of the Albanian language was on 14 July 1284 in Dubrovnik in modern Croatia when a crime witness named Matthew testified: "I heard a voice shouting on the mountainside in the Albanian language". The oldest document written in Albanian dates back to 1462, while the first audio recording in the language was made by Norbert Jokl on 4 April 1914 in Vienna.
During the five-century period of the Ottoman presence in Albania, the language was not officially recognized until 1909, when the Congress of Dibra decided that Albanian schools would finally be allowed.

Linguistic affinities

Albanian is considered an isolate within the Indo-European language family; no other language has been conclusively linked to its branch. The only other languages that are sole surviving members of a branch of Indo-European are Armenian and Greek.
The Albanian language is part of the Indo-European language group and is considered to have evolved from one of the Paleo-Balkan languages of antiquity, although it is still uncertain which particular Paleo-Balkan language represents the ancestor of Albanian, or where in Southern Europe that population lived. In general there is insufficient evidence to connect Albanian with one of those languages, whether one of the Illyrian languages, or Thracian and Dacian. Among these possibilities, Illyrian is typically held to be the most probable, though insufficient evidence still clouds the discussion.
Although Albanian shares lexical isoglosses with Greek, Germanic, and to a lesser extent Balto-Slavic, the vocabulary of Albanian is quite distinct. In 1995, Taylor, Ringe and Warnow, using quantitative linguistic techniques, found that Albanian appears to comprise a "subgroup with Germanic". However, they argued that this fact is hardly significant, as Albanian has lost much of its original vocabulary and morphology, and so this "apparently close connection to Germanic rests on only a couple of lexical cognates – hardly any evidence at all".

Historical presence and location

The place and the time where the Albanian language was formed is uncertain. American linguist Eric Hamp stated that during an unknown chronological period a pre-Albanian population inhabited areas stretching from Poland to the southwestern Balkans. Further analysis has suggested that it was in a mountainous region rather than on a plain or seacoast: while the words for plants and animals characteristic of mountainous regions are entirely original, the names for fish and for agricultural activities are borrowed from other languages.
A deeper analysis of the vocabulary, however, shows that this could be a consequence of prolonged Latin domination of the coastal and plain areas of the country, rather than evidence of the original environment in which the Albanian language was formed. For example, the word for 'fish' is borrowed from Latin, but the word for 'gills' is native. The words for 'ship', 'raft', 'navigation', 'sea shelves' and a few names of kinds of fish are also indigenous, but not the words for 'sail', 'row' and 'harbor', which denote objects pertaining to navigation itself, nor those for a large part of the sea fauna. This suggests rather that Proto-Albanians were pushed away from coastal areas in early times, thus losing large parts of their sea-environment lexicon. A similar phenomenon can be observed with agricultural terms. While the words for 'arable land', 'corn', 'wheat', 'cereals', 'vineyard', 'yoke', 'harvesting', 'cattle breeding', etc. are native, the words for 'ploughing', 'farm' and 'farmer', agricultural practices, and some harvesting tools are foreign. This, again, points to intense contact with other languages and peoples, rather than providing evidence of a possible Urheimat.
The centre of Albanian settlement remained the Mat river. In 1079, they were recorded farther south in the valley of the Shkumbin river. The Shkumbin, a seasonal stream that lies near the old Via Egnatia, is approximately the boundary of the primary dialect division for Albanian, Tosk and Gheg. The characteristics of Tosk and Gheg in the treatment of the native and loanwords from other languages are evidence that the dialectal split preceded the Slavic migration to the Balkans, which means that in that period, Albanians were occupying nearly the same area around the Shkumbin river, which straddled the Jireček Line.
References to the existence of Albanian as a distinct language survive from the 14th century, but they fail to cite specific words. The oldest surviving documents written in Albanian are the "formula e pagëzimit" (baptismal formula), Un'te paghesont' pr'emenit t'Atit e t'Birit e t'Spertit Senit, recorded by Pal Engjelli, Bishop of Durrës, in 1462 in the Gheg dialect, and some New Testament verses from that period.
Linguists Stefan Schumacher and Joachim Matzinger assert that the first literary records of Albanian date from the 16th century. The oldest known Albanian printed book, Meshari, or "missal", was written in 1555 by Gjon Buzuku, a Roman Catholic cleric. In 1635 Frang Bardhi wrote the first Latin–Albanian dictionary. The first Albanian school is believed to have been opened by Franciscans in 1638 in Pdhanë.
One of the earliest dictionaries of the Albanian language appears in Pratichae Schrivaneschae, an Italian-language manuscript written in 1693 by the Montenegrin sea captain Julije Balović. It includes a multilingual dictionary of hundreds of the words most often used in everyday life in the Italian, Slavo-Illirico, Greek, Albanian and Turkish languages.

Pre-Indo-European substratum

Pre-Indo-European sites are found throughout the territory of Albania. Such sites existed in Maliq, Vashtëm, Burimas, Barç, Dërsnik in Korçë District, Kamnik in Kolonja, Kolsh in Kukës District, Rashtan in Librazhd and Nezir in Mat District. As in other parts of Europe, these pre-Indo-European people joined the migratory Indo-European tribes that entered the Balkans and contributed to the formation of the historical Paleo-Balkan tribes. In terms of linguistics, the pre-Indo-European substrate language spoken in the southern Balkans has probably influenced pre-Proto-Albanian, the ancestor idiom of Albanian. The extent of this linguistic impact cannot be determined with precision due to the uncertain position of Albanian among Paleo-Balkan languages and their scarce attestation. Some loanwords, however, have been proposed, such as shegë or lëpjetë.

Proto-IE features

Although Albanian has several words that do not correspond to IE cognates, it has retained many proto-IE features: for example, the demonstrative pronoun *ḱi- is ancestral to Albanian ky/kjo, English he, and Russian sej but not to English this or Russian etot.
Albanian is compared to other Indo-European languages below, but note that Albanian has exhibited some notable instances of semantic drift.

English | PIE | Latin | Lithuanian
month | … | mēnsis | mė́nuo / mėnesis
new | …o- | novus | naũjas
mother | *méh₂tēr | māter | motė / motina
sister | *swésōr | soror | sesuõ
night | *nókʷts | noct- | naktìs
nose | *neh₂-s- | nāsus | nósis
three | *treies | trēs | trỹs
black | *kʷr̥snós, *mel-n- | āter, niger | júodas
red | *h₁reudʰ-ó- ~ *h₁roudʰ-ó- | ruber | raűdas / raudonas
yellow | *ǵʰelh₃- | helvus | gel̃tas / geltonas
blue | *bʰléh₁-uo- | flāvus | mė́lynas
wolf | *wĺ̥kʷos | lupus | vil̃kas

Old Church Slavonic: tri, trije 'three'; vlьkъ 'wolf'.
Ancient Greek: μην- 'month'.

Albanian–PIE phonological correspondences

Phonologically, Albanian is not so conservative. Like many IE stocks, it has merged the two series of voiced stops. In addition, voiced stops tend to disappear in between vowels. There is almost complete loss of final syllables and very widespread loss of other unstressed syllables. PIE *o appears as a, while *ē and *ā become o, and PIE *ō appears as e.
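The vowel correspondences stated above can be written down as a small lookup table. The following Python sketch is purely illustrative: the names `PIE_TO_ALBANIAN_VOWEL` and `albanian_vowel_reflex` are hypothetical, and the table covers only the four unconditioned correspondences named in this paragraph, ignoring every context-dependent development.

```python
import unicodedata
from typing import Optional

# Only the four correspondences stated in the text (assumption: no
# conditioning environment is modelled; real developments are more complex).
_RAW = {
    "o": "a",  # PIE *o appears as a
    "ē": "o",  # PIE *ē becomes o
    "ā": "o",  # PIE *ā becomes o
    "ō": "e",  # PIE *ō appears as e
}

# Normalise keys so composed and decomposed macron vowels compare equal.
PIE_TO_ALBANIAN_VOWEL = {
    unicodedata.normalize("NFC", k): v for k, v in _RAW.items()
}


def albanian_vowel_reflex(pie_vowel: str) -> Optional[str]:
    """Return the Albanian reflex of a PIE vowel, or None if not listed."""
    key = unicodedata.normalize("NFC", pie_vowel).lstrip("*")
    return PIE_TO_ALBANIAN_VOWEL.get(key)


if __name__ == "__main__":
    for v in ("*o", "*ē", "*ā", "*ō"):
        print(v, ">", albanian_vowel_reflex(v))
```

Vowels not mentioned in the paragraph return `None` rather than a guess, which keeps the sketch from claiming more than the text does.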
The palatals, velars, and labiovelars show distinct developments, with Albanian showing the three-way distinction also found in Luwian. Labiovelars are for the most part differentiated from all other Indo-European velar series before front vowels, but they merge with the "pure" velars elsewhere. The palatal velar series, consisting of Proto-Indo-European *ḱ and the merged *ǵ and *ǵʰ, usually developed into th and dh, but were depalatalized to merge with the back velars when in contact with sonorants. Because the original Proto-Indo-European tripartite distinction between dorsals is preserved in such reflexes, Albanian is neither centum nor satem, despite having a "satem-like" realization of the palatal dorsals in most cases. Thus PIE *ḱ, *k, and *kʷ become th, q, and s, respectively.
A minority of scholars reconstruct a fourth laryngeal *h₄ allegedly surfacing as Alb. h word-initially, e.g. Alb. herdhe 'testicles' presumably from PIE *h₄órǵʰi-, but this reconstruction is generally not followed, as h- has arisen elsewhere idiosyncratically.
PIE | Albanian | PIE example | Albanian example
*p | p | *pékʷ- 'to cook' | pjek 'to bake'
*bʰ / *b | b | *srobʰ-éi̯e- 'to sip, gulp' | gjerb 'to sip'

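PIE | Albanian | PIE example | Albanian example
*t | t | *túh₂ 'thou' | ti 'you (singular)'
*d | d | *dih₂tis 'light' | ditë 'day'
*d | dh | *pérd- 'to fart' | pjerdh 'to fart'
*d | g | *dl̥h₁-tó- 'long' | gjatë 'long'
*dʰ | d | *dʰégʷʰ- 'burn' | djeg 'to burn'
*dʰ | dh | *gʰórdʰos 'enclosure' | gardh 'fence'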

PIE | Albanian | PIE example | Albanian example
*ḱ | th | *ḱéh₁smi 'I say' | them 'I say'
*ḱ | s | *ḱupo- 'shoulder' | sup 'shoulder'
*ḱ | k | *smeḱ-r̥ 'chin' | mjekër 'chin; beard'
*ḱ | ç/c | *ḱentro- 'to stick' | çandër 'prop'
*ǵ | dh | *ǵómbʰos 'tooth, peg' | dhëmb 'tooth'
*ǵʰ | dh | *ǵʰed-ioH 'I defecate' | dhjes 'I defecate'
*ǵʰ | d | *ǵʰr̥sdʰi 'grain, barley' | drithë 'grain'

PIE | Albanian | PIE example | Albanian example
*k | k | *kágʰmi 'I catch, grasp' | kam 'I have'
*k | q | *kluH-i̯o- 'to weep' | qaj 'to weep, cry'
*g | g | *h₃lígos 'sick' | ligë 'bad'
*g | gj | *h₁reug- 'to retch' | regj 'to tan hides'
*gʰ | g | *gʰórdʰos 'enclosure' | gardh 'fence'
*gʰ | gj | *gʰédn-i̯e/o- 'to get' | gjej 'to find'

PIE | Albanian | PIE example | Albanian example
*kʷ | k | *kʷeh₂sleh₂ 'cough' | kollë 'cough'
*kʷ | s | *kʷélH- 'to turn' | sjell 'to fetch, bring'
*kʷ | q | *kʷṓd | që 'that, which'
*gʷ | g | *gʷr̥H 'stone' | gur 'stone'
*gʷ | z | *gʷréh₂us 'heavy' | zor 'hard, difficult'
*gʷʰ | g | *dʰégʷʰ- 'to burn' | djeg 'to burn'
*gʷʰ | z | *dʰogʷʰéi̯e- 'to ignite' | ndez 'to kindle, light a fire'

PIE | Albanian | PIE example | Albanian example
*s | gj | *séḱstis 'six' | gjashtë 'six'
*s | h | *nosōm 'us' | nahe 'us'
*s | sh | *bʰreusos 'broken' | breshër 'hail'
*s | th | *suh₁s 'swine' | thi 'pig'
*s | ∅ | *h₁ésmi 'I am' | jam 'I am'
*-sd- | th | *gʷésdos 'leaf' | gjeth 'leaf'
*-sḱ- | h | *sḱi-eh₂ 'shadow' | hije 'shadow'
*-sp- | f | *spélnom 'speech' | fjalë 'word'
*-st- | sht | *h₂osti 'bone' | asht 'bone'
*-su̯- | d | *su̯eíd-r̥- 'sweat' | dirsë 'sweat'

PIE | Albanian | PIE example | Albanian example
*i̯ | gj | *i̯éh₃s- 'to gird' | gjesh 'I gird; squeeze, knead'
*i̯ | j | *i̯uH 'you' | ju 'you (plural)'
*i̯ | ∅ | *trei̯es 'three' | tre 'three'
*u̯ | v | *u̯os-éi̯e- 'to dress' | vesh 'to wear, dress'
*m | m | *meh₂tr-eh₂ 'maternal' | motër 'sister'
*n | n | *nōs 'we' | ne 'we'
*n | nj | *eni-h₁ói-no 'that one' | një 'one'
*n | ∅ ~ nasal vowel | *pénkʷe 'five' | pesë 'five'
*n | r | *ǵʰeimen 'winter' | dimër 'winter'
*l | l | *h₃lígos 'sick' | ligë 'bad'
*l | ll | *kʷélH- 'turn' | sjell 'to fetch, bring'
*r | r | *repe/o 'take' | rjep 'peel'
*r | rr | *u̯rh₁ḗn 'sheep' | rrunjë 'yearling lamb'
*n̥ | e | *h₁n̥men 'name' | emër 'name'
*m̥ | e | *u̯iḱm̥ti 'twenty' | zet 'twenty'
*l̥ | li, il / lu, ul | *u̯ĺ̥kʷos 'wolf' | ujk 'wolf'
*r̥ | ri, ir / ru, ur | *ǵʰr̥sdom 'grain, barley' | drithë 'grain'

PIE | Albanian | PIE example | Albanian example
*h₁ | ∅ | *h₁ésmi 'I am' | jam 'I am'
*h₂ | ∅ | *h₂r̥tḱos 'bear' | ari 'bear'
*h₃ | ∅ | *h₃ónr̥ 'dream' | ëndërr 'dream'

PIE | Albanian | PIE example | Albanian example
*i | i | *sínos 'bosom' | gji 'bosom, breast'
*i | e | *dwigʰeh₂ 'twig' | degë 'branch'
*ī < *iH | i | *dih₂tis 'light' | ditë 'day'
*e | e | *pénkʷe 'five' | pesë 'five'
*e | je | *wétos 'year' | vjet 'last year'
*ē | o | *ǵʰēsreh₂ 'hand' | dorë 'hand'
*a | a | *bʰaḱeh₂ 'bean' | bathë 'bean'
*a | e | *h₂élbʰit 'barley' | elb 'barley'
*o | a | *gʰórdʰos 'enclosure' | gardh 'fence'
*ō | e | *h₂oḱtōtis 'eight' | tetë 'eight'
*u | u | *súpnom 'sleep' | gjumë 'sleep'
*ū < *uH | y | *suHsos 'grandfather' | gjysh 'grandfather'
*ū < *uH | i | *muh₂s 'mouse' | mi 'mouse'

Standard Albanian

Since World War II, standard Albanian used in Albania has been based on the Tosk dialect. Kosovo and other areas where Albanian is official adopted the Tosk standard in 1969.

Elbasan-based standard

Until the early 20th century, Albanian writing developed in three main literary traditions: Gheg, Tosk, and Arbëreshë. Throughout this time, an intermediate subdialect spoken around Elbasan served as lingua franca among the Albanians, but was less prevalent in writing. The Congress of Manastir of Albanian writers held in 1908 recommended the use of the Elbasan subdialect for literary purposes and as a basis of a unified national language. While technically classified as a southern Gheg variety, the Elbasan speech is closer to Tosk in phonology and practically a hybrid between other Gheg subdialects and literary Tosk.
Between 1916 and 1918, the Albanian Literary Commission met in Shkodër under the leadership of Luigj Gurakuqi with the purpose of establishing a unified orthography for the language. The commission, made up of representatives from the north and south of Albania, reaffirmed the Elbasan subdialect as the basis of a national tongue. The rules published in 1917 defined spelling for the Elbasan variety for official purposes. The Commission did not, however, discourage publications in one of the dialects, but rather laid a foundation for Gheg and Tosk to gradually converge into one.
When the Congress of Lushnje met in the aftermath of World War I to form a new Albanian government, the 1917 decisions of the Literary Commission were upheld. The Elbasan subdialect remained in use for administrative purposes, and many new writers embraced it for creative writing. Gheg and Tosk continued to develop freely and interaction between the two dialects increased.

Tosk standard

At the end of World War II, however, the new communist regime radically imposed the use of the Tosk dialect in all facets of life in Albania: administration, education, and literature. Most Communist leaders were Tosks from the south. Standardization was directed by the Albanian Institute of Linguistics and Literature of the Academy of Sciences of Albania. Two dictionaries were published in 1954: an Albanian language dictionary and a Russian–Albanian dictionary. New orthography rules were eventually published in 1967 and 1973 (Drejtshkrimi i gjuhës shqipe).
Until 1968, Kosovo and other Albanian-speaking areas in the former Yugoslavia followed the 1917 standard based on the Elbasan dialect, though it was gradually infused with Gheg elements in an effort to develop a Kosovan language separate from communist Albania's Tosk-based standard. Albanian intellectuals in the former Yugoslavia consolidated the 1917 standard twice in the 1950s, culminating in a thorough codification of orthographic rules in 1964. The rules already provided for a balanced variety that accounted for both Gheg and Tosk dialects, but lasted only through 1968. Viewing divergences with Albania as a threat to their identity, Kosovars arbitrarily adopted the Tosk project that Tirana had published the year before. Although it was never intended to serve outside of Albania, the project became the "unified literary language" in 1972, when it was approved by a rubberstamp Orthography Congress. Only about 1 in 9 participants were from Kosovo. The Congress, held at Tirana, authorized the orthography rules that came out the following year, in 1973.
More recent dictionaries from the Albanian government are Fjalori Drejtshkrimor i Gjuhës Shqipe and Dictionary of Today's Albanian language. Prior to World War II, dictionaries consulted by developers of the standard included Lexikon tis Alvanikis glossis, Fjalori i Bashkimit, and Fjalori i Gazullit.

Calls for reform

Since the fall of the communist regime, Albanian orthography has stirred heated debate among scholars, writers, and public opinion in Albania and Kosovo, with hardliners opposed to any changes in the orthography, moderates supporting varying degrees of reform, and radicals calling for a return to the Elbasan dialect. Criticism of Standard Albanian has centred on the exclusion of the 'me+' infinitive and the Gheg lexicon. Critics say that Standard Albanian disenfranchises and stigmatizes Gheg speakers, affecting the quality of writing and impairing effective public communication. Supporters of the Tosk standard view the 1972 Congress as a milestone achievement in Albanian history and dismiss calls for reform as efforts to "divide the nation" or "create two languages." Moderates, who are especially prevalent in Kosovo, generally stress the need for a unified Albanian language, but believe that the 'me+' infinitive and Gheg words should be included. Proponents of the Elbasan dialect have been vocal, but have gathered little support in the public opinion. In general, those involved in the language debate come from diverse backgrounds and there is no significant correlation between one's political views, geographic origin, and position on Standard Albanian.
Many writers continue to write in the Elbasan dialect but other Gheg variants have found much more limited use in literature. Most publications adhere to a strict policy of not accepting submissions that are not written in Tosk. Some print media even translate direct speech, replacing the 'me+' infinitive with other verb forms and making other changes in grammar and word choice. Even authors who have published in the Elbasan dialect will frequently write in the Tosk standard.
In 2013, a group of academics from Albania and Kosovo proposed minor changes to the orthography. Hardline academics boycotted the initiative, while other reformers have viewed it as well-intentioned but flawed and superficial. Media such as Rrokum and Java have offered content that is almost exclusively in the Elbasan dialect. Meanwhile, author and linguist Agim Morina has promoted Shqipe e Përbashkët or Common Albanian, a neostandard or reformed version of the Tosk standard that aims to reflect the natural development of the language among all Albanians. Common Albanian incorporates the 'me+' infinitive, accommodates Gheg features, and provides for dialect-neutral rules that favor simplicity, predictability, and usage trends. Many modern writers have embraced Common Albanian to various extents, especially in less formal writing.


Albanian is the medium of instruction in most Albanian schools. The literacy rate in Albania for the total population, age 9 or older, is about 99%. Elementary education is compulsory, but most students continue at least until a secondary education. Students must pass graduation exams at the end of the 9th grade and at the end of the 12th grade in order to continue their education.


Standard Albanian has 7 vowels and 29 consonants. Like English, Albanian has the dental fricatives /θ/ and /ð/, written as th and dh, which are rare cross-linguistically.
Gheg uses long and nasal vowels, which are absent in Tosk, and the mid-central vowel ë is lost at the end of the word. The stress is fixed mainly on the last syllable. Gheg n changes to r by rhotacism in Tosk.


Description | Written as | English approximation
Bilabial nasal | m | man
Alveolar nasal | n | not
Palatal nasal | nj | ~onion
Velar nasal | ng | bang
Voiceless bilabial plosive | p | spin
Voiced bilabial plosive | b | bat
Voiceless alveolar plosive | t | stand
Voiced alveolar plosive | d | debt
Voiceless velar plosive | k | scar
Voiced velar plosive | g | go
Voiceless alveolar affricate | c | hats
Voiced alveolar affricate | x | goods
Voiceless postalveolar affricate | ç | chin
Voiced postalveolar affricate | xh | jet
Voiceless palatal affricate | q | ~china
Voiced palatal affricate | gj | ~gem
Voiceless labiodental fricative | f | far
Voiced labiodental fricative | v | van
Voiceless dental fricative | th | thin
Voiced dental fricative | dh | then
Voiceless alveolar fricative | s | son
Voiced alveolar fricative | z | zip
Voiceless postalveolar fricative | sh | show
Voiced postalveolar fricative | zh | vision
Voiceless glottal fricative | h | hat
Alveolar trill | rr | Spanish perro
Alveolar tap | r | Spanish pero
Alveolar lateral approximant | l | lean
Velarized alveolar lateral approximant | ll | ball
Palatal approximant | j | yes

Description | Written as | English approximation
Close front unrounded vowel | i | seed
Open-mid front unrounded vowel | e | bed
Open central unrounded vowel | a | cow
Schwa | ë | about, the
Open-mid back rounded vowel | o | law
Close front rounded vowel | y | French tu, German Lüge
Close back rounded vowel | u | boot


Although the Indo-European schwa was preserved in Albanian, in some cases it was lost, possibly when a stressed syllable preceded it. Until the standardization of the modern Albanian alphabet, in which the schwa is spelled as ë, as in the work of Gjon Buzuku in the 16th century, various vowels and gliding vowels were employed, including ae by Lekë Matrënga and é by Pjetër Bogdani in the late 16th and early 17th century. The schwa in Albanian has a great degree of variability from extreme back to extreme front articulation. Within the borders of Albania, the phoneme is pronounced about the same in both the Tosk and the Gheg dialect due to the influence of standard Albanian. However, in the Gheg dialects spoken in the neighbouring Albanian-speaking areas of Kosovo and North Macedonia, the phoneme is still pronounced as back and rounded.


Albanian has a canonical word order of SVO like English and many other Indo-European languages. Albanian nouns are categorized by gender and inflected for number and case. There are five declensions and six cases, although the vocative only occurs with a limited number of words, and the forms of the genitive and dative are identical. Some dialects also retain a locative case, which is not present in standard Albanian. The cases apply to both definite and indefinite nouns, and there are numerous cases of syncretism.
The following shows the declension of mal (mountain), a masculine noun which takes "i" in the definite singular:
Case | Indefinite singular | Indefinite plural | Definite singular | Definite plural
Nominative | një mal | male | mali | malet
Accusative | një mal | male | malin | malet
Genitive | i/e/të/së një mali | i/e/të/së maleve | i/e/të/së malit | i/e/të/së maleve
Dative | një mali | maleve | malit | maleve
Ablative | një mali | malesh | malit | maleve

The following shows the declension of zog (bird), a masculine noun which takes "u" in the definite singular:
Case | Indefinite singular | Indefinite plural | Definite singular | Definite plural
Nominative | një zog | zogj | zogu | zogjtë
Accusative | një zog | zogj | zogun | zogjtë
Genitive | i/e/të/së një zogu | i/e/të/së zogjve | i/e/të/së zogut | i/e/të/së zogjve
Dative | një zogu | zogjve | zogut | zogjve
Ablative | një zogu | zogjsh | zogut | zogjve

The following table shows the declension of the feminine noun vajzë (girl):
Case | Indefinite singular | Indefinite plural | Definite singular | Definite plural
Nominative | një vajzë | vajza | vajza | vajzat
Accusative | një vajzë | vajza | vajzën | vajzat
Genitive | i/e/të/së një vajze | i/e/të/së vajzave | i/e/të/së vajzës | i/e/të/së vajzave
Dative | një vajze | vajzave | vajzës | vajzave
Ablative | një vajze | vajzash | vajzës | vajzave

The definite article is placed after the noun, as in several other Balkan languages such as Romanian, Macedonian and Bulgarian.
Albanian has developed an analytical verbal structure in place of the earlier synthetic system, inherited from Proto-Indo-European. Its complex system of moods and tenses is distinctive among Balkan languages. There are two general types of conjugations.
Albanian verbs, like those of other Balkan languages, have an "admirative" mood that is used to indicate surprise on the part of the speaker or to imply that an event is known to the speaker by report and not by direct observation. In some contexts, this mood can be translated using English "apparently".
For more information on verb conjugation and on inflection of other parts of speech, see Albanian morphology.

Word order

Albanian word order is relatively free. A sentence such as 'Agim ate all the oranges' can be expressed in Albanian with the subject, verb and object arranged in several different orders, with slight pragmatic differences.
However, the most common order is subject–verb–object.
The verb can optionally occur in sentence-initial position, especially with verbs in the non-active form.
Verbal negation in Albanian is mood-dependent, a trait shared with some fellow Indo-European languages such as Greek.
In indicative, conditional, or admirative sentences, negation is expressed by the particles nuk or s' placed in front of the verb.
Subjunctive, imperative, optative, or non-finite forms of verbs are negated with the particle mos.
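
The mood-dependent choice of negation particle described above amounts to a simple lookup. The following Python snippet is an illustrative sketch only: the function name negation_particles and the lowercase mood labels are assumptions made for this example, and the function covers just the moods named in the text.

```python
# Moods grouped by the negation particle they take, as listed in the text.
NUK_MOODS = {"indicative", "conditional", "admirative"}
MOS_MOODS = {"subjunctive", "imperative", "optative", "non-finite"}


def negation_particles(mood):
    """Return the negation particle(s) used with the given verbal mood."""
    key = mood.strip().lower()
    if key in NUK_MOODS:
        # Either particle may be placed in front of the verb.
        return ("nuk", "s'")
    if key in MOS_MOODS:
        return ("mos",)
    raise ValueError(f"unknown mood: {mood!r}")


print(negation_particles("admirative"))
print(negation_particles("imperative"))
```

A tuple is returned in both branches so that callers can treat the two-particle and one-particle cases uniformly.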

Literary tradition

Earliest undisputed texts

The earliest known texts in Albanian:
Albanian texts were written earlier than the first attested document, the "formula e pagëzimit", but none have yet been discovered. Their existence is known from earlier references. For example, a French monk who signed as "Broccardus" noted in 1332 that "Although the Albanians have another language totally different from Latin, they still use Latin letters in all their books".

Disputed earlier texts

In 1967 two scholars claimed to have found a brief text in Albanian inserted into the Bellifortis text, a book written in Latin dating to 1402–1405.
Dr. Robert Elsie, a specialist in Albanian studies, considers that "The Todericiu/Polena Romanian translation of the non-Latin lines, although it may offer some clues if the text is indeed Albanian, is fanciful and based, among other things, on a false reading of the manuscript, including the exclusion of a whole line."

Ottoman period

In 1635, Frang Bardhi published his Dictionarium latino-epiroticum, the first known Latin-Albanian dictionary, in Rome. Other scholars who studied the language during the 17th century include Andrea Bogdani (author of the first Latin-Albanian grammar book), Nilo Katalanos and others.


Albanian is known within historical linguistics as a case of a language which, although surviving through many periods of foreign rule and multilingualism, saw a "disproportionately high" influx of loans from other languages, which augmented and replaced much of its original vocabulary. Some scholars suggest that Albanian has lost more than 90% of its original vocabulary in favour of Latin, Greek, Slavic, Italian and Turkish loanwords, but according to other scholars this percentage is definitely overstated. Of all the foreign influences on Albanian, the deepest-reaching and most impactful was the absorption of loans from Latin in the Classical period and from its Romance successors afterward. Over 60% of Albanian vocabulary consists of Latin roots, which once led to Albanian being mistakenly identified as a Romance language.
Major work in reconstructing Proto-Albanian has been done with the help of knowledge of the original forms of loans from Ancient Greek, Latin and Slavic. Although Ancient Greek loanwords are scarce, the Latin loanwords are of extreme importance for phonology. The presence of loanwords from better-studied languages, dating from periods before Albanian was attested and reaching back into the Classical Era, has been of great use in phonological reconstructions of earlier ancient and medieval forms of Albanian. Some words in the core vocabulary of Albanian have no known etymology linking them to Proto-Indo-European or any known source language, and as of 2018 are thus tentatively attributed to an unknown, unattested, pre-Indo-European substrate language; these include zemër 'heart' and hekur 'iron'. Some of these putative pre-IE words are thought to be related to putative pre-IE substrate words in neighboring Indo-European languages, such as lule 'flower', which has been tentatively linked to Latin lilia and Greek leirion.
In a lexicostatistical analysis by the Ukrainian linguist Tyshchenko, the lexical distance of Albanian from other languages is given as: 49% Slovenian, 53% Romanian, 56% Greek, 82% French, 86% Macedonian, 86% Bulgarian.

Cognates with Illyrian

The earliest loanwords attested in Albanian come from Doric Greek, whereas the strongest influence came from Latin. According to Matthew C. Curtis, the loanwords do not necessarily indicate the geographical location of the ancestor of the Albanian language. However, according to other linguists, the borrowed words can help to give an idea of the place of origin and the evolution of the Albanian language. Some scholars argue that Albanian originated in an area located east of its present geographic spread, because of the several lexical items shared by Albanian and Romanian. However, this does not necessarily define the genealogical history of the Albanian language, and it does not exclude the possibility of a Proto-Albanian presence in both Illyrian and Thracian territory.
The period during which Proto-Albanian and Latin interacted was protracted, lasting from the 2nd century BC to the 5th century AD. Over this period, the lexical borrowings can be roughly divided into three layers, the second of which is the largest. The first and smallest dates from a time of less significant interaction. The final layer, probably preceding the Slavic or Germanic invasions, also has a notably smaller number of borrowings. Each layer is characterized by a different treatment of most vowels: the first layer follows the evolution of Early Proto-Albanian into Albanian, while later layers reflect vowel changes endemic to Late Latin. Other formative changes include the syncretism of several noun case endings, especially in the plural, as well as a large-scale palatalization.
A brief period followed, between the 7th and the 9th centuries, that was marked by heavy borrowing from Southern Slavic, with some loans predating the "o-a" shift common to the modern forms of this language group. Starting in the later 9th century, there was a period of protracted contact with the Proto-Romanians, or Vlachs, though lexical borrowing seems to have been mostly one-sided: from Albanian into Romanian. Such borrowing indicates that the Romanians migrated from an area where the majority was Slavic to an area with a majority of Albanian speakers. Their movement is presumably related to the expansion of the Bulgarian Empire into Albania around that time.

Early Greek loans

There are some 30 Ancient Greek loanwords in Albanian. Many of these reflect a dialect which voiced its aspirates, as did the Macedonian dialect. Other loanwords are Doric; these words mainly refer to commodity items and trade goods and probably came through trade with a now-extinct intermediary.
In total, Latin roots make up over 60% of the Albanian lexicon. They include many frequently used core vocabulary items, including shumë, pak, ngushtë, pemë, vij, rërë, drejt, kafshë, and larg.
Jernej Kopitar was the first to note Latin's influence on Albanian and claimed "the Latin loanwords in the Albanian language had the pronunciation of the time of Emperor Augustus". Kopitar gave examples such as Albanian qiqer 'chickpea' from Latin cicer, qytet 'city, town' from civitas, peshk 'fish' from piscis, and shigjetë 'arrow' from sagitta. The hard pronunciations of Latin ⟨c⟩ and ⟨g⟩ are retained as palatal and velar stops in the Albanian loanwords. Gustav Meyer and Wilhelm Meyer-Lübke later corroborated this. Meyer noted the similarity between the Albanian verbs shqipoj "to speak clearly, enunciate" and shqiptoj "to pronounce, articulate" and the Latin word excipio. Therefore, he believed that the word Shqiptar "Albanian person" was derived from shqipoj, which in turn was derived from the Latin word excipere. Johann Georg von Hahn, an Austrian linguist, had proposed the same hypothesis in 1854.
Eqrem Çabej also noticed, among other things, the archaic Latin elements in Albanian:
  1. Latin /au/ becomes Albanian /a/ in the earliest loanwords: aurum → ar 'gold'; gaudium → gaz 'joy'; laurus → lar 'laurel'. Latin /au/ is retained in later loans, but is altered in a way similar to Greek: causa 'thing' → kafshë 'thing; beast, brute'; laud → lavd.
  2. Latin /oː/ becomes Albanian /e/ in the oldest Latin loans: pōmus → pemë 'fruit tree'; hōra → ora 'hour'. An analogous mutation occurred from Proto-Indo-European to Albanian; PIE *nōs became Albanian ne 'we', PIE *oḱtō + suffix -ti- became Albanian tetë 'eight', etc.
  3. Latin unstressed internal and initial syllables become lost in Albanian: cubitus → kub 'elbow'; medicus → mjek 'physician'; palūdem 'swamp' → VL padūle → pyll 'forest'. An analogous mutation occurred from Proto-Indo-European to Albanian. In contrast, in later Latin loanwords, the internal syllable is retained: paganus → pagan; plaga → plagë 'wound', etc.
  4. Latin /tj/, /dj/, /kj/ palatalized to Albanian /s/, /z/, /c/: vitium → ves 'vice; worries'; rationem → arsye 'reason'; radius → rreze 'ray; spoke'; facies → faqe 'face, cheek'; socius → shok 'mate, comrade', shoq 'husband', etc. In turn, Latin /s/ was altered to /ʃ/ in Albanian.
Haralambie Mihăescu demonstrated that:
Other authors have detected Latin loanwords in Albanian with an ancient sound pattern from the 1st century BC, for example, Albanian qingël 'saddle girth; dwarf elder' from Latin cingula and Albanian e vjetër 'old, aged; former' from vjet but influenced by Latin veteris. The Romance languages inherited these words from Vulgar Latin: cingula became Romanian chinga 'girdle; saddle girth', and Vulgar Latin veterānus became Romanian bătrân 'old'.
Albanian, Basque, and the surviving Celtic languages such as Breton and Welsh are the non-Romance languages today that have this sort of extensive Latin element dating from ancient Roman times, which has undergone the sound changes associated with the languages. Other languages in or near the former Roman area either came on the scene later or borrowed little from Latin despite coexisting with it, although German does have a few such ancient Latin loanwords.
Romanian scholars such as Vătășescu and Mihăescu, using lexical analysis of the Albanian language, have concluded that Albanian was heavily influenced by an extinct Romance language that was distinct from both Romanian and Dalmatian. Because the Latin words common to only Romanian and Albanian are significantly fewer in number than those common to only Albanian and Western Romance, Mihăescu argues that the Albanian language evolved in a region with much greater contact with Western Romance regions than with Romanian-speaking regions. He located this region in present-day Albania, Kosovo and Western Macedonia, spanning east to Bitola and Pristina.

Gothic loans

Some Gothic loanwords were borrowed through Late Latin, while others came from the Ostrogothic expansion into parts of Praevalitana around Nikšić and the Gulf of Kotor in Montenegro.
It is assumed that Greek and Balkan Latin exerted a great influence on Albanian. Examples of words borrowed from Latin: qytet < civitas, qiell < caelum, mik < amicus, kape ditën < carpe diem.
After the Slavs arrived in the Balkans, the Slavic languages became an additional source of loanwords. The rise of the Ottoman Empire meant an influx of Turkish words; this also entailed the borrowing of Persian and Arabic words through Turkish. Some Turkish personal names, such as Altin, are common. There are some loanwords from Modern Greek, especially in the south of Albania. Many borrowed words have been replaced by words with Albanian roots or modern Latinized words.

Patterns in loaning

Although Albanian is characterized by the absorption of many loans, which in the case of Latin reach deep into the core vocabulary, certain semantic fields nevertheless remained more resistant. Terms pertaining to social organization are often preserved, though not those pertaining to political organization, while those pertaining to trade are all loaned or innovated.
Hydronyms present a complicated picture; the term for "sea" is native and an "Albano-Germanic" innovation referring to the concept of depth, but a large share of maritime vocabulary is loaned. Words referring to large streams and their banks tend to be loans, but lumë 'river' is native, as is rrymë 'current'. Words for smaller streams and stagnant pools of water are more often native, but the word for "pond", pellg, is in fact a semantically shifted descendant of the old Greek word for "high sea", suggesting a change in location after Greek contact. Albanian has maintained since Proto-Indo-European a specific term referring to a riverside forest, as well as its words for marshes. Curiously, Albanian has maintained native terms for "whirlpool", "water pit" and "deep place", leading Orel to speculate that the Albanian Urheimat likely had an excess of dangerous whirlpools and depths.
Regarding forests, words for most conifers and shrubs are native, as are the terms for "alder", "elm", "oak", "beech", and "linden", while "ash", "chestnut", "birch", "maple", "poplar", and "willow" are loans.
The original kinship terminology of Indo-European was radically reshaped; changes included a shift from "mother" to "sister", and were so thorough that only three terms retained their original function, the words for "son-in-law", "mother-in-law" and "father-in-law". All the words for second-degree blood kinship, including "aunt", "uncle", "nephew", "niece", and terms for grandchildren, are ancient loans from Latin.
The Proto-Albanians appear to have been cattle breeders given the vastness of preserved native vocabulary pertaining to cow breeding, milking and so forth, while words pertaining to dogs tend to be loaned. Many words concerning horses are preserved, but the word for horse itself is a Latin loan.