Haplogroup L-M20


Haplogroup L-M20 is a human Y-DNA haplogroup, which is defined by SNPs M11, M20, M61 and M185. As a secondary descendant of haplogroup K and a primary branch of haplogroup LT, haplogroup L currently has the alternative phylogenetic name of K1a, and is a sibling of haplogroup T.
The presence of L-M20 has been observed at varying levels throughout South Asia, peaking in populations native to the southern Pakistani province of Balochistan, Northern Afghanistan, and Southern India. The clade also occurs in Tajikistan and Anatolia, as well as at lower frequencies in Iran. It has also been present for millennia at very low levels in the Caucasus, Europe and Central Asia. The subclade L2 has been found in Europe and Western Asia, but is extremely rare.

Phylogenetic tree

There are several confirmed and proposed phylogenetic trees available for haplogroup L-M20. The scientifically accepted tree is the one produced by the Y-Chromosome Consortium (YCC), published in Karafet 2008 and subsequently updated. A draft tree that shows emerging science is provided by Thomas Krahn at the Genomic Research Center in Houston, Texas. The International Society of Genetic Genealogy (ISOGG) also provides an amateur tree.
The following is the draft tree for haplogroup L-M20 proposed by Thomas Krahn at the Genomic Research Center:
  • L-M20 M11, M20, M61, M185, L656, L863, L878, L879
  • * L-M22 M22, M295, PAGES00121
  • ** L-M317 M317, L655
  • *** L-L656 L656
  • **** L-M349 M349
  • *** L-M274 M274
  • *** L-L1310 L1310
  • *** L-SK1412
  • ** L-L1304 L1304
  • *** L-M27 M27, M76, P329.1, L1318, L1319, L1320, L1321
  • *** L-M357 M357, L1307
  • **** L-PK3 PK3
  • **** L-L1305 L1305, L1306, L1307
  • * L-L595 L595
  • ** L-L864 L864, L865, L866, L867, L868, L869, L870, L877
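
The nesting in the draft tree above can be read as a parent-child relation between clades. The following is a minimal sketch, assuming the asterisk depth in the list marks each clade's level; the names `PARENT`, `ROOT` and `lineage` are illustrative and do not come from any published tool.

```python
# Parent of each clade, transcribed from the nesting of the draft tree above.
# Only clade names are kept; the defining SNP lists are omitted for brevity.
ROOT = "L-M20"
PARENT = {
    "L-M22": "L-M20",
    "L-M317": "L-M22",
    "L-L656": "L-M317",
    "L-M349": "L-L656",
    "L-M274": "L-M317",
    "L-L1310": "L-M317",
    "L-SK1412": "L-M317",
    "L-L1304": "L-M22",
    "L-M27": "L-L1304",
    "L-M357": "L-L1304",
    "L-PK3": "L-M357",
    "L-L1305": "L-M357",
    "L-L595": "L-M20",
    "L-L864": "L-L595",
}


def lineage(clade: str) -> list[str]:
    """Return the path from the root (L-M20) down to the given clade."""
    if clade != ROOT and clade not in PARENT:
        raise KeyError(f"unknown clade: {clade}")
    path = [clade]
    # Walk upwards through the parents until the root is reached.
    while path[-1] != ROOT:
        path.append(PARENT[path[-1]])
    return list(reversed(path))


if __name__ == "__main__":
    print(" > ".join(lineage("L-PK3")))
```

Storing only the parent of each clade keeps the transcription close to the list itself, and any upstream path can be recovered by walking towards the root.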

Origins

L-M20 is a descendant of haplogroup LT, which is a descendant of haplogroup K-M9. According to Dr. Spencer Wells, L-M20 originated in the Eurasian K-M9 clan that migrated eastwards from the Middle East, and later southwards from the Pamir Knot into present-day Pakistan and India. These people arrived in the Indian subcontinent approximately 30,000 years ago. Hence, it is hypothesized that the first bearer of the M20 marker was born either in the subcontinent or in the Middle East. Other studies have proposed either a southern Iranian or a South Asian origin for L-M20 and associated its expansion in the Indus valley with Neolithic farmers. Genetic studies suggest that L-M20 may be one of the haplogroups of the original creators of the Indus Valley Civilisation. Time estimates based on seven Y-STR loci within L-M20 lineages for north and south Afghanistan populations are intermediate between those of Pakistan and India. Furthermore, Pakistan displays higher haplotype variance than India, suggesting that L-M20 most likely originated in what is today Pakistan and then spread into southern India.
McElreavy and Quintana-Murci, writing on the Indus Valley Civilisation, state that
Sengupta et al. observed three subbranches of haplogroup L: L1-M76, L2-M317 and L3-M357, with distinctive geographic affiliations. Almost all Indian members of haplogroup L are L1 derived, with L3-M357 occurring only sporadically. Conversely, in Pakistan, the L3-M357 subclade accounts for 86% of L-M20 chromosomes and reaches an intermediate frequency of 6.8% overall. L1-M76 occurs at a frequency of 7.5% in India and 5.1% in Pakistan, exhibiting peak variance distribution in the Maharashtra region in coastal western India.

Proposed Elamo-Dravidian connection

The contentious Elamo-Dravidian hypothesis posits that the proto–Elamo-Dravidian language originated and spread from ancient Elam into the Indian subcontinent with agriculture, and that the Harappan language of the Indus Valley Civilisation was a related one. Some genetic studies propose a link between the migratory pattern of lineages derived from subclade L1-M22 and the spread of Elamo-Dravidian languages. According to Palanichamy et al., the presence of several haplogroup L subclades among Iranian populations and mostly L1a among Dravidian peoples of South India, while being rare among Indo-Aryan speakers, along with the coalescence time, indicate that L1a arrived from Iran through Neolithic farmers, and was probably responsible for the spread of the Dravidian language to India.
According to Pathak et al., subclade L1-M22 originated in West Asia about 20,600 ybp. Based on findings from recent genetic research, this study emphasizes that the Iranian element within the Indus valley population came from Caucasus/Iranian hunter-gatherers (CIHG), who were related to, yet different from, Iranian farmers. The study supports the Elamo-Dravidian hypothesis by connecting all L1a lineages to the original CIHG population, whose descendants settled in both Elam and the Indus valley. The study puts forward the view that L1a descendants who remained in West Asia possibly participated in the development of the Elamite language, while those who moved from the Iranian plateau to the Indus valley after 8000 ybp introduced Dravidian languages to South Asia. In the Indus valley, people of CIHG ancestry mixed with those of Ancient Ancestral South Indian (AASI) ancestry to form the Indus Periphery Cline (IPC) lineage that typically represented the Indus Valley Civilisation (IVC) population. It is suggested that after the collapse of the Indus Valley Civilisation, the IPC groups moved eastwards and southwards and mixed with the pre-existing AASI-heavy populations, giving rise to the Ancestral South Indians, an ancestry which correlates strongly with the spread of Dravidian languages.

Geographical distribution

In India, L-M20 has a higher frequency among Dravidian castes, but is somewhat rarer in Indo-Aryan castes. In Pakistan, it has a frequency of about 28% in the southern regions including southern Baluchistan, from where the agricultural creators of the Indus valley civilization emerged.
Preliminary evidence gleaned from non-scientific sources, such as individuals who have had their Y-chromosomes tested by commercial labs, suggests that most European examples of Haplogroup L-M20 might belong to the subclade L2-M317, which is, among South Asian populations, generally the rarest of the subclades of Haplogroup L.

South Asia

India

L-M20 has a higher frequency among Dravidian castes but is somewhat rarer in Indo-Aryan castes. The presence of haplogroup L-M20 is quite rare among tribal groups., Ror and Kamboj. L2a2 is found at around 62.7% among the Brokpa of Ladakh. With a frequency of 54.9%, L-M20 constitutes a major lineage among Indian Parsi priests. It reaches an overall frequency of 21% among Indian Parsis in general. L-M20 was found at 38% in the Bharwad caste and 21% in the Charan caste from Junagarh district in Gujarat. It has also been reported at 17% in the Kare Vokkal tribe from Uttara Kannada in Karnataka. It is also found at low frequencies in other populations from Junagarh district and Uttara Kannada. L-M20 is the single largest male lineage among the Jat people of Northern India and is found at 16.33% among the Gujars of Jammu and Kashmir. It also occurs at 18.6% among the Konkanastha Brahmins of the Konkan region and at 15% among the Marathas of Maharashtra. L-M20 is also found at 32.35% in the Vokkaligas and at 17.82% in the Lingayats of Karnataka.
Available data show that among Tamils, L-M20 is found at 48% among Kallar, 28.57% among Vanniyars, 26% among the Saurashtra people, 25.47% among the Nadars, 20.7% among the Ambalakarar, 20.56% among Tamil Yadavas, 17.2% among the Iyer and 16.7% among the Iyengar castes of Tamil Nadu. L-M11 is found at frequencies of 8-16% among Indian Jews. L-M20 has an overall frequency of 12% in Punjab. L-M11 has also been reported in 2% of Siddis. Haplogroup L-M20 is currently present in the Indian population at an overall frequency of ca. 7-15%.

Pakistan

The greatest concentration of haplogroup L-M20 is along the Indus River in Pakistan, where the Indus Valley Civilisation flourished during 3300–1300 BCE, with its mature period between 2600 and 1900 BCE. L-M357 reaches its highest frequency and diversity in Balochistan province, at 28%, with a moderate distribution among the general Pakistani population at 11.6%. It is also found among their ethnic counterparts in Afghanistan, such as the Pashtuns and Balochis. L-M357 is found frequently among Burusho and Pashtuns.
L1a and L1c-M357 are found at 24% among Balochis; L1a and L1c at 8% among the Dravidian-speaking Brahui; L1c at 25% among Kalash and at 15% among Burusho; L1a-M76 and L1b-M317 at 2% among the Makranis; and L1c at 3.6% among Sindhis, according to Julie di Cristofaro et al. 2013. L-M20 is found at 17.78% among the Parsis. L3a is found at 23% among the Nuristanis in both Pakistan and Afghanistan.
L-PK3 is found in approximately 23% of Kalash in northwest Pakistan.
In one study, haplogroup L was also observed among the Gujars in northwest Pakistan.

Middle East and Anatolia

L-M20 was found in 51% of Syrians from Raqqa, a northern Syrian city whose previous inhabitants were wiped out by Mongol genocides and which was repopulated in recent times by local Bedouin populations and Chechen war refugees from Russia. In a small sample of Israeli Druze, haplogroup L-M20 was found in 7 out of 20 individuals. However, studies of larger samples showed that L-M20 averages 5% in Israeli Druze and 8% in Lebanese Druze, and it was not found in a sample of 59 Syrian Druze. Haplogroup L-M20 has been found in 2.0% to 5.25% of Lebanese.
Population | Distribution
Turkey | 57% in Afshar village, 12% in Black Sea Region, 6.6% of Turks in Turkey, 4.2%
Iran | 54.9% L in Priest Zoroastrian Parsis; 22.2% L1b and L1c in South Iran; 8% to 16% L2-L595, L1a, L1b and L1c of Kurds in Kordestan; 9.1% L-M20 of Persians in Eastern Iran; 3.4% L-M76 and 2.6% L-M317, for a total of 6.0% haplogroup L-M20 in Southern Iran; 3.0% L-M357 in Northern Iran; 4.2% L1c-M357 of Azeris in East Azeris; 4.8% L1a and L1b of Persians in Esfahan
Syria | 51.0% of Syrians in Raqqa, 31.0% of Eastern Syrians
Laz | 41.7% L1b-M317
Saudi Arabians | 15.6%, 1.91%
Kurds | 3.2% of Kurds in Southeast Turkey
Iraq | 3.1% L-M22
Armenians | 1.63% to 4.3%
Omanis | 1% L-M11
Qataris | 2.8%
UAE Arabs | 3.0%

Central Asia

Afghanistan

A study of Pashtun male lineages in Afghanistan found that haplogroup L-M20, with an overall frequency of 9.5%, is the second most abundant male lineage among them. It exhibits substantial disparity in its distribution on either side of the Hindu Kush range, with 25% of the northern Afghan Pashtuns belonging to this lineage, compared with only 4.8% of males from the south. Specifically, paragroup L3*-M357 accounts for the majority of the L-M20 chromosomes among Afghan Pashtuns in both the north and south. An earlier study involving fewer samples had reported that L1c comprises 12.24% of the Afghan Pashtun male lineages. L1c is also found at 7.69% among the Balochs of Afghanistan. However, L1a-M76 occurs at a much higher frequency among the Balochs, and is found at lower levels in Kyrgyz, Tajik, Uzbek and Turkmen populations.
Population | Distribution
Tajiks | 22.5%, 11.1% L1a and L1c in Balkh Province, 9.0%, 6.3% L1c in Samangan Province, 5.4% L1c in Badakhshan Province
Uzbeks | 20% L1c in Balkh Province, 14.3% L1a and L1c in Sar-e Pol Province, 7.5% L1a, L1b and L1c in Jawzjan Province, 3.0% to 3.7%
Uyghurs | 16.7% L1c-M357 in Kyrgyzstan
Pamiris | 16% of Shugnanis, 12% (3/25) of Ishkashimis, 0/30 Bartangis
Hazaras | 12.5% L1a in Balkh Province, 1.9% L1a in Bamiyan Province
Yagnobis | 9.7%
Bukharan Arabs | 9.5%
Pashtuns | 9.4% L1a and L1b in Kunduz Province, 2.9% L1c in Baghlan Province
Dungans | 8.3% L1c in Kyrgyzstan
Uyghurs | 7.8% L-M357 in Qarchugha Village, Lopnur County, Xinjiang
Karakalpaks | 4.5%
Uyghurs | 4.4%
Turkmens | 4.1% L1a in Jawzjan Province
Chelkans | 4.0%
Kyrgyzes | 2.7% L1c in Northwest Kyrgyzstan and 2.5% L1a in Central Kyrgyzstan
Kazan Tatars | 2.6%
Hui | 1.9%
Bashkirs | 0.64%

East Asia

Researchers studying samples of Y-DNA from populations of East Asia have rarely tested their samples for any of the mutations that define haplogroup L. However, mutations for haplogroup L have been tested for and detected in samples of Balinese, Han Chinese, Dolgans from Sakha and Taymyr, and Koreans.

Europe

An article by O. Semino et al. published in the journal Science reported the detection of the M11-G mutation, which is one of the mutations that defines Haplogroup L, in approximately 1% to 3% of samples from Georgia, Greece, Hungary, Calabria, and Andalusia. The sizes of the samples analyzed in this study were generally quite small, so it is possible that the actual frequency of Haplogroup L-M20 among Mediterranean European populations may be slightly lower or higher than that reported by Semino et al., but there seems to be no study to date that has described more precisely the distribution of Haplogroup L-M20 in Southwest Asia and Europe.
Population | Distribution
Fascia, Italy | 19.2% of Fascians (L-M20)
Nonstal, Italy | 10% of Nonesi (L-M20)
Samnium, Italy | 10% of Aquilanis (L-M20)
Vicenza, Italy | 10% of Venetians (L-M20)
South Tyrol, Italy | 8.9% of Ladin speakers from Val Badia, 8.3% of Val Badia, 2.9% of Puster Valley, 2.2% of German speakers from Val Badia, 2% of German speakers from Upper Vinschgau, 1.9% of German speakers from Lower Vinschgau and 1.7% of Italian speakers from Bolzano
Georgians | 20% of Georgians in Gali, 14.3% of Georgians in Chokhatauri, 12.5% of Georgians in Martvili, 11.8% of Georgians in Abasha, 11.1% of Georgians in Baghdati, 10% of Georgians in Gardabani, 9.1% of Georgians in Adigeni, 6.9% of Georgians in Omalo, 5.9% of Georgians in Gurjaani, 5.9% of Georgians in Lentekhi and 1.5% L-M357 to 1.6% L-M11
Daghestan, Russia | 10% of Chechens in Daghestan, 9.5% of Avars, 8.3% of Tats, 3.7% of Chamalins
Arkhangelsk Oblast, Russia | 5.9% of Russians (L1c-M357)
Estonia | L2-L595 and L1-M22 are found in 5.3%, 3.5%, 1.4% and 0.8% of Estonians
Balkarians, Russia | 5.3% L-M317
Portugal | 5.0% of Coimbra
Bulgaria | 3.9% of Bulgarians
Flanders | L1a*: 3.17% of Mechelen, 2.4% of Turnhout and 1.3% of Kempen. L1b*: 0.74% of West Flanders and East Flanders
Antsiferovo, Novgorod | 2.3% of Russians
East Tyrol, Austria | L-M20 is found in 1.9% of Tyroleans in Region B
Gipuzkoa, Basque Country | L1b is found in 1.7% of Gipuzkoans
North Tyrol, Austria | L-M20 is found in 0.8% of Tyroleans in Reutte

Southern Africa and the Swahili Coast

Researchers studying the origins of the Lemba people, who are of paternal South Arabian ancestry, found in 2013 that 13.8% of Lemba males carried the Y-DNA haplogroup L-M20, specifically the subclade L-M349, making it the fourth most common lineage among them. A Lemba sample from South Africa submitted to FamilyTreeDNA in 2023 was found to carry an as yet unnamed L-M349 subclade of L-FT408126, which was closest to two samples from Iraq and Iran.
Researchers also found traces of L-M20 on the Swahili coast in Kenya, amounting to 4.2% of the total population.

Subclade distribution

L1 (M295)

L-M295 is found from Western Europe to South Asia.
The L1 subclade is also found at low frequencies on the Comoros Islands.

L1a1 (M27)

L-M27 is found in 14.5% of Indians and 15% of Sri Lankans, with a moderate distribution in other populations of Pakistan, southern Iran and Europe, but slightly higher frequencies in Middle Eastern Arab populations. There is a very minor presence among Siddis as well.

L1a2 (M357)

L-M357 is found frequently among Burushos, Kalashas, Brokpa, Jats and Pashtuns, with a moderate distribution among other populations in Pakistan, Georgia, Chechens, Ingushes, northern Iran, India, the UAE, and Saudi Arabia. According to a genetic study from 2019, the Brokpa of Ladakh carry Y haplogroup L2a2 at a frequency of around 62.7%.
A Chinese study published in 2018 found L-M357/L1307 in 7.8% of a sample of Loplik Uyghurs from Qarchugha Village, Lopnur County, Xinjiang.
L-PK3
L-PK3, which is downstream of L-M357, is found frequently among Kalash.

L1b (M317)

L-M317 is found at low frequency in Central Asia, Southwest Asia, and Europe.
In Europe, L-M317 has been found in Northeast Italians and Greeks.
In Caucasia, L-M317 has been found in Mountain Jews, Avars, Balkarians, Abkhaz, Chamalals, Abazins, Adyghes, Chechens, Armenians, Lezgins, and Ossetes.
L-M317 has been found in Makranis in Pakistan, Iranians, Pashtuns in Afghanistan, and Uzbeks in Afghanistan.

L1b1 (M349)

L-M349 is found in some Crimean Karaites who are Levites. Some of L-M349's branches are found in West Asia, including L-Y31183 in Lebanon, L-Y31184 in Armenia, and L-Y130640 in Iraq, Iran, Yemen and South Africa. Others are found in Europe, such as L-PAGE116 in Italy, L-FT304386 in Slovenia, and L-FGC36841 in Moldova. Among Lemba males, 13.8% carry L-M349 under the clade L-Y130640. This percentage is most likely due to a founder effect in their population, which makes them the only group on the African continent with any substantial proportion of L-M20.

L2 (L595)

L2-L595 is extremely rare, and has been identified by private testing in individuals from Europe and Western Asia.
Two confirmed L2-L595 individuals from Iran were reported in the supplementary material of a 2020 study. Possible but unconfirmed cases of L2 include 4% L-M11 in a sample of Iranians in Kordestan and 2% L-M20 in a sample of Shapsugs, among other rare reported cases of L that do not fall into the common branches.
Region | Population | n/Sample size | Percentage
West Asia | Azerbaijan | 2/204 | 1
Central Europe | Germany | 1/8641 | 0.012
Southern Europe | Greece | 1/753 | 0.1
West Asia | Iran | 2/800 | 0.25
Southern Europe | Italy | 3/913 | 0.3
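
The percentage column in the table above is simply the number of L2-L595 carriers divided by the sample size, multiplied by 100. The following is a minimal sketch of that arithmetic, assuming the counts and sample sizes are as they appear in the table rows; the function name `percentage` and the dictionary `samples` are illustrative.

```python
# (carriers, sample size) per population, as read from the table rows above.
samples = {
    "Azerbaijan": (2, 204),
    "Germany": (1, 8641),
    "Greece": (1, 753),
    "Iran": (2, 800),
    "Italy": (3, 913),
}


def percentage(carriers: int, sample_size: int) -> float:
    """Return the share of carriers as a percentage of the sample."""
    if sample_size <= 0:
        raise ValueError("sample size must be positive")
    return 100.0 * carriers / sample_size


if __name__ == "__main__":
    for population, (carriers, sample_size) in samples.items():
        value = percentage(carriers, sample_size)
        print(f"{population}: {carriers}/{sample_size} = {value:.3f}%")
```

With counts this small, a single additional carrier changes the result noticeably, so these figures are best read as rough indications rather than precise frequencies.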

Ancient DNA

  • Three individuals from Maykop culture c. 3200 BCE were found to belong to haplogroup L2-L595.
  • Three individuals who lived in the Chalcolithic era, found in the Areni-1 cave in the South Caucasus mountains, were also identified as belonging to haplogroup L1a. One individual's genome indicated that he had red hair and blue eyes. Their genetic data is listed in the table below.
  • Narasimhan et al. analyzed skeletons from the BMAC sites in Uzbekistan and identified 2 individuals as belonging to haplogroup L1a. One of these specimens was found in Bustan and the other in Sappali Tepe; both ascertained to be Bronze Age sites.
  • Skourtanioti et al. analyzed skeletons from Alalakh and identified one individual c. 2006-1777 BC as belonging to haplogroup L-L595. Ingman et al. analyzed more skeletons from Alalakh and identified another individual belonging to haplogroup L-M349.
  • One Iron Age individual from Batman in Upper Mesopotamia belonged to haplogroup L2-L595.
  • An ancient Viking individual who lived in Öland, Sweden circa 847 ± 65 CE was determined to belong to L-L595.

Nomenclature

Prior to 2002, at least seven naming systems for the Y-chromosome phylogenetic tree were in use in the academic literature. This led to considerable confusion. In 2002, the major research groups came together and formed the Y-Chromosome Consortium. They published a joint paper that created a single new tree that all agreed to use. Later, a group of citizen scientists with an interest in population genetics and genetic genealogy formed a working group to create an amateur tree that aimed above all to be timely. The table below brings together all of these works at the point of the landmark 2002 YCC tree. This allows a researcher reviewing older published literature to quickly move between nomenclatures.
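YCC 2002/2008 | Earlier naming systems (seven) | YCC 2002 | YCC 2005 | YCC 2008 | YCC 2010r | ISOGG 2006 | ISOGG 2007 | ISOGG 2008 | ISOGG 2009 | ISOGG 2010 | ISOGG 2011 | ISOGG 2012
L-M20 | 28, VIII, 1U, 27, Eu17, H5, F | L* | L | L | L | - | - | - | - | - | - | -
L-M27 | 28, VIII, 1U, 27, Eu17, H5, F | L1 | L1 | L1 | L1 | - | - | - | - | - | - | -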

The Y-Chromosome Consortium tree
This is the official scientific tree produced by the Y-Chromosome Consortium. The last major update was in 2008. Subsequent updates have been quarterly and biannual. The current version is a revision of the 2010 update.
Original research publications
The following research teams per their publications were represented in the creation of the YCC Tree.