Indo-European languages worldwide by country
  Official or primary language
  Secondary official language
  No use
The approximate present-day distribution of the Indo-European branches within their homelands of Europe and Asia:
  Non-Indo-European languages
Dotted/striped areas indicate where multilingualism is common.
The approximate present-day distribution of Indo-European languages within the Americas by country:

This is a list of languages in the Indo-European language family. It contains a large number of individual languages, together spoken by roughly half the world's population.

Numbers of languages and language groups

The Indo-European languages include some 449 languages (SIL estimate, 2018 edition[1]) spoken by about 3.5 billion people or more (roughly half of the world population). Most of the major languages of Europe and of western and southern Asia belong to the Indo-European language family. It is thus the largest language family in the world by number of native speakers (but not by number of languages: by this measure it is only the 3rd or 5th largest). Eight of the ten largest languages by number of native speakers are Indo-European. One of these languages, English, is the de facto world lingua franca, with an estimated total of over one billion second-language speakers.

The Indo-European language family has ten known branches or subfamilies, of which eight are living and two are extinct. Most of the subfamilies or linguistic branches in this list contain many subgroups and individual languages. The relationships between these branches (how they are related to one another and how they branched from the ancestral proto-language) are not yet fully known and remain a matter of ongoing research. Some individual Indo-European languages are unclassified within the family; they have not yet been assigned to a branch and could each constitute a separate branch.

The 449 Indo-European languages identified in the SIL estimate, 2018 edition,[1] are mostly living languages. If all the known extinct Indo-European languages are added, they number more than 800 or close to one thousand. This list includes all known Indo-European languages, living and extinct.

What constitutes a language?

The distinction between a language and a dialect is not clear-cut: in many areas there is a dialect continuum, with transitional dialects and languages. Further, there is no agreed standard criterion for how much difference in vocabulary, grammar, pronunciation and prosody is required to constitute a separate language, as opposed to a mere dialect. Mutual intelligibility can be considered, but there are closely related languages that are also mutually intelligible to some degree, even if the intelligibility is asymmetric. There may also be cases where, among three dialects A, B and C, A and B are mutually intelligible, B and C are mutually intelligible, but A and C are not. In such circumstances, grouping the three dialects becomes impossible. Because of this, several dialect groups and some individual dialects of languages are shown in this list (in italics), especially if a language is or was spoken by a large number of people and over a large land area, but also if it has or had divergent dialects.
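The three-dialect case can be made concrete with a minimal Python sketch. The dialect names and the intelligibility pairs below are hypothetical and stand for no real varieties; the helper functions are likewise illustrative. The sketch checks that mutual intelligibility, treated as a yes/no relation, is not transitive here, and that no way of splitting A, B and C into groups puts exactly the mutually intelligible pairs together.

```python
from itertools import combinations, product

# Hypothetical judgments (illustrative only, not data about real dialects):
# A and B understand each other, B and C understand each other, A and C do not.
INTELLIGIBLE = {frozenset(("A", "B")), frozenset(("B", "C"))}
DIALECTS = ["A", "B", "C"]


def mutually_intelligible(x, y):
    """True if x and y are the same dialect or a listed intelligible pair."""
    return x == y or frozenset((x, y)) in INTELLIGIBLE


def is_transitive(dialects):
    """Check whether x~y and y~z always imply x~z."""
    return all(
        mutually_intelligible(x, z)
        for x in dialects
        for y in dialects
        for z in dialects
        if mutually_intelligible(x, y) and mutually_intelligible(y, z)
    )


def has_consistent_grouping(dialects):
    """Search every assignment of dialects to groups for one in which two
    dialects share a group exactly when they are mutually intelligible."""
    pairs = list(combinations(dialects, 2))
    for labels in product(range(len(dialects)), repeat=len(dialects)):
        group = dict(zip(dialects, labels))
        if all(
            (group[x] == group[y]) == mutually_intelligible(x, y)
            for x, y in pairs
        ):
            return True
    return False


print(is_transitive(DIALECTS))            # False
print(has_consistent_grouping(DIALECTS))  # False
```

Under these assumptions, any grouping either separates a mutually intelligible pair or joins A with C, which is why a list such as this one shows some dialects and dialect groups individually rather than forcing a single cut.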

Summary of historical development

The ancestral population, the Proto-Indo-Europeans, who spoke Proto-Indo-European, is estimated to have lived about 4500 BCE (6500 BP). Starting about 4000 BCE (6000 BP), this population expanded through migration and cultural influence. This started a complex process of population mixing or replacement, acculturation and language change among peoples in many regions of western and southern Eurasia.[2] This process gave rise to the many languages and branches of this language family.

By around 1000 BCE, there were many millions of Indo-European speakers, and they lived in a vast geographical area which covered most of western and southern Eurasia (including western Central Asia).

In the following two millennia the number of speakers of Indo-European languages increased even further.

Indo-European languages continued to be spoken over large land areas, although most of western Central Asia and Asia Minor were lost to other language families (mainly Turkic) through Turkic expansion, conquest and settlement (after the middle of the first millennium AD, and in the beginning and middle of the second millennium AD, respectively) and through the Mongol invasions and conquests (which changed the ethnolinguistic composition of Central Asia). Another area lost to non-Indo-European languages was today's Hungary, through the conquest and settlement of the Magyars/Hungarians (speakers of a Uralic language).

However, from about AD 1500 onwards, Indo-European languages expanded their territories to North Asia (Siberia), through Russian expansion, and to North America, South America, Australia and New Zealand, as a result of the age of European discoveries and the conquests and expansions of the Portuguese, Spanish, French, English and Dutch. (These peoples had the biggest continental or maritime empires in the world, and their countries were major powers.)

Contact between different peoples and languages, especially as a result of European colonization, also gave rise to the many pidgins, creoles and mixed languages that are mainly based on Indo-European languages (many of which are spoken in island groups and coastal regions).


Dating the split-offs of the main branches

Indo-European migrations as described in The Horse, the Wheel, and Language by David W. Anthony

All Indo-European languages descend from a common ancestor called Proto-Indo-European, but the subfamilies or branches (large groups of more closely related languages within the family, each descending from a more recent proto-language) are not all equally closely related: some are closer to one another than others, and they did not split off at the same time. The affinity of the Indo-European subfamilies or branches to one another is still an unresolved and controversial issue under investigation.

However, there is some consensus that Anatolian was the first Indo-European branch to split off from all the others and that Tocharian was the second.[3]

Using a mathematical analysis borrowed from evolutionary biology, Donald Ringe and Tandy Warnow propose the following tree of Indo-European branches:[4]

David W. Anthony, following the methodology of Donald Ringe and Tandy Warnow, proposes the following sequence:[4]

List of Indo-European protolanguages

Scheme of Indo-European language dispersals from c. 4000 to 1000 BCE according to the widely held Kurgan hypothesis.
– Centre (5th.-4th. mill. BCE - Proto-Indo-European): Steppe cultures (West Eurasian Steppe, Pontic–Caspian steppe)
1 (black): Anatolian languages (early / archaic PIE)
2 (black): Afanasievo culture (ancestral to Tocharians and Tocharian languages) (middle PIE)
3 (black): Yamnaya culture expansion (Pontic–Caspian steppe, Danube Valley) (late PIE) (southwest black line): Proto-Italic, Proto-Celtic and other possible Indo-European branches
4A (black): Western Corded Ware
[NN] (black): pre-Proto-Germanic
[NN] (dark yellow): proto-Balto-Slavic (Balto-Slavic languages of the Baltic and Slavic peoples)
4B-C (blue & dark blue): Bell Beaker; adopted by Indo-European speakers
5A-B (Fatyanovo-Abashevo) (red): Eastern Corded Ware
5C (red): Sintashta (proto-Indo-Iranian)
6 (magenta): Andronovo
7A (purple): Indo-Aryans (Mittani)
7B (purple): Indo-Aryans (Āryāvarta, modern northern India and Pakistan, later expanding towards Sri Lanka and the Maldives)
8 (grey): proto-Greek
9 (yellow): proto-Iranian (Iranian languages of the Iranian peoples)
– [not drawn]: Armenian, expanding from the western steppe and settling in the Armenian Highlands by a western or an eastern route. The geographical group of languages in the Balkans known as Paleo-Balkan included Dacian, Moesian, Thracian, Brygian (Balkan Phrygian), Paeonian, Illyrian, and Dalmato-Pannonian.

Protolanguages that developed into the Indo-European languages

The following is a list of protolanguages of known Indo-European subfamilies and deeper branches.

The list below follows the classification tree for Indo-European branches by Donald Ringe, Tandy Warnow and Ann Taylor,[5] as quoted in Anthony, David W. (2007), The Horse, the Wheel and Language: How Bronze-Age Riders from the Eurasian Steppes Shaped the Modern World, Princeton University Press.

Anatolian languages (all extinct)

Anatolian languages in 2nd millennium BC; Blue: Luwian, Yellow: Hittite, Red: Palaic.

Tocharian languages (Agni-Kuči languages) (all extinct)

Tocharian languages: A (blue), B (red) and C (green) in the Tarim Basin.[31] Tarim oasis towns are given as listed in the Book of Han (c. 2nd century BC). The areas of the squares are proportional to population.

Albanian language

See also: Albanoid

Distribution of modern Albanian dialects.

Italic languages

Iron Age Italy (c.500 B.C.). Italic languages in green colours.
Length of the Roman rule and the Romance Languages[36]
Romance languages in Europe (major dialect groups are also shown).
European extent of Romance languages in the 20th century
Eastern and Western Romance areas split by the La Spezia–Rimini Line; Southern Romance is represented by Sardinian as an outlier.
Romance languages in the World. Countries and sub-national entities where one or more Romance languages are spoken. Dark colours: First language, Light colours: Official or Co-Official language; Very Light colours: Spoken by a significant minority as first or second language. Blue: French; Green: Spanish; Orange: Portuguese; Yellow: Italian; Red: Romanian.

Celtic languages

Diachronic distribution of Celtic language speakers:
  core Hallstatt territory, by the 6th century BCE
  maximal Celtic expansion, by 275 BCE
  Lusitanian and Vettonian area of the Iberian Peninsula, where Celtic presence is uncertain (Para-Celtic?)
  the six Celtic nations which retained significant numbers of Celtic speakers into the Early Modern period
  areas where Celtic languages remain widely spoken today
Britain and Ireland in the first few centuries of the 1st millennium, before the founding of Anglo-Saxon kingdoms.
  Mainly Goidelic areas.
  Mainly Pictish areas.
  Mainly Brittonic areas.
Goidelic language and culture would eventually become dominant in the Pictish area and far northern Brittonic area.
A map of the modern distribution of the Celtic languages. Red: Welsh; Purple: Cornish; Black: Breton; Green: Irish; Blue: Scottish Gaelic; Yellow: Manx. Areas where languages overlap are shown in stripes.
Map of the Gaelic-speaking world. The red area shows the maximum extent of Old Irish (common ancestor of Irish, Scottish Gaelic and Manx); the orange area shows places with Ogham inscriptions; and the green area shows modern Gaelic-speaking areas. Orkney and Shetland islands were never majority Scots Gaelic or Scottish Gaelic speaking.
Linguistic division in early twelfth century Scotland:
  Gaelic speaking ("Scots" here refers to Scots Gaelic not to Germanic Scots)
  Norse-Gaelic zone, characterized by the use of both languages
  English-speaking zone
  Cumbric may have survived in this zone; more realistically a mixture of Cumbric, Gaelic (west) and English (east)

Hellenic languages

Distribution of Greek dialects in Greece in the classical period.[45]
Distribution of Greek dialects in Magna Graecia (Southern Italy and Sicily) in the classical period.
The distribution of major modern Greek dialect areas.
Anatolian Greek until 1923. Demotic in yellow. Pontic in orange. Cappadocian in green. Green dots indicate Cappadocian-Greek-speaking villages in 1910.[46]

Armenian language

Armenian dialects, according to Adjarian (1909) (before the First World War and the Armenian Genocide). In many regions of the contiguous area shown in the map, Armenian speakers were the majority or a significant minority.
Modern geographical distribution of the Armenian language.

Germanic languages

One proposed theory for the approximate distribution of the primary Germanic dialect groups in Europe around the year 1 AD. Legend: East Germanic; Northwest Germanic; West Germanic; North Germanic.
Germanic languages and main dialect groups in Europe after 1945.
Germanic languages in the World. Countries and sub-national entities where one or more Germanic languages are spoken. Dark Red: First language; Red: Official or Co-Official language, Pink: Spoken by a significant minority as second language.

Balto-Slavic languages

Area of the Balto-Slavic dialect continuum (purple), with proposed material cultures correlating to speakers of Balto-Slavic in the Bronze Age (white). Red dots: archaic Slavic hydronyms.
Political map of Europe with countries where a Slavic language is a national language marked in shades of green and where a Baltic language is a national language marked in light orange. Wood green represents East Slavic languages, pale green represents West Slavic languages, and sea green represents South Slavic languages. Contemporary Baltic languages are all from the same group: Eastern Baltic.
Baltic languages (extinct languages shown in stripes).
Slavic languages in Europe (2008). Areas where languages overlap are shown in stripes.
Russian language: map of all the areas where Russian is the language spoken by the majority of the population. Russian is the biggest Slavic language both in number of first-language speakers and in the geographical area where it is spoken (a vast land area of Eastern Europe and North Asia (Siberia), i.e. most of Northern Eurasia).

Baltic languages

Slavic languages

Indo-Iranian languages

Geographic distribution of modern Indo-Iranian languages. Blue, dark purple and green colour shades: Iranic languages. Dark pink: Nuristani languages. Red, light purple and orange colour shades: Indo-Aryan languages. Areas where languages overlap are shown in stripes.

Iranian languages

Map of attested and hypothetical Old Indo-Iranian dialects. Indo-Iranian languages descend from the language spoken by the people of the Sintashta culture, who lived in the plains beyond the southeastern Ural Mountains, between the basins of the upper Ural and Tobol rivers. Old Iranian languages (shown in green) were spoken over a large area of Eurasia that included most of southern Eastern Europe, southwestern Siberia, Central Asia (including parts of western China) and the Iranian Plateau. The Scythian languages (including Saka), which belonged to the Northern Eastern Iranian subgroup, had the widest geographical distribution. They were spoken in most of the steppe and desert areas of Eastern Europe and Central Asia, matching most of the western half of the Eurasian steppe. This corresponds to modern southern European Russia, the south of Russian western Siberia and parts of southern central Siberia, modern southern Ukraine, an enclave in the eastern Pannonian Basin (in modern Hungary), all of modern Kazakhstan, parts of modern Xinjiang (in western China), modern Kyrgyzstan, and parts of modern Uzbekistan and modern Turkmenistan.[62] Later, Scythian languages were also present in northern India, through the migration of part of the ancient Iranian peoples who formed the Indo-Scythians. This was the geographical distribution until the first centuries AD; after that time, Turkic migration and conquests, along with Turkification, drove many ancient Iranian languages to extinction.
Approximate distribution of Iranic peoples in Central Asia during the Iron Age.
Distribution of modern Iranian languages (detailed map showing genealogical relations between languages in a table). Grey areas, in the deserts of central Iran and in eastern Tajikistan's Pamir Mountains, are uninhabited; white areas do not have a majority of Iranian-language speakers, although some have significant minorities who speak Iranian languages as a first or second language, especially in Iran and Afghanistan but also in Pakistan and Turkey. The Pamir languages are an areal group, not a genealogical one.

Nuristani languages (Kamozian)

Nuristan Province in Afghanistan, where most speakers live.

Transitional Iranian-Indo-Aryan[75][76] (older name: Kafiri). According to some scholars,[77][78] the older name "Kapisi", which was synonymous with Kambojas and related to the ancient Kingdom of Kapisa in modern-day Kapisa Province, may have changed to "Kafiri" and come to be confused and assimilated with "kafiri", meaning "infidel" in Arabic and used in Islam.

Nuristani languages.

Indo-Aryan languages

Present-day geographical distribution of the major Indo-Aryan language groups. Romani, Domari, Kholosi and Lomavren are outside the scope of the map. Colours indicate the branches: yellow is Eastern, purple is Dardic, blue is Northwestern, red is Southern, green is Western, brown is Northern and orange is Central. Data is from "The Indo Aryan Languages" as well as census data and previous linguistic maps.
  Pashai (Dardic)
  Chitrali (Dardic)
  Shina (Dardic)
  Kohistani (Dardic)
  Kashmiri (Dardic)
  Punjabi (Northwestern)
  Sindhi (Northwestern)
  Rajasthani (Western)
  Gujarati (Western)
  Bhili (Western)
  Khandeshi (Western)
  Himachali-Dogri (= W. Pahari, Northern)
  Garhwali-Kumaoni (= C. Pahari, Northern)
  Nepali (= E. Pahari, Northern)
  Western Hindi (Central)
  Eastern Hindi (Central)
  Bihari (Eastern)
  Bengali-Assamese (Eastern)
  Odia (Eastern)
  Halbi (Eastern)
  Marathi-Konkani (Southern)
  Sinhala-Maldivian (Southern)
(not shown: Kunar (Dardic), Chinali-Lahuli).
Distribution of major Indo-Aryan languages. Urdu is included under Hindi. Romani, Domari, and Lomavren are outside the scope of the map. Dotted/striped areas indicate where multilingualism is common.
Romani languages and dialects in Europe. Romani languages are part of the Indo-Aryan branch of Indo-European languages but are spoken outside the Indian Subcontinent. They are related to the Domari languages (spoken by the Doma or Dom) and are scattered minority languages in all regions, overlapping with other peoples and their languages in Europe. The Domari and Romani languages are spoken over a vast geographical area from Southwest Asia to Europe and North Africa, but they are scattered minority languages in all these regions, in part because Domari and Romani speakers, the Doma and the Roma, were traditionally nomadic peoples.

Unclassified Indo-European languages (all extinct)

Indo-European languages whose relationship to other languages in the family is unclear

Possible Indo-European languages (all extinct)

Unclassified languages that may have been Indo-European or members of other language families (?)

Hypothetical Indo-European languages (all extinct)

According to Allentoft (2015), the Sintashta culture probably derived at least partially from the Corded Ware culture. Nordqvist and Heyd (2020) confirm this.

See also

