This is a list of languages by total number of speakers.
It is difficult to define what constitutes a language as opposed to a dialect. For example, Chinese and Arabic are sometimes considered single languages, but each includes several mutually unintelligible varieties, and so they are sometimes considered language families instead. Conversely, colloquial registers of Hindi and Urdu are almost completely mutually intelligible, and are sometimes classified as one language, Hindustani. Such rankings should be used with caution, because it is not possible to devise a coherent set of linguistic criteria for distinguishing languages in a dialect continuum. [1]
There is no single criterion for how much knowledge is sufficient to be counted as a second-language speaker. For example, English has about 450 million native speakers but, depending on the criterion chosen, can be said to have as many as two billion speakers. [2]
There are also difficulties in obtaining reliable counts of speakers, which vary over time because of population change and language shift. In some areas there is no reliable census data, the available data is out of date, or the census does not record the languages spoken or records them ambiguously. Sometimes speaker populations are exaggerated for political reasons, or speakers of minority languages may be underreported in favor of a national language. [3]
The following languages are listed as having 45 million or more total speakers in the 26th edition of Ethnologue, published in 2023. [4] This section does not include entries that Ethnologue identifies as macrolanguages encompassing all their respective varieties, such as Arabic, Lahnda, Persian, Malay, Pashto, and Chinese.
Language | Family | Branch | First-language (L1) speakers | Second-language (L2) speakers | Total speakers (L1+L2) |
---|---|---|---|---|---|
English (excl. creole languages) | Indo-European | Germanic | 380 million | 1.077 billion [5] | 1.456 billion |
Mandarin Chinese (incl. Standard Chinese, but excl. other varieties) | Sino-Tibetan | Sinitic | 939 million | 199 million [6] | 1.138 billion |
Hindi (excl. Urdu) | Indo-European | Indo-Aryan | 345 million | 266 million [7] | 610 million |
Spanish (excl. creole languages) | Indo-European | Romance | 485 million | 74 million [8] | 559 million |
French (excl. creole languages) | Indo-European | Romance | 81 million | 229 million [9] | 310 million |
Modern Standard Arabic (excl. dialects) | Afro-Asiatic | Semitic | — [a] | 274 million [11] | 274 million |
Bengali | Indo-European | Indo-Aryan | 234 million | 39 million [12] | 273 million |
Portuguese (excl. creole languages) | Indo-European | Romance | 236 million | 27 million [13] | 264 million |
Russian | Indo-European | Balto-Slavic | 147 million | 108 million [14] | 255 million |
Urdu (excl. Hindi) | Indo-European | Indo-Aryan | 71 million | 161 million [15] | 232 million |
Indonesian (excl. other Malay) | Austronesian | Malayo-Polynesian | 44 million | 155 million [16] | 199 million |
Standard German | Indo-European | Germanic | 75 million | 58 million [17] | 133 million |
Japanese | Japonic | — | 123 million | 0.2 million [18] | 123 million |
Nigerian Pidgin | English Creole | Krio | 5 million | 116 million [19] | 121 million |
Egyptian Arabic (excl. other Arabic dialects) | Afro-Asiatic | Semitic | 77 million | 25 million [20] | 102 million |
Marathi | Indo-European | Indo-Aryan | 83 million | 16 million [21] | 99 million |
Telugu | Dravidian | South-Central | 83 million | 13 million [22] | 96 million |
Turkish | Turkic | Oghuz | 84 million | 6 million [23] | 90 million |
Tamil | Dravidian | Southern | 79 million | 8 million [24] | 87 million |
Yue Chinese (incl. Cantonese) | Sino-Tibetan | Sinitic | 86 million | 1 million [25] | 87 million |
Vietnamese | Austroasiatic | Vietic | 85 million | 1 million [26] | 86 million |
Wu Chinese (incl. Shanghainese) | Sino-Tibetan | Sinitic | 83 million | 0.1 million [27] | 83 million |
Tagalog [b] | Austronesian | Malayo-Polynesian | 29 million | 54 million [28] | 83 million |
Korean | Koreanic | — | 82 million | — [29] | 82 million |
Iranian Persian (excl. other Persian dialects) | Indo-European | Iranian | 57 million | 21 million [30] | 79 million |
Hausa | Afro-Asiatic | Chadic | 52 million | 27 million [31] | 79 million |
Swahili | Niger–Congo | Bantu | 16 million | 55 million [32] | 72 million |
Javanese | Austronesian | Malayo-Polynesian | — | — [33] | 68 million |
Italian | Indo-European | Romance | 65 million | 3 million [34] | 68 million |
Western Punjabi (excl. Eastern Punjabi) | Indo-European | Indo-Aryan | — | — [35] | 67 million |
Gujarati | Indo-European | Indo-Aryan | 57 million | 5 million [36] | 62 million |
Thai | Kra–Dai | Zhuang–Tai | 21 million | 40 million [37] | 61 million |
Kannada | Dravidian | Southern | 44 million | 15 million [38] | 59 million |
Amharic | Afro-Asiatic | Semitic | 32 million | 25 million [39] | 58 million |
Bhojpuri | Indo-European | Indo-Aryan | 52 million | 0.2 million [40] | 52 million |
Eastern Punjabi (excl. Western Punjabi) | Indo-European | Indo-Aryan | 48 million | 4 million [41] | 52 million |
Min Nan Chinese (incl. Hokkien) | Sino-Tibetan | Sinitic | 50 million | 0.4 million [42] | 50 million |
Jin Chinese | Sino-Tibetan | Sinitic | — | — [43] | 48 million |
Levantine Arabic (excl. other Arabic dialects) | Afro-Asiatic | Semitic | 47 million | 0.4 million [44] | 48 million |
Yoruba | Niger–Congo | Atlantic–Congo | 44 million | 2 million [45] | 46 million |
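The total column does not always equal the sum of the two preceding columns exactly. Portuguese, for example, is listed with 236 million L1 and 27 million L2 speakers but a total of 264 million, most likely because each figure is rounded to the nearest million separately. The following minimal Python sketch recomputes L1 + L2 for a few rows copied from the table and flags any row whose listed total differs by more than a chosen tolerance. The row subset, the helper name `check_totals`, and the one-million tolerance are illustrative assumptions, not part of Ethnologue's methodology.

```python
# A few rows copied by hand from the Ethnologue table above.
# Each value is (L1, L2, listed total), all in millions of speakers.
ROWS = {
    "English": (380, 1077, 1456),
    "Mandarin Chinese": (939, 199, 1138),
    "Hindi": (345, 266, 610),
    "Portuguese": (236, 27, 264),
    "Swahili": (16, 55, 72),
    "Japanese": (123, 0.2, 123),
}


def check_totals(rows, tolerance=1.0):
    """Hypothetical helper: return (language, L1 + L2, listed total) for every
    row whose listed total differs from the recomputed sum by more than
    `tolerance` million speakers."""
    mismatches = []
    for language, (l1, l2, total) in rows.items():
        recomputed = l1 + l2
        if abs(recomputed - total) > tolerance:
            mismatches.append((language, recomputed, total))
    return mismatches


if __name__ == "__main__":
    print(check_totals(ROWS))                 # default one-million tolerance
    print(check_totals(ROWS, tolerance=0.5))  # stricter tolerance
```

With the default tolerance every copied row passes, which is consistent with independent rounding. Lowering `tolerance` to 0.5 flags English, Hindi, Portuguese and Swahili, each of which is off by exactly one million.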
The World Factbook, produced by the Central Intelligence Agency (CIA), estimates the ten most spoken languages (L1 + L2) in 2022 as follows: [46]
Language | Percentage of world population (2022) |
---|---|
English | 18.8% |
Mandarin Chinese | 13.8% |
Hindi | 7.5% |
Spanish | 6.9% |
French | 3.4% |
Arabic | 3.4% |
Bengali | 3.4% |
Russian | 3.2% |
Portuguese | 3.2% |
Urdu | 2.9% |
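The Factbook reports shares of the world population rather than head counts. To set them beside the Ethnologue figures, each share can be multiplied by a world population estimate. The Python sketch below does this assuming a round world population of 8,000 million for 2022; that base figure is an assumption, since the population the Factbook itself used is not stated here, and the helper `approx_speakers_millions` is hypothetical. The comparison is only rough: the Factbook's single "Arabic" row need not correspond to any one Ethnologue entry, since Ethnologue lists Modern Standard Arabic, Egyptian Arabic and Levantine Arabic separately.

```python
# Shares of world population (L1 + L2, in percent) from the Factbook table above.
CIA_2022_PERCENT = {
    "English": 18.8,
    "Mandarin Chinese": 13.8,
    "Hindi": 7.5,
    "Spanish": 6.9,
    "French": 3.4,
    "Arabic": 3.4,
    "Bengali": 3.4,
    "Russian": 3.2,
    "Portuguese": 3.2,
    "Urdu": 2.9,
}

# Assumption: a round world population of 8,000 million for 2022.
ASSUMED_WORLD_POPULATION_MILLIONS = 8000


def approx_speakers_millions(percent_table, world_population_millions):
    """Hypothetical helper: convert percentage shares into approximate
    speaker counts, rounded to the nearest million."""
    return {
        language: round(percent / 100 * world_population_millions)
        for language, percent in percent_table.items()
    }


if __name__ == "__main__":
    estimates = approx_speakers_millions(CIA_2022_PERCENT, ASSUMED_WORLD_POPULATION_MILLIONS)
    for language, millions in estimates.items():
        print(f"{language}: ~{millions} million")
```

Under the assumed 8,000 million base, English comes out at about 1,504 million and Hindi at about 600 million; a different population estimate scales every figure proportionally.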