Human languages ranked by their number of native speakers are as follows. All such rankings should be used with caution, because it is not possible to devise a coherent set of linguistic criteria for distinguishing languages in a dialect continuum. [1] For example, a language is often defined as a set of mutually intelligible varieties, but independent national standard languages may be considered separate languages even though they are largely mutually intelligible, as in the case of Danish and Norwegian. [2] Conversely, many commonly accepted languages, including German, Italian and even English encompass varieties that are not mutually intelligible. [1] While Arabic is sometimes considered a single language centred on Modern Standard Arabic, other authors consider its mutually unintelligible varieties separate languages. [3] Similarly, Chinese is sometimes viewed as a single language because of a shared culture and common literary language. [4] It is also common to describe various Chinese dialect groups, such as Mandarin, Wu and Yue, as languages, even though each of these groups contains many mutually unintelligible varieties. [5]
There are also difficulties in obtaining reliable counts of speakers, which vary over time because of population change and language shift. In some areas, there is no reliable census data, the data is not current, or the census may not record languages spoken, or record them ambiguously. Sometimes speaker populations are exaggerated for political reasons, or speakers of minority languages may be underreported in favour of a national language. [6]
The following languages are listed as having at least 50 million first-language speakers in the 27th edition of Ethnologue published in 2024. [7] This section does not include entries that Ethnologue identifies as macrolanguages encompassing all their respective varieties, such as Arabic, Lahnda, Persian, Malay, Pashto, and Chinese.
According to the CIA World Factbook , the most-spoken first languages in 2018 were: [8]
Rank | Language | Percentage of world population (2018) |
---|---|---|
1 | Mandarin Chinese | 12.3% |
2 | Spanish | 6.0% |
3 | English | 5.1% |
3 | Arabic | 5.1% |
5 | Hindi | 3.5% |
6 | Bengali | 3.3% |
7 | Portuguese | 3.0% |
8 | Russian | 2.1% |
9 | Japanese | 1.7% |
10 | Western Punjabi | 1.3% |
11 | Javanese | 1.1% |
Arabic is a Central Semitic language of the Afroasiatic language family spoken primarily in the Arab world. The ISO assigns language codes to 32 varieties of Arabic, including its standard form of Literary Arabic, known as Modern Standard Arabic, which is derived from Classical Arabic. This distinction exists primarily among Western linguists; Arabic speakers themselves generally do not distinguish between Modern Standard Arabic and Classical Arabic, but rather refer to both as al-ʿarabiyyatu l-fuṣḥā or simply al-fuṣḥā (اَلْفُصْحَىٰ).
Chinese is a group of languages spoken natively by the ethnic Han Chinese majority and many minority ethnic groups in China. Approximately 1.35 billion people, or 17% of the global population, speak a variety of Chinese as their first language.
Dialect refers to two distinctly different types of linguistic relationships.
Ethnologue: Languages of the World is an annual reference publication in print and online that provides statistics and other information on the living languages of the world. It is the world's most comprehensive catalogue of languages. It was first issued in 1951, and is now published by SIL International, an American evangelical Christian non-profit organization.
Hakka forms a language group of varieties of Chinese, spoken natively by the Hakka people in parts of Southern China and some diaspora areas of Taiwan, Southeast Asia and in overseas Chinese communities around the world.
Arvanitika, also known as Arvanitic, is the variety of Albanian traditionally spoken by the Arvanites, a population group in Greece. Arvanitika was brought to southern Greece during the late Middle Ages by Albanian settlers who moved south from their homeland in present-day Albania in several waves. The dialect preserves elements of medieval Albanian, while also being significantly influenced by the Greek language. Arvanitika is today endangered, as its speakers have been shifting to the use of Greek and most younger members of the community no longer speak it.
Southern Min, Minnan or Banlam, is a group of linguistically similar and historically related Chinese languages that form a branch of Min Chinese spoken in Fujian, most of Taiwan, Eastern Guangdong, Hainan, and Southern Zhejiang. Southern Min dialects are also spoken by descendants of emigrants from these areas in diaspora, most notably in Southeast Asia, such as Singapore, Malaysia, the Philippines, Indonesia, Brunei, Southern Thailand, Myanmar, Cambodia, Southern and Central Vietnam, San Francisco, Los Angeles and New York City. Minnan is the most widely-spoken branch of Min, with approximately 48 million speakers as of 2017–2018.
There are hundreds of local Chinese language varieties forming a branch of the Sino-Tibetan language family, many of which are not mutually intelligible. Variation is particularly strong in the more mountainous southeast part of mainland China. The varieties are typically classified into several groups: Mandarin, Wu, Min, Xiang, Gan, Jin, Hakka and Yue, though some varieties remain unclassified. These groups are neither clades nor individual languages defined by mutual intelligibility, but reflect common phonological developments from Middle Chinese.
A dialect continuum or dialect chain is a series of language varieties spoken across some geographical area such that neighboring varieties are mutually intelligible, but the differences accumulate over distance so that widely separated varieties may not be. This is a typical occurrence with widely spread languages and language families around the world, when these languages did not spread recently. Some prominent examples include the Indo-Aryan languages across large parts of India, varieties of Arabic across north Africa and southwest Asia, the Turkic languages, the Chinese languages or dialects, and parts of the Romance, Germanic and Slavic families in Europe. Terms used in older literature include dialect area and L-complex.
Marwari is a language within the Rajasthani language family of the Indo-Aryan languages. Marwari and its closely related varieties like Dhundhari, Shekhawati and Mewari form a part of the broader Marwari language family. It is spoken in the Indian state of Rajasthan, as well as the neighbouring states of Gujarat and Haryana, some adjacent areas in eastern parts of Pakistan, and some migrant communities in Nepal. There are two dozen varieties of Marwari. Marwari is also referred to as simply Rajasthani.
Literary language is the form (register) of a language used when writing in a formal, academic, or particularly polite tone; when speaking or writing in such a tone, it can also be known as formal language. It may be the standardized variety of a language. It can sometimes differ noticeably from the various spoken lects, but the difference between literary and non-literary forms is greater in some languages than in others. If there is a strong divergence between a written form and the spoken vernacular, the language is said to exhibit diglossia.
In linguistics, mutual intelligibility is a relationship between languages or dialects in which speakers of different but related varieties can readily understand each other without prior familiarity or special effort. It is sometimes used as an important criterion for distinguishing languages from dialects, although sociolinguistic factors are often also used.
A pluricentric language or polycentric language is a language with several codified standard forms, often corresponding to different countries. Many examples of such languages can be found worldwide among the most-spoken languages, including but not limited to Chinese in mainland China, Taiwan and Singapore; English in the United States, United Kingdom, Canada, Australia, New Zealand, Ireland, South Africa, India, and elsewhere; and French in France, Canada, and elsewhere. The converse case is a monocentric language, which has only one formally standardized version. Examples include Japanese and Russian. In some cases, the different standards of a pluricentric language may be elaborated to appear as separate languages, e.g. Malaysian and Indonesian, Hindi and Urdu, while Serbo-Croatian is in an earlier stage of that process.
Autonomy and heteronomy are complementary attributes of a language variety describing its functional relationship with related varieties. The concepts were introduced by William A. Stewart in 1968, and provide a way of distinguishing a language from a dialect.
Linguistic demography is the statistical study of languages among all populations. Estimating the number of speakers of a given language is not straightforward, and various estimates may diverge considerably. This is first of all due to the question of defining "language" vs. "dialect". Identification of varieties as a single language or as distinct languages is often based on ethnic, cultural, or political considerations rather than mutual intelligibility. The second difficulty is multilingualism, complicating the definition of "native language". Finally, in many countries, insufficient census data add to the difficulties.
The Fangyan is a Chinese dictionary compiled in the early 1st century CE by the poet and philosopher Yang Xiong. It was the first Chinese dictionary to include significant regional vocabulary, and is considered the "most significant lexicographic work" of its era. His dictionary's preface explains how he spent 27 years amassing and collating the dictionary. Yang collected regionalisms from many sources, particularly the 'light carriage' surveys made during the Zhou and Qin dynasties, where imperial emissaries were sent into the countryside annually to record folk songs and idioms from across China, reaching as far north as Korea.
Central Asian Arabic or Jugari Arabic refers to a set of four closely-related varieties of Arabic currently facing extinction and spoken predominantly by Arab communities living in portions of Central Asia. These varieties are Bactrian Arabic, Bukhara Arabic, Qashqa Darya Arabic, and Khorasani Arabic.
Rawang, also known as Krangku, Kiutze (Qiuze), and Ch’opa, is a Sino-Tibetan language of India and Burma. Rawang has a high degree of internal diversity, and some varieties are not mutually intelligible. Most, however, understand Mutwang (Matwang), the standard dialect, and basis of written Rawang.