Lexical similarity

In linguistics, lexical similarity is a measure of the degree to which the word sets of two given languages are similar. A lexical similarity of 1 (or 100%) would mean a total overlap between vocabularies, whereas 0 means there are no common words.

Lexical similarity can be defined in different ways, and the results vary accordingly. For example, Ethnologue's method of calculation consists of comparing a regionally standardized wordlist (comparable to the Swadesh list) and counting those forms that show similarity in both form and meaning. Using this method, English was evaluated to have a lexical similarity of 60% with German and 27% with French.
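The counting procedure described above can be sketched in a few lines. This is only an illustration, not Ethnologue's actual procedure: the function names `forms_similar` and `lexical_similarity`, the 0.6 threshold, and the four-item wordlist are all hypothetical, and a string-matching ratio from Python's `difflib` stands in for the linguist's judgment of whether two forms are similar. Each pair is assumed to already share a meaning, so only similarity of form is tested.

```python
from difflib import SequenceMatcher


def forms_similar(a, b, threshold=0.6):
    """Crude stand-in for a linguist's judgment that two forms are similar.

    The threshold is an arbitrary assumption, not a published value.
    """
    return SequenceMatcher(None, a.lower(), b.lower()).ratio() >= threshold


def lexical_similarity(pairs, is_similar=forms_similar):
    """Share of wordlist entries whose forms in the two languages are similar.

    `pairs` holds (form in language 1, form in language 2) for the same meaning.
    """
    if not pairs:
        raise ValueError("the wordlist must contain at least one pair")
    matches = sum(1 for a, b in pairs if is_similar(a, b))
    return matches / len(pairs)


# Toy English/German list, far too short to say anything about the languages.
toy_pairs = [("water", "Wasser"), ("hand", "Hand"), ("dog", "Hund"), ("tree", "Baum")]
print(lexical_similarity(toy_pairs))  # 0.5 for this toy list
```

Passing a different `is_similar` function changes the result, which mirrors the point that the figure depends on how similarity is defined.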

Lexical similarity can be used to evaluate the degree of genetic relationship between two languages. Percentages higher than 85% usually indicate that the two languages being compared are likely to be related dialects.[1]

Lexical similarity is only one indication of the mutual intelligibility of two languages, since the latter also depends on the degree of phonetic, morphological, and syntactic similarity. The figure also varies with the wordlist used. For example, lexical similarity between French and English is considerable in lexical fields relating to culture, whereas it is smaller for basic (function) words. Unlike mutual intelligibility, lexical similarity can only be symmetrical.

Indo-European languages

The table below shows some lexical similarity values for pairs of selected Romance, Germanic, and Slavic languages, as collected and published by Ethnologue.[2]

Lexical similarity coefficients (rows: Language 1; columns: Language 2, by language code; "-" means no value is given)

Code  Language 1   ita   spa   por   fra   ron   cat   roh   srd   eng   deu   rus
ita   Italian      1     0.82  0.80  0.89  0.77  0.87  0.78  0.85  -     -     -
spa   Spanish      0.82  1     0.89  0.75  0.71  0.85  0.74  0.76  -     -     -
por   Portuguese   0.80  0.89  1     0.75  0.72  0.85  0.74  0.76  -     -     -
fra   French       0.89  0.75  0.75  1     0.75  -     0.78  0.80  0.27  0.29  -
ron   Romanian     0.77  0.71  0.72  0.75  1     0.73  0.72  0.74  -     -     -
cat   Catalan      0.87  0.85  0.85  -     0.73  1     0.76  0.75  -     -     -
roh   Romansh      0.78  0.74  0.74  0.78  0.72  0.76  1     0.74  -     -     -
srd   Sardinian    0.85  0.76  0.76  0.80  0.74  0.75  0.74  1     -     -     -
eng   English      -     -     -     0.27  -     -     -     -     1     0.60  0.24
deu   German       -     -     -     0.29  -     -     -     -     0.60  1     -
rus   Russian      -     -     -     -     -     -     -     -     0.24  -     1
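Because lexical similarity is symmetrical, each value in the table above needs to be stored only once per unordered pair of languages. The sketch below shows one possible way to do that, using a handful of coefficients copied from the table. The names `COEFFICIENTS` and `similarity` are hypothetical, and returning `None` for pairs the table leaves blank is a design assumption.

```python
# A few coefficients from the table, keyed by unordered pairs of language codes.
COEFFICIENTS = {
    frozenset({"ita", "spa"}): 0.82,
    frozenset({"ita", "fra"}): 0.89,
    frozenset({"spa", "por"}): 0.89,
    frozenset({"fra", "eng"}): 0.27,
    frozenset({"fra", "deu"}): 0.29,
    frozenset({"eng", "deu"}): 0.60,
    frozenset({"eng", "rus"}): 0.24,
}


def similarity(lang1, lang2):
    """Look up a coefficient regardless of argument order.

    Returns 1.0 for a language compared with itself and None when
    no value is stored for the pair.
    """
    if lang1 == lang2:
        return 1.0
    return COEFFICIENTS.get(frozenset({lang1, lang2}))


print(similarity("eng", "deu"), similarity("deu", "eng"))  # 0.6 0.6
print(similarity("ita", "eng"))  # None
```

Keying on a `frozenset` makes the symmetry structural: there is no way to store different values for the two orderings of the same pair.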

References

Notes

  1. "Methodology". Ethnologue. 2024-02-21. Retrieved 2024-05-31.
  2. See, for instance, lexical similarity data for French, German, English.
  3. Bolognesi, Roberto; Heeringa, Wilbert (2005). Sardegna fra tante lingue (PDF). Condaghes. p. 123. Archived from the original (PDF) on 2014-02-11. Retrieved 2017-04-14.
  4. Finkenstaedt, Thomas; Wolff, Dieter (1973). Ordered profusion; studies in dictionaries and the English lexicon. C. Winter. ISBN 3-533-02253-6.
  5. Williams, Joseph M. Origins of the English Language. Amazon.com. Retrieved 2010-04-21.
  6. Nation, I.S.P. (2001). Learning Vocabulary in Another Language. Cambridge University Press. p. 477. ISBN 0-521-80498-1.