In linguistics, lexical similarity is a measure of the degree to which the word sets of two given languages are similar. A lexical similarity of 1 (or 100%) would mean a total overlap between vocabularies, whereas 0 means there are no common words.

Lexical similarity can be defined in different ways, and the results vary accordingly. For example, Ethnologue's method of calculation consists of comparing a regionally standardized wordlist (comparable to the Swadesh list) and counting the forms that show similarity in both form and meaning. By this method, English has a lexical similarity of 60% with German and 27% with French.
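
The counting step can be sketched in a few lines of Python. This is only an illustration under stated assumptions: in practice the similarity judgment is made by linguists, whereas here a hypothetical helper, `forms_look_similar`, stands in for it with a crude string comparison and an arbitrary threshold of 0.5. The function names and the six-item wordlists are invented for the example and are far too small to yield a meaningful estimate.

```python
from difflib import SequenceMatcher


def forms_look_similar(form_a, form_b, threshold=0.5):
    """Crude stand-in for a linguist's judgment (an assumption, not Ethnologue's rule)."""
    # SequenceMatcher.ratio() can depend on argument order, so the forms are
    # compared in sorted order to keep the judgment symmetric.
    first, second = sorted((form_a.lower(), form_b.lower()))
    return SequenceMatcher(None, first, second).ratio() >= threshold


def lexical_similarity(wordlist_a, wordlist_b, judge=forms_look_similar):
    """Share of meanings found in both wordlists whose forms are judged similar."""
    shared_meanings = wordlist_a.keys() & wordlist_b.keys()
    if not shared_meanings:
        raise ValueError("the wordlists share no meanings to compare")
    matches = sum(judge(wordlist_a[m], wordlist_b[m]) for m in shared_meanings)
    return matches / len(shared_meanings)


# Toy wordlists keyed by meaning (hypothetical sample, not a standardized list).
english = {"water": "water", "hand": "hand", "house": "house",
           "dog": "dog", "tree": "tree", "stone": "stone"}
german = {"water": "Wasser", "hand": "Hand", "house": "Haus",
          "dog": "Hund", "tree": "Baum", "stone": "Stein"}

score = lexical_similarity(english, german)
print(f"{score:.0%} of the compared meanings were judged similar")
```

Passing a different `judge` callable changes the definition of similarity without touching the counting logic, which mirrors the point that results vary with the method chosen.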

Lexical similarity can be used to evaluate the degree of genetic relationship between two languages. Percentages higher than 85% usually indicate that the two languages being compared are likely to be related dialects.[1]

Lexical similarity is only one indicator of the mutual intelligibility of two languages, since intelligibility also depends on the degree of phonetic, morphological, and syntactic similarity. The measured value also varies with the wordlist used. For example, lexical similarity between French and English is considerable in lexical fields relating to culture, whereas it is smaller for basic (function) words. Unlike mutual intelligibility, lexical similarity is always symmetrical.
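
The dependence on the wordlist can be illustrated with a small Python sketch that scores each lexical field separately. The entries and their similar/not-similar judgments are hand-assigned for illustration and are not survey data. The extreme per-field values follow from how the toy sample was chosen, and the names `JUDGED_PAIRS`, `similarity_by_field`, and `overall_similarity` are hypothetical.

```python
from collections import defaultdict

# (lexical field, English form, French form, judged similar?)
# Judgments are hand-assigned for illustration only, not survey data.
JUDGED_PAIRS = [
    ("culture", "art", "art", True),
    ("culture", "music", "musique", True),
    ("culture", "theatre", "théâtre", True),
    ("culture", "literature", "littérature", True),
    ("function", "and", "et", False),
    ("function", "the", "le", False),
    ("function", "with", "avec", False),
    ("function", "of", "de", False),
]


def similarity_by_field(judged_pairs):
    """Return {field: share of pairs judged similar} for each lexical field."""
    matches = defaultdict(int)
    totals = defaultdict(int)
    for field, _form_a, _form_b, is_similar in judged_pairs:
        totals[field] += 1
        matches[field] += int(is_similar)
    return {field: matches[field] / totals[field] for field in totals}


def overall_similarity(judged_pairs):
    """Share of all pairs judged similar, ignoring the field labels."""
    return sum(is_similar for *_rest, is_similar in judged_pairs) / len(judged_pairs)


per_field = similarity_by_field(JUDGED_PAIRS)
print(per_field)
print(overall_similarity(JUDGED_PAIRS))
```

The same pair of languages thus receives a different score depending on which fields the wordlist samples, so a single coefficient is only meaningful together with the list it was computed from.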

Indo-European languages

The table below shows some lexical similarity values for pairs of selected Romance, Germanic, and Slavic languages, as collected and published by Ethnologue.[2]

Lexical similarity coefficients (rows: Language 1; columns: Language 2)

Code  Language     ita   spa   por   fra   ron   cat   roh   srd   eng   deu   rus
ita   Italian      1     0.82  0.80  0.89  0.77  0.87  0.78  0.85  -     -     -
spa   Spanish      0.82  1     0.89  0.75  0.71  0.85  0.74  0.76  -     -     -
por   Portuguese   0.80  0.89  1     0.75  0.72  0.85  0.74  0.76  -     -     -
fra   French       0.89  0.75  0.75  1     0.75  -     0.78  0.80  0.27  0.29  -
ron   Romanian     0.77  0.71  0.72  0.75  1     0.73  0.72  0.74  -     -     -
cat   Catalan      0.87  0.85  0.85  -     0.73  1     0.76  0.75  -     -     -
roh   Romansh      0.78  0.74  0.74  0.78  0.72  0.76  1     0.74  -     -     -
srd   Sardinian    0.85  0.76  0.76  0.80  0.74  0.75  0.74  1     -     -     -
eng   English      -     -     -     0.27  -     -     -     -     1     0.60  0.24
deu   German       -     -     -     0.29  -     -     -     -     0.60  1     -
rus   Russian      -     -     -     -     -     -     -     -     0.24  -     1
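
Because the coefficients are symmetrical, only one half of the table needs to be stored. The following Python sketch shows one possible representation, keyed by unordered language pairs, together with a check against the "higher than 85%" rule of thumb mentioned earlier. The values are copied from the table above. The names `SIMILARITY`, `similarity`, and `exceeds_dialect_threshold` are hypothetical, and pairs shown as "-" are treated as unknown rather than as zero.

```python
# Upper half of the table above; pairs shown as "-" are simply absent.
_ROWS = {
    "ita": {"spa": 0.82, "por": 0.80, "fra": 0.89, "ron": 0.77,
            "cat": 0.87, "roh": 0.78, "srd": 0.85},
    "spa": {"por": 0.89, "fra": 0.75, "ron": 0.71, "cat": 0.85,
            "roh": 0.74, "srd": 0.76},
    "por": {"fra": 0.75, "ron": 0.72, "cat": 0.85, "roh": 0.74, "srd": 0.76},
    "fra": {"ron": 0.75, "roh": 0.78, "srd": 0.80, "eng": 0.27, "deu": 0.29},
    "ron": {"cat": 0.73, "roh": 0.72, "srd": 0.74},
    "cat": {"roh": 0.76, "srd": 0.75},
    "roh": {"srd": 0.74},
    "eng": {"deu": 0.60, "rus": 0.24},
}

# Keying by an unordered pair makes every lookup symmetric by construction.
SIMILARITY = {
    frozenset((code_a, code_b)): value
    for code_a, row in _ROWS.items()
    for code_b, value in row.items()
}


def similarity(code_a, code_b):
    """Return the listed coefficient, 1.0 for a language with itself, or None if unlisted."""
    if code_a == code_b:
        return 1.0
    return SIMILARITY.get(frozenset((code_a, code_b)))


def exceeds_dialect_threshold(code_a, code_b, threshold=0.85):
    """Apply the 'higher than 85%' rule of thumb; None when no value is listed."""
    value = similarity(code_a, code_b)
    return None if value is None else value > threshold


print(similarity("deu", "eng"))                   # 0.6
print(similarity("fra", "cat"))                   # None (no value listed)
print(exceeds_dialect_threshold("spa", "cat"))    # False (0.85 is not above 0.85)
```

The threshold check is purely mechanical. A value above 85% is described only as usually indicating related dialects, so the result is a prompt for closer examination, not a classification.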

References

Notes

  1. "Methodology". Ethnologue. 2024-02-21. Retrieved 2024-05-31.
  2. See, for instance, lexical similarity data for French, German, English.
  3. Bolognesi, Roberto; Heeringa, Wilbert (2005). Sardegna fra tante lingue (PDF). Condaghes. p. 123. Archived from the original (PDF) on 2014-02-11. Retrieved 2017-04-14.
  4. Finkenstaedt, Thomas; Wolff, Dieter (1973). Ordered Profusion: Studies in Dictionaries and the English Lexicon. C. Winter. ISBN 3-533-02253-6.
  5. Williams, Joseph M. Origins of the English Language. Amazon.com. Retrieved 2010-04-21.
  6. Nation, I.S.P. (2001). Learning Vocabulary in Another Language. Cambridge University Press. p. 477. ISBN 0-521-80498-1.