This language recognition chart presents a variety of clues one can use to help determine the language in which a text is written.

Characters

The language of a foreign text can often be identified by looking for characters specific to that language.
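As a first pass, the script of a text can be guessed from its characters alone. The sketch below is only an illustration, not part of the chart: the function name `dominant_script` is hypothetical, and it assumes that the first word of each character's Unicode name (for example "CYRILLIC" or "ARMENIAN") is a good enough label for the script.

```python
import unicodedata
from collections import Counter


def dominant_script(text: str) -> str:
    """Return the most frequent script label among the letters of text.

    The label is the first word of the Unicode character name, which is
    an assumption of this sketch rather than an official script property.
    """
    counts = Counter()
    for ch in text:
        if not ch.isalpha():
            continue  # skip digits, spaces, punctuation and combining marks
        name = unicodedata.name(ch, "")
        if name:
            counts[name.split()[0]] += 1
    if not counts:
        return "UNKNOWN"
    return counts.most_common(1)[0][0]


if __name__ == "__main__":
    for sample in ["Привет мир", "Հայերեն", "ქართული", "hello"]:
        print(sample, "->", dominant_script(sample))
```

Knowing the script narrows the candidates (Armenian and Georgian scripts each point to essentially one language), while Latin, Cyrillic and Arabic scripts need the finer clues listed in the sections that follow.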

Latin alphabet (possibly extended)

Romance languages

Vocabulary draws heavily on Latin roots.

French (Français)

Spanish (Español)

Italian (Italiano)

Catalan (Català)

Romanian (Română)

Portuguese (Português)

Walloon (Walon)

Galician (Galego)

Germanic languages

English

Dutch (Nederlands)

West Frisian (Frysk)

Afrikaans (Afrikaans)

German (Deutsch)

Swedish (Svenska)

Danish (Dansk)

Norwegian (Norsk)

Icelandic (Íslenska)

Faroese (Føroyskt)

Baltic languages

Latvian (Latviešu)

Lithuanian (Lietuvių)

Slavic languages

Polish (Polski)

Czech (Čeština)

Slovak (Slovenčina)

Croatian (Hrvatski)

Serbian (Srpski/Српски)

Serbian Latin
Serbian Cyrillic

Celtic languages

Welsh (Cymraeg)

Irish (Gaeilge)

Scottish Gaelic (Gàidhlig)

Albanian (Shqip)

Maltese (Malti)

Iranian languages

Kurdish (Kurdî / كوردی)

Finno-Ugric languages

Finnish (Suomi)

Estonian (Eesti)

Hungarian (Magyar)

Eskimo–Aleut languages

Greenlandic

Southern Athabaskan languages

Navajo (Diné bizaad)

In addition to the above,

(Mescalero / Chiricahua) (Mashgaléń / Chidikáágo)

In addition to the above,

Guaraní

Japanese in Romaji (Nihongo/日本語)

(Note: Romaji is rarely used in ordinary Japanese writing. It is mostly used to help foreigners learn the pronunciation of Japanese.)

Hmong (Hmoob) written in Romanized Popular Alphabet

Vietnamese (tiếng Việt)

Vietnamese Quoted-Readable (VIQR)

Vietnamese VNI encoding

Vietnamese Telex

Chinese, Romanized

Standard Mandarin (現代標準漢語)

Pinyin
Wade–Giles
Gwoyeu Romatzyh

Southern Min / Min-Nan (Bân-lâm-gí/Bân-lâm-gú) in Pe̍h-ōe-jī

Austronesian languages

Malay (bahasa Melayu) and Indonesian (bahasa Indonesia)

May contain the following:
Prefixes: me-, mem-, memper-, pe-, per-, di-, ke-
Suffixes: -kan, -an, -i
Others (these almost always written in lowercase): yang, dan, di, ke, oleh, itu

Malay and Indonesian are mutually intelligible to proficient speakers, although translators and interpreters will generally be specialists in one or the other language. See Comparison of Standard Malay and Indonesian.

Frequent use of the letter 'a' (comparable to the frequency of the English 'e').
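Two of the clues above, the lowercase function words and the high frequency of the letter 'a', are easy to measure. The following is a rough sketch under stated assumptions: the function name `malay_indonesian_clues` and the sample sentence are illustrative only, and the affix lists are left out because short prefixes such as "me-" or "di-" also match many words in unrelated languages.

```python
import re

# Function words taken from the list in this section.
FUNCTION_WORDS = {"yang", "dan", "di", "ke", "oleh", "itu"}


def malay_indonesian_clues(text: str) -> dict:
    """Measure two clues: share of listed function words, share of letter 'a'."""
    words = re.findall(r"[a-z]+", text.lower())
    if not words:
        return {"function_word_share": 0.0, "letter_a_share": 0.0}
    letters = "".join(words)
    function_hits = sum(word in FUNCTION_WORDS for word in words)
    return {
        "function_word_share": function_hits / len(words),
        "letter_a_share": letters.count("a") / len(letters),
    }


if __name__ == "__main__":
    # Illustrative sentence, not taken from the chart.
    sample = "Saya membaca buku yang ada di atas meja itu dan pergi ke pasar"
    print(malay_indonesian_clues(sample))
```

High values on both measures suggest Malay or Indonesian, but neither measure separates the two languages from each other.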

Polynesian languages

Most Polynesian languages use A E F G H I K L M N O P R S T U V and ʻ (sometimes written ' or Q)

Tongan (lea fakatonga)
Samoan (gagana samoa)
Wallisian (lea faka'uvea)
East Futunan (lea fakafutuna)

Turkic languages

Note that some Turkic languages, such as Azeri and Turkmen, use a similar Latin alphabet (often Jaŋalif) and similar words, and might be confused with Turkish. Azeri has the letters Əə, Xx and Qq, which are not present in the Turkish alphabet, and Turkmen has Ää, Žž, Ňň, Ýý and Ww. Latin characters used uniquely (or nearly uniquely) for Turkic languages: Əə, Ŋŋ, Ɵɵ, Ьь, Ƣƣ, Ğğ, İ, and ı. All Turkic languages can form long words by adding multiple suffixes.

Turkish (Türkçe/Türkiye Türkçesi)

Turkish Alphabet

Lowercase: a b c ç d e f g ğ h ı i j k l m n o ö p r s ş t u ü v y z

Uppercase: A B C Ç D E F G Ğ H I İ J K L M N O Ö P R S Ş T U Ü V Y Z
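The two alphabet rows pair ı with I and i with İ, which differs from the default case mapping most software applies. The sketch below shows one way to handle this in Python; the helper names `turkish_lower` and `turkish_upper` are hypothetical, and the approach (swap the four i-letters by hand, then apply the default mapping to everything else) is an assumption of this example rather than a standard library feature.

```python
DOTTED_CAPITAL_I = "\u0130"  # İ
DOTLESS_SMALL_I = "\u0131"   # ı


def turkish_lower(text: str) -> str:
    """Lowercase text using the Turkish pairs I -> ı and İ -> i."""
    text = text.replace("I", DOTLESS_SMALL_I).replace(DOTTED_CAPITAL_I, "i")
    return text.lower()


def turkish_upper(text: str) -> str:
    """Uppercase text using the Turkish pairs i -> İ and ı -> I."""
    text = text.replace("i", DOTTED_CAPITAL_I).replace(DOTLESS_SMALL_I, "I")
    return text.upper()


if __name__ == "__main__":
    print(turkish_lower("IŞIK"))   # dotless ı expected in the result
    print(turkish_upper("izmir"))  # dotted İ expected in the result
    # The default mapping turns plain I into dotted i, which is wrong for Turkish.
    print("I".lower() == "i")
```

A text in which capital İ or lowercase ı appears at all is a strong hint that it is written in Turkish or a closely related Turkic language.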

Common words
Misc.

Azeri (Azərbaycanca)

Azeri can be easily recognized by the frequent use of ə. This letter is not used in any other officially recognized modern Latin alphabet. In addition, it uses the letters x and q, which are not used in Turkish.
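The letter clues for Azeri and Turkmen given in this section can be turned into a simple counter. This is a minimal sketch, not a reliable classifier: the function name `guess_turkic` and the sample strings are illustrative, and the absence of marker letters only means the text may be Turkish, since Turkish has no marker letters in these two sets.

```python
# Marker letters taken from this section.
AZERI_MARKERS = set("əƏxXqQ")
TURKMEN_MARKERS = set("äÄžŽňŇýÝwW")


def guess_turkic(text: str) -> str:
    """Guess Azeri or Turkmen from marker letters; otherwise stay undecided."""
    azeri = sum(ch in AZERI_MARKERS for ch in text)
    turkmen = sum(ch in TURKMEN_MARKERS for ch in text)
    if azeri == 0 and turkmen == 0:
        return "Turkish or undetermined"
    if azeri == turkmen:
        return "undetermined"
    return "Azeri" if azeri > turkmen else "Turkmen"


if __name__ == "__main__":
    # Illustrative samples, not taken from the chart.
    print(guess_turkic("Azərbaycan dili"))
    print(guess_turkic("Aşgabat Türkmenistanyň paýtagtydyr"))
    print(guess_turkic("Türkiye'nin başkenti Ankara'dır"))
```

Loanwords and foreign names can contain x, q or w in any of these languages, so the counts are better trusted on longer passages.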

Chinese (中文)

Simplified Chinese (简体) vs Traditional Chinese (繁體)

Note: Many characters were not simplified. As a result, it is common for a short word or phrase to be identical between Simplified and Traditional, but it is rare for an entire sentence to be identical as well.
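The note above suggests a counting approach: look for characters that exist in only one of the two forms. The sketch below is an illustration under stated assumptions: the name `guess_chinese_variant` is hypothetical, and the ten character pairs are a small hand-picked sample of well-known simplified and traditional forms, not a list taken from this chart.

```python
# Simplified -> Traditional; a small illustrative sample, far from complete.
SAMPLE_PAIRS = {
    "们": "們", "这": "這", "国": "國", "个": "個", "说": "說",
    "对": "對", "时": "時", "会": "會", "学": "學", "来": "來",
}
SIMPLIFIED_ONLY = set(SAMPLE_PAIRS.keys())
TRADITIONAL_ONLY = set(SAMPLE_PAIRS.values())


def guess_chinese_variant(text: str) -> str:
    """Compare how many simplified-only and traditional-only characters occur."""
    simplified = sum(ch in SIMPLIFIED_ONLY for ch in text)
    traditional = sum(ch in TRADITIONAL_ONLY for ch in text)
    if simplified == traditional:
        return "undetermined"
    return "simplified" if simplified > traditional else "traditional"


if __name__ == "__main__":
    print(guess_chinese_variant("我们学中文"))
    print(guess_chinese_variant("我們學中文"))
    print(guess_chinese_variant("中文"))  # identical in both forms
```

The "undetermined" result for a short phrase matches the note: a phrase made only of unsimplified characters carries no evidence either way.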

Common radicals different between Traditional and Simplified:

Common characters different between Traditional and Simplified:

Standard written Chinese (based on Mandarin) vs written Vernacular Cantonese

See also: zh:粵語

Note: Apart from Hong Kong, there are also Cantonese speakers in southern Mainland China, Malaysia and Singapore[1], so written Cantonese may appear in either Simplified or Traditional characters.


Common characters in Vernacular Cantonese that do not occur or seldom occur in Mandarin:

Some of the above characters are not supported in all character encodings, so sometimes the 口 radical on the left is substituted with a 0 or o, e.g.


Sometimes, Cantonese and Mandarin use different Chinese characters to express the same meaning. Using the character common in Cantonese when speaking or writing Mandarin may confuse a native speaker or make the text difficult to understand, and vice versa. Some examples are: (Cantonese vs Mandarin)


Some Chinese words used to build Cantonese vocabulary are rarely or never used in modern Mandarin. Some examples are: (Cantonese vs Mandarin)


Some vocabulary built from Cantonese words is used in daily life in southern China but is not used in modern Mandarin. Some examples are:


Finally, when terms are introduced to China from other countries (especially the US and the UK), Cantonese and Mandarin often receive different translations: Cantonese often translates according to the English pronunciation of the term, while Mandarin often translates according to its meaning. Some examples are: (Cantonese vs Mandarin)

Japanese (日本語)

Korean (한국어/조선말)

Khmer (ភាសាខ្មែរ)

Khmer is written using the distinctive Khmer alphabet.

Greek (Ελληνικά)

Modern Greek is written with the Greek alphabet in monotonic, polytonic or atonic form, following either Demotic (Triantafyllidis) grammar or Katharevousa grammar. Some people write in Greeklish (Greek in Latin script), which may be visual-based, orthographic, phonetic, or an inconsistent mixture of these. The only official orthographic forms of the Greek language are monotonic and polytonic.
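The three accent systems can often be told apart by the diacritics present. The sketch below is a heuristic under stated assumptions, not a definitive test: the name `classify_greek` is hypothetical, and it assumes that breathings, the circumflex, the grave accent or the iota subscript indicate polytonic text, that a lone acute accent indicates monotonic text, and that Greek letters with no accents at all indicate atonic text.

```python
import unicodedata

# Combining marks that, in this sketch, are taken as signs of polytonic text:
# grave, smooth breathing, rough breathing, circumflex, iota subscript.
POLYTONIC_MARKS = {"\u0300", "\u0313", "\u0314", "\u0342", "\u0345"}
ACUTE = "\u0301"  # the monotonic tonos decomposes to this mark


def classify_greek(text: str) -> str:
    """Classify Greek-script text as polytonic, monotonic or atonic."""
    decomposed = unicodedata.normalize("NFD", text)
    has_greek = any(
        unicodedata.name(ch, "").startswith("GREEK") for ch in decomposed
    )
    if not has_greek:
        return "not Greek"
    if any(ch in POLYTONIC_MARKS for ch in decomposed):
        return "polytonic"
    if ACUTE in decomposed:
        return "monotonic"
    return "atonic"


if __name__ == "__main__":
    for sample in ["Καλημέρα", "ΚΑΛΗΜΕΡΑ", "\u1f18\u03bd \u1f00\u03c1\u03c7\u1fc7"]:
        print(sample, "->", classify_greek(sample))
```

Text written entirely in capitals usually carries no accents, so an "atonic" result on a short all-caps string does not prove the writer uses the atonic style. Greeklish is reported as "not Greek" because it contains no Greek letters.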

Normal Modern Greek (Greek Monotonic)

Pre-1980s Greek (Greek Polytonic)

Katharevousa, Dimotiki (Triantafyllidis' grammar)

Ancient Greek

Greek Atonic

Greek in Greeklish

Orthographic Greeklish

Phonetic Greeklish

Visual-based Greeklish

Messed-up (Mixed) Greeklish

Armenian (Հայերեն)

Armenian can be recognized by its unique 39-letter alphabet:

Ա Բ Գ Դ Ե Զ Է Ը Թ Ժ Ի Լ Խ Ծ Կ Հ Ձ Ղ Ճ Մ Յ Ն Շ Ո Չ Պ Ջ Ռ Ս Վ Տ Ր Ց Ւ Փ Ք ԵՎ(և) Օ Ֆ

Georgian (ქართული)

Georgian can be recognised by its unique alphabet (note some characters have fallen out of use).

ა ბ გ დ ე ვ ზ ჱ თ ი კ ლ მ ნ ჲ ო პ ჟ რ ს ტ ჳ უ ფ ქ ღ ყ შ ჩ ც ძ წ ჭ ხ ჴ ჯ ჰ ჵ ჶ ჷ ჸ

Cyrillic alphabet

Bolding denotes letters unique to the language

Slavic languages

Belarusian (беларуская)

Bulgarian (български)

Macedonian (македонски)

Russian (русский)

Serbian (српски)

Ukrainian (українська)

Mongolian

Montenegrin

Ossetian

Arabic alphabet

Arabic (العربية)

Persian (فارسی)

Except in very rare cases, verbs come at the end of a phrase.

Urdu (اردو)

Syriac alphabet

Syriac (ܐܬܘܪܝܐ)

Dravidian languages

Kannada

Tamil

அ ஆ இ ஈ உ ஊ எ ஏ ஐ ஒ ஓ ஔ க ங ச ஞ ட ண த ந ப ம ய ர ல வ ழ ள ற ன

Telugu

Telugu has 56 characters (Aksharamulu) including vowels (Achchulu) and consonants (Hallulu). Telugu uses eighteen vowels, each of which has both an independent form and a diacritic form used with consonants to create syllables. The language makes a distinction between short and long vowels.

అ ఆ ఇ ఈ ఉ ఊ ఋ ౠ ఌ ౡ ఎ ఏ ఐ ఒ ఓ ఔ అం అః క ఖ గ ఘ ఙ చ ఛ జ ఝ ఞ ట ఠ డ ఢ ణ త థ ద ధ న ప ఫ బ భ మ య ర ఱ ల ళ వ శ ష స హ

౦ ౧ ౨ ౩ ౪ ౫ ౬ ౭ ౮ ౯

Bengali

The Bengali alphabet or Bangla alphabet (Bengali: বাংলা বর্ণমালা, bangla bôrnômala) or Bengali script (Bengali: বাংলা লিপি, bangla lipi) is the writing system, originating in the Indian subcontinent, for the Bengali language and is the fifth most widely used writing system in the world. The script is used for other languages like Assamese, Maithili, Meithei and Bishnupriya Manipuri, and has historically been used to write Sanskrit within Bengal.

Bengali

Bengali has a 50-letter alphabet.

অ আ ই ঈ উ ঊ ঋ এ ঐ ও ঔ

ক খ গ ঘ ঙ চ ছ জ ঝ ঞ ট ঠ ড ঢ ণ ত থ দ ধ ন প ফ ব ভ ম য র ল শ ষ স হ ড় ঢ় য় ৎ ঃ ং ঁ

া ি ী ু ূ ৃ ে ৈ ো ৌ

Assamese

অ আ ই ঈ উ ঊ ঋ এ ঐ ও ঔ

ক খ গ ঘ ঙ চ ছ জ ঝ ঞ ট ঠ ড ঢ ণ ত থ দ ধ ন প ফ ব ভ ম য ৰ ল শ ষ স হ ড় ঢ় য় ৎ ঃ ং ঁ

া ি ী ু ূ ৃ ে ৈ ো ৌ
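In the two letter lists above, the visible difference is the letter for "r": র in the Bengali list and ৰ in the Assamese list. A minimal sketch of using that single clue follows; the function name `guess_bengali_or_assamese` is hypothetical, and relying on one letter is an assumption of this example that will fail on short texts or on other languages written in the same script.

```python
BENGALI_RA = "\u09b0"   # র, from the Bengali list
ASSAMESE_RA = "\u09f0"  # ৰ, from the Assamese list


def guess_bengali_or_assamese(text: str) -> str:
    """Guess the language from which form of the letter 'r' occurs more often."""
    bengali = text.count(BENGALI_RA)
    assamese = text.count(ASSAMESE_RA)
    if bengali == assamese:
        return "undetermined"
    return "Bengali" if bengali > assamese else "Assamese"


if __name__ == "__main__":
    print(guess_bengali_or_assamese("\u09b0\u09be\u09ae"))  # uses র
    print(guess_bengali_or_assamese("\u09f0\u09be\u09ae"))  # uses ৰ
```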

Canadian Aboriginal syllabics

In modern writing, Canadian Aboriginal syllabics are indicative of Cree languages, Inuktitut, or Ojibwe, though the latter two are also written in alternative scripts. The basic glyph set is ᐁ ᐱ ᑌ ᑫ ᒉ ᒣ ᓀ ᓭ ᔦ, each of which may appear in any of four orientations, boldfaced, superscripted, and with diacritics including ᑊ ᐟ ᐠ ᐨ ᒼ ᐣ ᐢ ᐧ ᐤ ᐦ ᕽ ᓫ ᕑ. This abugida has also been used for Blackfoot.

Other North American syllabics

Cherokee

Cherokee writing features a unique syllabary consisting of the following characters:

ᎠᎡᎢᎣᎤᎥᎦᎧᎨᎩᎪᎫᎬᎭᎮᎯᎰᎱᎲᎳᎴᎵᎶᎷᎸᎹᎺᎻᎼᎽᎾᎿᏀᏁᏂᏃᏄᏅᏆᏇᏈᏉᏊᏋᏌᏍᏎᏏᏐᏑᏒᏓᏔᏕᏖᏗᏘᏙᏚᏛᏜᏝᏞᏟᏠᏡᏢᏣᏤᏥᏦᏧᏨᏩᏪᏫᏬᏭᏮᏯᏰᏱᏲᏳᏴ

Artificial languages

Esperanto (Esperanto)

Klingon (tlhIngan Hol)

Lojban (lojban.)

Toki Pona (toki pona)

Full alphabet: p, t, k, s, m, n, l, j, w, a, e, i, o, u
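Because the alphabet is so small, a quick exclusion test is possible: any text containing a letter outside these fourteen cannot be plain Toki Pona. The sketch below illustrates this; the function name `uses_only_toki_pona_letters` is hypothetical, and lowercasing the input first is an assumption made so that capitalized names do not fail the check.

```python
# The fourteen letters listed above.
TOKI_PONA_LETTERS = set("ptksmnljwaeiou")


def uses_only_toki_pona_letters(text: str) -> bool:
    """Return True if every letter in text belongs to the Toki Pona alphabet."""
    letters = [ch for ch in text.lower() if ch.isalpha()]
    # An empty text gives no evidence, so it is reported as False.
    return bool(letters) and all(ch in TOKI_PONA_LETTERS for ch in letters)


if __name__ == "__main__":
    print(uses_only_toki_pona_letters("toki pona li pona"))
    print(uses_only_toki_pona_letters("hello world"))
```

Passing the check is necessary but not sufficient: short words in many other languages also happen to use only these letters.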

External links

  1. https://www.oakton.edu/user/4/billtong/chinaclass/Language/cantonese.htm