In linguistics, a grapheme is the smallest functional unit of a writing system.^[1] The word grapheme is derived from Ancient Greek γράφω (gráphō) 'write' and the suffix -eme by analogy with phoneme and other names of emic units. The study of graphemes is called graphemics. The concept of graphemes is abstract and similar to the notion in computing of a character. By comparison, a specific shape that represents any particular grapheme in a given typeface is called a glyph.

Conceptualization

There are two main, opposing conceptions of the grapheme.^[2]

In the so-called referential conception, graphemes are interpreted as the smallest units of writing that correspond to sounds (more precisely, phonemes). Under this conception, the sh in the written English word shake is a grapheme because it represents the phoneme /ʃ/. The referential conception is linked to the dependency hypothesis, which claims that writing merely depicts speech.

By contrast, the analogical concept defines graphemes analogously to phonemes, i.e. via written minimal pairs such as shake vs. snake. In this example, h and n are graphemes because they distinguish two words. This analogical concept is associated with the autonomy hypothesis which holds that writing is a system in its own right and should be studied independently from speech. Both concepts have weaknesses.^[3]
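As a rough illustration of the analogical procedure, the following Python sketch compares two equal-length spellings letter by letter and reports the contrast when they differ in exactly one position. The function name `minimal_pair_contrast` is hypothetical, and the sketch treats every letter as a candidate unit; confirming that both spellings are real, distinct words is left to the analyst.

```python
def minimal_pair_contrast(word1: str, word2: str):
    """Return (position, letter1, letter2) if the two spellings differ in
    exactly one letter position, otherwise None."""
    if len(word1) != len(word2):
        return None  # this simple sketch only compares equal-length spellings
    diffs = [(i, a, b) for i, (a, b) in enumerate(zip(word1, word2)) if a != b]
    return diffs[0] if len(diffs) == 1 else None


print(minimal_pair_contrast("shake", "snake"))  # (1, 'h', 'n')
print(minimal_pair_contrast("shake", "stoke"))  # None: differs in two positions
```

The single contrasting position in shake vs. snake is exactly what licenses treating h and n as graphemes under the analogical concept.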

Some models adhere to both concepts simultaneously by positing two separate units,^[4] given names such as graphemic grapheme for the grapheme according to the analogical concept (h in shake) and phonological-fit grapheme for the grapheme according to the referential concept (sh in shake).^[5]
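The difference between the two units can be sketched in Python by segmenting the same spelling twice. This is a minimal sketch under stated assumptions: the multigraph inventory `MULTIGRAPHS` is a hypothetical, deliberately tiny list, and the segmentation ignores context entirely (for instance, it leaves the final silent e of shake as a unit of its own, which a fuller phonological-fit analysis would need to handle).

```python
# Hypothetical, deliberately tiny inventory of English multigraphs; a real
# phonological-fit analysis would need a much larger, context-sensitive list.
MULTIGRAPHS = ("sh", "ch", "th", "ck")


def graphemic_graphemes(word: str) -> list[str]:
    """Analogical view: every letter is its own unit."""
    return list(word)


def phonological_fit_graphemes(word: str) -> list[str]:
    """Referential view: group known multigraphs into a single unit."""
    units, i = [], 0
    while i < len(word):
        for mg in MULTIGRAPHS:
            if word.startswith(mg, i):
                units.append(mg)
                i += len(mg)
                break
        else:
            # No multigraph starts at position i, so take a single letter.
            units.append(word[i])
            i += 1
    return units


print(graphemic_graphemes("shake"))         # ['s', 'h', 'a', 'k', 'e']
print(phonological_fit_graphemes("shake"))  # ['sh', 'a', 'k', 'e']
```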

More recent approaches interpret the grapheme semiotically as a dyadic linguistic sign^[6] and define it as a minimal unit of writing that is both lexically distinctive and corresponds to a linguistic unit (a phoneme, syllable, or morpheme).^[7]

Notation

Graphemes are often written within angle brackets, e.g. ⟨a⟩.^[8] This is analogous to the slash notation /a/ used for phonemes. Just as square brackets [a] are used for phones, vertical lines are sometimes used for glyphs, e.g. |ɑ|.^[9]

Glyphs

In the same way that the surface forms of phonemes are speech sounds or phones (and different phones representing the same phoneme are called allophones), the surface forms of graphemes are glyphs (sometimes graphs), namely concrete written representations of symbols (and different glyphs representing the same grapheme are called allographs).

Thus, a grapheme can be regarded as an abstraction of a collection of glyphs that are all functionally equivalent.

For example, in written English (or other languages using the Latin alphabet), the lowercase Latin letter "a" has two different physical representations: the double-story "a" and the single-story "ɑ". Because substituting one for the other cannot change the meaning of a word, they are considered allographs of the same grapheme, which can be written ⟨a⟩. Similarly, the grapheme for the Arabic numeral zero has a single semantic identity and a single Unicode code point, U+0030, yet it may be drawn with or without a slash (the slashed zero). Italic and bold forms are also allographic, as is the variation between serif (as in Times New Roman) and sans-serif (as in Helvetica) forms.
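Unicode's encoding choices loosely mirror this distinction, although Unicode is a character encoding rather than a theory of graphemes. The Python sketch below, using only the standard `unicodedata` module, shows that the digit zero has one code point however it is drawn, and that the few styled letters Unicode encodes separately for compatibility (here, mathematical bold and italic small a) are folded back to plain "a" by NFKC normalization. The single-story "ɑ" is a different case: it is encoded separately because the IPA contrasts it with "a", so normalization leaves it alone.

```python
import unicodedata

# Styling (bold, italic, serif vs. sans-serif) is normally left to the font.
# Some styled letters are encoded separately for compatibility; NFKC maps
# them back to the base letter.
bold_a = "\U0001D41A"    # MATHEMATICAL BOLD SMALL A
italic_a = "\U0001D44E"  # MATHEMATICAL ITALIC SMALL A

for ch in (bold_a, italic_a):
    print(unicodedata.name(ch), "->", unicodedata.normalize("NFKC", ch))

# U+0251 LATIN SMALL LETTER ALPHA is a distinct IPA character, not a styled "a".
print(unicodedata.normalize("NFKC", "\u0251") == "a")  # False

# The zero has one code point whether or not a font draws it slashed.
print(f"U+{ord('0'):04X}")  # U+0030
```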

There is some disagreement as to whether capital and lowercase letters are allographs or distinct graphemes. Capitals generally appear in triggering contexts that do not change the meaning of a word: a proper name, for example, the beginning of a sentence, or an all-caps newspaper headline. In other contexts, capitalization can determine meaning: compare Polish, the name of a language, with polish, a substance for shining shoes.
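The practical effect of treating capitals as allographs can be seen in software that compares text case-insensitively. In the minimal Python sketch below (the helper name `same_ignoring_case` is hypothetical), Python's built-in `str.casefold` erases the distinction between Polish and polish, while an ordinary comparison keeps it.

```python
def same_ignoring_case(a: str, b: str) -> bool:
    # Case folding treats capitals as variants of lowercase letters,
    # i.e. roughly as allographs of one grapheme.
    return a.casefold() == b.casefold()


print(same_ignoring_case("Polish", "polish"))  # True: the contrast is lost
print("Polish" == "polish")                    # False: capitals kept distinct
```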

Some linguists consider digraphs like the ⟨sh⟩ in ship to be distinct graphemes, but these are generally analyzed as sequences of graphemes. Non-stylistic ligatures, however, such as ⟨æ⟩, are distinct graphemes, as are various letters with distinctive diacritics, such as ⟨ç⟩.
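Unicode happens to encode these three cases in a way that parallels the analysis above, though this is a property of the encoding rather than evidence for it. The Python sketch below uses the standard `unicodedata` module: the digraph ⟨sh⟩ is simply two letters, the ligature ⟨æ⟩ has a code point of its own, and ⟨ç⟩ is a single precomposed code point in NFC form that NFD normalization splits into a base letter plus a combining cedilla.

```python
import unicodedata

# The digraph <sh> is stored as a sequence of two ordinary letters.
print([unicodedata.name(c) for c in "sh"])
# ['LATIN SMALL LETTER S', 'LATIN SMALL LETTER H']

# The ligature <æ> has a code point of its own.
print(unicodedata.name("æ"))  # LATIN SMALL LETTER AE

# <ç> is one precomposed code point in NFC ...
c_cedilla = unicodedata.normalize("NFC", "ç")
print(len(c_cedilla), unicodedata.name(c_cedilla))
# 1 LATIN SMALL LETTER C WITH CEDILLA

# ... but NFD decomposes it into the base letter plus a combining diacritic.
decomposed = unicodedata.normalize("NFD", c_cedilla)
print([unicodedata.name(c) for c in decomposed])
# ['LATIN SMALL LETTER C', 'COMBINING CEDILLA']
```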

Identical glyphs may not always represent the same grapheme. For example, the three letters ⟨A⟩, ⟨А⟩ and ⟨Α⟩ appear identical but each has a different meaning: in order, they are the Latin letter A, the Cyrillic letter Azǔ/Азъ and the Greek letter Alpha. Each has its own code point in Unicode: U+0041 A LATIN CAPITAL LETTER A, U+0410 А CYRILLIC CAPITAL LETTER A and U+0391 Α GREEK CAPITAL LETTER ALPHA.
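The separate code points can be inspected with Python's standard `unicodedata` module. The sketch below writes the three letters as escape sequences so that the example does not depend on which lookalike character was typed; software comparing text by code point keeps the three apart even though a reader cannot tell them apart.

```python
import unicodedata

# Latin A, Cyrillic A and Greek Alpha: visually identical capitals.
for ch in ("\u0041", "\u0410", "\u0391"):
    print(f"U+{ord(ch):04X}", unicodedata.name(ch))
# U+0041 LATIN CAPITAL LETTER A
# U+0410 CYRILLIC CAPITAL LETTER A
# U+0391 GREEK CAPITAL LETTER ALPHA

# A plain string comparison treats them as different characters.
print("\u0041" == "\u0410")  # False
```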

Types of grapheme

The principal types of graphemes are logograms (more accurately termed morphograms^[10]), which represent words or morphemes (for example Chinese characters, the ampersand "&" representing the word and, Arabic numerals); syllabic characters, representing syllables (as in Japanese kana); and alphabetic letters, corresponding roughly to phonemes (see next section). For a full discussion of the different types, see Writing system § Functional classification.

There are additional graphemic components used in writing, such as punctuation marks, mathematical symbols, word dividers such as the space, and other typographic symbols. Ancient logographic scripts often used silent determinatives to disambiguate the meaning of a neighboring (non-silent) word.

Relationship with phonemes

As mentioned in the previous section, in languages that use alphabetic writing systems, many graphemes stand, in principle, for the phonemes (significant sounds) of the language. In practice, however, the orthographies of such languages deviate at least somewhat from the ideal of exact grapheme–phoneme correspondence. A phoneme may be represented by a multigraph (a sequence of more than one grapheme), as the digraph sh represents a single sound in English; conversely, a single grapheme may represent more than one phoneme, as with the Russian letter я or the Spanish c. Some graphemes represent no sound at all, like the b in English debt or the Spanish h, which is silent in native words outside the digraph ch. The rules of correspondence between graphemes and phonemes often become complex or irregular, particularly because historical sound changes are not necessarily reflected in spelling.

"Shallow" orthographies, such as those of standard Spanish and Finnish, have relatively regular (though not always one-to-one) correspondence between graphemes and phonemes, while those of French and English, known as deep orthographies, have much less regular correspondence.
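The Spanish c shows how a shallow orthography can be regular without being one-to-one: the letter's value depends predictably on the following letter. The Python sketch below is a simplification under stated assumptions: the function name `spanish_c` is hypothetical, it handles only lowercase native words, and it models regional variation only through a `seseo` flag (/s/ rather than /θ/ before e and i, as in Latin American Spanish).

```python
def spanish_c(word: str, i: int, seseo: bool = False) -> str:
    """Phoneme for the letter c at index i of a lowercase Spanish word.

    Simplified: ignores loanwords and regional detail beyond seseo.
    """
    if word[i] != "c":
        raise ValueError(f"expected 'c' at index {i} of {word!r}")
    nxt = word[i + 1] if i + 1 < len(word) else ""
    if nxt == "h":
        return "tʃ"  # the digraph ch
    if nxt in ("e", "i", "é", "í"):
        return "s" if seseo else "θ"
    return "k"  # before a, o, u, a consonant, or at the end of a word


print(spanish_c("casa", 0))              # k
print(spanish_c("cena", 0))              # θ
print(spanish_c("cena", 0, seseo=True))  # s
print(spanish_c("chico", 0))             # tʃ
```

Because a short rule like this predicts the value of c, Spanish spelling can be read aloud reliably even though c corresponds to more than one phoneme.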

Multigraphs representing a single phoneme are normally treated as combinations of separate letters, not as graphemes in their own right. However, in some languages a multigraph may be treated as a single unit for the purposes of collation; for example, in a Czech dictionary, the section for words that start with ⟨ch⟩ comes after that for ⟨h⟩.^[11] For more examples, see Alphabetical order § Language-specific conventions.
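The Czech convention can be approximated in Python by sorting on a custom key that first splits each word into collation units, keeping ch together, and then ranks those units. This is a minimal sketch, not the full Czech collation: the order `ORDER` is a simplified assumption covering only plain lowercase Latin letters (letters with háček or acute accents are omitted), and the helper names are hypothetical.

```python
# Simplified Czech primary order: plain Latin letters only, with the digraph
# "ch" ranked as a single unit immediately after "h".
ORDER = list("abcdefgh") + ["ch"] + list("ijklmnopqrstuvwxyz")
RANK = {unit: n for n, unit in enumerate(ORDER)}


def czech_units(word: str) -> list[str]:
    """Split a lowercase word into collation units, keeping 'ch' together."""
    units, i = [], 0
    while i < len(word):
        step = 2 if word.startswith("ch", i) else 1
        units.append(word[i:i + step])
        i += step
    return units


def czech_key(word: str) -> list[int]:
    return [RANK[u] for u in czech_units(word)]


words = ["chata", "ihned", "hrad", "cukr"]
print(sorted(words))                 # ['chata', 'cukr', 'hrad', 'ihned']
print(sorted(words, key=czech_key))  # ['cukr', 'hrad', 'chata', 'ihned']
```

A plain code-point sort files chata under c, whereas the custom key places it after every word beginning with h, as in a Czech dictionary.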

References

1. Coulmas, F. (1996). The Blackwell Encyclopedia of Writing Systems. Oxford: Blackwell, p. 174.
2. Kohrt, M. (1986). "The term 'grapheme' in the history and theory of linguistics". In G. Augst (Ed.), New Trends in Graphemics and Orthography. Berlin: De Gruyter, pp. 80–96. doi:10.1515/9783110867329.80
3. Lockwood, D. G. (2001). "Phoneme and grapheme: How parallel can they be?" LACUS Forum 27, pp. 307–316.
4. Rezec, O. (2013). "Ein differenzierteres Strukturmodell des deutschen Schriftsystems" [A more differentiated structural model of the German writing system]. Linguistische Berichte 234, pp. 227–254.
5. Herrick, E. M. (1994). "Of course a structural graphemics is possible!" LACUS Forum 21, pp. 413–424.
6. Fedorova, L. (2013). "The development of graphic representation in abugida writing: The akshara's grammar". Lingua Posnaniensis 55(2), pp. 49–66. doi:10.2478/linpo-2013-0013
7. Meletis, D. (2019). "The grapheme as a universal basic unit of writing". Writing Systems Research. doi:10.1080/17586801.2019.1697412
8. Crystal, D. (1997). The Cambridge Encyclopedia of Language (2nd ed.). Cambridge: Cambridge University Press, p. 196.
9. Meletis, D.; Dürscheid, C. (2022). Writing Systems and Their Use: An Overview of Grapholinguistics. De Gruyter Mouton, p. 64. ISBN 978-3-110-75777-4.
10. Joyce, T. (2011). "The significance of the morphographic principle for the classification of writing systems". Written Language and Literacy 14(1), pp. 58–81. doi:10.1075/wll.14.1.04joy
11. Zeman, D. "Czech Alphabet, Code Page, Keyboard, and Sorting Order". old-site.clsp.jhu.edu. Archived from the original on 15 April 2012. Retrieved 31 March 2012.
