白语, Báiyǔ
Native to: Yunnan, China
Native speakers: 1.3 million (2003)[1]
  • Jianchuan-Dali
  • Panyi–Lama
Language codes
ISO 639-3: Variously:
  bca – Central Bai, Jianchuan dialect
  bfs – Southern Bai, Dali dialect
  bfc – Panyi Bai
  lay – Lama Bai
ISO 639-6: bicr

The Bai language (Bai: Baip‧ngvp‧zix; simplified Chinese: 白语; traditional Chinese: 白語; pinyin: Báiyǔ) is a language spoken in China, primarily in Yunnan Province, by the Bai people. The language has over a million speakers and is divided into three or four main dialects. Bai syllables are always open, with a rich set of vowels and eight tones. The tones are divided into two groups with modal and non-modal (tense, harsh or breathy) phonation. There is a small amount of traditional literature written with Chinese characters, Bowen (僰文), as well as a number of recent publications printed with a recently standardized system of romanisation using the Latin alphabet.

The origins of Bai have been obscured by intensive Chinese influence over an extended period. Different scholars have proposed that it is an early offshoot or sister language of Chinese, part of the Loloish branch, or a separate group within the Sino-Tibetan family.


[Map: Wang's survey sites in Yunnan, and the city of Dali]

Xu and Zhao (1984) divided Bai into three dialects, which may actually be distinct languages: Jianchuan (Central), Dali (Southern) and Bijiang (Northern).[3] Bijiang County has since been renamed as Lushui County.[4] Jianchuan and Dali are closely related and speakers are reported to be able to understand one another after living together for a month.

The more divergent Northern dialects are spoken by about 15,000 Laemae (lɛ21 mɛ21, Lemei, Lama), a clan numbering about 50,000 people who are partly submerged within the Lisu.[5] They are now designated as two languages by ISO 639-3: Panyi Bai (bfc) and Lama Bai (lay).

Wang Feng (2012)[9] provides the following classification for nine Bai dialects:


Wang (2012)[10] also documents a Bai dialect in Xicun, Dacun Village, Shalang Township, Kunming City (昆明市沙朗乡大村西村).[11]


The affiliation of Bai is obscured by over two millennia of influence from varieties of Chinese, leaving most of its lexicon related to Chinese etyma of various periods.[12] To determine its origin, researchers must first identify and remove from consideration the various layers of loanwords and then examine the residue.[13] In his survey of the field, Wang (2006) notes that early work was hampered by a lack of data on Bai and uncertainties in the reconstruction of early forms of Chinese.[14] Recent authors have suggested that Bai is an early offshoot from Chinese, a sister language to Chinese, or more distantly related (though usually still Sino-Tibetan).[15][16]

There are different tonal correspondences in the various layers.[17] Many words can be identified as later Chinese loans because they display Chinese sound changes from the last two millennia:[18]

Some of these changes date back to the first centuries AD.[19]

The oldest layer of Bai vocabulary with Chinese cognates, of which Wang lists some 250 words,[20] includes common Bai words that were also common in Classical Chinese, but are not used in modern varieties of Chinese.[21] Its features have been compared with current ideas on Old Chinese phonology:

Sergei Starostin suggests that these facts indicate a split from mainstream Chinese around the 2nd century BC, corresponding to the Western Han period.[31][32] Wang argues that a few of the correspondences between his reconstructed Proto-Bai and Old Chinese cannot be explained by the Old Chinese forms, and that Chinese and Bai therefore form a Sino-Bai group.[33] However, Gong suggests that at least some of these cases can be accounted for by refining the Proto-Bai reconstruction to take account of complementary distribution within Bai.[34]

Starostin and Zhengzhang Shangfang have separately argued that the oldest Chinese layer accounts for all but an insignificant residue of Bai vocabulary, and that Bai is therefore an early branching from Chinese.[29]

On the other hand, Lee and Sagart (1998) argued that the various layers of Chinese vocabulary are loans, and that when they are removed, a significant non-Chinese residue remains, including 15 entries from the 100-word Swadesh list of basic vocabulary. They suggest that this residue shows similarities with Proto-Loloish.[35] James Matisoff (2001) argued that the comparison with Loloish is less persuasive when considering other Bai varieties than the Jianchuan dialect used by Lee and Sagart, and that it is safer to consider Bai as an independent branch of Sino-Tibetan, though perhaps close to the neighbouring Loloish.[36] Lee and Sagart (2008) refined their analysis, presenting the residue as a non-Chinese form of Sino-Tibetan, though not necessarily Loloish. They also note that this residue includes the Bai vocabulary relating to pig rearing and rice agriculture.[37]

Lee and Sagart's analysis has been further discussed by List (2009).[38] Gong (2015) suggests that the residual layer may be Qiangic, pointing out that the Bai, like the Qiang, call themselves "white", whereas the Lolo use "black".[21]


The Jianchuan dialect has the following consonants, all of which are restricted to syllable-initial position:[39]

                          Labial  Alveolar  Palatal  Velar
Stop       unaspirated    p       t                  k
Affricate  unaspirated            ts
           aspirated              tsʰ       tɕʰ
Fricative  voiceless      f       s         ɕ        x
           voiced         v                          ɣ
Nasal                     m       n                  ŋ
Approximant                       l         j

The Gongxing and Tuolou dialects retain an older three-way distinction for stop and affricate initials between voiceless unaspirated, voiceless aspirated and voiced. In the core eastern group, including the standard form of Dali, the voiced initials have become voiceless unaspirated, while other dialects show partial loss of voicing, conditioned by tone in different ways.[40] Some varieties also have an additional uvular nasal [ɴ] that contrasts phonemically with [ŋ].[41]

Jianchuan finals comprise:[39]

All but u, ɑo and iɑo have contrasting nasalized variants. Dali Bai lacks nasal vowels.[39] Some other varieties retain nasal codas instead of nasalization, though only the Gongxing and Tuolou dialects have a contrast between -n and -ŋ.[42]

Jianchuan has eight tones, divided between those with modal and non-modal phonation.[43] Some of the western varieties have fewer tones.[44]


Bai has a basic subject–verb–object (SVO) order. However, SOV can be found in interrogative and negative sentences.

Writing system

Latin script

The old Bai script used modified Chinese characters, but its use was limited.[45] A new script using the Latin alphabet was designed in 1958, based on the speech of the urban centre of Xiaguan, even though that was not a typical Southern dialect.[46] The idea of romanization was controversial among Bai elites, and the system saw little use.[47] In a renewed attempt in 1982, language planners used the Jianchuan dialect as a base, because it represented an area with a significant population, almost all of whom spoke Bai. The new script was popular in the Jianchuan area, but was rejected in the more economically advanced area of Dali, which also had the largest number of speakers, albeit living alongside a large number of speakers of Chinese.[48][49] The script was revised extensively in 1993 to define two variants, representing Jianchuan and Dali respectively, and has since been more widely used.[50][51][52]

Initials of the Bai writing system (1982, 1993)[53]
                          Labial   Alveolar  Retroflex  Palatal   Velar
Stop       unaspirated    b [p]    d [t]                          g [k]
           aspirated      p [pʰ]   t [tʰ]                         k [kʰ]
Nasal                     m [m]    n [n]                ni [ɲ]    ng [ŋ]
Affricate  unaspirated             z [ts]    zh [ʈʂ]    j [tɕ]
           aspirated               c [tsʰ]   ch [ʈʂʰ]   q [tɕʰ]
Fricative  voiceless      f [f]    s [s]     sh [ʂ]     x [ɕ]     h [x]
           voiced         v [v]    ss [z]    r [ʐ]                hh [ɣ]
Lateral and semivowel              l [l]                y [j]

The retroflex initials zh, ch, sh and r are used only in recent loanwords from Standard Chinese or for other Bai varieties.[54]
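The initials table can be read as a lookup from spelling to sound. The following Python sketch is only an illustration of that reading, not part of any published tool: the names `INITIALS`, `VOWEL_LETTERS` and `split_initial` are hypothetical, the set of vowel letters is inferred from the vowel table of the 1993 system, and the palatal nasal ni [ɲ] is deliberately left out because the spelling alone cannot distinguish it from n followed by a vowel beginning with i.

```python
# Initials of the 1982/1993 Bai romanisation, mapped to IPA (from the table above).
# "ni" [ɲ] is omitted: it is ambiguous with "n" + a final starting in "i".
INITIALS = {
    "b": "p", "p": "pʰ", "d": "t", "t": "tʰ", "g": "k", "k": "kʰ",
    "m": "m", "n": "n", "ng": "ŋ",
    "z": "ts", "c": "tsʰ", "zh": "ʈʂ", "ch": "ʈʂʰ", "j": "tɕ", "q": "tɕʰ",
    "f": "f", "s": "s", "sh": "ʂ", "x": "ɕ", "h": "x",
    "v": "v", "ss": "z", "r": "ʐ", "hh": "ɣ",
    "l": "l", "y": "j",
}

# Assumption: every final begins with one of these letters
# ("v" doubles as the syllabic vowel).
VOWEL_LETTERS = set("ieaouv")


def split_initial(syllable):
    """Return (initial spelling, IPA, remainder) for one romanised syllable.

    Two-letter initials are tried before one-letter ones, and a match is
    accepted only if a vowel letter follows, so that a syllabic "v" is not
    mistaken for the initial [v].
    """
    s = syllable.strip().lower()
    for length in (2, 1):
        head, rest = s[:length], s[length:]
        if head in INITIALS and rest[:1] in VOWEL_LETTERS:
            return head, INITIALS[head], rest
    return "", "", s  # zero initial


for syl in ["Baip", "ngvp", "zix"]:
    print(syl, split_initial(syl))
```

Trying digraphs first keeps ng, zh, ch, sh, ss and hh from being split into their single-letter counterparts.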

Vowels of the revised Bai writing system (1993)[54]
i [i] ei [e] ai/er [ɛ]/[əɹ] a [ɑ] ao [ɔ] o [o] ou [ou] u [u] e [ɯ] v [v̩]
iai/ier [iɛ]/[iəɹ] ia [iɑ] iao [iao] io [io] iou [iou] ie [iɯ]
ui [ui] uai/uer [uɛ]/[uəɹ] ua [uɑ] uo [uo]

The 1993 revision introduced variants such as ai/er, with the former to be used for Jianchuan Bai and the latter for Dali Bai.[55] In Jianchuan, all vowels but ao, iao, uo, ou and iou have nasalized counterparts, denoted by a suffixed n.[54] Dali Bai lacks nasalized vowels.[39]

Suffixed letters indicate tone contours and modal or non-modal phonation.[54][56] This was the most radical aspect of the 1993 revision:

Tone marking in the 1982 and 1993 systems[57]
Pitch contour and phonation 1982 spelling 1993 spelling Notes
high level (55), modal -l -l
mid level (33), modal -x -x
mid falling (31), breathy -t -t
mid rising (35), modal -f -f
mid-low falling (21), harsh (unmarked) -d
high level (55), tense -rl -b Jianchuan only
mid-high level (44), tense -rx (unmarked)
mid-high falling (42), tense -rt -p
mid falling (32), modal -p/-z distinguished in Dali only
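
Because tone is written as a final letter, the 1993 column of the table can be applied mechanically. The Python sketch below is a hedged illustration only: `TONES_1993`, `UNMARKED_1993` and `split_tone` are hypothetical names, the last (Dali-only) row of the table is left out, and the sample syllables assume that the spelling of the language's own name given in the lead follows the 1993 system.

```python
# Tone suffixes of the 1993 romanisation, copied from the table above.
# Values are (pitch contour, phonation). The Dali-only row is not included.
TONES_1993 = {
    "l": ("high level (55)", "modal"),
    "x": ("mid level (33)", "modal"),
    "t": ("mid falling (31)", "breathy"),
    "f": ("mid rising (35)", "modal"),
    "d": ("mid-low falling (21)", "harsh"),
    "b": ("high level (55)", "tense"),  # Jianchuan only
    "p": ("mid-high falling (42)", "tense"),
}

# A syllable with no tone letter carries the unmarked tone.
UNMARKED_1993 = ("mid-high level (44)", "tense")


def split_tone(syllable):
    """Return (base, pitch contour, phonation) for one romanised syllable."""
    s = syllable.strip().lower()
    # The tone letter, if present, is the last letter of the syllable.
    if len(s) > 1 and s[-1] in TONES_1993:
        contour, phonation = TONES_1993[s[-1]]
        return s[:-1], contour, phonation
    contour, phonation = UNMARKED_1993
    return s, contour, phonation


for syl in ["Baip", "ngvp", "zix"]:
    print(syl, split_tone(syl))
```

The lookup works on the last letter alone because none of the tone letters is also used to end a final: finals end in a vowel letter, in r, or in the nasalization marker n.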

Bowen script

Shanhua tablet (山花碑), written in Bowen script.

Bowen script (Chinese: 僰文; pinyin: bówén), also known as Square Bai Script (Chinese: 方块白文), Hanzi Bai Script (simplified Chinese: 汉字白文; traditional Chinese: 漢字白文), Hanzi-style Bai Script (simplified Chinese: 汉字型白文; traditional Chinese: 漢字型白文), or Ancient Bai Script (Chinese: 古白文), was a logographic script formerly used by the Bai people, adapted from Hanzi to fit the Bai language.[58] The script was used from the Nanzhao period to the beginning of the Ming dynasty.[59]

The Shanhua tablet (山花碑), from Dali Town in Yunnan, contains a poem written in Bowen script during the Ming dynasty by the Bai poet Yang Fu (杨黼),[60] 《詞記山花·詠蒼洱境》.[61]


Nge, no – I
Ne, no – you

Cai ho – red flower
Gei bo – rooster
A de gei bo – a rooster

Ne mian e ain hain? – What's your name?
Ngo mian e A Lu Gai. – My name is A Lu Gai.
Ngo ze ne san se yin a biu. – I don't recognize you.

Ngo ye can. – I'm eating.
Ne can ye la ma? – Have you eaten?
Ne ze a ma yin? – Who are you?
Ne ze nge mo a bio. – You are not my mother.
Ngo zei pi ne gan. – I'm taller than you.
Ne nge no hha si bei. – You won't let me go.


  1. ^ Up to the 16th edition of Ethnologue (2009), the ISO 639-3 code lay was assigned to "Lama (Myanmar)", listed in the index of languages by C. F. Voegelin and F. M. Voegelin (1977) as a Nungish language of Myanmar. In 2013 the reference name for the code was changed to "Bai, Lama".[8]


  1. ^ Central Bai, Jianchuan dialect at Ethnologue (18th ed., 2015) (subscription required)
    Southern Bai, Dali dialect at Ethnologue (18th ed., 2015) (subscription required)
    Panyi Bai at Ethnologue (18th ed., 2015) (subscription required)
    Lama Bai at Ethnologue (18th ed., 2015) (subscription required)
  2. ^ Ramsey 1987, p. 290.
  3. ^ Wang 2006, p. 115.
  4. ^ Allen 2007, p. 6.
  5. ^ Bradley 2007, pp. 363, 393–394.
  6. ^ a b Wang 2006, p. 31.
  7. ^ Johnson, Eric (2013). "Change Request Documentation: 2013-006". ISO 639-3 Registration Authority.
  8. ^ a b Johnson, Eric (2013). "Change Request Documentation: 2013-007". ISO 639-3 Registration Authority.
  9. ^ Wang, Feng 汪锋 (2012). 语言接触与语言比较:以白语为例 [Language Contact and Language Comparison: The Case of Bai] (in Chinese). Beijing: 商务印书馆. pp. 92–94.
  10. ^ Wang, Feng 王锋 (2012). 昆明西山沙朗白语研究 [A Study of the Bai Language of Shalang] (in Chinese). Beijing: 中国社会科学出版社.
  11. ^ 五华区沙朗乡大村村委会西村. (in Chinese). Archived from the original on 2016-10-10. Retrieved 2016-10-09.
  12. ^ Norman 2003, pp. 73, 75.
  13. ^ Ramsey 1987, p. 291.
  14. ^ Wang 2005, pp. 102–107.
  15. ^ Norman 2003, p. 73.
  16. ^ Wang 2005, pp. 109–116.
  17. ^ Lee & Sagart 2008, pp. 7–8, 10, 12–13.
  18. ^ Starostin 1995, pp. 3–4.
  19. ^ Starostin 1995, p. 4.
  20. ^ Wang 2006, pp. 205–211.
  21. ^ a b Gong 2015, p. 2.
  22. ^ Wang 2006, pp. 131, 144.
  23. ^ Gong 2015, p. 11.
  24. ^ Starostin 1995, p. 3.
  25. ^ Wang 2006, p. 133.
  26. ^ Gong 2015, p. 9.
  27. ^ Starostin 1995, pp. 4–5.
  28. ^ Starostin 1995, p. 12.
  29. ^ a b Wang 2005, pp. 110–111.
  30. ^ Starostin 1995, p. 2.
  31. ^ Starostin 1995, pp. 2, 17.
  32. ^ Wang 2005, p. 110.
  33. ^ Wang 2006, pp. 165–171.
  34. ^ Gong 2015, pp. 4, 7.
  35. ^ Lee & Sagart 1998.
  36. ^ Matisoff 2001, p. 39.
  37. ^ Lee & Sagart 2008.
  38. ^ List 2009.
  39. ^ a b c d Wiersma 2003, p. 655.
  40. ^ Wang 2006, pp. 58–72.
  41. ^ Allen 2007.
  42. ^ Wang 2006, pp. 32–44, 74.
  43. ^ Wiersma 2003, pp. 655, 658.
  44. ^ Wang 2006, pp. 32–44.
  45. ^ Wang 2004, pp. 278–279.
  46. ^ Zhou 2012, pp. 271, 273.
  47. ^ Zhou 2012, p. 110.
  48. ^ Zhou 2012, p. 273.
  49. ^ Wiersma 2003, pp. 653–654.
  50. ^ Zhou 2012, pp. 273–274.
  51. ^ Wiersma 2003, p. 654.
  52. ^ Wang 2004, p. 279.
  53. ^ Zhou 2012, pp. 146, 272.
  54. ^ a b c d Zhou 2012, p. 272.
  55. ^ Zhou 2012, pp. 272–273.
  56. ^ Wiersma 1990, p. 58.
  57. ^ Wiersma 2003, p. 659.
  58. ^ 方块白文简介及字符集 (in Chinese). 中国少数民族文字数据库. Archived from the original on 2006-05-24. Retrieved 2008-11-18.
  59. ^ Zhao, Yansun 赵衍荪 (1987). 白文(中国民族古文字). 中国民族古文字 (in Chinese).
  60. ^ Xu, Lin 徐琳; Zhao, Yansun 赵衍荪 (1980). "白文《山花碑》释读". 民族语文 (in Chinese) (3): 50–57.
  61. ^ Yang, Zhengye 杨政业 (April 1997). 论"白(僰)文"的形态演化及其使用范围 (in Chinese). 大理学院学报(社会科学版). Retrieved 2008-11-18.

Works cited
