General Chinese (Chinese: 通字; pinyin: tōng zì; Wade–Giles: t'ung1-tzu4) is a diaphonemic orthography invented by Yuen Ren Chao to represent the pronunciations of all major varieties of Chinese simultaneously.[1] It is "the most complete genuine Chinese diasystem yet published".[2] It can also be used for the Korean, Japanese, and Vietnamese pronunciations of Chinese characters, and challenges the claim that Chinese characters are required for interdialectal communication in written Chinese.

General Chinese is not specifically a romanization system, but two alternative systems: one (Tung-dzih Xonn-dzih) uses Chinese characters phonetically, as a syllabary of 2082 glyphs, and the other (Tung-dzih Lo-maa-dzih) is an alphabetic romanization system with similar sound values and tone spellings to Gwoyeu Romatzyh.

Character-based General Chinese

The character version of General Chinese uses distinct characters for any traditional characters that are distinguished phonemically in any of the control varieties of Chinese, which consist of several dialects of Mandarin, Wu, Min, Hakka, and Yue. That is, a single syllabic character will correspond to more than one logographic character only when these are homonyms in all control dialects. In effect, General Chinese is a syllabic reconstruction of the pronunciation of Middle Chinese, less distinctions which have been dropped nearly everywhere.

The result is a syllabary of 2082 syllables, about 80% of which are single morphemes—that is, in 80% of cases there is no difference between GC and standard written Chinese, and in running text, that figure rises to 90–95%, as the most common morphemes tend to be uniquely identified. For example, kai can mean only 開 kāi 'open', and sam can mean only 三 sān 'three'.[3] Chao notes, "These syllables then are morphemes, or words with definite meanings, or clusters of meanings related by extensions. About 20 percent of the syllables are homophones under each of which there will be more than one morpheme, [which are traditionally] usually written with different characters [...] The degree of homophony is so low that it will be possible to write text either in literary or colloquial Chinese with the same character for each syllable [...] as has been tested in texts of various styles." Chao compares General Chinese to how Chinese was written when the writing system was still productive: "This amounts to a 100 percent use of writing Chinese by 'phonetic loan' [...] The situation is that when the ancients wrote a character by sound regardless of meaning, it was a 'loan character', whereas if a modern schoolboy writes one, he is punished for writing the wrong character!"[4]

Taking a telegraphic code-book of about 10,000 characters as a representative list of characters in modern use,[5] Chao notes that General Chinese results in a reduction of 80% in the number of characters needing to be learned.
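
As a rough check of the 80% figure, the arithmetic can be sketched as follows. The count of 10,000 is only the approximate code-book size quoted above, so the result is indicative rather than exact; the variable names are illustrative.

```python
# Rough check of the reduction claimed above.
# 2082 is the size of the General Chinese syllabary; 10,000 is the
# approximate size of the telegraphic code-book, so this is an estimate.
GC_CHARACTERS = 2082
CODEBOOK_CHARACTERS = 10_000

reduction = 1 - GC_CHARACTERS / CODEBOOK_CHARACTERS
print(f"{reduction:.1%}")
```

The result is a little under 80%, consistent with the rounded figure in the text.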

In the 20% of cases where a syllable corresponded to more than one word, Chao generally selected the graphically most basic traditional character for General Chinese, as long as it wasn't unduly rare. However, when that character had strong semantic connotations that would have interfered with a phonetic reading, he selected a more neutral character. This phenomenon is familiar from Chinese transcriptions of foreign names.

Romanized General Chinese

Romanized General Chinese has distinct symbols for the onsets (many of them digraphs, and a few trigraphs) and the rimes distinguished by any of the control dialects. For example, it retains the final consonants p, t, k, and the distinction between final m and n, as these are found in several modern dialects, such as Cantonese. General Chinese also maintains the "round-sharp" distinction, such as sia vs. hia, though those are both xia in Beijing Mandarin. It also indicates the "muddy" (voiced) stops of Shanghainese. Indeed, Chao characterized GC as having "the initial consonants of the Wu dialects [...], the vowels of Mandarin, and the endings of Cantonese. It can, however, be pronounced in any dialect, and it is meant to be, by a relatively short list of rules of pronunciation."[6]

Like Chao's other invention, Gwoyeu Romatzyh, romanized General Chinese uses tone spelling. However, the system is somewhat different. The difference between the yin and yang tones is indicated by the voicing of the initial consonant, which is possible because the original voicing distinctions are retained. Given that some tones are indicated by changing rather than adding letters, writing tone requires on average only one additional letter for every three syllables of text.

The digraphs are not reliably featural; for example, the digraphs for the voiced stops do not all follow the same pattern. This is because Chao ran frequency tests, and used single letters for the most common consonants and vowels, while restricting digraphs and trigraphs to the more infrequent ones. Overall, syllables in the texts he transliterated averaged under 3½ letters apiece.

An example of Romanized General Chinese can be illustrated with Chao's name:

General Chinese: dhyao qiuan remm
Mandarin Chinese: jaw yuan renn (GR); zhào yuán rèn (Pinyin)
Cantonese: jiuh yùhn yahm (Yale); ziu6 jyun4 jam6 (Jyutping)
Taiwanese Hokkien: tiō gôan jīm (POJ)
Shanghainese: (d)zau* gnioe gnin (Ghoqdaon, Vernacular); (d)zau yoe zen (Ghoqdaon and Yahwe, Literary); (d)zau nyoe nyin (Yahwe, Vernacular)
Japanese, go'on reading: deu gwan nin (Historical kana orthography); gan nin (Modern kana usage)
Japanese, kan'on reading: teu gen jin (Historical kana orthography); chō gen jin (Modern kana usage)
Korean: cho wŏn im (McCune-Reischauer); jo won yim (Revised Romanization)
Vietnamese: triệu nguyên nhiệm (Chữ quốc ngữ)

* <dz> is reduced to <z> in downtown accents.

All the General Chinese initials here are voiced: The h in dh shows that this is a "muddy" consonant, and the q in qiuan represents an initial ng- (becoming g in Japanese). This voicing shows up in the Cantonese yang tones, which are represented by h in Yale romanization. "Heavy" codas, such as remm, indicate the "departing" (去; qù) tone, as in Gwoyeu Romatzyh. Similarly, the spelling ao in dhyao indicates the "rising" (上; shàng) tone, but because of the voiced initial, it merges with "departing" in Mandarin and literary Cantonese (though not in colloquial Cantonese). The y in dhyao indicates that the initial is a stop in Min, Japanese, and Vietnamese, but otherwise an affricate. Cantonese and Korean retain the final m of remm. These pronunciations are all predictable given the General Chinese transcription, though it was not designed with the Sinospheric languages specifically in mind. Both the pre-war and post-war Japanese orthographies are recoverable.

In every control dialect, some syllables with different spellings will be pronounced the same. However, which these are differs from dialect to dialect. There are some irregular correlations: Often a particular variety will have a pronunciation for a syllable that is not what one would expect from other syllables with similar spellings, due to irregular developments in that variety. This is especially true with the voicing of Japanese consonants, which has evolved idiosyncratically in different compound words. However, except for Japanese voicing, the system is phonetic about 90% of the time.

Onsets and rhymes

Character GC has a separate character for each syllable. However, romanized GC has distinct onsets and rhymes. The onsets are as follows:


The 36 initials
Columns: stops and affricates (tenuis, aspirate, voiced); fricatives (tenuis, voiced); sonorants.
bilabial: b /p/, p /pʰ/, bh /b/; sonorant m /m/
labiodental: f /f/, fv /v/; sonorant v /w̃/[7]
dental stops: d /t/, t /tʰ/, dh /d/; sonorants n /n/, l /l/
dental sibilants: z /ts/, ts /tsʰ/, dz /dz/; fricatives s /s/, sz /z/
retroflex[8]: dr /tʂ/, tr /tʂʰ/, jr /dʐ/; fricatives sr /ʂ/, zr[9] /ʐ/; sonorant r /ȷ̃/[10]
palatal[8] stops: dy /tʲ, ʈ/, ty /tʲʰ, ʈʰ/, dhy /dʲ, ɖ/
palatal sibilants: j /tɕ/, ch /tɕʰ/, dj /dʑ/; fricatives sh /ɕ/, zh[11] /ʑ/
velar: c /k/, k /kʰ/, g /ɡ/; fricatives x /x/, h /ɣ/; sonorant q /ŋ/
laryngeal: (zero initial) /ʔ/; sonorants y, w, h /j/

泥 and 娘 are both transcribed ⟨n⟩, as these are not distinct in modern dialects. 喻, a conflation of two older initials, 云 hy~hw and 以 y~w, is transcribed ⟨h⟩ or ∅ according to modern rather than ancient forms; when palatalization is lost, it is transcribed ⟨w⟩. The palatal and retroflex sibilants 照穿牀審禪 fell together early on in the rime tables of Classical Chinese, but are still distinguished in some modern dialects, and so are distinguished here. The convention ⟨q⟩ for nasal 疑, which drops in many dialects, is repeated in the finals, where it represents *[ŋ] with a departing tone.

Although the transcription is to some extent systematic (the retroflex series are digraphs ending in ⟨r⟩, for example), this is overridden in many cases by the principle of using short transcriptions for common sounds. Thus ⟨z⟩ is used for 精 rather than for the less common 邪, where it might also be expected; ⟨v⟩ is used for frequent 微 rather than for 奉; and ⟨c⟩ and ⟨g⟩, for the high-frequency 見 and 羣, have the additional benefit of being familiar in their palatalized forms from English words like cello and gem (the second syllable of Peking ~ Beijing, for example, is ⟨cieng⟩).

Dialectal correspondences

The voiced obstruents (the 濁 "muddy" column) are only distinct in Wu dialects. In Min, they are collapsed with the consonants of the tenuis column. Elsewhere they are generally collapsed with the aspirated column in the even tone, and with the tenuis column in other tones. An exception is Cantonese, where in the rising tone they are aspirated in colloquial speech, but tenuis in reading pronunciations. The sonorants do not vary much apart from ⟨v⟩, ⟨r⟩, which in Wu are nasals [m], [ɲ] colloquially but fricatives [v], [z] when read.
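
The generalizations in this paragraph can be sketched as a small lookup. This is only an illustration of the rules as stated here: the function name, the variety labels, and the tone labels are hypothetical, and the text itself says the pattern holds only "generally".

```python
def muddy_reflex(variety, tone, colloquial=True):
    """Return the series ('voiced', 'tenuis' or 'aspirate') into which a
    GC voiced ("muddy") obstruent falls, per the generalizations above.

    variety: 'Wu', 'Min', 'Cantonese', or any other label ("elsewhere").
    tone: 'even', 'rising', 'departing' or 'entering'.
    """
    if variety == "Wu":
        return "voiced"      # Wu keeps the muddy series distinct
    if variety == "Min":
        return "tenuis"      # Min collapses it with the tenuis column
    if tone == "even":
        return "aspirate"    # elsewhere: aspirated in the even tone
    if variety == "Cantonese" and tone == "rising" and colloquial:
        return "aspirate"    # colloquial Cantonese exception
    return "tenuis"          # elsewhere: tenuis in the other tones
```

For example, `muddy_reflex("Mandarin", "even")` gives `"aspirate"`, while `muddy_reflex("Cantonese", "rising", colloquial=False)` gives `"tenuis"` for a reading pronunciation.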

Velars ⟨c⟩, ⟨k⟩, ⟨g⟩ are palatalized to affricates before ⟨i⟩, ⟨iu⟩ (the high-front vowels [i], [y]) apart from Min and Yue, where they remain stops before all vowels; ⟨x⟩, ⟨h⟩ also palatalize, but remain fricatives. For instance in Mandarin, they are g, k, h before non-palatalizing vowels and j, q, x before palatalizing vowels, whereas in Cantonese they remain g, k, h everywhere. (Compare the alternate spellings of Beijing and Peking; see Palatalization (sound change) § Mandarin Chinese) The alveolar sibilants ⟨z⟩, ⟨ts⟩, ⟨dz⟩, ⟨s⟩, ⟨sz⟩ (Mandarin z, c, s) are also generally palatalized before ⟨i⟩, ⟨iu⟩ (to Mandarin j, q, x), collapsing with the palatalized velars ⟨c⟩, ⟨k⟩, ⟨g⟩, ⟨x⟩, ⟨h⟩ in dialects which have lost the "round-sharp" distinction so important to Peking opera.
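
A minimal sketch of the velar correspondence follows, restricted to the voiceless onsets ⟨c⟩, ⟨k⟩, ⟨x⟩. The voiced ⟨g⟩ and ⟨h⟩ are left out because their outcome also depends on tone. The function name and the split into onset and final are assumptions made for illustration. The outputs are the romanized initials quoted in this paragraph, and medial ⟨e⟩ (written ⟨ea...⟩) is treated as palatalizing in Mandarin, as described in the section on medials.

```python
MANDARIN_PLAIN = {"c": "g", "k": "k", "x": "h"}
MANDARIN_PALATAL = {"c": "j", "k": "q", "x": "x"}
CANTONESE = {"c": "g", "k": "k", "x": "h"}


def velar_reflex(onset, final, variety):
    """Map a voiceless GC velar onset to the romanized initial given in
    the text for 'Mandarin' or 'Cantonese'."""
    if onset not in MANDARIN_PLAIN:
        raise ValueError("only the voiceless velars c, k, x are handled")
    if variety == "Cantonese":
        return CANTONESE[onset]  # stays velar before all vowels
    if variety == "Mandarin":
        # Finals beginning with i (including iu) palatalize; so does
        # medial e, which is always written before a (ea, eai, ean, ...).
        if final.startswith(("i", "ea")):
            return MANDARIN_PALATAL[onset]
        return MANDARIN_PLAIN[onset]
    raise ValueError(f"unsupported variety: {variety}")
```

With these assumptions, `velar_reflex("c", "ieng", "Mandarin")` returns `"j"` and `velar_reflex("c", "ieng", "Cantonese")` returns `"g"`, mirroring the Beijing ~ Peking contrast.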

The palatal stops ⟨dy⟩, ⟨ty⟩, ⟨dhy⟩ remain stops only in Min among the Chinese topolects (though also in Japanese and Vietnamese loan words); elsewhere they are conflated with the affricates. The palatal and retroflex sibilants are generally conflated; in Yue and Min, as well as in much of Wu and Mandarin, they are further conflated with the alveolar sibilants. This contrast remains in Beijing, where [sán] 'three' is distinct from [ʂán] 'mountain'; both are [sán] in Sichuanese and Taiwanese Mandarin.

There are numerous more sporadic correlations. For instance, the alveolar affricates ⟨z⟩, ⟨ts⟩, ⟨dz⟩ become stops [t], [tʰ] in Taishan Yue, whereas the alveolar stops are debuccalized to [ʔ], [h], as in Hoisaan for Cantonese Toisaan (Taishan). In Yüchi, Yunnan, it is the velars ⟨c⟩, ⟨k⟩, ⟨g⟩ which are debuccalized, to [ʔ], [ʔʰ]. In the Min dialects, ⟨f⟩, ⟨fv⟩ become [h] or [ʍ]. In Xi'an Mandarin, the fricatives ⟨sh⟩, ⟨sr⟩, ⟨zh⟩, ⟨zr⟩ are rounded to [f] before rounded vowels, as in [fêi] 'water' (Beijing shuǐ [ʂwèi]).

Medials


The categories of the Late Middle Chinese rime tables are reduced to the four medials of modern Chinese, plus an intermediate type ⟨e⟩:

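Open (開): Division I, no medial; Division II, ⟨e⟩ after back initials and no medial after other initials; Divisions III and IV, ⟨i⟩
Closed (合): Divisions I and II, ⟨u⟩; Divisions III and IV, ⟨iu⟩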

⟨i⟩ and ⟨iu⟩ are omitted after labiodental initials.

Dialectal correspondences

The medial ⟨e⟩ is used for syllables which have a palatalizing medial [i] in Mandarin, but no medial in Yue. That is, in Mandarin ⟨e⟩ should be read as ⟨i⟩, with the same effect on consonants as ⟨i⟩ has, whereas in Cantonese it is silent. In Shanghainese both situations occur: ⟨e⟩ is equivalent to ⟨i⟩ in reading pronunciations, as [i] or [y], but is not found in colloquial speech.

In Cantonese, medial ⟨i⟩ can be ignored after sibilants, as palatalization has been lost in these series. That is, siao, shao are read the same.

Rimes


Chao uses the following rimes. They do not always correspond to the Middle Chinese rimes.

Rimes consist of a nucleus (the main vowel) and optionally a coda. They need to be considered as a unit because of a strong historical interaction between vowel and coda in Chinese dialects. The following combinations occur (note that the vowel ⟨iu⟩ is treated as medial ⟨i⟩ plus nucleus ⟨u⟩):

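Rows are nuclei; entries run through the codas (none), -i, -u, -m, -n, -ng, skipping combinations that do not occur:
o: o, om, on, ong
a: a, ai, au, am, an, ang
e: e, ei, eu, em, en
i: i, im, in, ing
u: u, un, ung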

Dialectal correspondences

The most salient dialectal difference in rimes is perhaps the lack of the obstruent codas ⟨p⟩, ⟨t⟩, ⟨c⟩ in most dialects of Mandarin and independently in the Wencheng dialect of Oujiang, though this has traditionally been seen as a loss of tone (see below). In Wu, Min (generally), New Xiang (Hunanese), Jin, and in the Lower Yangtze and Minjiang dialects of Mandarin, these codas conflate to glottal stop /ʔ/. In others, such as Gan, they are reduced to [t], [k], while Yue dialects, Hakka, and Old Xiang maintain the original [p], [t], [k] system.

Nasal codas are also reduced in many dialects. Mandarin and Wu do not distinguish between ⟨m⟩ and ⟨n⟩, which are reduced to [n] or nasal vowels, or in some cases dropped altogether. In Shanghainese many instances of ⟨ng⟩ have conflated as well, or been dropped, but a phonemic distinction is maintained.

In Mandarin, an additional coda is found, -er [ɻ], from GC ⟨ri⟩.[12]

In Cantonese, the simple vowels i u iu o a e are [iː uː yː ɔː aː ɛː], apart from ⟨i⟩ and ⟨iu⟩ after velars, which open to diphthongs, as in ci [kei] and ciu [kɵy]. Diphthongs may vary markedly depending on initial and medial, as in cau [kou], ceau [kaːu], ciau [kiːu], though both ceu ~ cieu are [kɐu], following the general pattern of ⟨e⟩ before a coda (cf. cen [kɐn] vs can [kaːn]). Cantonese does not have medials, apart from gw, kw, though sometimes it is the nuclear vowel which drops: giung [kʰoŋ], xiong [hoŋ], but giuan [kʰyːn].

Combinations of medials and rimes

The following combinations of orthographic medials and rimes occur, taking -iu to be medial i + rime u :

no medial: o, a, e, ai, ei, i, u, au, eu, om, am, em, im, on, an, en, un, in, ang*, aeng, eng, ing, ung, ong*
medial e: ea, eai, eau, eam, ean, eang, eaeng†
medial i: ia, ie, iu, iau, ieu, iem, ien, iun, iang, ieng, iung, iong
medial u: uo, ua, uai, uei, ui, uon, uan, uang, uaeng, ueng
medial iu: iue, iuan§, iueng, iuing[13]
* In entering tone, ang (eang, iang, uang) changes to oc (eoc, ioc, uoc), and ong (iong) changes to ouc (iouc).
† -eaeng is generally shortened to -aeng.
§ In entering tone, iuan changes to iuet.

Double cells show discrepancies between analysis and orthography. For instance, Chao analyzes ieng, iueng as part of the aeng series rather than the eng series, and ien as part of the an series.[14] Though not apparent from the chart, eng-ing-ueng-iuing, ung-ong-iung-iong, and en-in-un-iun are similar series. The discrepancies are due to an effort to keep frequent syllables short: en-in-un-iun rather than *en-ien-uen-iuen, for example; as well as a reflection of some of the more widespread phonological changes in the rimes.

The Classical correspondences, with many archaic distinctions lost, are as follows:

The Classical rimes[missing -a, -an, -ia]
Columns: 開 unrounded (Div. I, Div. II, Divs. III & IV); rounded (Div. I, Div. II, Divs. III & IV)
-i 支脂之微; -ui 支脂微
果假: -o 歌; -ea 麻; -ie 戈麻; -(u)o 戈; -ua 麻; -iue 戈
-ai 咍; -eai 皆佳; -ei 祭廢齊; -uei 灰; -uai 皆佳; -uei 祭廢齊
-au 豪; -eau 肴; -iau 宵蕭
-eu 侯; -ieu 尤幽
-u 模; -iu 魚虞
-am,om 談; -eam 咸銜; -iem,am,om/iep,ap 鹽嚴添凡
咸深: -om/ap,op 覃; -im,em 侵
-an,on 寒; -ean 刪山; -ien,an/iet,et 仙元先; -(u)on 桓; -uan 刪山; -iuan/at,iuet,ot 仙元先
-en 痕; -in,en 真瑧欣; -un,en 魂; -en,iun/ut,iut 諄文
宕江: -ang 唐; -eang,uang 江; -iang 陽; -uang 唐; -uang 陽[15]
-aeng 庚耕; -ieng,aeng/iec 清唐青; -uaeng 庚耕; -iueng 清唐青
-ung 東; -iung 東
-ong 冬; -iong 鍾
-eng 登; -ing,eng 蒸; -ueng 登; -iuic[13]

These all occur in the velar-initial series, but not all in the others.

Dialectal correspondences

In Cantonese, after coronal stops and sibilants, rounded finals such as -on and -uan produce front rounded vowels, as in don [tyːn], and after velars, iung and iong lose their ⟨i⟩. Min dialects are similar, but in certain tones ⟨i⟩ and ⟨iu⟩ become diphthongs rather than their usual [i], [y]. For example, in Fuzhou, even-tone 星 sieng is [siŋ] but departing-tone 性 sieq is [seiŋ].

In Yunnan Mandarin, ⟨iu⟩ is pronounced as ⟨i⟩, so that the name of the province, yunnom, is [inna] rather than [ynnan] as in Beijing.

In Nanking, ⟨ien⟩ metathesizes to [ein] after alveolars, as in 天 [tʰein] for Beijing tian [tʰiɛn].

Tones


The basic spelling is used for the even 平 tone(s). For the rising 上 tone(s), the nucleus is doubled (with the vowel ⟨iu⟩ → ⟨iuu⟩, as that is treated as medial ⟨i⟩ + nucleus ⟨u⟩), or the coda is changed to a 'lighter' letter. For the departing 去 tone(s), the coda is made 'heavier'; if there is no coda, add ⟨h⟩. For the entering 入 tone(s), a stop coda is used.

'Lighter' means that a vowel coda is made more open (⟨i⟩ → ⟨e⟩, ⟨u⟩ → ⟨o⟩); 'heavier' means that a vowel coda is made more close (⟨i⟩ → ⟨y⟩, ⟨u⟩ → ⟨w⟩) and a nasal coda (⟨n⟩, ⟨m⟩) is doubled. The nasal ⟨ng⟩ is 'lightened' to ⟨g⟩ (as in many Polynesian languages) and made heavier as ⟨q⟩ (as in the GC initial):

coda: even 平, rising 上, departing 去, entering 入
(none): ba, baa, bah; ciu, ciuu, ciuh
-i: fui, fue, fuy
-u: cau, cao, caw
-m: lam, laam, lamm, lap
-n: ren, reen, renn, ret
-ng: jang, jag, jaq, joc

One consequence of this is that the rimes -e and -ei in the even tone conflate to ⟨ee⟩ in the rising tone. However, since there are no such syllables which begin with the same consonant and medial, no syllables are actually conflated.
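
The tone-spelling rules can be sketched in code as follows. This is an illustration under stated assumptions, not Chao's own formulation: the function names are hypothetical, the onset and the even-tone final are passed separately so that no onset parsing is needed, and the only vowel changes modelled for the entering tone are ang → oc, ong → ouc and iuan → iuet.

```python
ENTERING_STOP = {"m": "p", "n": "t", "ng": "c"}
HEAVY = {"": "h", "i": "y", "u": "w", "m": "mm", "n": "nn", "ng": "q"}
LIGHT = {"i": "e", "u": "o"}


def split_final(final):
    """Split an even-tone final into (body, coda); coda may be ''."""
    if final.endswith("ng"):
        return final[:-2], "ng"
    if final[-1] in "mn":
        return final[:-1], final[-1]
    if final[-1] == "i" and len(final) > 1:
        return final[:-1], "i"
    # iu counts as medial i + nucleus u, so it has no coda.
    if final[-1] == "u" and final not in ("u", "iu"):
        return final[:-1], "u"
    return final, ""


def tone_spell(onset, final, tone):
    """Spell a syllable in the given tone from its even-tone final."""
    body, coda = split_final(final)
    if tone == "even":
        return onset + final
    if tone == "rising":
        if coda in LIGHT:                      # vowel coda is lightened
            return onset + body + LIGHT[coda]
        if coda == "ng":                       # ng is lightened to g
            return onset + body + "g"
        return onset + body + body[-1] + coda  # nucleus is doubled
    if tone == "departing":
        return onset + body + HEAVY[coda]
    if tone == "entering":
        if coda not in ENTERING_STOP:
            raise ValueError("entering tone requires a nasal coda")
        if final.endswith("ang"):
            return onset + final[:-3] + "oc"
        if final.endswith("ong"):
            return onset + final[:-3] + "ouc"
        if final == "iuan":
            return onset + "iuet"
        return onset + body + ENTERING_STOP[coda]
    raise ValueError(f"unknown tone: {tone}")
```

Under these assumptions, `tone_spell("j", "ang", "entering")` reproduces joc from the table, and the even-tone finals e and ei both give ee in the rising tone.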

The difference between yin and yang tones is indicated by the voicing of the consonant. A zero consonant is treated as voiceless (it is sometimes reconstructed as a glottal stop), so i, iem, uon, iuan are ping yin (Mandarin yī, yān, wān, yuān), whereas yi, yem, won, yuan are ping yang (Mandarin yí, yán, wán, yuán). In a few cases, the effect that voiced ⟨m⟩, ⟨n⟩, ⟨l⟩, ⟨r⟩ have on tone needs to be negated to achieve a ping yin tone. This is accomplished by spelling them ⟨mh⟩, ⟨nh⟩, ⟨lh⟩, ⟨rh⟩.
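
As a sketch, the register of a syllable can be read off its onset alone. The set of voiced onsets below is taken from the table of initials; the function name is hypothetical, and any onset not listed (including the zero onset, written as an empty string) is treated as voiceless.

```python
# Voiced obstruents and sonorants from the table of initials.
VOICED_ONSETS = {
    "bh", "fv", "dh", "dz", "sz", "jr", "zr", "dhy", "dj", "zh", "g", "h",
    "m", "v", "n", "l", "r", "q", "y", "w",
}
# Spellings that cancel the tonal effect of a voiced sonorant.
NEGATED_SONORANTS = {"mh", "nh", "lh", "rh"}


def register(onset):
    """Return 'yang' for voiced onsets and 'yin' otherwise."""
    if onset in NEGATED_SONORANTS:
        return "yin"
    return "yang" if onset in VOICED_ONSETS else "yin"
```

For instance, `register("")` and `register("mh")` give `"yin"`, while `register("y")` and `register("m")` give `"yang"`.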

To mark the toneless Mandarin syllable ma, a centered dot is used: ⟨·ma⟩. The dot is omitted for toneless ⟨me, de, te, ne, le⟩, as tonic me, de, te, ne, le do not exist.

Dialectal correspondences

The realization of the tones in the various varieties of Chinese is generally predictable; see Four tones for details. In Beijing Mandarin, for example, even tone is split according to voicing, with muddy consonants becoming aspirates: ba, pa, ma, bha → bā, pā, má, pá (and mha → mā). Departing tone is not split, and muddy consonants become tenuis: bah, pah, mah, bhah → bà, pà, mà, bà. Rising tone splits, not along voicing, but with muddy-consonant syllables conflating with departing tone: baa, paa, maa, bhaa → bǎ, pǎ, mǎ, bà. That is, bhaa and bhah are homonyms in Beijing, as indeed they are in all of Mandarin, in Wu apart from Wenzhounese, in Hakka, and in reading pronunciations of Cantonese. Entering tone is likewise split in Beijing: mat, bhat → mà, bá.[16]

However, the realization of entering tones in Beijing dialect, and thus in Standard Chinese, is not predictable when a syllable has a voiceless initial such as bat or pat. In such cases even syllables with the same GC spelling may have different tones in Beijing, though they remain homonyms in other Mandarin dialects, such as Xi'an and Sichuanese.[17] This is due to historical dialect-mixing in the Chinese capital that resulted in unavoidably idiosyncratic correspondences.
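
The Beijing correspondences described above can be summarized in a short sketch. The function name and the onset sets are assumptions for illustration. Tones are returned as the conventional numbers 1 to 4, and `None` marks the case the text calls unpredictable. The letter ⟨h⟩ is treated here as the muddy velar fricative, although the table of initials also lists it among the laryngeal sonorants.

```python
MUDDY = {"bh", "fv", "dh", "dz", "sz", "jr", "zr", "dhy", "dj", "zh", "g", "h"}
SONORANT = {"m", "v", "n", "l", "r", "q", "y", "w"}


def beijing_tone(onset, gc_tone):
    """Return the Beijing tone number (1-4) for a GC onset and tone
    category, or None where the outcome is not predictable."""
    muddy = onset in MUDDY
    sonorant = onset in SONORANT
    # Anything else (b, p, the zero onset, mh, nh, ...) counts as voiceless.
    if gc_tone == "even":
        return 2 if (muddy or sonorant) else 1
    if gc_tone == "rising":
        return 4 if muddy else 3   # muddy rising merges with departing
    if gc_tone == "departing":
        return 4
    if gc_tone == "entering":
        if sonorant:
            return 4
        if muddy:
            return 2
        return None                # voiceless initials: unpredictable
    raise ValueError(f"unknown tone category: {gc_tone}")
```

This reproduces the examples in the text: `beijing_tone("bh", "rising")` and `beijing_tone("bh", "departing")` both give 4, and `beijing_tone("b", "entering")` gives `None`.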

In Yue, there is a straightforward split according to consonant voicing, with a postvocalic ⟨h⟩ in Yale romanization for voiced initials. Muddy onsets become aspirates in even and rising tones, but tenuis in departing and entering tones: ba, pa, ma, bha → bā, pā, māh, pāh; baa, paa, maa, bhaa → bá, pá, máh, páh; bah, pah, mah, bhah → ba, pa, mah, bah; bat, pat, mat, bhat → baat, paat, maht, baht. In addition, there is a split in entering tone according to vowel length, with Cantonese mid-entering tone for short vowels like bāt, pāt.[18] In reading pronunciations, however, rising tone syllables with muddy onsets are treated as departing tone: bhaa → bah rather than páh. There is also an unpredictable split in the even ping yin tone which indicates diminutives or a change in part of speech, but this is not written in all Cantonese romanizations (it is written in Yale, but not in Jyutping).

Sample text

Chao provided this poem as an example. The character text is no different in GC and standard Chinese, apart from 裏, which in any case has now been substituted with Chao's choice of 里 on the Mainland. Note that simplified characters like this would affect all of Chao's proposal, so that 對 below would become 对, etc. The only other difference is 他 for 'her', which may differ from contemporary written Chinese 她, but which follows Classical usage.[19]

Each couplet is given in romanized GC, then pinyin, then English.

Si Zuucuec Yee (Hu Shiec)
Sī Zǔguó Yě (Hú Shì)
Thinking of One's Own Country (Hu Shih)

Nii sim-lii ay ta, / Moc shot but ay ta.
Nǐ xīnlǐ ài tā, / Mò shuō bù ài tā.
In your heart you love her, / Don't say you don't love her.

Iaw konn nii ay ta, / Tsiee deg ren hay ta.
Yào kàn nǐ ài tā, / Qiě děng rén hài tā.
Seeing that you love her, / Yet you let one hurt her.

Tag yeo ren hay ta, / Nii ruho duey ta?
Tǎng yǒu rén hài tā, / Nǐ rúhé duì tā?
If someone hurts her, / How will you meet her?

Tag yeo ren ay ta, / Caeq ruho dhay ta?
Tǎng yǒu rén ài tā, / Gèng rúhé dài tā?
If someone loves her, / How will you treat her?


