This article contains phonetic transcriptions in the International Phonetic Alphabet (IPA). For an introductory guide on IPA symbols, see Help:IPA. For the distinction between [ ], / / and ⟨ ⟩, see IPA § Brackets and transcription delimiters.

The phonology of Japanese features a phonemic inventory including five vowels (/a, e, i, o, u/) and 12^[1] or more consonants (the number of consonant phonemes varies greatly depending on how certain sounds are analyzed). The phonotactics are relatively simple, allowing for few consonant clusters. Japanese phonology has been affected by the presence of several layers of vocabulary in the language: in addition to native Japanese vocabulary, Japanese has a large amount of Chinese-based vocabulary and loanwords from other languages.^[2]

Standard Japanese is characterized by a pitch accent system where the position or absence of a pitch drop may determine the meaning of a word: /haꜜsiɡa/ (箸が, 'chopsticks'), /hasiꜜɡa/ (橋が, 'bridge'), /hasiɡa/ (端が, 'edge').
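
As a rough illustration of how the downstep mark ꜜ encodes this contrast, the following Python sketch reports the mora after which the pitch drops. The name `accent_position` is hypothetical, and the sketch assumes words built only from short (C)V moras, so it does not handle long vowels or the moraic consonants.

```python
VOWELS = set("aeiou")
DOWNSTEP = "\ua71c"  # the downstep mark used in the transcriptions above


def accent_position(phonemic):
    """Return the number of the mora followed by the pitch drop, or 0 if unaccented.

    Assumption: every mora is a short (C)V unit, so counting vowels counts moras.
    """
    if DOWNSTEP not in phonemic:
        return 0
    before_drop = phonemic.split(DOWNSTEP)[0]
    return sum(1 for segment in before_drop if segment in VOWELS)


# The three words from the paragraph above, written with the downstep constant.
chopsticks = "ha" + DOWNSTEP + "siga"
bridge = "hasi" + DOWNSTEP + "ga"
edge = "hasiga"

for word in (chopsticks, bridge, edge):
    print(word, accent_position(word))
```

Under these assumptions the three forms are told apart only by the value returned: a drop after the first mora, a drop after the second mora, or no drop at all.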

Unless otherwise noted, the following describes the standard variety of Japanese based on the Tokyo dialect.

Lexical strata

Discussions of Japanese phonology often refer to different "strata" or layers of vocabulary, as many statements about phonemes and phonotactics are only valid as generalizations over a subset of vocabulary items. For example, the consonant [p] generally does not occur at the start of Yamato and Sino-Japanese words, but occurs freely in this position in mimetic and foreign words.^[3] The following four strata may be distinguished:^[2]^[4]^[5]

Yamato

Main article: Wago

Called wago (和語)^[5] or yamato kotoba (大和言葉) in Japanese, this category comprises inherited native vocabulary. Morphemes in this category show a number of restrictions on structure that may be violated by vocabulary in other layers.

Mimetic

Main article: Japanese sound symbolism

Japanese possesses a variety of mimetic words that make use of sound symbolism to serve an expressive function. Like Yamato vocabulary, these words are also of native origin, and can be considered to belong to the same overarching group. However, words of this type show some phonological peculiarities that cause some theorists to regard them as a separate layer of Japanese vocabulary.^[4]^[6]

Sino-Japanese

Main article: Sino-Japanese vocabulary

Called kango (漢語) in Japanese, words in this stratum originate from several waves of large-scale borrowing from Chinese that occurred from the 6th to the 14th century AD. They comprise 60% of dictionary entries and 20% of ordinary spoken Japanese, ranging from formal vocabulary to everyday words. Most Sino-Japanese words are composed of more than one Sino-Japanese morpheme. Sino-Japanese morphemes have a limited phonological shape: each has a length of at most two moras, which Ito & Mester (2015a) argue reflects a restriction in size to a single prosodic foot. These morphemes represent the Japanese phonetic adaptation of Middle Chinese monosyllabic morphemes, each generally represented in writing by a single Chinese character, taken into Japanese as kanji (漢字). Japanese writers also repurposed kanji to represent native vocabulary; as a result, there is a distinction between Sino-Japanese readings of kanji, called on'yomi, and native readings, called kun'yomi.^[7]

The moraic nasal /N/ is relatively common in Sino-Japanese, and contact with Middle Chinese is often described as being responsible for the presence of /N/ in Japanese (starting from approximately 800 AD in Early Middle Japanese), although /N/ also came to exist in native Japanese words as a result of sound changes.^[8]

Foreign

Main article: Loanwords in Japanese

Called gairaigo (外来語) in Japanese, this layer of vocabulary consists of non-Sino-Japanese words of foreign origin, mostly borrowed from Western languages after the 16th century; many of them entered the language in the 20th century.^[9] In words of this stratum, a number of consonant-vowel sequences that did not previously exist in Japanese are tolerated.^[10] This has led to the introduction of new spelling conventions and complicates the phonemic analysis of these consonant sounds in Japanese: some consonants that were once allophones may now be analyzed as having attained phonemic status.^[11]

Consonants

	Bilabial	Alveolar	Alveolo-palatal	Palatal	Velar	Uvular	Glottal
Nasal	m	n	(ɲ)		(ŋ)	(ɴ)
Plosive	p b	t d			k ɡ
Affricate		(ts) (dz)	(tɕ) (dʑ)
Fricative	(ɸ)	s z	(ɕ) (ʑ)	(ç)			h
Liquid		r
Semivowel				j	w
Special moras	/N/ /Q/

Consonants inside parentheses can be analyzed as allophones of other phonemes, at least in native words. In loanwords, /ɸ, ts/ sometimes occur phonemically.^[12] In some analyses the glides [j, w] are not interpreted as consonant phonemes;^[13] Lawrence (2004) considers the glides to be non-syllabic variants of the high vowel phonemes /i, u/, and argues the use of [j, w] vs. [i, ɯ] may be predictable if both phonological and morphological context is taken into account.

Phonetic notes

In word-initial position, voiceless stops /p, t, k/ are slightly aspirated^[14]—less so than English stops, but more than those in Spanish.^[15] Word-medial /p, t, k/ seem to be unaspirated on average^[14] (although some descriptions have supposed them to be aspirated in accented syllables^[16]).
The phonetic realizations of /b, d, ɡ/ cover a range that includes the voiced plosives [b, d, ɡ] but also some other similar phones. A 2019 study of young adult speakers found that after a pause, word-initial /b, d, ɡ/ may be realized as plosives with zero or low positive voice onset time (categorizable as voiceless unaspirated or "short-lag" plosives); although these were significantly less aspirated on average than word-initial /p, t, k/, some overlap in voice onset time was observed.^[17] A secondary cue to the distinction between /b, d, ɡ/ and /p, t, k/ in word-initial position is a pitch offset on the following vowel: vowels after word-initial (but not word-medial) /p, t, k/ start out with a higher pitch compared to vowels after /b, d, ɡ/, even when the latter are phonetically devoiced.^[18] In addition, /b, d, ɡ/ have weakened non-plosive pronunciations that can be broadly transcribed as voiced fricatives [β, ð, ɣ] (although they may be realized instead as voiced approximants [β̞, ð̞~ɹ, ɣ̞~ɰ]).^[19]^[20] Non-plosive pronunciations occur most often in intervocalic position (which can include word-initial position if the consonant follows a vowel-final word with no intervening pause) but are not consistently used in this phonological context. Maekawa (2018) found that, as with the pronunciation of /z/ as [dz] vs. [z], the use of plosive vs. non-plosive realizations of /b, d, ɡ/ is closely correlated with the time available to a speaker to articulate the consonant, which is affected by speech rate as well as the identity of the preceding sound.^[21] All three show a high (over 90%) rate of realization as plosives after /Q/ or after a pause; after /N/, plosive realizations occur at high (over 80%) rates for /b/ and /d/, but less frequently for /ɡ/, probably because word-medial /ɡ/ is often pronounced instead as a velar nasal [ŋ] in this context (although the use of [ŋ] here may be declining for younger speakers).^[22] Across contexts, /d/ generally has a higher rate of plosive realizations than /b/ and /ɡ/.^[23]

/b/ > bilabial fricative [β]	/abareru/ > [aβaɾeɾɯ]	暴れる, abareru, 'to behave violently'
/ɡ/ > velar fricative [ɣ]	/haɡe/ > [haɣe]	はげ, hage, 'baldness'

[t, d, n] are lamino-alveolar^[24] or laminal denti-alveolar^[25]^[26] (that is, the blade of the tongue contacts the back of the upper teeth and the front part of the alveolar ridge). [ts, s, dz~z] are laminal alveolar.^[27]^[28]
[tɕ, ɕ, dʑ~ʑ] are lamino-alveolopalatal [t̠ɕ, ɕ, d̠ʑ~ʑ]: the affricates are sometimes transcribed broadly as [cɕ, ɟʑ]^[29] (standing for prepalatal [c̟ɕ, ɟ̟ʑ]).^[30] The palatalized allophone of /n/ before /i/ or /j/ is also lamino-alveolopalatal^[31] or prepalatal, and so can be transcribed as [ɲ̟],^[32] or more broadly as [ɲ].^[31] Recasens (2013) reports its place of articulation as dentoalveolar or alveolar.^[33]
/w/ is traditionally described as a velar approximant [ɰ], a labialized velar approximant [w], or something between the two, or as the semivocalic equivalent of /u/ with little to no rounding; a 2020 real-time MRI study, however, found that it is better described as a bilabial approximant [β̞].^[34]
/h/ is [ç] before /i/ and /j/, and [ɸ] before /u/,^[32] coarticulated with the labial compression of that vowel. When not preceded by a pause, it often may be breathy-voiced [ɦ] rather than voiceless [h].^[35]
Realization of the liquid phoneme /r/ varies greatly depending on environment and dialect. The prototypical and most common pronunciation is an apical tap, either alveolar [ɾ] or postalveolar [ɾ̠].^[36]^[37]^[32] Utterance-initially and after /N/, the tap is typically articulated in such a way that the tip of the tongue is at first momentarily in light contact with the alveolar ridge before being released rapidly by airflow.^[38]^[37] This sound is described variably as a tap, a "variant of [ɾ]", "a kind of weak plosive",^[38] and "an affricate with short friction, [d̠ɹ̝̆]".^[32] The apical alveolar or postalveolar lateral approximant [l] is a common variant in all conditions,^[32] particularly utterance-initially^[38] and before /i, j/.^[36] According to Akamatsu (1997), utterance-initially and intervocalically (that is, except after /N/), the lateral variant is better described as a tap [ɺ] rather than an approximant.^[38]^[39] The retroflex lateral approximant [ɭ] is also found before /i, j/.^[36] In Tokyo's Shitamachi dialect, the alveolar trill [r] is a variant marked with vulgarity.^[36] Other reported variants include the alveolar approximant [ɹ],^[32] the alveolar stop [d], the retroflex flap [ɽ], the lateral fricative [ɮ],^[36] and the retroflex stop [ɖ].^[40]
/N/ is a syllable-final moraic nasal with variable phonetic realization: its pronunciation assimilates to the following sound (including across a word boundary):^[41]
- Before a plosive, affricate, nasal, or liquid, it is a nasal consonant assimilated to the place of the following consonant:
  - bilabial [m] before /p, b, m/.^[42]^[43]
  - velar [ŋ] before /k, ɡ/.^[43] This is palatalized when the following stop is, as in [ɡeŋʲkʲi].^[44]
  - lamino-alveolar [n] before [d, t, ts, dz, n].^[45]
  - lamino-alveolopalatal [ɲ̟] before [tɕ, dʑ, ɲ̟].^[44]
  - apico-alveolar [n̺] before /r/.^[46]
- Before a vowel, approximant /j, w/, or voiceless fricative [ɸ, s, ɕ, ç, h], it is a nasalized vowel or semivowel that can be broadly transcribed as [ɰ̃] (its specific quality depends on the surrounding sounds).^[45] This pronunciation may also occur before the voiced fricatives [z, ʑ],^[43] although more often they are pronounced as affricates when preceded by /N/.^[47]
- When utterance-final, the moraic nasal is traditionally described as uvular [ɴ], sometimes with the qualification that the occlusion may not always be complete^[44] or that it is, or approaches, velar [ŋ] after front vowels.^[48] It is also described as a nasalized vowel (as in intervocalic position).^[32] Instrumental studies in the 2010s showed that there is considerable variability in the realization of utterance-final /N/ and that it often involves a lip closure or constriction.^[49]^[50]^[51]^[52] A 2023 real-time MRI study found that the tongue position of utterance-final /N/ largely corresponds to that of the preceding vowel, though with overlapping locations, leading the researcher to conclude that /N/ has no specified place of articulation, rather than following a clear allophonic rule.^[53] Of the samples of utterance-final /N/, 5% were realized as nasalized vowels with no closure, with appreciable tongue raising observed only after /a/.^[54]
/Q/ is a syllable-final moraic obstruent consonant; it is unreleased and completely assimilated to the following consonant, producing a phonetically lengthened obstruent consonant.
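
The assimilation pattern described above for /N/ can be summarized as a lookup on the following sound. The Python sketch below is only a broad approximation: the function name `realize_moraic_nasal` and the data layout are hypothetical, the utterance-final case uses the traditional uvular description, and the variable cases discussed above (such as /N/ before [z, ʑ]) are deliberately left out.

```python
# Broad realizations of /N/ keyed by the nasal produced; each value lists the
# following sounds that trigger it, as given in the description above.
PLACE_ASSIMILATION = {
    "m": ("p", "b", "m"),
    "ŋ": ("k", "ɡ"),
    "n": ("t", "d", "ts", "dz", "n"),
    "ɲ\u031f": ("tɕ", "dʑ", "ɲ\u031f"),  # advanced (prepalatal) palatal nasal
    "n\u033a": ("r",),                    # apical alveolar nasal
}

# Vowels, glides and voiceless fricatives: /N/ surfaces as a nasalized vocoid.
NASALIZED_CONTEXT = ("a", "e", "i", "o", "u", "j", "w", "ɸ", "s", "ɕ", "ç", "h")
NASALIZED_VOCOID = "ɰ\u0303"


def realize_moraic_nasal(following):
    """Return a broad transcription of /N/ before `following`.

    `following` is the next sound as a string, or None when /N/ is
    utterance-final (traditional description: uvular nasal).
    """
    if following is None:
        return "ɴ"
    for nasal, triggers in PLACE_ASSIMILATION.items():
        if following in triggers:
            return nasal
    if following in NASALIZED_CONTEXT:
        return NASALIZED_VOCOID
    # Contexts with variable outcomes are not modeled in this sketch.
    raise ValueError(f"context not covered: {following!r}")


for context in ("p", "k", "ts", "s", None):
    print(context, realize_moraic_nasal(context))
```

A table-driven layout is used here so that each entry can be checked directly against one line of the description; it makes no claim about how the variability reported in instrumental studies should be modeled.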

Moraic consonants

The phonemic analysis of moraic consonants is disputed.

One analysis, particularly popular among Japanese scholars, posits that geminate (that is, double) obstruent consonants begin with a special "mora phoneme" (モーラ音素, mōra onso) /Q/, which corresponds to a unit of Japanese orthography, the sokuon^[55] (Hiragana: ⟨っ⟩; Katakana: ⟨ッ⟩). Likewise, the moraic nasal may be analyzed as a placeless nasal /N/, which also corresponds to a unit of Japanese orthography, the hatsuon^[56] (Hiragana: ⟨ん⟩; Katakana: ⟨ン⟩). These can be seen as "placeless" consonant phonemes that have no underlying place of articulation (and also no manner of articulation, in the case of /Q/), instead manifesting as several phonetic realizations depending on context. According to this kind of analysis, geminate nasal consonants are phonemically /Nn/ and /Nm/, and other geminate consonants are phonemically /Q/ followed by an obstruent. (Phonetically, geminate consonants can be transcribed with a length mark, e.g. [ipːai], but this notation obscures mora boundaries. Vance (2008) considers Japanese geminates to be "extra-long" and prefers to use two length markers in his phonetic transcriptions, e.g. [ipːːɑi], [sɑ̃mːːɑi].^[57] In the following transcriptions, geminates will be phonetically transcribed as two occurrences of the same consonant across a syllable boundary, the first being unreleased.)

/Q/ > [p̚] before [p]	/iQ.pai/	[ip̚.pai]	一杯, ippai, 'one cupful'^[58]
/Q/ > [s] before [s]	/iQ.sai/	[is.sai]	一歳, issai, 'one year old'^[58]
/Q/ > [t̚] before [tɕ]	/saQ.ti/	[sat̚.tɕi]	察知, satchi, 'inference'
/N/ > [m] before [m]	/saN.mai/	[sam.mai]	三枚, sanmai, 'three sheets'^[59]
/N/ > [n] before [n]	/saN.neN/	[san.neɴ]	三年, sannen, 'three years'^[60]
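
The realization of /Q/ in the rows above follows a simple pattern: before a stop or affricate it surfaces as an unreleased copy of the stop portion, and before a fricative it surfaces as a copy of the fricative. A minimal Python sketch of that pattern is given below; the function name `realize_moraic_obstruent` is hypothetical, and the input is assumed to be the phonetic form of the following obstruent.

```python
STOPS = {"p", "t", "k", "b", "d", "ɡ"}
UNRELEASED = "\u031a"  # combining mark for "no audible release"


def realize_moraic_obstruent(following):
    """Return a broad transcription of /Q/ before the obstruent `following`.

    Stops and affricates (whose first element is a stop) yield an
    unreleased stop; fricatives are simply copied.
    """
    first = following[0]
    if first in STOPS:
        return first + UNRELEASED
    return following


# Contexts taken from the table above: [p], [s], [tɕ].
for onset in ("p", "s", "tɕ"):
    print(onset, realize_moraic_obstruent(onset))
```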

Less abstractly, the moraic nasal /N/ may be interpreted as a phoneme with an underlyingly uvular place of articulation, i.e. /ɴ/,^[61] based on the traditional description of its word-final realization.^[62] Similarly, it has been suggested that the underlying phonemic representation of /Q/ might be a glottal stop /ʔ/—despite the fact that phonetically, it is not always a stop, and is usually not glottal—based on the occurrence of [ʔ] in certain marginal forms that can be interpreted as containing /Q/ not followed by another obstruent: for example, [ʔ] can be found at the end of an exclamation, or before a sonorant in forms with emphatic gemination, and the use of the sokuon as a written representation of [ʔ] in these contexts suggests Japanese speakers identify [ʔ] as the default form of /Q/, or the form it takes when it is not possible for it to share its place and manner of articulation with a following obstruent.^[63]

A competing analysis dispenses entirely with /Q/ and /N/.^[64] The moraic obstruent can be interpreted as having the same phonemic value as the following consonant, as shown below:

/p/ > [p̚] before [p]	/ip.pai/	[ip̚.pai]	一杯, ippai, 'one cupful'
/s/ > [s] before [s]	/is.sai/	[is.sai]	一歳, issai, 'one year old'
/t/ > [t̚] before [tɕ]	/sat.ti/	[sat̚.tɕi]	察知, satchi, 'inference'

Likewise, rather than being considered a distinct phoneme /N/ or /ɴ/, the moraic nasal may be considered an allophone of the coronal nasal phoneme /n/ when it occurs in syllable-final (coda) position^[64]^[65] (this requires treating syllable or mora boundaries as potentially distinctive, in order to explain the contrast between the moraic nasal and non-moraic /n/ before a vowel):

/n/ > [m] before [m]	/san.mai/ > [sam.mai]	三枚, sanmai, 'three sheets'
/n/ > [n] before [n]	/san.nen/ > [san.neɴ]	三年, sannen, 'three years'
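
The relationship between the two notations is mechanical, which the following Python sketch tries to make explicit. It rewrites a transcription that uses /Q/ and /N/ into the competing analysis, copying the next consonant in place of /Q/ and writing the moraic nasal as /n/. The function name `to_segmental_analysis` is hypothetical, and the input is assumed to mark syllable boundaries with a period, as in the examples above.

```python
import re


def to_segmental_analysis(phonemic):
    """Rewrite a /Q/-and-/N/ transcription without the special mora phonemes."""
    # /Q/ takes the phonemic value of the consonant that starts the next syllable.
    without_q = re.sub(r"Q\.(.)", r"\1.\1", phonemic)
    # The moraic nasal is treated as coda /n/.
    return without_q.replace("N", "n")


for form in ("iQ.pai", "iQ.sai", "saQ.ti", "saN.mai", "saN.neN"):
    print(form, "->", to_segmental_analysis(form))
```

Because the syllable boundary is kept in the output, the contrast between a coda nasal and an onset /n/ before a vowel remains recoverable, which is the point made above about boundaries being potentially distinctive.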

Alternatively, as there is no contrast in coda position between /m/ and /n/, the moraic nasal can be interpreted as an archiphoneme (a neutralization between otherwise contrastive phonemes).^[66] Likewise, the moraic obstruent can be interpreted as an archiphoneme representing the syllable-final neutralization of Japanese obstruent consonant phonemes.

Voiced affricate vs. fricative

Main article: Yotsugana

The distinction between the voiced fricatives [z ʑ] (originally allophones of /z/) and the voiced affricates [dz, dʑ] (originally allophones of /d/) is neutralized in Standard Japanese and in most regional Japanese dialects. The phoneme resulting from the merger can be transcribed as /z/,^[67] though some analyze it as /dz/, the voiced counterpart to [ts].^[68] A 2010 corpus study found that in neutralizing varieties, both the fricative and the affricate pronunciation could be found in any position in a word, but the likelihood of the affricate realization was increased in phonetic conditions that allowed for greater time to articulate the consonant: voiced affricates were found to occur on average 60% of the time after /N/, 74% after /Q/, and 80% after a pause.^[69] In addition, the rate of fricative realizations increased as speech rate increased.^[70] In terms of direction, these effects match those found for the use of plosive vs. non-plosive pronunciations of the voiced stops /b, d, ɡ/; however, the overall rate of fricative realizations of /(d)z/ (including both [dz~z] and [dʑ~ʑ], in either intervocalic or postnasal position) seems to be higher than the rate of non-plosive realizations of /b, d, ɡ/.^[71]

As a result of the neutralization, the historical spelling distinction between these sounds has been eliminated from the modern written standard except in cases where a mora is repeated once voiceless and once voiced, or where rendaku occurs in a compound word: つづく[続く] /tuzuku/, いちづける[位置付ける] /itizukeru/ from |iti+tukeru|. The use of the historical or morphological spelling in these contexts does not indicate a phonetic distinction: /zu/ and /zi/ in Standard Japanese are variably pronounced with affricates or fricatives according to the contextual tendencies described above, regardless of whether they are underlyingly voiced or derived by rendaku from /tu/ and /ti/.^[72]
Some dialects (e.g. Tosa^[73]) retain the distinctions between /zi/ and /di/ and between /zu/ and /du/, while others retain only /zu/ and /du/ but not /zi/ and /di/, or merge all four (e.g. north Tōhoku).^[73]

Voiceless coronal affricate

In core vocabulary, [ts] can be analyzed as an allophone of /t/ before /u/:^[74]

/t/ > [ts]

/tuɡi/ > [tsɯɡi]

次, tsugi, 'next'

In loanwords, however, [ts] can occur before other vowels:^[75] examples include [tsaitoɡaisɯto] ツァイトガイスト, tsaitogaisuto, 'zeitgeist'; [eɾitsiɴ] エリツィン, Eritsin, 'Yeltsin'. There are also a small number of native forms with [ts] before a vowel other than /u/, such as otottsan, 'dad',^[76]^[77] although these are marginal and nonstandard^[78] (the standard form of this word is otōsan).^[75] Based on dialectal or colloquial forms like these, as well as the phonetic distance between plosive and affricate sounds, Hattori (1950) argues that the affricate [ts] is its own phoneme, represented by the non-IPA symbol /c/ (also interpreted to include [tɕ] before [i]).^[79] In contrast, Shibatani (1990) disregards such forms as exceptional, and prefers analyzing [ts] and [tɕ] as allophones of /t/, not as a distinct affricate phoneme.^[80]

Palatalized consonants

Alveolo-palatal sibilants

The three alveolo-palatal sibilants [tɕ ɕ (d)ʑ] function, at least historically, as the palatalized counterparts of the four coronal obstruents [t s d (d)z]. Original /ti/ came to be pronounced as [tɕi], original /si/ came to be pronounced as [ɕi],^[91] and original /di/ and /zi/ both came to be pronounced as [(d)ʑi].^[92] (As a result, the sequences [ti si di (d)zi] do not occur in native or Sino-Japanese vocabulary.^[93])

/s/ > [ɕ]	/sio/ > [ɕi.o]	塩, shio, 'salt'
/z/ > [dʑ~ʑ]	/mozi/ > [modʑi ~ moʑi]	文字, moji, 'letter, character'
/t/ > [tɕ]	/tiziN/ > [tɕidʑiɴ] ~ [tɕiʑiɴ]	知人, chijin, 'acquaintance'

Likewise, original /tj/ came to be pronounced as [tɕ], original /sj/ came to be pronounced as [ɕ],^[94] and original /dj/ and /zj/ both came to be pronounced as [(d)ʑ]:^[95]

/sj/ > [ɕ]	/isja/ > [iɕa]	医者, isha, 'doctor'
/zj/ > [dʑ~ʑ]	/ɡozjuː/ > [ɡodʑɯː ~ ɡoʑɯː]	五十, gojū, 'fifty'
/tj/ > [tɕ]	/tja/ > [tɕa]	茶, cha, 'tea'

Therefore, alveolo-palatal [tɕ dʑ ɕ ʑ] can be analyzed as positional allophones of /t d s z/ before /i/, or as the surface realization of underlying /tj dj sj zj/ clusters before other vowels. For example, [ɕi] can be analyzed as /si/ and [ɕa] as /sja/. Likewise, [tɕi] can be analyzed as /ti/ and [tɕa] as /tja/. (These analyses correspond to the representation of these sounds in the Japanese spelling system.) Most dialects show a merger in the pronunciation of underlying /d/ and /z/ before /j/ or /i/, with the resulting merged phone varying between [ʑ] and [dʑ]. The contrast between /d/ and /z/ is also neutralized before /u/ in most dialects (see above).
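
Under this allophonic analysis, the mapping from phonemic to phonetic forms for native and Sino-Japanese vocabulary can be sketched as two rewrite rules. The Python example below is illustrative only: the function name `palatalize` is hypothetical, the voiced outcome is always written as the affricate [dʑ] even though it varies with [ʑ], and vowels are left in their phonemic spelling.

```python
import re

# Palatalized counterparts of the coronal obstruents; /d/ and /z/ merge.
PALATAL = {"t": "tɕ", "s": "ɕ", "d": "dʑ", "z": "dʑ"}


def palatalize(phonemic):
    """Apply the two palatalization rules to a phonemic string."""
    # Rule 1: coronal + /j/ surfaces as a single alveolo-palatal consonant.
    out = re.sub(r"([tsdz])j", lambda m: PALATAL[m.group(1)], phonemic)
    # Rule 2: coronal directly before /i/ surfaces as alveolo-palatal.
    out = re.sub(r"([tsdz])(?=i)", lambda m: PALATAL[m.group(1)], out)
    return out


for form in ("sio", "mozi", "isja", "tja"):
    print(form, "->", palatalize(form))
```

The sketch deliberately has no exception for the foreign stratum, so it would wrongly turn a loanword sequence such as /ti/ into [tɕi]; that limitation mirrors the debate about whether the rule is still active.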

While the diachronic origin of these sounds as allophones of /t s d z/ is uncontroversial, there is disagreement among linguists about whether alveolo-palatal sibilants continue to function synchronically as allophones of coronal consonant phonemes: the identification of [tɕ] as a palatalized allophone of /t/ is especially debated, due to the presence of a distinctive contrast between [tɕi] and [ti] in the foreign stratum of Standard Japanese vocabulary.

[tɕi (d)ʑi] vs. foreign [ti, di]

The sequences [ti, di] are found exclusively in recent loanwords; they have been assigned the novel kana spellings ティ, ディ. (Loanwords borrowed before [ti] was widely tolerated usually replaced this sequence with チ [tɕi] or (more rarely) テ [te],^[96] and certain forms exhibiting these replacements continue to be used; likewise, ジ [(d)ʑi] or デ [de] can be found instead of [di] in some forms, such as ラジオ, rajio, 'radio' and デジタル, dejitaru, 'digital'.^[97]) Based on a study of type frequency in a lexicon and token frequency in a spoken corpus, Hall (2013) concludes that [t] and [tɕ] have become about as contrastive before /i/ as they are before /a/.^[98] Some analysts argue that the use of [ti, di] in loanwords shows that the change of /ti/ to [tɕi] is an inactive, 'fossilized' rule, and conclude that [tɕi] must now be analyzed as containing an affricate phoneme distinct from /t/; others argue that pronunciation of /ti/ as [tɕi] continues to be an active rule of Japanese phonology, but that this rule is restricted from applying to words belonging to the foreign stratum.^[99]

In contrast to [ti, di], the sequences *[si, zi] are not established even in loanwords. English /s/ is still normally adapted as [ɕ] before /i/^[100] (i.e. with katakana シ, shi). An example is シネマ, shinema [ɕinema] from cinema.^[101] Likewise, English /z/ is normally adapted as [(d)ʑ] before /i/ (i.e. with katakana ジ, ji). Pronunciation of loanwords with [si]^[102] or [zi] is rare even among the most innovative speakers, but not entirely absent.^[103] To transcribe [si], as opposed to [ɕi], it is possible to use the novel kana spelling スィ (su + small i)^[100] (though this has also been used to transcribe original [sw] before /i/ in forms like スィッチ, 'switch' [sɯittɕi],^[104] as an alternative to the spellings スイッチ, suitchi or スウィッチ, suwitchi). The use of スィ and its voiced counterpart ズィ was mentioned, but not officially recommended, by a 1991 cabinet directive on the use of kana to spell foreign words.^[105]^[106] Nogita (2016) argues that the difference between [ɕi] and [si] may be marginally contrastive for some speakers,^[90] whereas Labrune (2012) denies that *[si, zi] are ever distinguished in pronunciation from [ɕi, (d)ʑi] in adapted forms, regardless of whether the spellings スィ and ズィ are used in writing.^[107]

The sequence [tsi] (as opposed to either [tɕi] or [ti]) also has some marginal use in loanwords.^[108] An example is エリツィン, Eritsin, 'Yeltsin'.^[75] In many cases a variant adaptation with [tɕi] exists.^[108]

Alternations involving [tɕ ɕ (d)ʑ]

Aside from arguments based on loanword phonology, there is also disagreement about the phonemic analysis of native Japanese forms. Some verbs can be analyzed as having an underlying stem that ends in either /t/ or /s/; these become [tɕ] or [ɕ] respectively before inflectional suffixes that start with [i]:

[matanai] 'wait' (negative)	vs.	[matɕimasu] 'wait' (polite)^[109]
[kasanai] 'lend' (negative)	vs.	[kaɕimasu] 'lend' (polite)^[109]

In addition, Shibatani (1990) notes that in casual speech, /se/ or /te/ in verb forms may undergo coalescence with a following /ba/ (marking the conditional), forming [ɕaː] and [tɕaː] respectively, as in [kaɕaː] for /kaseba/ 'if (I) lend' and [katɕaː] for /kateba/ 'if (I) win.'^[110] On the other hand, per Vance (1987), [tj, sj] (more narrowly, [tj̥, sj̥]) can occur instead of [tɕ, ɕ] for some speakers in contracted speech forms, such as [tjɯː] for /tojuː/ 'saying',^[111] [matja(ː)] for /mateba/ 'if one waits', and [hanasja(ː)] for /hanaseba/ 'if one speaks'; Vance notes these could be dismissed as non-phonemic rapid speech variants.^[112]

Hattori (1950) argues that alternations in verb forms do not prove [tɕ] is phonemically /t/, citing kawanai (with /w/) vs. kai, kau, kae, etc. as evidence that a stem-final consonant is not always maintained without phonemic change throughout a verb's conjugated forms, and /joɴdewa/~/joɴzja/ '(must not) read' as evidence that palatalization produced by vowel coalescence can result in alternation between different consonant phonemes.^[113]

Competing phonemic analyses

There are several alternatives to the interpretation of [tɕ ɕ (d)ʑ] as allophones of /t s z/ before /i/ or /j/.

Some interpretations agree with the analysis of [ɕ] as an allophone of /s/ and [(d)ʑ] as an allophone of /z/ (or /dz/), but treat [tɕ] as the palatalized allophone of a voiceless coronal affricate phoneme /ts/^[32] (sometimes transcribed with the non-IPA symbol /c/). In this sort of analysis, [tɕi, tɕa] = /tsi, tsja/ (or /ci, cja/).

Other interpretations treat [tɕ ɕ (d)ʑ] as their own phonemes, while treating other palatalized consonants as allophones or clusters.^[114]^[90] The status of [tɕ ɕ (d)ʑ] as phonemes rather than clusters ending in /j/ is argued to be supported by the stable use of the sequences [tɕe (d)ʑe ɕe] in loanwords; in contrast, /je/ is somewhat unstable (it may be variably replaced with /ie/ or /e/^[115]), and other consonant + /je/ sequences such as [pje], [kje] are generally absent.^[116]^[90] (Aside from loanwords, [tɕe ɕe] also occur marginally in native vocabulary in certain exclamatory forms.^[117]^[118])

It has alternatively been suggested that pairs like [tɕi] vs. [ti] could be analyzed as /tji/ vs. /ti/.^[119] Vance (2008) objects to analyses like /tji/ on the basis that the sequence /ji/ is otherwise forbidden in Japanese phonology.^[120]

Voiceless bilabial fricative

In core vocabulary, [ɸ] occurs only before /u/ and can be analyzed as an allophone of /h/:^[74]

/h/ > [ɸ]

/huta/ > [ɸɯta]

ふた, futa, 'lid'
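
For core vocabulary, the allophony of /h/ can be written as a pair of context-sensitive rewrites: [ɸ] before /u/, as in the example above, and [ç] before /i/ or /j/, as stated in the phonetic notes. The Python sketch below assumes a broad transcription in which /u/ is written [ɯ]; the function name `realize_h` is hypothetical.

```python
import re


def realize_h(phonemic):
    """Broadly transcribe /h/ and /u/ in a core-vocabulary phonemic string."""
    out = re.sub(r"h(?=[ij])", "ç", phonemic)  # palatal fricative before /i, j/
    out = re.sub(r"h(?=u)", "ɸ", out)          # bilabial fricative before /u/
    return out.replace("u", "ɯ")               # broad symbol for /u/


for form in ("huta", "hi", "ha"):
    print(form, "->", realize_h(form))
```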

According to some descriptions, the initial sound of ふ, fu /hu/ is not consistently produced as [ɸ], but can sometimes be a sound with weak or no bilabial friction that could be transcribed as [h]^[121]^[122] (a voiceless approximant similar to the start of English "who"^[123]).

In loanwords, [ɸ] can occur before other vowels or before /j/. Examples include [ɸiɴ] (フィン, fin, 'fin'), [ɸeɾiː] (フェリー, ferī, 'ferry'), [ɸaɴ] (ファン, fan, 'fan'), [ɸoːmɯ] (フォーム, fōmu, 'form'), and [ɸjɯː(d)ʑoɴ] (フュージョン, fyūjon, 'fusion').^[124] Even in loanwords, *[hɯ] is not distinguished from [ɸɯ]^[101] (e.g. English hood and food > [ɸɯːdo] フード, fūdo), but [ɸ] and [h] are distinguished before other vowels (e.g. English fork > [ɸoːkɯ] フォーク, fōku versus hawk > [hoːkɯ] ホーク, hōku).

The integration of [ɸi], [ɸe], [ɸa], [ɸo] and [ɸjɯ] into contemporary spoken Standard Japanese seems to have been completed at some point after the middle of the twentieth century,^[125] in the post-war period: before then, these sequences seem to have been common only in educated pronunciation.^[126] Loanwords borrowed more recently than around 1890 fairly consistently show [ɸ] as an adaptation of foreign [f].^[127] Some older borrowed forms show adaptation of foreign [f] to Japanese /h/ before a vowel other than /u/, such as コーヒー, kōhī, 'coffee' and プラットホーム, purattohōmu, 'platform'. Another old adaptation pattern was the replacement of foreign [f] with [ɸɯ] before a vowel other than /u/, e.g. film > [ɸɯ.i.ɾɯ.mɯ] フイルム, fuirumu. Both of these replacement strategies are now largely obsolete,^[126] although certain old adapted forms continue to be used, sometimes with specialized meanings compared to a variant pronunciation: for example, フイルム, fuirumu tends to be restricted in modern use to photographic films, whereas フィルム, firumu is used for other senses of "film" such as movie films.^[128]

Velar nasal onset

For some speakers, the velar nasal [ŋ] can occur as an onset in place of the voiced velar plosive [ɡ] in certain conditions. Onset [ŋ] is generally restricted to word-internal position,^[129] where it may occur either after a vowel (as in 禿, hage, 'baldness' [haŋe]^[130]) or after a moraic nasal /N/^[131] (as in 音楽, ongaku, 'music' [oŋŋakɯ~oŋŋakɯ̥]^[132]). It is debated whether onset [ŋ] constitutes a separate phoneme or an allophone of /ɡ/.^[133] They are written the same way in kana, and native speakers have the intuition that the two sounds belong to the same phoneme.^[134]^[a]

Speakers can be divided into three groups based on the extent to which they use [ŋ] in contexts where [ɡ] is not required: some consistently use [ŋ], some never use [ŋ], and some show variable use of [ŋ] versus [ɡ] (or [ɣ]). Speakers who consistently use [ŋ] are a minority. The distribution of [ŋ] versus [ɡ] for these speakers mostly follows predictable rules (as described below); however, a number of complications and exceptions exist, and as a result, some linguists analyze /ŋ/ as a distinct phoneme for consistent nasal speakers.^[134] The contrast has very low functional load,^[134] but for some speakers pairs such as [oːɡaɾasɯ] (大硝子, 'big sheet of glass') versus [oːŋaɾasɯ] (大烏, 'big raven') can be cited as examples of words that are segmentally identical aside from the use of [ɡ] versus [ŋ].^[135] Another commonly cited pair is [seŋɡo] 千五, 'one thousand and five' versus [seŋŋo] 戦後, 'postwar', although aside from the segmental difference in the consonant, these are prosodically distinct: the first is normally pronounced as two accent phrases, [seꜜŋɡoꜜ], whereas the second is pronounced as a single accent phrase (either [seꜜŋŋo] or [seŋŋo]).^[136]

Distribution of [ŋ] vs. [ɡ]

At the start of an independent word, all speakers use [ɡ] in almost all circumstances. However, postpositional particles, such as the subject marker が, ga, are pronounced with [ŋ] by consistent nasal speakers.^[137] In addition, certain words that normally occur after other words may be pronounced with [ŋ] even when they occur at the start of an utterance: examples include the conjunction が, ga, 'but' and the word gurai, 'approximately'.^[137]^[129]

In the middle of a native morpheme, consistent nasal speakers always use [ŋ]. But in the middle of foreign-stratum morphemes, [ɡ] may be used even by consistent nasal speakers.^[138] It is also possible for foreign morphemes to be pronounced with medial [ŋ]: there is considerable variability, but this may be more common in older borrowings (such as オルガン, orugan, 'organ', from Portuguese órgão)^[138] or in borrowings that contained [ŋ] in the source language (such as イギリス, igirisu, 'England', from Portuguese inglês).^[139]

At the start of a morpheme in the middle of a word, either [ŋ] or [ɡ] may be possible, depending on the word. Only [ɡ] is possible after the honorific prefix お, o (as in お元気, ogenki, 'health' [oɡenki]) or at the start of a reduplicated mimetic morpheme^[139] (as in がらがら, gara-gara, 'rattle-rattle' [ɡaɾaɡaɾa]).^[140] Consistent nasal speakers typically use [ŋ] at the start of the second morpheme of a bimorphemic Sino-Japanese word.^[141] When the second element of a compound word is a non-bound morpheme or a multi-morphemic word, consistent nasal speakers may use either [ɡ] or [ŋ]: factors such as the lexical stratum of the morpheme may play a role, but it seems difficult to establish precise rules predicting which pronunciation occurs in this context, and the pronunciation of some words varies even among consistent nasal speakers, such as 縞柄, shimagara, 'striped pattern' [ɕimaɡaɾa~ɕimaŋaɾa].^[142]

The morpheme 五, go, 'five', is pronounced with [ɡ] when it is used as part of a compound numeral, as in [ɲi(d)ʑɯːɡo] 二十五, nijū-go, 'twenty-five' (accented as [ɲiꜜ(d)ʑɯːɡoꜜ]),^[32] although 五 can potentially be pronounced as [ŋo] when it occurs non-initially in certain proper nouns or lexicalized compound words, such as [tameŋoɾoː] 為五郎 (a male given name), [ɕitɕiŋosaɴ] 七五三 (the name of a festival for children aged three, five or seven), or [(d)ʑɯːŋoja] 十五夜 (a night of the full moon).^[143]

To summarize:

	in the middle of a morpheme	at the start of a word	at the start of a morpheme, in the middle of a word
	はげ, hage, 'baldness'	外遊, gaiyū, 'overseas trip'	
inconsistent speakers	[haŋe] or [haɡe] or [haɣe]	[ɡaijɯː], but not *[ŋaijɯː]	sometimes [ŋ], sometimes [ɡ]~[ɣ]
consistent nasal speakers	[haŋe]	[ɡaijɯː], but not *[ŋaijɯː]	sometimes [ŋ], sometimes [ɡ]~[ɣ]
consistent stop speakers	[haɡe] or [haɣe]	[ɡaijɯː], but not *[ŋaijɯː]	[ɡ] or [ɣ]

Sociolinguistics of [ŋ]

The frequency of onset [ŋ] in Tokyo Japanese speech was falling as of 2008, and seems to have already been on the decline in 1940.^[144] Pronunciations with [ŋ] are generally less frequent for younger speakers,^[145]^[144]^[133] and even though the use of [ŋ] was traditionally prescribed as a feature of standard Japanese, pronunciations with [ɡ] seem in practice to have acquired a more prestigious status, as shown by studies that find higher rates of [ɡ] usage when speakers read words from a list.^[146]^[147]

Vowels

Vowel phonemes of Japanese
	Front	Central	Back
Close	i		u
Mid	e		o
Open		a

/u/ is a close near-back vowel with the lips unrounded ([ɯ̟])^[148]^[149] or compressed ([ɯ̟ᵝ]).^[32]^[150] When compressed, it is pronounced with the side portions of the lips in contact but with no salient protrusion. In conversational speech, compression may be weakened or completely dropped.^[150] It is centralized [ɨ] after /s, z, t/ and palatalized consonants (/Cj/),^[148] and possibly also after /n/.^[150] In contradiction to the preceding descriptions, Nogita & Yamane (2019) characterize /u/ as rounded and propose that the transcription [ʉ̜] is more accurate than [ɯ], while acknowledging the possibility of unrounding in fast speech. Based on visual recordings of Japanese speakers' lips, they conclude that /u/ is pronounced with lip protrusion (forward motion causing the lip corners to be brought closer together horizontally), in contrast to the spread lip position of a vowel like /i/, or the vertical movement of the lips towards each other for the [β] allophone of /b/. They suggest that the perceptual impression of Japanese /u/ as an unrounded vowel could be caused partly by its fronted articulation, and partly by its protrusion being accompanied by less vertical lip closure compared to /u/ in other languages, resulting in a less rounded sound. Lip protrusion was also found to be greater for Japanese /u/ than for /i/ in a 2005 MRI study^[151]^[152] and in a 1997 study using x-ray microbeam kinematic data.^[153]
/e, o/ are mid [e̞, o̞].^[154]
/a/ is central [ä].^[154]

Long vowels and vowel sequences

All vowels display a length contrast: short vowels are phonemically distinct from long vowels:

[obasaɴ]	小母さん, obasan, 'aunt'	[obaːsaɴ]	お婆さん, obaasan, 'grandmother'
[keɡeɴ]	怪訝, kegen, 'dubious'	[keːɡeɴ]	軽減, keigen, 'reduction'
[çiɾɯ]	蛭, hiru, 'leech'	[çiːɾɯ]	ヒール, hiiru, 'heel'
[tokai]	都会, tokai, 'city'	[toːkai]	倒壊, tōkai, 'destruction'
[kɯ]	区, ku, 'district'	[kɯː]	空, kū, 'void'^[155]

Long vowels are pronounced with around 2.5 or 3 times the phonetic duration of short vowels, but are considered to be two moras long at the phonological level.^[156] In normal speech, a "double vowel", that is, a sequence of two identical short vowels (for example, across morpheme boundaries), is pronounced the same way as a long vowel. However, in slow or formal speech, a sequence of two identical short vowels may be pronounced differently from an intrinsically long vowel:^[157]

[satoːja]	砂糖屋, satō-ya, 'sugar shop'
[satoːja]~[sato.oja]	里親, sato-oya, 'foster parent'^[157]

In the above transcription, [.] represents a hiatus between vowels; sources differ on how they transcribe and describe the phonetic realization of hiatus in Japanese. Labrune (2012) says it can be "a pause or a light glottal stop", and adopts the transcription [ˀ].^[157] Shibatani (1990) states that there is no complete glottal closure and questions whether there is any actual glottal narrowing at all.^[158] Vance describes it as vowel rearticulation (a drop in intensity) and transcribes it as [ˀ]^[159] or [*].^[160]

In addition, a double vowel may bear pitch accent on either the first or second element, whereas an intrinsically long vowel can be accented only on its first mora.^[161] The distinction between double vowels and long vowels may be phonologically analyzed in various ways. One analysis interprets long vowels as ending in a special segment /R/ that adds a mora to the preceding vowel sound^[162] (a chroneme). Another analysis interprets long vowels as sequences of the same vowel phoneme twice, with double vowels distinguished by the presence of a "zero consonant" or empty onset between the vowels.^[163] A third approach also interprets long vowels as sequences of the same vowel phoneme twice, but treats the difference between long and double vowels as a matter of syllabification, with the long vowel [oː] consisting of the phonemes /oo/ pronounced in one syllable, and the double vowel [o.o] consisting of the same two phonemes split between two syllables.^[164]

Any pair of short vowels may occur in sequence^[165] (although only a subset of vowel sequences can be found within a morpheme in native or Sino-Japanese vocabulary). Sequences of three or more vowels also occur. Similar to the distinction between long vowels and double vowels, some analyses of Japanese phonology recognize a distinction between diphthongs (two different vowel phonemes pronounced in one syllable) and heterosyllabic vowel sequences; other analyses make no such distinction.

Devoicing

Nasalization

Vowels are nasalized before the moraic nasal /N/ (or equivalently, before a syllable-final nasal).^[171]

/kaNtoo/ > [kãntoː]	関東, Kantō, 'Kanto region'
/seesaN/ > [seːsãɴ]	生産, seisan, 'production'

Vowels adjacent to a non-moraic nasal consonant may also have incidental phonetic nasalization, as in the first and second mora of minato, 'harbor'.^[172]

Glottal stop insertion

At the beginning and end of utterances, Japanese vowels may be preceded and followed by a glottal stop [ʔ], respectively.^[173] This is demonstrated below with the following words (as pronounced in isolation):

/eN/ > [eɴ] ~ [ʔeɴ]	円, en, 'yen'
/kisi/ > [kiɕiʔ]	岸, kishi, 'shore'
/u/ > [ɯʔ ~ ʔɯʔ]	鵜, u, 'cormorant'

When an utterance-final word is uttered with emphasis, the presence of a glottal stop is noticeable to native speakers, and it may be indicated in writing with the sokuon っ, suggesting it is identified with the moraic obstruent /Q/^[174] (normally found as the first half of a geminate). This is also found in interjections like あっ, a and えっ, e.

Prosody

Moras

Further information: On (Japanese prosody), Mora (linguistics) § Japanese, and Isochrony § Mora timing

Japanese words have traditionally been analysed as composed of moras, a distinct concept from that of syllables.^[b]^[175] Each mora occupies one rhythmic unit, i.e. it is perceived to have the same time value.^[176] A mora may be "regular" consisting of just a vowel (V) or a consonant and a vowel (CV), or may be one of two "special" moras, /N/ and /Q/. A glide /j/ may precede the vowel in "regular" moras (CjV). Some analyses posit a third "special" mora, /R/, the second part of a long vowel (a chroneme).^[c]^[177] In the following table, the period represents a mora break, rather than the conventional syllable break.

Mora type	Example	Japanese	Moras per word
V	/o/	尾, o, 'tail'	1-mora word
jV	/jo/	世, yo, 'world'	1-mora word
CV	/ko/	子, ko, 'child'	1-mora word
CjV	/kjo/¹	巨, kyo, 'hugeness'	1-mora word
R	/R/ in /kjo.R/ or /kjo.o/	今日, kyō, 'today'	2-mora word
N	/N/ in /ko.N/	紺, kon, 'deep blue'	2-mora word
Q	/Q/ in /ko.Q.ko/ or /ko.k.ko/	国庫, kokko, 'national treasury'	3-mora word

^1 Traditionally, moras were divided into plain and palatal sets, the latter of which entail palatalization of the consonant element.^[178]

Thus, the disyllabic [ɲip.poɴ] (日本, 'Japan') may be analyzed as /niQpoN/, dissected into four moras: /ni/, /Q/, /po/, and /N/.
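
The division into moras described above can be illustrated with a short Python sketch. It is only a minimal illustration under stated assumptions: the function name `split_moras` is hypothetical, the input is an ASCII-style phonemic string in which /N/, /Q/ and /R/ are written as capital letters, and every vowel closes a mora. It does not handle the length mark [ː] or phonetic transcriptions.

```python
VOWELS = set("aeiou")
SPECIAL = set("NQR")  # moraic nasal, moraic obstruent, chroneme (assumed notation)


def split_moras(phonemic: str) -> list:
    """Split a phonemic string such as 'niQpoN' into a list of moras."""
    moras = []
    onset = ""  # consonant (and optional glide /j/) waiting for its vowel
    for ch in phonemic:
        if ch in SPECIAL:
            if onset:
                raise ValueError(f"stranded onset {onset!r} before {ch!r}")
            moras.append(ch)  # a special mora stands on its own
        elif ch in VOWELS:
            moras.append(onset + ch)  # V, CV, jV or CjV
            onset = ""
        else:
            onset += ch
    if onset:
        raise ValueError(f"stranded onset {onset!r} at end of input")
    return moras


print(split_moras("niQpoN"))  # ['ni', 'Q', 'po', 'N']
print(split_moras("koQko"))   # ['ko', 'Q', 'ko']
```

Under this sketch, /kjoR/ yields the two moras /kjo/ and /R/, matching the chroneme analysis in the table; an analysis that writes long vowels as /oo/ would simply yield two vowel moras instead.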

In English, stressed syllables in a word are pronounced louder, longer, and with higher pitch, while unstressed syllables are relatively shorter in duration. Japanese is often considered a mora-timed language, as each mora tends to be of the same length,^[179] though not strictly: geminate consonants and moras with devoiced vowels may be shorter than other moras.^[180] Factors such as pitch have negligible influence on mora length.^[181]

Pitch accent

Main article: Japanese pitch accent

Standard Japanese has a distinctive pitch accent system where a word can either be unaccented, or can bear an accent on one of its moras. An accented mora is pronounced with a relatively high tone and is followed by a drop in pitch, which can be marked in transcription by placing a downward-pointing arrow /ꜜ/ after the accented mora.^[182]

The pitch of other moras in the word (or more precisely, in the accent phrase) is predictable. A common simplified model describes pitch patterns in terms of a two-way division between low- and high-pitched moras. Low pitch is found on all moras following the accented mora (if there is one) and usually also on the first mora of the accent phrase (unless it bears the accent). High pitch is found on the accented mora (if there is one) and on non-initial moras up to the accented mora, or up to the end of the accent phrase if there is no accented mora.^[183]

Under this model, it is not possible to distinguish the pitch patterns of an unaccented phrase and a phrase with accent on the final mora: both show low pitch on the first mora and high pitch on every following mora. It is generally said that there is no audible difference between these two accentuation patterns.^[184]^[185] (Some acoustic experiments have found evidence that some speakers may produce slightly different phonetic pitch contours for these two accentuation patterns;^[184]^[185] however, even when such differences exist, they do not seem to be perceptible to listeners.^[186]) However, the underlying lexical distinction between unaccented words and words with accent on the final mora is made apparent when the word is followed by further material within the same accent phrase, such as the case particle が: thus, /hasiꜜɡa/ (橋が, 'bridge NOM') clearly contrasts with /hasiɡa/ (端が, 'edge NOM'), even though a distinction is not perceived between 橋 /hasiꜜ/ and 端 /hasi/ when pronounced in isolation.
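
The simplified two-level model can be sketched in Python as follows. This is an illustration only: the function `pitch_pattern` and its parameters are hypothetical, the accent is given as a 1-based mora index (or `None` for an unaccented phrase), and the sketch ignores the syllable-based exceptions to initial lowering as well as any finer phonetic detail.

```python
from typing import Optional


def pitch_pattern(n_moras: int, accent: Optional[int]) -> str:
    """Return a string of 'H'/'L' for an accent phrase of n_moras moras.

    accent is the 1-based position of the accented mora, or None if the
    phrase is unaccented.
    """
    pattern = []
    for i in range(1, n_moras + 1):
        if accent is not None and i > accent:
            pattern.append("L")  # every mora after the accent is low
        elif i == 1 and accent != 1:
            pattern.append("L")  # initial lowering on an unaccented first mora
        else:
            pattern.append("H")
    return "".join(pattern)


# /hasi/ plus the particle /ɡa/ (three moras)
print(pitch_pattern(3, 1))     # HLL  /haꜜsiɡa/ 'chopsticks'
print(pitch_pattern(3, 2))     # LHL  /hasiꜜɡa/ 'bridge'
print(pitch_pattern(3, None))  # LHH  /hasiɡa/  'edge'
```

For the two-mora words in isolation, `pitch_pattern(2, 2)` and `pitch_pattern(2, None)` both give `LH`, which reflects the point made above that final-accented and unaccented words are not distinguished until further material follows in the same accent phrase.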

The placement of pitch accent, and the lowering of pitch on an initial unaccented mora, show some restrictions that can be explained in terms of syllable structure. Accent cannot be placed on the second mora of a heavy syllable;^[187] this includes syllable-final /N/ and /Q/ and the second mora of a long vowel or of a diphthong. An initial unaccented mora is not always pronounced with low pitch when it occurs as part of a heavy syllable.^[188] Specifically, when the second mora of an accent phrase is /R/ (the latter part of a long vowel) or /N/ (the moraic nasal), the first two moras are optionally either LH (low-high) or HH (high-high).^[189] In contrast, when the second mora is /Q/, the first two moras are LL (low-low).^[190] When the second mora is /i/, initial lowering seems to apply as usual to the first mora only, LH (low-high).^[190] Labrune (2012) rejects the use of the syllable in descriptions of Japanese phonology and so explains these phenomena alternatively as a consequence of /N/, /Q/, /R/ constituting "deficient moras", a term Labrune suggests can also encompass moras without an onset, with a devoiced vowel, or with an epenthetic vowel.^[191]

Different dialects of Japanese have different accent systems: some distinguish a greater number of contrastive pitch patterns than the Tokyo dialect, while others make fewer distinctions.^[192]

Feet

The bimoraic foot, a unit composed of two moras, plays an important role in linguistic analyses of Japanese prosody.^[193]^[194] The relevance of the bimoraic foot can be seen in the formation of hypocoristic names, clipped compounds, and shortened forms of longer words.

For example, the hypocoristic suffix -chan is attached to the end of a name to form an affectionate term of address. When this suffix is used, the name may be unchanged in form, or it may optionally be modified: modified forms always have an even number of moras before the suffix.^[195] It is common to use the first two moras of the base name, but there are also variations that are not produced by simple truncation:^[d]

Truncation to the first two moras:^[196]

/o.sa.mu/	osamu	>	/o.sa.tja.N/	osachan
/ta.ro.ː/	taroo	>	/ta.ro.tja.N/	tarochan
/jo.ː.su.ke/	yoosuke	>	/jo.ː.tja.N/	yoochan
/ta.i.zo.ː/	taizoo	>	/ta.i.tja.N/	taichan
/ki.N.su.ke/	kinsuke	>	/ki.N.tja.N/	kinchan

From first mora, with lengthening:^[197]

/ti/	chi	>	/ti.ː.tja.N/	chiichan
/ka.jo.ko/	kayoko	>	/ka.ː.tja.N/	kaachan

With formation of a moraic obstruent:^[198]

/a.tu.ko/	atsuko	>	/a.Q.tja.N/	atchan
/mi.ti.ko/	michiko	>	/mi.Q.tja.N/	mitchan
/bo.ː/	boo	>	/bo.Q.tja.N/	botchan

With formation of a moraic nasal:^[199]

/a.ni/	ani	>	/a.N.tja.N/	anchan
/me.ɡu.mi/	megumi	>	/me.N.tja.N/	menchan
/no.bu.ko/	nobuko	>	/no.N.tja.N/	nonchan

From two non-adjacent moras:^[200]

/a.ki.ko/	akiko	>	/a.ko.tja.N/	akochan
/mo.to.ko/	motoko	>	/mo.ko.tja.N/	mokochan

Poser (1990) argues that the various kinds of modifications are best explained in terms of a two-mora 'template' used in the formation of this type of hypocoristic: the bimoraic foot.^[201]
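
The two-mora template can be illustrated with a small Python sketch covering four of the patterns listed above. It is a simplified illustration, not Poser's formalization: the function `hypocoristic` and its `strategy` labels are hypothetical, names are supplied as lists of moras, and the pattern built from two non-adjacent moras is not covered.

```python
SUFFIX = ["tja", "N"]  # the suffix -chan as two moras


def hypocoristic(moras: list, strategy: str = "truncate") -> str:
    """Build a -chan form whose base is exactly two moras long."""
    if strategy == "truncate":
        base = moras[:2]           # keep the first two moras
    elif strategy == "lengthen":
        base = [moras[0], "ː"]     # first mora plus vowel length
    elif strategy == "obstruent":
        base = [moras[0], "Q"]     # first mora plus moraic obstruent
    elif strategy == "nasal":
        base = [moras[0], "N"]     # first mora plus moraic nasal
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    if len(base) != 2:
        raise ValueError("the base must fill a two-mora template")
    return ".".join(base + SUFFIX)


print(hypocoristic(["o", "sa", "mu"]))               # o.sa.tja.N
print(hypocoristic(["ka", "jo", "ko"], "lengthen"))  # ka.ː.tja.N
print(hypocoristic(["a", "tu", "ko"], "obstruent"))  # a.Q.tja.N
print(hypocoristic(["me", "ɡu", "mi"], "nasal"))     # me.N.tja.N
```

A one-mora name such as /ti/ cannot be handled by plain truncation in this sketch, because the result would not fill the template; it needs one of the augmenting strategies, as in /ti.ː.tja.N/.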

Aside from the bimoraic foot as shown above, in some analyses monomoraic (one-mora) feet (also called "degenerate" feet) or trimoraic (three-mora) feet are considered to occur in certain contexts.^[193]

Syllables

Although there is debate about the usefulness or relevance of syllables to the phonology of Japanese, it is possible to analyze Japanese words as being divided into syllables. When setting Japanese lyrics to (modern Western-style) music, a single note may correspond either to a mora or to a syllable.^[202]

Normally, each syllable contains at least one vowel^[203] and has a length of either one mora (called a light syllable) or two moras (called a heavy syllable); thus, the structure of a typical Japanese syllable can be represented as (C)(j)V(V/N/Q), where C represents an onset consonant, V represents a vowel, N represents a moraic nasal, Q represents a moraic obstruent, components in parentheses are optional, and components separated by a slash are mutually exclusive.^[204] However, other, more marginal syllable types (such as trimoraic syllables or vowelless syllables) may exist in restricted contexts.

The majority of syllables in spontaneous Japanese speech are 'light',^[205] that is, one mora long, with the form (C)(j)V.
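
The grouping of moras into light and heavy syllables can be sketched in Python. The sketch makes strong simplifying assumptions: the function names are hypothetical, the special moras are written N, Q and R, each special mora is attached to the preceding mora, and sequences of two different vowels are always left in separate syllables, since their status as diphthongs is disputed.

```python
SPECIAL = {"N", "Q", "R"}  # moraic nasal, moraic obstruent, chroneme


def syllabify(moras: list) -> list:
    """Group a list of moras into syllables (each a list of moras)."""
    syllables = []
    for mora in moras:
        if mora in SPECIAL and syllables:
            syllables[-1].append(mora)  # special mora joins the previous syllable
        else:
            syllables.append([mora])    # a regular mora starts a new syllable
    return syllables


def weight(syllable: list) -> str:
    """Label a syllable by its length in moras."""
    return {1: "light", 2: "heavy"}.get(len(syllable), "superheavy")


for syl in syllabify(["ni", "Q", "po", "N"]):
    print(".".join(syl), weight(syl))
# ni.Q heavy
# po.N heavy
```

Because a special mora is attached without any length check, an input such as /to.R.Q.ta/ produces a three-mora first syllable; whether such sequences really form a single syllable is one of the debated points in the literature.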

Heavy syllables

"Heavy" syllables (two moras long) may potentially take any of the following forms:

(C)(j)VN (ending in a short vowel + /N/)
(C)(j)VQ (ending in a short vowel + /Q/)
(C)(j)VR (ending in a long vowel). May be analyzed either as a special case of (C)(j)VV with both V as the same vowel phoneme,^[204] or as ending in a vowel followed by a special chroneme segment (written as R or sometimes H).^[206]
(C)(j)V₁V₂, where V₁ is different from V₂. Sometimes notated as (C)(j)VJ.

Some descriptions of Japanese phonology refer to a VV sequence within a syllable as a diphthong; others use the term "quasi-diphthong" as a means of clarifying that these are analyzed as sequences of two vowel phonemes within one syllable, rather than as unitary phonemes.^[207] There is disagreement about which non-identical vowel sequences can occur within the same syllable. One criterion used to evaluate this question is the placement of pitch accent: it has been argued that, like syllables ending in long vowels, syllables ending in diphthongs cannot bear a pitch accent on their final mora.^[208] It has also been argued that diphthongs, like long vowels, cannot normally be pronounced with a glottal stop or vowel rearticulation between their two moras, whereas this may optionally occur between two vowels that belong to separate syllables.^[209] Kubozono (2015a) argues that only /ai/, /oi/ and /ui/ can be diphthongs,^[210] although some prior literature has included other sequences such as /ae/, /ao/, /oe/, /au/, when they occur within a morpheme.^[211] Labrune (2012) argues against the syllable as a unit of Japanese phonology and thus concludes that no vowel sequences ought to be analyzed as diphthongs.^[212]

In some contexts, a VV sequence that could form a valid diphthong is separated by a syllable break at a morpheme boundary, as in /kuruma.iꜜdo/ 'well with a pulley' from /kuruma/ 'wheel, car' and /iꜜdo/ 'well'.^[213] However, the distinction between a heterosyllabic vowel sequence and a long vowel or diphthong is not always predictable from the position of morpheme boundaries: that is, syllable breaks between vowels do not always correspond to morpheme boundaries (or vice versa).

For example, some speakers may pronounce the word 炎, honoo, 'flame' with a heterosyllabic /o.o/ sequence, even though this word is arguably monomorphemic in modern Japanese.^[214] This is an exceptional case: for the most part, heterosyllabic sequences of two identical short vowels are found only across a morpheme boundary.^[214] On the other hand, it is not so rare for a heterosyllabic sequence of two non-identical vowels to occur within a morpheme.^[214]

In addition, it seems to be possible in some cases for a VV sequence to be pronounced in one syllable even across a morpheme boundary. For example, 歯医者, haisha, 'dentist' is morphologically a compound of 歯, ha, 'tooth' and 医者, isha, 'doctor' (itself composed of the morphemes 医, i, 'medical' and 者, sha, 'person'); despite the morpheme boundary between /a/ and /i/ in this word, they seem to be pronounced in one syllable as a diphthong, making it a homophone with 敗者, haisha, 'defeated person'.^[215] Likewise, the morpheme /i/ used as a suffix to form the dictionary form (or affirmative nonpast-tense form) of an i-adjective is almost never pronounced as a separate syllable; instead, it combines with a preceding stem-final /i/ to form the long vowel [iː], or with a preceding stem-final /a/, /o/ or /u/ to form a diphthong.^[216]

Superheavy syllables

Syllables of three or more moras, called "superheavy" syllables, are uncommon and exceptional (or "marked"); the extent to which they occur in Japanese words is debated.^[217] Superheavy syllables never occur within a morpheme in Yamato or Sino-Japanese.^[218] Apparent superheavy syllables can be found in certain morphologically derived Yamato forms (including inflected verb forms where a suffix starting with /t/ is attached to a root ending in -VVC-, derived adjectives in っぽい, -ppoi, or derived demonyms in っこ, -kko) as well as in many loanwords.^[218]^[219]

Apparent superheavy syllables
Syllable type	Morphologically complex forms	Loanwords
(C)(j)VRN		English: green → Japanese: グリーン, romanized: gurīn^[220]
(C)(j)V₁V₂N		English: Spain → Japanese: スペイン, romanized: supein^[220]
(C)(j)VRQ	通った, tootta, 'pass-PAST'^[221]^[222] 東京っ子, tōkyōkko, 'Tokyoite'^[223]
(C)(j)V₁V₂Q	入って, haitte, 'enter-GERUNDIVE'^[222] 仙台っ子, sendaikko, 'Sendai-ite'^[224]
(C)(j)VNQ	ロンドンっ子, rondonkko, 'Londoner',^[221]^[224] ドラえもんっぽい, doraemonppoi, 'like Doraemon'^[224]
(C)(j)VRNQ	ウィーンっ子, uiinkko, 'Wiener',^[224] ウィーンって言った, uiintte itta, 'Vienna, (s)he said'^[224]

According to some accounts, certain forms listed in the above table may be avoided in favor of a different pronunciation with an ordinary heavy syllable (by reducing a long vowel to a short vowel or a geminate to a singleton consonant). Vance (1987) suggests there might be a strong tendency to reduce superheavy syllables to the length of two moras in speech at a normal conversational speed, saying that tooQta is often indistinguishable from toQta.^[225] Vance (2008) again affirms the existence of a tendency to shorten superheavy syllables in speech at a conversational tempo (specifically, to replace VRQ with VQ, VRN with VN, and VNQ with VN), but stipulates that the distinctions between 通った, tootta and 取った, totta; シーン, shiin and 芯, shin; and コンテ, konte, 'script' and 紺って, kontte, 'navy blue-QUOTATIVE' are clearly audible in careful pronunciation.^[226] Ito and Mester explicitly deny that there is a general tendency to shorten the long vowel of forms such as tootte in most styles of speech.^[222]^[227] Ohta (1991) accepts superheavy syllables ending in /RQ/ and /JQ/ but describes /NQ/ as hardly possible, stating that he and the majority of the informants he consulted judged examples such as /roNdoNQko/ to be questionably well-formed in comparison to /roNdoNko/.^[228]

It has also been argued that in some cases, an apparent superheavy syllable might actually be a sequence of a light syllable followed by a heavy syllable.

Kubozono (2015c) argues that /VVN/ sequences are generally syllabified as /V.VN/, citing forms where pitch accent is placed on the second vowel such as スペイン風邪, supeiꜜnkaze, 'Spanish influenza', リンカーン杯, rinkaaꜜnhai, 'Lincoln Cup', グリーン車, guriiꜜnsha, 'Green Car' (first-class car of a train) (syllabified per Kubozono as su.pe.in.ka.ze, rin.ka.an.hai, gu.ri.in.sha).^[229]^[230] Ito & Mester (2018) state that compounds formed from words of this shape often exhibit variable accentuation, citing guriꜜinsha~guriiꜜnsha, Uターン率, yuutaaꜜnritsu~yuutaꜜanritsu, 'U-turn percentage', and マクリーン館, makuriiꜜnkan ~ makuriꜜinkan, 'McLean Building'.^[231]

Ito & Mester (2015b) note that the pitch-based criterion for syllabifying VV sequences would suggest that Sendaiꜜkko is syllabified as Sen.da.ik.ko;^[224] likewise, Ohta (1991) reports a suggestion by Shin’ichi Tanaka (per personal communication) that the accentuation tookyooꜜkko implies the syllable division -kyo.oQ-, although Ohta favors the analysis with a superheavy syllable based on the intuition that this word contains a long vowel and not a sequence of two separate vowels.^[232] Ito and Mester ultimately question whether the placement of pitch accent on the second mora really rules out analyzing a three-mora sequence as a single superheavy syllable.

The word rondonkko has a pronunciation where the pitch accent is placed on /N/:^[232]^[224] /roNdoNꜜQko/. Vance (2008) interprets /NꜜQ/ here as its own syllable, separate from the preceding vowel, while stating that a variant pronunciation /roNdoꜜNQko/, with a superheavy syllable /doꜜNQ/, also exists.^[188] Ito and Mester consider the syllabification ron.do.nk.ko implausible,^[224] and propose that pitch accent, rather than always falling on the first mora of a syllable, may fall on the penultimate mora when a syllable is superheavy.^[233] Per Kubozono (2015c), the superheavy syllable in toꜜotta bears accent on its first mora.^[234]

Evidence for the avoidance of superheavy syllables includes the adaptation of foreign long vowels or diphthongs to Japanese short vowels before /N/ in loanwords such as the following:

English: foundation → Japanese: ファンデーション, romanized: fandēshon

English: stainless → Japanese: ステンレス, romanized: sutenresu

English: corned beef → Japanese: コンビーフ, romanized: konbīfu^[235]

There are exceptions to this shortening: /ai/ seems to never be affected, and /au/, although often replaced with /a/ in this context, can be kept, as in the following words:^[236]

English: sound → Japanese: サウンド, romanized: saundo

English: mountain → Japanese: マウンテン, romanized: maunten^[237]

Vowelless syllables

Some analyses recognize vowelless syllables in restricted contexts.

Kawahara & Shaw (2018) argue that high vowel deletion may produce syllabic fricatives or affricates.^[238]
Per Vance (2008), /N/ is syllabic in the marginal circumstances where it occurs word-initially, such as ン十億, njūoku, 'several billion';^[239] Vance also considers /NQ/ to constitute its own syllable in the exceptional form rondonkko /roNdoNꜜQko/ (alternatively analyzed as containing a superheavy syllable; see above) due to the placement of the pitch accent on /N/.^[188]

Phonotactics

Further information: Hiragana, Katakana, and Transcription into Japanese

Within a mora

Phonotactically legal phoneme sequences, each counting as one mora
	/-a/	/-i/	/-u/	/-e/	/-o/	/-ja/	/-ju/	/-jo/
/∅-/	/a/	/i/	/u/ [ɯ]	/e/	/o/	/ja/	/ju/ [jɯ]	/jo/
/k-/	/ka/	/ki/ [kʲi]	/ku/ [kɯ]	/ke/	/ko/	/kja/ [kʲa]	/kju/ [kʲɨ]	/kjo/ [kʲo]
/ɡ-/	/ɡa/	/ɡi/ [ɡʲi]	/ɡu/ [ɡɯ]	/ɡe/	/ɡo/	/ɡja/ [ɡʲa]	/ɡju/ [ɡʲɨ]	/ɡjo/ [ɡʲo]
/s-/	/sa/	/si/ [ɕi]	/su/ [sɨ]	/se/	/so/	/sja/ [ɕa]	/sju/ [ɕɨ]	/sjo/ [ɕo]
/z-/	/za/ [(d)za]	/zi/ [(d)ʑi]	/zu/ [(d)zɨ]	/ze/ [(d)ze]	/zo/ [(d)zo]	/zja/ [(d)ʑa]	/zju/ [(d)ʑɨ]	/zjo/ [(d)ʑo]
/t-/	/ta/	/ti/ [tɕi]	/tu/ [tsɨ]	/te/	/to/	/tja/ [tɕa]	/tju/ [tɕɨ]	/tjo/ [tɕo]
/d-/	/da/	(/di/) [(d)ʑi]	(/du/) [(d)zɨ]	/de/	/do/	(/dja/) [(d)ʑa]	(/dju/) [(d)ʑɨ]	(/djo/) [(d)ʑo]
/n-/	/na/	/ni/ [ɲi]	/nu/ [nɯ]	/ne/	/no/	/nja/ [ɲa]	/nju/ [ɲɨ]	/njo/ [ɲo]
/h-/	/ha/	/hi/ [çi]	/hu/ [ɸɯ]	/he/	/ho/	/hja/ [ça]	/hju/ [çɨ]	/hjo/ [ço]
/b-/	/ba/	/bi/ [bʲi]	/bu/ [bɯ]	/be/	/bo/	/bja/ [bʲa]	/bju/ [bʲɨ]	/bjo/ [bʲo]
/p-/	/pa/	/pi/ [pʲi]	/pu/ [pɯ]	/pe/	/po/	/pja/ [pʲa]	/pju/ [pʲɨ]	/pjo/ [pʲo]
/m-/	/ma/	/mi/ [mʲi]	/mu/ [mɯ]	/me/	/mo/	/mja/ [mʲa]	/mju/ [mʲɨ]	/mjo/ [mʲo]
/r-/	/ra/ [ɾa]	/ri/ [ɾʲi]	/ru/ [ɾɯ]	/re/ [ɾe]	/ro/ [ɾo]	/rja/ [ɾʲa]	/rju/ [ɾʲɨ]	/rjo/ [ɾʲo]
/w-/	/wa/ [β̞a]
Marginal combinations mostly found in Western loans^[240]
[ɕ-]				[ɕe]
[(d)ʑ-]				[(d)ʑe]
[t-]		[tʲi]	[tɯ]				[tʲɨ]
[tɕ-]				[tɕe]
[ts-]	[tsa]	[tsʲi]		[tse]	[tso]
[d-]		[dʲi]	[dɯ]				[dʲɨ]
[ɸ-]	[ɸa]	[ɸʲi]		[ɸe]	[ɸo]		[ɸʲɨ]
[j-]				[je]
[β̞-]		[β̞i]		[β̞e]	[β̞o]
Special moras
/V-/	/N/ [ɴ, m, n, ɲ, ŋ, ɰ̃]
/V-C/	/Q/ (geminates the following consonant)
/V-/	/R/ [ː]
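
Some of the allophonic realizations in the core-vocabulary rows of the table above can be sketched as a Python mapping from a phonemic mora to a broad phonetic form. This is a partial illustration under stated assumptions: the function `realize` is hypothetical, it covers only the onsets /s z t n h/, it writes every /u/ as [ɯ] (ignoring the centralized variant shown in the table), and it leaves secondary palatalization such as [kʲ] unmarked.

```python
# Onsets whose primary symbol changes before /i/ or /j/ (core vocabulary)
PALATAL = {"s": "ɕ", "z": "(d)ʑ", "t": "tɕ", "n": "ɲ", "h": "ç"}
# Onsets with a special realization before /u/
BEFORE_U = {"t": "ts", "h": "ɸ"}


def realize(mora: str) -> str:
    """Map a phonemic mora such as 'si' or 'tu' to a broad phonetic form."""
    onset, vowel = mora[:-1], mora[-1]
    palatal_context = onset.endswith("j") or vowel == "i"
    base = onset.rstrip("j")  # onset consonant without the glide
    if palatal_context and base in PALATAL:
        consonant = PALATAL[base]
    elif vowel == "u" and base in BEFORE_U:
        consonant = BEFORE_U[base]
    else:
        consonant = onset  # all other onsets are left unchanged here
    return consonant + ("ɯ" if vowel == "u" else vowel)


print(realize("ti"), realize("tu"), realize("sja"), realize("hu"))
# tɕi tsɯ ɕa ɸɯ
```

The sketch applies only to the native and Sino-Japanese patterns; the marginal combinations found in loanwords, such as [ti] or [ɸa], are exactly the cases where this one-to-one mapping breaks down.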

Palatals

Japanese syllables may start with the palatal glide /j/ or with consonant + /j/ clusters. These onsets normally can be found only before the back vowels /a o u/.^[81]

Before /i/, /j/ never occurs.^[81] All consonants are phonetically palatalized before /i/, but do not contrast in this position with unpalatalized consonants: as a result, palatalization in this context can be analyzed as allophonic. In native Japanese vocabulary, coronal obstruent phones (i.e. [t s d (d)z]) do not occur before /i/, and in contexts where a morphological process such as verb inflection would place a coronal obstruent phoneme before /i/, the coronal is replaced with an alveolo-palatal sibilant, resulting in alternations such as [matanai] 'wait' (negative) vs. [matɕimasɯ] 'wait' (polite) or [kasanai] 'lend' (negative) vs. [kaɕimasɯ] 'lend' (polite).^[109] Thus, [tɕ ɕ (d)ʑ] function in native vocabulary as the palatalized counterparts of coronal consonant phonemes. However, the analysis of alveolo-palatal sibilants as palatalized allophones of coronal consonants is complicated by loanwords. The sequences [ti di] are distinguished from [tɕi (d)ʑi] in recent loanwords (with [ti] generally preserved in words borrowed more recently than 1930^[241]) and to a lesser extent, some speakers may exhibit a contrast in loanwords between [tsi (d)zi si] and [tɕi (d)ʑi ɕi].

Before /e/, [j] was lost in the current standard language, but some dialects (such as Kyushu) and pre-modern versions of the language contain [je], as well as exhibiting [ɕe] in place of modern standard [se].^[82] As discussed above, the sequences [tɕe (d)ʑe ɕe] do not occur in standard Japanese outside of foreign loanwords and a few marginal exclamations. There are no morphological alternations motivated by this gap,^[242] since no morphemes have an underlying form ending in [tɕ (d)ʑ ɕ]. In borrowed words, [tɕe] has been used consistently at all time periods (the use of [se] in セロ sero from cello seems to be a unique exception).^[243]^[115] Another rare exception, showing adaptation to [tɕi] (vowel raising), is チッキ (chikki) from English check (less common than チェック (chekku)).^[244] The sequences [(d)ʑe] and [ɕe] tend to be found in words borrowed more recently than around 1950, whereas words borrowed before that point may show depalatalization to [(d)ze] and [se] respectively,^[244] as seen in the 19th-century borrowed forms ゼリー (zerī) from English jelly, ゼントルマン (zentoruman) from English gentleman,^[245] and セパード (sepādo) from English shepherd.^[246]

The use of [je] in loanwords is inconsistent: adapted pronunciations with [ie] (イエ), such as イエローカード ierōkādo from English yellow card,^[247]^[248] continue to be used even for recent borrowings. In theory, pronunciations with [je] can be represented by the spelling イェ (mostly used to transcribe proper nouns), although it is not clear that the use of the spelling イェ necessarily corresponds to how speakers phonetically realize the sequence.^[247] Foreign [je] may also be adapted as /e/ in some cases.^[115] For some speakers, the optional, colloquial coalescence of certain other vowel sequences to [eː] can produce [jeː] in native forms, such as [hajeː] (a variant pronunciation of /hajai/ 'fast').^[249]

The sequences [ɸʲɯ dʲɯ tʲɯ] occur only in recent loans, such as フュージョン (fyūjon), デュエット (dyuetto), テューバ (tyūba) from fusion, duet, tuba: they can be interpreted as /fju dju tju/ in analyses where [tɕ] is not interpreted as /tj/.^[250]

Pre-/u/ consonants

Several Japanese consonants developed special phonetic values before /u/. Though originally allophonic, some of these variants have arguably attained phonemic status because of later neutralizations or the introduction of novel contrasts in loanwords.

In core vocabulary, [ɸɯ] can be analyzed as an allophonic realization of /hu/.^[74] However, in words of foreign origin, the voiceless bilabial fricative [ɸ] can occur before vowels other than /u/. This introduces a distinctive contrast between [ɸa ɸe ɸi ɸo] and [ha he çi ho]; therefore, Vance (2008) recognizes [ɸ] as a distinct consonant phoneme /f/, and interprets [ɸɯ] as phonemically /fu/, leaving */hu/ as a gap.^[251] In contrast, Watanabe (2009) prefers the analysis /hu/ and argues that /h/ in this context is distinct phonemically and sometimes phonetically from the /f/ [ɸ] found in foreign /fa fe fi fo/^[252] (which would leave */fu/ as a gap). In any case, /h/ and /f/ do not contrast before /u/.

Outside of loanwords, [tɯ] and [dɯ] do not occur, because /t d/ were affricated to [ts dz] before /u/.^[253]

In dialects that show neutralization of the [dz z] contrast, the merged phone [(d)z] can occur before /a, e, o/ as well as before /u/. Thus, for these dialects, [(d)zɯ] can be phonemically analyzed as /zu/, leaving /du/ as a gap.^[254]

In core vocabulary, the voiceless coronal affricate [ts] occurs only before the vowel /u/; thus [tsɯ] can be analyzed as an allophonic realization of /tu/.^[74] Verb inflection shows alternations between [t] and [ts], as in [katanai] 'win' (negative) and [katsɯ] 'win' (present tense).^[74] However, the interpretation of [tsɯ] as /tu/ (with [ts] merely an allophone of /t/) is complicated by the occurrence of [ts] before vowels other than /u/ in loanwords.^[75]

In addition, unaffricated [tɯ dɯ] are sometimes used in recent loanwords. They can be represented in kana by トゥ and ドゥ, which received official recognition by a cabinet notice in 1991 as an alternative to the use of [tsɯ] [(d)zɯ] or [to] [do] to adapt foreign [tu] [du].^[255] Forms where [tɯ] and [dɯ] can be found include the following:

English: today → [tɯdei]

French: toujours [tuʒuʀ] → [tɯ(d)ʑɯːɾɯ]

French: douze [duz] → [dɯːzɯ]^[256]

Older loanwords from French display adaptation of foreign [tu] as [tsɯ] and of foreign [du] as [do]:

French: Toulouse [tuluz] → [tsɯːɾɯːzɯ]

French: Pompidou [pɔ̃pidu] → [pompidoː]^[257]

Vance (2008) argues that [tɯ] and [dɯ] remain "foreignisms" in Japanese phonology;^[258] they are less frequent than [ti di],^[259] and this has been interpreted as evidence that a constraint against *[tɯ] remained active in Japanese phonology for longer than the constraint against *[ti].^[260]

In both old and recent loanwords, the epenthetic vowel used after word-final or pre-consonantal /t/ or /d/ is normally /o/ rather than /u/ (there is also some use of [tsɯ] and [(d)zɯ]^[261]). However, adapted forms show some fluctuation between [to do] and [tɯ dɯ] in this context, e.g. French estrade [estʀad] 'stage', in addition to being adapted as /esutoraddo/, has a variant adaptation /esuturaddu/.^[256]

Between moras

Special moras

If analyzed as phonemes, the moraic consonants /N/ and /Q/ show a number of phonotactic restrictions (although some constraints can be violated in certain contexts, or may apply only within certain layers of Japanese vocabulary).

/N/

In general, the moraic nasal /N/ can occur between a vowel and a consonant, between vowels (where it contrasts with non-moraic nasal onsets), or at the end of a word.

In Sino-Japanese vocabulary, /N/ can occur as the second and final mora of a Sino-Japanese morpheme.^[262] It may be followed by any other consonant or vowel. However, in some contexts Sino-Japanese morpheme-final /N/ may cause changes to the start of a closely connected following morpheme:

Within a bimorphemic Sino-Japanese word, /h/ is regularly replaced with /p/ after /N/, as shown by the different pronunciation of 輩 in 後輩, kōhai, 'one's junior' versus 先輩, senpai, 'one's senior'.^[263] This does not affect /Nh/ across word boundaries or across the juncture in the middle of a "complex compound" where the first or second element is a prosodic word composed of more than one Sino-Japanese morpheme: for example, /h/ remains unchanged in 完全敗北, kan+zen#hai+boku, 'total defeat', 新発明, shin#hatsu+mei, 'new invention',^[264] and 疑問符, gi+mon#fu, 'question mark'.^[265]
Some words where /N/ is followed by a morpheme that starts in modern Japanese with a vowel or semivowel developed a pronunciation with a geminate nasal (/Nn/ or /Nm/) as the result of historic sound changes (see renjō). Aside from these isolated exceptions, /N/ followed by a vowel is regularly pronounced without resyllabification in Sino-Japanese compounds.^[266]
A following /t k h s/ is sometimes changed to /d ɡ b z/; this can be interpreted as a special case of the more general sound change of rendaku.^[267]
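The regular /h/ → /p/ replacement after /N/ can be illustrated with a minimal sketch. It assumes phonemic spellings as plain strings (with `N` for the moraic nasal) and covers only words made of exactly two Sino-Japanese morphemes; the function name `join_bimorphemic` is hypothetical, and the sketch does not model word boundaries, complex compounds, renjō, or rendaku.

```python
def join_bimorphemic(first: str, second: str) -> str:
    """Join two Sino-Japanese morphemes given in phonemic spelling.

    Assumption: 'N' marks the moraic nasal, and the input is a simple
    two-morpheme word (not a complex compound or a phrase).
    """
    if first.endswith("N") and second.startswith("h"):
        # /h/ is replaced with /p/ after morpheme-final /N/
        second = "p" + second[1:]
    return first + second


print(join_bimorphemic("seN", "hai"))  # 先輩
print(join_bimorphemic("koo", "hai"))  # 後輩
```

Here the first call yields /seNpai/ while the second leaves /h/ unchanged, matching the contrast between senpai and kōhai.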

Although usually not found at the start of a word, initial /N/ can occur in some colloquial speech forms as a result of dropping of a preceding mora.^[268] In this context, its pronunciation is invariably assimilated to the place of articulation of the following consonant:

/naN bjaku neN/ → /N bjaku neN/ [mbjakɯneɴ] 'several hundred years'

/soNna koto/ → /Nna koto/ [nnakoto] 'such thing'^[269]

Initial /N/ may also be used in some loanword forms:

[n.dʑa.me.na]~[ɴ.dʑa.me.na] 'N'Djamena (proper noun)'^[269]

(This place name has an alternative pronunciation with an epenthetic /u/ inserted before the /N/.^[270])

/Q/

The moraic obstruent /Q/ generally occurs only between a vowel and a consonant in the middle of a word. However, word-initial geminates may occur in casual speech as the result of elision:

/mattaku/ ('entirely; totally', an expression of exasperation) → [ttakɯ]

/usseena/ ('shut up') → [sseena]^[271]

In native Japanese vocabulary, /Q/ is found only before /p t k s/^[272] (this includes [ts], [tɕ] and [ɕ], which can be viewed as allophones of /t/ and /s/); in other words, before voiceless obstruents other than /h/. The same generally applies to Sino-Japanese vocabulary. In these layers of vocabulary, [pp] functions as the geminate counterpart of /h/, due to the historical development of Japanese /h/ from Old Japanese [p].^[273]

Tamaoka & Makioka (2004) found that in a Japanese newspaper corpus, /Q/ was followed over 98% of the time by one of /p t k s/: however, there were also at least some cases where it was followed by /h b d ɡ z r/.^[274]

Geminate /h/ is found only in recent loanwords (e.g. ゴッホ, Gohho, '(van) Gogh', バッハ, Bahha, 'Bach'), and rarely in Sino-Japanese or mixed compounds (e.g. 十針, juhhari, 'ten stitches', 絶不調, zeffuchō, 'terrible slump').^[275]

Voiced obstruents (/b d ɡ z/) do not occur as geminates in native Japanese words.^[276] Avoidance of voiced obstruents in native words can be seen in certain morphological processes that cause voiceless obstruents to geminate but cause voiced obstruents to be preceded by the moraic nasal /N/.

However, voiced geminate obstruents have been used in words adapted from foreign languages since the 19th century.^[277] These loanwords can even come from languages, such as English, that do not feature gemination in the first place. For example, when an English word features a coda consonant preceded by a lax vowel, it can be borrowed into Japanese with a geminate; gemination may also appear as a result of borrowing via written materials, where a word spelled with doubled letters leads to a geminated pronunciation.^[278] Because these loanwords can feature voiced geminates, Japanese now exhibits a voice distinction with geminates where it formerly did not:^[279]

スラッガー, suraggā ('slugger') vs. surakkā ('slacker')

キッド, kiddo ('kid') vs. kitto ('kit')

The most frequent geminated voiced obstruent is /Qd/, followed by /Qɡ/, /Qz/, /Qb/.^[274] In borrowed words, /d/ is the only voiced stop that is regularly adapted as a geminate when it occurs in word-final position after a lax/short vowel; gemination of /b/ and /ɡ/ in this context is sporadic.^[280]

Phonetically, voiced geminate obstruents in Japanese tend to have a 'semi-devoiced' pronunciation where phonetic voicing stops partway through the closure of the consonant.^[281] High vowels are not devoiced after phonemically voiced geminates.^[281]

In some cases, voiced geminate obstruents can optionally be replaced with the corresponding voiceless geminate phonemes:^[282]^[283]

バッド, baddo → バット, batto, 'bad'^[282]

ドッグ, doggu → ドック, dokku, 'dog'^[282]

ベッド, beddo → ベット, betto, 'bed'^[283]

Phonemic devoicing like this (which may be marked in spelling) has been argued to be conditioned by the presence of another voiced obstruent.^[284]^[285] Another example is doreddo ~ doretto 'dreadlocks'. Kawahara (2006) attributes this to a less reliable distinction between voiced and voiceless geminates compared to the same distinction in non-geminated consonants, noting that speakers may have difficulty distinguishing them due to the partial devoicing of voiced geminates and their resistance to the weakening process mentioned above, both of which can make them sound like voiceless geminates.^[286]

A small number of foreign proper names have katakana spellings that would imply a pronunciation with /Qr/, such as アッラー, arrā, 'Allah' and チェッリーニ, Cherrīni, 'Cellini'.^[287] The phonetic realization of /Qr/ in such forms varies between a lengthened sonorant sound and a sequence of a glottal stop followed by a sonorant.^[288]

Aside from loanwords, consonants that cannot normally occur after /Q/ may be geminated in certain emphatic variants of native words.^[289] Reduplicative mimetics may be used in an intensified form where the second consonant of the first portion is geminated, and this can affect consonants that otherwise do not occur as geminates, such as /r/ (as in barra-bara, 'in disorder', borro-boro, 'worn out', gurra-gura, 'shaky', karra-kara, 'dry', perra-pera, 'thin') or /j/ (as in buyyo-buyo, 'flabby').^[290] Adjectives may take an emphatic pronunciation where the second consonant is geminated and the following vowel is lengthened, as in naggaai < nagai, 'long', karraai < karai, 'hot', kowwaai < kowai, 'dreadful'.^[290] Similarly, per Vance (2008), /Qj/ and /Qm/ can occur in emphatic pronunciations of 速い, hayai, 'fast' and 寒い, samui, 'cold' as [haʔːjai] and [saʔːmɯi].^[291] A 2020 study of geminate production in mimetic forms found that emphatically lengthened /r/ could be pronounced either as a lengthened sonorant with uninterrupted voicing, or with some amount of laryngealization such as glottal stop insertion.^[292] Another noteworthy characteristic of emphatically lengthened consonants is the potential for a greater than two-way distinction in length.^[293]^[289]

Atypical /Q/ + consonant sequences may also arise in truncated word forms (created by blending some moras from each word in a longer phrase) and in forms produced as the outcome of word games:^[288]

カットモデル, katto moderu, 'cut model' /kaQto moderu/ → kadderu /kaQderu/ (blend)^[288]

バット, batto, 'bat' /baQto/ → tobba /toQba/ (form produced in a reversing language game)^[288]

Vowel sequences and long vowels

Vowel sequences with no intervening consonant (VV sequences) occur in many contexts:

Any pair of vowels can occur in sequence across morpheme boundaries, or within a morpheme in foreign words.^[294]
The sequences /ai oi ui ie ae oe ue io ao uo/ can be found within a morpheme in indigenous or Sino-Japanese words.^[295] Youngberg (2021) also includes /eo/, as in 夫婦, meoto, 'husband and wife', and /ia/, as in 幸せ, shiawase, 'happy'.^[296]
Within a Sino-Japanese morpheme, the only vowel sequences that can normally be found are /ai ui/ (as sequences of non-identical vowels) or [eː oː ɯː] (as long vowels).^[297] Sino-Japanese [eː] is historically derived from /ei/ and may variably be realized phonetically as [ei] (possibly due to spelling pronunciation) rather than as the long vowel [eː].^[156]

When the first of two vowels in a VV sequence is higher than the second, there is often not a clear distinction between a pronunciation with hiatus and a pronunciation where a glide with the same frontness as the first vowel is inserted before the second: i.e., the VV sequences /ia io ua ea oa/ may sound like /ija ijo uwa eja owa/.^[298] For example, English gear has been borrowed into Japanese as ギア, gia, 'gear', but an alternative form of this word is ギヤ, giya.^[299] Per Kawahara (2003), the sequences /eo eu/ are not pronounced like *[ejo ejɯ]. The sequence /iu/ is not pronounced like *[ijɯ], but it is sometimes replaced with [jɯː]:^[298] this change is optional in loanwords.^[300] Kawahara states that the formation of a glide between /ia io ua ea oa/ may be blocked by a syntactic boundary or by some (though not all) morpheme boundaries (Kawahara suggests that apparent cases of glide formation across morpheme boundaries are best interpreted as evidence that the boundary is no longer transparent).^[298]
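The glide-insertion pattern described above can be sketched as a small lookup. This is only an illustration under simplifying assumptions: vowel sequences are two-character phonemic strings, the set of affected sequences is limited to the five listed, and boundary effects are ignored. The names `GLIDING_SEQUENCES` and `glide_variant` are hypothetical.

```python
# Glide sharing the frontness of the first vowel: front vowels take /j/,
# back vowels take /w/.
GLIDE_FOR_FIRST_VOWEL = {"i": "j", "e": "j", "u": "w", "o": "w"}

# Only the sequences named in the text; /eo eu iu/ are deliberately excluded.
GLIDING_SEQUENCES = {"ia", "io", "ua", "ea", "oa"}


def glide_variant(vv: str):
    """Return the glided variant of a VV sequence, or None if none is described."""
    if vv not in GLIDING_SEQUENCES:
        return None
    first, second = vv[0], vv[1]
    return first + GLIDE_FOR_FIRST_VOWEL[first] + second


for seq in ["ia", "io", "ua", "ea", "oa", "eo", "iu"]:
    print(seq, glide_variant(seq))
```

For example, `glide_variant("ia")` gives `"ija"`, corresponding to the gia ~ giya variation.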

Many long vowels historically developed from vowel sequences by coalescence, such as /au ou eu iu/ > [oː oː joː jɯː]. In addition, some vowel sequences in contemporary Japanese may optionally undergo coalescence to a long vowel in colloquial or casual speech (for some sequences, such as /oi/ and /ui/, coalescence is not possible in all contexts, but only in adjective forms):^[301]

/ai/ > [eː]	/itai/ > [iteː]	痛い, itai, 'painful, ouch'
/oi/ > [eː]	/suɡoi/ > [sɯɡeː]	凄い, sugoi, 'great'^[302]

Within words and phrases, Japanese allows long sequences of phonetic vowels without intervening consonants.^[303] Sequences of two vowels within a single word are extremely common, occurring at the end of many i-type adjectives, for example, and sequences of three or more vowels within a word also occur, as in あおい, aoi, 'blue/green'. In phrases, sequences with multiple o sounds are most common, because the direct object particle を, wo (which comes after a word) is realized as o, and it can occur in sequence with the honorific prefix お〜, o, and may follow a word that itself ends in an o sound; these may be dropped in rapid speech. A fairly common construction exhibiting these is 「〜をお送りします」, wo o-okuri-shimasu, '...humbly send...'. More extreme examples follow:

/hoː.oː.o.o.oː/ [hoː.oː.o.o.oː]	hōō o oō (鳳凰（ほうおう）を追（お）おう)	'let's chase the fenghuang'
/toː.oː.o.oː.oː/ [toː.oː.o.oː.oː]	tōō o ōō (東欧（とうおう）を覆（おお）おう)	'let's cover Eastern Europe'

Distribution of consonant phonemes based on word position

In Yamato vocabulary, certain consonant phonemes, such as /p/, /h/, /r/ and voiced obstruents, tend to be found only in certain positions in a word.^[304] None of these restrictions applies to foreign vocabulary; some do not apply to mimetic or Sino-Japanese vocabulary; and certain generalizations have exceptions even within Yamato vocabulary; nevertheless, some linguists interpret them as still playing a role in Japanese phonology, based on the model of a "stratified" lexicon where some active phonological constraints affect only certain layers of the vocabulary. The gaps in the distribution of these consonant phonemes can also be explained in terms of diachronic sound changes.

The voiced obstruents /b d ɡ z/ occur without restriction at the start of Sino-Japanese and foreign morphemes,^[305] but usually do not occur at the start of Yamato words^[306]^[307] (although suffixes or postposed particles starting with these sounds have been in use since Old Japanese, such as the case particle ga,^[308] and morphemes that underlyingly start with a voiceless obstruent often have allomorphs that start with a voiced obstruent in the context of rendaku). However, word-initial /b d ɡ z/ occur frequently in the mimetic stratum of native Japanese vocabulary, where they often function as sound-symbolic variants of their voiceless counterparts /p h t k s/.^[309] In addition, some non-mimetic Yamato words show a voiced initial obstruent; in some cases, voicing seems to have had an expressive function, adding a negative or pejorative shade to a root.^[310]^[311] There are also some Yamato forms where a word-initial voiced obstruent developed from the loss of an original word-initial high vowel, or from changes involving an original word-initial nasal.^[312] Diachronically, the scarcity of word-initial voiced obstruents in native Japanese words seems to be a consequence of their origin from Proto-Japonic sequences involving a nasal phoneme followed by an obstruent phoneme, which developed to prenasalized consonants in Old Japanese.^[313]

Yamato and mimetic words almost never start with /r/.^[314] In contrast, word-initial /r/ occurs without restriction in Sino-Japanese and foreign vocabulary.

In Yamato words, /p/ occurs only as a word-medial geminate (or equivalently, only after /Q/) as in 河童, kappa. In Sino-Japanese words, /p/ occurs only after /Q/ or /N/ (as in 切腹, seppuku, 北方, hoppō, 音符, onpu), alternating with /h/ in other positions. In contrast, mimetic words can contain singleton /p/, either word-initially or word-medially.^[315] Singleton /p/ also occurs freely in foreign words,^[316] such as パオズ, paozu, ペテン, peten, パーティー, pātī. The gap in the distribution of singleton [p] results from the fact that original *p developed in Japanese to [ɸ] in word-initial position and to /w/ in intervocalic position, resulting in [p] being retained only as part of the geminate [pː] or after /N/.^[317] (The fricative [ɸ] remained labial before all vowels up through Late Middle Japanese, but was eventually debuccalized to [h] before any vowel other than /u/, resulting in the modern Japanese /h/ phoneme. The glide /w/ was eventually lost before any vowel other than /a/.) The few non-mimetic words where /p/ occurs initially include 風太郎, pūtarō, although as a personal name it is still pronounced Fūtarō.

The phoneme /h/ is rarely found in the middle of a Yamato morpheme (a small number of exceptions exist, such as afureru, 'overflow', ahiru, 'duck', yahari, 'likewise') or in the middle of a mimetic root (examples are mostly confined to mimetics that imitate "guttural" or "laryngeal" sounds, such as goho-goho, 'coughing' and ahaha, 'laughing').^[314] This gap results from the aforementioned development of original *p to /w/, rather than /h/, in intervocalic position.^[318] Likewise, /h/ never occurs in the middle of a Sino-Japanese morpheme.^[319]

Epenthetic vowels

Further information: Transcription into Japanese and Sino-Japanese vocabulary § Rimes (medials and finals)

Words of foreign origin are systematically adapted to Japanese phonotactics by inserting an epenthetic vowel (usually /u/) after a word-final consonant or between adjacent consonants. While /u/ is inserted after the majority of consonants, it is usual to use /o/ after [t, d] and /i/ after [tʃ, dʒ] (but usually not after [ʃ]). After /hh/ (used to adapt foreign word-final [x]) the epenthetic vowel is often /a/ or /o/, echoing the quality of the vowel before the consonant. There are some deviations from the aforementioned patterns, such as use of /i/ after [k] in some older borrowings.^[320] The use of epenthetic vowels in these contexts is an established convention of Japanese writing, embedded in the standard rules for using kana to transcribe foreign words or names.
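The default choice of epenthetic vowel can be sketched as a lookup with a fallback. This is a simplified illustration: it covers only the general pattern and does not model the vowel-echoing behaviour after /hh/ or the older use of /i/ after [k]. The names `SPECIAL_CASES` and `epenthetic_vowel` are hypothetical.

```python
# Source-language consonants (in IPA) that take a vowel other than /u/.
SPECIAL_CASES = {
    "t": "o",
    "d": "o",
    "tʃ": "i",
    "dʒ": "i",
}


def epenthetic_vowel(consonant: str) -> str:
    """Return the usual epenthetic vowel after a foreign consonant.

    Falls back to /u/, which is inserted after the majority of consonants.
    """
    return SPECIAL_CASES.get(consonant, "u")


for c in ["t", "d", "tʃ", "dʒ", "ʃ", "k", "s"]:
    print(c, epenthetic_vowel(c))
```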

Historically, Sino-Japanese morphemes developed epenthetic vowels after most syllable-final consonants. This is usually /u/, in some cases /i/: the identity of the epenthetic vowel is largely, although not completely, predictable from the preceding consonant and vowel.^[321] It is debated whether these vowels should be regarded as having epenthetic status in the phonology of modern Japanese.^[322] The use of epenthetic vowels in Sino-Japanese forms has undergone some changes over time: for example, the descriptions of Portuguese missionaries indicate that in previous stages of the language, Sino-Japanese morphemes could end in coda [t] with no epenthetic vowel.^[323]

Morphophonology

Japanese morphology is generally agglutinative rather than fusional. Nevertheless, Japanese exhibits a number of morphophonological processes that can change the shape of morphemes when they are combined in compounds, derived words, or inflected forms of verbs or adjectives. Various forms of sandhi exist; the Japanese term for sandhi generally is ren'on (連音).

Rendaku

Main article: Rendaku

In Japanese, sandhi is prominently exhibited in rendaku – consonant mutation of the initial consonant of a morpheme from unvoiced to voiced in some contexts when it occurs in the middle of a word. This phonetic difference is marked in the kana spelling of a word via the addition of dakuten, as in ka, ga (か／が). In cases where this combines with the yotsugana mergers, notably ji, dzi (じ／ぢ) and zu, dzu (ず／づ) in standard Japanese, the resulting spelling is morphophonemic rather than purely phonemic.

Yamato gemination or prenasalization

Certain processes, such as onbin sound changes, have acted to produce voiceless geminates in Yamato words (often across morpheme boundaries, but sometimes even within a morpheme). Gemination can arise as the result of emphasis, compounding, or verb conjugation. In this context, sequences of a moraic nasal /N/ and a voiced consonant are found in place of voiced geminate obstruents, which do not occur in native Standard Japanese words (other than marginally as emphatically lengthened variants of single voiced obstruents).

For example, adverbs built from a mimetic root and the suffix -ri may display root-internal gemination,^[324] as in nikkori (alongside nikori) from niko 'smiling'. Adverbs derived from roots with voiced medial consonants exhibit forms with a moraic nasal in place of gemination, such as shonbori from shobo 'lonely', unzari from uza 'bored, disappointed', bon'yari from boya 'vague', and funwari from fuwa 'light' (/r/ does not undergo either gemination or /N/-insertion in this context).^[325] Likewise, a moraic consonant often occurs between the emphatic prefix /ma/ and a following consonant: its allomorphs /maQ/ and /maN/ are in complementary distribution, with /maQ/ used before voiceless consonants and /maN/ used elsewhere.^[326]

Another example where either a voiceless geminate or /N/ is formed depending on the voicing of the following consonant is the derivation of reduced, i.e. contracted, compound verbs. Japanese has a type of compound verb formed by placing the stem of one verb before another. If the first verb has a stem that ends in a consonant, the vowel /i/ is usually placed between the first and second verb stem. But in some compounds, this vowel can be omitted, resulting in the final consonant of the first verb stem being placed directly before the initial consonant of the second verb stem.^[327] When this happens, the first consonant assimilates to the second, producing a voiceless geminate if the second is voiceless, and a sequence starting with /N/ if the second is a voiced obstruent or nasal (e.g. hik- 'pull' + tate- 'stand' > hikitateru~hittateru 'support', tsuk- 'stab' + das- 'put out' > tsukidasu~tsundasu 'thrust out'^[328]).^[329]

In verb conjugation, the voiceless geminate /Qt/ is produced when a verb root that underlyingly ends in /r/, /t/, or /w/ is followed by a suffix starting with /t/ (namely, -te, -ta, -tari, -tara, -tatte^[330]), whereas /Nd/ is produced when a verb root that underlyingly ends in /m/, /n/, or /b/ is followed by a suffix starting with /t/.^[331] (At the end of a verb stem, /w/ descends from original *p; some generative analyses interpret this as the synchronic underlying form of the consonant.^[332])
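The two outcomes can be sketched as a rule keyed on the final consonant of the underlying root. The sketch assumes roots and suffixes given as plain romanized strings, with `Q` and `N` standing for the moraic obstruent and moraic nasal; roots ending in other consonants are outside the pattern described here and are rejected. The function name `attach_t_suffix` is hypothetical.

```python
GEMINATING_FINALS = {"r", "t", "w"}   # yield /Qt/
NASALIZING_FINALS = {"m", "n", "b"}   # yield /Nd/


def attach_t_suffix(root: str, suffix: str = "ta") -> str:
    """Combine a consonant-final verb root with a /t/-initial suffix."""
    if not suffix.startswith("t"):
        raise ValueError("suffix must start with /t/")
    stem, final = root[:-1], root[-1]
    if final in GEMINATING_FINALS:
        # Root-final consonant is replaced by /Q/; the suffix keeps its /t/.
        return stem + "Q" + suffix
    if final in NASALIZING_FINALS:
        # Root-final consonant is replaced by /N/; suffix /t/ surfaces as /d/.
        return stem + "N" + "d" + suffix[1:]
    raise ValueError("root-final consonant not covered by this sketch")


print(attach_t_suffix("kat"))        # 'win'
print(attach_t_suffix("yom", "te"))  # 'read'
```

With these inputs the first call produces `kaQta` and the second `yoNde`.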

Sino-Japanese gemination

When the second mora of a Sino-Japanese morpheme is つ, tsu, く, ku, ち, chi or き, ki and it is followed by a voiceless consonant, this mora is sometimes replaced by the sokuon っ (whose spelling as a small つ is based on the frequent alternation of these sounds in this context), forming a geminate consonant:

一 (いつ itsu) + 緒 (しょ sho) = 一緒 (いっしょ issho)
学 (がく gaku) + 校 (こう kō) = 学校 (がっこう gakkō)

Sino-Japanese morphemes ending in these moras remain unchanged when followed by a voiced consonant, and are usually unchanged when followed by a vowel (but see renjō for exceptional examples of geminate formation before a vowel).

学 (がく gaku) + 外 (がい -gai) = 学外, gakugai, 'outside of school campus'
別 (べつ betsu) + 宴 (えん en) = 別宴, betsuen, 'farewell dinner'
学 (がく gaku) + 位 (い i) = 学位, gakui, 'academic degree'^[333]

Gemination can also affect Sino-Japanese morphemes that historically ended in ふ, fu and that now end in long vowels:

法 (hafu はふ > hō ほう) + 被 (hi ひ) = 法被 (happi はっぴ), instead of hōhi ほうひ
合 (kafu かふ > gō ごう) + 戦 (sen せん) = 合戦 (kassen), instead of gōsen
入 (nifu > nyū) + 声 (shō) = 入声 (nisshō), instead of nyūshō
十 (jifu > jū) + 戒 (kai) = 十戒 (jikkai) instead of jūkai

Most morphemes exhibiting this change derive from Middle Chinese morphemes ending in /t̚/, /k̚/ or /p̚/, which developed a prop vowel after them when pronounced in isolation (e.g., 日 MC */nit̚/ > Japanese /niti/ [ɲitɕi]) but were assimilated to the following consonant in compounds (e.g. 日本 MC */nit̚.pu̯ən/ > Japanese /niQ.poN/ [ɲip̚.poɴ]).

Gemination occurs regularly in words consisting of two Sino-Japanese morphemes, but tends not to occur across the major boundary of a complex compound (where one of the components is formed of more than one Sino-Japanese morpheme). However, there are some cases of gemination in this context.^[334]

The formation of a geminate also depends on the identity of the first and second consonant:

/tu/ つ, tsu	Systematically becomes っ /Q/ before any voiceless obstruent (/p~h t k s/).^[335]
/ku/ く, ku	Systematically becomes っ /Q/ before /k/. The numeral /roku/ also becomes /roQ/ before /p~h/. Otherwise, remains /ku/.^[336]
/ti/ ち, chi	May become っ /Q/ before any voiceless obstruent, but some morphemes, such as the numerals /siti/ and /hati/, do not consistently undergo this change.^[336] Only a small number of Sino-Japanese characters have a reading with /ti/ that is in common use.
/ki/ き, ki	May become っ /Q/ before /k/, but this is not systematic; many words show variation between /ki/ and /Q/.^[337] The form /seki/~/seQ/ (which occurs as a reading of various etymologically unrelated morphemes) shows a higher tendency to undergo gemination than other Sino-Japanese forms ending in /ki/.^[338]
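The conditions in the table above can be summarized in a short sketch. It assumes phonemic spellings for the final mora and the following consonant, and it reduces each row to one of three labels chosen here for illustration: "systematic", "variable", or "none". Morpheme-specific behaviour (such as /roku/ before /p~h/, or the numerals /siti/ and /hati/) and complex-compound boundaries are not modelled. The name `gemination_status` is hypothetical.

```python
VOICELESS_OBSTRUENTS = {"p", "h", "t", "k", "s"}


def gemination_status(final_mora: str, next_consonant: str) -> str:
    """Classify whether a morpheme-final mora is replaced by /Q/."""
    if final_mora == "tu":
        return "systematic" if next_consonant in VOICELESS_OBSTRUENTS else "none"
    if final_mora == "ku":
        return "systematic" if next_consonant == "k" else "none"
    if final_mora == "ti":
        return "variable" if next_consonant in VOICELESS_OBSTRUENTS else "none"
    if final_mora == "ki":
        return "variable" if next_consonant == "k" else "none"
    raise ValueError("final mora not covered by the table")


print(gemination_status("tu", "s"))  # as in 一緒 issho
print(gemination_status("ku", "k"))  # as in 学校 gakkō
print(gemination_status("ku", "g"))  # as in 学外 gakugai
```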

Renjō

Further information: 連声 and Late Middle Japanese § Medial gemination

Sandhi also occurs, though much less often, in renjō (連声), where, most commonly, a terminal /N/ or /Q/ on one morpheme results in /n/ (or /m/ when derived from historical m) or /t̚/, respectively, being added to the start of a following morpheme beginning with a vowel or semivowel, as in ten + ō → tennō (天皇: てん + おう → てんのう). Examples:

First syllable ending with /N/

銀杏 (ginnan): ぎん (gin) + あん (an) → ぎんなん (ginnan)
観音 (kannon): くゎん (kwan) + おむ (om) → くゎんのむ (kwannom) → かんのん (kannon)
天皇 (tennō): てん (ten) + わう (wau) → てんなう (tennau) → てんのう (tennō)

First syllable ending with /N/ from original /m/

三位 (sanmi): さむ (sam) + ゐ (wi) → さむみ (sammi) → さんみ (sanmi)
陰陽 (onmyō): おむ (om) + やう (yau) → おむみゃう (ommyau) → おんみょう (onmyō)

First syllable ending with /Q/

雪隠 (setchin): せつ (setsu) + いん (in) → せっちん (setchin)
屈惑 (kuttaku): くつ (kutsu) + わく (waku) → くったく (kuttaku)

Onbin

Spelling changes
Archaic	Modern
あ＋う (a + u) あ＋ふ (a + fu)	おう (ō)
い＋う (i + u) い＋ふ (i + fu)	ゆう (yū)¹
う＋ふ (u + fu)	うう (ū)
え＋う (e + u) え＋ふ (e + fu)	よう (yō)
お＋ふ (o + fu)	おう (ō)
お＋ほ (o + ho) お＋を (o + wo)	おお (ō)
auxiliary verb む (mu)	ん (n)
medial or final は (ha)	わ (wa)
medial or final ひ (hi), へ (he), ほ (ho)	い (i), え (e), お (o) (via wi, we, wo, see below)
any ゐ (wi), ゑ (we), を (wo)	い (i), え (e), お (o)¹

1. usually not reflected in spelling

Further information: Japanese grammar § Euphonic changes (音便, onbin); and Onbin in verb conjugations

Another prominent feature is onbin (音便, euphonic sound change). This refers to various historical sound changes that can be loosely described as showing reduction, lenition or coalescence. Alternations resulting from onbin continue to be seen in some areas of Japanese morphology, such as the conjugation of certain verb forms or the form of certain compound verbs.

In some cases, onbin changes occurred within a morpheme, as in hōki (箒 (ほうき), broom), which underwent two sound changes from earlier hahaki (ははき) → hauki (はうき) (onbin) → houki (ほうき) (historical vowel change) → hōki (ほうき) (long vowel, sound change not reflected in kana spelling).

One type of onbin caused certain onset consonants to be deleted, mainly before /i/ or /u/,^[339] which created vowel sequences, or long vowels by coalescence of /u/ with the preceding vowel.

Another type of onbin resulted in the development of moraic consonants /Q/ or /N/ in certain circumstances in native Japanese words.

Polite adjective forms

Further information: Japanese grammar § Polite forms of adjectives

The polite adjective forms (used before the polite copula gozaru (ござる, be) and verb zonjiru (存じる, think, know)) exhibit a one-step or two-step sound change. Firstly, these use the continuative form, -ku (-く), which exhibits onbin, dropping the k as -ku (-く) → -u (-う). Secondly, the vowel may combine with the preceding vowel, according to historical sound changes; if the resulting new sound is palatalized, meaning yu, yo (ゆ、よ), this combines with the preceding consonant, yielding a palatalized syllable.

This change is most prominent in certain everyday terms derived from i-adjectives, in which -ai changes to -ō (-ou). These terms are abbreviations of polite phrases ending in gozaimasu, sometimes with a polite o- prefix. The terms are also used in their full form; notable examples are:

arigatō (有難う、ありがとう, Thank you), from arigatai (有難い、ありがたい, (I am) grateful).
ohayō (お早う、おはよう, Good morning), from hayai (早い、はやい, (It is) early).
omedetō (お目出度う、おめでとう, Congratulations), from medetai (目出度い、めでたい, (It is) auspicious).

Other transforms of this type are found in polite speech, such as oishiku (美味しく) → oishū (美味しゅう) and ōkiku (大きく) → ōkyū (大きゅう).
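The two-step change can be sketched on Hepburn-romanized continuative forms. This is a rough illustration under several assumptions: the input ends in -ku, the stem-final vowel is one of a, i, o, u, and palatalization is spelled the Hepburn way (sh, ch, j absorb the y). The function name `polite_form` is hypothetical.

```python
def polite_form(ku_form: str) -> str:
    """Derive the polite adjective form from a romanized -ku form."""
    if not ku_form.endswith("ku") or len(ku_form) < 3:
        raise ValueError("expected a continuative form ending in -ku")
    stem = ku_form[:-2]            # step 1: -ku -> -u (k is dropped)
    base, vowel = stem[:-1], stem[-1]
    # Step 2: the remaining u combines with the stem-final vowel.
    if vowel in ("a", "o"):
        return base + "ō"
    if vowel == "u":
        return base + "ū"
    if vowel == "i":
        # i + u -> yū, which palatalizes the preceding consonant.
        if base.endswith(("sh", "ch", "j")):
            return base + "ū"
        return base + "yū"
    raise ValueError("stem-final vowel not covered by this sketch")


for form in ["arigataku", "hayaku", "oishiku", "ōkiku"]:
    print(form, "->", polite_form(form))
```

Applied to the forms above, this yields arigatō, hayō, oishū and ōkyū, respectively.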

-hito

The morpheme hito (人 (ひと), person) (with rendaku -bito (〜びと)) has changed to uto (うと) or udo (うど), respectively, in a number of compounds. This in turn often combined with a historical vowel change, resulting in a pronunciation rather different from that of the components, as in nakōdo (仲人 (なこうど), matchmaker) (see below). These include:

otōto (弟 (おとうと), younger brother), from otohito (弟人 (おとひと)) 'younger sibling' + 'person' → otouto (おとうと) → otōto.
imōto (妹 (いもうと), younger sister), from imohito (妹人 (いもひと)) 'sister' + 'person' → imouto (いもうと) → imōto.
shirōto (素人 (しろうと), novice), from shirohito (白人 (しろひと)) 'white' + 'person' → shirouto (しろうと) → shirōto.
kurōto (玄人 (くろうと), veteran), from kurohito (黒人 (くろひと)) 'black' + 'person' → kurouto (くろうと) → kurōto.
nakōdo (仲人 (なこうど), matchmaker), from nakabito (仲人 (なかびと)) → nakaudo (なかうど) → nakoudo (なこうど) → nakōdo.
karyūdo (狩人 (かりゅうど), hunter), from karibito (狩人 (かりびと)) → kariudo (かりうど) → karyuudo (かりゅうど) → karyūdo.
shūto (舅 (しゅうと), father-in-law), from shihito (舅人 (しひと)) → shiuto (しうと) → shuuto (しゅうと) → shūto.
kurōdo (蔵人 (くろうど), warehouse keeper (archivist, sake/soy sauce/miso maker)), from kurabito (蔵人 (くらびと)) 'storehouse' + 'person' → kurando (くらんど) → kuraudo (くらうど) → kuroudo (くろうど) → kurōdo. kurauzu (くらうず) is also found, as a variant of kuraudo (くらうど).

Notes

References

Bibliography

Akamatsu, Tsutomu (1997), Japanese Phonetics: Theory and Practice, München: Lincom Europa, ISBN 978-3-89586-095-9
Aoyama, Katsura (2001), A Psycholinguistic Perspective on Finnish and Japanese Prosody: Perception, Production and Child Acquisition of Consonantal Quantity Distinctions, Springer Science & Business Media, ISBN 978-0-7923-7216-5
Arai, Takayuki; Warner, Natasha; Greenberg, Steven (2007), "Analysis of spontaneous Japanese in a multi-language telephone-speech corpus", Acoustical Science and Technology, 28 (1): 46–48, doi:10.1250/ast.28.46
Broselow, Ellen; Huffman, Marie; Hwang, Jiwon; Kao, Sophia; Lu, Yu-An (2012), "Emergent Rankings in Foreign Word Adaptations", in Arnett, Nathan; Bennett, Ryan (eds.), Proceedings of the 30th West Coast Conference on Formal Linguistics, pp. 98–108
Crawford, Clifford James (2009), Adaptation and Transmission in Japanese Loanword Phonology (PhD thesis)
Frellesvig, Bjarke (2010). A History of the Japanese Language. Cambridge University Press.
Gao, Jiayin; Arai, Takayuki (2019), "Plosive (de-)voicing and f0 perturbations in Tokyo Japanese: Positional variation, cue enhancement, and contrast recovery." (PDF), Journal of Phonetics, 77, doi:10.1016/j.wocn.2019.100932
Hall, Kathleen Currie (2013), "Documenting phonological change: A comparison of two Japanese phonemic splits" (PDF), in Luo, Shan (ed.), Proceedings of the 2013 Annual Conference of the Canadian Linguistic Association
Hashi, Michiko; Komada, Akina; Miura, Takao; Daimon, Shotaro; Takakura, Yuhki; Hayashi, Ryoko (2014), "Articulatory Variability in Word-Final Japanese Moraic-Nasals: An X-ray Microbeam Study", Journal of the Phonetic Society of Japan, 18 (2): 95–105, doi:10.24467/onseikenkyu.20.1_77
Hattori, Shiro (1950), "Phoneme, Phone, and Compound Phone", Gengo Kenkyu: Journal of the Linguistic Society of Japan, 1950 (16): 92–108, 163, doi:10.11435/gengo1939.1950.16_92
Irwin, Mark (2008). "Homomorphemic Diffusion in Japanese Nonce Lexemes". Japanese Language and Literature. 42 (1): 45–61. JSTOR 30198054.
Irwin, Mark (2011), Loanwords in Japanese, John Benjamins, ISBN 978-90-2720592-6
Ito, Junko; Kubozono, Haruo; Mester, Armin (2017), "A prosodic account of consonant gemination in Japanese loanwords", in Kubozono, Haruo (ed.), The Phonetics and Phonology of Geminate Consonants, Oxford University Press, pp. 283–320
Itō, Junko; Mester, R. Armin (1995), "Japanese phonology", in Goldsmith, John A (ed.), The Handbook of Phonological Theory, Blackwell Handbooks in Linguistics, Blackwell Publishers, pp. 817–838
Itō, Junko; Mester, R. Armin (2013). "Junko Ito, Armin Mester (UC Santa Cruz): Supersized Units". youtube.com. MIT Department of Linguistics and Philosophy.
Ito, Junko; Mester, Armin (2015a), "Sino-Japanese phonology", in Kubozono, Haruo (ed.), Handbook of Japanese Phonetics and Phonology, Berlin: De Gruyter, pp. 253–288
Ito, Junko; Mester, Armin (2015b), "Word formation and phonological processes", in Kubozono, Haruo (ed.), Handbook of Japanese Phonetics and Phonology, Berlin: De Gruyter, pp. 363–395
Ito, Junko; Mester, R. Armin (2018), "Tonal alignment and preaccentuation", Journal of Japanese Linguistics, 34 (2): 195–222, doi:10.1515/jjl-2018-0014
Katayama, Motoko (1998), Loanword phonology in Japanese and optimality theory (dissertation), Santa Cruz: University of California, Santa Cruz
Kawahara, Shigeto (2003), Phonological Society of Japan (ed.), "On a Certain Type of Hiatus Resolution in Japanese" (PDF), On'in Kenkyuu 音韻研究 [Phonological Studies], 6: 11–20
Kawahara, Shigeto (2006), "A faithfulness ranking projected from a perceptibility scale: The case of [+voice] in Japanese", Language, 82 (3): 536–574, doi:10.1353/lan.2006.0146, S2CID 145093954
Kawahara, Shigeto (2011), Japanese loanword devoicing revisited: A rating study
Kawahara, Shigeto (2015), "The phonetics of sokuon, or geminate obstruents", in Kubozono, Haruo (ed.), Handbook of Japanese Phonetics and Phonology, Berlin: De Gruyter, pp. 43–77
Kawahara, Shigeto; Shaw, Jason (2018), "Persistence of prosody", Hana-bana (花々): A Festschrift for Junko Ito and Armin Mester, UC Santa Cruz: Festschrifts
Kitaoka, Daiho (2017), "Repair Strategies for failed feature specification in Japanese: Evidence from loanwords, a reversing word game, and blending", Proceedings of the Annual Meetings on Phonology, 4, doi:10.3765/amp.v4i0.3978
Kitagawa, Yoshihisa; Albin, Aaron Lee (2023), "On the interaction of fortition, lenition and Rendaku voicing in Japanese — Experimental and diachronic insights", IULC Working Papers, 23 (1), Indiana University Linguistics Club Working Papers
Kochetov, Alexei (2014), "Voicing and Tongue-Palate Contact Differences in Japanese Obstruents", Journal of the Phonetic Society of Japan, 18 (2): 63–76, doi:10.24467/onseikenkyu.18.2_63
Kochetov, Alexei (2018), "Linguopalatal contact contrasts in the production of Japanese consonants: Electropalatographic data from five speakers", Acoust. Sci. & Tech., 39 (2), The Acoustical Society of Japan: 84–91, doi:10.1250/ast.39.84
Kubozono, Haruo; Itô, Junko; Mester, Armin (2009), The Linguistic Society of Korea (ed.), "Consonant Gemination in Japanese Loanword Phonology", Current Issues in Unity and Diversity of Languages. Collection of Papers Selected from the 18th International Congress of Linguists, Republic of Korea: Dongam Publishing Co.
Kubozono, Haruo (2015a), "Introduction to Japanese phonetics and phonology", in Kubozono, Haruo (ed.), Handbook of Japanese Phonetics and Phonology, Berlin: De Gruyter, pp. 1–40, doi:10.1515/9781614511984.1, ISBN 978-1-61451-252-3
Kubozono, Haruo (2015b), "Diphthongs and vowel coalescence", in Kubozono, Haruo (ed.), Handbook of Japanese Phonetics and Phonology, Berlin: De Gruyter, pp. 215–249, doi:10.1515/9781614511984.1, ISBN 978-1-61451-252-3
Kubozono, Haruo (2015c), "Loanword phonology", in Kubozono, Haruo (ed.), Handbook of Japanese Phonetics and Phonology, Berlin: De Gruyter, pp. 313–361, doi:10.1515/9781614511984.1, ISBN 978-1-61451-252-3
Labrune, Laurence (2012), The Phonology of Japanese, Oxford, England: Oxford University Press, ISBN 978-0-19-954583-4
Lawrence, Wayne P. (2004), "High Vowels, Glides, and Japanese Phonology", Gengo Kenkyu 言語研究, 125: 1–30
Li, Teng; Honda, Kiyoshi; Wei, Jianguo; Dang, Jianwu (2015), A lip protrusion mechanism examined by magnetic resonance imaging and finite element modeling (PDF)
Maddieson, Ian (2005), "Bilabial and Labio-dental Fricatives in Ewe", UC Berkeley PhonLab Annual Report, 1 (1): 199–215, doi:10.5070/P74r49g6qx
Maekawa, Kikuo (2010), "Coarticulatory reinterpretation of allophonic variation: Corpus-based analysis of /z/ in spontaneous Japanese", Journal of Phonetics, 38 (3): 360–374, doi:10.1016/j.wocn.2010.03.001
Maekawa, Kikuo (2018) [2010], "Weakening of Stop Articulation in Japanese Voiced Plosives", Journal of the Phonetic Society of Japan, 22 (1): 21–34, doi:10.24467/onseikenkyu.22.1_21
Maekawa, Kikuo (2020), "Remarks on Japanese /w/", ICU Working Papers in Linguistics, 10: 45–52, doi:10.34577/00004625
Maekawa, Kikuo (2023), "Production of the utterance-final moraic nasal in Japanese: A real-time MRI study", Journal of the International Phonetic Association, 53 (1): 189–212, doi:10.1017/S0025100321000050
Mester, R. Armin; Itô, Junko (1989), "Feature Predictability and Underspecification: Palatal Prosody in Japanese Mimetics", Language, 65 (2), Linguistic Society of America: 258–293
Mizoguchi, Ai (2019), Articulation of the Japanese Moraic Nasal: Place of Articulation, Assimilation, and L2 Transfer (PhD thesis), City University of New York
Morimoto, Maho (2020), Geminated Liquids in Japanese: A production study (PhD thesis), UC Santa Cruz
Murano, Emi Z.; Stone, Maureen; Honda, Kiyoshi (2005), "Muscular hydrostat mechanism for lip protrusion in speech", The Journal of the Acoustical Society of America, doi:10.1121/1.4785760
Nasu, Akio (2015), "The phonological lexicon and mimetic phonology", in Kubozono, Haruo (ed.), Handbook of Japanese Phonetics and Phonology, Berlin: De Gruyter, pp. 253–288
National Language Research Institute (1990), Nihongo no boin, shiin, onsetsu: Chōon undō no jikken onseigaku-teki kenkyū 日本語の母音，子音，音節: 調音運動の実験音声学的研究 [Japanese vowels, consonants, syllables: Experimental phonetics research of articulatory movements] (in Japanese), Tokyo: National Language Research Institute, doi:10.15084/00001212
Nogita, Akitsugu (2010), Examination of the [si] and [ʃi] Confusion by Japanese ESL Learners (Master of Arts thesis), University of Victoria
Nogita, Akitsugu (2016). "Arguments that Japanese [Cj]s are complex onsets: durations of Japanese [Cj]s and Russian [Cj]s and blocking of Japanese vowel devoicing". Working Papers of the Linguistics Circle of the University of Victoria. 26 (1): 73–99.
Nogita, Akitsugu; Yamane, Noriko (2015), "Japanese moraic dorsalized nasal stop" (PDF), Phonological Studies, 18: 75–84, archived from the original (PDF) on 2019-08-19, retrieved 2020-04-09
Nogita, Akitsugu; Yamane, Noriko (2019), "Redefining Roundness, Protrusion and Compression: In the Case of Tokyo Japanese /u/", Phonological Studies, 22, Phonological Society of Japan: 59–66
Ohta, Satoshi (1991), "Syllable and Mora Geometry in Japanese", Tsukuba English Studies, 10: 157–181
Okada, Hideo (1999), "Japanese", in International Phonetic Association (ed.), Handbook of the International Phonetic Association: A Guide to the Use of the International Phonetic Alphabet, Cambridge University Press, pp. 117–119, ISBN 978-0-52163751-0
Otake, Takashi (2015), "Mora and mora-timing", in Kubozono, Haruo (ed.), Handbook of Japanese Phonetics and Phonology, Berlin: De Gruyter, pp. 493–523
Pintér, Gábor (2015), "The emergence of new consonant contrasts", in Kubozono, Haruo (ed.), Handbook of Japanese Phonetics and Phonology, Berlin: De Gruyter, pp. 121–165, doi:10.1515/9781614511984.167, ISBN 978-1-61451-252-3
Poser, William (1986), "Japanese Evidence Bearing on the Compensatory Lengthening Controversy", in Wetzels, Leo; Sezer, Engin (eds.), Studies in Compensatory Lengthening, Foris Publications, pp. 167–186
Poser, William J. (1990), "Evidence for Foot Structure in Japanese", Language, 66 (1): 78–105
Recasens, Daniel (2013), "On the articulatory classification of (alveolo)palatal consonants" (PDF), Journal of the International Phonetic Association, 43 (1): 1–22, doi:10.1017/S0025100312000199, S2CID 145463946, archived from the original (PDF) on 2021-05-06, retrieved 2015-11-23
Riney, Timothy James; Takagi, Naoyuki; Ota, Kaori; Uchida, Yoko (2007), "The intermediate degree of VOT in Japanese initial voiceless stops", Journal of Phonetics, 35 (3): 439–443, doi:10.1016/j.wocn.2006.01.002
Saito, Yoshio (2005), Nihongo Onseigaku Nyūmon 日本語音声学入門 (in Japanese) (2nd ed.), Tokyo: Sanseido, ISBN 4-385-34588-0
Sano, Shin-ichiro (2013), "Patterns in Avoidance of Marked Segmental Configurations in Japanese Loanword Phonology" (PDF), Proceedings of GLOW in Asia IX: Main Session: 245–260
Schourup, Lawrence; Tamori, Ikuhiro (1992), "Japanese Palatalization in Relation to Theories of Restricted Underspecification", Gengo Kenkyu 言語研究, 101: 107–145
Seward, Jack (1992), Easy Japanese, McGraw-Hill Professional, ISBN 978-0-8442-8495-8
Shaw, Jason A.; Kawahara, Shigeto (2018), "The lingual articulation of devoiced /u/ in Tokyo Japanese" (PDF), Journal of Phonetics, 66: 100–119, doi:10.1016/j.wocn.2017.09.007
Shibatani, Masayoshi (1990), The Languages of Japan, Cambridge: Cambridge University Press, ISBN 978-0-521-36070-8
Shinohara, Shigeko (2004), "Emergence of universal grammar in foreign word adaptations", in Kager, René; Pater, Joe; Zonneveld, Wim (eds.), Constraints in Phonological Acquisition, Cambridge University Press, pp. 292–320
Smith, R. Edward (1980). Natural Phonology of Japanese (Thesis).
Starr, Rebecca Lurie; Shih, Stephanie S (2017), "The syllable as a prosodic unit in Japanese lexical strata: Evidence from text-setting", Glossa: A Journal of General Linguistics, 2 (1) 93: 1–34, doi:10.5334/gjgl.355
Takayama, Tomoaki (2015), "Historical phonology", in Kubozono, Haruo (ed.), Handbook of Japanese Phonetics and Phonology, Berlin: De Gruyter, pp. 621–650
Tamaoka, Katsuo; Makioka, Shogo (2004), "Frequency of occurrence for units of phonemes, morae, and syllables appearing in a lexical corpus of a Japanese newspaper", Behavior Research Methods, Instruments, & Computers, 36 (3): 531–547
Tateishi, Koichi (2017), Kaplan, Aaron; Kaplan, Abby; McCarvel, Miranda K.; Rubin, Edward J. (eds.), "More Arguments against Japanese as a Mora Language" (PDF), Proceedings of the 34th West Coast Conference on Formal Linguistics, Somerville, Massachusetts: Cascadilla Proceedings Project: 529–535, ISBN 978-1-57473-471-3
Tsuchida, Ayako (2001), "Japanese vowel devoicing", Journal of East Asian Linguistics, 10 (3): 225–245, doi:10.1023/A:1011221225072, S2CID 117861220
Vance, Timothy J. (1987), An Introduction to Japanese Phonology, Albany, NY: State University of New York Press, ISBN 978-0-88706-360-2
Vance, Timothy J. (2008), The Sounds of Japanese, Cambridge University Press, ISBN 978-0-5216-1754-3
Vance, Timothy J. (2015), "Rendaku", in Kubozono, Haruo (ed.), Handbook of Japanese Phonetics and Phonology, Berlin: De Gruyter, pp. 397–441
Vance, Timothy J. (2017), "The Japanese Syllable Debate: A Skeptical Look at Some Anti-Syllable Arguments", Proceedings of GLOW in Asia XI, MIT Working Papers in Linguistics 84, 1
Vance, Timothy J. (2022). Irregular Phonological Marking of Japanese Compounds. De Gruyter.
Watanabe, Seiji (2009). Cultural and Educational Contributions to Recent Phonological Changes in Japanese (PhD thesis). The University of Arizona.
Westbury, John R.; Hashi, Michiko (1997), "Lip-pellet positions during vowels and labial consonants", Journal of Phonetics, 25: 405–419
Yamane, Noriko; Gick, Bryan (2010), "Speaker-specific place of articulation: Idiosyncratic targets for Japanese coda nasal", Canadian Acoustics, 38 (3): 136–137
Youngberg, Connor (2021), "The role of the elements in diphthong formation and hiatus resolution: Evidence from Tokyo and Owari Japanese", in Bendjaballah, Sabrina; Tifrit, Ali; Voeltzel, Laurence (eds.), Perspectives on Element Theory, Studies in Generative Grammar, vol. 143, De Gruyter Mouton, pp. 207–249

/mi/ > [mʲi]	/umi/ > [ɯmʲi]	海, umi, 'sea'
/mj/ > [mʲ]	/mjaku/ > [mʲakɯ]^[32]	脈, myaku, 'pulse'
/ɡj/ > [ɡʲ]	/ɡjoːza/ > [ɡʲoːza]	ぎょうざ, gyōza, 'fried dumpling'
/ri/ > [ɾʲi]	/kiri/ > [kʲiɾʲi]	霧, kiri, 'fog'

/h/ > [ç]	/hito/ > [çito]	人, hito, 'person'
/hj/ > [ç]	/hjaku/ > [çakɯ]	百, hyaku, 'hundred'

/kutu/ > [kɯ̥tsɯ]	靴, kutsu, 'shoe'
/atu/ > [atsɯ̥]	圧, atsu, 'pressure'
/hikaN/ > [çi̥kaɴ]	悲観, hikan, 'pessimism'

/kisitu/ > [ki̥ɕitsɯ]	気質, kishitsu, 'temperament'
/kusikumo/ > [kɯɕi̥kɯmo]	奇しくも, kushikumo, 'strangely'

/kokoro/ > [ko̥koɾo]	心, kokoro, 'heart'
/haka/ > [hḁka]	墓, haka, 'grave'

[niɕɕimbaɕi]	日進橋, Nisshinbashi	vs.	[niɕi̥ɕimbaɕi] or [niɕiɕimbaɕi]	西新橋, Nishi-shinbashi
[kessai]	決済, 'check out'	vs.	[kesɯ̥sai] or [kesɯsai]	消す際, 'while erasing'
