Mandarin (Taiwan) MFA dictionary v2.0.0#

  • Maintainer: Montreal Forced Aligner

  • Language: Mandarin Chinese

  • Dialect: Taiwanese Mandarin

  • Phone set: MFA

  • Number of words: 78,603

  • Phones: ai˥˥ ai˥˩ ai˦ ai˧˥ ai˨ ai˨˩˦ ai˩ au˥˥ au˥˩ au˦ au˧˥ au˨ au˨˩˦ au˩ a˥˥ a˥˩ a˧˥ a˨˩˦ ei˥˥ ei˥˩ ei˦ ei˧˥ ei˨ ei˨˩˦ ei˩ e˥˥ e˥˩ e˧˥ e˨˩˦ f i˥˥ i˥˩ i˧˥ i˨˩˦ j k l m n ou˥˥ ou˥˩ ou˦ ou˧˥ ou˨ ou˨˩˦ ou˩ o˥˥ o˥˩ o˧˥ o˨˩˦ p s t ts tsʰ tɕʰ u˥˥ u˥˩ u˧˥ u˨˩˦ w x y˥˥ y˥˩ y˧˥ y˨˩˦ z̩˥˥ z̩˥˩ z̩˦ z̩˧˥ z̩˨ z̩˨˩˦ z̩˩ ŋ ŋ̍˧˥ ɕ ə˥˥ ə˥˩ ə˦ ə˧˥ ə˨ ə˨˩˦ ə˩ ɥ ɻ ʂ ʈʂ ʈʂʰ ʐ ʐ̩˥˥ ʐ̩˥˩ ʐ̩˦ ʐ̩˧˥ ʐ̩˨ ʐ̩˨˩˦ ʐ̩˩ ʔ

  • License: CC BY 4.0

  • Compatible MFA version: v2.0.0

  • Citation:

@techreport{mfa_mandarin_taiwan_mfa_dictionary_2022,
	author={McAuliffe, Michael and Sonderegger, Morgan},
	title={Mandarin (Taiwan) MFA dictionary v2.0.0},
	address={\url{https://mfa-models.readthedocs.io/pronunciation dictionary/Mandarin/Mandarin (Taiwan) MFA dictionary v2_0_0.html}},
	year={2022},
	month={Mar},
}

Installation#

Install from the MFA command line:

mfa model download dictionary mandarin_taiwan_mfa

Or download from the release page.

The dictionary available from the release page and command line installation has pronunciation and silence probabilities estimated as part of acoustic model training (see Silence probability format and training pronunciation probabilities for more information). If you would like to use the version of this dictionary without probabilities, please see the plain dictionary.
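The difference between the two versions can be illustrated with a small parser. This is a minimal sketch, not MFA's own loader: it assumes each dictionary line holds a word, then zero or more numeric columns (the probabilities), then space-separated phones. The function names and the numeric values in the usage example are hypothetical; check the exact column layout against the Silence probability format documentation.

```python
from typing import List, NamedTuple


class Entry(NamedTuple):
    word: str
    probabilities: List[float]  # leading numeric columns, empty for a plain dictionary
    phones: List[str]


def _is_number(field: str) -> bool:
    """Return True if the field parses as a float."""
    try:
        float(field)
        return True
    except ValueError:
        return False


def parse_entry(line: str) -> Entry:
    """Split one dictionary line into word, numeric columns, and phones.

    Assumption: the numeric columns sit between the word and the phones.
    """
    fields = line.split()
    if not fields:
        raise ValueError("empty dictionary line")
    word, rest = fields[0], fields[1:]
    n_numeric = 0
    while n_numeric < len(rest) and _is_number(rest[n_numeric]):
        n_numeric += 1
    probabilities = [float(x) for x in rest[:n_numeric]]
    return Entry(word, probabilities, rest[n_numeric:])


def to_plain(line: str) -> str:
    """Drop the numeric columns, keeping only the word and its phones."""
    entry = parse_entry(line)
    return entry.word + "\t" + " ".join(entry.phones)
```

For example, with made-up probability values, `to_plain("大鍵琴\t0.99\t0.05\t1.0\t1.0\tt a˥˩ j e˥˩ n tɕʰ i˧˥ n")` keeps only the word and the phone string.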

Intended use#

This dictionary is intended for forced alignment of Mandarin Chinese transcripts.

This dictionary uses the MFA phone set for Mandarin, and was used in training the Mandarin MFA acoustic model. Pronunciations can be added on top of the dictionary, as long as no additional phones are introduced.
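One way to check the "no additional phones" constraint before adding a pronunciation is to compare its phones against the phone set. The sketch below is illustrative only: `PHONE_SET` is copied as printed from the Phones list at the top of this page, and `unknown_phones` is a hypothetical helper name, not part of MFA.

```python
import unicodedata
from typing import List

# Phone inventory copied as printed from the "Phones" entry of this page.
_PHONES = """
ai˥˥ ai˥˩ ai˦ ai˧˥ ai˨ ai˨˩˦ ai˩ au˥˥ au˥˩ au˦ au˧˥ au˨ au˨˩˦ au˩
a˥˥ a˥˩ a˧˥ a˨˩˦ ei˥˥ ei˥˩ ei˦ ei˧˥ ei˨ ei˨˩˦ ei˩ e˥˥ e˥˩ e˧˥ e˨˩˦
f i˥˥ i˥˩ i˧˥ i˨˩˦ j k l m n ou˥˥ ou˥˩ ou˦ ou˧˥ ou˨ ou˨˩˦ ou˩
o˥˥ o˥˩ o˧˥ o˨˩˦ p s t ts tsʰ tɕʰ u˥˥ u˥˩ u˧˥ u˨˩˦ w x
y˥˥ y˥˩ y˧˥ y˨˩˦ z̩˥˥ z̩˥˩ z̩˦ z̩˧˥ z̩˨ z̩˨˩˦ z̩˩ ŋ ŋ̍˧˥ ɕ
ə˥˥ ə˥˩ ə˦ ə˧˥ ə˨ ə˨˩˦ ə˩ ɥ ɻ ʂ ʈʂ ʈʂʰ ʐ
ʐ̩˥˥ ʐ̩˥˩ ʐ̩˦ ʐ̩˧˥ ʐ̩˨ ʐ̩˨˩˦ ʐ̩˩ ʔ
"""


def _nfc(text: str) -> str:
    # Normalize so that combining diacritics compare consistently.
    return unicodedata.normalize("NFC", text)


PHONE_SET = frozenset(_nfc(_PHONES).split())


def unknown_phones(pronunciation: str) -> List[str]:
    """Return the phones in a space-separated pronunciation that are not in PHONE_SET."""
    return [p for p in _nfc(pronunciation).split() if p not in PHONE_SET]
```

An empty result means the pronunciation introduces no new phones, for example `unknown_phones("i˥˥ n i˥˥ ŋ k u˨˩˦")`. Any phone it returns would need to be replaced with one from the set before the entry is added.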

Performance Factors#

When trying to improve alignment accuracy, adding pronunciations is generally helpful, especially for different styles and dialects. The largest improvements generally come from adding reduced variants that delete segments or syllables, which are common in spontaneous speech. Alignment must include all phones specified in the pronunciation of a word, and each phone has a minimum duration (10 ms by default). If a speaker pronounces a multisyllabic word as just a single syllable, MFA may be unable to fit all the segments into the word's interval, which leads to alignment errors on adjacent words as well.
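The minimum-duration constraint gives a simple lower bound on how short a word can be and still hold every phone of a given pronunciation. The sketch below only does that arithmetic, using the 10 ms default stated above. The function names are hypothetical, and the aligner itself may impose further constraints that this does not model.

```python
MIN_PHONE_MS = 10  # default minimum phone duration stated in the text


def min_duration_ms(pronunciation: str, min_phone_ms: int = MIN_PHONE_MS) -> int:
    """Lower bound on word duration: one minimum-length slot per phone."""
    return len(pronunciation.split()) * min_phone_ms


def fits(pronunciation: str, interval_ms: int, min_phone_ms: int = MIN_PHONE_MS) -> bool:
    """True if an interval is long enough to hold every phone at minimum duration."""
    return interval_ms >= min_duration_ms(pronunciation, min_phone_ms)


# Pronunciation of 豆腐渣工程 as listed in the charts on this page (11 phones).
full_form = "t ow˥˩ f ʈʂ a˥˥ k u˥˥ ŋ ʈʂʰ o˧˥ ŋ"
print(min_duration_ms(full_form))  # 110
```

If the speaker's realization of this word is shorter than 110 ms, the full form cannot fit. A reduced variant with fewer phones lowers the bound, which is why adding such variants helps.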

Ethical considerations#

Deploying any Speech-to-Text model into any production setting has ethical implications. You should consider these implications before use.

Demographic Bias#

You should assume every machine learning model has demographic bias unless proven otherwise. For pronunciation dictionaries, it is often the case that transcription accuracy and lexicon coverage are better for the prestige variety modeled in this dictionary than for other variants. If you are using this dictionary in production, you should acknowledge this as a potential issue.

IPA Charts#

Consonants#

Where obstruent symbols appear in pairs, the one on the left is unvoiced and the one on the right is voiced.

Manner | Labial | Labiodental | Alveolar | Retroflex | Palatal | Velar | Glottal

Nasal

Occurrences:
7,105
Examples:
* 石決明:
[ʂ ʐ̩˧˥ ɥ e˧˥ m i˧˥ ŋ]
* 龍彌你:
[l u˧˥ ŋ m i˧˥ n i˨˩˦]
* 變體假名:
[p j e˥˩ n i˨˩˦ j a˨˩˦ m i˧˥ ŋ]
* 流星馬:
[l j ow˧˥ ɕ i˥˥ ŋ m a˨˩˦]
Occurrences:
39,160
Examples:
* 烏剋蘭語:
[u˥˥ o˥˩ l a˧˥ n y˨˩˦]
* 陰莖骨:
[i˥˥ n i˥˥ ŋ k u˨˩˦]
* 刺五加片:
[tsʰ z̩˥˩ u˨˩˦ j a˥˥ j e˥˩ n]
* 大鍵琴:
[t a˥˩ j e˥˩ n tɕʰ i˧˥ n]
Occurrences:
30,436
Examples:
* 石決明:
[ʂ ʐ̩˧˥ ɥ e˧˥ m i˧˥ ŋ]
* 陰莖骨:
[i˥˥ n i˥˥ ŋ k u˨˩˦]
* 豆腐渣工程:
[t ow˥˩ f ʈʂ a˥˥ k u˥˥ ŋ ʈʂʰ o˧˥ ŋ]
* 電正性:
[t j e˥˩ n ʈʂ o˥˩ ŋ ɕ i˥˩ ŋ]
Occurrences:
2
Examples:

Stop

Occurrences:
9,354
Examples:
* 不會吧:
[p u˥˩ x w ej˥˩ p ]
* 杜伊斯堡:
[t u˥˩ i˥˥ s z̩˥˥ p aw˨˩˦]
* 熊本熊:
[ɕ j u˧˥ ŋ p ə˨˩˦ n ɕ j u˧˥ ŋ]
* 白雲岩:
[p aj˧˥ y˧˥ n j e˧˥ n]
Occurrences:
11,030
Examples:
* 豆腐渣工程:
[t ow˥˩ f ʈʂ a˥˥ k u˥˥ ŋ ʈʂʰ o˧˥ ŋ]
* 大鍵琴:
[t a˥˩ j e˥˩ n tɕʰ i˧˥ n]
* 電正性:
[t j e˥˩ n ʈʂ o˥˩ ŋ ɕ i˥˩ ŋ]
* 杜伊斯堡:
[t u˥˩ i˥˥ s z̩˥˥ p aw˨˩˦]
Occurrences:
8,160
Examples:
* 陰莖骨:
[i˥˥ n i˥˥ ŋ k u˨˩˦]
* 豆腐渣工程:
[t ow˥˩ f ʈʂ a˥˥ k u˥˥ ŋ ʈʂʰ o˧˥ ŋ]
* 錫林郭勒:
[ɕ i˥˥ l i˧˥ n k w o˥˥ l o˥˩]
* 工聯會:
[k u˥˥ ŋ l j e˧˥ n x w ej˥˩]
Occurrences:
3,497
Examples:
* 富拉爾基:
[f u˥˩ l a˥˥ ʔ o˨˩˦ ɻ i˥˥]
* 阿羅漢:
[ʔ a˥˥ l w o˧˥ x a˥˩ n]
* 卡爾加裏:
[ a˨˩˦ ʔ o˨˩˦ ɻ j a˥˥ l i˨˩˦]
* 巴彥洪戈爾:
[p a˥˥ j e˥˩ n x u˧˥ ŋ k o˥˥ ʔ o˨˩˦ ɻ]

Affricate

Occurrences:
5,010
Examples:
* 天龍座:
[ j e˥˥ n l u˧˥ ŋ ts w o˥˩]
* 連字號:
[l j e˧˥ n ts z̩˥˩ x aw˥˩]
* 操作者:
[tsʰ aw˥˥ ts w o˥˩ ʈʂ o˨˩˦]
* 女真子:
[n y˨˩˦ ʈʂ ə˥˥ n ts z̩˨˩˦]
Occurrences:
9,431
Examples:
* 豆腐渣工程:
[t ow˥˩ f ʈʂ a˥˥ k u˥˥ ŋ ʈʂʰ o˧˥ ŋ]
* 電正性:
[t j e˥˩ n ʈʂ o˥˩ ŋ ɕ i˥˩ ŋ]
* 癥候群:
[ʈʂ o˥˩ ŋ x ow˥˩ tɕʰ y˧˥ n]
* 互質數:
[x u˥˩ ʈʂ ʐ̩˥˩ ʂ u˥˩]
Occurrences:
13,254
Examples:
* 石決明:
[ʂ ʐ̩˧˥ ɥ e˧˥ m i˧˥ ŋ]
* 陰莖骨:
[i˥˥ n i˥˥ ŋ k u˨˩˦]
* 刺五加片:
[tsʰ z̩˥˩ u˨˩˦ j a˥˥ j e˥˩ n]
* 大鍵琴:
[t a˥˩ j e˥˩ n tɕʰ i˧˥ n]

Sibilant

Occurrences:
4,814
Examples:
* 杜伊斯堡:
[t u˥˩ i˥˥ s z̩˥˥ p aw˨˩˦]
* 四方形:
[s z̩˥˩ f a˥˥ ŋ ɕ i˧˥ ŋ]
* 孫王營:
[s w ə˥˥ n w a˧˥ ŋ i˧˥ ŋ]
* 維納斯:
[w ej˧˥ n a˥˩ s z̩˥˥]
Occurrences:
4,182
Examples:
* 龜茲文:
[tɕʰ j ow˥˥ tsʰ z̩˧˥ w ə˧˥ n]
* 杜伊斯堡:
[t u˥˩ i˥˥ s z̩˥˥ p aw˨˩˦]
* 紮猛子:
[ʈʂ a˥˥ m o˨˩˦ ŋ ts z̩˦]
* 名詞短語:
[m i˧˥ ŋ tsʰ z̩˧˥ t w a˨˩˦ n y˨˩˦]
Occurrences:
12,045
Examples:
* 石決明:
[ʂ ʐ̩˧˥ ɥ e˧˥ m i˧˥ ŋ]
* 半金屬:
[p a˥˩ n i˥˥ n ʂ u˨˩˦]
* 互質數:
[x u˥˩ ʈʂ ʐ̩˥˩ ʂ u˥˩]
* 臨時工:
[l i˧˥ n ʂ ʐ̩˧˥ k u˥˥ ŋ]
Occurrences:
2,738
Examples:
* 冰島人:
[p i˥˥ ŋ t aw˨˩˦ ʐ ə˧˥ n]
* 視乳頭水腫:
[ʂ ʐ̩˥˩ ʐ u˨˩˦ ow˧˥ ʂ w ej˨˩˦ ʈʂ u˨˩˦ ŋ]
* 撒瑪黎雅人:
[s a˥˥ m a˨˩˦ l i˧˥ j a˨˩˦ ʐ ə˧˥ n]
* 小綠人:
[ɕ j aw˨˩˦ l y˥˩ ʐ ə˧˥ n]
Occurrences:
7,056
Examples:
* 結婚戒指:
[ j e˧˥ x w ə˥˥ n j e˥˩ ʈʂ ʐ̩˩]
* 計算尺:
[ i˥˩ s w a˥˩ n ʈʂʰ ʐ̩˨˩˦]
* 紙飛機:
[ʈʂ ʐ̩˨˩˦ f ej˥˥ i˥˥]
* 絲織品:
[s z̩˥˥ ʈʂ ʐ̩˥˥ i˨˩˦ n]
Occurrences:
11,432
Examples:
* 下午好:
[ɕ j a˥˩ u˨˩˦ x aw˨˩˦]
* 電正性:
[t j e˥˩ n ʈʂ o˥˩ ŋ ɕ i˥˩ ŋ]
* 錫林郭勒:
[ɕ i˥˥ l i˧˥ n k w o˥˥ l o˥˩]
* 熊本熊:
[ɕ j u˧˥ ŋ p ə˨˩˦ n ɕ j u˧˥ ŋ]

Fricative

Occurrences:
5,888
Examples:
* 豆腐渣工程:
[t ow˥˩ f ʈʂ a˥˥ k u˥˥ ŋ ʈʂʰ o˧˥ ŋ]
* 富拉爾基:
[f u˥˩ l a˥˥ ʔ o˨˩˦ ɻ i˥˥]
* 四方形:
[s z̩˥˩ f a˥˥ ŋ ɕ i˧˥ ŋ]
* 德納裏峰:
[t o˧˥ n a˥˩ l i˨˩˦ f o˥˥ ŋ]

Approximant

Occurrences:
22,265
Examples:
* 龜茲文:
[tɕʰ j ow˥˥ tsʰ z̩˧˥ w ə˧˥ n]
* 不會吧:
[p u˥˩ x w ej˥˩ p ]
* 錫林郭勒:
[ɕ i˥˥ l i˧˥ n k w o˥˥ l o˥˩]
* 工聯會:
[k u˥˥ ŋ l j e˧˥ n x w ej˥˩]
Occurrences:
1,414
Examples:
* 富拉爾基:
[f u˥˩ l a˥˥ ʔ o˨˩˦ ɻ i˥˥]
* 卡爾加裏:
[ a˨˩˦ ʔ o˨˩˦ ɻ j a˥˥ l i˨˩˦]
* 巴彥洪戈爾:
[p a˥˥ j e˥˩ n x u˧˥ ŋ k o˥˥ ʔ o˨˩˦ ɻ]
* 外甥女兒:
[w aj˥˩ ʂ ŋ n y˨˩˦ ɻ]
Occurrences:
27,399
Examples:
* 下午好:
[ɕ j a˥˩ u˨˩˦ x aw˨˩˦]
* 刺五加片:
[tsʰ z̩˥˩ u˨˩˦ j a˥˥ j e˥˩ n]
* 大鍵琴:
[t a˥˩ j e˥˩ n tɕʰ i˧˥ n]
* 電正性:
[t j e˥˩ n ʈʂ o˥˩ ŋ ɕ i˥˩ ŋ]
Occurrences:
4,285
Examples:
* 石決明:
[ʂ ʐ̩˧˥ ɥ e˧˥ m i˧˥ ŋ]
* 參議員:
[tsʰ a˥˥ n i˥˩ ɥ e˧˥ n]
* 半月線:
[p a˥˩ n ɥ e˥˩ ɕ j e˥˩ n]
* 上野原:
[ʂ a˥˩ ŋ j e˨˩˦ ɥ e˧˥ n]

Lateral

Occurrences:
11,655
Examples:
* 烏剋蘭語:
[u˥˥ o˥˩ l a˧˥ n y˨˩˦]
* 富拉爾基:
[f u˥˩ l a˥˥ ʔ o˨˩˦ ɻ i˥˥]
* 錫林郭勒:
[ɕ i˥˥ l i˧˥ n k w o˥˥ l o˥˩]
* 工聯會:
[k u˥˥ ŋ l j e˧˥ n x w ej˥˩]

Vowels#

Where vowel symbols appear in pairs, the one on the left is unrounded and the one on the right is rounded.

Front | Near-Front | Central | Near-Back | Back

Close

Occurrences:
28,680
Examples:
* 四方形:
[s z̩˥˩ f a˥˥ ŋ ɕ i˧˥ ŋ]
* 不客氣:
[p u˥˩ o˥˩ tɕʰ ]
* 大鍵琴:
[t a˥˩ j e˥˩ n tɕʰ i˧˥ n]
* 富拉爾基:
[f u˥˩ l a˥˥ ʔ o˨˩˦ ɻ i˥˥]
Occurrences:
5,772
Examples:
* 愛玉子:
[ʔ aj˥˩ y˥˩ ts z̩˨˩˦]
* 林堡語:
[l i˧˥ n p aw˨˩˦ y˨˩˦]
* 癥候群:
[ʈʂ o˥˩ ŋ x ow˥˩ tɕʰ y˧˥ n]
* 老閨女:
[l aw˨˩˦ k w ej˥˥ n ]
Occurrences:
22,611
Examples:
* 熊本熊:
[ɕ j u˧˥ ŋ p ə˨˩˦ n ɕ j u˧˥ ŋ]
* 豆腐乳:
[t ow˥˩ f ʐ u˨˩˦]
* 瞧不起:
[tɕʰ j aw˧˥ p tɕʰ i˨˩˦]
* 下午好:
[ɕ j a˥˩ u˨˩˦ x aw˨˩˦]

Close-Mid

Occurrences:
16,156
Examples:
* 電正性:
[t j e˥˩ n ʈʂ o˥˩ ŋ ɕ i˥˩ ŋ]
* 參議員:
[tsʰ a˥˥ n i˥˩ ɥ e˧˥ n]
* 上野原:
[ʂ a˥˩ ŋ j e˨˩˦ ɥ e˧˥ n]
* 大少爺:
[t a˥˩ ʂ aw˥˩ j ]
Occurrences:
7,724
Examples:
* 毀滅性:
[x w ej˨˩˦ m j e˥˩ ɕ i˥˩ ŋ]
* 電吹風:
[t j e˥˩ n ʈʂʰ w ej˥˥ f o˥˥ ŋ]
* 氣纍脖兒:
[tɕʰ i˥˩ l ej˩ p w o˧˥ ɻ]
* 迴迴教:
[x w ej˧˥ x w ej˨ j aw˥˩]
Occurrences:
19,882
Examples:
* 巴彥洪戈爾:
[p a˥˥ j e˥˩ n x u˧˥ ŋ k o˥˥ ʔ o˨˩˦ ɻ]
* 外甥媳婦:
[w aj˥˩ ʂ ŋ ɕ i˧˥ f u˥˩]
* 夫妻老婆店:
[f u˥˥ tɕʰ i˥˥ l aw˨˩˦ w t j e˥˩ n]
* 老太婆:
[l aw˨˩˦ aj˥˩ w o˧˥]
Occurrences:
7,947
Examples:
* 癥候群:
[ʈʂ o˥˩ ŋ x ow˥˩ tɕʰ y˧˥ n]
* 小時候:
[ɕ j aw˨˩˦ ʂ ʐ̩˧˥ x ow˨]
* 龜茲文:
[tɕʰ j ow˥˥ tsʰ z̩˧˥ w ə˧˥ n]
* 流星馬:
[l j ow˧˥ ɕ i˥˥ ŋ m a˨˩˦]
Occurrences:
7,328
Examples:
* 孫王營:
[s w ə˥˥ n w a˧˥ ŋ i˧˥ ŋ]
* 德纍斯頓:
[t o˧˥ l ej˨˩˦ s z̩˥˥ t w ə˥˩ n]
* 熊本熊:
[ɕ j u˧˥ ŋ p ə˨˩˦ n ɕ j u˧˥ ŋ]
* 龜茲文:
[tɕʰ j ow˥˥ tsʰ z̩˧˥ w ə˧˥ n]

Open-Mid

Open

Occurrences:
36,127
Examples:
* 大鍵琴:
[t a˥˩ j e˥˩ n tɕʰ i˧˥ n]
* 南威島:
[n a˧˥ n w ej˥˥ t aw˨˩˦]
* 月亮湖:
[ɥ e˥˩ l j ŋ x u˧˥]
* 刺五加片:
[tsʰ z̩˥˩ u˨˩˦ j a˥˥ j e˥˩ n]

Diphthongs#

  • aj

  • aw

  • ej

  • ow

Tones#

  • ˥˥

  • ˥˩

  • ˦

  • ˧˥

  • ˨

  • ˨˩˦

  • ˩
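Because tones are written as tone letters attached to the end of a vowel or syllabic consonant, a phone can be split into its segment and its tone by stripping trailing tone letters. The sketch below assumes exactly that layout. `split_tone` is a hypothetical helper, and the mapping to tone numbers 1 to 4 follows the standard Chao tone-letter values for Mandarin citation tones, which this page does not itself state. The single-letter tones (˦, ˨, ˩) are left unmapped because the source does not say what they represent.

```python
from typing import Tuple

# The five Chao tone letters (U+02E5 to U+02E9).
TONE_LETTERS = "˥˦˧˨˩"

# Assumption: conventional contours of the four Mandarin citation tones.
CITATION_TONES = {"˥˥": 1, "˧˥": 2, "˨˩˦": 3, "˥˩": 4}


def split_tone(phone: str) -> Tuple[str, str]:
    """Split a phone into (segment, tone); tone is '' for toneless phones."""
    segment = phone.rstrip(TONE_LETTERS)
    return segment, phone[len(segment):]


for phone in ["a˨˩˦", "ʐ̩˥˩", "n"]:
    segment, tone = split_tone(phone)
    print(segment, tone, CITATION_TONES.get(tone))
```

Consonants such as `n` come back unchanged with an empty tone, so the same function can be applied to every phone in a pronunciation.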