Mandarin (Taiwan) MFA dictionary v2.0.0a

  • Maintainer: Montreal Forced Aligner

  • Language: Mandarin Chinese

  • Dialect: Taiwanese Mandarin

  • Phone set: MFA

  • Number of words: 78,602

  • Phones: ai˥˥ ai˥˩ ai˦ ai˧˥ ai˨ ai˨˩˦ ai˩ au˥˥ au˥˩ au˦ au˧˥ au˨ au˨˩˦ au˩ a˥˥ a˥˩ a˧˥ a˨˩˦ ei˥˥ ei˥˩ ei˦ ei˧˥ ei˨ ei˨˩˦ ei˩ e˥˥ e˥˩ e˧˥ e˨˩˦ f i˥˥ i˥˩ i˧˥ i˨˩˦ j k l m n ou˥˥ ou˥˩ ou˦ ou˧˥ ou˨ ou˨˩˦ ou˩ o˥˥ o˥˩ o˧˥ o˨˩˦ p s t ts tsʰ tɕʰ u˥˥ u˥˩ u˧˥ u˨˩˦ w x y˥˥ y˥˩ y˧˥ y˨˩˦ z̩˥˥ z̩˥˩ z̩˦ z̩˧˥ z̩˨ z̩˨˩˦ z̩˩ ŋ ŋ̍˧˥ ɕ ə˥˥ ə˥˩ ə˦ ə˧˥ ə˨ ə˨˩˦ ə˩ ɥ ɻ ʂ ʈʂ ʈʂʰ ʐ ʐ̩˥˥ ʐ̩˥˩ ʐ̩˦ ʐ̩˧˥ ʐ̩˨ ʐ̩˨˩˦ ʐ̩˩ ʔ

  • License: CC BY 4.0

  • Compatible MFA version: v2.0.0

  • Citation:

@techreport{mfa_mandarin_taiwan_mfa_dictionary_2022,
	author={McAuliffe, Michael and Sonderegger, Morgan},
	title={Mandarin (Taiwan) MFA dictionary v2.0.0a},
	address={\url{https://mfa-models.readthedocs.io/pronunciation dictionary/Mandarin/Mandarin (Taiwan) MFA dictionary v2_0_0a.html}},
	year={2022},
	month={May},
}

Installation

Install from the MFA command line:

mfa model download dictionary mandarin_taiwan_mfa

Or download from the release page.

The dictionary available from the release page and through command-line installation includes pronunciation and silence probabilities estimated as part of acoustic model training (see Silence probability format and training pronunciation probabilities for more information). If you would like to use the version of this dictionary without probabilities, please see the plain dictionary.

Intended use

This dictionary is intended for forced alignment of Mandarin Chinese transcripts.

This dictionary uses the MFA phone set for Mandarin and was used in training the Mandarin MFA acoustic model. Pronunciations can be added to the dictionary as long as they introduce no additional phones.
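The following minimal sketch illustrates the "no additional phones" check. It collects the phone inventory from existing dictionary lines and reports any phone in a new pronunciation that is missing from that inventory. It makes two assumptions about the file layout. First, each line is a word followed by whitespace-separated fields. Second, the version with probabilities places its numeric columns before the phones. The helper names are hypothetical and are not part of MFA.

```python
"""Check that added pronunciations only use phones already in the dictionary.

Assumed layout (not confirmed by this page): word, optional numeric
probability columns, then space-separated phones.
"""


def _is_number(token: str) -> bool:
    """True if the token parses as a float (e.g. a probability column)."""
    try:
        float(token)
    except ValueError:
        return False
    return True


def parse_entry(line: str) -> tuple[str, list[str]]:
    """Split one dictionary line into (word, phones), skipping leading numbers."""
    word, *fields = line.split()
    while fields and _is_number(fields[0]):
        fields.pop(0)
    return word, fields


def phone_inventory(lines: list[str]) -> set[str]:
    """Collect every phone used by the existing pronunciations."""
    phones: set[str] = set()
    for line in lines:
        if line.strip():  # skip blank lines
            phones.update(parse_entry(line)[1])
    return phones


def unknown_phones(pronunciation: str, inventory: set[str]) -> list[str]:
    """Return the phones in a new pronunciation that the inventory lacks."""
    return [p for p in pronunciation.split() if p not in inventory]


if __name__ == "__main__":
    existing = [
        "摩羯座\tm w o˧˥ tɕ j e˧˥ ts w o˥˩",
        "不得不\t0.99\tp u˥˩ t o˧˥ p u˥˩",  # hypothetical probability column
    ]
    inventory = phone_inventory(existing)
    print(unknown_phones("p u˥˩ t o˧˥", inventory))
    print(unknown_phones("p u˥˩ θ o˧˥", inventory))
```

In practice, you would build the inventory from the full dictionary file rather than from two sample lines. Any non-empty result means the new pronunciation would introduce phones that the acoustic model was not trained on.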

Performance Factors

To improve alignment accuracy, adding pronunciations is generally helpful, especially for different speaking styles and dialects. The largest improvements usually come from adding reduced variants that delete segments or syllables, as is common in spontaneous speech. An alignment must include every phone in a word's pronunciation, and each phone has a minimum duration (10 ms by default). If a speaker produces a multisyllabic word as a single syllable, MFA may be unable to fit all of the word's phones into the available time, which can cause alignment errors on adjacent words as well.
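The minimum-duration constraint comes down to simple arithmetic. The sketch below computes the shortest span a pronunciation can occupy. It assumes that phones are space-separated and that every phone needs at least 10 ms (the default stated above). The function names and the reduced variant are hypothetical and serve only to illustrate the calculation.

```python
"""Back-of-the-envelope check: can a word's phones fit in the time available?

Assumes each phone needs at least `min_phone_ms` milliseconds (10 ms default,
as stated on this page) and that phones are space-separated.
"""


def min_word_duration_ms(pronunciation: str, min_phone_ms: int = 10) -> int:
    """Shortest duration in which every phone of the word can be placed."""
    return len(pronunciation.split()) * min_phone_ms


def fits(pronunciation: str, available_ms: int, min_phone_ms: int = 10) -> bool:
    """True if the pronunciation can fit into `available_ms` milliseconds."""
    return min_word_duration_ms(pronunciation, min_phone_ms) <= available_ms


if __name__ == "__main__":
    full = "m w o˧˥ tɕ j e˧˥ ts w o˥˩"  # 摩羯座: 9 phones
    print(min_word_duration_ms(full))   # 9 phones x 10 ms
    print(fits(full, 60))               # a 60 ms token is too short
    reduced = "m w o˧˥ ts w o˥˩"        # hypothetical reduced variant: 6 phones
    print(fits(reduced, 60))
```

With the full 9-phone pronunciation, a word spoken in 60 ms cannot be aligned without overrunning its neighbours. A shorter variant with 6 phones (60 ms) would fit.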

Ethical considerations

Deploying any Speech-to-Text model into any production setting has ethical implications. You should consider these implications before use.

Demographic Bias

You should assume every machine learning model has demographic bias unless proven otherwise. For pronunciation dictionaries, transcription accuracy and lexicon coverage are often better for the prestige variety modeled in this dictionary than for other varieties. If you are using this dictionary in production, you should acknowledge this as a potential issue.

IPA Charts

Consonants

Within each cell, obstruent symbols on the left are unvoiced and those on the right are voiced.

Rows give manner of articulation; columns give place of articulation: Labial, Labiodental, Alveolar, Retroflex, Palatal, Velar, Glottal.

Nasal

m
Occurrences: 7,123
Examples:
* 末班車: [m w o˥˩ p a˥˥ n ʈʂʰ o˥˥]
* 孟高棉語族: [m o˥˩ ŋ k aw˥˥ m j e˧˥ n y˨˩˦ ts u˧˥]
* 默突捨拉: [m w o˥˩ tʰ u˥˥ ʂ o˥˩ l a˥˥]
* 摩羯座: [m w o˧˥ tɕ j e˧˥ ts w o˥˩]

n
Occurrences: 39,355
Examples:
* 討價還價: [tʰ aw˨˩˦ tɕ j a˥˩ x w a˧˥ n tɕ j a˥˩]
* 密涅瓦: [m i˥˩ n j e˥˩ w a˨˩˦]
* 按乃近: [ʔ a˥˩ n n aj˨˩˦ tɕ i˥˩ n]
* 排水管: [pʰ aj˧˥ ʂ w ej˨˩˦ k w a˨˩˦ n]

ŋ
Occurrences: 30,593
Examples:
* 金融工程: [tɕ i˥˥ n ʐ u˧˥ ŋ k u˥˥ ŋ ʈʂʰ o˧˥ ŋ]
* 中子數: [ʈʂ u˥˥ ŋ ts z̩˨˩˦ ʂ u˥˩]
* 女大不中留: [n y˨˩˦ t a˥˩ p u˥˩ ʈʂ u˥˥ ŋ l j ow˧˥]
* 老人星: [l aw˨˩˦ ʐ ə˧˥ n ɕ i˥˥ ŋ]

ŋ̍
Occurrences: 2

Stop

p
Occurrences: 9,394
Examples:
* 領導班子: [l i˨˩˦ ŋ t aw˨˩˦ p a˥˥ n ts z̩˨]
* 伊豆半島: [i˥˥ t ow˥˩ p a˥˩ n t aw˨˩˦]
* 背靠背: [p ej˥˩ kʰ aw˥˩ p ej˥˩]
* 被害人: [p ej˥˩ x aj˥˩ ʐ ə˧˥ n]

t
Occurrences: 11,105
Examples:
* 撒迦利亞書: [s a˥˥ tɕ j a˥˥ l i˥˩ j a˥˩ ʂ u˥˥]
* 不得不: [p u˥˩ t o˧˥ p u˥˩]
* 瓦根基: [w a˨˩˦ k ə˥˥ n tɕ i˥˥]
* 羅盤座: [l w o˧˥ pʰ a˧˥ n ts w o˥˩]

k
Occurrences: 8,190
Examples:
* 格林多: [k o˧˥ l i˧˥ n t w o˥˥]
* 老閨女: [l aw˨˩˦ k w ej˥˥ n y˨]
* 尤剋裏裏: [j ow˧˥ kʰ o˥˩ l i˨˩˦ l i˨˩˦]
* 兵工廠: [p i˥˥ ŋ k u˥˥ ŋ ʈʂʰ a˨˩˦ ŋ]

ʔ
Occurrences: 3,525
Examples:
* 耳鼻喉科: [ʔ o˨˩˦ ɻ p i˧˥ x ow˧˥ kʰ o˥˥]
* 澳洲堅果: [ʔ aw˥˩ ʈʂ ow˥˥ tɕ j e˥˥ n k w o˨˩˦]
* 田二河: [tʰ j e˧˥ n ʔ o˥˩ ɻ x o˧˥]
* 維吾爾語: [w ej˧˥ u˧˥ ʔ o˨˩˦ ɻ y˨˩˦]

Affricate

ts
Occurrences: 5,019
Examples:
* 馬販子: [m a˨˩˦ f a˥˩ n ts z̩˩]
* 草根性: [tsʰ aw˨˩˦ k ə˥˥ n ɕ i˥˩ ŋ]
* 藥材鋪: [j aw˥˩ tsʰ aj˧˥ pʰ u˥˩]
* 政策性: [ʈʂ o˥˩ ŋ tsʰ o˥˩ ɕ i˥˩ ŋ]

ʈʂ
Occurrences: 9,480
Examples:
* 巴拿馬城: [p a˥˥ n a˧˥ m a˨˩˦ ʈʂʰ o˧˥ ŋ]
* 疑難雜癥: [i˧˥ n a˧˥ n ts a˧˥ ʈʂ o˥˩ ŋ]
* 主持人: [ʈʂ u˨˩˦ ʈʂʰ ʐ̩˧˥ ʐ ə˧˥ n]
* 日常生活: [ʐ̩˥˩ ʈʂʰ a˧˥ ŋ ʂ o˥˥ ŋ x w o˧˥]

tɕ
Occurrences: 13,315
Examples:
* 五方旗幟: [u˨˩˦ f a˥˥ ŋ tɕʰ i˧˥ ʈʂ ʐ̩˥˩]
* 商業區: [ʂ a˥˥ ŋ j e˥˩ tɕʰ y˥˥]
* 微積分: [w ej˥˥ tɕ i˥˥ f ə˥˥ n]
* 滅火器: [m j e˥˩ x w o˨˩˦ tɕʰ i˥˩]

Sibilant

s
Occurrences: 4,845
Examples:
* 四季豆: [s z̩˥˩ tɕ i˥˩ t ow˥˩]
* 勒剋斯: [l o˥˩ kʰ o˥˩ s z̩˥˥]
* 三七仔: [s a˥˥ n tɕʰ i˥˥ ts z̩˨˩˦]
* 黃大仙祠: [x w a˧˥ ŋ t a˥˩ ɕ j e˥˥ n tsʰ z̩˧˥]

z
Occurrences: 0
Examples:
* 俄羅斯: [ʔ o˧˥ l w o˧˥ s z̩˥˥]
* 死巷子: [s z̩˨˩˦ ɕ j a˥˩ ŋ ts z̩˩]
* 自限性: [ts z̩˥˩ ɕ j e˥˩ n ɕ i˥˩ ŋ]
* 龜茲語: [tɕʰ j ow˥˥ tsʰ z̩˧˥ y˨˩˦]

z̩
Occurrences: 4,197
Examples:
* 電子束: [t j e˥˩ n ts z̩˨˩˦ ʂ u˥˩]
* 梨園子弟: [l i˧˥ ɥ e˧˥ n ts z̩˨˩˦ t i˥˩]
* 斯蒂芬: [s z̩˥˥ t i˥˩ f ə˥˥ n]
* 腳腕子: [tɕ j aw˨˩˦ w a˥˩ n ts z̩˩]

ʂ
Occurrences: 12,099
Examples:
* 非洲聯盟: [f ej˥˥ ʈʂ ow˥˥ l j e˧˥ n m o˧˥ ŋ]
* 帳前吏: [ʈʂ a˥˩ ŋ tɕʰ j e˧˥ n l i˥˩]
* 六十三: [l j ow˥˩ ʂ ʐ̩˧˥ s a˥˥ n]
* 萬人塚: [w a˥˩ n ʐ ə˧˥ n ʈʂ u˨˩˦ ŋ]

ʐ
Occurrences: 2,754
Examples:
* 十一日: [ʂ ʐ̩˧˥ i˥˥ ʐ̩˥˩]
* 二十六日: [ʔ o˥˩ ɻ ʂ ʐ̩˧˥ l j ow˥˩ ʐ̩˥˩]
* 二十四史: [ʔ o˥˩ ɻ ʂ ʐ̩˧˥ s z̩˥˩ ʂ ʐ̩˨˩˦]
* 均值定理: [tɕ y˥˥ n ʈʂ ʐ̩˧˥ t i˥˩ ŋ l i˨˩˦]

ʐ̩
Occurrences: 7,088
Examples:
* 原教旨主義: [ɥ e˧˥ n tɕ j aw˥˩ ʈʂ ʐ̩˨˩˦ ʈʂ u˨˩˦ i˥˩]
* 電氣石: [t j e˥˩ n tɕʰ i˥˩ ʂ ʐ̩˧˥]
* 結婚戒指: [tɕ j e˧˥ x w ə˥˥ n tɕ j e˥˩ ʈʂ ʐ̩˩]
* 食肉目: [ʂ ʐ̩˧˥ ʐ ow˥˩ m u˥˩]

ɕ
Occurrences: 11,482
Examples:
* 舞陽君: [u˨˩˦ j a˧˥ ŋ tɕ y˥˥ n]
* 傳國璽: [ʈʂʰ w a˧˥ n k w o˧˥ ɕ i˨˩˦]
* 科西嘉島: [kʰ o˥˥ ɕ i˥˥ tɕ j a˥˥ t aw˨˩˦]
* 羚羊角: [l i˧˥ ŋ j a˧˥ ŋ tɕ j aw˨˩˦]

Fricative

f
Occurrences: 5,914
Examples:
* 番茄汁: [f a˥˥ n tɕʰ j e˧˥ ʈʂ ʐ̩˥˥]
* 花粉癥: [x w a˥˥ f ə˨˩˦ n ʈʂ o˥˩ ŋ]
* 自由放任: [ts z̩˥˩ j ow˧˥ f a˥˩ ŋ ʐ ə˥˩ n]
* 東方紅: [t u˥˥ ŋ f a˥˥ ŋ x u˧˥ ŋ]

Approximant

w
Occurrences: 22,330
Examples:
* 錫剋教: [ɕ i˥˥ kʰ o˥˩ tɕ j aw˥˩]
* 從小到大: [tsʰ u˧˥ ŋ ɕ j aw˨˩˦ t aw˥˩ t a˥˩]
* 密爾瓦基: [m i˥˩ ʔ o˨˩˦ ɻ w a˨˩˦ tɕ i˥˥]
* 管弦樂隊: [k w a˨˩˦ n ɕ j e˧˥ n ɥ e˥˩ t w ej˥˩]

ɻ
Occurrences: 1,436
Examples:
* 一邊兒: [i˥˥ p j a˥˥ ɻ]
* 土耳其文: [tʰ u˨˩˦ ʔ o˨˩˦ ɻ tɕʰ i˧˥ w ə˧˥ n]
* 達斡爾族: [t a˧˥ w o˥˩ ʔ o˨˩˦ ɻ ts u˧˥]
* 模特兒: [m w o˧˥ tʰ o˥˩ ɻ]

j
Occurrences: 27,517
Examples:
* 列王記: [l j e˥˩ w a˧˥ ŋ tɕ i˥˩]
* 小溪河: [ɕ j aw˨˩˦ ɕ i˥˥ x o˧˥]
* 雅各伯: [j a˨˩˦ k o˥˩ p w o˧˥]
* 三級片: [s a˥˥ n tɕ i˧˥ pʰ j e˥˩ n]

ɥ
Occurrences: 4,301
Examples:
* 月見草: [ɥ e˥˩ tɕ j e˥˩ n tsʰ aw˨˩˦]
* 瑞士捲: [ʐ w ej˥˩ ʂ ʐ̩˥˩ tɕ ɥ e˨˩˦ n]
* 委麯求全: [w ej˨˩˦ tɕʰ y˥˥ tɕʰ j ow˧˥ tɕʰ ɥ e˧˥ n]
* 齣納員: [ʈʂʰ u˥˥ n a˥˩ ɥ e˧˥ n]

Lateral

l
Occurrences: 11,757
Examples:
* 石榴皮: [ʂ ʐ̩˧˥ l j ow˧˥ pʰ i˧˥]
* 拉巴特: [l a˥˥ p a˥˥ tʰ o˥˩]
* 大墩路: [t a˥˩ t w ə˥˥ n l u˥˩]
* 伊利亞特: [i˥˥ l i˥˩ j a˥˩ tʰ o˥˩]
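Counts like the Occurrences figures above can be approximated by tallying tone-stripped phones across pronunciations. This is an assumed method shown for illustration, not necessarily how the figures on this page were produced. It treats the tone letters ˥ ˦ ˧ ˨ ˩ as a suffix on the phone, as in the phone list above. The function names are hypothetical.

```python
from collections import Counter
from collections.abc import Iterable

# Chao tone letters used as suffixes in this dictionary's phone names.
TONE_LETTERS = "˥˦˧˨˩"


def base_phone(phone: str) -> str:
    """Drop trailing tone letters, e.g. 'o˥˩' -> 'o'; toneless phones are unchanged."""
    return phone.rstrip(TONE_LETTERS)


def tally_phones(pronunciations: Iterable[str]) -> Counter:
    """Count tone-stripped phones over space-separated pronunciations."""
    counts: Counter = Counter()
    for pron in pronunciations:
        counts.update(base_phone(p) for p in pron.split())
    return counts


if __name__ == "__main__":
    prons = [
        "p u˥˩ t o˧˥ p u˥˩",          # 不得不
        "m w o˧˥ tɕ j e˧˥ ts w o˥˩",  # 摩羯座
    ]
    print(tally_phones(prons).most_common(3))
```

This sketch counts whole phones, so the diphthong ow would not add to the count for o. If the page's examples were instead selected by character matching, that could explain why some example words (for instance, those listed under z) contain only a related phone.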

Vowels

Within each cell, vowel symbols on the left are unrounded and those on the right are rounded.

Rows give vowel height; columns give backness: Front, Near-Front, Central, Near-Back, Back.

Close

i
Occurrences: 28,825
Examples:
* 精密度: [tɕ i˥˥ ŋ m i˥˩ t u˥˩]
* 比利時: [p i˨˩˦ l i˥˩ ʂ ʐ̩˧˥]
* 土地爺: [tʰ u˨˩˦ t i˦ j e˧˥]
* 釘書機: [t i˥˩ ŋ ʂ u˥˥ tɕ i˥˥]

y
Occurrences: 5,781
Examples:
* 莙薘菜: [tɕ y˥˩ n t a˧˥ tsʰ aj˥˩]
* 喜歌劇: [ɕ i˨˩˦ k o˥˥ tɕ y˥˩]
* 堯舜禹湯: [j aw˧˥ ʂ w ə˥˩ n y˨˩˦ tʰ a˥˥ ŋ]
* 瑞士軍官刀: [ʐ w ej˥˩ ʂ ʐ̩˥˩ tɕ y˥˥ n k w a˥˥ n t aw˥˥]

u
Occurrences: 22,774
Examples:
* 奧卡姆剃刀: [ʔ aw˥˩ kʰ a˨˩˦ m u˨˩˦ tʰ i˥˩ t aw˥˥]
* 一股腦兒: [i˥˥ k u˨˩˦ n aw˨˩˦ ɻ]
* 保不定: [p aw˨˩˦ p u˦ t i˥˩ ŋ]
* 常陸大宮: [ʈʂʰ a˧˥ ŋ l u˥˩ t a˥˩ k u˥˥ ŋ]

Close-Mid

e
Occurrences: 16,209
Examples:
* 導盲犬: [t aw˨˩˦ m a˧˥ ŋ tɕʰ ɥ e˨˩˦ n]
* 煙油子: [j e˥˥ n j ow˧˥ ts z̩˨]
* 立足點: [l i˥˩ ts u˧˥ t j e˨˩˦ n]
* 大少爺: [t a˥˩ ʂ aw˥˩ j e˩]

ej
Occurrences: 7,740
Examples:
* 氣纍脖兒: [tɕʰ i˥˩ l ej˩ p w o˧˥ ɻ]
* 水力劈裂: [ʂ w ej˨˩˦ l i˥˩ pʰ i˥˥ l j e˥˩]
* 迴報率: [x w ej˧˥ p aw˥˩ l y˥˩]
* 對談者: [t w ej˥˩ tʰ a˧˥ n ʈʂ o˨˩˦]

o
Occurrences: 19,963
Examples:
* 火車頭: [x w o˨˩˦ ʈʂʰ o˥˥ tʰ ow˧˥]
* 糯稻根: [n w o˥˩ t aw˥˩ k ə˥˥ n]
* 走後門兒: [ts ow˨˩˦ x ow˥˩ m ə˧˥ ɻ]
* 小老婆: [ɕ j aw˨˩˦ l aw˨˩˦ pʰ w o˦]

ow
Occurrences: 7,993
Examples:
* 木頭人: [m u˥˩ tʰ ow˩ ʐ ə˧˥ n]
* 陳傢樓: [ʈʂʰ ə˧˥ n tɕ j a˥˥ l ow˧˥]
* 井字遊戲: [tɕ i˨˩˦ ŋ ts z̩˥˩ j ow˧˥ ɕ i˥˩]
* 修正液: [ɕ j ow˥˥ ʈʂ o˥˩ ŋ j e˥˩]

ə
Occurrences: 7,352
Examples:
* 四輪驅動: [s z̩˥˩ l w ə˧˥ n tɕʰ y˥˥ t u˥˩ ŋ]
* 溫布頓: [w ə˥˥ n p u˥˩ t w ə˥˩ n]
* 人情味: [ʐ ə˧˥ n tɕʰ i˧˥ ŋ w ej˥˩]
* 婚紗照: [x w ə˥˥ n ʂ a˥˥ ʈʂ aw˥˩]

Open-Mid

Open

a
Occurrences: 36,307
Examples:
* 工商界: [k u˥˥ ŋ ʂ a˥˥ ŋ tɕ j e˥˩]
* 恍然大悟: [x w a˨˩˦ ŋ ʐ a˧˥ n t a˥˩ u˥˩]
* 皇阿瑪: [x w a˧˥ ŋ ʔ a˥˩ m a˩]
* 香港仔: [ɕ j a˥˥ ŋ k a˨˩˦ ŋ ts aj˨˩˦]

Diphthongs

  • aj

  • aw

  • ej

  • ow

Tones

  • ˥˥

  • ˥˩

  • ˦

  • ˧˥

  • ˨

  • ˨˩˦

  • ˩
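Each tone above is written with Chao tone letters attached to the end of a vowel or diphthong phone (for example, ai˨˩˦ in the phone list). The small sketch below assumes that suffix convention. It splits a phone into its segment and tone, then converts the tone letters to Chao digits (˥ = 5 down to ˩ = 1), so ˨˩˦ becomes 214. The helper names are hypothetical.

```python
# Chao tone letters mapped to pitch levels (5 = highest, 1 = lowest).
TONE_TO_DIGIT = {"˥": "5", "˦": "4", "˧": "3", "˨": "2", "˩": "1"}


def split_tone(phone: str) -> tuple[str, str]:
    """Split 'ai˨˩˦' into ('ai', '˨˩˦'); toneless phones get an empty tone."""
    base = phone.rstrip("".join(TONE_TO_DIGIT))  # strip any trailing tone letters
    return base, phone[len(base):]


def chao_digits(tone: str) -> str:
    """Convert tone letters to Chao digits, e.g. '˨˩˦' -> '214'."""
    return "".join(TONE_TO_DIGIT[letter] for letter in tone)


if __name__ == "__main__":
    for phone in ["ai˨˩˦", "ou˥˥", "ə˧˥", "ʐ̩˥˩", "ŋ"]:
        base, tone = split_tone(phone)
        print(phone, base, chao_digits(tone) or "-")
```

Under this mapping, the contours ˥˥, ˧˥, ˨˩˦ and ˥˩ correspond to the familiar Mandarin tone values 55, 35, 214 and 51.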