Mandarin (Taiwan) MFA dictionary v3.0.0#

  • Maintainer: Montreal Forced Aligner

  • Language: Mandarin Chinese

  • Dialect: Taiwanese Mandarin

  • Phone set: MFA

  • Number of words: 15,287

  • Phones: a aj aj˥ aj˥˩ aj˧ aj˧˥ aj˨˩˦ aj˩ aw˥ aw˥˩ aw˧˥ aw˨˩˦ aw˩ a˥˩ a˧˥ a˨˩˦ e ej ej˥ ej˥˩ ej˧ ej˧˥ ej˨˩˦ ej˩ e˥˩ e˧˥ e˨˩˦ f i i˥˩ i˧˥ i˨˩˦ j k l m n n̩˥˩ n̩˧˥ n̩˨˩˦ o ow ow˥ ow˥˩ ow˧ ow˧˥ ow˨˩˦ ow˩ o˥˩ o˧˥ o˨˩˦ p s t ts tsʰ tɕʰ tɕʷ u u˥˩ u˧˥ u˨˩˦ w x y˥˩ y˧˥ y˨˩˦ z̩˥ z̩˥˩ z̩˧ z̩˧˥ z̩˨˩˦ z̩˩ ŋ ŋ̍˥˩ ŋ̍˧˥ ŋ̍˨˩˦ ɕ ɕʷ ə ə˥ ə˥˩ ə˧ ə˧˥ ə˨˩˦ ə˩ ɥ ɲ ɻ ʂ ʈʂ ʈʂʰ ʎ ʐ ʐ̩˥ ʐ̩˥˩ ʐ̩˧˥ ʐ̩˨˩˦ ʐ̩˩ ʔ

  • License: CC BY 4.0

  • Compatible MFA version: v3.0.0

  • Citation:

@techreport{mfa_mandarin_taiwan_mfa_dictionary_2024,
	author={McAuliffe, Michael and Sonderegger, Morgan},
	title={Mandarin (Taiwan) MFA dictionary v3.0.0},
	address={\url{https://mfa-models.readthedocs.io/pronunciation dictionary/Mandarin/Mandarin (Taiwan) MFA dictionary v3_0_0.html}},
	year={2024},
	month={Feb},
}

Installation#

Install from the MFA command line:

mfa model download dictionary mandarin_taiwan_mfa

Or download from the release page.

The dictionary available from the release page and command line installation has pronunciation and silence probabilities estimated as part of acoustic model training (see Silence probability format and training pronunciation probabilities for more information). If you would like to use the version of this dictionary without probabilities, please see the [plain dictionary](https://raw.githubusercontent.com/MontrealCorpusTools/mfa-models/main/dictionary/mandarin/mfa/Mandarin (Taiwan) MFA dictionary v3_0_0.dict).
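As a rough illustration of the difference between the two versions, the sketch below reads one dictionary line in either form. It assumes that the probability version places a run of numeric fields between the word and the phones, and that the plain version has none; the numeric values in the sample line are invented for illustration, and `parse_entry` is a hypothetical helper, not part of MFA. The pronunciation is the 北美洲 example from the charts on this page.

```python
def _is_number(token):
    """Return True if the token parses as a float (i.e. a probability field)."""
    try:
        float(token)
        return True
    except ValueError:
        return False


def parse_entry(line):
    """Split one dictionary line into (word, probabilities, phones).

    Assumption: any numeric fields directly after the word are probabilities;
    everything after them is the space-separated pronunciation.
    """
    tokens = line.split()
    word, rest = tokens[0], tokens[1:]
    probabilities = []
    while rest and _is_number(rest[0]):
        probabilities.append(float(rest.pop(0)))
    return word, probabilities, rest


# Sample lines; the four numbers are made up for this sketch.
with_probs = "北美洲\t1.0\t0.05\t1.0\t1.0\tp ej˨˩˦ m ej˨˩˦ ʈʂ ow˥"
plain = "北美洲\tp ej˨˩˦ m ej˨˩˦ ʈʂ ow˥"

entry_with_probs = parse_entry(with_probs)
entry_plain = parse_entry(plain)
```

Both calls yield the same word and phone list; only the probability list differs, which makes it easy to treat the two dictionary versions uniformly in downstream scripts.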

Intended use#

This dictionary is intended for forced alignment of Mandarin Chinese transcripts.

This dictionary uses the MFA phone set for Mandarin, and was used in training the Mandarin MFA acoustic model. Pronunciations can be added on top of the dictionary, as long as no additional phones are introduced.
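Before adding pronunciations, it can help to check that they introduce no phones outside the dictionary's phone set. The following is a minimal sketch of such a check, not an MFA feature: `PHONE_SET` holds only a small subset of the phones listed above (a real check would load the full list), and `unknown_phones` is a hypothetical helper name.

```python
# A small subset of the phone set listed on this page, for illustration only.
PHONE_SET = set("p ej˨˩˦ m ʈʂ ow˥ f u˨˩˦ t aw˨˩˦ ʐ ə˧˥ n".split())


def unknown_phones(pronunciation, phone_set=PHONE_SET):
    """Return the phones in a space-separated pronunciation that are not in phone_set."""
    return [phone for phone in pronunciation.split() if phone not in phone_set]


# The 輔導人 pronunciation from the charts uses only known phones.
ok = unknown_phones("f u˨˩˦ t aw˨˩˦ ʐ ə˧˥ n")

# A made-up pronunciation containing a phone (θ) that is not in the set.
bad = unknown_phones("f u˨˩˦ θ aw˨˩˦")
```

An empty result means the pronunciation is safe to add; any phone returned would need to be replaced with one from the existing set.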

Performance Factors#

When trying to get better alignment accuracy, adding pronunciations is generally helpful, especially for different styles and dialects. The most impactful improvements generally come from adding reduced variants, common in spontaneous speech, in which segments or syllables are deleted. Alignment must include all phones specified in the pronunciation of a word, and each phone has a minimum duration (by default 10ms). If a speaker pronounces a multisyllabic word with just a single syllable, MFA may be unable to fit all the segments in, which leads to alignment errors on adjacent words as well.
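The arithmetic behind this constraint can be sketched as follows. The example assumes the default 10 ms minimum per phone stated above; `min_word_duration_ms` and `fits` are hypothetical helpers, and the reduced variant is invented purely to show the effect of deleting segments (it is not an attested pronunciation).

```python
MIN_PHONE_MS = 10  # default minimum phone duration, per the text above


def min_word_duration_ms(pronunciation, min_phone_ms=MIN_PHONE_MS):
    """Shortest duration (ms) an alignment can assign to this pronunciation."""
    return len(pronunciation.split()) * min_phone_ms


def fits(pronunciation, spoken_ms, min_phone_ms=MIN_PHONE_MS):
    """True if every phone can receive at least the minimum duration."""
    return min_word_duration_ms(pronunciation, min_phone_ms) <= spoken_ms


# 狗尾續貂 as listed in the charts: 9 phones.
full = "k ow˨˩˦ w ej˨˩˦ ɕ y˥˩ t j aw˥"
# Hypothetical reduced variant with three segments deleted: 6 phones.
reduced = "k ow˨˩˦ ɕ y˥˩ t aw˥"

full_min = min_word_duration_ms(full)
reduced_min = min_word_duration_ms(reduced)

# A very fast 70 ms token cannot hold the full form but can hold the reduced one.
full_fits = fits(full, 70)
reduced_fits = fits(reduced, 70)
```

When the full form does not fit, the aligner has to borrow time from neighbouring words, which is why adding a shorter variant tends to improve the surrounding alignments too.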

Ethical considerations#

Deploying any Speech-to-Text model into any production setting has ethical implications. You should consider these implications before use.

Demographic Bias#

You should assume every machine learning model has demographic bias unless proven otherwise. For pronunciation dictionaries, it is often the case that transcription accuracy and lexicon coverage are better for the prestige variety modeled in this dictionary than for other variants. If you are using this dictionary in production, you should acknowledge this as a potential issue.

IPA Charts#

Consonants#

Within each cell, obstruent symbols on the left are unvoiced and those on the right are voiced.

Columns: Manner | Labial | Labiodental | Alveolar | Retroflex | Palatal | Velar | Glottal

Nasal

Occurrences:
857
Examples:
* 好事多磨:
[x aw˨˩˦ ʂ ʐ̩˥˩ t w m w o˧˥]
* 罗马尼亚:
[l w o˧˥ m a˨˩˦ ɲ i˧˥ j a˥˩]
* 苗栗縣:
[m j aw˧˥ ʎ i˥˩ ɕ e˥˩ n]
* 北美洲:
[p ej˨˩˦ m ej˨˩˦ ʈʂ ow˥]
Occurrences:
318
Examples:
* 密苏里:
[ i˥˩ s ʎ i˨˩˦]
* 黎明路:
[ʎ i˧˥ i˧˥ ŋ l u˥˩]
* 原住民:
[ɥ e˧˥ n ʈʂ u˥˩ i˧˥ n]
* 名列錶:
[ i˧˥ ŋ l j e˥˩ p j aw˨˩˦]
Occurrences:
7,648
Examples:
* 年雨量:
[n j e˧˥ ɲ y˨˩˦ l j a˥˩ ŋ]
* 輔導人:
[f u˨˩˦ t aw˨˩˦ ʐ ə˧˥ n]
* 離綫版:
[ʎ i˧˥ ɕ e˥˩ n p a˨˩˦ n]
* 五十分:
[u˨˩˦ ʂ ʐ̩˧˥ f ə˥˩ n]
Occurrences:
3
Examples:
Occurrences:
235
Examples:
* 現異性:
[ɕ e˥˩ ɲ i˥˩ ɕ i˥˩ ŋ]
* 參與感:
[tsʰ ɲ y˥˩ k a˨˩˦ n]
* 不盡人意:
[ u˥˩ i˨˩˦ n ʐ ə˧˥ ɲ i˥˩]
* 鬍言亂語:
[ u˧˥ j e˧˥ n l w a˥˩ ɲ y˨˩˦]
Occurrences:
5,994
Examples:
* 安平路:
[ʔ n i˧˥ ŋ l u˥˩]
* 颱中港:
[ aj˧˥ ʈʂ ŋ k a˨˩˦ ŋ]
* 登記簿:
[t ŋ i˥˩ u˥˩]
* 神岡區:
[ʂ ə˧˥ n k ŋ tɕʰ ]
Occurrences:
3
Examples:

Stop Plain

Occurrences:
1,079
Examples:
* 識彆化:
[ʂ ʐ̩˧˥ p j e˧˥ a˥˩]
* 玻利维亚:
[p w ʎ i˥˩ w ej˧˥ j a˥˩]
* 完整版:
[w a˧˥ n ʈʂ o˨˩˦ ŋ p a˨˩˦ n]
* 準備金:
[ʈʂ w ə˨˩˦ n p ej˥˩ n]
Occurrences:
464
Examples:
* 屏息以待:
[ i˧˥ ŋ ɕ i˨˩˦ t aj˥˩]
* 垂死病:
[ʈʂʰ w ej˧˥ s z̩˨˩˦ i˥˩ ŋ]
* 閉幕式:
[ i˥˩ m u˥˩ ʂ ʐ̩˥˩]
* 菲律賓:
[f ej˥ ʎ y˥˩ n]
Occurrences:
344
Examples:
* 源源不絕:
[ɥ e˧˥ n ɥ e˧˥ n u˥˩ tɕʷ e˧˥]
* 普洱茶:
[ u˨˩˦ ʔ o˨˩˦ ɻ ʈʂʰ a˧˥]
* 念念不忘:
[n j e˥˩ n n j e˥˩ n u˥˩ w a˥˩ ŋ]
* 巴不得乘:
[p u˥˩ i˥˩ ʈʂʰ o˧˥ ŋ]
Occurrences:
1,551
Examples:
* 华盛顿:
[ a˧˥ ʂ o˥˩ ŋ t w ə˥˩ n]
* 大材小用:
[t a˥˩ tsʰ aj˧˥ ɕ aw˨˩˦ j u˥˩ ŋ]
* 自作多情:
[ts z̩˥˩ ts w o˥˩ t w tɕʰ i˧˥ ŋ]
* 電影院:
[t j e˥˩ ɲ i˨˩˦ ŋ ɥ e˥˩ n]
Occurrences:
577
Examples:
* 地下室:
[ i˥˩ ɕ a˥˩ ʂ ʐ̩˥˩]
* 甜甜圈:
[ e˧˥ n e˧˥ n tɕʷ n]
* 鼎山街:
[ i˨˩˦ ŋ ʂ n ]
* 房地産:
[f a˧˥ ŋ i˥˩ ʈʂʰ a˨˩˦ n]
Occurrences:
637
Examples:
* 山东省:
[ʂ n ŋ ʂ o˨˩˦ ŋ]
* 大同區:
[t a˥˩ u˧˥ ŋ tɕʰ ]
* 印度洋:
[i˥˩ n u˥˩ j a˧˥ ŋ]
* 同盟路:
[ u˧˥ ŋ m o˧˥ ŋ l u˥˩]
Occurrences:
1,193
Examples:
* 親切感:
[tɕʰ n tɕʰ e˥˩ k a˨˩˦ n]
* 高屏溪:
[k aw˥ i˧˥ ŋ ɕ ]
* 狗尾續貂:
[k ow˨˩˦ w ej˨˩˦ ɕ y˥˩ t j aw˥]
* 岡山南路:
[k ŋ ʂ n n a˧˥ n l u˥˩]
Occurrences:
478
Examples:
* 恐固力:
[ u˨˩˦ ŋ u˥˩ ʎ i˥˩]
* 農民工:
[n u˧˥ ŋ i˧˥ n ŋ]
* 共和國:
[ u˥˩ ŋ x o˧˥ k w o˧˥]
* 辦公室:
[p a˥˩ n ŋ ʂ ʐ̩˨˩˦]
Occurrences:
416
Examples:
* 一九九二年:
[ ow˨˩˦ ow˨˩˦ ʔ o˥˩ ɻ n j e˧˥ n]
* 八安橋:
[p ʔ n tɕʰ aw˧˥]
* 奥林匹克:
[ʔ aw˥˩ ʎ i˧˥ n i˨˩˦ o˥˩]
* 安樂區:
[ʔ n l o˥˩ tɕʰ ]

Aspirated

Occurrences:
374
Examples:
* 詹姆斯龐德:
[ʈʂ n m u˨˩˦ s z̩˥ a˧˥ ŋ t o˧˥]
* 派齣所:
[ aj˥˩ ʈʂʰ s w o˨˩˦]
* 旁觀者:
[ a˧˥ ŋ k w n ʈʂ o˨˩˦]
* 香菜派:
[ɕ ŋ tsʰ aj˥˩ aj˥˩]
Occurrences:
845
Examples:
* 愛因斯坦:
[ʔ aj˥˩ n s z̩˥ a˨˩˦ n]
* 動態式:
[ u˥˩ ŋ aj˥˩ ʂ ʐ̩˥˩]
* 颱南人:
[ aj˧˥ n a˧˥ n ʐ ə˧˥ n]
* 堂車站:
[ a˧˥ ŋ ʈʂʰ ʈʂ a˥˩ n]
Occurrences:
624
Examples:
* 莫斯科:
[m w o˥˩ s z̩˥ ]
* 科帕县:
[ a˥˩ ɕ e˥˩ n]
* 社科院:
[ʂ o˥˩ ɥ e˥˩ n]
* 齣海口:
[ʈʂʰ x aj˨˩˦ ow˨˩˦]

Affricate Plain

Occurrences:
872
Examples:
* 资本主义:
[ts z̩˥ p ə˨˩˦ n ʈʂ u˨˩˦ i˥˩]
* 足球队:
[ts u˧˥ tɕʰ ow˧˥ t w ej˥˩]
* 不敢造次:
[ u˥˩ k a˨˩˦ n ts aw˥˩ tsʰ z̩˥˩]
* 議事組:
[i˥˩ ʂ ʐ̩˥˩ ts u˨˩˦]
Occurrences:
2,052
Examples:
* 當局者:
[t ŋ y˧˥ ʈʂ o˨˩˦]
* 中科院:
[ʈʂ ŋ ɥ e˥˩ n]
* 張學良:
[ʈʂ ŋ ɕʷ e˧˥ l j a˧˥ ŋ]
* 指日可待:
[ʈʂ ʐ̩˨˩˦ ʐ̩˥˩ o˨˩˦ t aj˥˩]
Occurrences:
2,411
Examples:
* 九一八:
[ ow˨˩˦ p ]
* 科學傢:
[ ɕʷ e˧˥ ]
* 社會局:
[ʂ o˥˩ ej˥˩ y˧˥]
* 北京市:
[p ej˨˩˦ ŋ ʂ ʐ̩˥˩]
Occurrences:
237
Examples:
* 甜甜圈:
[ j e˧˥ n j e˧˥ n tɕʷ n]
* 泉州市:
[tɕʷ e˧˥ n ʈʂ ow˥ ʂ ʐ̩˥˩]
* 生活圈:
[ʂ ŋ o˧˥ tɕʷ n]
* 不可或缺:
[ u˥˩ o˨˩˦ o˥˩ tɕʷ ]

Aspirated

Occurrences:
506
Examples:
* 北山村:
[p ej˨˩˦ ʂ n tsʰ w ə˥ n]
* 小林村:
[ɕ aw˨˩˦ ʎ i˧˥ n tsʰ w ə˥ n]
* 促進社:
[tsʰ u˥˩ i˥˩ n ʂ o˥˩]
* 參與者:
[tsʰ ɲ y˨˩˦ ʈʂ o˨˩˦]
Occurrences:
1,293
Examples:
* 迴收場:
[ ej˧˥ ʂ ow˥ ʈʂʰ a˧˥ ŋ]
* 籌備委:
[ʈʂʰ ow˧˥ p ej˥˩ w ej˨˩˦]
* 斷腸人:
[t w a˥˩ n ʈʂʰ a˧˥ ŋ ʐ ə˧˥ n]
* 警察局:
[ i˨˩˦ ŋ ʈʂʰ a˧˥ y˧˥]
Occurrences:
1,250
Examples:
* 高架橋:
[k aw˥ a˥˩ tɕʰ aw˧˥]
* 保护区:
[p aw˨˩˦ u˥˩ tɕʰ ]
* 機器人:
[ tɕʰ i˥˩ ʐ ə˧˥ n]
* 一日韆裏:
[i˧˥ ʐ̩˥˩ tɕʰ n ʎ i˨˩˦]

Sibilant

Occurrences:
618
Examples:
* 星期三:
[ɕ ŋ tɕʰ i˧˥ s n]
* 史語所:
[ʂ ʐ̩˨˩˦ y˨˩˦ s w o˨˩˦]
* 一哄而散:
[i˧˥ u˥˩ ŋ ʔ o˧˥ ɻ s a˥˩ n]
* 我行我素:
[w o˨˩˦ ɕ i˧˥ ŋ w o˨˩˦ s u˥˩]
Occurrences:
0
Examples:
Occurrences:
575
Examples:
* 資本傢:
[ts z̩˥ p ə˨˩˦ n ]
* 龍慈路:
[l u˧˥ ŋ tsʰ z̩˧˥ l u˥˩]
* 弗内斯:
[f u˧˥ n ej˥˩ s z̩˥]
* 魏公子:
[w ej˥˩ ŋ ts z̩˩]
Occurrences:
2,389
Examples:
* 九十日:
[ ow˨˩˦ ʂ ʐ̩˧˥ ʐ̩˥˩]
* 二十分:
[ʔ o˥˩ ɻ ʂ ʐ̩˧˥ f ə˥˩ n]
* 文化市:
[w ə˧˥ n a˥˩ ʂ ʐ̩˥˩]
* 名副其實:
[ i˧˥ ŋ f u˥˩ tɕʰ i˧˥ ʂ ʐ̩˧˥]
Occurrences:
626
Examples:
* 仁愛院:
[ʐ ə˧˥ n ʔ aj˥˩ ɥ e˥˩ n]
* 年輕人:
[n j e˧˥ n tɕʰ ŋ ʐ ə˧˥ n]
* 花花惹:
[ ʐ o˨˩˦]
* 瑞隆路:
[ʐ w ej˥˩ l u˧˥ ŋ l u˥˩]
Occurrences:
1,481
Examples:
* 大口吃肉:
[t a˥˩ ow˨˩˦ ʈʂʰ ʐ̩˥ ʐ ow˥˩]
* 同安市:
[ u˧˥ ŋ ʔ n ʂ ʐ̩˥˩]
* 隻不過:
[ʈʂ ʐ̩˨˩˦ k w o˥˩]
* 落井下石:
[l w o˥˩ i˨˩˦ ŋ ɕ a˥˩ ʂ ʐ̩˧˥]
Occurrences:
2,142
Examples:
* 一下子:
[i˧˥ ɕ a˥˩ ts z̩˩]
* 西西里岛:
[ɕ ɕ ʎ i˨˩˦ t aw˨˩˦]
* 匈牙利:
[ɕ ŋ j a˧˥ ʎ i˥˩]
* 忠孝橋:
[ʈʂ ŋ ɕ aw˥˩ tɕʰ aw˧˥]
Occurrences:
226
Examples:
* 學鋼琴:
[ɕʷ e˧˥ k ŋ tɕʰ i˧˥ n]
* 凱鏇路:
[ aj˨˩˦ ɕʷ e˧˥ n l u˥˩]
* 心理学家:
[ɕ n ʎ i˨˩˦ ɕʷ e˧˥ ]
* 下雪天:
[ɕ a˥˩ ɕʷ e˨˩˦ j n]

Fricative

Occurrences:
1,118
Examples:
* 菲律賓:
[f ej˥ ʎ y˥˩ n]
* 豐原區:
[f ŋ ɥ e˧˥ n tɕʰ ]
* 保護法:
[p aw˨˩˦ u˥˩ f a˨˩˦]
* 福壽街:
[f u˧˥ ʂ ow˥˩ ]

Approximant

Occurrences:
3,411
Examples:
* 文心南路:
[w ə˧˥ n ɕ n n a˧˥ n l u˥˩]
* 模範生:
[m w o˧˥ f a˥˩ n ʂ ŋ]
* 專業品:
[ʈʂ w n j e˥˩ i˨˩˦ n]
* 電磁波:
[t j e˥˩ n tsʰ z̩˧˥ p w ]
Occurrences:
131
Examples:
* 一哄而散:
[ u˥˩ ŋ ʔ o˧˥ ɻ s a˥˩ n]
* 公二八:
[ ŋ ʔ o˥˩ ɻ p ]
* 十全二路:
[ʂ ʐ̩˧˥ tɕʷ e˧˥ n ʔ o˥˩ ɻ l u˥˩]
* 二零一二年:
[ʔ o˥˩ ɻ ʎ i˧˥ ŋ ʔ o˥˩ ɻ n j e˧˥ n]
Occurrences:
2,550
Examples:
* 今年初:
[ n n j e˧˥ n ʈʂʰ ]
* 大不瞭:
[t a˥˩ u˥˩ l j aw˨˩˦]
* 連鎖店:
[l j e˧˥ n s w o˨˩˦ e˥˩ n]
* 顺天府:
[ʂ w ə˥˩ n j n f u˨˩˦]
Occurrences:
386
Examples:
* 设计院:
[ʂ o˥˩ i˥˩ ɥ e˥˩ n]
* 遠東站:
[ɥ e˨˩˦ n ŋ ʈʂ a˥˩ n]
* 王音樂:
[w a˧˥ ŋ n ɥ e˥˩]
* 研究院:
[j e˧˥ n ow˥ ɥ e˥˩ n]

Lateral

Occurrences:
1,414
Examples:
* 縣芬路:
[ɕ e˥˩ n f ə˥ n l u˥˩]
* 興東路:
[ɕ i˥˩ ŋ ŋ l u˥˩]
* 洛根县:
[l w o˥˩ k ə˥ n ɕ e˥˩ n]
* 復興南路:
[f u˥˩ ɕ i˥˩ ŋ n a˧˥ n l u˥˩]
Occurrences:
755
Examples:
* 既得利益:
[ i˥˩ t o˧˥ ʎ i˥˩ i˥˩]
* 叙利亚:
[ɕ y˥˩ ʎ i˥˩ j a˨˩˦]
* 罹患率:
[ʎ i˧˥ a˥˩ n ʎ y˥˩]
* 一零三年:
[ ʎ i˧˥ ŋ s n n j e˧˥ n]

Vowels#

Within each cell, vowel symbols on the left are unrounded and those on the right are rounded.

Columns: Front | Near-Front | Central | Near-Back | Back

Close

Occurrences:
5,207
Examples:
* 好景不常:
[x aw˨˩˦ i˨˩˦ ŋ u˥˩ ʈʂʰ a˧˥ ŋ]
* 區桌椅:
[tɕʰ ʈʂ w i˨˩˦]
* 黎明路:
[ʎ i˧˥ i˧˥ ŋ l u˥˩]
* 關係人:
[k w n ɕ ʐ ə˧˥ n]
Occurrences:
1,160
Examples:
* 平谷区:
[ i˧˥ ŋ u˨˩˦ tɕʰ y˨˩˦]
* 五髒俱全:
[u˨˩˦ ts a˥˩ ŋ y˥˩ tɕʷ e˧˥ n]
* 语言学:
[y˨˩˦ j e˧˥ n ɕʷ e˧˥]
* 燕巢區:
[j e˥˩ n ʈʂʰ aw˧˥ tɕʰ ]
Occurrences:
4,262
Examples:
* 差不多:
[ʈʂʰ a˥˩ t w ]
* 輔導人:
[f u˨˩˦ t aw˨˩˦ ʐ ə˧˥ n]
* 紅樹林:
[ u˧˥ ŋ ʂ u˥˩ ʎ i˧˥ n]
* 福利部:
[f u˧˥ ʎ i˥˩ u˥˩]

Close-Mid

Occurrences:
3,517
Examples:
* 瓜拉雪兰:
[k w l ɕʷ e˨˩˦ l a˧˥ n]
* 星聚點:
[ɕ ŋ y˥˩ t j e˨˩˦ n]
* 遠離市:
[ɥ e˨˩˦ n ʎ i˧˥ ʂ ʐ̩˥˩]
* 自由權:
[ts z̩˥˩ j ow˧˥ tɕʷ e˧˥ n]
Occurrences:
1,576
Examples:
* 營業稅:
[i˧˥ ŋ j e˥˩ ʂ w ej˥˩]
* 鬼鬼祟祟:
[k w ej˨˩˦ k w ej˨˩˦ s w ej˥˩ s w ej˥˩]
* 麵對麵:
[m j e˥˩ n t w ej˥˩ m j e˥˩ n]
* 淡水人:
[t a˥˩ n ʂ w ej˨˩˦ ʐ ə˧˥ n]
Occurrences:
3,436
Examples:
* 可能性:
[ o˨˩˦ n o˧˥ ŋ ɕ i˥˩ ŋ]
* 康特拉科:
[ ŋ o˥˩ l ]
* 二鍋頭:
[ʔ o˥˩ ɻ k w ow˧˥]
* 所得稅:
[s w o˨˩˦ t o˧˥ ʂ w ej˥˩]
Occurrences:
1,396
Examples:
* 修道院:
[ɕ ow˥ t aw˥˩ ɥ e˥˩ n]
* 苟延殘喘:
[k ow˨˩˦ j e˧˥ n tsʰ a˧˥ n ʈʂʰ w a˨˩˦ n]
* 宇宙人:
[y˨˩˦ ʈʂ ow˥˩ ʐ ə˧˥ n]
* 有朝一日:
[j ow˨˩˦ ʈʂ aw˥ i˧˥ ʐ̩˥˩]
Occurrences:
1,459
Examples:
* 敦化南路:
[t w ə˥ n a˥˩ n a˧˥ n l u˥˩]
* 照本宣科:
[ʈʂ aw˥˩ p ə˨˩˦ n ɕʷ n ]
* 三十分:
[s n ʂ ʐ̩˧˥ f ə˥˩ n]
* 年輕人:
[n j e˧˥ n tɕʰ ŋ ʐ ə˧˥ n]

Open-Mid

Open

Occurrences:
6,376
Examples:
* 五髒俱全:
[u˨˩˦ ts a˥˩ ŋ y˥˩ tɕʷ e˧˥ n]
* 司法人:
[s z̩˥ f a˨˩˦ ʐ ə˧˥ n]
* 幸福感:
[ɕ i˥˩ ŋ f u˧˥ k a˨˩˦ n]
* 拉阿魯:
[l ʔ l u˨˩˦]

Diphthongs#

  • aj

  • aw

  • ej

  • ow

Tones#

  • ˥

  • ˥˩

  • ˧

  • ˧˥

  • ˨˩˦

  • ˩
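Because tone is written directly on the vowel or syllabic consonant in this phone set, scripts that work with the dictionary often need to separate a phone's segmental base from its tone contour. The sketch below is one possible way to do that, assuming tones are always a trailing run of the five Chao tone letters; `split_tone` is a hypothetical helper, not part of MFA.

```python
# The five Chao tone letters used to build the contours listed above.
TONE_LETTERS = set("˥˦˧˨˩")


def split_tone(phone):
    """Split a phone into (base, tone), where tone is the trailing run of tone letters."""
    cut = len(phone)
    while cut > 0 and phone[cut - 1] in TONE_LETTERS:
        cut -= 1
    return phone[:cut], phone[cut:]


diphthong = split_tone("aj˨˩˦")  # diphthong with a dipping contour
toneless = split_tone("tsʰ")     # consonants carry no tone
syllabic = split_tone("z̩˥")      # syllabic consonant with a level tone
```

Grouping phones by the base returned here gives the toneless inventory (for example the four diphthongs above), while grouping by the tone part gives counts per contour.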