Mandarin (China) MFA dictionary v3.0.0#

  • Maintainer: Montreal Forced Aligner

  • Language: Mandarin Chinese

  • Dialect: Standard Mandarin Chinese

  • Phone set: MFA

  • Number of words: 75,580

  • Phones: a aj aj˥ aj˥˩ aj˧ aj˧˥ aj˨˩˦ aj˩ aw aw˥ aw˥˩ aw˧ aw˧˥ aw˨˩˦ aw˩ a˥˩ a˧˥ a˨˩˦ e ej ej˥ ej˥˩ ej˧ ej˧˥ ej˨˩˦ ej˩ e˥˩ e˧˥ e˨˩˦ f i i˥˩ i˧˥ i˨˩˦ j k l m m̩˥ m̩˧ m̩˨˩˦ n n̩˥˩ n̩˧˥ n̩˨˩˦ o ow ow˥ ow˥˩ ow˧ ow˧˥ ow˨˩˦ ow˩ o˥˩ o˧˥ o˨˩˦ p s t ts tsʰ tɕʰ tɕʷ u u˥˩ u˧˥ u˨˩˦ w x y y˥˩ y˧˥ y˨˩˦ z̩˥ z̩˥˩ z̩˧ z̩˧˥ z̩˨˩˦ z̩˩ ŋ ŋ̍ ŋ̍˥˩ ŋ̍˧˥ ŋ̍˨˩˦ ɕ ɕʷ ə ə˥ ə˥˩ ə˧ ə˧˥ ə˨˩˦ ə˩ ɥ ɲ ɻ ʂ ʈʂ ʈʂʰ ʎ ʐ ʐ̩ ʐ̩˥ ʐ̩˥˩ ʐ̩˧ ʐ̩˧˥ ʐ̩˨˩˦ ʐ̩˩ ʔ

  • License: CC BY 4.0

  • Compatible MFA version: v3.0.0

  • Citation:

@techreport{mfa_mandarin_china_mfa_dictionary_2024,
	author={McAuliffe, Michael and Sonderegger, Morgan},
	title={Mandarin (China) MFA dictionary v3.0.0},
	address={\url{https://mfa-models.readthedocs.io/pronunciation dictionary/Mandarin/Mandarin (China) MFA dictionary v3_0_0.html}},
	year={2024},
	month={Feb},
}

Installation#

Install from the MFA command line:

mfa model download dictionary mandarin_china_mfa

Or download from the release page.

The dictionary available from the release page and command line installation has pronunciation and silence probabilities estimated as part of acoustic model training (see Silence probability format and training pronunciation probabilities for more information). If you would like to use the version of this dictionary without probabilities, please see the [plain dictionary](https://raw.githubusercontent.com/MontrealCorpusTools/mfa-models/main/dictionary/mandarin/mfa/Mandarin (China) MFA dictionary v3_0_0.dict).
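
As a rough illustration of how the two versions differ, the sketch below reduces a dictionary line to its word and phones. It assumes, rather than documents, that a line starts with the word, is optionally followed by numeric probability columns, and ends with the phones. The function name `strip_probabilities` and the probability values in the sample are hypothetical.

```python
def strip_probabilities(line):
    """Split one dictionary line into (word, phones), dropping numeric columns."""
    fields = line.split()
    word, rest = fields[0], fields[1:]

    def is_number(token):
        try:
            float(token)
            return True
        except ValueError:
            return False

    # Assumption: any leading numeric columns are probabilities, not phones.
    start = 0
    while start < len(rest) and is_number(rest[start]):
        start += 1
    return word, rest[start:]


sample_lines = [
    # Hypothetical probability values, for illustration only.
    "法国人\t0.99\t0.12\t1.05\t0.97\tf a˥˩ k w o˧˥ ʐ ə˧˥ n",
    # The same entry as it might appear without probabilities.
    "法国人\tf a˥˩ k w o˧˥ ʐ ə˧˥ n",
]

for line in sample_lines:
    print(strip_probabilities(line))
```

Both sample lines reduce to the same word and phone sequence, which is the property a plain dictionary line is expected to have.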

Intended use#

This dictionary is intended for forced alignment of Mandarin Chinese transcripts.

This dictionary uses the MFA phone set for Mandarin, and was used in training the Mandarin MFA acoustic model. Pronunciations can be added on top of the dictionary, as long as no additional phones are introduced.
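
The constraint on added pronunciations can be checked before alignment. The following is a minimal sketch, not part of MFA itself: `PHONE_SET` is only a small subset of the phone list above, and `unknown_phones` is a hypothetical helper name. In practice the full phone inventory would be loaded instead.

```python
# A small subset of the dictionary's phone list, for illustration only.
PHONE_SET = set("t a˥˩ ɕ aw˨˩˦ k w o˧˥ ʐ ə˧˥ n f".split())


def unknown_phones(pronunciation, phone_set):
    """Return the phones in a space-separated pronunciation that are not in phone_set."""
    return [phone for phone in pronunciation.split() if phone not in phone_set]


# Pronunciation of 大大小小 as listed in this dictionary: every phone is known.
print(unknown_phones("t a˥˩ t a˥˩ ɕ aw˨˩˦ ɕ aw˨˩˦", PHONE_SET))

# A made-up pronunciation containing a phone outside the set.
print(unknown_phones("θ a˥˩", PHONE_SET))
```

An empty result means the pronunciation introduces no additional phones. Any phone that is returned would need to be replaced with one from the existing set.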

Performance Factors#

When trying to get better alignment accuracy, adding pronunciations is generally helpful, especially for different styles and dialects. The most impactful improvements will generally be seen when adding reduced variants that involve deleting segments/syllables common in spontaneous speech. Alignment must include all phones specified in the pronunciation of a word, and each phone has a minimum duration (by default 10ms). If a speaker pronounces a multisyllabic word with just a single syllable, MFA may be unable to fit all the segments into the available audio, which can cause alignment errors on adjacent words as well.
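
The arithmetic behind the minimum duration can be sketched as follows. This is a back-of-the-envelope check rather than a description of how MFA aligns internally. The 60 ms spoken duration and the reduced variant are hypothetical values chosen for illustration, and the helper names are not part of MFA.

```python
MIN_PHONE_DURATION_MS = 10  # default minimum phone duration stated above


def min_word_duration_ms(pronunciation, min_phone_ms=MIN_PHONE_DURATION_MS):
    """Shortest duration a pronunciation can occupy: one minimum per phone."""
    return len(pronunciation.split()) * min_phone_ms


def fits(pronunciation, spoken_ms, min_phone_ms=MIN_PHONE_DURATION_MS):
    """True if every phone can receive its minimum duration within spoken_ms."""
    return min_word_duration_ms(pronunciation, min_phone_ms) <= spoken_ms


# Pronunciation of 实实在在 as listed in this dictionary (8 phones).
full = "ʂ ʐ̩˧˥ ʂ ʐ̩˧˥ ts aj˥˩ ts aj˥˩"
# Hypothetical reduced variant with half the phones (4 phones).
reduced = "ʂ ʐ̩˧˥ ts aj˥˩"

spoken_ms = 60  # hypothetical duration of a heavily reduced token

for name, pron in [("full", full), ("reduced", reduced)]:
    print(name, min_word_duration_ms(pron), fits(pron, spoken_ms))
```

Under these assumptions the full form needs at least 80 ms and cannot fit into 60 ms of audio, while the reduced variant needs only 40 ms.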

Ethical considerations#

Deploying any Speech-to-Text model into any production setting has ethical implications. You should consider these implications before use.

Demographic Bias#

You should assume every machine learning model has demographic bias unless proven otherwise. For pronunciation dictionaries, it is often the case that transcription accuracy and lexicon coverage are better for the prestige variety modeled in this dictionary than for other variants. If you are using this dictionary in production, you should acknowledge this as a potential issue.

IPA Charts#

Consonants#

Obstruent symbols on the left are unvoiced and those on the right are voiced.

Manner

Labial

Labiodental

Alveolar

Retroflex

Palatal

Velar

Glottal

Nasal

Occurrences:
5,447
Examples:
* 给面子:
[k ej˨˩˦ m j e˥˩ n ts z̩˩]
* 马尔代夫:
[m a˨˩˦ ʔ o˨˩˦ ɻ t aj˥˩ f ]
* 墨尔本:
[m w o˥˩ ʔ o˨˩˦ ɻ p ə˨˩˦ n]
* 实作忙:
[ʂ ʐ̩˧˥ ts w o˥˩ m a˧˥ ŋ]
Occurrences:
1,578
Examples:
* 米高逃走:
[ i˨˩˦ k aw˥ j aw˧˥ ts ow˨˩˦]
* 投寄明:
[ ow˧˥ i˥˩ i˧˥ ŋ]
* 民族性:
[ i˧˥ n ts u˧˥ ɕ i˥˩ ŋ]
* 玉米麵:
[y˥˩ i˨˩˦ m j e˥˩ n]
Occurrences:
3
Examples:
Occurrences:
38,333
Examples:
* 教职员们:
[ aw˥˩ ʈʂ ʐ̩˧˥ ɥ e˧˥ n m ə˧ n]
* 四十四年:
[s z̩˥˩ ʂ ʐ̩˧˥ s z̩˥˩ n j e˧˥ n]
* 朱诺之一:
[ʈʂ n w o˥˩ ʈʂ ʐ̩˥ ]
* 鄱阳县:
[ w o˧˥ j a˧˥ ŋ ɕ e˥˩ n]
Occurrences:
4
Examples:
Occurrences:
1,581
Examples:
* 爱沙尼亚人:
[ʔ aj˥˩ ʂ ɲ i˧˥ j a˨˩˦ ʐ ə˧˥ n]
* 霓虹灯:
[ɲ i˧˥ u˧˥ ŋ t ŋ]
* 林依晨:
[ʎ i˧˥ ɲ ʈʂʰ ə˧˥ n]
* 尼西亚:
[ɲ i˧˥ ɕ j a˥˩]
Occurrences:
32,069
Examples:
* 太子町:
[ aj˥˩ ts z̩˨˩˦ ŋ]
* 堅定不移:
[ n i˥˩ ŋ u˥˩ i˧˥]
* 平山鬱夫:
[ i˧˥ ŋ ʂ ɲ y˥˩ f ]
* 东南亚:
[ ŋ n a˧˥ n j a˥˩]
Occurrences:
5
Examples:

Stop Plain

Occurrences:
5,779
Examples:
* 哈利波:
[x ʎ i˥˩ p w ]
* 奥德贝尔:
[ʔ aw˥˩ t o˧˥ p ej˥˩ ʔ o˨˩˦ ɻ]
* 圣吕班:
[ʂ o˥˩ ŋ ʎ y˨˩˦ p n]
* 苯甲酸:
[p ə˨˩˦ n a˨˩˦ s w n]
Occurrences:
2,343
Examples:
* 畢業自:
[ i˥˩ j e˥˩ ts z̩˥˩]
* 招聘会:
[ʈʂ aw˥ i˥˩ ŋ ej˥˩]
* 贫民窟:
[ i˧˥ n i˧˥ n ]
* 改兵部:
[k aj˨˩˦ ŋ u˥˩]
Occurrences:
1,968
Examples:
* 普宁市:
[ u˨˩˦ ɲ i˧˥ ŋ ʂ ʐ̩˥˩]
* 阿布鲁佐:
[ʔ u˥˩ l u˨˩˦ ts w o˨˩˦]
* 妙不可言:
[m j aw˥˩ u˥˩ o˨˩˦ j e˧˥ n]
* 候补县:
[x ow˥˩ u˨˩˦ ɕ e˥˩ n]
Occurrences:
7,632
Examples:
* 几点钟:
[ i˨˩˦ t j e˨˩˦ n ʈʂ ŋ]
* 别动队:
[p j e˧˥ u˥˩ ŋ t w ej˥˩]
* 篮球队:
[l a˧˥ n tɕʰ ow˧˥ t w ej˥˩]
* 优缺点:
[j ow˥ tɕʷ t j e˨˩˦ n]
Occurrences:
2,690
Examples:
* 救生艇:
[ ow˥˩ ʂ ŋ i˨˩˦ ŋ]
* 尽收眼底:
[ i˥˩ n ʂ ow˥ j e˨˩˦ n i˨˩˦]
* 蒂罗尔州:
[ i˥˩ l w ʔ o˨˩˦ ɻ ʈʂ ow˥]
* 丁岩礁:
[ ŋ j e˧˥ n aw˥]
Occurrences:
2,851
Examples:
* 读者群:
[ u˧˥ ʈʂ o˨˩˦ tɕʰ y˧˥ n]
* 超高度:
[ʈʂʰ aw˥ k aw˥ u˥˩]
* 北关东:
[p ej˨˩˦ k w n ŋ]
* 东半部:
[ ŋ p a˥˩ n u˥˩]
Occurrences:
5,766
Examples:
* 关牧村:
[k w n m u˥˩ tsʰ w ə˥ n]
* 嵩景官:
[s ŋ i˨˩˦ ŋ k w n]
* 文化馆:
[w ə˧˥ n a˥˩ k w a˨˩˦ n]
* 馆藏量:
[k w a˨˩˦ n tsʰ a˧˥ ŋ l j a˥˩ ŋ]
Occurrences:
2,225
Examples:
* 游泳裤:
[j ow˧˥ j u˨˩˦ ŋ u˥˩]
* 母公司:
[m u˨˩˦ ŋ s z̩˥]
* 古时候:
[ u˨˩˦ ʂ ʐ̩˧˥ x ow˥˩]
* 鹿谷乡:
[l u˥˩ u˨˩˦ ɕ ŋ]
Occurrences:
4,645
Examples:
* 成交额:
[ʈʂʰ o˧˥ ŋ aw˥ ʔ o˧˥]
* 王安明:
[w a˧˥ ŋ ʔ n i˧˥ ŋ]
* 灰原哀:
[ ej˥ ɥ e˧˥ n ʔ aj˥]
* 特里尔:
[ o˥˩ ʎ i˨˩˦ ʔ o˨˩˦ ɻ]

Aspirated

Occurrences:
2,184
Examples:
* 保守派:
[p aw˨˩˦ ʂ ow˨˩˦ aj˥˩]
* 培训班:
[ ej˧˥ ɕ y˥˩ n p n]
* 配音员:
[ ej˥˩ n ɥ e˧˥ n]
* 什叶派:
[ʂ ʐ̩˧˥ j e˥˩ aj˥˩]
Occurrences:
5,111
Examples:
* 马普托河:
[m a˨˩˦ u˨˩˦ w x o˧˥]
* 步兵团:
[ u˥˩ ŋ w a˧˥ n]
* 斯特雷门虾:
[s z̩˥ o˥˩ l ej˧˥ m ə˧˥ n ɕ ]
* 鳍天竺:
[tɕʰ i˧˥ j n ʈʂ u˧˥]
Occurrences:
4,720
Examples:
* 昆士兰:
[ w ə˥ n ʂ ʐ̩˥˩ l a˧˥ n]
* take:
[ a˧˥ ə˩]
* 迪卡兵:
[ i˧˥ a˨˩˦ ŋ]
* 多纳扎克:
[t w n a˥˩ ʈʂ o˥˩]

Affricate Plain

Occurrences:
4,249
Examples:
* 沟帮子镇:
[k ow˥ p ŋ ts z̩˩ ʈʂ ə˥˩ n]
* 布依族:
[ u˥˩ ts u˧˥]
* 王子轩:
[w a˧˥ ŋ ts z̩˨˩˦ ɕʷ n]
* 自由化:
[ts z̩˥˩ j ow˧˥ a˥˩]
Occurrences:
10,094
Examples:
* 强迫症:
[tɕʰ a˨˩˦ ŋ w o˥˩ ʈʂ o˥˩ ŋ]
* 三灶镇:
[s n s aw˥˩ ʈʂ ə˥˩ n]
* 计算长:
[ i˥˩ s w a˥˩ n ʈʂ a˨˩˦ ŋ]
* 至本机:
[ʈʂ ʐ̩˥˩ p ə˨˩˦ n ]
Occurrences:
12,266
Examples:
* 农机具:
[n u˧˥ ŋ y˥˩]
* 转向架:
[ʈʂ w a˨˩˦ n ɕ a˥˩ ŋ a˥˩]
* 蒋大为:
[ a˨˩˦ ŋ t a˥˩ w ej˥˩]
* 吉林市:
[ i˧˥ ʎ i˧˥ n ʂ ʐ̩˥˩]
Occurrences:
1,023
Examples:
* 半决赛:
[p a˥˩ n tɕʷ e˧˥ s aj˥˩]
* 文化圈:
[w ə˧˥ n a˥˩ tɕʷ n]
* 产权局:
[ʈʂʰ a˨˩˦ n tɕʷ e˧˥ n y˧˥]
* 温泉县:
[w ə˥ n tɕʷ e˧˥ n ɕ e˥˩ n]

Aspirated

Occurrences:
2,534
Examples:
* 翡翠台:
[f ej˨˩˦ tsʰ w ej˥˩ aj˧˥]
* 没错儿:
[m ej˧˥ tsʰ w o˥˩ ɻ]
* 红磡邨:
[ u˧˥ ŋ a˥˩ n tsʰ w ə˥ n]
* 从头再来:
[tsʰ u˧˥ ŋ ow˧˥ ts aj˥˩ l aj˧˥]
Occurrences:
6,285
Examples:
* 持有者:
[ʈʂʰ ʐ̩˧˥ j ow˨˩˦ ʈʂ o˨˩˦]
* 葛西城:
[k o˧˥ ɕ ʈʂʰ o˧˥ ŋ]
* 栾川县:
[l w a˧˥ n ʈʂʰ w n ɕ e˥˩ n]
* 张候车:
[ʈʂ ŋ x ow˥˩ ʈʂʰ ]
Occurrences:
5,933
Examples:
* 勇往直前:
[j u˨˩˦ ŋ w a˨˩˦ ŋ ʈʂ ʐ̩˧˥ tɕʰ e˧˥ n]
* 爱奇艺:
[ʔ aj˥˩ tɕʰ i˧˥ i˥˩]
* 任丘市:
[ʐ ə˧˥ n tɕʰ ow˥ ʂ ʐ̩˥˩]
* 河西区:
[x o˧˥ ɕ tɕʰ ]

Sibilant

Occurrences:
5,280
Examples:
* 分组赛:
[f ə˥ n ts u˨˩˦ s aj˥˩]
* 宋神宗:
[s u˥˩ ŋ ʂ ə˧˥ n ts ŋ]
* 红土赛季:
[ u˧˥ ŋ u˨˩˦ s aj˥˩ i˥˩]
* 戈斯索:
[k s z̩˥ s w o˨˩˦]
Occurrences:
0
Examples:
Occurrences:
3,987
Examples:
* sim:
[ʔ aj˧˥ s z̩˧ ʔ m]
* 茨城县:
[tsʰ z̩˧˥ ʈʂʰ o˧˥ ŋ ɕ e˥˩ n]
* 钉子户:
[ ŋ ts z̩˩ u˥˩]
* 百子湾:
[p aj˨˩˦ ts z̩˨˩˦ w n]
Occurrences:
12,091
Examples:
* 乐松山:
[l o˥˩ s ŋ ʂ n]
* 实实在在:
[ʂ ʐ̩˧˥ ʂ ʐ̩˧˥ ts aj˥˩ ts aj˥˩]
* 李科鼠:
[ʎ i˨˩˦ ʂ u˨˩˦]
* 城市化:
[ʈʂʰ o˧˥ ŋ ʂ ʐ̩˥˩ a˥˩]
Occurrences:
2,751
Examples:
* 张瑞妍:
[ʈʂ ŋ ʐ w ej˥˩ j e˧˥ n]
* 奠基人:
[t j e˥˩ n ʐ ə˧˥ n]
* 容中爾甲:
[ʐ u˧˥ ŋ ʈʂ ŋ ʔ o˨˩˦ ɻ a˨˩˦]
* 强人所难:
[tɕʰ a˨˩˦ ŋ ʐ ə˧˥ n s w o˨˩˦ n a˧˥ n]
Occurrences:
6,924
Examples:
* 知识分子:
[ʈʂ ʐ̩˥ ʂ ʐ̩˩ f ə˥˩ n ts z̩˨˩˦]
* 没事儿:
[m ej˧˥ ʂ ʐ̩ ə˥˩ ɻ]
* 支付币:
[ʈʂ ʐ̩˥ f u˥˩ i˥˩]
* 蓄电池:
[ɕ y˥˩ t j e˥˩ n ʈʂʰ ʐ̩˧˥]
Occurrences:
10,724
Examples:
* 知识性:
[ʈʂ ʐ̩˥ ʂ ʐ̩˩ ɕ i˥˩ ŋ]
* 训练营:
[ɕ y˥˩ n l j e˥˩ ɲ i˧˥ ŋ]
* 小型化:
[ɕ aw˨˩˦ ɕ i˧˥ ŋ a˥˩]
* 大大小小:
[t a˥˩ t a˥˩ ɕ aw˨˩˦ ɕ aw˨˩˦]
Occurrences:
1,097
Examples:
* 候选人:
[x ow˥˩ ɕʷ e˨˩˦ n ʐ ə˧˥ n]
* 西雪梨:
[ɕ ɕʷ e˨˩˦ ʎ i˧˥]
* 商学院:
[ʂ ŋ ɕʷ e˧˥ ɥ e˥˩ n]
* 高血压:
[k aw˥ ɕʷ e˨˩˦ j ]

Fricative

Occurrences:
6,125
Examples:
* 西南方:
[ɕ n a˧˥ n f ŋ]
* 解放碑:
[ e˨˩˦ f a˥˩ ŋ p ej˥]
* 肺水肿:
[f ej˥˩ ʂ w ej˨˩˦ ʈʂ u˨˩˦ ŋ]
* 克里斯托弗:
[ o˥˩ ʎ i˨˩˦ s z̩˥ w f u˧˥]

Approximant

Occurrences:
18,807
Examples:
* 法国人:
[f a˥˩ k w o˧˥ ʐ ə˧˥ n]
* 清水河:
[tɕʰ ŋ ʂ w ej˨˩˦ x o˧˥]
* 文学社:
[w ə˧˥ n ɕʷ e˧˥ ʂ o˥˩]
* 歌剧团:
[k y˥˩ w a˧˥ n]
Occurrences:
2,072
Examples:
* 后藤圭二:
[x ow˥˩ o˧˥ ŋ k w ej˥ ʔ o˥˩ ɻ]
* 塞尔维亚:
[s aj˥˩ ʔ o˨˩˦ ɻ w ej˧˥ j a˨˩˦]
* 阿尔泰牧居:
[ʔ ʔ o˨˩˦ ɻ aj˥˩ m u˥˩ ]
* 贝尔法斯特:
[p ej˥˩ ʔ o˨˩˦ ɻ f a˨˩˦ s z̩˥ o˥˩]
Occurrences:
13,129
Examples:
* 黄顶夜:
[ a˧˥ ŋ i˨˩˦ ŋ j e˥˩]
* sony:
[s n j]
* 抛物面:
[ aw˥ u˥˩ m j e˥˩ n]
* 塞尔维亚人:
[s aj˥˩ ʔ o˨˩˦ ɻ w ej˧˥ j a˨˩˦ ʐ ə˧˥ n]
Occurrences:
1,882
Examples:
* 发源于:
[f ɥ e˧˥ ɲ y˧˥]
* 许耀元:
[ɕ y˨˩˦ j aw˥˩ ɥ e˧˥ n]
* 管弦乐团:
[k w a˨˩˦ n ɕ e˧˥ n ɥ e˥˩ w a˧˥ n]
* 农科院:
[n u˧˥ ŋ ɥ e˥˩ n]

Lateral

Occurrences:
8,620
Examples:
* 重量级:
[ʈʂ u˥˩ ŋ l j a˥˩ ŋ i˧˥]
* 格列布大公:
[k o˧˥ l j e˥˩ u˥˩ t a˥˩ ŋ]
* 上永路:
[ʂ a˥˩ ŋ j u˨˩˦ ŋ l u˥˩]
* 离哈维拉:
[ʎ i˧˥ x w ej˧˥ l ]
Occurrences:
4,258
Examples:
* 黎巴嫩:
[ʎ i˧˥ p n ə˥˩ n]
* 谢里特:
[ɕ e˥˩ ʎ i˨˩˦ o˥˩]
* 的黎波里:
[ ʎ i˧˥ p w ʎ i˨˩˦]
* 立图书馆:
[ʎ i˥˩ u˧˥ ʂ k w a˨˩˦ n]

Vowels#

Vowel symbols on the left are unrounded and those on the right are rounded.

Front

Near-Front

Central

Near-Back

Back

Close

Occurrences:
27,194
Examples:
* 不稳定性:
[ u˥˩ w ə˨˩˦ n i˥˩ ŋ ɕ i˥˩ ŋ]
* 休息站:
[ɕ ow˥ ɕ ʈʂ a˥˩ n]
* 乌力吉:
[ ʎ i˥˩ i˧˥]
* 可以攻玉:
[ o˨˩˦ i˨˩˦ ŋ y˥˩]
Occurrences:
5,857
Examples:
* 国空军:
[k w o˧˥ ŋ n]
* 教育观:
[ aw˥˩ y˥˩ k w n]
* 密度区:
[ i˥˩ u˥˩ tɕʰ ]
* 朱振宇:
[ʈʂ ʈʂ ə˥˩ ɲ y˨˩˦]
Occurrences:
21,882
Examples:
* 一贫如洗:
[ i˧˥ n ʐ u˧˥ ɕ i˨˩˦]
* 葫芦岛市:
[ u˧˥ l t aw˨˩˦ ʂ ʐ̩˥˩]
* 瞧不起:
[tɕʰ aw˧˥ tɕʰ i˨˩˦]
* 乌兰察布:
[ l a˧˥ n ʈʂʰ a˧˥ u˥˩]

Close-Mid

Occurrences:
16,354
Examples:
* 特许权:
[ o˥˩ ɕ y˨˩˦ tɕʷ e˧˥ n]
* 愈演愈烈:
[y˥˩ j e˨˩˦ ɲ y˥˩ l j e˥˩]
* 心血管:
[ɕ n ɕʷ e˥˩ k w a˨˩˦ n]
* 点点滴滴:
[t j e˨˩˦ n t j e˨˩˦ n ]
Occurrences:
8,828
Examples:
* 物美价廉:
[u˥˩ m ej˨˩˦ a˥˩ l j e˧˥ n]
* 莫洛泽维奇:
[m w o˥˩ l w o˥˩ ts o˧˥ w ej˧˥ tɕʰ i˧˥]
* 林妹妹:
[ʎ i˧˥ n m ej˥˩ m ej˩]
* 指挥部:
[ʈʂ ʐ̩˨˩˦ ej˥ u˥˩]
Occurrences:
22,690
Examples:
* 独裁主义者:
[ u˧˥ tsʰ aj˧˥ ʈʂ u˨˩˦ i˥˩ ʈʂ o˨˩˦]
* too:
[ u˨˩˦ ]
* 结核病:
[ e˧˥ x o˧˥ i˥˩ ŋ]
* soso:
[s s ]
Occurrences:
7,185
Examples:
* 交朋友:
[ aw˥ o˧˥ ŋ j ow˧]
* 沙河口区:
[ʂ x o˧˥ ow˨˩˦ tɕʰ ]
* 游行者:
[j ow˧˥ ɕ i˧˥ ŋ ʈʂ o˨˩˦]
* 利沃夫州:
[ʎ i˥˩ w o˥˩ f ʈʂ ow˥]
Occurrences:
7,647
Examples:
* 金正恩:
[ n ʈʂ o˥˩ ŋ ʔ ə˥ n]
* 干什么:
[k a˥˩ n ʂ ə˧˥ n m ə˧]
* 董存瑞:
[ u˨˩˦ ŋ tsʰ w ə˧˥ n ʐ w ej˥˩]
* 孩子們:
[x aj˧˥ ts z̩˧ m ə˧ n]

Open-Mid

Open

Occurrences:
36,372
Examples:
* 地方官:
[ i˥˩ f ŋ k w n]
* 大彆山:
[t a˥˩ p j e˧˥ ʂ n]
* 石矿场:
[ʂ ʐ̩˧˥ w a˥˩ ŋ ʈʂʰ a˨˩˦ ŋ]
* 亚毛无心:
[j a˨˩˦ m aw˧˥ u˧˥ ɕ n]

Diphthongs#

  • aj

  • aw

  • ej

  • ow

Tones#

  • ˥

  • ˥˩

  • ˧

  • ˧˥

  • ˨˩˦

  • ˩
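
Vowels and syllabic consonants in this phone set carry their tone as a suffix of tone letters (for example `aj˨˩˦`). The sketch below separates a phone into its base and tone. It assumes that tone letters only ever appear at the end of a phone, and `split_tone` is a hypothetical helper name.

```python
TONE_LETTERS = set("˥˦˧˨˩")  # Chao tone letters used in the tone marks above
DICTIONARY_TONES = {"˥", "˥˩", "˧", "˧˥", "˨˩˦", "˩"}  # tones listed above


def split_tone(phone):
    """Split a phone into (base, tone); tone is '' when no tone letters follow."""
    end = len(phone)
    # Walk backwards over trailing tone letters.
    while end > 0 and phone[end - 1] in TONE_LETTERS:
        end -= 1
    return phone[:end], phone[end:]


for phone in ["aj˨˩˦", "ʐ̩˧˥", "k"]:
    base, tone = split_tone(phone)
    print(repr(base), repr(tone), tone in DICTIONARY_TONES)
```

Toneless phones such as `k` come back with an empty tone, so the membership check against `DICTIONARY_TONES` is only meaningful for phones that carry a tone.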