Mandarin MFA dictionary v2.0.0#

  • Maintainer: Montreal Forced Aligner

  • Language: Mandarin Chinese

  • Dialect: N/A

  • Phone set: MFA

  • Number of words: 119,703

  • Phones: ai˥˥ ai˥˩ ai˦ ai˧˥ ai˨ ai˨˩˦ ai˩ au˥˥ au˥˩ au˦ au˧˥ au˨ au˨˩˦ au˩ a˥˥ a˥˩ a˧˥ a˨˩˦ ei˥˥ ei˥˩ ei˦ ei˧˥ ei˨ ei˨˩˦ ei˩ e˥˥ e˥˩ e˧˥ e˨˩˦ f i˥˥ i˥˩ i˧˥ i˨˩˦ j k l m n ou˥˥ ou˥˩ ou˦ ou˧˥ ou˨ ou˨˩˦ ou˩ o˥˥ o˥˩ o˧˥ o˨˩˦ p s t ts tsʰ tɕʰ u˥˥ u˥˩ u˧˥ u˨˩˦ w x y˥˥ y˥˩ y˧˥ y˨˩˦ z̩˥˥ z̩˥˩ z̩˦ z̩˧˥ z̩˨ z̩˨˩˦ z̩˩ ŋ ŋ̍˧˥ ɕ ə˥˥ ə˥˩ ə˦ ə˧˥ ə˨ ə˨˩˦ ə˩ ɥ ɻ ʂ ʈʂ ʈʂʰ ʐ ʐ̩˥˥ ʐ̩˥˩ ʐ̩˦ ʐ̩˧˥ ʐ̩˨ ʐ̩˨˩˦ ʐ̩˩ ʔ

  • License: CC BY 4.0

  • Compatible MFA version: v2.0.0

  • Citation:

@techreport{mfa_mandarin_mfa_dictionary_2022,
	author={McAuliffe, Michael and Sonderegger, Morgan},
	title={Mandarin MFA dictionary v2.0.0},
	address={\url{https://mfa-models.readthedocs.io/pronunciation dictionary/Mandarin/Mandarin MFA dictionary v2_0_0.html}},
	year={2022},
	month={Mar},
}

Installation#

Install from the MFA command line:

mfa model download dictionary mandarin_mfa

Or download from the release page.

The dictionary available from the release page and from command-line installation has pronunciation and silence probabilities estimated as part of acoustic model training (see Silence probability format and training pronunciation probabilities for more information). If you would like to use the version of this dictionary without probabilities, please see the plain dictionary.
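As a rough illustration of how the two versions differ, the sketch below splits one dictionary line into a word, any leading numeric columns, and the phones. It assumes whitespace-separated fields with the probability columns sitting between the word and the phones. The name `parse_dictionary_line` and the probability values in the usage note are hypothetical, and the Silence probability format page remains the authority on the exact column layout.

```python
from typing import List, NamedTuple


class Entry(NamedTuple):
    word: str
    probabilities: List[float]  # empty for the plain dictionary
    phones: List[str]


def parse_dictionary_line(line: str) -> Entry:
    """Split one line into word, leading numeric columns, and phones.

    Assumption: fields are whitespace-separated and any probability
    columns come directly after the word, before the first phone.
    """
    fields = line.split()
    if len(fields) < 2:
        raise ValueError(f"expected a word and at least one phone: {line!r}")
    word, rest = fields[0], fields[1:]
    probabilities: List[float] = []
    # Consume tokens for as long as they parse as numbers.
    while rest:
        try:
            probabilities.append(float(rest[0]))
        except ValueError:
            break
        rest = rest[1:]
    if not rest:
        raise ValueError(f"no phones found in line: {line!r}")
    return Entry(word, probabilities, rest)
```

For example, `parse_dictionary_line("三地门\t0.99\t0.2\t1.0\t1.0\ts a˥˥ n t i˥˩ m ə˧˥ n")` (with made-up numbers) and `parse_dictionary_line("三地门\ts a˥˥ n t i˥˩ m ə˧˥ n")` yield the same phone list. Only the `probabilities` field differs.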

Intended use#

This dictionary is intended for forced alignment of Mandarin Chinese transcripts.

This dictionary uses the MFA phone set for Mandarin, and was used in training the Mandarin MFA acoustic model. Pronunciations can be added on top of the dictionary, as long as no additional phones are introduced.
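One way to check the "no additional phones" constraint before adding entries is sketched below. It assumes pronunciations are space-separated phone strings, as in the examples on this page. The helper name `unknown_phones` is hypothetical, and `PHONE_SUBSET` is only a small excerpt of the phone list above, used for illustration. In practice the allowed set would be built from the full list.

```python
import unicodedata
from typing import Iterable, List


def _nfc(text: str) -> str:
    # Normalize so that combining marks (e.g. the syllabic mark in z̩)
    # compare equal regardless of how they were typed.
    return unicodedata.normalize("NFC", text)


def unknown_phones(pronunciation: str, phone_set: Iterable[str]) -> List[str]:
    """Return the phones in `pronunciation` that are not in `phone_set`."""
    allowed = {_nfc(p) for p in phone_set}
    return [p for p in pronunciation.split() if _nfc(p) not in allowed]


# Illustrative excerpt only; not the full phone set of the dictionary.
PHONE_SUBSET = "s a˥˥ n t i˥˩ m ə˧˥ l j a˥˩ ŋ ʂ ə˥˥ ts w o˥˩".split()
```

An empty result means the pronunciation stays within the allowed set. Any phone that is returned would need to be replaced before the entry is added.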

Performance Factors#

When trying to get better alignment accuracy, adding pronunciations is generally helpful, especially for different styles and dialects. The most impactful improvements will generally come from adding reduced variants that delete segments or syllables, as is common in spontaneous speech. Alignment must include all phones specified in the pronunciation of a word, and each phone has a minimum duration (10 ms by default). If a speaker pronounces a multisyllabic word as just a single syllable, it can be hard for MFA to fit all the segments in, which leads to alignment errors on adjacent words as well.
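The arithmetic behind this constraint can be sketched as follows: the shortest interval a pronunciation can occupy is its phone count times the minimum phone duration. The function names are hypothetical, and the 10 ms default is taken from the paragraph above.

```python
def min_word_duration_ms(pronunciation: str, min_phone_ms: float = 10.0) -> float:
    """Shortest duration the aligner could assign to this pronunciation."""
    return len(pronunciation.split()) * min_phone_ms


def can_fit(pronunciation: str, spoken_ms: float, min_phone_ms: float = 10.0) -> bool:
    """True if every phone can receive at least the minimum duration."""
    return min_word_duration_ms(pronunciation, min_phone_ms) <= spoken_ms
```

For instance, the 13-phone pronunciation `l j a˥˩ ŋ ʂ ə˥˥ n t i˥˩ ŋ ts w o˥˩` needs at least 130 ms. If the speaker produced the word in 100 ms, the extra 30 ms has to be taken from neighboring words, which is why a shorter reduced variant can help.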

Ethical considerations#

Deploying any Speech-to-Text model into any production setting has ethical implications. You should consider these implications before use.

Demographic Bias#

You should assume every machine learning model has demographic bias unless proven otherwise. For pronunciation dictionaries, it is often the case that transcription accuracy and lexicon coverage are better for the prestige variety modeled in this dictionary than for other variants. If you are using this dictionary in production, you should acknowledge this as a potential issue.

IPA Charts#

Consonants#

Within each cell, obstruent symbols on the left are unvoiced and those on the right are voiced.

Rows give the manner of articulation; columns give the place: Labial, Labiodental, Alveolar, Retroflex, Palatal, Velar, Glottal.

Nasal

Phone: m
Occurrences: 10,901
Examples:
* 蒙彼利埃:
[m o˧˥ ŋ p i˨˩˦ l i˥˩ ʔ ai˥˥]
* 启蒙运动:
[tɕʰ i˨˩˦ m o˧˥ ŋ y˥˩ n t u˥˩ ŋ]
* 三地门:
[s a˥˥ n t i˥˩ m ə˧˥ n]
* 大五码:
[t a˥˩ u˨˩˦ m a˨˩˦]
Phone: n
Occurrences: 63,083
Examples:
* 卡方检验:
[ a˨˩˦ f a˥˥ ŋ j e˨˩˦ n j e˥˩ n]
* 阳泉市:
[j a˧˥ ŋ tɕʰ ɥ e˧˥ n ʂ ʐ̩˥˩]
* 奥萨苏纳:
[ʔ au˥˩ s a˥˩ s u˥˥ n a˥˩]
* 汉时关:
[x a˥˩ n ʂ ʐ̩˧˥ k w a˥˥ n]
Phone: ŋ
Occurrences: 50,129
Examples:
* 开裆裤:
[ ai˥˥ t a˥˥ ŋ u˥˩]
* 五二零:
[u˨˩˦ ʔ o˥˩ ɻ l i˧˥ ŋ]
* 卡方检验:
[ a˨˩˦ f a˥˥ ŋ j e˨˩˦ n j e˥˩ n]
* 阳泉市:
[j a˧˥ ŋ tɕʰ ɥ e˧˥ n ʂ ʐ̩˥˩]
Phone: ŋ̍
Occurrences: 2
Examples:

Stop

Phone: p
Occurrences: 15,389
Examples:
* 大爆炸:
[t a˥˩ p au˥˩ ʈʂ a˥˩]
* 蒙彼利埃:
[m o˧˥ ŋ p i˨˩˦ l i˥˩ ʔ ai˥˥]
* 预备役:
[y˥˩ p ei˥˩ i˥˩]
* 心照不宣:
[ɕ i˥˥ n ʈʂ au˥˩ p u˥˩ ɕ ɥ e˥˥ n]
Phone: t
Occurrences: 17,629
Examples:
* 大爆炸:
[t a˥˩ p au˥˩ ʈʂ a˥˩]
* 开裆裤:
[ ai˥˥ t a˥˥ ŋ u˥˩]
* 量身定做:
[l j a˥˩ ŋ ʂ ə˥˥ n t i˥˩ ŋ ts w o˥˩]
* 交叉点:
[ j au˥˥ ʈʂʰ a˥˥ t j e˨˩˦ n]
Phone: k
Occurrences: 12,785
Examples:
* 汉时关:
[x a˥˩ n ʂ ʐ̩˧˥ k w a˥˥ n]
* 爱国歌:
[ʔ ai˥˩ k w o˧˥ k o˥˥]
* 预告片:
[y˥˩ k au˥˩ j a˥˩ ɻ]
* 故事梗概:
[k u˥˩ ʂ ʐ̩˥˩ k o˨˩˦ ŋ k ai˥˩]
Phone: ʔ
Occurrences: 5,901
Examples:
* 五二零:
[u˨˩˦ ʔ o˥˩ ɻ l i˧˥ ŋ]
* 奥萨苏纳:
[ʔ au˥˩ s a˥˩ s u˥˥ n a˥˩]
* 蒙彼利埃:
[m o˧˥ ŋ p i˨˩˦ l i˥˩ ʔ ai˥˥]
* 爱国歌:
[ʔ ai˥˩ k w o˧˥ k o˥˥]

Affricate

Phone: ts
Occurrences: 7,308
Examples:
* 新娘子:
[ɕ i˥˥ n n j a˧˥ ŋ ts z̩˨]
* 量身定做:
[l j a˥˩ ŋ ʂ ə˥˥ n t i˥˩ ŋ ts w o˥˩]
* 资产阶级:
[ts z̩˥˥ ʈʂʰ a˨˩˦ n j e˥˥ i˧˥]
* 人字拖:
[ʐ ə˧˥ n ts z̩˥˩ w o˥˥]
Phone: ʈʂ
Occurrences: 15,212
Examples:
* 大爆炸:
[t a˥˩ p au˥˩ ʈʂ a˥˩]
* 心照不宣:
[ɕ i˥˥ n ʈʂ au˥˩ p u˥˩ ɕ ɥ e˥˥ n]
* 牌洲湾:
[ ai˧˥ ʈʂ ou˥˥ w a˥˥ n]
* 赵屯儿:
[ʈʂ au˥˩ w ə˧˥ n ʔ o˧˥ ɻ]
Phone: tɕ
Occurrences: 20,864
Examples:
* 卡方检验:
[ a˨˩˦ f a˥˥ ŋ j e˨˩˦ n j e˥˩ n]
* 医学界:
[i˥˥ ɕ ɥ e˧˥ j e˥˩]
* 交叉点:
[ j au˥˥ ʈʂʰ a˥˥ t j e˨˩˦ n]
* 资产阶级:
[ts z̩˥˥ ʈʂʰ a˨˩˦ n j e˥˥ i˧˥]

Sibilant

Phone: s
Occurrences: 7,916
Examples:
* 奥萨苏纳:
[ʔ au˥˩ s a˥˩ s u˥˥ n a˥˩]
* 帕斯卡:
[ a˥˩ s z̩˥˥ a˨˩˦]
* 三二六:
[s a˥˥ n ʔ o˥˩ ɻ l j ou˥˩]
* 三地门:
[s a˥˥ n t i˥˩ m ə˧˥ n]
Phone: z̩
Occurrences: 6,140
Examples:
* 汉字文化圈:
[x a˥˩ n ts z̩˥˩ w ə˧˥ n x w a˥˩ tɕʰ ɥ e˥˥ n]
* 词汇范畴:
[tsʰ z̩˧˥ x w ei˥˩ f a˥˩ n ʈʂʰ ou˧˥]
* 资源回收筒:
[ts z̩˥˥ ɥ e˧˥ n x w ei˧˥ ʂ ou˥˥ u˨˩˦ ŋ]
* 天主子:
[ j e˥˥ n ʈʂ u˨˩˦ ts z̩˨˩˦]
Phone: ʂ
Occurrences: 19,590
Examples:
* 阳泉市:
[j a˧˥ ŋ tɕʰ ɥ e˧˥ n ʂ ʐ̩˥˩]
* 汉时关:
[x a˥˩ n ʂ ʐ̩˧˥ k w a˥˥ n]
* 完全平方数:
[w a˧˥ n tɕʰ ɥ e˧˥ n i˧˥ ŋ f a˥˥ ŋ ʂ u˥˩]
* 量身定做:
[l j a˥˩ ŋ ʂ ə˥˥ n t i˥˩ ŋ ts w o˥˩]
Phone: ʐ
Occurrences: 4,322
Examples:
* 原来如此:
[ɥ e˧˥ n l ai˧˥ ʐ u˧˥ tsʰ z̩˨˩˦]
* 人字拖:
[ʐ ə˧˥ n ts z̩˥˩ w o˥˥]
* 入海口:
[ʐ u˥˩ x ai˨˩˦ ou˨˩˦]
* 人人乐:
[ʐ ə˧˥ n ʐ ə˧˥ n ɥ e˥˩]
Phone: ʐ̩
Occurrences: 11,374
Examples:
* 汉时关:
[x a˥˩ n ʂ ʐ̩˧˥ k w a˥˥ n]
* 钥匙链:
[j au˥˩ ʂ ʐ̩˩ l j e˥˩ n]
* 干湿计:
[k a˥˥ n ʂ ʐ̩˥˥ i˥˩]
* 故事梗概:
[k u˥˩ ʂ ʐ̩˥˩ k o˨˩˦ ŋ k ai˥˩]
Phone: ɕ
Occurrences: 17,847
Examples:
* 新娘子:
[ɕ i˥˥ n n j a˧˥ ŋ ts z̩˨]
* 山阳县:
[ʂ a˥˥ n j a˧˥ ŋ ɕ j e˥˩ n]
* 医学界:
[i˥˥ ɕ ɥ e˧˥ j e˥˩]
* 心照不宣:
[ɕ i˥˥ n ʈʂ au˥˩ p u˥˩ ɕ ɥ e˥˥ n]

Fricative

Phone: f
Occurrences: 9,086
Examples:
* 立法会:
[l i˥˩ f a˨˩˦ x w ei˥˩]
* 卡方检验:
[ a˨˩˦ f a˥˥ ŋ j e˨˩˦ n j e˥˩ n]
* 完全平方数:
[w a˧˥ n tɕʰ ɥ e˧˥ n i˧˥ ŋ f a˥˥ ŋ ʂ u˥˩]
* 梵阿玲:
[f a˥˩ n ʔ a˥˥ l i˧˥ ŋ]

Approximant

Phone: w
Occurrences: 36,328
Examples:
* 立法会:
[l i˥˩ f a˨˩˦ x w ei˥˩]
* 汉时关:
[x a˥˩ n ʂ ʐ̩˧˥ k w a˥˥ n]
* 完全平方数:
[w a˧˥ n tɕʰ ɥ e˧˥ n i˧˥ ŋ f a˥˥ ŋ ʂ u˥˩]
* 量身定做:
[l j a˥˩ ŋ ʂ ə˥˥ n t i˥˩ ŋ ts w o˥˩]
Phone: ɻ
Occurrences: 5,119
Examples:
* 五二零:
[u˨˩˦ ʔ o˥˩ ɻ l i˧˥ ŋ]
* 爱国歌:
[ʔ ai˥˩ k w o˧˥ k o˥˥ ɻ]
* 三二六:
[s a˥˥ n ʔ o˥˩ ɻ l j ou˥˩]
* 预告片:
[y˥˩ k au˥˩ j a˥˩ ɻ]
Phone: j
Occurrences: 44,754
Examples:
* 卡方检验:
[ a˨˩˦ f a˥˥ ŋ j e˨˩˦ n j e˥˩ n]
* 阳泉市:
[j a˧˥ ŋ tɕʰ ɥ e˧˥ n ʂ ʐ̩˥˩]
* 新娘子:
[ɕ i˥˥ n n j a˧˥ ŋ ts z̩˨]
* 量身定做:
[l j a˥˩ ŋ ʂ ə˥˥ n t i˥˩ ŋ ts w o˥˩]
Phone: ɥ
Occurrences: 6,719
Examples:
* 阳泉市:
[j a˧˥ ŋ tɕʰ ɥ e˧˥ n ʂ ʐ̩˥˩]
* 完全平方数:
[w a˧˥ n tɕʰ ɥ e˧˥ n i˧˥ ŋ f a˥˥ ŋ ʂ u˥˩]
* 医学界:
[i˥˥ ɕ ɥ e˧˥ j e˥˩]
* 心照不宣:
[ɕ i˥˥ n ʈʂ au˥˩ p u˥˩ ɕ ɥ e˥˥ n]

Lateral

Phone: l
Occurrences: 19,041
Examples:
* 立法会:
[l i˥˩ f a˨˩˦ x w ei˥˩]
* 五二零:
[u˨˩˦ ʔ o˥˩ ɻ l i˧˥ ŋ]
* 蒙彼利埃:
[m o˧˥ ŋ p i˨˩˦ l i˥˩ ʔ ai˥˥]
* 量身定做:
[l j a˥˩ ŋ ʂ ə˥˥ n t i˥˩ ŋ ts w o˥˩]

Vowels#

Within each cell, vowel symbols on the left are unrounded and those on the right are rounded.

Rows give vowel height; columns give backness: Front, Near-Front, Central, Near-Back, Back.

Close

Phone: i
Occurrences: 45,407
Examples:
* 五二零:
[u˨˩˦ ʔ o˥˩ ɻ l i˧˥ ŋ]
* 蒙彼利埃:
[m o˧˥ ŋ p i˨˩˦ l i˥˩ ʔ ai˥˥]
* 土地爷:
[ u˨˩˦ t j e˧˥]
* 东西部:
[t u˥˥ ŋ ɕ p u˥˩]
Phone: y
Occurrences: 8,832
Examples:
* 预告片:
[y˥˩ k au˥˩ j a˥˩ ɻ]
* 嫁出去:
[ j a˥˩ ʈʂʰ u˥˥ tɕʰ ]
* 尼科巴群岛:
[n i˧˥ o˥˥ p a˥˥ tɕʰ y˧˥ n t au˨˩˦]
* 渔洋关:
[y˧˥ j a˧˥ ŋ k w a˥˥ n]
Phone: u
Occurrences: 34,483
Examples:
* 备不住:
[p ei˥˩ p ʈʂ u˥˩]
* 加不上:
[ j a˥˥ p ʂ a˥˩ ŋ]
* 开裆裤:
[ ai˥˥ t a˥˥ ŋ u˥˩]
* 撑不住:
[ʈʂʰ o˥˥ ŋ p ʈʂ u˥˩]

Close-Mid

Phone: e
Occurrences: 25,368
Examples:
* 新姑爷:
[ɕ i˥˥ n k u˥˥ j ]
* 物业费:
[u˥˩ j e˥˩ f ei˥˩]
* 太平洋保险:
[ ai˥˩ i˧˥ ŋ j a˧˥ ŋ p au˨˩˦ ɕ j e˨˩˦ n]
* 卡方检验:
[ a˨˩˦ f a˥˥ ŋ j e˨˩˦ n j e˥˩ n]
Phone: o
Occurrences: 32,711
Examples:
* 计算机科学:
[ i˥˩ s w a˥˩ n i˥˥ o˥˥ ɕ ɥ e˧˥]
* 量身定做:
[l j a˥˩ ŋ ʂ ə˥˥ n t i˥˩ ŋ ts w o˥˩]
* 放风筝:
[f a˥˩ ŋ f o˥˥ ŋ ʈʂ ŋ]
* 掏耳朵:
[ au˥˥ ʔ o˨˩˦ ɻ t w ]
Phone: ə
Occurrences: 11,950
Examples:
* 人字拖:
[ʐ ə˧˥ n ts z̩˥˩ w o˥˥]
* 哪有什么:
[n a˨˩˦ j ou˨˩˦ ʂ o˧˥ m ə˨]
* 看个够:
[ a˥˩ n k ə˩ k ou˥˩]
* 河溪镇:
[x o˧˥ ɕ i˥˥ ʈʂ ə˥˩ n]

Open-Mid

Open

Phone: a
Occurrences: 57,645
Examples:
* 昨天早上:
[ts w o˧˥ j e˥˥ n ts au˨˩˦ ʂ ŋ]
* 奥萨苏纳:
[ʔ au˥˩ s a˥˩ s u˥˥ n a˥˩]
* 卡方检验:
[ a˨˩˦ f a˥˥ ŋ j e˨˩˦ n j e˥˩ n]
* 戳嵴樑骨:
[ʈʂʰ w o˥˥ i˨˩˦ l j ŋ k u˨˩˦]

Diphthongs#

  • ai

  • au

  • ei

  • ou

Tones#

  • ˥˥

  • ˥˩

  • ˦

  • ˧˥

  • ˨

  • ˨˩˦

  • ˩
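Because tone is written directly on vowels and syllabic consonants in this phone set, it can be useful to separate the segment from its tone. The sketch below strips trailing tone letters (Unicode U+02E5 to U+02E9) from a phone. The function names are hypothetical. The mapping of the four contour tones to Mandarin tones 1 to 4 follows the usual Chao-letter convention, and treating the single-letter tones as neutral-tone variants is an assumption that is not stated on this page.

```python
from typing import Optional, Tuple

# The five IPA tone letters: ˥ ˦ ˧ ˨ ˩ (U+02E5 .. U+02E9).
TONE_LETTERS = {chr(cp) for cp in range(0x02E5, 0x02EA)}

# Conventional Chao-letter contours for the four lexical tones of Mandarin.
LEXICAL_TONES = {
    "\u02e5\u02e5": 1,        # ˥˥ high level
    "\u02e7\u02e5": 2,        # ˧˥ rising
    "\u02e8\u02e9\u02e6": 3,  # ˨˩˦ dipping
    "\u02e5\u02e9": 4,        # ˥˩ falling
}


def split_tone(phone: str) -> Tuple[str, str]:
    """Split a phone into (segment, tone); tone is '' for toneless phones."""
    cut = len(phone)
    while cut > 0 and phone[cut - 1] in TONE_LETTERS:
        cut -= 1
    return phone[:cut], phone[cut:]


def lexical_tone(phone: str) -> Optional[int]:
    """Return 1-4 for the contour tones, else None.

    Assumption: single-letter tones (˦, ˨, ˩) mark neutral-tone
    syllables, so they get no lexical tone number here.
    """
    return LEXICAL_TONES.get(split_tone(phone)[1])
```

With this, `split_tone("ai˨˩˦")` separates the diphthong `ai` from the contour `˨˩˦`, and consonants such as `ʂ` come back unchanged with an empty tone.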