Mandarin (Erhua) MFA dictionary v2.0.0#

  • Maintainer: Montreal Forced Aligner

  • Language: Mandarin Chinese

  • Dialect: Beijing Mandarin

  • Phone set: MFA

  • Number of words: 75,259

  • Phones: ai˥˥ ai˥˩ ai˦ ai˧˥ ai˨ ai˨˩˦ ai˩ au˥˥ au˥˩ au˦ au˧˥ au˨ au˨˩˦ au˩ a˥˥ a˥˩ a˧˥ a˨˩˦ ei˥˥ ei˥˩ ei˦ ei˧˥ ei˨ ei˨˩˦ ei˩ e˥˥ e˥˩ e˧˥ e˨˩˦ f i˥˥ i˥˩ i˧˥ i˨˩˦ j k l m n ou˥˥ ou˥˩ ou˦ ou˧˥ ou˨ ou˨˩˦ ou˩ o˥˥ o˥˩ o˧˥ o˨˩˦ p s t ts tsʰ tɕʰ u˥˥ u˥˩ u˧˥ u˨˩˦ w x y˥˥ y˥˩ y˧˥ y˨˩˦ z̩˥˥ z̩˥˩ z̩˦ z̩˧˥ z̩˨ z̩˨˩˦ z̩˩ ŋ ŋ̍˧˥ ɕ ə˥˥ ə˥˩ ə˦ ə˧˥ ə˨ ə˨˩˦ ə˩ ɥ ɻ ʂ ʈʂ ʈʂʰ ʐ ʐ̩˥˥ ʐ̩˥˩ ʐ̩˦ ʐ̩˧˥ ʐ̩˨ ʐ̩˨˩˦ ʐ̩˩ ʔ

  • License: CC BY 4.0

  • Compatible MFA version: v2.0.0

  • Citation:

@techreport{mfa_mandarin_erhua_mfa_dictionary_2022,
	author={McAuliffe, Michael and Sonderegger, Morgan},
	title={Mandarin (Erhua) MFA dictionary v2.0.0},
	address={\url{https://mfa-models.readthedocs.io/pronunciation dictionary/Mandarin/Mandarin (Erhua) MFA dictionary v2_0_0.html}},
	year={2022},
	month={Mar},
}
../../_images/full_logo_yellow.svg

Installation#

Install from the MFA command line:

mfa model download dictionary mandarin_erhua_mfa

Or download from the release page.

The dictionary available from the release page and command line installation has pronunciation and silence probabilities estimated as part acoustic model training (see Silence probability format and training pronunciation probabilities for more information. If you would like to use the version of this dictionary without probabilities, please see the plain dictionary.

Intended use#

This dictionary is intended for forced alignment of Mandarin Chinese transcripts.

This dictionary uses the MFA phone set for Mandarin, and was used in training the Mandarin MFA acoustic model. Pronunciations can be added on top of the dictionary, as long as no additional phones are introduced.

Performance Factors#

When trying to get better alignment accuracy, adding pronunciations is generally helpful, especially for different styles and dialects. The most impactful improvements will generally be seen when adding reduced variants that involve deleting segments/syllables common in spontaneous speech. Alignment must include all phones specified in the pronunciation of a word, and each phone has a minimum duration (by default 10ms). If a speaker pronounces a multisyllabic word with just a single syllable, it can be hard for MFA to fit all the segments in, so it will lead to alignment errors on adjacent words as well.

Ethical considerations#

Deploying any Speech-to-Text model into any production setting has ethical implications. You should consider these implications before use.

Demographic Bias#

You should assume every machine learning model has demographic bias unless proven otherwise. For pronunciation dictionaries, it is often the case that transcription accuracy and lexicon coverage for the prestige variety modeled in this dictionary compared to other variants. If you are using this dictionary in production, you should acknowledge this as a potential issue.

IPA Charts#

Consonants#

Obstruent symbols to the left of are unvoiced and those to the right are voiced.

Manner

Labial

Labiodental

Alveolar

Retroflex

Palatal

Velar

Glottal

Nasal

Occurrences:
6,902
Examples:
* 所罗门群岛:
[s w o˨˩˦ l w o˧˥ m ə˧˥ n tɕʰ y˧˥ n t aw˨˩˦]
* 凯门鳄:
[ aj˨˩˦ m ə˧˥ n ʔ o˥˩]
* 莫干山:
[m w o˥˩ k a˥˥ n ʂ a˥˥ n]
* 闺中密友:
[k w ej˥˥ ʈʂ u˥˥ ŋ m i˥˩ j ow˨˩˦]
Occurrences:
37,000
Examples:
* 腮腺炎:
[s aj˥˥ ɕ j e˥˩ n j e˧˥ n]
* 纳斐塔里:
[n a˥˩ f ej˨˩˦ a˨˩˦ l i˨˩˦]
* 手电筒:
[ʂ ow˨˩˦ t j e˥˩ n u˨˩˦ ŋ]
* 外兴安岭:
[w aj˥˩ ɕ i˥˥ ŋ ʔ a˥˥ n l i˨˩˦ ŋ]
Occurrences:
28,701
Examples:
* 子宫颈:
[ts z̩˨˩˦ k u˥˥ ŋ i˨˩˦ ŋ]
* 手电筒:
[ʂ ow˨˩˦ t j e˥˩ n u˨˩˦ ŋ]
* 宫古岛:
[k u˥˥ ŋ k u˨˩˦ t aw˨˩˦]
* 草头黄:
[tsʰ aw˨˩˦ ow˧˥ x w a˧˥ ŋ]
Occurrences:
2
Examples:

Stop

Occurrences:
8,933
Examples:
* 便利贴:
[p j e˥˩ n l i˥˩ j e˥˥]
* 货币主义:
[x w o˥˩ p i˥˩ ʈʂ u˨˩˦ i˥˩]
* 巴拿语:
[p a˥˥ n a˧˥ y˨˩˦]
* 古尔邦节:
[k u˨˩˦ ʔ o˨˩˦ ɻ p a˥˥ ŋ j e˧˥]
Occurrences:
10,558
Examples:
* 手电筒:
[ʂ ow˨˩˦ t j e˥˩ n u˨˩˦ ŋ]
* 宫古岛:
[k u˥˥ ŋ k u˨˩˦ t aw˨˩˦]
* 抗毒素:
[ a˥˩ ŋ t u˧˥ s u˥˩]
* 所罗门群岛:
[s w o˨˩˦ l w o˧˥ m ə˧˥ n tɕʰ y˧˥ n t aw˨˩˦]
Occurrences:
7,874
Examples:
* 子宫颈:
[ts z̩˨˩˦ k u˥˥ ŋ i˨˩˦ ŋ]
* 宫古岛:
[k u˥˥ ŋ k u˨˩˦ t aw˨˩˦]
* 古尔邦节:
[k u˨˩˦ ʔ o˨˩˦ ɻ p a˥˥ ŋ j e˧˥]
* 感染群:
[k a˨˩˦ n ʐ a˨˩˦ n tɕʰ y˧˥ n]
Occurrences:
3,191
Examples:
* 玉尔其:
[y˥˩ ʔ o˨˩˦ ɻ tɕʰ i˧˥]
* 外兴安岭:
[w aj˥˩ ɕ i˥˥ ŋ ʔ a˥˥ n l i˨˩˦ ŋ]
* 叶尔羌河:
[j e˥˩ ʔ o˨˩˦ ɻ tɕʰ j a˥˥ ŋ x o˧˥]
* 古尔邦节:
[k u˨˩˦ ʔ o˨˩˦ ɻ p a˥˥ ŋ j e˧˥]

Affricate

Occurrences:
4,735
Examples:
* 子宫颈:
[ts z̩˨˩˦ k u˥˥ ŋ i˨˩˦ ŋ]
* 董教总:
[t u˨˩˦ ŋ j aw˥˩ ts u˨˩˦ ŋ]
* 森林火灾:
[s ə˥˥ n l i˧˥ n x w o˨˩˦ ts aj˥˥]
* 报贩子:
[p aw˥˩ f a˥˩ n ts z̩˩]
Occurrences:
8,793
Examples:
* 海玉竹:
[x aj˨˩˦ y˥˩ ʈʂ u˧˥]
* 伏龙芝:
[f u˧˥ l u˧˥ ŋ ʈʂ ʐ̩˥˥]
* 李富庄:
[l i˨˩˦ f u˥˩ ʈʂ w a˥˥ ŋ]
* 货币主义:
[x w o˥˩ p i˥˩ ʈʂ u˨˩˦ i˥˩]
Occurrences:
12,555
Examples:
* 子宫颈:
[ts z̩˨˩˦ k u˥˥ ŋ i˨˩˦ ŋ]
* 持械抢劫:
[ʈʂʰ ʐ̩˧˥ ɕ j e˥˩ tɕʰ j a˨˩˦ ŋ j e˧˥]
* 古尔邦节:
[k u˨˩˦ ʔ o˨˩˦ ɻ p a˥˥ ŋ j e˧˥]
* 董教总:
[t u˨˩˦ ŋ j aw˥˩ ts u˨˩˦ ŋ]

Sibilant

Occurrences:
4,501
Examples:
* 腮腺炎:
[s aj˥˥ ɕ j e˥˩ n j e˧˥ n]
* 抗毒素:
[ a˥˩ ŋ t u˧˥ s u˥˩]
* 所罗门群岛:
[s w o˨˩˦ l w o˧˥ m ə˧˥ n tɕʰ y˧˥ n t aw˨˩˦]
* 星期四:
[ɕ i˥˥ ŋ tɕʰ i˥˥ s z̩˥˩]
Occurrences:
3,912
Examples:
* 星期四:
[ɕ i˥˥ ŋ tɕʰ i˥˥ s z̩˥˩]
* 马驹子:
[m a˨˩˦ y˥˥ ts z̩˨]
* 勃拉姆斯:
[p w o˧˥ l a˥˥ m u˨˩˦ s z̩˥˥]
* 电子贝斯:
[t j e˥˩ n ts z̩˨˩˦ p ej˥˩ s z̩˥˥]
Occurrences:
11,448
Examples:
* 手电筒:
[ʂ ow˨˩˦ t j e˥˩ n u˨˩˦ ŋ]
* 传达室:
[ʈʂʰ w a˧˥ n t a˧˥ ʂ ʐ̩˨˩˦]
* 商业区:
[ʂ a˥˥ ŋ j e˥˩ tɕʰ y˥˥]
* 顾俊沙:
[k u˥˩ y˥˩ n ʂ a˥˥]
Occurrences:
2,592
Examples:
* 萨拉热窝:
[s a˥˩ l a˥˥ ʐ o˥˩ w o˥˥]
* 人行道:
[ʐ ə˧˥ n ɕ i˧˥ ŋ t aw˥˩]
* 感染群:
[k a˨˩˦ n ʐ a˨˩˦ n tɕʰ y˧˥ n]
* 碎肉器:
[s w ej˥˩ ʐ ow˥˩ tɕʰ i˥˩]
Occurrences:
6,404
Examples:
* 小市民:
[ɕ j aw˨˩˦ ʂ ʐ̩˥˩ m i˧˥ n]
* 百年国耻:
[p aj˨˩˦ n j e˧˥ n k w o˧˥ ʈʂʰ ʐ̩˨˩˦]
* 伏龙芝:
[f u˧˥ l u˧˥ ŋ ʈʂ ʐ̩˥˥]
* 胆结石:
[t a˨˩˦ n j e˧˥ ʂ ʐ̩˧˥]
Occurrences:
10,545
Examples:
* 腮腺炎:
[s aj˥˥ ɕ j e˥˩ n j e˧˥ n]
* 外兴安岭:
[w aj˥˩ ɕ i˥˥ ŋ ʔ a˥˥ n l i˨˩˦ ŋ]
* 持械抢劫:
[ʈʂʰ ʐ̩˧˥ ɕ j e˥˩ tɕʰ j a˨˩˦ ŋ j e˧˥]
* 星期四:
[ɕ i˥˥ ŋ tɕʰ i˥˥ s z̩˥˩]

Fricative

Occurrences:
5,538
Examples:
* 纳斐塔里:
[n a˥˩ f ej˨˩˦ a˨˩˦ l i˨˩˦]
* 伏龙芝:
[f u˧˥ l u˧˥ ŋ ʈʂ ʐ̩˥˥]
* 李富庄:
[l i˨˩˦ f u˥˩ ʈʂ w a˥˥ ŋ]
* 报贩子:
[p aw˥˩ f a˥˩ n ts z̩˩]

Approximant

Occurrences:
21,299
Examples:
* 草头黄:
[tsʰ aw˨˩˦ ow˧˥ x w a˧˥ ŋ]
* 外兴安岭:
[w aj˥˩ ɕ i˥˥ ŋ ʔ a˥˥ n l i˨˩˦ ŋ]
* 李富庄:
[l i˨˩˦ f u˥˩ ʈʂ w a˥˥ ŋ]
* 冤枉路:
[ɥ e˥˥ n w ŋ l u˥˩]
Occurrences:
3,579
Examples:
* 玉尔其:
[y˥˩ ʔ o˨˩˦ ɻ tɕʰ i˧˥]
* 叶尔羌河:
[j e˥˩ ʔ o˨˩˦ ɻ tɕʰ j a˥˥ ŋ x o˧˥]
* 古尔邦节:
[k u˨˩˦ ʔ o˨˩˦ ɻ p a˥˥ ŋ j e˧˥]
* 小姑儿:
[ɕ j aw˨˩˦ k u˥˥ ɻ]
Occurrences:
25,985
Examples:
* 腮腺炎:
[s aj˥˥ ɕ j e˥˩ n j e˧˥ n]
* 手电筒:
[ʂ ow˨˩˦ t j e˥˩ n u˨˩˦ ŋ]
* 便利贴:
[p j e˥˩ n l i˥˩ j e˥˥]
* 叶尔羌河:
[j e˥˩ ʔ o˨˩˦ ɻ tɕʰ j a˥˥ ŋ x o˧˥]
Occurrences:
4,109
Examples:
* 冤枉路:
[ɥ e˥˥ n w ŋ l u˥˩]
* 古文字学:
[k u˨˩˦ w ə˧˥ n ts z̩˥˩ ɕ ɥ e˧˥]
* 古元古代:
[k u˨˩˦ ɥ e˧˥ n k u˨˩˦ t aj˥˩]
* 语言学家:
[y˨˩˦ j e˧˥ n ɕ ɥ e˧˥ j a˥˥]

Lateral

Occurrences:
11,057
Examples:
* 纳斐塔里:
[n a˥˩ f ej˨˩˦ a˨˩˦ l i˨˩˦]
* 伏龙芝:
[f u˧˥ l u˧˥ ŋ ʈʂ ʐ̩˥˥]
* 外兴安岭:
[w aj˥˩ ɕ i˥˥ ŋ ʔ a˥˥ n l i˨˩˦ ŋ]
* 李富庄:
[l i˨˩˦ f u˥˩ ʈʂ w a˥˥ ŋ]

Vowels#

Vowel symbols to the left of are unrounded and those to the right are rounded.

Front

Near-Front

Central

Near-Back

Back

Close

Occurrences:
26,672
Examples:
* 腓立比书:
[f ej˧˥ l i˥˩ p i˨˩˦ ʂ u˥˥]
* 李富庄:
[l i˨˩˦ f u˥˩ ʈʂ w a˥˥ ŋ]
* 没有关系:
[m ej˧˥ j ow˨˩˦ k w a˥˥ n ɕ ]
* 势利眼:
[ʂ ʐ̩˥˩ l j a˨˩˦ ɻ]
Occurrences:
5,463
Examples:
* 愉景湾:
[y˧˥ i˨˩˦ ŋ w a˥˥ n]
* 古典藏语:
[k u˨˩˦ t j e˨˩˦ n ts a˥˩ ŋ y˨˩˦]
* 文昌帝君:
[w ə˧˥ n ʈʂʰ a˥˥ ŋ t i˥˩ y˥˥ n]
* 姪女婿:
[ʈʂ ʐ̩˧˥ n y˨˩˦ ɕ ]
Occurrences:
21,271
Examples:
* 只不过:
[ʈʂ ʐ̩˨˩˦ p k w o˥˩]
* 胡椒瓶:
[x u˧˥ j aw˥˥ i˧˥ ŋ]
* 恨不能:
[x ə˥˩ n p n o˧˥ ŋ]
* 表姐夫:
[p j aw˨˩˦ j e˨˩˦ f ]

Close-Mid

Occurrences:
15,113
Examples:
* 老爷爷:
[l aw˨˩˦ j e˧˥ j ]
* 新约全书:
[ɕ i˥˥ n ɥ e˥˥ tɕʰ ɥ e˧˥ n ʂ u˥˥]
* 新姑爷:
[ɕ i˥˥ n k u˥˥ j ]
* 野战医院:
[j e˨˩˦ ʈʂ a˥˩ n i˥˥ ɥ e˥˩ n]
Occurrences:
7,254
Examples:
* 回回教:
[x w ej˧˥ x w ej˨ j aw˥˩]
* 纳斐塔里:
[n a˥˩ f ej˨˩˦ a˨˩˦ l i˨˩˦]
* 碎肉器:
[s w ej˥˩ ʐ ow˥˩ tɕʰ i˥˩]
* 未央宫:
[w ej˥˩ j a˥˥ ŋ k u˥˥ ŋ]
Occurrences:
18,898
Examples:
* 没见过世面:
[m ej˧˥ j e˥˩ n k w ʂ ʐ̩˥˩ m j e˥˩ n]
* 救生队:
[ j ow˥˩ ʂ o˥˥ ŋ t w ej˥˩]
* 夫妻老婆店:
[f u˥˥ tɕʰ i˥˥ l aw˨˩˦ w t j e˥˩ n]
* 小老婆:
[ɕ j aw˨˩˦ l aw˨˩˦ w ]
Occurrences:
7,767
Examples:
* 草头黄:
[tsʰ aw˨˩˦ ow˧˥ x w a˧˥ ŋ]
* 西贝柳斯:
[ɕ i˥˥ p ej˥˩ l j ow˨˩˦ s z̩˥˥]
* 奶油包:
[n aj˨˩˦ j ow˧˥ p aw˥˥]
* 手电筒:
[ʂ ow˨˩˦ t j e˥˩ n u˨˩˦ ŋ]
Occurrences:
7,146
Examples:
* 意味着:
[i˥˩ w ej˥˩ ʈʂ ə˩]
* 混混儿:
[x w ə˥˩ n x w ə˩ ɻ]
* 放得开:
[f a˥˩ ŋ t ə˩ aj˥˥]
* 凯门鳄:
[ aj˨˩˦ m ə˧˥ n ʔ o˥˩]

Open-Mid

Open

Occurrences:
34,421
Examples:
* 哈利路亚:
[x a˥˥ l i˥˩ l u˥˩ j a˨˩˦]
* 大姑娘:
[t a˥˩ k u˥˥ n j ŋ]
* 叶尔羌河:
[j e˥˩ ʔ o˨˩˦ ɻ tɕʰ j a˥˥ ŋ x o˧˥]
* 指甲钳:
[ʈʂ ʐ̩˨˩˦ j tɕʰ j e˧˥ n]

Diphthongs#

  • aj

  • aw

  • ej

  • ow

Tones#

  • ˥˥

  • ˥˩

  • ˦

  • ˧˥

  • ˨

  • ˨˩˦

  • ˩