Mandarin (China) MFA dictionary v2.0.0#

  • Maintainer: Montreal Forced Aligner

  • Language: Mandarin Chinese

  • Dialect: Standard Mandarin Chinese

  • Phone set: MFA

  • Number of words: 119,455

  • Phones: ai˥˥ ai˥˩ ai˦ ai˧˥ ai˨ ai˨˩˦ ai˩ au˥˥ au˥˩ au˦ au˧˥ au˨ au˨˩˦ au˩ a˥˥ a˥˩ a˧˥ a˨˩˦ ei˥˥ ei˥˩ ei˦ ei˧˥ ei˨ ei˨˩˦ ei˩ e˥˥ e˥˩ e˧˥ e˨˩˦ f i˥˥ i˥˩ i˧˥ i˨˩˦ j k l m n ou˥˥ ou˥˩ ou˦ ou˧˥ ou˨ ou˨˩˦ ou˩ o˥˥ o˥˩ o˧˥ o˨˩˦ p s t ts tsʰ tɕʰ u˥˥ u˥˩ u˧˥ u˨˩˦ w x y˥˥ y˥˩ y˧˥ y˨˩˦ z̩˥˥ z̩˥˩ z̩˦ z̩˧˥ z̩˨ z̩˨˩˦ z̩˩ ŋ ŋ̍˧˥ ɕ ə˥˥ ə˥˩ ə˦ ə˧˥ ə˨ ə˨˩˦ ə˩ ɥ ɻ ʂ ʈʂ ʈʂʰ ʐ ʐ̩˥˥ ʐ̩˥˩ ʐ̩˦ ʐ̩˧˥ ʐ̩˨ ʐ̩˨˩˦ ʐ̩˩ ʔ

  • License: CC BY 4.0

  • Compatible MFA version: v2.0.0

  • Citation:

@techreport{mfa_mandarin_china_mfa_dictionary_2022,
	author={McAuliffe, Michael and Sonderegger, Morgan},
	title={Mandarin (China) MFA dictionary v2.0.0},
	address={\url{https://mfa-models.readthedocs.io/pronunciation dictionary/Mandarin/Mandarin (China) MFA dictionary v2_0_0.html}},
	year={2022},
	month={Mar},
}

Installation#

Install from the MFA command line:

mfa model download dictionary mandarin_china_mfa

Or download from the release page.

The dictionary available from the release page and from command-line installation includes pronunciation and silence probabilities estimated as part of acoustic model training (see Silence probability format and training pronunciation probabilities for more information). If you would like to use the version of this dictionary without probabilities, please see the plain dictionary.
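
To make the difference between the two versions concrete, here is a minimal Python sketch that reads one dictionary line in either form. It assumes a whitespace-delimited layout in which the word comes first, any probability columns follow as plain numbers, and the phones come last; check the Silence probability format documentation for the authoritative column layout. The helper name `parse_line` and the probability values in the sample are hypothetical; the pronunciation is taken from the examples on this page.

```python
from typing import List, NamedTuple, Optional


class Entry(NamedTuple):
    word: str
    probabilities: List[float]  # empty for a plain (probability-free) line
    phones: List[str]


def _is_number(field: str) -> bool:
    """Return True if the field parses as a float."""
    try:
        float(field)
    except ValueError:
        return False
    return True


def parse_line(line: str) -> Optional[Entry]:
    """Parse one dictionary line (hypothetical helper, assumed layout).

    Leading numeric fields after the word are treated as probability
    columns; everything after them is treated as the phone sequence.
    """
    fields = line.split()
    if len(fields) < 2:
        return None
    word, rest = fields[0], fields[1:]
    probabilities: List[float] = []
    while rest and _is_number(rest[0]):
        probabilities.append(float(rest.pop(0)))
    if not rest:
        return None  # no phones left, so the line is malformed
    return Entry(word, probabilities, rest)


# Sample lines: the numeric values are made up for illustration only.
with_probs = "民主党\t0.99\t0.05\t1.0\t1.0\tm i˧˥ n ʈʂ u˨˩˦ t a˨˩˦ ŋ"
plain = "民主党\tm i˧˥ n ʈʂ u˨˩˦ t a˨˩˦ ŋ"

print(parse_line(with_probs))
print(parse_line(plain))
```

Both calls should yield the same phone sequence; only the `probabilities` field differs.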

Intended use#

This dictionary is intended for forced alignment of Mandarin Chinese transcripts.

This dictionary uses the MFA phone set for Mandarin, and was used in training the Mandarin MFA acoustic model. Pronunciations can be added on top of the dictionary, as long as no additional phones are introduced.
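
Before adding a pronunciation, it can help to check that every phone in it already belongs to the phone set. The following is a minimal sketch of such a check, not part of MFA itself. The function name `unknown_phones` is hypothetical, and `PHONE_SUBSET` is only a small illustrative subset of the phone list at the top of this page; a real check should load the full set.

```python
from typing import List, Set


def unknown_phones(pronunciation: str, phone_set: Set[str]) -> List[str]:
    """Return the phones in a space-separated pronunciation that are not in phone_set."""
    return [phone for phone in pronunciation.split() if phone not in phone_set]


# Illustrative subset only; the full phone list appears at the top of this page.
PHONE_SUBSET = {"m", "n", "ŋ", "t", "ʈʂ", "i˧˥", "u˨˩˦", "a˨˩˦"}

# Pronunciation of 民主党 as listed in the examples on this page.
ok = unknown_phones("m i˧˥ n ʈʂ u˨˩˦ t a˨˩˦ ŋ", PHONE_SUBSET)

# "θ" is not a phone in this dictionary, so it is reported.
bad = unknown_phones("m i˧˥ n θ", PHONE_SUBSET)

print(ok)
print(bad)
```

An empty result means the pronunciation introduces no new phones. Because several phones combine a base symbol with diacritics and tone letters, comparing strings that share the same Unicode normalization form is a sensible precaution.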

Performance Factors#

Adding pronunciations generally improves alignment accuracy, especially for different styles and dialects. The largest gains usually come from adding reduced variants that delete segments or syllables, which are common in spontaneous speech. Alignment must include every phone specified in a word's pronunciation, and each phone has a minimum duration (10 ms by default). If a speaker pronounces a multisyllabic word as a single syllable, MFA may not be able to fit all the segments into the available time, which leads to alignment errors on adjacent words as well.
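
The minimum-duration constraint can be illustrated with simple arithmetic: a pronunciation with N phones needs at least N times the per-phone minimum. The sketch below assumes the 10 ms default stated above and uses hypothetical helper names (`min_word_duration_ms`, `fits`); it only models this lower bound and is not how MFA itself performs alignment.

```python
MIN_PHONE_DURATION_MS = 10  # default per-phone minimum stated above


def min_word_duration_ms(pronunciation: str,
                         min_phone_ms: int = MIN_PHONE_DURATION_MS) -> int:
    """Shortest duration (ms) that can hold every phone of the pronunciation."""
    return len(pronunciation.split()) * min_phone_ms


def fits(pronunciation: str, available_ms: int) -> bool:
    """True if the available time can hold all phones at their minimum duration."""
    return min_word_duration_ms(pronunciation) <= available_ms


# Pronunciation of 亚历山大 from the examples on this page: 9 phones.
full = "j a˨˩˦ l i˥˩ ʂ a˥˥ n t a˥˩"

print(min_word_duration_ms(full))  # 9 phones * 10 ms = 90
print(fits(full, 60))              # False: 60 ms cannot hold 9 phones
```

When a heavily reduced token is shorter than this bound, the aligner has to borrow time from neighboring words, which is why adding a shorter variant to the dictionary tends to help.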

Ethical considerations#

Deploying any Speech-to-Text model into any production setting has ethical implications. You should consider these implications before use.

Demographic Bias#

You should assume every machine learning model has demographic bias unless proven otherwise. For pronunciation dictionaries, it is often the case that transcription accuracy and lexicon coverage are better for the prestige variety modeled in this dictionary than for other variants. If you are using this dictionary in production, you should acknowledge this as a potential issue.

IPA Charts#

Consonants#

Within each cell, obstruent symbols on the left are unvoiced and those on the right are voiced.

Columns: Manner, Labial, Labiodental, Alveolar, Retroflex, Palatal, Velar, Glottal. Each row below begins with its manner label.

Nasal

Occurrences:
10,590
Examples:
* 费尔曼:
[f ej˥˩ ʔ o˨˩˦ ɻ m a˥˩ n]
* 小毛驴:
[ɕ j aw˨˩˦ m aw˧˥ l y˧˥]
* 民主党:
[m i˧˥ n ʈʂ u˨˩˦ t a˨˩˦ ŋ]
* 玛窦福音:
[m a˨˩˦ t ow˥˩ f u˧˥ i˥˥ n]
Occurrences:
62,422
Examples:
* 早第三纪:
[ts aw˨˩˦ t i˥˩ s a˥˥ n i˥˩]
* 音乐茶座:
[i˥˥ n ɥ e˥˩ ʈʂʰ a˧˥ ts w o˥˩]
* 新篠津:
[ɕ i˥˥ n ɕ j aw˨˩˦ i˥˥ n]
* 亚历山大:
[j a˨˩˦ l i˥˩ ʂ a˥˥ n t a˥˩]
Occurrences:
49,692
Examples:
* 装饰城:
[ʈʂ w a˥˥ ŋ ʂ ʐ̩˥˩ ʈʂʰ o˧˥ ŋ]
* 孔乙己:
[ u˨˩˦ ŋ i˨˩˦ i˨˩˦]
* 王家桥:
[w a˧˥ ŋ j a˥˥ tɕʰ j aw˧˥]
* 二重唱:
[ʔ o˥˩ ɻ ʈʂʰ u˧˥ ŋ ʈʂʰ a˥˩ ŋ]
Occurrences:
2
Examples:

Stop

Occurrences:
15,018
Examples:
* 比凤姐:
[p i˨˩˦ f o˥˩ ŋ j e˨˩˦]
* 一百一十七:
[i˥˥ p aj˨˩˦ i˥˥ ʂ ʐ̩˧˥ tɕʰ i˥˥]
* 苏必略湖:
[s u˥˥ p i˥˩ l ɥ e˥˩ x u˧˥]
* 两百块:
[l j a˨˩˦ ŋ p aj˨˩˦ w aj˥˩]
Occurrences:
17,251
Examples:
* 早第三纪:
[ts aw˨˩˦ t i˥˩ s a˥˥ n i˥˩]
* 亚历山大:
[j a˨˩˦ l i˥˩ ʂ a˥˥ n t a˥˩]
* 民主党:
[m i˧˥ n ʈʂ u˨˩˦ t a˨˩˦ ŋ]
* 玛窦福音:
[m a˨˩˦ t ow˥˩ f u˧˥ i˥˥ n]
Occurrences:
12,473
Examples:
* 老公公:
[l aw˨˩˦ k u˥˥ ŋ k ŋ]
* 光福镇:
[k w a˥˥ ŋ f u˧˥ ʈʂ ə˥˩ n]
* 科技股份:
[ o˥˥ i˥˩ k u˨˩˦ f ə˥˩ n]
* 国籍法:
[k w o˧˥ i˧˥ f a˨˩˦]
Occurrences:
5,838
Examples:
* 二重唱:
[ʔ o˥˩ ɻ ʈʂʰ u˧˥ ŋ ʈʂʰ a˥˩ ŋ]
* 费尔曼:
[f ej˥˩ ʔ o˨˩˦ ɻ m a˥˩ n]
* 岸和田:
[ʔ a˥˩ n x o˧˥ j e˧˥ n]
* 澳门人:
[ʔ aw˥˩ m ə˧˥ n ʐ ə˧˥ n]

Affricate

Occurrences:
7,195
Examples:
* 早第三纪:
[ts aw˨˩˦ t i˥˩ s a˥˥ n i˥˩]
* 音乐茶座:
[i˥˥ n ɥ e˥˩ ʈʂʰ a˧˥ ts w o˥˩]
* 川楝子:
[ʈʂʰ w a˥˥ n l j e˥˩ n ts z̩˨˩˦]
* 代罪羔羊:
[t aj˥˩ ts w ej˥˩ k aw˥˥ j a˧˥ ŋ]
Occurrences:
15,021
Examples:
* 装饰城:
[ʈʂ w a˥˥ ŋ ʂ ʐ̩˥˩ ʈʂʰ o˧˥ ŋ]
* 民主党:
[m i˧˥ n ʈʂ u˨˩˦ t a˨˩˦ ŋ]
* 光福镇:
[k w a˥˥ ŋ f u˧˥ ʈʂ ə˥˩ n]
* 周华雄:
[ʈʂ ow˥˥ x w a˧˥ ɕ j u˧˥ ŋ]
Occurrences:
20,526
Examples:
* 早第三纪:
[ts aw˨˩˦ t i˥˩ s a˥˥ n i˥˩]
* 孔乙己:
[ u˨˩˦ ŋ i˨˩˦ i˨˩˦]
* 王家桥:
[w a˧˥ ŋ j a˥˥ tɕʰ j aw˧˥]
* 新篠津:
[ɕ i˥˥ n ɕ j aw˨˩˦ i˥˥ n]

Sibilant

Occurrences:
7,824
Examples:
* 早第三纪:
[ts aw˨˩˦ t i˥˩ s a˥˥ n i˥˩]
* 十四日:
[ʂ ʐ̩˧˥ s z̩˥˩ ʐ̩˥˩]
* 苏必略湖:
[s u˥˥ p i˥˩ l ɥ e˥˩ x u˧˥]
* 索菲亚:
[s w o˨˩˦ f ej˥˥ j a˨˩˦]
Occurrences:
6,100
Examples:
* 柯尔克孜人:
[ o˥˥ ʔ o˨˩˦ ɻ o˥˩ ts z̩˥˥ ʐ ə˧˥ n]
* 过日子:
[k w o˥˩ ʐ̩˥˩ ts z̩˩]
* 思想观念:
[s z̩˥˥ ɕ j a˨˩˦ ŋ k w a˥˥ n n j e˥˩ n]
* 形容词:
[ɕ i˧˥ ŋ ʐ u˧˥ ŋ tsʰ z̩˧˥]
Occurrences:
19,266
Examples:
* 装饰城:
[ʈʂ w a˥˥ ŋ ʂ ʐ̩˥˩ ʈʂʰ o˧˥ ŋ]
* 十四日:
[ʂ ʐ̩˧˥ s z̩˥˩ ʐ̩˥˩]
* 亚历山大:
[j a˨˩˦ l i˥˩ ʂ a˥˥ n t a˥˩]
* 文科生:
[w ə˧˥ n o˥˥ ʂ o˥˥ ŋ]
Occurrences:
4,256
Examples:
* 双人间:
[ʂ w a˥˥ ŋ ʐ ə˧˥ n j e˥˥ n]
* 瑞士麦片:
[ʐ w ej˥˩ ʂ ʐ̩˥˩ m aj˥˩ j e˥˩ n]
* 开车人:
[ aj˥˥ ʈʂʰ o˥˥ ʐ ə˧˥ n]
* 九畹遗容:
[ j ow˨˩˦ w a˨˩˦ n i˧˥ ʐ u˧˥ ŋ]
Occurrences:
11,304
Examples:
* 装饰城:
[ʈʂ w a˥˥ ŋ ʂ ʐ̩˥˩ ʈʂʰ o˧˥ ŋ]
* 破魔师:
[ w o˥˩ m w o˧˥ ʂ ʐ̩˥˥]
* 六百九十九:
[l j ow˥˩ p aj˨˩˦ j ow˨˩˦ ʂ ʐ̩˧˥ j ow˨˩˦]
* 脚趾头:
[ j aw˨˩˦ ʈʂ ʐ̩˨˩˦ ow˧˥]
Occurrences:
17,554
Examples:
* 新篠津:
[ɕ i˥˥ n ɕ j aw˨˩˦ i˥˥ n]
* 小熊维尼:
[ɕ j aw˨˩˦ ɕ j u˧˥ ŋ w ej˧˥ n i˧˥]
* 小毛驴:
[ɕ j aw˨˩˦ m aw˧˥ l y˧˥]
* 周华雄:
[ʈʂ ow˥˥ x w a˧˥ ɕ j u˧˥ ŋ]

Fricative

Occurrences:
8,927
Examples:
* 比凤姐:
[p i˨˩˦ f o˥˩ ŋ j e˨˩˦]
* 费尔曼:
[f ej˥˩ ʔ o˨˩˦ ɻ m a˥˩ n]
* 光福镇:
[k w a˥˥ ŋ f u˧˥ ʈʂ ə˥˩ n]
* 科技股份:
[ o˥˥ i˥˩ k u˨˩˦ f ə˥˩ n]

Approximant

Occurrences:
35,507
Examples:
* 装饰城:
[ʈʂ w a˥˥ ŋ ʂ ʐ̩˥˩ ʈʂʰ o˧˥ ŋ]
* 音乐茶座:
[i˥˥ n ɥ e˥˩ ʈʂʰ a˧˥ ts w o˥˩]
* 王家桥:
[w a˧˥ ŋ j a˥˥ tɕʰ j aw˧˥]
* 罗汉橙:
[l w o˧˥ x a˥˩ n ʈʂʰ o˧˥ ŋ]
Occurrences:
2,562
Examples:
* 二重唱:
[ʔ o˥˩ ɻ ʈʂʰ u˧˥ ŋ ʈʂʰ a˥˩ ŋ]
* 费尔曼:
[f ej˥˩ ʔ o˨˩˦ ɻ m a˥˩ n]
* 柯尔克孜人:
[ o˥˥ ʔ o˨˩˦ ɻ o˥˩ ts z̩˥˥ ʐ ə˧˥ n]
* 耳机线:
[ʔ o˨˩˦ ɻ i˥˥ ɕ j e˥˩ n]
Occurrences:
43,610
Examples:
* 王家桥:
[w a˧˥ ŋ j a˥˥ tɕʰ j aw˧˥]
* 新篠津:
[ɕ i˥˥ n ɕ j aw˨˩˦ i˥˥ n]
* 比凤姐:
[p i˨˩˦ f o˥˩ ŋ j e˨˩˦]
* 亚历山大:
[j a˨˩˦ l i˥˩ ʂ a˥˥ n t a˥˩]
Occurrences:
6,615
Examples:
* 音乐茶座:
[i˥˥ n ɥ e˥˩ ʈʂʰ a˧˥ ts w o˥˩]
* 苏必略湖:
[s u˥˥ p i˥˩ l ɥ e˥˩ x u˧˥]
* 原生代:
[ɥ e˧˥ n ʂ o˥˥ ŋ t aj˥˩]
* 神经学:
[ʂ ə˧˥ n i˥˥ ŋ ɕ ɥ e˧˥]

Lateral

Occurrences:
18,761
Examples:
* 老公公:
[l aw˨˩˦ k u˥˥ ŋ k ŋ]
* 亚历山大:
[j a˨˩˦ l i˥˩ ʂ a˥˥ n t a˥˩]
* 罗汉橙:
[l w o˧˥ x a˥˩ n ʈʂʰ o˧˥ ŋ]
* 小毛驴:
[ɕ j aw˨˩˦ m aw˧˥ l y˧˥]

Vowels#

Within each cell, vowel symbols on the left are unrounded and those on the right are rounded.

Columns: Front, Near-Front, Central, Near-Back, Back. Each row below begins with its vowel height label.

Close

Occurrences:
44,773
Examples:
* 配起来:
[ ej˥˩ tɕʰ l aj˩]
* 闹脾气:
[n aw˥˩ i˧˥ tɕʰ ]
* 有色玻璃:
[j ow˨˩˦ s o˥˩ p w o˥˥ l ]
* 孔乙己:
[ u˨˩˦ ŋ i˨˩˦ i˨˩˦]
Occurrences:
8,736
Examples:
* 情趣内衣:
[tɕʰ i˧˥ ŋ tɕʰ y˥˩ n ej˥˩ i˥˥]
* 长宁区:
[ʈʂʰ a˧˥ ŋ n i˧˥ ŋ tɕʰ y˥˥]
* 绿豆汤:
[l y˥˩ t ow˥˩ a˥˥ ŋ]
* 租出去:
[ts u˥˥ ʈʂʰ u˥˥ tɕʰ ]
Occurrences:
34,067
Examples:
* 不得好死:
[p t o˧˥ x aw˨˩˦ s z̩˨˩˦]
* 差不差:
[ʈʂʰ a˥˩ p ʈʂʰ a˥˥]
* 别有用意:
[p j e˧˥ j ow˨˩˦ j u˥˩ ŋ i˥˩]
* 玛窦福音:
[m a˨˩˦ t ow˥˩ f u˧˥ i˥˥ n]

Close-Mid

Occurrences:
25,091
Examples:
* 里边儿:
[l i˨˩˦ p j n ʔ o˧˥ ɻ]
* 今天下午:
[ i˥˥ n j e˥˥ n ɕ j a˥˩ u˨˩˦]
* 王爷府:
[w a˧˥ ŋ j f u˨˩˦]
* 打哈欠:
[t a˨˩˦ x a˥˥ tɕʰ j n]
Occurrences:
12,664
Examples:
* 张志辉:
[ʈʂ a˥˥ ŋ ʈʂ ʐ̩˥˩ x w ej˥˥]
* 魁北克市:
[ w ej˧˥ p ej˨˩˦ o˥˩ ʂ ʐ̩˥˩]
* 画眉鸟:
[x w a˥˩ m ej˧˥ n j aw˨˩˦]
* 北沙参:
[p ej˨˩˦ ʂ a˥˥ ʂ ə˥˥ n]
Occurrences:
32,056
Examples:
* 柯尔克孜人:
[ o˥˥ ʔ o˨˩˦ ɻ o˥˩ ts z̩˥˥ ʐ ə˧˥ n]
* 国籍法:
[k w o˧˥ i˧˥ f a˨˩˦]
* 外甥女婿:
[w aj˥˩ ʂ ŋ n y˨˩˦ ɕ ]
* 老婆子:
[l aw˨˩˦ w ts z̩˨˩˦]
Occurrences:
12,600
Examples:
* 落马洲:
[l w o˥˩ m a˨˩˦ ʈʂ ow˥˥]
* 九畹遗容:
[ j ow˨˩˦ w a˨˩˦ n i˧˥ ʐ u˧˥ ŋ]
* 宇宙论:
[y˨˩˦ ʈʂ ow˥˩ l w ə˥˩ n]
* 邮箱地址:
[j ow˧˥ ɕ j a˥˥ ŋ t i˥˩ ʈʂ ʐ̩˨˩˦]
Occurrences:
11,559
Examples:
* 哈德森湾:
[x a˥˥ t o˧˥ s ə˥˥ n w a˥˥ n]
* 鲍德温:
[p aw˥˩ t o˧˥ w ə˥˥ n]
* 北沙参:
[p ej˨˩˦ ʂ a˥˥ ʂ ə˥˥ n]
* 赶得上:
[k a˨˩˦ n t ə˩ ʂ a˥˩ ŋ]

Open-Mid

Open

Occurrences:
56,196
Examples:
* 玛窦福音:
[m a˨˩˦ t ow˥˩ f u˧˥ i˥˥ n]
* 国籍法:
[k w o˧˥ i˧˥ f a˨˩˦]
* 比方说:
[p i˨˩˦ f ŋ ʂ w o˥˥]
* 喇嘛教:
[l a˨˩˦ m j aw˥˩]

Diphthongs#

  • aj

  • aw

  • ej

  • ow

Tones#

  • ˥˥

  • ˥˩

  • ˦

  • ˧˥

  • ˨

  • ˨˩˦

  • ˩
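
Because tones are written as Chao tone letters attached to the end of a vowel or syllabic consonant phone, a phone label can be split into its base symbol and its tone. The sketch below shows one way to do that; the names `split_tone` and `KNOWN_TONES` are hypothetical, and the tone inventory is copied from the list above.

```python
from typing import Tuple

# The five Chao tone letters (U+02E5 to U+02E9).
TONE_LETTERS = "˥˦˧˨˩"

# Tone inventory as listed in the Tones section above.
KNOWN_TONES = {"˥˥", "˥˩", "˦", "˧˥", "˨", "˨˩˦", "˩"}


def split_tone(phone: str) -> Tuple[str, str]:
    """Split a phone label into (base, tone); tone is '' for toneless phones."""
    base = phone.rstrip(TONE_LETTERS)
    return base, phone[len(base):]


for label in ["aw˨˩˦", "z̩˥˩", "ə˩", "ʈʂ"]:
    base, tone = split_tone(label)
    # Consonants carry no tone; otherwise the tone should be in the inventory.
    print(label, repr(base), repr(tone), tone == "" or tone in KNOWN_TONES)
```

Splitting phones this way is useful, for example, when grouping all tonal variants of one vowel or when checking that a newly added pronunciation uses only the tones listed here.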