Mandarin MFA dictionary v2.0.0#
@techreport{mfa_mandarin_mfa_dictionary_2022,
author={McAuliffe, Michael and Sonderegger, Morgan},
title={Mandarin MFA dictionary v2.0.0},
address={\url{https://mfa-models.readthedocs.io/pronunciation dictionary/Mandarin/Mandarin MFA dictionary v2_0_0.html}},
year={2022},
month={Mar},
}
Installation#
Install from the MFA command line:
mfa model download dictionary mandarin_mfa
Or download from the release page.
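For scripted setups, the same download can be triggered from Python. The sketch below only wraps the command shown above using the standard library. The helper names `build_download_command` and `download_model` are hypothetical, and the sketch assumes the `mfa` executable is available on `PATH`.

```python
import shutil
import subprocess


def build_download_command(model_type: str, name: str) -> list[str]:
    """Return the argument list for `mfa model download <model_type> <name>`."""
    return ["mfa", "model", "download", model_type, name]


def download_model(model_type: str, name: str) -> None:
    """Run the download, failing early if the MFA command line is missing."""
    if shutil.which("mfa") is None:
        raise RuntimeError("the `mfa` command was not found on PATH")
    # check=True raises CalledProcessError if the download command fails.
    subprocess.run(build_download_command(model_type, name), check=True)


# Example call (requires a working MFA installation):
# download_model("dictionary", "mandarin_mfa")
```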
The dictionary available from the release page and the command line installation includes pronunciation and silence probabilities estimated as part of acoustic model training (see Silence probability format and training pronunciation probabilities for more information). If you would like to use a version of this dictionary without probabilities, please see the plain dictionary.
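As a rough illustration of how the two versions differ, the sketch below parses a single dictionary line. It assumes a tab-separated layout in which the word is followed by optional numeric fields (a pronunciation probability and, in the fuller format, a silence probability and two correction terms) and then the space-separated phones. That layout, the `Entry` and `parse_line` names, and the numbers in the sample line are assumptions for illustration and should be checked against the downloaded file.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Entry:
    word: str
    probabilities: List[float]  # leading numeric fields; empty for a plain dictionary
    phones: List[str]


def _is_number(field: str) -> bool:
    try:
        float(field)
        return True
    except ValueError:
        return False


def parse_line(line: str) -> Optional[Entry]:
    """Parse one line; return None for blank or malformed lines."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 2 or not fields[0]:
        return None
    word, rest = fields[0], fields[1:]
    probabilities = []
    # Consume numeric fields, always leaving at least one field for the phones.
    while len(rest) > 1 and _is_number(rest[0]):
        probabilities.append(float(rest.pop(0)))
    phones = " ".join(rest).split()
    return Entry(word, probabilities, phones)


# The pronunciation is taken from this page; the four numbers are made up.
sample = "大爆炸\t0.99\t0.05\t1.0\t1.0\tt a˥˩ p au˥˩ ʈʂ a˥˩"
entry = parse_line(sample)
print(entry.word, entry.probabilities, entry.phones)
```

Dropping `entry.probabilities` and writing back only the word and phones would correspond to the plain form under the same assumptions.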
Intended use#
This dictionary is intended for forced alignment of Mandarin Chinese transcripts.
This dictionary uses the MFA phone set for Mandarin, and was used in training the Mandarin MFA acoustic model. Pronunciations can be added on top of the dictionary, as long as no additional phones are introduced.
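Before adding pronunciations, it can help to check that they stay within the existing phone set. The following is a minimal sketch of such a check, assuming pronunciations are already available as lists of phone strings; `phone_inventory` and `unknown_phones` are hypothetical helper names, not part of MFA. Note that phones carrying different tones (for example `a˥˩` and `a˨˩˦`) are treated as distinct symbols here, as they appear in the pronunciations on this page.

```python
from typing import Iterable, List, Set


def phone_inventory(pronunciations: Iterable[List[str]]) -> Set[str]:
    """Collect every phone used by the existing dictionary entries."""
    return {phone for pron in pronunciations for phone in pron}


def unknown_phones(new_pron: List[str], inventory: Set[str]) -> List[str]:
    """Return the phones in new_pron that the dictionary never uses, in order."""
    return [phone for phone in new_pron if phone not in inventory]


# Two pronunciations taken from this page stand in for the full dictionary.
existing = [
    "l i˥˩ f a˨˩˦ x w ei˥˩".split(),  # 立法会
    "t a˥˩ p au˥˩ ʈʂ a˥˩".split(),    # 大爆炸
]
inventory = phone_inventory(existing)

print(unknown_phones("l i˥˩ f a˥˩".split(), inventory))  # only known phones
print(unknown_phones("v a˥˩".split(), inventory))        # "v" is not in the inventory
```

In practice the inventory would be built from the whole dictionary, and any candidate pronunciation with a non-empty result would need to be rewritten with existing phones.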
Performance Factors#
Adding pronunciations generally improves alignment accuracy, especially for different styles and dialects. The largest gains usually come from adding reduced variants that delete segments or syllables, which are common in spontaneous speech. Alignment must include every phone specified in a word's pronunciation, and each phone has a minimum duration (10 ms by default). If a speaker pronounces a multisyllabic word as a single syllable, MFA may be unable to fit all the segments into the available time, which can also cause alignment errors on adjacent words.
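The minimum-duration constraint can be made concrete with a little arithmetic. The sketch below uses the 10 ms default quoted above and a pronunciation from this dictionary; the function names and the 100 ms interval are hypothetical and only illustrate the reasoning.

```python
MIN_PHONE_MS = 10  # default minimum phone duration quoted above


def minimum_duration_ms(phones, min_phone_ms=MIN_PHONE_MS):
    """Shortest interval (in ms) that can hold every phone of a pronunciation."""
    return len(phones) * min_phone_ms


def fits(phones, available_ms, min_phone_ms=MIN_PHONE_MS):
    """True if all phones can be placed in the available interval."""
    return minimum_duration_ms(phones, min_phone_ms) <= available_ms


def max_phones(available_ms, min_phone_ms=MIN_PHONE_MS):
    """Largest number of phones that can be placed in the interval."""
    return available_ms // min_phone_ms


full = "l j a˥˩ ŋ ʂ ə˥˥ n t i˥˩ ŋ ts w o˥˩".split()  # 量身定做, 13 phones

print(minimum_duration_ms(full))
print(fits(full, available_ms=100))
print(max_phones(100))
```

With 13 phones the full form needs at least 130 ms, so a heavily reduced production lasting about 100 ms cannot hold it; a variant with at most 10 phones could.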
Ethical considerations#
Deploying any Speech-to-Text model into any production setting has ethical implications. You should consider these implications before use.
Demographic Bias#
You should assume every machine learning model has demographic bias unless proven otherwise. For pronunciation dictionaries, transcription accuracy and lexicon coverage are often better for the prestige variety modeled in the dictionary than for other variants. If you are using this dictionary in production, you should acknowledge this as a potential issue.
IPA Charts#
Consonants#
Within each cell, obstruent symbols on the left are unvoiced and those on the right are voiced.
| Manner | Labial | Labiodental | Alveolar | Retroflex | Palatal | Velar | Glottal |
|---|---|---|---|---|---|---|---|
| Nasal | Occurrences: 10,901 Examples: * 蒙彼利埃: [m o˧˥ ŋ p i˨˩˦ l i˥˩ ʔ ai˥˥] * 启蒙运动: [tɕʰ i˨˩˦ m o˧˥ ŋ y˥˩ n t u˥˩ ŋ] * 三地门: [s a˥˥ n t i˥˩ m ə˧˥ n] * 大五码: [t a˥˩ u˨˩˦ m a˨˩˦] | | Occurrences: 63,083 Examples: * 卡方检验: [kʰ a˨˩˦ f a˥˥ ŋ tɕ j e˨˩˦ n j e˥˩ n] * 阳泉市: [j a˧˥ ŋ tɕʰ ɥ e˧˥ n ʂ ʐ̩˥˩] * 奥萨苏纳: [ʔ au˥˩ s a˥˩ s u˥˥ n a˥˩] * 汉时关: [x a˥˩ n ʂ ʐ̩˧˥ k w a˥˥ n] | | | Occurrences: 50,129 Examples: * 开裆裤: [kʰ ai˥˥ t a˥˥ ŋ kʰ u˥˩] * 五二零: [u˨˩˦ ʔ o˥˩ ɻ l i˧˥ ŋ] * 卡方检验: [kʰ a˨˩˦ f a˥˥ ŋ tɕ j e˨˩˦ n j e˥˩ n] * 阳泉市: [j a˧˥ ŋ tɕʰ ɥ e˧˥ n ʂ ʐ̩˥˩] Occurrences: 2 Examples: | |
| Stop | Occurrences: 15,389 Examples: * 大爆炸: [t a˥˩ p au˥˩ ʈʂ a˥˩] * 蒙彼利埃: [m o˧˥ ŋ p i˨˩˦ l i˥˩ ʔ ai˥˥] * 预备役: [y˥˩ p ei˥˩ i˥˩] * 心照不宣: [ɕ i˥˥ n ʈʂ au˥˩ p u˥˩ ɕ ɥ e˥˥ n] | | Occurrences: 17,629 Examples: * 大爆炸: [t a˥˩ p au˥˩ ʈʂ a˥˩] * 开裆裤: [kʰ ai˥˥ t a˥˥ ŋ kʰ u˥˩] * 量身定做: [l j a˥˩ ŋ ʂ ə˥˥ n t i˥˩ ŋ ts w o˥˩] * 交叉点: [tɕ j au˥˥ ʈʂʰ a˥˥ t j e˨˩˦ n] | | | Occurrences: 12,785 Examples: * 汉时关: [x a˥˩ n ʂ ʐ̩˧˥ k w a˥˥ n] * 爱国歌: [ʔ ai˥˩ k w o˧˥ k o˥˥] * 预告片: [y˥˩ k au˥˩ pʰ j a˥˩ ɻ] * 故事梗概: [k u˥˩ ʂ ʐ̩˥˩ k o˨˩˦ ŋ k ai˥˩] | Occurrences: 5,901 Examples: * 五二零: [u˨˩˦ ʔ o˥˩ ɻ l i˧˥ ŋ] * 奥萨苏纳: [ʔ au˥˩ s a˥˩ s u˥˥ n a˥˩] * 蒙彼利埃: [m o˧˥ ŋ p i˨˩˦ l i˥˩ ʔ ai˥˥] * 爱国歌: [ʔ ai˥˩ k w o˧˥ k o˥˥] |
| Affricate | | | Occurrences: 7,308 Examples: * 新娘子: [ɕ i˥˥ n n j a˧˥ ŋ ts z̩˨] * 量身定做: [l j a˥˩ ŋ ʂ ə˥˥ n t i˥˩ ŋ ts w o˥˩] * 资产阶级: [ts z̩˥˥ ʈʂʰ a˨˩˦ n tɕ j e˥˥ tɕ i˧˥] * 人字拖: [ʐ ə˧˥ n ts z̩˥˩ tʰ w o˥˥] | Occurrences: 15,212 Examples: * 大爆炸: [t a˥˩ p au˥˩ ʈʂ a˥˩] * 心照不宣: [ɕ i˥˥ n ʈʂ au˥˩ p u˥˩ ɕ ɥ e˥˥ n] * 牌洲湾: [pʰ ai˧˥ ʈʂ ou˥˥ w a˥˥ n] * 赵屯儿: [ʈʂ au˥˩ tʰ w ə˧˥ n ʔ o˧˥ ɻ] | Occurrences: 20,864 Examples: * 卡方检验: [kʰ a˨˩˦ f a˥˥ ŋ tɕ j e˨˩˦ n j e˥˩ n] * 医学界: [i˥˥ ɕ ɥ e˧˥ tɕ j e˥˩] * 交叉点: [tɕ j au˥˥ ʈʂʰ a˥˥ t j e˨˩˦ n] * 资产阶级: [ts z̩˥˥ ʈʂʰ a˨˩˦ n tɕ j e˥˥ tɕ i˧˥] | | |
| Sibilant | | | Occurrences: 7,916 Examples: * 奥萨苏纳: [ʔ au˥˩ s a˥˩ s u˥˥ n a˥˩] * 帕斯卡: [pʰ a˥˩ s z̩˥˥ kʰ a˨˩˦] * 三二六: [s a˥˥ n ʔ o˥˩ ɻ l j ou˥˩] * 三地门: [s a˥˥ n t i˥˩ m ə˧˥ n] Occurrences: 6,140 Examples: * 汉字文化圈: [x a˥˩ n ts z̩˥˩ w ə˧˥ n x w a˥˩ tɕʰ ɥ e˥˥ n] * 词汇范畴: [tsʰ z̩˧˥ x w ei˥˩ f a˥˩ n ʈʂʰ ou˧˥] * 资源回收筒: [ts z̩˥˥ ɥ e˧˥ n x w ei˧˥ ʂ ou˥˥ tʰ u˨˩˦ ŋ] * 天主子: [tʰ j e˥˥ n ʈʂ u˨˩˦ ts z̩˨˩˦] | Occurrences: 19,590 Examples: * 阳泉市: [j a˧˥ ŋ tɕʰ ɥ e˧˥ n ʂ ʐ̩˥˩] * 汉时关: [x a˥˩ n ʂ ʐ̩˧˥ k w a˥˥ n] * 完全平方数: [w a˧˥ n tɕʰ ɥ e˧˥ n pʰ i˧˥ ŋ f a˥˥ ŋ ʂ u˥˩] * 量身定做: [l j a˥˩ ŋ ʂ ə˥˥ n t i˥˩ ŋ ts w o˥˩] Occurrences: 4,322 Examples: * 原来如此: [ɥ e˧˥ n l ai˧˥ ʐ u˧˥ tsʰ z̩˨˩˦] * 人字拖: [ʐ ə˧˥ n ts z̩˥˩ tʰ w o˥˥] * 入海口: [ʐ u˥˩ x ai˨˩˦ kʰ ou˨˩˦] * 人人乐: [ʐ ə˧˥ n ʐ ə˧˥ n ɥ e˥˩] Occurrences: 11,374 Examples: * 汉时关: [x a˥˩ n ʂ ʐ̩˧˥ k w a˥˥ n] * 钥匙链: [j au˥˩ ʂ ʐ̩˩ l j e˥˩ n] * 干湿计: [k a˥˥ n ʂ ʐ̩˥˥ tɕ i˥˩] * 故事梗概: [k u˥˩ ʂ ʐ̩˥˩ k o˨˩˦ ŋ k ai˥˩] | Occurrences: 17,847 Examples: * 新娘子: [ɕ i˥˥ n n j a˧˥ ŋ ts z̩˨] * 山阳县: [ʂ a˥˥ n j a˧˥ ŋ ɕ j e˥˩ n] * 医学界: [i˥˥ ɕ ɥ e˧˥ tɕ j e˥˩] * 心照不宣: [ɕ i˥˥ n ʈʂ au˥˩ p u˥˩ ɕ ɥ e˥˥ n] | | |
| Fricative | | Occurrences: 9,086 Examples: * 立法会: [l i˥˩ f a˨˩˦ x w ei˥˩] * 卡方检验: [kʰ a˨˩˦ f a˥˥ ŋ tɕ j e˨˩˦ n j e˥˩ n] * 完全平方数: [w a˧˥ n tɕʰ ɥ e˧˥ n pʰ i˧˥ ŋ f a˥˥ ŋ ʂ u˥˩] * 梵阿玲: [f a˥˩ n ʔ a˥˥ l i˧˥ ŋ] | | | | | |
| Approximant | Occurrences: 36,328 Examples: * 立法会: [l i˥˩ f a˨˩˦ x w ei˥˩] * 汉时关: [x a˥˩ n ʂ ʐ̩˧˥ k w a˥˥ n] * 完全平方数: [w a˧˥ n tɕʰ ɥ e˧˥ n pʰ i˧˥ ŋ f a˥˥ ŋ ʂ u˥˩] * 量身定做: [l j a˥˩ ŋ ʂ ə˥˥ n t i˥˩ ŋ ts w o˥˩] | | | Occurrences: 5,119 Examples: * 五二零: [u˨˩˦ ʔ o˥˩ ɻ l i˧˥ ŋ] * 爱国歌: [ʔ ai˥˩ k w o˧˥ k o˥˥ ɻ] * 三二六: [s a˥˥ n ʔ o˥˩ ɻ l j ou˥˩] * 预告片: [y˥˩ k au˥˩ pʰ j a˥˩ ɻ] | Occurrences: 44,754 Examples: * 卡方检验: [kʰ a˨˩˦ f a˥˥ ŋ tɕ j e˨˩˦ n j e˥˩ n] * 阳泉市: [j a˧˥ ŋ tɕʰ ɥ e˧˥ n ʂ ʐ̩˥˩] * 新娘子: [ɕ i˥˥ n n j a˧˥ ŋ ts z̩˨] * 量身定做: [l j a˥˩ ŋ ʂ ə˥˥ n t i˥˩ ŋ ts w o˥˩] Occurrences: 6,719 Examples: * 阳泉市: [j a˧˥ ŋ tɕʰ ɥ e˧˥ n ʂ ʐ̩˥˩] * 完全平方数: [w a˧˥ n tɕʰ ɥ e˧˥ n pʰ i˧˥ ŋ f a˥˥ ŋ ʂ u˥˩] * 医学界: [i˥˥ ɕ ɥ e˧˥ tɕ j e˥˩] * 心照不宣: [ɕ i˥˥ n ʈʂ au˥˩ p u˥˩ ɕ ɥ e˥˥ n] | | |
| Lateral | | | Occurrences: 19,041 Examples: * 立法会: [l i˥˩ f a˨˩˦ x w ei˥˩] * 五二零: [u˨˩˦ ʔ o˥˩ ɻ l i˧˥ ŋ] * 蒙彼利埃: [m o˧˥ ŋ p i˨˩˦ l i˥˩ ʔ ai˥˥] * 量身定做: [l j a˥˩ ŋ ʂ ə˥˥ n t i˥˩ ŋ ts w o˥˩] | | | | |
Vowels#
Within each cell, vowel symbols on the left are unrounded and those on the right are rounded.
| | Front | Near-Front | Central | Near-Back | Back |
|---|---|---|---|---|---|
| Close | Occurrences: 45,407 Examples: * 五二零: [u˨˩˦ ʔ o˥˩ ɻ l i˧˥ ŋ] * 蒙彼利埃: [m o˧˥ ŋ p i˨˩˦ l i˥˩ ʔ ai˥˥] * 土地爷: [tʰ u˨˩˦ t i˦ j e˧˥] * 东西部: [t u˥˥ ŋ ɕ i˨ p u˥˩] Occurrences: 8,832 Examples: * 预告片: [y˥˩ k au˥˩ pʰ j a˥˩ ɻ] * 嫁出去: [tɕ j a˥˩ ʈʂʰ u˥˥ tɕʰ y˨] * 尼科巴群岛: [n i˧˥ kʰ o˥˥ p a˥˥ tɕʰ y˧˥ n t au˨˩˦] * 渔洋关: [y˧˥ j a˧˥ ŋ k w a˥˥ n] | | | | Occurrences: 34,483 Examples: * 备不住: [p ei˥˩ p u˩ ʈʂ u˥˩] * 加不上: [tɕ j a˥˥ p u˩ ʂ a˥˩ ŋ] * 开裆裤: [kʰ ai˥˥ t a˥˥ ŋ kʰ u˥˩] * 撑不住: [ʈʂʰ o˥˥ ŋ p u˨ ʈʂ u˥˩] |
| Close-Mid | Occurrences: 25,368 Examples: * 新姑爷: [ɕ i˥˥ n k u˥˥ j e˨] * 物业费: [u˥˩ j e˥˩ f ei˥˩] * 太平洋保险: [tʰ ai˥˩ pʰ i˧˥ ŋ j a˧˥ ŋ p au˨˩˦ ɕ j e˨˩˦ n] * 卡方检验: [kʰ a˨˩˦ f a˥˥ ŋ tɕ j e˨˩˦ n j e˥˩ n] | | | | Occurrences: 32,711 Examples: * 计算机科学: [tɕ i˥˩ s w a˥˩ n tɕ i˥˥ kʰ o˥˥ ɕ ɥ e˧˥] * 量身定做: [l j a˥˩ ŋ ʂ ə˥˥ n t i˥˩ ŋ ts w o˥˩] * 放风筝: [f a˥˩ ŋ f o˥˥ ŋ ʈʂ o˨ ŋ] * 掏耳朵: [tʰ au˥˥ ʔ o˨˩˦ ɻ t w o˦] |
| | | | Occurrences: 11,950 Examples: * 人字拖: [ʐ ə˧˥ n ts z̩˥˩ tʰ w o˥˥] * 哪有什么: [n a˨˩˦ j ou˨˩˦ ʂ o˧˥ m ə˨] * 看个够: [kʰ a˥˩ n k ə˩ k ou˥˩] * 河溪镇: [x o˧˥ ɕ i˥˥ ʈʂ ə˥˩ n] | | |
| Open-Mid | | | | | |
| Open | Occurrences: 57,645 Examples: * 昨天早上: [ts w o˧˥ tʰ j e˥˥ n ts au˨˩˦ ʂ a˦ ŋ] * 奥萨苏纳: [ʔ au˥˩ s a˥˩ s u˥˥ n a˥˩] * 卡方检验: [kʰ a˨˩˦ f a˥˥ ŋ tɕ j e˨˩˦ n j e˥˩ n] * 戳嵴樑骨: [ʈʂʰ w o˥˥ tɕ i˨˩˦ l j a˦ ŋ k u˨˩˦] | | | | |
Diphthongs#
ai
au
ei
ou
Tones#
˥˥
˥˩
˦
˧˥
˨
˨˩˦
˩
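In the pronunciations on this page, tones are written as Chao tone letters attached to the end of the tone-bearing phone (for example `a˥˩`). The sketch below shows one way to separate the tone from the segment, which can be useful when comparing or counting phones without regard to tone. The function names are hypothetical, and the sketch assumes tone letters only ever appear as a suffix.

```python
TONE_LETTERS = "˥˦˧˨˩"  # Chao tone letters, U+02E5 to U+02E9


def split_tone(phone):
    """Split a phone into (base, tone); tone is '' for toneless phones."""
    base = phone.rstrip(TONE_LETTERS)
    return base, phone[len(base):]


def tones_in(phones):
    """Return the tone of each tone-bearing phone, in order."""
    return [tone for _, tone in map(split_tone, phones) if tone]


pron = "m o˧˥ ŋ p i˨˩˦ l i˥˩ ʔ ai˥˥".split()  # 蒙彼利埃, from this dictionary
print([split_tone(p) for p in pron])
print(tones_in(pron))
```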