Mandarin (Erhua) MFA dictionary v2.0.0#
@techreport{mfa_mandarin_erhua_mfa_dictionary_2022,
author={McAuliffe, Michael and Sonderegger, Morgan},
title={Mandarin (Erhua) MFA dictionary v2.0.0},
address={\url{https://mfa-models.readthedocs.io/pronunciation dictionary/Mandarin/Mandarin (Erhua) MFA dictionary v2_0_0.html}},
year={2022},
month={Mar},
}
Installation#
Install from the MFA command line:
mfa model download dictionary mandarin_erhua_mfa
Or download from the release page.
The dictionary available from the release page and from command line installation includes pronunciation and silence probabilities estimated as part of acoustic model training (see Silence probability format and training pronunciation probabilities for more information). If you would like to use the version of this dictionary without probabilities, please see the plain dictionary.
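To show how the two versions of the dictionary relate, here is a minimal Python sketch that strips the probability fields from one entry. It assumes each line is whitespace-separated, with the word first, then zero or more numeric probability fields, then the phones. That layout and the numbers in the usage note are assumptions for illustration, and `split_entry` and `to_plain` are hypothetical helper names, not part of MFA.

```python
def split_entry(line):
    """Split one non-empty dictionary line into (word, probabilities, phones).

    Assumed layout: word, then any numeric fields, then the phones.
    """
    fields = line.split()
    word, rest = fields[0], fields[1:]
    probabilities = []
    # Consume leading tokens that parse as numbers; the phone symbols
    # used in this dictionary do not parse as floats.
    while rest:
        try:
            probabilities.append(float(rest[0]))
        except ValueError:
            break
        rest = rest[1:]
    return word, probabilities, rest


def to_plain(line):
    """Return the entry without probabilities: word, tab, phones."""
    word, _, phones = split_entry(line)
    return word + "\t" + " ".join(phones)
```

For example, `to_plain("手电筒\t0.99\t0.2\t1.0\t1.0\tʂ ow˨˩˦ t j e˥˩ n tʰ u˨˩˦ ŋ")` keeps only the word and its phones. The probability values in that line are made up.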
Intended use#
This dictionary is intended for forced alignment of Mandarin Chinese transcripts.
This dictionary uses the MFA phone set for Mandarin, and was used in training the Mandarin MFA acoustic model. Pronunciations can be added on top of the dictionary, as long as no additional phones are introduced.
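One way to check the "no additional phones" constraint before adding entries is sketched below in Python. It treats every space-separated token, including any tone marks attached to a vowel, as one phone, which matches how the examples on this page are written. `collect_phones` and `unknown_phones` are hypothetical helper names, and the two existing entries are taken from the examples on this page.

```python
def collect_phones(pronunciations):
    """Return the set of phones used in an iterable of pronunciation strings."""
    phones = set()
    for pronunciation in pronunciations:
        phones.update(pronunciation.split())
    return phones


def unknown_phones(pronunciation, known_phones):
    """List the phones in a new pronunciation that are not in known_phones."""
    return [p for p in pronunciation.split() if p not in known_phones]


# Existing pronunciations (from the examples on this page).
existing = [
    "ʂ ow˨˩˦ t j e˥˩ n tʰ u˨˩˦ ŋ",  # 手电筒
    "k u˥˥ ŋ k u˨˩˦ t aw˨˩˦",       # 宫古岛
]
known = collect_phones(existing)

# An empty list means the candidate introduces no new phones.
print(unknown_phones("k u˥˥ ŋ t aw˨˩˦", known))
```

In practice the known set would be built from the full dictionary rather than from two entries. A candidate that produces a non-empty list should be rewritten with phones the acoustic model was trained on.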
Performance Factors#
When trying to improve alignment accuracy, adding pronunciations is generally helpful, especially for different styles and dialects. The most impactful improvements generally come from adding reduced variants in which segments or syllables are deleted, as is common in spontaneous speech. Alignment must include all phones specified in the pronunciation of a word, and each phone has a minimum duration (10 ms by default). If a speaker pronounces a multisyllabic word as a single syllable, MFA may be unable to fit all the segments into the available time, which leads to alignment errors on adjacent words as well.
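The arithmetic behind the minimum-duration constraint can be sketched in Python as follows. The 10 ms default comes from the paragraph above. The helper names are hypothetical, and the reduced variant in the example is an invented illustration, not an attested pronunciation.

```python
MIN_PHONE_DURATION_MS = 10  # default minimum duration per phone, as stated above


def min_duration_ms(pronunciation, min_phone_ms=MIN_PHONE_DURATION_MS):
    """Shortest interval (in ms) that can hold every phone of a pronunciation."""
    return len(pronunciation.split()) * min_phone_ms


def fits(pronunciation, interval_ms, min_phone_ms=MIN_PHONE_DURATION_MS):
    """True if the spoken interval is long enough for all phones."""
    return min_duration_ms(pronunciation, min_phone_ms) <= interval_ms


full = "ʂ ow˨˩˦ t j e˥˩ n tʰ u˨˩˦ ŋ"  # 手电筒, 9 phones
reduced = "ʂ ow˨˩˦ tʰ u˨˩˦ ŋ"          # hypothetical reduced variant, 5 phones

print(min_duration_ms(full), min_duration_ms(reduced))
print(fits(full, 60), fits(reduced, 60))
```

With only the full form available, a 60 ms token cannot hold all nine phones, so the aligner has to take time from neighbouring words. A shorter variant in the dictionary removes that pressure.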
Ethical considerations#
Deploying any Speech-to-Text model into any production setting has ethical implications. You should consider these implications before use.
Demographic Bias#
You should assume every machine learning model has demographic bias unless proven otherwise. For pronunciation dictionaries, it is often the case that transcription accuracy and lexicon coverage are better for the prestige variety modeled in this dictionary than for other variants. If you are using this dictionary in production, you should acknowledge this as a potential issue.
IPA Charts#
Consonants#
Within a cell, obstruent symbols on the left are unvoiced and those on the right are voiced.
Manner | Labial | Labiodental | Alveolar | Retroflex | Palatal | Velar | Glottal |
---|---|---|---|---|---|---|---|
Nasal | m. Occurrences: 6,902. Examples: 所罗门群岛: [s w o˨˩˦ l w o˧˥ m ə˧˥ n tɕʰ y˧˥ n t aw˨˩˦]; 凯门鳄: [kʰ aj˨˩˦ m ə˧˥ n ʔ o˥˩]; 莫干山: [m w o˥˩ k a˥˥ n ʂ a˥˥ n]; 闺中密友: [k w ej˥˥ ʈʂ u˥˥ ŋ m i˥˩ j ow˨˩˦] | | n. Occurrences: 37,000. Examples: 腮腺炎: [s aj˥˥ ɕ j e˥˩ n j e˧˥ n]; 纳斐塔里: [n a˥˩ f ej˨˩˦ tʰ a˨˩˦ l i˨˩˦]; 手电筒: [ʂ ow˨˩˦ t j e˥˩ n tʰ u˨˩˦ ŋ]; 外兴安岭: [w aj˥˩ ɕ i˥˥ ŋ ʔ a˥˥ n l i˨˩˦ ŋ] | | | ŋ. Occurrences: 28,701. Examples: 子宫颈: [ts z̩˨˩˦ k u˥˥ ŋ tɕ i˨˩˦ ŋ]; 手电筒: [ʂ ow˨˩˦ t j e˥˩ n tʰ u˨˩˦ ŋ]; 宫古岛: [k u˥˥ ŋ k u˨˩˦ t aw˨˩˦]; 草头黄: [tsʰ aw˨˩˦ tʰ ow˧˥ x w a˧˥ ŋ]. Occurrences: 2. Examples: none listed. | |
Stop | p. Occurrences: 8,933. Examples: 便利贴: [p j e˥˩ n l i˥˩ tʰ j e˥˥]; 货币主义: [x w o˥˩ p i˥˩ ʈʂ u˨˩˦ i˥˩]; 巴拿语: [p a˥˥ n a˧˥ y˨˩˦]; 古尔邦节: [k u˨˩˦ ʔ o˨˩˦ ɻ p a˥˥ ŋ tɕ j e˧˥] | | t. Occurrences: 10,558. Examples: 手电筒: [ʂ ow˨˩˦ t j e˥˩ n tʰ u˨˩˦ ŋ]; 宫古岛: [k u˥˥ ŋ k u˨˩˦ t aw˨˩˦]; 抗毒素: [kʰ a˥˩ ŋ t u˧˥ s u˥˩]; 所罗门群岛: [s w o˨˩˦ l w o˧˥ m ə˧˥ n tɕʰ y˧˥ n t aw˨˩˦] | | | k. Occurrences: 7,874. Examples: 子宫颈: [ts z̩˨˩˦ k u˥˥ ŋ tɕ i˨˩˦ ŋ]; 宫古岛: [k u˥˥ ŋ k u˨˩˦ t aw˨˩˦]; 古尔邦节: [k u˨˩˦ ʔ o˨˩˦ ɻ p a˥˥ ŋ tɕ j e˧˥]; 感染群: [k a˨˩˦ n ʐ a˨˩˦ n tɕʰ y˧˥ n] | ʔ. Occurrences: 3,191. Examples: 玉尔其: [y˥˩ ʔ o˨˩˦ ɻ tɕʰ i˧˥]; 外兴安岭: [w aj˥˩ ɕ i˥˥ ŋ ʔ a˥˥ n l i˨˩˦ ŋ]; 叶尔羌河: [j e˥˩ ʔ o˨˩˦ ɻ tɕʰ j a˥˥ ŋ x o˧˥]; 古尔邦节: [k u˨˩˦ ʔ o˨˩˦ ɻ p a˥˥ ŋ tɕ j e˧˥] |
Affricate | | | ts. Occurrences: 4,735. Examples: 子宫颈: [ts z̩˨˩˦ k u˥˥ ŋ tɕ i˨˩˦ ŋ]; 董教总: [t u˨˩˦ ŋ tɕ j aw˥˩ ts u˨˩˦ ŋ]; 森林火灾: [s ə˥˥ n l i˧˥ n x w o˨˩˦ ts aj˥˥]; 报贩子: [p aw˥˩ f a˥˩ n ts z̩˩] | ʈʂ. Occurrences: 8,793. Examples: 海玉竹: [x aj˨˩˦ y˥˩ ʈʂ u˧˥]; 伏龙芝: [f u˧˥ l u˧˥ ŋ ʈʂ ʐ̩˥˥]; 李富庄: [l i˨˩˦ f u˥˩ ʈʂ w a˥˥ ŋ]; 货币主义: [x w o˥˩ p i˥˩ ʈʂ u˨˩˦ i˥˩] | tɕ. Occurrences: 12,555. Examples: 子宫颈: [ts z̩˨˩˦ k u˥˥ ŋ tɕ i˨˩˦ ŋ]; 持械抢劫: [ʈʂʰ ʐ̩˧˥ ɕ j e˥˩ tɕʰ j a˨˩˦ ŋ tɕ j e˧˥]; 古尔邦节: [k u˨˩˦ ʔ o˨˩˦ ɻ p a˥˥ ŋ tɕ j e˧˥]; 董教总: [t u˨˩˦ ŋ tɕ j aw˥˩ ts u˨˩˦ ŋ] | | |
Sibilant | | | s. Occurrences: 4,501. Examples: 腮腺炎: [s aj˥˥ ɕ j e˥˩ n j e˧˥ n]; 抗毒素: [kʰ a˥˩ ŋ t u˧˥ s u˥˩]; 所罗门群岛: [s w o˨˩˦ l w o˧˥ m ə˧˥ n tɕʰ y˧˥ n t aw˨˩˦]; 星期四: [ɕ i˥˥ ŋ tɕʰ i˥˥ s z̩˥˩]. z̩. Occurrences: 3,912. Examples: 星期四: [ɕ i˥˥ ŋ tɕʰ i˥˥ s z̩˥˩]; 马驹子: [m a˨˩˦ tɕ y˥˥ ts z̩˨]; 勃拉姆斯: [p w o˧˥ l a˥˥ m u˨˩˦ s z̩˥˥]; 电子贝斯: [t j e˥˩ n ts z̩˨˩˦ p ej˥˩ s z̩˥˥] | ʂ. Occurrences: 11,448. Examples: 手电筒: [ʂ ow˨˩˦ t j e˥˩ n tʰ u˨˩˦ ŋ]; 传达室: [ʈʂʰ w a˧˥ n t a˧˥ ʂ ʐ̩˨˩˦]; 商业区: [ʂ a˥˥ ŋ j e˥˩ tɕʰ y˥˥]; 顾俊沙: [k u˥˩ tɕ y˥˩ n ʂ a˥˥]. ʐ. Occurrences: 2,592. Examples: 萨拉热窝: [s a˥˩ l a˥˥ ʐ o˥˩ w o˥˥]; 人行道: [ʐ ə˧˥ n ɕ i˧˥ ŋ t aw˥˩]; 感染群: [k a˨˩˦ n ʐ a˨˩˦ n tɕʰ y˧˥ n]; 碎肉器: [s w ej˥˩ ʐ ow˥˩ tɕʰ i˥˩]. ʐ̩. Occurrences: 6,404. Examples: 小市民: [ɕ j aw˨˩˦ ʂ ʐ̩˥˩ m i˧˥ n]; 百年国耻: [p aj˨˩˦ n j e˧˥ n k w o˧˥ ʈʂʰ ʐ̩˨˩˦]; 伏龙芝: [f u˧˥ l u˧˥ ŋ ʈʂ ʐ̩˥˥]; 胆结石: [t a˨˩˦ n tɕ j e˧˥ ʂ ʐ̩˧˥] | ɕ. Occurrences: 10,545. Examples: 腮腺炎: [s aj˥˥ ɕ j e˥˩ n j e˧˥ n]; 外兴安岭: [w aj˥˩ ɕ i˥˥ ŋ ʔ a˥˥ n l i˨˩˦ ŋ]; 持械抢劫: [ʈʂʰ ʐ̩˧˥ ɕ j e˥˩ tɕʰ j a˨˩˦ ŋ tɕ j e˧˥]; 星期四: [ɕ i˥˥ ŋ tɕʰ i˥˥ s z̩˥˩] | | |
Fricative | | f. Occurrences: 5,538. Examples: 纳斐塔里: [n a˥˩ f ej˨˩˦ tʰ a˨˩˦ l i˨˩˦]; 伏龙芝: [f u˧˥ l u˧˥ ŋ ʈʂ ʐ̩˥˥]; 李富庄: [l i˨˩˦ f u˥˩ ʈʂ w a˥˥ ŋ]; 报贩子: [p aw˥˩ f a˥˩ n ts z̩˩] | | | | | |
Approximant | w. Occurrences: 21,299. Examples: 草头黄: [tsʰ aw˨˩˦ tʰ ow˧˥ x w a˧˥ ŋ]; 外兴安岭: [w aj˥˩ ɕ i˥˥ ŋ ʔ a˥˥ n l i˨˩˦ ŋ]; 李富庄: [l i˨˩˦ f u˥˩ ʈʂ w a˥˥ ŋ]; 冤枉路: [ɥ e˥˥ n w a˨ ŋ l u˥˩] | | | ɻ. Occurrences: 3,579. Examples: 玉尔其: [y˥˩ ʔ o˨˩˦ ɻ tɕʰ i˧˥]; 叶尔羌河: [j e˥˩ ʔ o˨˩˦ ɻ tɕʰ j a˥˥ ŋ x o˧˥]; 古尔邦节: [k u˨˩˦ ʔ o˨˩˦ ɻ p a˥˥ ŋ tɕ j e˧˥]; 小姑儿: [ɕ j aw˨˩˦ k u˥˥ ɻ] | j. Occurrences: 25,985. Examples: 腮腺炎: [s aj˥˥ ɕ j e˥˩ n j e˧˥ n]; 手电筒: [ʂ ow˨˩˦ t j e˥˩ n tʰ u˨˩˦ ŋ]; 便利贴: [p j e˥˩ n l i˥˩ tʰ j e˥˥]; 叶尔羌河: [j e˥˩ ʔ o˨˩˦ ɻ tɕʰ j a˥˥ ŋ x o˧˥]. ɥ. Occurrences: 4,109. Examples: 冤枉路: [ɥ e˥˥ n w a˨ ŋ l u˥˩]; 古文字学: [k u˨˩˦ w ə˧˥ n ts z̩˥˩ ɕ ɥ e˧˥]; 古元古代: [k u˨˩˦ ɥ e˧˥ n k u˨˩˦ t aj˥˩]; 语言学家: [y˨˩˦ j e˧˥ n ɕ ɥ e˧˥ tɕ j a˥˥] | | |
Lateral | | | l. Occurrences: 11,057. Examples: 纳斐塔里: [n a˥˩ f ej˨˩˦ tʰ a˨˩˦ l i˨˩˦]; 伏龙芝: [f u˧˥ l u˧˥ ŋ ʈʂ ʐ̩˥˥]; 外兴安岭: [w aj˥˩ ɕ i˥˥ ŋ ʔ a˥˥ n l i˨˩˦ ŋ]; 李富庄: [l i˨˩˦ f u˥˩ ʈʂ w a˥˥ ŋ] | | | | |
Vowels#
Within a cell, vowel symbols on the left are unrounded and those on the right are rounded.
Height | Front | Near-Front | Central | Near-Back | Back |
---|---|---|---|---|---|
Close | i. Occurrences: 26,672. Examples: 腓立比书: [f ej˧˥ l i˥˩ p i˨˩˦ ʂ u˥˥]; 李富庄: [l i˨˩˦ f u˥˩ ʈʂ w a˥˥ ŋ]; 没有关系: [m ej˧˥ j ow˨˩˦ k w a˥˥ n ɕ i˨]; 势利眼: [ʂ ʐ̩˥˩ l i˩ j a˨˩˦ ɻ]. y. Occurrences: 5,463. Examples: 愉景湾: [y˧˥ tɕ i˨˩˦ ŋ w a˥˥ n]; 古典藏语: [k u˨˩˦ t j e˨˩˦ n ts a˥˩ ŋ y˨˩˦]; 文昌帝君: [w ə˧˥ n ʈʂʰ a˥˥ ŋ t i˥˩ tɕ y˥˥ n]; 姪女婿: [ʈʂ ʐ̩˧˥ n y˨˩˦ ɕ y˦] | | | | u. Occurrences: 21,271. Examples: 只不过: [ʈʂ ʐ̩˨˩˦ p u˦ k w o˥˩]; 胡椒瓶: [x u˧˥ tɕ j aw˥˥ pʰ i˧˥ ŋ]; 恨不能: [x ə˥˩ n p u˩ n o˧˥ ŋ]; 表姐夫: [p j aw˨˩˦ tɕ j e˨˩˦ f u˦] |
Close-Mid | e. Occurrences: 15,113. Examples: 老爷爷: [l aw˨˩˦ j e˧˥ j e˨]; 新约全书: [ɕ i˥˥ n ɥ e˥˥ tɕʰ ɥ e˧˥ n ʂ u˥˥]; 新姑爷: [ɕ i˥˥ n k u˥˥ j e˨]; 野战医院: [j e˨˩˦ ʈʂ a˥˩ n i˥˥ ɥ e˥˩ n]. ej. Occurrences: 7,254. Examples: 回回教: [x w ej˧˥ x w ej˨ tɕ j aw˥˩]; 纳斐塔里: [n a˥˩ f ej˨˩˦ tʰ a˨˩˦ l i˨˩˦]; 碎肉器: [s w ej˥˩ ʐ ow˥˩ tɕʰ i˥˩]; 未央宫: [w ej˥˩ j a˥˥ ŋ k u˥˥ ŋ] | | | | o. Occurrences: 18,898. Examples: 没见过世面: [m ej˧˥ tɕ j e˥˩ n k w o˩ ʂ ʐ̩˥˩ m j e˥˩ n]; 救生队: [tɕ j ow˥˩ ʂ o˥˥ ŋ t w ej˥˩]; 夫妻老婆店: [f u˥˥ tɕʰ i˥˥ l aw˨˩˦ pʰ w o˦ t j e˥˩ n]; 小老婆: [ɕ j aw˨˩˦ l aw˨˩˦ pʰ w o˦]. ow. Occurrences: 7,767. Examples: 草头黄: [tsʰ aw˨˩˦ tʰ ow˧˥ x w a˧˥ ŋ]; 西贝柳斯: [ɕ i˥˥ p ej˥˩ l j ow˨˩˦ s z̩˥˥]; 奶油包: [n aj˨˩˦ j ow˧˥ p aw˥˥]; 手电筒: [ʂ ow˨˩˦ t j e˥˩ n tʰ u˨˩˦ ŋ] |
Mid | | | ə. Occurrences: 7,146. Examples: 意味着: [i˥˩ w ej˥˩ ʈʂ ə˩]; 混混儿: [x w ə˥˩ n x w ə˩ ɻ]; 放得开: [f a˥˩ ŋ t ə˩ kʰ aj˥˥]; 凯门鳄: [kʰ aj˨˩˦ m ə˧˥ n ʔ o˥˩] | | |
Open-Mid | | | | | |
Open | a. Occurrences: 34,421. Examples: 哈利路亚: [x a˥˥ l i˥˩ l u˥˩ j a˨˩˦]; 大姑娘: [t a˥˩ k u˥˥ n j a˨ ŋ]; 叶尔羌河: [j e˥˩ ʔ o˨˩˦ ɻ tɕʰ j a˥˥ ŋ x o˧˥]; 指甲钳: [ʈʂ ʐ̩˨˩˦ tɕ j a˦ tɕʰ j e˧˥ n] | | | | |
Diphthongs#
aj
aw
ej
ow
Tones#
˥˥
˥˩
˦
˧˥
˨
˨˩˦
˩
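In the pronunciations on this page, tone is written as a run of Chao tone letters attached to the end of the vowel or syllabic consonant (for example `aw˨˩˦`). When comparing or counting phones it can help to separate the segment from its tone. The Python sketch below shows one way to do that. `split_tone` is a hypothetical helper, not an MFA function.

```python
# The five Chao tone letters (U+02E5 to U+02E9) used in the tone list above.
TONE_LETTERS = set("˥˦˧˨˩")


def split_tone(phone):
    """Split a phone into (segment, tone); tone is '' for toneless phones."""
    i = len(phone)
    # Walk back from the end while the characters are tone letters.
    while i > 0 and phone[i - 1] in TONE_LETTERS:
        i -= 1
    return phone[:i], phone[i:]


pronunciation = "ʂ ow˨˩˦ t j e˥˩ n tʰ u˨˩˦ ŋ"  # 手电筒
for phone in pronunciation.split():
    print(split_tone(phone))
```

Consonants such as `tʰ` come back with an empty tone string, so the same function can be applied to every token in a pronunciation.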