Mandarin (China) MFA dictionary v2.0.0#
@techreport{mfa_mandarin_china_mfa_dictionary_2022,
author={McAuliffe, Michael and Sonderegger, Morgan},
title={Mandarin (China) MFA dictionary v2.0.0},
address={\url{https://mfa-models.readthedocs.io/pronunciation dictionary/Mandarin/Mandarin (China) MFA dictionary v2_0_0.html}},
year={2022},
month={Mar},
}
Installation#
Install from the MFA command line:
mfa model download dictionary mandarin_china_mfa
Or download from the release page.
The dictionary available from the release page and from command-line installation has pronunciation and silence probabilities estimated as part of acoustic model training (see Silence probability format and training pronunciation probabilities for more information). If you would like to use the version of this dictionary without probabilities, please see the plain dictionary.
Intended use#
This dictionary is intended for forced alignment of Mandarin Chinese transcripts.
This dictionary uses the MFA phone set for Mandarin, and was used in training the Mandarin MFA acoustic model. Pronunciations can be added on top of the dictionary, as long as no additional phones are introduced.
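The "no additional phones" constraint can be checked before new pronunciations are added. The following is a minimal sketch, assuming each space-separated token in a pronunciation (including a vowel together with its tone) counts as one phone, as the examples on this page suggest. `KNOWN_PHONES` is only an illustrative subset copied from one example on this page, not the full phone set, and `unknown_phones` and both variant pronunciations are hypothetical.

```python
# Illustrative subset only: phones copied from the pronunciation of 民主党 on this page.
# The real phone set is defined by the dictionary and acoustic model.
KNOWN_PHONES = {"m", "i˧˥", "n", "ʈʂ", "u˨˩˦", "t", "a˨˩˦", "ŋ"}


def unknown_phones(pronunciation, known_phones):
    """Return the phones in a space-separated pronunciation that are not in known_phones."""
    return [phone for phone in pronunciation.split() if phone not in known_phones]


# Pronunciation of 民主党 as listed in the examples on this page.
existing = "m i˧˥ n ʈʂ u˨˩˦ t a˨˩˦ ŋ"
# Hypothetical reduced variant that drops the final nasal; it reuses known phones only.
reduced = "m i˧˥ n ʈʂ u˨˩˦ t a˨˩˦"
# Hypothetical variant containing a phone ("ɛ˥˥") that is not in the set above.
invalid = "m ɛ˥˥ n"

for candidate in (existing, reduced, invalid):
    print(candidate, "->", unknown_phones(candidate, KNOWN_PHONES))
```

A candidate whose list comes back empty introduces no new phones relative to the set it was checked against.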
Performance Factors#
When trying to get better alignment accuracy, adding pronunciations is generally helpful, especially for different styles and dialects. The largest improvements generally come from adding reduced variants in which segments or syllables are deleted, as is common in spontaneous speech. An alignment must include every phone specified in the pronunciation of a word, and each phone has a minimum duration (10 ms by default). If a speaker pronounces a multisyllabic word as a single syllable, MFA may be unable to fit all the segments into the available time, which can cause alignment errors on adjacent words as well.
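The duration constraint above amounts to simple arithmetic: a pronunciation with N phones cannot be aligned to an interval shorter than N times the minimum phone duration. The sketch below illustrates this with the 10 ms default mentioned in the text; the function names and the 60 ms interval are hypothetical and are not part of MFA.

```python
def min_duration_ms(pronunciation, min_phone_ms=10):
    """Shortest interval (in ms) that can hold every phone of a space-separated pronunciation."""
    return len(pronunciation.split()) * min_phone_ms


def can_fit(pronunciation, interval_ms, min_phone_ms=10):
    """True if every phone can receive at least min_phone_ms within interval_ms."""
    return min_duration_ms(pronunciation, min_phone_ms) <= interval_ms


# Pronunciation of 亚历山大 as listed in the examples on this page (9 phones).
full = "j a˨˩˦ l i˥˩ ʂ a˥˥ n t a˥˩"

print(min_duration_ms(full))  # 9 phones * 10 ms
print(can_fit(full, 60))      # hypothetical 60 ms realisation of the word
```

When the spoken word is shorter than this lower bound, the aligner has to take time from neighbouring words, which is why a shorter reduced variant in the dictionary can help.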
Ethical considerations#
Deploying any Speech-to-Text model into any production setting has ethical implications. You should consider these implications before use.
Demographic Bias#
You should assume every machine learning model has demographic bias unless proven otherwise. For pronunciation dictionaries, it is often the case that transcription accuracy and lexicon coverage are better for the prestige variety modeled in the dictionary than for other varieties. If you are using this dictionary in production, you should acknowledge this as a potential issue.
IPA Charts#
Consonants#
Within a cell, obstruent symbols on the left are unvoiced and those on the right are voiced.
Manner | Labial | Labiodental | Alveolar | Retroflex | Palatal | Velar | Glottal |
---|---|---|---|---|---|---|---|
Nasal | m (Occurrences: 10,590; Examples: 费尔曼 [f ej˥˩ ʔ o˨˩˦ ɻ m a˥˩ n], 小毛驴 [ɕ j aw˨˩˦ m aw˧˥ l y˧˥], 民主党 [m i˧˥ n ʈʂ u˨˩˦ t a˨˩˦ ŋ], 玛窦福音 [m a˨˩˦ t ow˥˩ f u˧˥ i˥˥ n]) | | n (Occurrences: 62,422; Examples: 早第三纪 [ts aw˨˩˦ t i˥˩ s a˥˥ n tɕ i˥˩], 音乐茶座 [i˥˥ n ɥ e˥˩ ʈʂʰ a˧˥ ts w o˥˩], 新篠津 [ɕ i˥˥ n ɕ j aw˨˩˦ tɕ i˥˥ n], 亚历山大 [j a˨˩˦ l i˥˩ ʂ a˥˥ n t a˥˩]) | | | ŋ (Occurrences: 49,692; Examples: 装饰城 [ʈʂ w a˥˥ ŋ ʂ ʐ̩˥˩ ʈʂʰ o˧˥ ŋ], 孔乙己 [kʰ u˨˩˦ ŋ i˨˩˦ tɕ i˨˩˦], 王家桥 [w a˧˥ ŋ tɕ j a˥˥ tɕʰ j aw˧˥], 二重唱 [ʔ o˥˩ ɻ ʈʂʰ u˧˥ ŋ ʈʂʰ a˥˩ ŋ]) / Occurrences: 2 Examples: | |
Stop | p (Occurrences: 15,018; Examples: 比凤姐 [p i˨˩˦ f o˥˩ ŋ tɕ j e˨˩˦], 一百一十七 [i˥˥ p aj˨˩˦ i˥˥ ʂ ʐ̩˧˥ tɕʰ i˥˥], 苏必略湖 [s u˥˥ p i˥˩ l ɥ e˥˩ x u˧˥], 两百块 [l j a˨˩˦ ŋ p aj˨˩˦ kʰ w aj˥˩]) | | t (Occurrences: 17,251; Examples: 早第三纪 [ts aw˨˩˦ t i˥˩ s a˥˥ n tɕ i˥˩], 亚历山大 [j a˨˩˦ l i˥˩ ʂ a˥˥ n t a˥˩], 民主党 [m i˧˥ n ʈʂ u˨˩˦ t a˨˩˦ ŋ], 玛窦福音 [m a˨˩˦ t ow˥˩ f u˧˥ i˥˥ n]) | | | k (Occurrences: 12,473; Examples: 老公公 [l aw˨˩˦ k u˥˥ ŋ k u˨ ŋ], 光福镇 [k w a˥˥ ŋ f u˧˥ ʈʂ ə˥˩ n], 科技股份 [kʰ o˥˥ tɕ i˥˩ k u˨˩˦ f ə˥˩ n], 国籍法 [k w o˧˥ tɕ i˧˥ f a˨˩˦]) | ʔ (Occurrences: 5,838; Examples: 二重唱 [ʔ o˥˩ ɻ ʈʂʰ u˧˥ ŋ ʈʂʰ a˥˩ ŋ], 费尔曼 [f ej˥˩ ʔ o˨˩˦ ɻ m a˥˩ n], 岸和田 [ʔ a˥˩ n x o˧˥ tʰ j e˧˥ n], 澳门人 [ʔ aw˥˩ m ə˧˥ n ʐ ə˧˥ n]) |
Affricate | | | ts (Occurrences: 7,195; Examples: 早第三纪 [ts aw˨˩˦ t i˥˩ s a˥˥ n tɕ i˥˩], 音乐茶座 [i˥˥ n ɥ e˥˩ ʈʂʰ a˧˥ ts w o˥˩], 川楝子 [ʈʂʰ w a˥˥ n l j e˥˩ n ts z̩˨˩˦], 代罪羔羊 [t aj˥˩ ts w ej˥˩ k aw˥˥ j a˧˥ ŋ]) | ʈʂ (Occurrences: 15,021; Examples: 装饰城 [ʈʂ w a˥˥ ŋ ʂ ʐ̩˥˩ ʈʂʰ o˧˥ ŋ], 民主党 [m i˧˥ n ʈʂ u˨˩˦ t a˨˩˦ ŋ], 光福镇 [k w a˥˥ ŋ f u˧˥ ʈʂ ə˥˩ n], 周华雄 [ʈʂ ow˥˥ x w a˧˥ ɕ j u˧˥ ŋ]) | tɕ (Occurrences: 20,526; Examples: 早第三纪 [ts aw˨˩˦ t i˥˩ s a˥˥ n tɕ i˥˩], 孔乙己 [kʰ u˨˩˦ ŋ i˨˩˦ tɕ i˨˩˦], 王家桥 [w a˧˥ ŋ tɕ j a˥˥ tɕʰ j aw˧˥], 新篠津 [ɕ i˥˥ n ɕ j aw˨˩˦ tɕ i˥˥ n]) | | |
Sibilant | | | s (Occurrences: 7,824; Examples: 早第三纪 [ts aw˨˩˦ t i˥˩ s a˥˥ n tɕ i˥˩], 十四日 [ʂ ʐ̩˧˥ s z̩˥˩ ʐ̩˥˩], 苏必略湖 [s u˥˥ p i˥˩ l ɥ e˥˩ x u˧˥], 索菲亚 [s w o˨˩˦ f ej˥˥ j a˨˩˦]) / z̩ (Occurrences: 6,100; Examples: 柯尔克孜人 [kʰ o˥˥ ʔ o˨˩˦ ɻ kʰ o˥˩ ts z̩˥˥ ʐ ə˧˥ n], 过日子 [k w o˥˩ ʐ̩˥˩ ts z̩˩], 思想观念 [s z̩˥˥ ɕ j a˨˩˦ ŋ k w a˥˥ n n j e˥˩ n], 形容词 [ɕ i˧˥ ŋ ʐ u˧˥ ŋ tsʰ z̩˧˥]) | ʂ (Occurrences: 19,266; Examples: 装饰城 [ʈʂ w a˥˥ ŋ ʂ ʐ̩˥˩ ʈʂʰ o˧˥ ŋ], 十四日 [ʂ ʐ̩˧˥ s z̩˥˩ ʐ̩˥˩], 亚历山大 [j a˨˩˦ l i˥˩ ʂ a˥˥ n t a˥˩], 文科生 [w ə˧˥ n kʰ o˥˥ ʂ o˥˥ ŋ]) / ʐ (Occurrences: 4,256; Examples: 双人间 [ʂ w a˥˥ ŋ ʐ ə˧˥ n tɕ j e˥˥ n], 瑞士麦片 [ʐ w ej˥˩ ʂ ʐ̩˥˩ m aj˥˩ pʰ j e˥˩ n], 开车人 [kʰ aj˥˥ ʈʂʰ o˥˥ ʐ ə˧˥ n], 九畹遗容 [tɕ j ow˨˩˦ w a˨˩˦ n i˧˥ ʐ u˧˥ ŋ]) / ʐ̩ (Occurrences: 11,304; Examples: 装饰城 [ʈʂ w a˥˥ ŋ ʂ ʐ̩˥˩ ʈʂʰ o˧˥ ŋ], 破魔师 [pʰ w o˥˩ m w o˧˥ ʂ ʐ̩˥˥], 六百九十九 [l j ow˥˩ p aj˨˩˦ tɕ j ow˨˩˦ ʂ ʐ̩˧˥ tɕ j ow˨˩˦], 脚趾头 [tɕ j aw˨˩˦ ʈʂ ʐ̩˨˩˦ tʰ ow˧˥]) | ɕ (Occurrences: 17,554; Examples: 新篠津 [ɕ i˥˥ n ɕ j aw˨˩˦ tɕ i˥˥ n], 小熊维尼 [ɕ j aw˨˩˦ ɕ j u˧˥ ŋ w ej˧˥ n i˧˥], 小毛驴 [ɕ j aw˨˩˦ m aw˧˥ l y˧˥], 周华雄 [ʈʂ ow˥˥ x w a˧˥ ɕ j u˧˥ ŋ]) | | |
Fricative | | f (Occurrences: 8,927; Examples: 比凤姐 [p i˨˩˦ f o˥˩ ŋ tɕ j e˨˩˦], 费尔曼 [f ej˥˩ ʔ o˨˩˦ ɻ m a˥˩ n], 光福镇 [k w a˥˥ ŋ f u˧˥ ʈʂ ə˥˩ n], 科技股份 [kʰ o˥˥ tɕ i˥˩ k u˨˩˦ f ə˥˩ n]) | | | | | |
Approximant | w (Occurrences: 35,507; Examples: 装饰城 [ʈʂ w a˥˥ ŋ ʂ ʐ̩˥˩ ʈʂʰ o˧˥ ŋ], 音乐茶座 [i˥˥ n ɥ e˥˩ ʈʂʰ a˧˥ ts w o˥˩], 王家桥 [w a˧˥ ŋ tɕ j a˥˥ tɕʰ j aw˧˥], 罗汉橙 [l w o˧˥ x a˥˩ n ʈʂʰ o˧˥ ŋ]) | | | ɻ (Occurrences: 2,562; Examples: 二重唱 [ʔ o˥˩ ɻ ʈʂʰ u˧˥ ŋ ʈʂʰ a˥˩ ŋ], 费尔曼 [f ej˥˩ ʔ o˨˩˦ ɻ m a˥˩ n], 柯尔克孜人 [kʰ o˥˥ ʔ o˨˩˦ ɻ kʰ o˥˩ ts z̩˥˥ ʐ ə˧˥ n], 耳机线 [ʔ o˨˩˦ ɻ tɕ i˥˥ ɕ j e˥˩ n]) | j (Occurrences: 43,610; Examples: 王家桥 [w a˧˥ ŋ tɕ j a˥˥ tɕʰ j aw˧˥], 新篠津 [ɕ i˥˥ n ɕ j aw˨˩˦ tɕ i˥˥ n], 比凤姐 [p i˨˩˦ f o˥˩ ŋ tɕ j e˨˩˦], 亚历山大 [j a˨˩˦ l i˥˩ ʂ a˥˥ n t a˥˩]) / ɥ (Occurrences: 6,615; Examples: 音乐茶座 [i˥˥ n ɥ e˥˩ ʈʂʰ a˧˥ ts w o˥˩], 苏必略湖 [s u˥˥ p i˥˩ l ɥ e˥˩ x u˧˥], 原生代 [ɥ e˧˥ n ʂ o˥˥ ŋ t aj˥˩], 神经学 [ʂ ə˧˥ n tɕ i˥˥ ŋ ɕ ɥ e˧˥]) | | |
Lateral | | | l (Occurrences: 18,761; Examples: 老公公 [l aw˨˩˦ k u˥˥ ŋ k u˨ ŋ], 亚历山大 [j a˨˩˦ l i˥˩ ʂ a˥˥ n t a˥˩], 罗汉橙 [l w o˧˥ x a˥˩ n ʈʂʰ o˧˥ ŋ], 小毛驴 [ɕ j aw˨˩˦ m aw˧˥ l y˧˥]) | | | | |
Vowels#
Within a cell, vowel symbols on the left are unrounded and those on the right are rounded.
Height | Front | Near-Front | Central | Near-Back | Back |
---|---|---|---|---|---|
Close | i (Occurrences: 44,773; Examples: 配起来 [pʰ ej˥˩ tɕʰ i˩ l aj˩], 闹脾气 [n aw˥˩ pʰ i˧˥ tɕʰ i˨], 有色玻璃 [j ow˨˩˦ s o˥˩ p w o˥˥ l i˨], 孔乙己 [kʰ u˨˩˦ ŋ i˨˩˦ tɕ i˨˩˦]) / y (Occurrences: 8,736; Examples: 情趣内衣 [tɕʰ i˧˥ ŋ tɕʰ y˥˩ n ej˥˩ i˥˥], 长宁区 [ʈʂʰ a˧˥ ŋ n i˧˥ ŋ tɕʰ y˥˥], 绿豆汤 [l y˥˩ t ow˥˩ tʰ a˥˥ ŋ], 租出去 [ts u˥˥ ʈʂʰ u˥˥ tɕʰ y˨]) | | | | u (Occurrences: 34,067; Examples: 不得好死 [p u˩ t o˧˥ x aw˨˩˦ s z̩˨˩˦], 差不差 [ʈʂʰ a˥˩ p u˩ ʈʂʰ a˥˥], 别有用意 [p j e˧˥ j ow˨˩˦ j u˥˩ ŋ i˥˩], 玛窦福音 [m a˨˩˦ t ow˥˩ f u˧˥ i˥˥ n]) |
Close-Mid | e (Occurrences: 25,091; Examples: 里边儿 [l i˨˩˦ p j e˦ n ʔ o˧˥ ɻ], 今天下午 [tɕ i˥˥ n tʰ j e˥˥ n ɕ j a˥˩ u˨˩˦], 王爷府 [w a˧˥ ŋ j e˨ f u˨˩˦], 打哈欠 [t a˨˩˦ x a˥˥ tɕʰ j e˨ n]) / ej (Occurrences: 12,664; Examples: 张志辉 [ʈʂ a˥˥ ŋ ʈʂ ʐ̩˥˩ x w ej˥˥], 魁北克市 [kʰ w ej˧˥ p ej˨˩˦ kʰ o˥˩ ʂ ʐ̩˥˩], 画眉鸟 [x w a˥˩ m ej˧˥ n j aw˨˩˦], 北沙参 [p ej˨˩˦ ʂ a˥˥ ʂ ə˥˥ n]) | | | | o (Occurrences: 32,056; Examples: 柯尔克孜人 [kʰ o˥˥ ʔ o˨˩˦ ɻ kʰ o˥˩ ts z̩˥˥ ʐ ə˧˥ n], 国籍法 [k w o˧˥ tɕ i˧˥ f a˨˩˦], 外甥女婿 [w aj˥˩ ʂ o˩ ŋ n y˨˩˦ ɕ y˦], 老婆子 [l aw˨˩˦ pʰ w o˦ ts z̩˨˩˦]) / ow (Occurrences: 12,600; Examples: 落马洲 [l w o˥˩ m a˨˩˦ ʈʂ ow˥˥], 九畹遗容 [tɕ j ow˨˩˦ w a˨˩˦ n i˧˥ ʐ u˧˥ ŋ], 宇宙论 [y˨˩˦ ʈʂ ow˥˩ l w ə˥˩ n], 邮箱地址 [j ow˧˥ ɕ j a˥˥ ŋ t i˥˩ ʈʂ ʐ̩˨˩˦]) |
Mid | | | ə (Occurrences: 11,559; Examples: 哈德森湾 [x a˥˥ t o˧˥ s ə˥˥ n w a˥˥ n], 鲍德温 [p aw˥˩ t o˧˥ w ə˥˥ n], 北沙参 [p ej˨˩˦ ʂ a˥˥ ʂ ə˥˥ n], 赶得上 [k a˨˩˦ n t ə˩ ʂ a˥˩ ŋ]) | | |
Open-Mid | | | | | |
Open | a (Occurrences: 56,196; Examples: 玛窦福音 [m a˨˩˦ t ow˥˩ f u˧˥ i˥˥ n], 国籍法 [k w o˧˥ tɕ i˧˥ f a˨˩˦], 比方说 [p i˨˩˦ f a˦ ŋ ʂ w o˥˥], 喇嘛教 [l a˨˩˦ m a˦ tɕ j aw˥˩]) | | | | |
Diphthongs#
aj
aw
ej
ow
Tones#
˥˥
˥˩
˦
˧˥
˨
˨˩˦
˩
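In the example pronunciations on this page, tone appears as a run of tone letters attached to the end of a vowel or syllabic consonant (for instance `aw˨˩˦`). The sketch below separates a phone into its base symbol and tone. It assumes that tone letters only ever occur as a suffix; `split_tone` is a hypothetical helper, and `TONES` simply repeats the seven tones listed above.

```python
# The five tone letters used on this page.
TONE_LETTERS = "˥˦˧˨˩"
# The seven tones listed in the Tones section.
TONES = {"˥˥", "˥˩", "˦", "˧˥", "˨", "˨˩˦", "˩"}


def split_tone(phone):
    """Split a phone into (base, tone); tone is '' when the phone carries none."""
    base = phone.rstrip(TONE_LETTERS)  # strip trailing tone letters only
    return base, phone[len(base):]


# Phones taken from the pronunciation of 小毛驴 on this page.
for phone in "ɕ j aw˨˩˦ m aw˧˥ l y˧˥".split():
    base, tone = split_tone(phone)
    print(repr(base), repr(tone), tone in TONES or tone == "")
```

Grouping phones by base symbol in this way makes it easier to see which tone variants of a vowel already exist before writing a new pronunciation.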