Mandarin (China) MFA dictionary v2.0.0a#
@techreport{mfa_mandarin_china_mfa_dictionary_2022,
author={McAuliffe, Michael and Sonderegger, Morgan},
title={Mandarin (China) MFA dictionary v2.0.0a},
address={\url{https://mfa-models.readthedocs.io/pronunciation dictionary/Mandarin/Mandarin (China) MFA dictionary v2_0_0a.html}},
year={2022},
month={May},
}
Installation#
Install from the MFA command line:
mfa model download dictionary mandarin_china_mfa
Or download from the release page.
The dictionary available from the release page and command line installation has pronunciation and silence probabilities estimated as part of acoustic model training (see Silence probability format and training pronunciation probabilities for more information). If you would like to use the version of this dictionary without probabilities, please see the plain dictionary.
Intended use#
This dictionary is intended for forced alignment of Mandarin Chinese transcripts.
This dictionary uses the MFA phone set for Mandarin, and was used in training the Mandarin MFA acoustic model. Pronunciations can be added on top of the dictionary, as long as no additional phones are introduced.
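One way to respect the "no additional phones" constraint is to compare a candidate pronunciation against the phones already in use before adding it. The following is a minimal sketch, assuming pronunciations are whitespace-separated phone strings. The helper names `collect_phones` and `unknown_phones` are hypothetical (they are not part of MFA), and the sample pronunciations are made up for illustration and may not match the dictionary's actual tokenization.

```python
def collect_phones(pronunciations):
    """Return the set of phones used across whitespace-separated pronunciations."""
    phones = set()
    for pron in pronunciations:
        phones.update(pron.split())
    return phones


def unknown_phones(candidate, phone_set):
    """Return the phones in `candidate` that are absent from `phone_set`, in order."""
    missing = []
    for phone in candidate.split():
        if phone not in phone_set and phone not in missing:
            missing.append(phone)
    return missing


# Illustrative pronunciations only; a real check would read them from the dictionary file.
existing = ["m a˨˩˦", "l i˥˩", "x aw˨˩˦"]
phone_set = collect_phones(existing)

print(unknown_phones("l a˨˩˦", phone_set))  # no new phones
print(unknown_phones("l æ˨˩˦", phone_set))  # introduces a phone not in the set
```

An empty result suggests the entry reuses only existing phones. A non-empty result lists the phones the acoustic model was presumably never trained on.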
Performance Factors#
When trying to get better alignment accuracy, adding pronunciations is generally helpful, especially for different styles and dialects. The most impactful improvements generally come from adding reduced variants in which segments or syllables are deleted, as is common in spontaneous speech. An alignment must include every phone specified in the pronunciation of a word, and each phone has a minimum duration (10 ms by default). If a speaker pronounces a multisyllabic word as a single syllable, MFA may be unable to fit all the segments into the available time, which leads to alignment errors on adjacent words as well.
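The minimum-duration constraint can be illustrated with a little arithmetic: a pronunciation with N phones needs at least N times the per-phone minimum. The sketch below assumes the 10 ms default stated above. The function names and the two phone lists are hypothetical and serve only to contrast a longer full form with a shorter reduced variant.

```python
def min_alignable_ms(phones, min_phone_ms=10):
    """Shortest duration (ms) that can hold every phone at its minimum length."""
    return len(phones) * min_phone_ms


def fits(phones, interval_ms, min_phone_ms=10):
    """True if all phones can be placed inside an interval of `interval_ms`."""
    return interval_ms >= min_alignable_ms(phones, min_phone_ms)


# Hypothetical full form (6 phones) and reduced variant (3 phones).
full_form = ["p", "u˥˩", "ʈʂ", "ʐ̩˥˥", "t", "aw˥˩"]
reduced_form = ["p", "u˥˩", "aw˥˩"]

spoken_ms = 50  # assumed duration of a heavily reduced token
print(min_alignable_ms(full_form), fits(full_form, spoken_ms))
print(min_alignable_ms(reduced_form), fits(reduced_form, spoken_ms))
```

In this toy case the full form needs 60 ms and cannot fit in a 50 ms token, so the aligner would have to borrow time from neighbouring words. The reduced variant needs only 30 ms.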
Ethical considerations#
Deploying any Speech-to-Text model into any production setting has ethical implications. You should consider these implications before use.
Demographic Bias#
You should assume every machine learning model has demographic bias unless proven otherwise. For pronunciation dictionaries, it is often the case that transcription accuracy and lexicon coverage are better for the prestige variety modeled in this dictionary than for other variants. If you are using this dictionary in production, you should acknowledge this as a potential issue.
IPA Charts#
Consonants#
Obstruent symbols on the left of a cell are unvoiced and those on the right are voiced.
Manner | Labial | Labiodental | Alveolar | Retroflex | Palatal | Velar | Glottal |
---|---|---|---|---|---|---|---|
Nasal |
Occurrences: 14,685 Examples: * 接骨木: [t ɕ j e ˥ ˥ k u ˨ ˩ ˦ m u ˥ ˩] * 吉列尔莫: [t ɕ i ˧ ˥ l j e ˥ ˩ ʔ o ˨ ˩ ˦ ɻ m w o ˥ ˩] * 二人民: [ʔ o ˥ ˩ ɻ ʐ ə ˧ ˥ n m i ˧ ˥ n] * 凯勒曼: [k ʰ a j ˨ ˩ ˦ l o ˥ ˩ m a ˥ ˩ n] |
Occurrences: 83,565 Examples: * 氢氟酸: [t ɕ ʰ i ˥ ˥ ŋ f u ˧ ˥ s w a ˥ ˥ n] * 张缵辞: [ʈ ʂ a ˥ ˥ ŋ t s w a ˨ ˩ ˦ n t s ʰ z ̩ ˧ ˥] * 裸体美女: [l w o ˨ ˩ ˦ t ʰ i ˨ ˩ ˦ m e j ˨ ˩ ˦ n y ˨ ˩ ˦] * 米沙鄢: [m i ˨ ˩ ˦ ʂ a ˥ ˥ j e ˥ ˥ n] |
Occurrences: 65,312 Examples: * 脐带绕颈: [t ɕ ʰ i ˧ ˥ t a j ˥ ˩ ʐ a w ˥ ˩ t ɕ i ˨ ˩ ˦ ŋ] * 兴趣爱好: [ɕ i ˥ ˩ ŋ t ɕ ʰ y ˥ ˩ ʔ a j ˥ ˩ x a w ˥ ˩] * 限定版: [ɕ j e ˥ ˩ n t i ˥ ˩ ŋ p a ˨ ˩ ˦ n] * 王龙龙: [w a ˧ ˥ ŋ l u ˧ ˥ ŋ l u ˧ ˥ ŋ] Occurrences: 3 Examples: |
||||
Stop |
Occurrences: 19,842 Examples: * 霍拉布伦: [x w o ˥ ˩ l a ˥ ˥ p u ˥ ˩ l w ə ˧ ˥ n] * 波本威士忌: [p w o ˥ ˥ p ə ˨ ˩ ˦ n w e j ˥ ˥ ʂ ʐ ̩ ˥ ˩ t ɕ i ˥ ˩] * 贝森路: [p e j ˥ ˩ s ə ˥ ˥ n l u ˥ ˩] * 格勒诺布尔: [k o ˧ ˥ l o ˥ ˩ n w o ˥ ˩ p u ˥ ˩ ʔ o ˨ ˩ ˦ ɻ] |
Occurrences: 23,089 Examples: * 机械唯物论: [t ɕ i ˥ ˥ ɕ j e ˥ ˩ w e j ˧ ˥ u ˥ ˩ l w ə ˥ ˩ n] * 德才兼备: [t o ˧ ˥ t s ʰ a j ˧ ˥ t ɕ j e ˥ ˥ n p e j ˥ ˩] * 可穿戴式: [k ʰ o ˨ ˩ ˦ ʈ ʂ ʰ w a ˥ ˥ n t a j ˥ ˩ ʂ ʐ ̩ ˥ ˩] * 圣热内德: [ʂ o ˥ ˩ ŋ ʐ o ˥ ˩ n e j ˥ ˩ t o ˧ ˥] |
Occurrences: 16,386 Examples: * 圣克莱: [ʂ o ˥ ˩ ŋ k ʰ o ˥ ˩ l a j ˧ ˥] * 共鸣音: [k u ˥ ˩ ŋ m i ˧ ˥ ŋ i ˥ ˥ n] * 回头客: [x w e j ˧ ˥ t ʰ o w ˧ ˥ k ʰ o ˥ ˩] * 关节痛: [k w a ˥ ˥ n t ɕ j e ˧ ˥ t ʰ u ˥ ˩ ŋ] |
Occurrences: 9,944 Examples: * 康瓦尔一词: [k ʰ a ˥ ˥ ŋ w a ˨ ˩ ˦ ʔ o ˨ ˩ ˦ ɻ i ˥ ˥ t s ʰ z ̩ ˧ ˥] * 凹凸性: [ʔ a w ˥ ˥ t ʰ u ˥ ˥ ɕ i ˥ ˩ ŋ] * 二十种: [ʔ o ˥ ˩ ɻ ʂ ʐ ̩ ˧ ˥ ʈ ʂ u ˨ ˩ ˦ ŋ] * 马斯基埃: [m a ˨ ˩ ˦ s z ̩ ˥ ˥ t ɕ i ˥ ˥ ʔ a j ˥ ˥] |
|||
Affricate |
Occurrences: 9,249 Examples: * 奉子成婚: [f o ˥ ˩ ŋ t s z ̩ ˨ ˩ ˦ ʈ ʂ ʰ o ˧ ˥ ŋ x w ə ˥ ˥ n] * 巨乌贼: [t ɕ y ˥ ˩ u ˥ ˥ t s e j ˧ ˥] * 椰子糖: [j e ˥ ˥ t s z ̩ ˨ t ʰ a ˧ ˥ ŋ] * 原住民族: [ɥ e ˧ ˥ n ʈ ʂ u ˥ ˩ m i ˧ ˥ n t s u ˧ ˥] |
Occurrences: 20,399 Examples: * 沉没成本: [ʈ ʂ ʰ ə ˧ ˥ n m w o ˥ ˩ ʈ ʂ ʰ o ˧ ˥ ŋ p ə ˨ ˩ ˦ n] * 指派性别: [ʈ ʂ ʐ ̩ ˨ ˩ ˦ p ʰ a j ˥ ˩ ɕ i ˥ ˩ ŋ p j e ˧ ˥] * 赤水市: [ʈ ʂ ʰ ʐ ̩ ˥ ˩ ʂ w e j ˨ ˩ ˦ ʂ ʐ ̩ ˥ ˩] * 陆朝安人: [l u ˥ ˩ ʈ ʂ ʰ a w ˧ ˥ ʔ a ˥ ˥ n ʐ ə ˧ ˥ n] |
Occurrences: 26,412 Examples: * 加拿大籍: [t ɕ j a ˥ ˥ n a ˧ ˥ t a ˥ ˩ t ɕ i ˧ ˥] * 军中从事: [t ɕ y ˥ ˥ n ʈ ʂ u ˥ ˥ ŋ t s ʰ u ˧ ˥ ŋ ʂ ʐ ̩ ˥ ˩] * 教士队: [t ɕ j a w ˥ ˩ ʂ ʐ ̩ ˥ ˩ t w e j ˥ ˩] * 工商业界: [k u ˥ ˥ ŋ ʂ a ˥ ˥ ŋ j e ˥ ˩ t ɕ j e ˥ ˩] |
||||
Sibilant |
Occurrences: 11,751 Examples: * 座囊菌: [t s w o ˥ ˩ n a ˧ ˥ ŋ t ɕ y ˥ ˥ n] * 苏宏杰: [s u ˥ ˥ x u ˧ ˥ ŋ t ɕ j e ˧ ˥] * 第三张: [t i ˥ ˩ s a ˥ ˥ n ʈ ʂ a ˥ ˥ ŋ] * 大汉族主义: [t a ˥ ˩ x a ˥ ˩ n t s u ˧ ˥ ʈ ʂ u ˨ ˩ ˦ i ˥ ˩] Occurrences: 0 Examples: * 四氯化碳: [s z ̩ ˥ ˩ l y ˥ ˩ x w a ˥ ˩ t ʰ a ˥ ˩ n] * 五四路: [u ˨ ˩ ˦ s z ̩ ˥ ˩ l u ˥ ˩] * 一九四七: [i ˥ ˥ t ɕ j o w ˨ ˩ ˦ s z ̩ ˥ ˩ t ɕ ʰ i ˥ ˥] * 肚子疼: [t u ˥ ˩ t s z ̩ ˩ t ʰ o ˧ ˥ ŋ] Occurrences: 8,954 Examples: * 四十八: [s z ̩ ˥ ˩ ʂ ʐ ̩ ˧ ˥ p a ˥ ˥] * 有会子: [j o w ˨ ˩ ˦ x w e j ˨ ˩ ˦ t s z ̩ ˦] * 不定冠词: [p u ˥ ˩ t i ˥ ˩ ŋ k w a ˥ ˩ n t s ʰ z ̩ ˧ ˥] * 补助词: [p u ˨ ˩ ˦ ʈ ʂ u ˥ ˩ t s ʰ z ̩ ˧ ˥] |
Occurrences: 25,877 Examples: * 鹤成鸟: [x o ˥ ˩ ʈ ʂ ʰ o ˧ ˥ ŋ n j a w ˨ ˩ ˦] * 扎莫希奇: [ʈ ʂ a ˥ ˥ m w o ˥ ˩ ɕ i ˥ ˥ t ɕ ʰ i ˧ ˥] * 苏州话: [s u ˥ ˥ ʈ ʂ o w ˥ ˥ x w a ˥ ˩] * 上不网: [ʂ a ˥ ˩ ŋ p u ˥ ˩ w a ˨ ˩ ˦ ŋ] Occurrences: 5,760 Examples: * 五百八十二: [u ˨ ˩ ˦ p a j ˨ ˩ ˦ p a ˥ ˥ ʂ ʐ ̩ ˧ ˥ ʔ o ˥ ˩ ɻ] * 张瑞芬: [ʈ ʂ a ˥ ˥ ŋ ʐ w e j ˥ ˩ f ə ˥ ˥ n] * 第七十: [t i ˥ ˩ t ɕ ʰ i ˥ ˥ ʂ ʐ ̩ ˧ ˥] * 注意事项: [ʈ ʂ u ˥ ˩ i ˥ ˩ ʂ ʐ ̩ ˥ ˩ ɕ j a ˥ ˩ ŋ] Occurrences: 15,221 Examples: * 指导科: [ʈ ʂ ʐ ̩ ˨ ˩ ˦ t a w ˨ ˩ ˦ k ʰ o ˥ ˥] * 松江市: [s u ˥ ˥ ŋ t ɕ j a ˥ ˥ ŋ ʂ ʐ ̩ ˥ ˩] * 麻醉师: [m a ˧ ˥ t s w e j ˥ ˩ ʂ ʐ ̩ ˥ ˥] * 蒂尔施内克: [t i ˥ ˩ ʔ o ˨ ˩ ˦ ɻ ʂ ʐ ̩ ˥ ˥ n e j ˥ ˩ k ʰ o ˥ ˩] |
Occurrences: 23,534 Examples: * 公家庙: [k u ˥ ˥ ŋ t ɕ j a ˥ ˥ m j a w ˥ ˩] * 最低价: [t s w e j ˥ ˩ t i ˥ ˥ t ɕ j a ˥ ˩] * 接电话: [t ɕ j e ˥ ˥ t j e ˥ ˩ n x w a ˥ ˩] * 勒谢拉尔: [l o ˥ ˩ ɕ j e ˥ ˩ l a ˥ ˥ ʔ o ˨ ˩ ˦] |
||||
Fricative |
Occurrences: 12,061 Examples: * 犯错误: [f a ˥ ˩ n t s ʰ w o ˥ ˩ u ˥ ˩] * 阿尔法: [ʔ a ˥ ˥ ʔ o ˨ ˩ ˦ ɻ f a ˨ ˩ ˦] * 三八妇女节: [s a ˥ ˥ n p a ˥ ˥ f u ˥ ˩ n y ˨ ˩ ˦ t ɕ j e ˧ ˥] * 翻跟头: [f a ˥ ˥ n k ə ˥ ˥ n t ʰ o w ˨] |
||||||
Approximant |
Occurrences: 47,672 Examples: * 反作用力: [f a ˨ ˩ ˦ n t s w o ˥ ˩ j u ˥ ˩ ŋ l i ˥ ˩] * 循声望去: [ɕ y ˧ ˥ n ʂ o ˥ ˥ ŋ w a ˥ ˩ ŋ t ɕ ʰ y ˥ ˩] * 欧洲人: [ʔ o w ˥ ˥ ʈ ʂ o w ˥ ˥ ʐ ə ˧ ˥ n] * 有法可依: [j o w ˨ ˩ ˦ f a ˨ ˩ ˦ k ʰ o ˨ ˩ ˦ i ˥ ˥] |
Occurrences: 4,556 Examples: * 齐齐哈尔市: [t ɕ ʰ i ˧ ˥ t ɕ ʰ i ˧ ˥ x a ˥ ˥ ʔ o ˨ ˩ ˦ ɻ ʂ ʐ ̩ ˥ ˩] * 苏德尔堡: [s u ˥ ˥ t o ˧ ˥ ʔ o ˨ ˩ ˦ ɻ p a w ˨ ˩ ˦] * 埃尔南德斯: [ʔ a j ˥ ˥ ʔ o ˨ ˩ ˦ ɻ n a ˧ ˥ n t o ˧ ˥ s z ̩ ˥ ˥] * 奥费尔格: [ʔ a w ˥ ˩ f e j ˥ ˩ ʔ o ˨ ˩ ˦ ɻ k o ˧ ˥] |
Occurrences: 56,334 Examples: * 维修点: [w e j ˧ ˥ ɕ j o w ˥ ˥ t j e ˨ ˩ ˦ n] * 请假条: [t ɕ ʰ i ˨ ˩ ˦ ŋ t ɕ j a ˥ ˩ t ʰ j a w ˧ ˥] * 黑石头: [x e j ˥ ˥ ʂ ʐ ̩ ˧ ˥ t ʰ o w ˨] * 黄嘌呤: [x w a ˧ ˥ ŋ p ʰ j a w ˥ ˩ l i ˥ ˩ ŋ] Occurrences: 8,518 Examples: * 确认书: [t ɕ ʰ ɥ e ˥ ˩ ʐ ə ˥ ˩ n ʂ u ˥ ˥] * 商学院: [ʂ a ˥ ˥ ŋ ɕ ɥ e ˧ ˥ ɥ e ˥ ˩ n] * 齐宣王: [t ɕ ʰ i ˧ ˥ ɕ ɥ e ˥ ˥ n w a ˧ ˥ ŋ] * 翰林院: [x a ˥ ˩ n l i ˧ ˥ n ɥ e ˥ ˩ n] |
||||
Lateral |
Occurrences: 27,138 Examples: * 留守司: [l j o w ˧ ˥ ʂ o w ˨ ˩ ˦ s z ̩ ˥ ˥] * 职业联赛: [ʈ ʂ ʐ ̩ ˧ ˥ j e ˥ ˩ l j e ˧ ˥ n s a j ˥ ˩] * 流浮山: [l j o w ˧ ˥ f u ˧ ˥ ʂ a ˥ ˥ n] * 芦草坡: [l u ˧ ˥ t s ʰ a w ˨ ˩ ˦ p ʰ w o ˥ ˥] |
Vowels#
Vowel symbols on the left of a cell are unrounded and those on the right are rounded.
| | Front | Near-Front | Central | Near-Back | Back |
---|---|---|---|---|---|
Close |
Occurrences: 58,970 Examples: * 请说出: [t ɕ ʰ i ˨ ˩ ˦ ŋ ʂ w o ˥ ˥ ʈ ʂ ʰ u ˥ ˥] * 热土地: [ʐ o ˥ ˩ t ʰ u ˨ ˩ ˦ t i ˦] * 记不起来: [t ɕ i ˥ ˩ p u ˥ ˩ t ɕ ʰ i ˩ l a j ˩] * 西兰岛: [ɕ i ˥ ˥ l a ˧ ˥ n t a w ˨ ˩ ˦] Occurrences: 11,975 Examples: * 选举史: [ɕ ɥ e ˨ ˩ ˦ n t ɕ y ˨ ˩ ˦ ʂ ʐ ̩ ˨ ˩ ˦] * 猴屿乡: [x o w ˧ ˥ y ˨ ˩ ˦ ɕ j a ˥ ˥ ŋ] * 军事法院: [t ɕ y ˥ ˥ n ʂ ʐ ̩ ˥ ˩ f a ˨ ˩ ˦ ɥ e ˥ ˩ n] * 外甥女婿: [w a j ˥ ˩ ʂ o ˩ ŋ n y ˨ ˩ ˦ ɕ y ˦] |
Occurrences: 46,526 Examples: * 五坡岭: [u ˨ ˩ ˦ p ʰ w o ˥ ˥ l i ˨ ˩ ˦ ŋ] * 水浒传: [ʂ w e j ˨ ˩ ˦ x u ˨ ˩ ˦ ʈ ʂ w a ˥ ˩ n] * 小拇指: [ɕ j a w ˨ ˩ ˦ m u ˩ ʈ ʂ ʐ ̩ ˨ ˩ ˦] * 有恃无恐: [j o w ˨ ˩ ˦ ʂ ʐ ̩ ˥ ˩ u ˧ ˥ k ʰ u ˨ ˩ ˦ ŋ] |
|||
Close-Mid |
Occurrences: 33,380 Examples: * 老爷子: [l a w ˨ ˩ ˦ j e ˦ t s z ̩ ˨ ˩ ˦] * 佳兆业: [t ɕ j a ˥ ˥ ʈ ʂ a w ˥ ˩ j e ˥ ˩] * 墨竹工卡县: [m w o ˥ ˩ ʈ ʂ u ˧ ˥ k u ˥ ˥ ŋ k ʰ a ˨ ˩ ˦ ɕ j e ˥ ˩ n] * 犬儒主义: [t ɕ ʰ ɥ e ˨ ˩ ˦ n ʐ u ˧ ˥ ʈ ʂ u ˨ ˩ ˦ i ˥ ˩] Occurrences: 17,454 Examples: * 悬壅垂: [ɕ ɥ e ˧ ˥ n j u ˥ ˥ ŋ ʈ ʂ ʰ w e j ˧ ˥] * 哥哥妹妹: [k o ˥ ˥ k o ˥ ˥ m e j ˥ ˩ m e j ˩] * 大内氏: [t a ˥ ˩ n e j ˥ ˩ ʂ ʐ ̩ ˥ ˩] * 维也纳: [w e j ˧ ˥ j e ˨ ˩ ˦ n a ˥ ˩] |
Occurrences: 47,652 Examples: * 约定俗成: [ɥ e ˥ ˥ t i ˥ ˩ ŋ s u ˧ ˥ ʈ ʂ ʰ o ˧ ˥ ŋ] * 没见过世面: [m e j ˧ ˥ t ɕ j e ˥ ˩ n k w o ˩ ʂ ʐ ̩ ˥ ˩ m j e ˥ ˩ n] * 咬耳朵: [j a w ˨ ˩ ˦ ʔ o ˨ ˩ ˦ ɻ t w o ˦] * 怕老婆: [p ʰ a ˥ ˩ l a w ˨ ˩ ˦ p ʰ w o ˦] Occurrences: 15,663 Examples: * 刘家良: [l j o w ˧ ˥ t ɕ j a ˥ ˥ l j a ˧ ˥ ŋ] * 后头湾: [x o w ˥ ˩ t ʰ o w ˩ w a ˥ ˥ n] * 木头村: [m u ˥ ˩ t ʰ o w ˩ t s ʰ w ə ˥ ˥ n] * 熘冰场: [l j o w ˥ ˥ p i ˥ ˥ ŋ ʈ ʂ ʰ a ˧ ˥ ŋ] |
|||
Occurrences: 15,745 Examples: * 洛神花: [l w o ˥ ˩ ʂ ə ˧ ˥ n x w a ˥ ˥] * 志愿者们: [ʈ ʂ ʐ ̩ ˥ ˩ ɥ e ˥ ˩ n ʈ ʂ o ˨ ˩ ˦ m ə ˦ n] * 博士论文: [p w o ˧ ˥ ʂ ʐ ̩ ˥ ˩ l w ə ˥ ˩ n w ə ˧ ˥ n] * 查本机: [ʈ ʂ ʰ a ˧ ˥ p ə ˨ ˩ ˦ n t ɕ i ˥ ˥] |
|||||
Open-Mid |
|||||
Open |
Occurrences: 75,953 Examples: * 早上好: [t s a w ˨ ˩ ˦ ʂ a ˦ ŋ x a w ˨ ˩ ˦] * 空间站: [k ʰ u ˥ ˥ ŋ t ɕ j e ˥ ˥ n ʈ ʂ a ˥ ˩ n] * 哪里话: [n a ˨ ˩ ˦ l i ˨ ˩ ˦ x w a ˥ ˩] * 埃托瓦县: [ʔ a j ˥ ˥ t ʰ w o ˥ ˥ w a ˨ ˩ ˦ ɕ j e ˥ ˩ n] |
Diphthongs#
aj
aw
ej
ow
Tones#
˥˥
˥˩
˦
˧˥
˨
˨˩˦
˩
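The tone symbols above are IPA tone letters, each marking one of five pitch levels, from ˩ (lowest) to ˥ (highest). Readers more used to Chao tone numerals can convert between the two notations. The sketch below is only a notational aid. The mapping table and the function name `tone_to_digits` are not part of MFA.

```python
# Each IPA tone letter corresponds to one pitch level on the 1-5 Chao scale.
CHAO_DIGITS = {"˥": "5", "˦": "4", "˧": "3", "˨": "2", "˩": "1"}


def tone_to_digits(tone):
    """Convert a string of tone letters (e.g. a contour) to Chao numerals."""
    return "".join(CHAO_DIGITS[letter] for letter in tone)


for tone in ["˥˥", "˥˩", "˦", "˧˥", "˨", "˨˩˦", "˩"]:
    print(tone, tone_to_digits(tone))
```

For example, the contour ˨˩˦ becomes 214 and ˥˩ becomes 51.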