Mandarin (Erhua) MFA dictionary v2.0.0a#
@techreport{mfa_mandarin_erhua_mfa_dictionary_2022,
author={McAuliffe, Michael and Sonderegger, Morgan},
title={Mandarin (Erhua) MFA dictionary v2.0.0a},
address={\url{https://mfa-models.readthedocs.io/pronunciation dictionary/Mandarin/Mandarin (Erhua) MFA dictionary v2_0_0a.html}},
year={2022},
month={May},
}
Installation#
Install from the MFA command line:
mfa model download dictionary mandarin_erhua_mfa
Or download from the release page.
The dictionary available from the release page and from command line installation includes pronunciation and silence probabilities estimated as part of acoustic model training (see Silence probability format and training pronunciation probabilities for more information). If you would like to use the version of this dictionary without probabilities, please see the plain dictionary.
Intended use#
This dictionary is intended for forced alignment of Mandarin Chinese transcripts.
This dictionary uses the MFA phone set for Mandarin, and was used in training the Mandarin MFA acoustic model. Pronunciations can be added on top of the dictionary, as long as no additional phones are introduced.
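One way to guard against introducing new phones is to compare each added pronunciation against the phones already present in the dictionary. The sketch below assumes a plain dictionary layout of one entry per line, with the word, a tab, and space-separated phones. The function names and the toy entries are hypothetical illustrations, not part of MFA and not copied from the actual dictionary.

```python
def phone_set(dictionary_lines):
    """Collect every phone used in 'word<TAB>phone phone ...' lines."""
    phones = set()
    for line in dictionary_lines:
        line = line.strip()
        if not line:
            continue
        _word, pronunciation = line.split("\t", 1)
        phones.update(pronunciation.split())
    return phones


def unknown_phones(new_lines, known):
    """Map each new word to the phones that are not in the known set."""
    problems = {}
    for line in new_lines:
        line = line.strip()
        if not line:
            continue
        word, pronunciation = line.split("\t", 1)
        extra = [p for p in pronunciation.split() if p not in known]
        if extra:
            problems.setdefault(word, []).extend(extra)
    return problems


# Toy data (hypothetical): two existing entries and two candidate additions.
existing = ["妈\tm a˥˥", "他\ttʰ a˥˥"]
additions = ["骂\tm a˥˩", "她\ttʰ a˥˥"]

problems = unknown_phones(additions, phone_set(existing))
for word, extra in problems.items():
    print(f"{word}: phones not in the dictionary: {' '.join(extra)}")
```

An empty result means every added pronunciation stays within the existing phone set. With the full dictionary, `existing` would be read from the plain dictionary file instead of a toy list.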
Performance Factors#
When trying to get better alignment accuracy, adding pronunciations is generally helpful, especially for different styles and dialects. The most impactful improvements will generally come from adding reduced variants that delete segments or syllables, as is common in spontaneous speech. Alignment must include all phones specified in the pronunciation of a word, and each phone has a minimum duration (10 ms by default). If a speaker pronounces a multisyllabic word as just a single syllable, MFA may not be able to fit all the segments into the available audio, which leads to alignment errors on adjacent words as well.
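The minimum-duration constraint can be made concrete with a little arithmetic: a pronunciation with N phones needs at least N times the per-phone minimum. The sketch below uses the 10 ms default mentioned above. The function names, the phone lists, and the 50 ms interval are hypothetical values chosen for illustration.

```python
def min_word_duration_ms(phones, min_phone_ms=10):
    """Shortest interval that can hold every phone of a pronunciation."""
    return len(phones) * min_phone_ms


def can_fit(phones, interval_ms, min_phone_ms=10):
    """True if the spoken interval is long enough for all phones."""
    return interval_ms >= min_word_duration_ms(phones, min_phone_ms)


# Hypothetical full and reduced pronunciations of the same word.
full_variant = ["p1", "p2", "p3", "p4", "p5", "p6", "p7", "p8"]
reduced_variant = ["p1", "p2", "p3"]

spoken_ms = 50  # hypothetical duration the speaker actually produced

print(min_word_duration_ms(full_variant))     # 8 phones * 10 ms
print(can_fit(full_variant, spoken_ms))       # too short for the full form
print(can_fit(reduced_variant, spoken_ms))    # the reduced form fits
```

When only the full variant is listed, the aligner has to take the missing time from neighboring words, which is why adding the reduced variant tends to help.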
Ethical considerations#
Deploying any Speech-to-Text model into any production setting has ethical implications. You should consider these implications before use.
Demographic Bias#
You should assume every machine learning model has demographic bias unless proven otherwise. For pronunciation dictionaries, it is often the case that transcription accuracy and lexicon coverage are better for the prestige variety modeled in this dictionary than for other variants. If you are using this dictionary in production, you should acknowledge this as a potential issue.
IPA Charts#
Consonants#
Within each cell, obstruent symbols on the left are unvoiced and those on the right are voiced.
Manner |
Labial |
Labiodental |
Alveolar |
Retroflex |
Palatal |
Velar |
Glottal |
---|---|---|---|---|---|---|---|
Nasal |
Occurrences: 7,250 Examples: * 语言联盟: [y ˨ ˩ ˦ j e ˧ ˥ n l j e ˧ ˥ n m o ˧ ˥ ŋ] * 卖猪仔: [m a j ˥ ˩ ʈ ʂ u ˥ ˥ t s a j ˨ ˩ ˦] * 母乳餵养: [m u ˨ ˩ ˦ ʐ u ˨ ˩ ˦ w e j ˥ ˩ j a ˨ ˩ ˦ ŋ] * 脑膜炎: [n a w ˨ ˩ ˦ m w o ˧ ˥ ɻ j e ˧ ˥ n] |
Occurrences: 38,951 Examples: * 投资人: [t ʰ o w ˧ ˥ t s z ̩ ˥ ˥ ʐ ə ˧ ˥ n] * 心眼儿: [ɕ i ˥ ˥ n j a ˨ ˩ ˦ ɻ] * 万事得: [w a ˥ ˩ n ʂ ʐ ̩ ˥ ˩ t o ˧ ˥] * 天成四年: [t ʰ j e ˥ ˥ n ʈ ʂ ʰ o ˧ ˥ ŋ s z ̩ ˥ ˩ n j e ˧ ˥ n] |
Occurrences: 30,203 Examples: * 家庭暴力: [t ɕ j a ˥ ˥ t ʰ i ˧ ˥ ŋ p a w ˥ ˩ l i ˥ ˩] * 共时语言学: [k u ˥ ˩ ŋ ʂ ʐ ̩ ˧ ˥ y ˨ ˩ ˦ j e ˧ ˥ n ɕ ɥ e ˧ ˥] * 生成元: [ʂ o ˥ ˥ ŋ ʈ ʂ ʰ o ˧ ˥ ŋ ɥ e ˧ ˥ n] * 盲肠炎: [m a ˧ ˥ ŋ ʈ ʂ ʰ a ˧ ˥ ŋ j e ˧ ˥ n] Occurrences: 2 Examples: |
||||
Stop |
Occurrences: 9,456 Examples: * 反对派: [f a ˨ ˩ ˦ n t w e j ˥ ˩ p ʰ a j ˥ ˩] * 癞皮狗: [l a j ˥ ˩ p ʰ i ˧ ˥ k o w ˨ ˩ ˦] * 社会性别: [ʂ o ˥ ˩ x w e j ˥ ˩ ɕ i ˥ ˩ ŋ p j e ˧ ˥] * 阿布达比: [ʔ a ˥ ˥ p u ˥ ˩ t a ˧ ˥ p i ˨ ˩ ˦] |
Occurrences: 11,104 Examples: * 我的世界: [w o ˨ ˩ ˦ t ə ˦ ʂ ʐ ̩ ˥ ˩ t ɕ j e ˥ ˩] * 首都丘: [ʂ o w ˨ ˩ ˦ t u ˥ ˥ t ɕ ʰ j o w ˥ ˥] * 基础设施: [t ɕ i ˥ ˥ ʈ ʂ ʰ u ˨ ˩ ˦ ʂ o ˥ ˩ ʂ ʐ ̩ ˥ ˥] * 市场价值: [ʂ ʐ ̩ ˥ ˩ ʈ ʂ ʰ a ˧ ˥ ŋ t ɕ j a ˥ ˩ ʈ ʂ ʐ ̩ ˧ ˥] |
Occurrences: 8,230 Examples: * 可靠性: [k ʰ o ˨ ˩ ˦ k ʰ a w ˥ ˩ ɕ i ˥ ˩ ŋ] * 古气候学家: [k u ˨ ˩ ˦ t ɕ ʰ i ˥ ˩ x o w ˥ ˩ ɕ ɥ e ˧ ˥ t ɕ j a ˥ ˥] * 红圪垯: [x u ˧ ˥ ŋ k o ˥ ˥ t a ˨] * 考纳斯: [k ʰ a w ˨ ˩ ˦ n a ˥ ˩ s z ̩ ˥ ˥] |
Occurrences: 3,528 Examples: * 艾依河: [ʔ a j ˥ ˩ i ˥ ˥ x o ˧ ˥] * 奥林匹斯山: [ʔ a w ˥ ˩ l i ˧ ˥ n p ʰ i ˨ ˩ ˦ s z ̩ ˥ ˥ ʂ a ˥ ˥ n] * 爱国主义: [ʔ a j ˥ ˩ k w o ˧ ˥ ʈ ʂ u ˨ ˩ ˦ i ˥ ˩] * 卡尔瓦多斯: [k ʰ a ˨ ˩ ˦ ʔ o ˨ ˩ ˦ ɻ w a ˨ ˩ ˦ t w o ˥ ˥ s z ̩ ˥ ˥] |
|||
Affricate |
Occurrences: 4,901 Examples: * 最高峰: [t s w e j ˥ ˩ k a w ˥ ˥ f o ˥ ˥ ŋ] * 除此以外: [ʈ ʂ ʰ u ˧ ˥ t s ʰ z ̩ ˨ ˩ ˦ i ˨ ˩ ˦ w a j ˥ ˩] * 两下子: [l j a ˨ ˩ ˦ ŋ ɕ j a ˥ ˩ t s z ̩ ˩] * 鸭跖草: [j a ˥ ˥ ʈ ʂ ʐ ̩ ˧ ˥ t s ʰ a w ˨ ˩ ˦] |
Occurrences: 9,318 Examples: * 搭帐篷: [t a ˥ ˥ ʈ ʂ a ˥ ˩ ŋ p ʰ o ˧ ˥ ŋ] * 常态化: [ʈ ʂ ʰ a ˧ ˥ ŋ t ʰ a j ˥ ˩ x w a ˥ ˩] * 和平共处: [x o ˧ ˥ p ʰ i ˧ ˥ ŋ k u ˥ ˩ ŋ ʈ ʂ ʰ u ˨ ˩ ˦] * 等差中项: [t o ˨ ˩ ˦ ŋ ʈ ʂ ʰ a ˥ ˩ ʈ ʂ u ˥ ˥ ŋ ɕ j a ˥ ˩ ŋ] |
Occurrences: 13,180 Examples: * 上加成素: [ʂ a ˥ ˩ ŋ t ɕ j a ˥ ˥ ʈ ʂ ʰ o ˧ ˥ ŋ s u ˥ ˩] * 万叶集: [w a ˥ ˩ n j e ˥ ˩ t ɕ i ˧ ˥] * 强巴拉康: [t ɕ ʰ j a ˧ ˥ ŋ p a ˥ ˥ l a ˥ ˥ k ʰ a ˥ ˥ ŋ] * 均州路: [t ɕ y ˥ ˥ n ʈ ʂ o w ˥ ˥ l u ˥ ˩] |
||||
Sibilant |
Occurrences: 4,858 Examples: * 大难不死: [t a ˥ ˩ n a ˥ ˩ n p u ˥ ˩ s z ̩ ˨ ˩ ˦] * 侍从官: [ʂ ʐ ̩ ˥ ˩ t s ʰ u ˧ ˥ ŋ k w a ˥ ˥ n] * 犯罪分子: [f a ˥ ˩ n t s w e j ˥ ˩ f ə ˥ ˩ n t s z ̩ ˨ ˩ ˦] * 藤井寺: [t ʰ o ˧ ˥ ŋ t ɕ i ˨ ˩ ˦ ŋ s z ̩ ˥ ˩] Occurrences: 0 Examples: * 人称代名词: [ʐ ə ˧ ˥ n ʈ ʂ ʰ o ˥ ˥ ŋ t a j ˥ ˩ m i ˧ ˥ ŋ t s ʰ z ̩ ˧ ˥] * 话匣子: [x w a ˥ ˩ ɕ j a ˧ ˥ t s z ̩ ˨] * 自我控制: [t s z ̩ ˥ ˩ w o ˨ ˩ ˦ k ʰ u ˥ ˩ ŋ ʈ ʂ ʐ ̩ ˥ ˩] * 姊妹城市: [t s z ̩ ˨ ˩ ˦ m e j ˥ ˩ ʈ ʂ ʰ o ˧ ˥ ŋ ʂ ʐ ̩ ˥ ˩] Occurrences: 4,177 Examples: * 孔子学院: [k ʰ u ˨ ˩ ˦ ŋ t s z ̩ ˨ ˩ ˦ ɕ ɥ e ˧ ˥ ɥ e ˥ ˩ n] * 堂姊妹: [t ʰ a ˧ ˥ ŋ t s z ̩ ˨ ˩ ˦ m e j ˥ ˩] * 构词学: [k o w ˥ ˩ t s ʰ z ̩ ˧ ˥ ɕ ɥ e ˧ ˥] * 死亡人数: [s z ̩ ˨ ˩ ˦ w a ˧ ˥ ŋ ʐ ə ˧ ˥ n ʂ u ˥ ˩] |
Occurrences: 12,125 Examples: * 老老实实: [l a w ˨ ˩ ˦ l a w ˨ ˩ ˦ ʂ ʐ ̩ ˧ ˥ ʂ ʐ ̩ ˧ ˥] * 马克思主义: [m a ˨ ˩ ˦ k ʰ o ˥ ˩ s z ̩ ˥ ˥ ʈ ʂ u ˨ ˩ ˦ i ˥ ˩] * 圣劳伦斯河: [ʂ o ˥ ˩ ŋ l a w ˧ ˥ l w ə ˧ ˥ n s z ̩ ˥ ˥ x o ˧ ˥] * 禁忌证: [t ɕ i ˥ ˩ n t ɕ i ˥ ˩ ʈ ʂ o ˥ ˩ ŋ] Occurrences: 2,735 Examples: * 採石场: [t s ʰ a j ˨ ˩ ˦ ʂ ʐ ̩ ˧ ˥ ʈ ʂ ʰ a ˧ ˥ ŋ] * 烧乳猪: [ʂ a w ˥ ˥ ʐ u ˨ ˩ ˦ ʈ ʂ u ˥ ˥] * 小布什: [ɕ j a w ˨ ˩ ˦ p u ˥ ˩ ʂ ʐ ̩ ˧ ˥] * 制作人: [ʈ ʂ ʐ ̩ ˥ ˩ t s w o ˥ ˩ ʐ ə ˧ ˥ n] Occurrences: 6,796 Examples: * 蒋风之: [t ɕ j a ˨ ˩ ˦ ŋ f o ˥ ˥ ŋ ʈ ʂ ʐ ̩ ˥ ˥] * 不值得: [p u ˥ ˩ ʈ ʂ ʐ ̩ ˧ ˥ t o ˧ ˥] * 蜘蛛网: [ʈ ʂ ʐ ̩ ˥ ˥ ʈ ʂ u ˥ ˥ w a ˨ ˩ ˦ ŋ] * 捕蝇纸: [p u ˨ ˩ ˦ i ˧ ˥ ŋ ʈ ʂ ʐ ̩ ˨ ˩ ˦] |
Occurrences: 11,084 Examples: * 相模原: [ɕ j a ˥ ˩ ŋ m w o ˧ ˥ ɥ e ˧ ˥ n] * 无条件: [u ˧ ˥ t ʰ j a w ˧ ˥ t ɕ j e ˥ ˩ n] * 人均收入: [ʐ ə ˧ ˥ n t ɕ y ˥ ˥ n ʂ o w ˥ ˥ ʐ u ˥ ˩] * 小夜曲: [ɕ j a w ˨ ˩ ˦ j e ˥ ˩ t ɕ ʰ y ˥ ˥] |
||||
Fricative |
Occurrences: 5,861 Examples: * 良附丸: [l j a ˧ ˥ ŋ f u ˥ ˩ w a ˧ ˥ n] * 东非洲: [t u ˥ ˥ ŋ f e j ˥ ˥ ʈ ʂ o w ˥ ˥] * 发行人: [f a ˥ ˥ ɕ i ˧ ˥ ŋ ʐ ə ˧ ˥ n] * 麻婆豆腐: [m a ˧ ˥ p ʰ w o ˧ ˥ t o w ˥ ˩ f u ˩] |
||||||
Approximant |
Occurrences: 22,464 Examples: * 热化学卡: [ʐ o ˥ ˩ x w a ˥ ˩ ɕ ɥ e ˧ ˥ k ʰ a ˨ ˩ ˦] * 米粮川: [m i ˨ ˩ ˦ l j a ˧ ˥ ŋ ʈ ʂ ʰ w a ˥ ˥ n] * 烤肉串: [k ʰ a w ˨ ˩ ˦ ʐ o w ˥ ˩ ʈ ʂ ʰ w a ˥ ˩ n] * 生煎馒头: [ʂ o ˥ ˥ ŋ t ɕ j e ˥ ˥ n m a ˧ ˥ n t ʰ o w ˨] |
Occurrences: 3,789 Examples: * 足球杯赛: [t s u ˧ ˥ t ɕ ʰ j o w ˧ ˥ ɻ p e j ˥ ˥ s a j ˥ ˩] * 阿穆尔河: [ʔ a ˥ ˥ m u ˥ ˩ ʔ o ˨ ˩ ˦ ɻ x o ˧ ˥] * 五十二: [u ˨ ˩ ˦ ʂ ʐ ̩ ˧ ˥ ʔ o ˥ ˩ ɻ] * 奥尔科特: [ʔ a w ˥ ˩ ʔ o ˨ ˩ ˦ ɻ k ʰ o ˥ ˥ t ʰ o ˥ ˩] |
Occurrences: 27,144 Examples: * 保险套: [p a w ˨ ˩ ˦ ɕ j e ˨ ˩ ˦ n t ʰ a w ˥ ˩] * 养蚕业: [j a ˨ ˩ ˦ ŋ t s ʰ a ˧ ˥ n j e ˥ ˩] * 科西嘉岛: [k ʰ o ˥ ˥ ɕ i ˥ ˥ t ɕ j a ˥ ˥ t a w ˨ ˩ ˦] * 烤宽面条: [k ʰ a w ˨ ˩ ˦ k ʰ w a ˥ ˥ n m j e ˥ ˩ n t ʰ j a w ˧ ˥] Occurrences: 4,281 Examples: * 伊朗高原: [i ˥ ˥ l a ˨ ˩ ˦ ŋ k a w ˥ ˥ ɥ e ˧ ˥ n] * 救护员: [t ɕ j o w ˥ ˩ x u ˥ ˩ ɥ e ˧ ˥ n] * 微量元素: [w e j ˥ ˥ l j a ˥ ˩ ŋ ɥ e ˧ ˥ n s u ˥ ˩] * 加利略: [t ɕ j a ˥ ˥ l i ˥ ˩ l ɥ e ˥ ˩] |
||||
Lateral |
Occurrences: 11,828 Examples: * 黄鼠狼: [x w a ˧ ˥ ŋ ʂ u ˨ ˩ ˦ l a ˧ ˥ ŋ] * 失业率: [ʂ ʐ ̩ ˥ ˥ j e ˥ ˩ l y ˥ ˩] * 千里光: [t ɕ ʰ j e ˥ ˥ n l i ˨ ˩ ˦ k w a ˥ ˥ ŋ] * 十八罗汉: [ʂ ʐ ̩ ˧ ˥ p a ˥ ˥ l w o ˧ ˥ x a ˥ ˩ n] |
Vowels#
Within each cell, vowel symbols on the left are unrounded and those on the right are rounded.
Front |
Near-Front |
Central |
Near-Back |
Back |
|
---|---|---|---|---|---|
Close |
Occurrences: 28,118 Examples: * 势利眼: [ʂ ʐ ̩ ˥ ˩ l i ˩ j a ˨ ˩ ˦ ɻ] * 比较等级: [p i ˨ ˩ ˦ t ɕ j a w ˥ ˩ t o ˨ ˩ ˦ ŋ t ɕ i ˧ ˥] * 摇滚音乐: [j a w ˧ ˥ k w ə ˨ ˩ ˦ n i ˥ ˥ n ɥ e ˥ ˩] * 威化饼: [w e j ˥ ˥ x w a ˥ ˩ p i ˨ ˩ ˦ ŋ] Occurrences: 5,739 Examples: * 底特律: [t i ˨ ˩ ˦ t ʰ o ˥ ˩ l y ˥ ˩] * 康巴语: [k ʰ a ˥ ˥ ŋ p a ˥ ˥ y ˨ ˩ ˦] * 外甥女婿: [w a j ˥ ˩ ʂ o ˩ ŋ n y ˨ ˩ ˦ ɕ y ˦] * 近古藏语: [t ɕ i ˥ ˩ n k u ˨ ˩ ˦ t s a ˥ ˩ ŋ y ˨ ˩ ˦] |
Occurrences: 22,404 Examples: * 上岁数: [ʂ a ˥ ˩ ŋ s w e j ˥ ˩ ʂ u ˩] * 易卜拉欣: [i ˥ ˩ p u ˨ ˩ ˦ l a ˥ ˥ ɕ i ˥ ˥ n] * 肉毒桿菌: [ʐ o w ˥ ˩ t u ˧ ˥ k a ˨ ˩ ˦ n t ɕ y ˥ ˥ n] * 重金属: [ʈ ʂ u ˥ ˩ ŋ t ɕ i ˥ ˥ n ʂ u ˨ ˩ ˦] |
|||
Close-Mid |
Occurrences: 15,850 Examples: * 歇斯底里: [ɕ j e ˥ ˥ s z ̩ ˥ ˥ t i ˨ ˩ ˦ l i ˨ ˩ ˦] * 显微镜座: [ɕ j e ˨ ˩ ˦ n w e j ˥ ˥ t ɕ i ˥ ˩ ŋ t s w o ˥ ˩] * 挑战者深渊: [t ʰ j a w ˨ ˩ ˦ ʈ ʂ a ˥ ˩ n ʈ ʂ o ˨ ˩ ˦ ʂ ə ˥ ˥ n ɥ e ˥ ˥ n] * 大少爷: [t a ˥ ˩ ʂ a w ˥ ˩ j e ˩] Occurrences: 7,704 Examples: * 威斯康辛: [w e j ˥ ˥ s z ̩ ˥ ˥ k ʰ a ˥ ˥ ŋ ɕ i ˥ ˥ n] * 气累脖儿: [t ɕ ʰ i ˥ ˩ l e j ˩ p w o ˧ ˥ ɻ] * 老妹妹: [l a w ˨ ˩ ˦ m e j ˥ ˩ m e j ˩] * 向日葵油: [ɕ j a ˥ ˩ ŋ ʐ ̩ ˥ ˩ k ʰ w e j ˧ ˥ j o w ˧ ˥] |
Occurrences: 20,274 Examples: * 以珥月: [i ˨ ˩ ˦ ʔ o ˨ ˩ ˦ ɻ ɥ e ˥ ˩] * 十字军东征: [ʂ ʐ ̩ ˧ ˥ t s z ̩ ˥ ˩ t ɕ y ˥ ˥ n t u ˥ ˥ ŋ ʈ ʂ o ˥ ˥ ŋ] * 成长率: [ʈ ʂ ʰ o ˧ ˥ ŋ ʈ ʂ a ˨ ˩ ˦ ŋ l y ˥ ˩] * 牛仔裤: [n j o w ˧ ˥ t s a j ˨ ˩ ˦ k ʰ u ˥ ˩] Occurrences: 8,064 Examples: * 枕头套: [ʈ ʂ ə ˨ ˩ ˦ n t ʰ o w ˦ t ʰ a w ˥ ˩] * 河西走廊: [x o ˧ ˥ ɕ i ˥ ˥ t s o w ˨ ˩ ˦ l a ˧ ˥ ŋ] * 星期六: [ɕ i ˥ ˥ ŋ t ɕ ʰ i ˥ ˥ l j o w ˥ ˩] * 虎头蜂: [x u ˨ ˩ ˦ t ʰ o w ˧ ˥ f o ˥ ˥ ŋ] |
|||
Occurrences: 7,558 Examples: * 人贩子: [ʐ ə ˧ ˥ n f a ˥ ˩ n t s z ̩ ˩] * 你们俩: [n i ˨ ˩ ˦ m ə ˦ n l j a ˨ ˩ ˦] * 大部份: [t a ˥ ˩ p u ˥ ˩ f ə ˥ ˩ n] * 黑加仑: [x e j ˥ ˥ t ɕ j a ˥ ˥ l w ə ˧ ˥ n] |
|||||
Open-Mid |
|||||
Open |
Occurrences: 36,275 Examples: * 科西嘉岛: [k ʰ o ˥ ˥ ɕ i ˥ ˥ t ɕ j a ˥ ˥ t a w ˨ ˩ ˦] * 化粪池: [x w a ˥ ˩ f ə ˥ ˩ n ʈ ʂ ʰ ʐ ̩ ˧ ˥] * 嘴巴子: [t s w e j ˨ ˩ ˦ p a ˦ t s z ̩ ˩] * 哈巴狗: [x a ˥ ˩ p a ˩ k o w ˨ ˩ ˦] |
Diphthongs#
aj
aw
ej
io
ow
Tones#
˥˥
˥˩
˦
˧˥
˨
˨˩˦
˩
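The tone symbols listed above are Chao tone letters, each marking one of five pitch levels from ˩ (lowest) to ˥ (highest). As a reading aid, the sketch below converts a tone string into the equivalent Chao numerals. The function name is hypothetical and is not part of MFA.

```python
# Chao tone letters mapped to pitch levels (5 = highest, 1 = lowest).
CHAO_LEVELS = {"˥": "5", "˦": "4", "˧": "3", "˨": "2", "˩": "1"}


def tone_to_numerals(tone):
    """Convert a string of Chao tone letters to Chao numerals."""
    return "".join(CHAO_LEVELS[letter] for letter in tone)


# The seven tone strings listed in this section.
tones = ["˥˥", "˥˩", "˦", "˧˥", "˨", "˨˩˦", "˩"]
for tone in tones:
    print(tone, tone_to_numerals(tone))
```

A character outside the five tone letters raises a `KeyError`, which makes stray symbols easy to spot.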