Mandarin (Taiwan) MFA dictionary v2.0.0#
@techreport{mfa_mandarin_taiwan_mfa_dictionary_2022,
author={McAuliffe, Michael and Sonderegger, Morgan},
title={Mandarin (Taiwan) MFA dictionary v2.0.0},
address={\url{https://mfa-models.readthedocs.io/pronunciation dictionary/Mandarin/Mandarin (Taiwan) MFA dictionary v2_0_0.html}},
year={2022},
month={Mar},
}
Installation#
Install from the MFA command line:
mfa model download dictionary mandarin_taiwan_mfa
Or download from the release page.
The dictionary available from the release page and from command line installation has pronunciation and silence probabilities estimated as part of acoustic model training (see Silence probability format and training pronunciation probabilities for more information). If you would like to use the version of this dictionary without probabilities, please see the plain dictionary.
Intended use#
This dictionary is intended for forced alignment of Mandarin Chinese transcripts.
This dictionary uses the MFA phone set for Mandarin, and was used in training the Mandarin MFA acoustic model. Pronunciations can be added on top of the dictionary, as long as no additional phones are introduced.
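The constraint on added pronunciations can be checked before alignment. The following is a minimal sketch, not part of MFA itself: it assumes the plain dictionary layout of one entry per line (a word, a tab, then space-separated phones), and the helper names `phone_inventory` and `unknown_phones` are hypothetical. Entries that carry probabilities would need additional parsing.

```python
def phone_inventory(dictionary_lines):
    """Collect every phone used in existing 'word<TAB>phones' entries."""
    phones = set()
    for line in dictionary_lines:
        line = line.strip()
        if not line:
            continue
        # Assumed layout: word, a tab, then space-separated phones.
        _word, pronunciation = line.split("\t", 1)
        phones.update(pronunciation.split())
    return phones


def unknown_phones(pronunciation, inventory):
    """Return the phones in a candidate pronunciation that the inventory lacks."""
    return [p for p in pronunciation.split() if p not in inventory]


# Two entries taken from the examples on this page.
existing = [
    "下午好\tɕ j a˥˩ u˨˩˦ x aw˨˩˦",
    "不會吧\tp u˥˩ x w ej˥˩ p a˩",
]
inventory = phone_inventory(existing)

# Only reuses known phones, so nothing is reported.
ok = unknown_phones("p u˥˩ x aw˨˩˦", inventory)
# Introduces a phone outside the inventory, which should be rejected.
bad = unknown_phones("p u˥˩ θ aw˨˩˦", inventory)
```

In practice the inventory would be built from the full downloaded dictionary rather than from two sample entries. Note that a vowel with a different tone counts as a different phone in this check.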
Performance Factors#
When trying to get better alignment accuracy, adding pronunciations is generally helpful, especially for different styles and dialects. The most impactful improvements will generally come from adding reduced variants that delete segments or syllables, as is common in spontaneous speech. Alignment must include all phones specified in the pronunciation of a word, and each phone has a minimum duration (by default 10 ms). If a speaker pronounces a multisyllabic word as just a single syllable, it can be hard for MFA to fit all the segments in, which leads to alignment errors on adjacent words as well.
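The arithmetic behind this is simple: a pronunciation with N phones cannot occupy less than N times the minimum phone duration. The sketch below illustrates it with the 10 ms default. The function names are hypothetical, and the reduced variant is invented purely for illustration; it is not an entry in this dictionary.

```python
MIN_PHONE_MS = 10  # default minimum phone duration, in milliseconds


def minimum_word_ms(pronunciation, min_phone_ms=MIN_PHONE_MS):
    """Shortest interval that can hold every phone of a pronunciation."""
    return len(pronunciation.split()) * min_phone_ms


def fits(pronunciation, spoken_ms, min_phone_ms=MIN_PHONE_MS):
    """True if the spoken interval is long enough for all phones."""
    return spoken_ms >= minimum_word_ms(pronunciation, min_phone_ms)


full = "p u˥˩ x w ej˥˩ p a˩"   # 不會吧 as listed on this page (7 phones)
reduced = "p w ej˥˩ p a˩"       # hypothetical reduced variant (5 phones)

spoken_ms = 60  # suppose the speaker produced the word in 60 ms
full_fits = fits(full, spoken_ms)        # needs at least 70 ms
reduced_fits = fits(reduced, spoken_ms)  # needs at least 50 ms
```

When only the full form is available, the aligner has to borrow time from neighbouring words to place all seven phones, which is how errors spread to adjacent words.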
Ethical considerations#
Deploying any Speech-to-Text model into any production setting has ethical implications. You should consider these implications before use.
Demographic Bias#
You should assume every machine learning model has demographic bias unless proven otherwise. For pronunciation dictionaries, it is often the case that transcription accuracy and lexicon coverage are better for the prestige variety modeled in this dictionary than for other variants. If you are using this dictionary in production, you should acknowledge this as a potential issue.
IPA Charts#
Consonants#
Within each cell, obstruent symbols on the left are unvoiced and those on the right are voiced.
Manner |
Labial |
Labiodental |
Alveolar |
Retroflex |
Palatal |
Velar |
Glottal |
---|---|---|---|---|---|---|---|
Nasal |
Occurrences: 7,105 Examples: * 石決明: [ʂ ʐ̩˧˥ tɕ ɥ e˧˥ m i˧˥ ŋ] * 龍彌你: [l u˧˥ ŋ m i˧˥ n i˨˩˦] * 變體假名: [p j e˥˩ n tʰ i˨˩˦ tɕ j a˨˩˦ m i˧˥ ŋ] * 流星馬: [l j ow˧˥ ɕ i˥˥ ŋ m a˨˩˦] |
Occurrences: 39,160 Examples: * 烏剋蘭語: [u˥˥ kʰ o˥˩ l a˧˥ n y˨˩˦] * 陰莖骨: [i˥˥ n tɕ i˥˥ ŋ k u˨˩˦] * 刺五加片: [tsʰ z̩˥˩ u˨˩˦ tɕ j a˥˥ pʰ j e˥˩ n] * 大鍵琴: [t a˥˩ tɕ j e˥˩ n tɕʰ i˧˥ n] |
Occurrences: 30,436 Examples: * 石決明: [ʂ ʐ̩˧˥ tɕ ɥ e˧˥ m i˧˥ ŋ] * 陰莖骨: [i˥˥ n tɕ i˥˥ ŋ k u˨˩˦] * 豆腐渣工程: [t ow˥˩ f u˩ ʈʂ a˥˥ k u˥˥ ŋ ʈʂʰ o˧˥ ŋ] * 電正性: [t j e˥˩ n ʈʂ o˥˩ ŋ ɕ i˥˩ ŋ] Occurrences: 2 Examples: |
||||
Stop |
Occurrences: 9,354 Examples: * 不會吧: [p u˥˩ x w ej˥˩ p a˩] * 杜伊斯堡: [t u˥˩ i˥˥ s z̩˥˥ p aw˨˩˦] * 熊本熊: [ɕ j u˧˥ ŋ p ə˨˩˦ n ɕ j u˧˥ ŋ] * 白雲岩: [p aj˧˥ y˧˥ n j e˧˥ n] |
Occurrences: 11,030 Examples: * 豆腐渣工程: [t ow˥˩ f u˩ ʈʂ a˥˥ k u˥˥ ŋ ʈʂʰ o˧˥ ŋ] * 大鍵琴: [t a˥˩ tɕ j e˥˩ n tɕʰ i˧˥ n] * 電正性: [t j e˥˩ n ʈʂ o˥˩ ŋ ɕ i˥˩ ŋ] * 杜伊斯堡: [t u˥˩ i˥˥ s z̩˥˥ p aw˨˩˦] |
Occurrences: 8,160 Examples: * 陰莖骨: [i˥˥ n tɕ i˥˥ ŋ k u˨˩˦] * 豆腐渣工程: [t ow˥˩ f u˩ ʈʂ a˥˥ k u˥˥ ŋ ʈʂʰ o˧˥ ŋ] * 錫林郭勒: [ɕ i˥˥ l i˧˥ n k w o˥˥ l o˥˩] * 工聯會: [k u˥˥ ŋ l j e˧˥ n x w ej˥˩] |
Occurrences: 3,497 Examples: * 富拉爾基: [f u˥˩ l a˥˥ ʔ o˨˩˦ ɻ tɕ i˥˥] * 阿羅漢: [ʔ a˥˥ l w o˧˥ x a˥˩ n] * 卡爾加裏: [kʰ a˨˩˦ ʔ o˨˩˦ ɻ tɕ j a˥˥ l i˨˩˦] * 巴彥洪戈爾: [p a˥˥ j e˥˩ n x u˧˥ ŋ k o˥˥ ʔ o˨˩˦ ɻ] |
|||
Affricate |
Occurrences: 5,010 Examples: * 天龍座: [tʰ j e˥˥ n l u˧˥ ŋ ts w o˥˩] * 連字號: [l j e˧˥ n ts z̩˥˩ x aw˥˩] * 操作者: [tsʰ aw˥˥ ts w o˥˩ ʈʂ o˨˩˦] * 女真子: [n y˨˩˦ ʈʂ ə˥˥ n ts z̩˨˩˦] |
Occurrences: 9,431 Examples: * 豆腐渣工程: [t ow˥˩ f u˩ ʈʂ a˥˥ k u˥˥ ŋ ʈʂʰ o˧˥ ŋ] * 電正性: [t j e˥˩ n ʈʂ o˥˩ ŋ ɕ i˥˩ ŋ] * 癥候群: [ʈʂ o˥˩ ŋ x ow˥˩ tɕʰ y˧˥ n] * 互質數: [x u˥˩ ʈʂ ʐ̩˥˩ ʂ u˥˩] |
Occurrences: 13,254 Examples: * 石決明: [ʂ ʐ̩˧˥ tɕ ɥ e˧˥ m i˧˥ ŋ] * 陰莖骨: [i˥˥ n tɕ i˥˥ ŋ k u˨˩˦] * 刺五加片: [tsʰ z̩˥˩ u˨˩˦ tɕ j a˥˥ pʰ j e˥˩ n] * 大鍵琴: [t a˥˩ tɕ j e˥˩ n tɕʰ i˧˥ n] |
||||
Sibilant |
Occurrences: 4,814 Examples: * 杜伊斯堡: [t u˥˩ i˥˥ s z̩˥˥ p aw˨˩˦] * 四方形: [s z̩˥˩ f a˥˥ ŋ ɕ i˧˥ ŋ] * 孫王營: [s w ə˥˥ n w a˧˥ ŋ i˧˥ ŋ] * 維納斯: [w ej˧˥ n a˥˩ s z̩˥˥] Occurrences: 4,182 Examples: * 龜茲文: [tɕʰ j ow˥˥ tsʰ z̩˧˥ w ə˧˥ n] * 杜伊斯堡: [t u˥˩ i˥˥ s z̩˥˥ p aw˨˩˦] * 紮猛子: [ʈʂ a˥˥ m o˨˩˦ ŋ ts z̩˦] * 名詞短語: [m i˧˥ ŋ tsʰ z̩˧˥ t w a˨˩˦ n y˨˩˦] |
Occurrences: 12,045 Examples: * 石決明: [ʂ ʐ̩˧˥ tɕ ɥ e˧˥ m i˧˥ ŋ] * 半金屬: [p a˥˩ n tɕ i˥˥ n ʂ u˨˩˦] * 互質數: [x u˥˩ ʈʂ ʐ̩˥˩ ʂ u˥˩] * 臨時工: [l i˧˥ n ʂ ʐ̩˧˥ k u˥˥ ŋ] Occurrences: 2,738 Examples: * 冰島人: [p i˥˥ ŋ t aw˨˩˦ ʐ ə˧˥ n] * 視乳頭水腫: [ʂ ʐ̩˥˩ ʐ u˨˩˦ tʰ ow˧˥ ʂ w ej˨˩˦ ʈʂ u˨˩˦ ŋ] * 撒瑪黎雅人: [s a˥˥ m a˨˩˦ l i˧˥ j a˨˩˦ ʐ ə˧˥ n] * 小綠人: [ɕ j aw˨˩˦ l y˥˩ ʐ ə˧˥ n] Occurrences: 7,056 Examples: * 結婚戒指: [tɕ j e˧˥ x w ə˥˥ n tɕ j e˥˩ ʈʂ ʐ̩˩] * 計算尺: [tɕ i˥˩ s w a˥˩ n ʈʂʰ ʐ̩˨˩˦] * 紙飛機: [ʈʂ ʐ̩˨˩˦ f ej˥˥ tɕ i˥˥] * 絲織品: [s z̩˥˥ ʈʂ ʐ̩˥˥ pʰ i˨˩˦ n] |
Occurrences: 11,432 Examples: * 下午好: [ɕ j a˥˩ u˨˩˦ x aw˨˩˦] * 電正性: [t j e˥˩ n ʈʂ o˥˩ ŋ ɕ i˥˩ ŋ] * 錫林郭勒: [ɕ i˥˥ l i˧˥ n k w o˥˥ l o˥˩] * 熊本熊: [ɕ j u˧˥ ŋ p ə˨˩˦ n ɕ j u˧˥ ŋ] |
||||
Fricative |
Occurrences: 5,888 Examples: * 豆腐渣工程: [t ow˥˩ f u˩ ʈʂ a˥˥ k u˥˥ ŋ ʈʂʰ o˧˥ ŋ] * 富拉爾基: [f u˥˩ l a˥˥ ʔ o˨˩˦ ɻ tɕ i˥˥] * 四方形: [s z̩˥˩ f a˥˥ ŋ ɕ i˧˥ ŋ] * 德納裏峰: [t o˧˥ n a˥˩ l i˨˩˦ f o˥˥ ŋ] |
||||||
Approximant |
Occurrences: 22,265 Examples: * 龜茲文: [tɕʰ j ow˥˥ tsʰ z̩˧˥ w ə˧˥ n] * 不會吧: [p u˥˩ x w ej˥˩ p a˩] * 錫林郭勒: [ɕ i˥˥ l i˧˥ n k w o˥˥ l o˥˩] * 工聯會: [k u˥˥ ŋ l j e˧˥ n x w ej˥˩] |
Occurrences: 1,414 Examples: * 富拉爾基: [f u˥˩ l a˥˥ ʔ o˨˩˦ ɻ tɕ i˥˥] * 卡爾加裏: [kʰ a˨˩˦ ʔ o˨˩˦ ɻ tɕ j a˥˥ l i˨˩˦] * 巴彥洪戈爾: [p a˥˥ j e˥˩ n x u˧˥ ŋ k o˥˥ ʔ o˨˩˦ ɻ] * 外甥女兒: [w aj˥˩ ʂ o˩ ŋ n y˨˩˦ ɻ] |
Occurrences: 27,399 Examples: * 下午好: [ɕ j a˥˩ u˨˩˦ x aw˨˩˦] * 刺五加片: [tsʰ z̩˥˩ u˨˩˦ tɕ j a˥˥ pʰ j e˥˩ n] * 大鍵琴: [t a˥˩ tɕ j e˥˩ n tɕʰ i˧˥ n] * 電正性: [t j e˥˩ n ʈʂ o˥˩ ŋ ɕ i˥˩ ŋ] Occurrences: 4,285 Examples: * 石決明: [ʂ ʐ̩˧˥ tɕ ɥ e˧˥ m i˧˥ ŋ] * 參議員: [tsʰ a˥˥ n i˥˩ ɥ e˧˥ n] * 半月線: [p a˥˩ n ɥ e˥˩ ɕ j e˥˩ n] * 上野原: [ʂ a˥˩ ŋ j e˨˩˦ ɥ e˧˥ n] |
||||
Lateral |
Occurrences: 11,655 Examples: * 烏剋蘭語: [u˥˥ kʰ o˥˩ l a˧˥ n y˨˩˦] * 富拉爾基: [f u˥˩ l a˥˥ ʔ o˨˩˦ ɻ tɕ i˥˥] * 錫林郭勒: [ɕ i˥˥ l i˧˥ n k w o˥˥ l o˥˩] * 工聯會: [k u˥˥ ŋ l j e˧˥ n x w ej˥˩] |
Vowels#
Within each cell, vowel symbols on the left are unrounded and those on the right are rounded.
Front |
Near-Front |
Central |
Near-Back |
Back |
|
---|---|---|---|---|---|
Close |
Occurrences: 28,680 Examples: * 四方形: [s z̩˥˩ f a˥˥ ŋ ɕ i˧˥ ŋ] * 不客氣: [p u˥˩ kʰ o˥˩ tɕʰ i˩] * 大鍵琴: [t a˥˩ tɕ j e˥˩ n tɕʰ i˧˥ n] * 富拉爾基: [f u˥˩ l a˥˥ ʔ o˨˩˦ ɻ tɕ i˥˥] Occurrences: 5,772 Examples: * 愛玉子: [ʔ aj˥˩ y˥˩ ts z̩˨˩˦] * 林堡語: [l i˧˥ n p aw˨˩˦ y˨˩˦] * 癥候群: [ʈʂ o˥˩ ŋ x ow˥˩ tɕʰ y˧˥ n] * 老閨女: [l aw˨˩˦ k w ej˥˥ n y˨] |
Occurrences: 22,611 Examples: * 熊本熊: [ɕ j u˧˥ ŋ p ə˨˩˦ n ɕ j u˧˥ ŋ] * 豆腐乳: [t ow˥˩ f u˩ ʐ u˨˩˦] * 瞧不起: [tɕʰ j aw˧˥ p u˨ tɕʰ i˨˩˦] * 下午好: [ɕ j a˥˩ u˨˩˦ x aw˨˩˦] |
|||
Close-Mid |
Occurrences: 16,156 Examples: * 電正性: [t j e˥˩ n ʈʂ o˥˩ ŋ ɕ i˥˩ ŋ] * 參議員: [tsʰ a˥˥ n i˥˩ ɥ e˧˥ n] * 上野原: [ʂ a˥˩ ŋ j e˨˩˦ ɥ e˧˥ n] * 大少爺: [t a˥˩ ʂ aw˥˩ j e˩] Occurrences: 7,724 Examples: * 毀滅性: [x w ej˨˩˦ m j e˥˩ ɕ i˥˩ ŋ] * 電吹風: [t j e˥˩ n ʈʂʰ w ej˥˥ f o˥˥ ŋ] * 氣纍脖兒: [tɕʰ i˥˩ l ej˩ p w o˧˥ ɻ] * 迴迴教: [x w ej˧˥ x w ej˨ tɕ j aw˥˩] |
Occurrences: 19,882 Examples: * 巴彥洪戈爾: [p a˥˥ j e˥˩ n x u˧˥ ŋ k o˥˥ ʔ o˨˩˦ ɻ] * 外甥媳婦: [w aj˥˩ ʂ o˩ ŋ ɕ i˧˥ f u˥˩] * 夫妻老婆店: [f u˥˥ tɕʰ i˥˥ l aw˨˩˦ pʰ w o˦ t j e˥˩ n] * 老太婆: [l aw˨˩˦ tʰ aj˥˩ pʰ w o˧˥] Occurrences: 7,947 Examples: * 癥候群: [ʈʂ o˥˩ ŋ x ow˥˩ tɕʰ y˧˥ n] * 小時候: [ɕ j aw˨˩˦ ʂ ʐ̩˧˥ x ow˨] * 龜茲文: [tɕʰ j ow˥˥ tsʰ z̩˧˥ w ə˧˥ n] * 流星馬: [l j ow˧˥ ɕ i˥˥ ŋ m a˨˩˦] |
|||
Occurrences: 7,328 Examples: * 孫王營: [s w ə˥˥ n w a˧˥ ŋ i˧˥ ŋ] * 德纍斯頓: [t o˧˥ l ej˨˩˦ s z̩˥˥ t w ə˥˩ n] * 熊本熊: [ɕ j u˧˥ ŋ p ə˨˩˦ n ɕ j u˧˥ ŋ] * 龜茲文: [tɕʰ j ow˥˥ tsʰ z̩˧˥ w ə˧˥ n] |
|||||
Open-Mid |
|||||
Open |
Occurrences: 36,127 Examples: * 大鍵琴: [t a˥˩ tɕ j e˥˩ n tɕʰ i˧˥ n] * 南威島: [n a˧˥ n w ej˥˥ t aw˨˩˦] * 月亮湖: [ɥ e˥˩ l j a˩ ŋ x u˧˥] * 刺五加片: [tsʰ z̩˥˩ u˨˩˦ tɕ j a˥˥ pʰ j e˥˩ n] |
Diphthongs#
aj
aw
ej
ow
Tones#
˥˥
˥˩
˦
˧˥
˨
˨˩˦
˩
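In the example pronunciations on this page, tone is written as a sequence of tone letters attached to the end of a vowel or syllabic consonant (for example `aw˨˩˦` or `ʐ̩˧˥`). A small sketch of separating the two parts, assuming tone letters only ever appear as a suffix; `split_tone` is a hypothetical helper, not an MFA function:

```python
# The five IPA tone letters (U+02E5 to U+02E9) used to spell the tones above.
TONE_LETTERS = "˥˦˧˨˩"


def split_tone(phone):
    """Split a phone into (base, tone); tone is '' for toneless phones."""
    base = phone.rstrip(TONE_LETTERS)
    return base, phone[len(base):]


def tones_in(pronunciation):
    """List the tones carried by the phones of a pronunciation, in order."""
    tones = []
    for phone in pronunciation.split():
        _base, tone = split_tone(phone)
        if tone:
            tones.append(tone)
    return tones


diphthong = split_tone("aw˨˩˦")
syllabic = split_tone("ʐ̩˧˥")
consonant = split_tone("tɕʰ")
word_tones = tones_in("ɕ j a˥˩ u˨˩˦ x aw˨˩˦")  # 下午好
```

Stripping the tone this way is useful when comparing phones across tones, for instance when counting how often a given vowel occurs regardless of the tone it carries.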