Mandarin (Taiwan) MFA dictionary v2.0.0a#
@techreport{mfa_mandarin_taiwan_mfa_dictionary_2022,
author={McAuliffe, Michael and Sonderegger, Morgan},
title={Mandarin (Taiwan) MFA dictionary v2.0.0a},
address={\url{https://mfa-models.readthedocs.io/pronunciation dictionary/Mandarin/Mandarin (Taiwan) MFA dictionary v2_0_0a.html}},
year={2022},
month={May},
}
Installation#
Install from the MFA command line:
mfa model download dictionary mandarin_taiwan_mfa
Or download from the release page.
The dictionary available from the release page and command line installation has pronunciation and silence probabilities estimated as part of acoustic model training (see Silence probability format and training pronunciation probabilities for more information). If you would like to use the version of this dictionary without probabilities, please see the plain dictionary.
Intended use#
This dictionary is intended for forced alignment of Mandarin Chinese transcripts.
This dictionary uses the MFA phone set for Mandarin, and was used in training the Mandarin MFA acoustic model. Pronunciations can be added on top of the dictionary, as long as no additional phones are introduced.
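As a minimal sketch of the constraint above, the following Python checks candidate pronunciations against the phones already present in the dictionary before they are added. It assumes each dictionary line is tab-separated, with the word in the first field and the space-separated pronunciation in the last field (any probability columns in between are ignored). The function names `load_phone_set` and `find_unknown_phones` and the sample entries are hypothetical and are not part of MFA.

```python
from typing import Dict, Iterable, List, Set


def load_phone_set(lines: Iterable[str]) -> Set[str]:
    """Collect every phone used in the given dictionary lines.

    Assumption: fields are tab-separated, the word is the first field and
    the space-separated pronunciation is the last field.
    """
    phones: Set[str] = set()
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        phones.update(fields[-1].split())
    return phones


def find_unknown_phones(
    new_entries: Dict[str, List[str]], phone_set: Set[str]
) -> Dict[str, List[str]]:
    """Map each new word to the phones that are absent from the phone set."""
    problems: Dict[str, List[str]] = {}
    for word, pron in new_entries.items():
        unknown = [p for p in pron if p not in phone_set]
        if unknown:
            problems[word] = unknown
    return problems


# Illustrative data only; real entries come from the downloaded dictionary.
existing = ["wordA\tp a t", "wordB\t0.9\t0.1\t1.0\t1.0\tt a k"]
phone_set = load_phone_set(existing)

candidates = {"wordC": ["k", "a", "p"], "wordD": ["k", "a", "θ"]}
problems = find_unknown_phones(candidates, phone_set)
print(problems)
```

An empty result means every candidate pronunciation uses only phones the dictionary already contains. Any word listed in the result introduces a new phone and would need to be rewritten before being added.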
Performance Factors#
When trying to get better alignment accuracy, adding pronunciations is generally helpful, especially for different styles and dialects. The most impactful improvements will generally be seen when adding reduced variants that involve deleting segments/syllables common in spontaneous speech. Alignment must include all phones specified in the pronunciation of a word, and each phone has a minimum duration (by default 10ms). If a speaker pronounces a multisyllabic word with just a single syllable, it can be hard for MFA to fit all the segments in, so it will lead to alignment errors on adjacent words as well.
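The arithmetic behind this can be sketched as follows. Because every phone in a pronunciation must be aligned and each takes at least the minimum duration, a pronunciation with N phones needs at least N times that minimum. The helper names, the phone labels, and the 50 ms spoken duration below are hypothetical and for illustration only; the 10 ms default is the value stated above.

```python
from typing import List


def min_word_duration_ms(phones: List[str], min_phone_ms: int = 10) -> int:
    """Shortest interval that can hold every phone of a pronunciation."""
    return len(phones) * min_phone_ms


def fits(phones: List[str], spoken_ms: int, min_phone_ms: int = 10) -> bool:
    """True if the spoken interval is long enough for all phones."""
    return spoken_ms >= min_word_duration_ms(phones, min_phone_ms)


# Hypothetical full and reduced pronunciations of the same word.
full_form = ["p1", "p2", "p3", "p4", "p5", "p6", "p7", "p8"]
reduced_form = ["p1", "p2", "p3"]

spoken_ms = 50  # assumed duration of a heavily reduced token

print(min_word_duration_ms(full_form))     # 80
print(fits(full_form, spoken_ms))          # False
print(fits(reduced_form, spoken_ms))       # True
```

In this sketch the full form cannot fit inside the 50 ms token, so an aligner limited to that form would have to borrow time from neighboring words. With the reduced variant available, the token can be aligned within its own interval.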
Ethical considerations#
Deploying any Speech-to-Text model into any production setting has ethical implications. You should consider these implications before use.
Demographic Bias#
You should assume every machine learning model has demographic bias unless proven otherwise. For pronunciation dictionaries, it is often the case that transcription accuracy and lexicon coverage are better for the prestige variety modeled in this dictionary than for other varieties. If you are using this dictionary in production, you should acknowledge this as a potential issue.
IPA Charts#
Consonants#
Within each cell, obstruent symbols on the left are unvoiced and those on the right are voiced.
Manner |
Labial |
Labiodental |
Alveolar |
Retroflex |
Palatal |
Velar |
Glottal |
---|---|---|---|---|---|---|---|
Nasal |
Occurrences: 7,123 Examples: * 末班車: [m w o ˥ ˩ p a ˥ ˥ n ʈ ʂ ʰ o ˥ ˥] * 孟高棉語族: [m o ˥ ˩ ŋ k a w ˥ ˥ m j e ˧ ˥ n y ˨ ˩ ˦ t s u ˧ ˥] * 默突捨拉: [m w o ˥ ˩ t ʰ u ˥ ˥ ʂ o ˥ ˩ l a ˥ ˥] * 摩羯座: [m w o ˧ ˥ t ɕ j e ˧ ˥ t s w o ˥ ˩] |
Occurrences: 39,355 Examples: * 討價還價: [t ʰ a w ˨ ˩ ˦ t ɕ j a ˥ ˩ x w a ˧ ˥ n t ɕ j a ˥ ˩] * 密涅瓦: [m i ˥ ˩ n j e ˥ ˩ w a ˨ ˩ ˦] * 按乃近: [ʔ a ˥ ˩ n n a j ˨ ˩ ˦ t ɕ i ˥ ˩ n] * 排水管: [p ʰ a j ˧ ˥ ʂ w e j ˨ ˩ ˦ k w a ˨ ˩ ˦ n] |
Occurrences: 30,593 Examples: * 金融工程: [t ɕ i ˥ ˥ n ʐ u ˧ ˥ ŋ k u ˥ ˥ ŋ ʈ ʂ ʰ o ˧ ˥ ŋ] * 中子數: [ʈ ʂ u ˥ ˥ ŋ t s z ̩ ˨ ˩ ˦ ʂ u ˥ ˩] * 女大不中留: [n y ˨ ˩ ˦ t a ˥ ˩ p u ˥ ˩ ʈ ʂ u ˥ ˥ ŋ l j o w ˧ ˥] * 老人星: [l a w ˨ ˩ ˦ ʐ ə ˧ ˥ n ɕ i ˥ ˥ ŋ] Occurrences: 2 Examples: |
||||
Stop |
Occurrences: 9,394 Examples: * 領導班子: [l i ˨ ˩ ˦ ŋ t a w ˨ ˩ ˦ p a ˥ ˥ n t s z ̩ ˨] * 伊豆半島: [i ˥ ˥ t o w ˥ ˩ p a ˥ ˩ n t a w ˨ ˩ ˦] * 背靠背: [p e j ˥ ˩ k ʰ a w ˥ ˩ p e j ˥ ˩] * 被害人: [p e j ˥ ˩ x a j ˥ ˩ ʐ ə ˧ ˥ n] |
Occurrences: 11,105 Examples: * 撒迦利亞書: [s a ˥ ˥ t ɕ j a ˥ ˥ l i ˥ ˩ j a ˥ ˩ ʂ u ˥ ˥] * 不得不: [p u ˥ ˩ t o ˧ ˥ p u ˥ ˩] * 瓦根基: [w a ˨ ˩ ˦ k ə ˥ ˥ n t ɕ i ˥ ˥] * 羅盤座: [l w o ˧ ˥ p ʰ a ˧ ˥ n t s w o ˥ ˩] |
Occurrences: 8,190 Examples: * 格林多: [k o ˧ ˥ l i ˧ ˥ n t w o ˥ ˥] * 老閨女: [l a w ˨ ˩ ˦ k w e j ˥ ˥ n y ˨] * 尤剋裏裏: [j o w ˧ ˥ k ʰ o ˥ ˩ l i ˨ ˩ ˦ l i ˨ ˩ ˦] * 兵工廠: [p i ˥ ˥ ŋ k u ˥ ˥ ŋ ʈ ʂ ʰ a ˨ ˩ ˦ ŋ] |
Occurrences: 3,525 Examples: * 耳鼻喉科: [ʔ o ˨ ˩ ˦ ɻ p i ˧ ˥ x o w ˧ ˥ k ʰ o ˥ ˥] * 澳洲堅果: [ʔ a w ˥ ˩ ʈ ʂ o w ˥ ˥ t ɕ j e ˥ ˥ n k w o ˨ ˩ ˦] * 田二河: [t ʰ j e ˧ ˥ n ʔ o ˥ ˩ ɻ x o ˧ ˥] * 維吾爾語: [w e j ˧ ˥ u ˧ ˥ ʔ o ˨ ˩ ˦ ɻ y ˨ ˩ ˦] |
|||
Affricate |
Occurrences: 5,019 Examples: * 馬販子: [m a ˨ ˩ ˦ f a ˥ ˩ n t s z ̩ ˩] * 草根性: [t s ʰ a w ˨ ˩ ˦ k ə ˥ ˥ n ɕ i ˥ ˩ ŋ] * 藥材鋪: [j a w ˥ ˩ t s ʰ a j ˧ ˥ p ʰ u ˥ ˩] * 政策性: [ʈ ʂ o ˥ ˩ ŋ t s ʰ o ˥ ˩ ɕ i ˥ ˩ ŋ] |
Occurrences: 9,480 Examples: * 巴拿馬城: [p a ˥ ˥ n a ˧ ˥ m a ˨ ˩ ˦ ʈ ʂ ʰ o ˧ ˥ ŋ] * 疑難雜癥: [i ˧ ˥ n a ˧ ˥ n t s a ˧ ˥ ʈ ʂ o ˥ ˩ ŋ] * 主持人: [ʈ ʂ u ˨ ˩ ˦ ʈ ʂ ʰ ʐ ̩ ˧ ˥ ʐ ə ˧ ˥ n] * 日常生活: [ʐ ̩ ˥ ˩ ʈ ʂ ʰ a ˧ ˥ ŋ ʂ o ˥ ˥ ŋ x w o ˧ ˥] |
Occurrences: 13,315 Examples: * 五方旗幟: [u ˨ ˩ ˦ f a ˥ ˥ ŋ t ɕ ʰ i ˧ ˥ ʈ ʂ ʐ ̩ ˥ ˩] * 商業區: [ʂ a ˥ ˥ ŋ j e ˥ ˩ t ɕ ʰ y ˥ ˥] * 微積分: [w e j ˥ ˥ t ɕ i ˥ ˥ f ə ˥ ˥ n] * 滅火器: [m j e ˥ ˩ x w o ˨ ˩ ˦ t ɕ ʰ i ˥ ˩] |
||||
Sibilant |
Occurrences: 4,845 Examples: * 四季豆: [s z ̩ ˥ ˩ t ɕ i ˥ ˩ t o w ˥ ˩] * 勒剋斯: [l o ˥ ˩ k ʰ o ˥ ˩ s z ̩ ˥ ˥] * 三七仔: [s a ˥ ˥ n t ɕ ʰ i ˥ ˥ t s z ̩ ˨ ˩ ˦] * 黃大仙祠: [x w a ˧ ˥ ŋ t a ˥ ˩ ɕ j e ˥ ˥ n t s ʰ z ̩ ˧ ˥] Occurrences: 0 Examples: * 俄羅斯: [ʔ o ˧ ˥ l w o ˧ ˥ s z ̩ ˥ ˥] * 死巷子: [s z ̩ ˨ ˩ ˦ ɕ j a ˥ ˩ ŋ t s z ̩ ˩] * 自限性: [t s z ̩ ˥ ˩ ɕ j e ˥ ˩ n ɕ i ˥ ˩ ŋ] * 龜茲語: [t ɕ ʰ j o w ˥ ˥ t s ʰ z ̩ ˧ ˥ y ˨ ˩ ˦] Occurrences: 4,197 Examples: * 電子束: [t j e ˥ ˩ n t s z ̩ ˨ ˩ ˦ ʂ u ˥ ˩] * 梨園子弟: [l i ˧ ˥ ɥ e ˧ ˥ n t s z ̩ ˨ ˩ ˦ t i ˥ ˩] * 斯蒂芬: [s z ̩ ˥ ˥ t i ˥ ˩ f ə ˥ ˥ n] * 腳腕子: [t ɕ j a w ˨ ˩ ˦ w a ˥ ˩ n t s z ̩ ˩] |
Occurrences: 12,099 Examples: * 非洲聯盟: [f e j ˥ ˥ ʈ ʂ o w ˥ ˥ l j e ˧ ˥ n m o ˧ ˥ ŋ] * 帳前吏: [ʈ ʂ a ˥ ˩ ŋ t ɕ ʰ j e ˧ ˥ n l i ˥ ˩] * 六十三: [l j o w ˥ ˩ ʂ ʐ ̩ ˧ ˥ s a ˥ ˥ n] * 萬人塚: [w a ˥ ˩ n ʐ ə ˧ ˥ n ʈ ʂ u ˨ ˩ ˦ ŋ] Occurrences: 2,754 Examples: * 十一日: [ʂ ʐ ̩ ˧ ˥ i ˥ ˥ ʐ ̩ ˥ ˩] * 二十六日: [ʔ o ˥ ˩ ɻ ʂ ʐ ̩ ˧ ˥ l j o w ˥ ˩ ʐ ̩ ˥ ˩] * 二十四史: [ʔ o ˥ ˩ ɻ ʂ ʐ ̩ ˧ ˥ s z ̩ ˥ ˩ ʂ ʐ ̩ ˨ ˩ ˦] * 均值定理: [t ɕ y ˥ ˥ n ʈ ʂ ʐ ̩ ˧ ˥ t i ˥ ˩ ŋ l i ˨ ˩ ˦] Occurrences: 7,088 Examples: * 原教旨主義: [ɥ e ˧ ˥ n t ɕ j a w ˥ ˩ ʈ ʂ ʐ ̩ ˨ ˩ ˦ ʈ ʂ u ˨ ˩ ˦ i ˥ ˩] * 電氣石: [t j e ˥ ˩ n t ɕ ʰ i ˥ ˩ ʂ ʐ ̩ ˧ ˥] * 結婚戒指: [t ɕ j e ˧ ˥ x w ə ˥ ˥ n t ɕ j e ˥ ˩ ʈ ʂ ʐ ̩ ˩] * 食肉目: [ʂ ʐ ̩ ˧ ˥ ʐ o w ˥ ˩ m u ˥ ˩] |
Occurrences: 11,482 Examples: * 舞陽君: [u ˨ ˩ ˦ j a ˧ ˥ ŋ t ɕ y ˥ ˥ n] * 傳國璽: [ʈ ʂ ʰ w a ˧ ˥ n k w o ˧ ˥ ɕ i ˨ ˩ ˦] * 科西嘉島: [k ʰ o ˥ ˥ ɕ i ˥ ˥ t ɕ j a ˥ ˥ t a w ˨ ˩ ˦] * 羚羊角: [l i ˧ ˥ ŋ j a ˧ ˥ ŋ t ɕ j a w ˨ ˩ ˦] |
||||
Fricative |
Occurrences: 5,914 Examples: * 番茄汁: [f a ˥ ˥ n t ɕ ʰ j e ˧ ˥ ʈ ʂ ʐ ̩ ˥ ˥] * 花粉癥: [x w a ˥ ˥ f ə ˨ ˩ ˦ n ʈ ʂ o ˥ ˩ ŋ] * 自由放任: [t s z ̩ ˥ ˩ j o w ˧ ˥ f a ˥ ˩ ŋ ʐ ə ˥ ˩ n] * 東方紅: [t u ˥ ˥ ŋ f a ˥ ˥ ŋ x u ˧ ˥ ŋ] |
||||||
Approximant |
Occurrences: 22,330 Examples: * 錫剋教: [ɕ i ˥ ˥ k ʰ o ˥ ˩ t ɕ j a w ˥ ˩] * 從小到大: [t s ʰ u ˧ ˥ ŋ ɕ j a w ˨ ˩ ˦ t a w ˥ ˩ t a ˥ ˩] * 密爾瓦基: [m i ˥ ˩ ʔ o ˨ ˩ ˦ ɻ w a ˨ ˩ ˦ t ɕ i ˥ ˥] * 管弦樂隊: [k w a ˨ ˩ ˦ n ɕ j e ˧ ˥ n ɥ e ˥ ˩ t w e j ˥ ˩] |
Occurrences: 1,436 Examples: * 一邊兒: [i ˥ ˥ p j a ˥ ˥ ɻ] * 土耳其文: [t ʰ u ˨ ˩ ˦ ʔ o ˨ ˩ ˦ ɻ t ɕ ʰ i ˧ ˥ w ə ˧ ˥ n] * 達斡爾族: [t a ˧ ˥ w o ˥ ˩ ʔ o ˨ ˩ ˦ ɻ t s u ˧ ˥] * 模特兒: [m w o ˧ ˥ t ʰ o ˥ ˩ ɻ] |
Occurrences: 27,517 Examples: * 列王記: [l j e ˥ ˩ w a ˧ ˥ ŋ t ɕ i ˥ ˩] * 小溪河: [ɕ j a w ˨ ˩ ˦ ɕ i ˥ ˥ x o ˧ ˥] * 雅各伯: [j a ˨ ˩ ˦ k o ˥ ˩ p w o ˧ ˥] * 三級片: [s a ˥ ˥ n t ɕ i ˧ ˥ p ʰ j e ˥ ˩ n] Occurrences: 4,301 Examples: * 月見草: [ɥ e ˥ ˩ t ɕ j e ˥ ˩ n t s ʰ a w ˨ ˩ ˦] * 瑞士捲: [ʐ w e j ˥ ˩ ʂ ʐ ̩ ˥ ˩ t ɕ ɥ e ˨ ˩ ˦ n] * 委麯求全: [w e j ˨ ˩ ˦ t ɕ ʰ y ˥ ˥ t ɕ ʰ j o w ˧ ˥ t ɕ ʰ ɥ e ˧ ˥ n] * 齣納員: [ʈ ʂ ʰ u ˥ ˥ n a ˥ ˩ ɥ e ˧ ˥ n] |
||||
Lateral |
Occurrences: 11,757 Examples: * 石榴皮: [ʂ ʐ ̩ ˧ ˥ l j o w ˧ ˥ p ʰ i ˧ ˥] * 拉巴特: [l a ˥ ˥ p a ˥ ˥ t ʰ o ˥ ˩] * 大墩路: [t a ˥ ˩ t w ə ˥ ˥ n l u ˥ ˩] * 伊利亞特: [i ˥ ˥ l i ˥ ˩ j a ˥ ˩ t ʰ o ˥ ˩] |
Vowels#
Within each cell, vowel symbols on the left are unrounded and those on the right are rounded.
Front |
Near-Front |
Central |
Near-Back |
Back |
|
---|---|---|---|---|---|
Close |
Occurrences: 28,825 Examples: * 精密度: [t ɕ i ˥ ˥ ŋ m i ˥ ˩ t u ˥ ˩] * 比利時: [p i ˨ ˩ ˦ l i ˥ ˩ ʂ ʐ ̩ ˧ ˥] * 土地爺: [t ʰ u ˨ ˩ ˦ t i ˦ j e ˧ ˥] * 釘書機: [t i ˥ ˩ ŋ ʂ u ˥ ˥ t ɕ i ˥ ˥] Occurrences: 5,781 Examples: * 莙薘菜: [t ɕ y ˥ ˩ n t a ˧ ˥ t s ʰ a j ˥ ˩] * 喜歌劇: [ɕ i ˨ ˩ ˦ k o ˥ ˥ t ɕ y ˥ ˩] * 堯舜禹湯: [j a w ˧ ˥ ʂ w ə ˥ ˩ n y ˨ ˩ ˦ t ʰ a ˥ ˥ ŋ] * 瑞士軍官刀: [ʐ w e j ˥ ˩ ʂ ʐ ̩ ˥ ˩ t ɕ y ˥ ˥ n k w a ˥ ˥ n t a w ˥ ˥] |
Occurrences: 22,774 Examples: * 奧卡姆剃刀: [ʔ a w ˥ ˩ k ʰ a ˨ ˩ ˦ m u ˨ ˩ ˦ t ʰ i ˥ ˩ t a w ˥ ˥] * 一股腦兒: [i ˥ ˥ k u ˨ ˩ ˦ n a w ˨ ˩ ˦ ɻ] * 保不定: [p a w ˨ ˩ ˦ p u ˦ t i ˥ ˩ ŋ] * 常陸大宮: [ʈ ʂ ʰ a ˧ ˥ ŋ l u ˥ ˩ t a ˥ ˩ k u ˥ ˥ ŋ] |
|||
Close-Mid |
Occurrences: 16,209 Examples: * 導盲犬: [t a w ˨ ˩ ˦ m a ˧ ˥ ŋ t ɕ ʰ ɥ e ˨ ˩ ˦ n] * 煙油子: [j e ˥ ˥ n j o w ˧ ˥ t s z ̩ ˨] * 立足點: [l i ˥ ˩ t s u ˧ ˥ t j e ˨ ˩ ˦ n] * 大少爺: [t a ˥ ˩ ʂ a w ˥ ˩ j e ˩] Occurrences: 7,740 Examples: * 氣纍脖兒: [t ɕ ʰ i ˥ ˩ l e j ˩ p w o ˧ ˥ ɻ] * 水力劈裂: [ʂ w e j ˨ ˩ ˦ l i ˥ ˩ p ʰ i ˥ ˥ l j e ˥ ˩] * 迴報率: [x w e j ˧ ˥ p a w ˥ ˩ l y ˥ ˩] * 對談者: [t w e j ˥ ˩ t ʰ a ˧ ˥ n ʈ ʂ o ˨ ˩ ˦] |
Occurrences: 19,963 Examples: * 火車頭: [x w o ˨ ˩ ˦ ʈ ʂ ʰ o ˥ ˥ t ʰ o w ˧ ˥] * 糯稻根: [n w o ˥ ˩ t a w ˥ ˩ k ə ˥ ˥ n] * 走後門兒: [t s o w ˨ ˩ ˦ x o w ˥ ˩ m ə ˧ ˥ ɻ] * 小老婆: [ɕ j a w ˨ ˩ ˦ l a w ˨ ˩ ˦ p ʰ w o ˦] Occurrences: 7,993 Examples: * 木頭人: [m u ˥ ˩ t ʰ o w ˩ ʐ ə ˧ ˥ n] * 陳傢樓: [ʈ ʂ ʰ ə ˧ ˥ n t ɕ j a ˥ ˥ l o w ˧ ˥] * 井字遊戲: [t ɕ i ˨ ˩ ˦ ŋ t s z ̩ ˥ ˩ j o w ˧ ˥ ɕ i ˥ ˩] * 修正液: [ɕ j o w ˥ ˥ ʈ ʂ o ˥ ˩ ŋ j e ˥ ˩] |
|||
Occurrences: 7,352 Examples: * 四輪驅動: [s z ̩ ˥ ˩ l w ə ˧ ˥ n t ɕ ʰ y ˥ ˥ t u ˥ ˩ ŋ] * 溫布頓: [w ə ˥ ˥ n p u ˥ ˩ t w ə ˥ ˩ n] * 人情味: [ʐ ə ˧ ˥ n t ɕ ʰ i ˧ ˥ ŋ w e j ˥ ˩] * 婚紗照: [x w ə ˥ ˥ n ʂ a ˥ ˥ ʈ ʂ a w ˥ ˩] |
|||||
Open-Mid |
|||||
Open |
Occurrences: 36,307 Examples: * 工商界: [k u ˥ ˥ ŋ ʂ a ˥ ˥ ŋ t ɕ j e ˥ ˩] * 恍然大悟: [x w a ˨ ˩ ˦ ŋ ʐ a ˧ ˥ n t a ˥ ˩ u ˥ ˩] * 皇阿瑪: [x w a ˧ ˥ ŋ ʔ a ˥ ˩ m a ˩] * 香港仔: [ɕ j a ˥ ˥ ŋ k a ˨ ˩ ˦ ŋ t s a j ˨ ˩ ˦] |
Diphthongs#
aj
aw
ej
ow
Tones#
˥˥
˥˩
˦
˧˥
˨
˨˩˦
˩
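The tone symbols above are Chao tone letters, where each letter marks one of five pitch levels from ˩ (lowest, 1) to ˥ (highest, 5). As a small sketch, the following Python converts a tone string into the equivalent Chao numerals. The function name `tone_to_numerals` is hypothetical and is not part of MFA.

```python
# Chao tone letters mapped to their pitch levels (1 = lowest, 5 = highest).
CHAO_LEVELS = {"˩": "1", "˨": "2", "˧": "3", "˦": "4", "˥": "5"}


def tone_to_numerals(tone: str) -> str:
    """Convert a string of Chao tone letters to Chao numerals.

    Spaces are ignored so that spaced-out forms such as "˨ ˩ ˦" also work.
    Any other unexpected character raises KeyError.
    """
    return "".join(CHAO_LEVELS[ch] for ch in tone.replace(" ", ""))


tones = ["˥˥", "˥˩", "˦", "˧˥", "˨", "˨˩˦", "˩"]
numerals = [tone_to_numerals(t) for t in tones]
print(numerals)  # ['55', '51', '4', '35', '2', '214', '1']
```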