Mandarin (Taiwan) MFA dictionary v3.0.0
@techreport{mfa_mandarin_taiwan_mfa_dictionary_2024,
author={McAuliffe, Michael and Sonderegger, Morgan},
title={Mandarin (Taiwan) MFA dictionary v3.0.0},
address={\url{https://mfa-models.readthedocs.io/pronunciation dictionary/Mandarin/Mandarin (Taiwan) MFA dictionary v3_0_0.html}},
year={2024},
month={Feb},
}
Installation
Install from the MFA command line:
mfa model download dictionary mandarin_taiwan_mfa
Or download from the release page.
The dictionary available from the release page and through command-line installation has pronunciation and silence probabilities estimated as part of acoustic model training (see Silence probability format and training pronunciation probabilities for more information). If you would like to use the version of this dictionary without probabilities, please see the [plain dictionary](https://raw.githubusercontent.com/MontrealCorpusTools/mfa-models/main/dictionary/mandarin/mfa/Mandarin (Taiwan) MFA dictionary v3_0_0.dict).
Intended use
This dictionary is intended for forced alignment of Mandarin Chinese transcripts.
This dictionary uses the MFA phone set for Mandarin and was used in training the Mandarin MFA acoustic model. Pronunciations can be added on top of the dictionary as long as they introduce no additional phones, because the acoustic model has no trained models for phones outside that set.
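One way to respect that constraint before adding custom entries is to collect every phone the dictionary already uses and flag any new pronunciation containing something else. This is a minimal sketch. It assumes a UTF-8 dictionary with one entry per line and tab-separated fields, with the space-separated pronunciation in the last field, which would cover both the plain file and one with probability columns. `phone_set` and `unknown_phones` are hypothetical helpers, not part of MFA, and the two entries are copied from the consonant chart on this page.

```python
from collections.abc import Iterable


def phone_set(dictionary_lines: Iterable[str]) -> set[str]:
    """Collect every phone used in a dictionary.

    Assumption: fields are tab-separated and the space-separated
    pronunciation is the last field on each line.
    """
    phones: set[str] = set()
    for line in dictionary_lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        pronunciation = line.split("\t")[-1]
        phones.update(pronunciation.split())
    return phones


def unknown_phones(pronunciation: str, known: set[str]) -> list[str]:
    """Return the phones in a new pronunciation that the dictionary never uses."""
    return [p for p in pronunciation.split() if p not in known]


# Two entries taken from the consonant chart examples.
existing = [
    "北美洲\tp ej˨˩˦ m ej˨˩˦ ʈʂ ow˥",
    "黎明路\tʎ i˧˥ mʲ i˧˥ ŋ l u˥˩",
]
known = phone_set(existing)

# A variant built only from existing phones passes...
print(unknown_phones("m ej˨˩˦ ʈʂ ow˥", known))  # []
# ...but one that introduces a new symbol is flagged.
print(unknown_phones("m ej˨˩˦ tʃ ow˥", known))  # ['tʃ']
```

Tone-bearing vowels such as ej˨˩˦ count as distinct phones here, matching how they appear in the chart examples.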
Performance Factors
When trying to get better alignment accuracy, adding pronunciations is generally helpful, especially for different speaking styles and dialects. The most impactful improvements generally come from adding reduced variants in which segments or syllables are deleted, as is common in spontaneous speech. Alignment must include every phone specified in a word's pronunciation, and each phone has a minimum duration (10 ms by default), so a pronunciation with N phones needs at least N × 10 ms. If a speaker pronounces a multisyllabic word as a single syllable, MFA can have difficulty fitting all the segments in, which leads to alignment errors on adjacent words as well.
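The minimum-duration constraint is simple arithmetic: a pronunciation with N phones needs at least N times the per-phone minimum. The sketch below applies the 10 ms default stated above to a space-separated pronunciation. It does not reproduce MFA's aligner, and `min_word_duration_ms` and `fits` are hypothetical names. The 80 ms span is an invented duration for a heavily reduced token.

```python
MIN_PHONE_MS = 10  # default minimum duration per phone, as stated above


def min_word_duration_ms(pronunciation: str, min_phone_ms: int = MIN_PHONE_MS) -> int:
    """Shortest span (in ms) that can hold every phone of a space-separated pronunciation."""
    return len(pronunciation.split()) * min_phone_ms


def fits(pronunciation: str, available_ms: int, min_phone_ms: int = MIN_PHONE_MS) -> bool:
    """True if every phone can receive at least its minimum duration within the span."""
    return min_word_duration_ms(pronunciation, min_phone_ms) <= available_ms


# 好事多磨 as listed in the consonant chart: 10 phones.
full = "x aw˨˩˦ ʂ ʐ̩˥˩ t w o˥ m w o˧˥"
print(min_word_duration_ms(full))  # 100
print(fits(full, 80))              # False
```

When the available span is shorter than this minimum, a reduced variant with fewer phones gives the aligner a pronunciation that can actually fit.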
Ethical considerations
Deploying any Speech-to-Text model into any production setting has ethical implications. You should consider these implications before use.
Demographic Bias
You should assume every machine learning model has demographic bias unless proven otherwise. For pronunciation dictionaries, transcription accuracy and lexicon coverage are often better for the prestige variety modeled in this dictionary than for other varieties. If you are using this dictionary in production, you should acknowledge this as a potential issue.
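One rough way to check lexicon coverage on your own data is to measure, for each speaker group, the share of transcript tokens that have a dictionary entry. This is a minimal sketch. It assumes transcripts are already segmented into words, since Mandarin text is not space-delimited. The lexicon, the group names, and the out-of-vocabulary tokens below are invented for illustration.

```python
from collections.abc import Iterable


def lexicon_coverage(tokens: Iterable[str], lexicon: set[str]) -> float:
    """Share of transcript tokens that have an entry in the lexicon (1.0 if there are none)."""
    token_list = list(tokens)
    if not token_list:
        return 1.0
    covered = sum(1 for tok in token_list if tok in lexicon)
    return covered / len(token_list)


# Hypothetical lexicon and pre-segmented transcripts from two speaker groups.
lexicon = {"北美洲", "黎明路", "年輕人", "警察局"}
transcripts = {
    "group_a": ["年輕人", "警察局", "北美洲"],
    "group_b": ["年輕人", "某某詞", "另一詞"],
}
for group, tokens in transcripts.items():
    print(group, round(lexicon_coverage(tokens, lexicon), 2))
# group_a 1.0
# group_b 0.33
```

A large gap between groups suggests adding entries (using only existing phones) for the words the lower-coverage group actually uses.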
IPA Charts
Consonants
Within each place of articulation, unvoiced obstruent symbols are listed first (on the left) and voiced ones after them (on the right).
Rows give the manner of articulation; places of articulation are Labial, Labiodental, Alveolar, Retroflex, Palatal, Velar, and Glottal.
Nasal
m (labial): Occurrences: 857. Examples: 好事多磨 [x aw˨˩˦ ʂ ʐ̩˥˩ t w o˥ m w o˧˥]; 罗马尼亚 [l w o˧˥ m a˨˩˦ ɲ i˧˥ j a˥˩]; 苗栗縣 [m j aw˧˥ ʎ i˥˩ ɕ e˥˩ n]; 北美洲 [p ej˨˩˦ m ej˨˩˦ ʈʂ ow˥]
mʲ (labial): Occurrences: 318. Examples: 密苏里 [mʲ i˥˩ s u˥ ʎ i˨˩˦]; 黎明路 [ʎ i˧˥ mʲ i˧˥ ŋ l u˥˩]; 原住民 [ɥ e˧˥ n ʈʂ u˥˩ mʲ i˧˥ n]; 名列錶 [mʲ i˧˥ ŋ l j e˥˩ p j aw˨˩˦]
n (alveolar): Occurrences: 7,648. Examples: 年雨量 [n j e˧˥ ɲ y˨˩˦ l j a˥˩ ŋ]; 輔導人 [f u˨˩˦ t aw˨˩˦ ʐ ə˧˥ n]; 離綫版 [ʎ i˧˥ ɕ e˥˩ n p a˨˩˦ n]; 五十分 [u˨˩˦ ʂ ʐ̩˧˥ f ə˥˩ n]
Unlabeled alveolar entry: Occurrences: 3. No examples listed.
ɲ (palatal): Occurrences: 235. Examples: 現異性 [ɕ e˥˩ ɲ i˥˩ ɕ i˥˩ ŋ]; 參與感 [tsʰ a˥ ɲ y˥˩ k a˨˩˦ n]; 不盡人意 [pʷ u˥˩ tɕ i˨˩˦ n ʐ ə˧˥ ɲ i˥˩]; 鬍言亂語 [xʷ u˧˥ j e˧˥ n l w a˥˩ ɲ y˨˩˦]
ŋ (velar): Occurrences: 5,994. Examples: 安平路 [ʔ a˥ n pʲ i˧˥ ŋ l u˥˩]; 颱中港 [tʰ aj˧˥ ʈʂ u˥ ŋ k a˨˩˦ ŋ]; 登記簿 [t o˥ ŋ tɕ i˥˩ pʷ u˥˩]; 神岡區 [ʂ ə˧˥ n k a˥ ŋ tɕʰ y˥]
Unlabeled velar entry: Occurrences: 3. No examples listed.
Stop (plain)
p (labial): Occurrences: 1,079. Examples: 識彆化 [ʂ ʐ̩˧˥ p j e˧˥ xʷ a˥˩]; 玻利维亚 [p w o˥ ʎ i˥˩ w ej˧˥ j a˥˩]; 完整版 [w a˧˥ n ʈʂ o˨˩˦ ŋ p a˨˩˦ n]; 準備金 [ʈʂ w ə˨˩˦ n p ej˥˩ tɕ i˥ n]
pʲ (labial): Occurrences: 464. Examples: 屏息以待 [pʲ i˧˥ ŋ ɕ i˥ i˨˩˦ t aj˥˩]; 垂死病 [ʈʂʰ w ej˧˥ s z̩˨˩˦ pʲ i˥˩ ŋ]; 閉幕式 [pʲ i˥˩ m u˥˩ ʂ ʐ̩˥˩]; 菲律賓 [f ej˥ ʎ y˥˩ pʲ i˥ n]
pʷ (labial): Occurrences: 344. Examples: 源源不絕 [ɥ e˧˥ n ɥ e˧˥ n pʷ u˥˩ tɕʷ e˧˥]; 普洱茶 [pʷ u˨˩˦ ʔ o˨˩˦ ɻ ʈʂʰ a˧˥]; 念念不忘 [n j e˥˩ n n j e˥˩ n pʷ u˥˩ w a˥˩ ŋ]; 巴不得乘 [p a˥ pʷ u˥˩ tʲ i˥˩ ʈʂʰ o˧˥ ŋ]
t (alveolar): Occurrences: 1,551. Examples: 华盛顿 [xʷ a˧˥ ʂ o˥˩ ŋ t w ə˥˩ n]; 大材小用 [t a˥˩ tsʰ aj˧˥ ɕ aw˨˩˦ j u˥˩ ŋ]; 自作多情 [ts z̩˥˩ ts w o˥˩ t w o˥ tɕʰ i˧˥ ŋ]; 電影院 [t j e˥˩ ɲ i˨˩˦ ŋ ɥ e˥˩ n]
tʲ (alveolar): Occurrences: 577. Examples: 地下室 [tʲ i˥˩ ɕ a˥˩ ʂ ʐ̩˥˩]; 甜甜圈 [tʲ e˧˥ n tʲ e˧˥ n tɕʷ e˥ n]; 鼎山街 [tʲ i˨˩˦ ŋ ʂ a˥ n tɕ e˥]; 房地産 [f a˧˥ ŋ tʲ i˥˩ ʈʂʰ a˨˩˦ n]
tʷ (alveolar): Occurrences: 637. Examples: 山东省 [ʂ a˥ n tʷ u˥ ŋ ʂ o˨˩˦ ŋ]; 大同區 [t a˥˩ tʷ u˧˥ ŋ tɕʰ y˥]; 印度洋 [i˥˩ n tʷ u˥˩ j a˧˥ ŋ]; 同盟路 [tʷ u˧˥ ŋ m o˧˥ ŋ l u˥˩]
k (velar): Occurrences: 1,193. Examples: 親切感 [tɕʰ i˥ n tɕʰ e˥˩ k a˨˩˦ n]; 高屏溪 [k aw˥ pʲ i˧˥ ŋ ɕ i˥]; 狗尾續貂 [k ow˨˩˦ w ej˨˩˦ ɕ y˥˩ t j aw˥]; 岡山南路 [k a˥ ŋ ʂ a˥ n n a˧˥ n l u˥˩]
kʷ (velar): Occurrences: 478. Examples: 恐固力 [kʷ u˨˩˦ ŋ kʷ u˥˩ ʎ i˥˩]; 農民工 [n u˧˥ ŋ mʲ i˧˥ n kʷ u˥ ŋ]; 共和國 [kʷ u˥˩ ŋ x o˧˥ k w o˧˥]; 辦公室 [p a˥˩ n kʷ u˥ ŋ ʂ ʐ̩˨˩˦]
ʔ (glottal): Occurrences: 416. Examples: 一九九二年 [i˥ tɕ ow˨˩˦ tɕ ow˨˩˦ ʔ o˥˩ ɻ n j e˧˥ n]; 八安橋 [p a˥ ʔ a˥ n tɕʰ aw˧˥]; 奥林匹克 [ʔ aw˥˩ ʎ i˧˥ n pʲ i˨˩˦ kʰ o˥˩]; 安樂區 [ʔ a˥ n l o˥˩ tɕʰ y˥]
Stop (aspirated)
pʰ (labial): Occurrences: 374. Examples: 詹姆斯龐德 [ʈʂ a˥ n m u˨˩˦ s z̩˥ pʰ a˧˥ ŋ t o˧˥]; 派齣所 [pʰ aj˥˩ ʈʂʰ u˥ s w o˨˩˦]; 旁觀者 [pʰ a˧˥ ŋ k w a˥ n ʈʂ o˨˩˦]; 香菜派 [ɕ a˥ ŋ tsʰ aj˥˩ pʰ aj˥˩]
tʰ (alveolar): Occurrences: 845. Examples: 愛因斯坦 [ʔ aj˥˩ i˥ n s z̩˥ tʰ a˨˩˦ n]; 動態式 [tʷ u˥˩ ŋ tʰ aj˥˩ ʂ ʐ̩˥˩]; 颱南人 [tʰ aj˧˥ n a˧˥ n ʐ ə˧˥ n]; 堂車站 [tʰ a˧˥ ŋ ʈʂʰ o˥ ʈʂ a˥˩ n]
kʰ (velar): Occurrences: 624. Examples: 莫斯科 [m w o˥˩ s z̩˥ kʰ o˥]; 科帕县 [kʰ o˥ pʰ a˥˩ ɕ e˥˩ n]; 社科院 [ʂ o˥˩ kʰ o˥ ɥ e˥˩ n]; 齣海口 [ʈʂʰ u˥ x aj˨˩˦ kʰ ow˨˩˦]
Affricate (plain)
ts (alveolar): Occurrences: 872. Examples: 资本主义 [ts z̩˥ p ə˨˩˦ n ʈʂ u˨˩˦ i˥˩]; 足球队 [ts u˧˥ tɕʰ ow˧˥ t w ej˥˩]; 不敢造次 [pʷ u˥˩ k a˨˩˦ n ts aw˥˩ tsʰ z̩˥˩]; 議事組 [i˥˩ ʂ ʐ̩˥˩ ts u˨˩˦]
ʈʂ (retroflex): Occurrences: 2,052. Examples: 當局者 [t a˥ ŋ tɕ y˧˥ ʈʂ o˨˩˦]; 中科院 [ʈʂ u˥ ŋ kʰ o˥ ɥ e˥˩ n]; 張學良 [ʈʂ a˥ ŋ ɕʷ e˧˥ l j a˧˥ ŋ]; 指日可待 [ʈʂ ʐ̩˨˩˦ ʐ̩˥˩ kʰ o˨˩˦ t aj˥˩]
tɕ (palatal): Occurrences: 2,411. Examples: 九一八 [tɕ ow˨˩˦ i˥ p a˥]; 科學傢 [kʰ o˥ ɕʷ e˧˥ tɕ a˥]; 社會局 [ʂ o˥˩ xʷ ej˥˩ tɕ y˧˥]; 北京市 [p ej˨˩˦ tɕ i˥ ŋ ʂ ʐ̩˥˩]
tɕʷ (palatal): Occurrences: 237. Examples: 甜甜圈 [tʰ j e˧˥ n tʰ j e˧˥ n tɕʷ e˥ n]; 泉州市 [tɕʷ e˧˥ n ʈʂ ow˥ ʂ ʐ̩˥˩]; 生活圈 [ʂ o˥ ŋ xʷ o˧˥ tɕʷ e˥ n]; 不可或缺 [pʷ u˥˩ kʰ o˨˩˦ xʷ o˥˩ tɕʷ e˥]
Affricate (aspirated)
tsʰ (alveolar): Occurrences: 506. Examples: 北山村 [p ej˨˩˦ ʂ a˥ n tsʰ w ə˥ n]; 小林村 [ɕ aw˨˩˦ ʎ i˧˥ n tsʰ w ə˥ n]; 促進社 [tsʰ u˥˩ tɕ i˥˩ n ʂ o˥˩]; 參與者 [tsʰ a˥ ɲ y˨˩˦ ʈʂ o˨˩˦]
ʈʂʰ (retroflex): Occurrences: 1,293. Examples: 迴收場 [xʷ ej˧˥ ʂ ow˥ ʈʂʰ a˧˥ ŋ]; 籌備委 [ʈʂʰ ow˧˥ p ej˥˩ w ej˨˩˦]; 斷腸人 [t w a˥˩ n ʈʂʰ a˧˥ ŋ ʐ ə˧˥ n]; 警察局 [tɕ i˨˩˦ ŋ ʈʂʰ a˧˥ tɕ y˧˥]
tɕʰ (palatal): Occurrences: 1,250. Examples: 高架橋 [k aw˥ tɕ a˥˩ tɕʰ aw˧˥]; 保护区 [p aw˨˩˦ xʷ u˥˩ tɕʰ y˥]; 機器人 [tɕ i˥ tɕʰ i˥˩ ʐ ə˧˥ n]; 一日韆裏 [i˧˥ ʐ̩˥˩ tɕʰ e˥ n ʎ i˨˩˦]
Sibilant
s (alveolar): Occurrences: 618. Examples: 星期三 [ɕ i˥ ŋ tɕʰ i˧˥ s a˥ n]; 史語所 [ʂ ʐ̩˨˩˦ y˨˩˦ s w o˨˩˦]; 一哄而散 [i˧˥ xʷ u˥˩ ŋ ʔ o˧˥ ɻ s a˥˩ n]; 我行我素 [w o˨˩˦ ɕ i˧˥ ŋ w o˨˩˦ s u˥˩]
Unlabeled alveolar entry: Occurrences: 0. No examples listed.
z̩ (alveolar): Occurrences: 575. Examples: 資本傢 [ts z̩˥ p ə˨˩˦ n tɕ a˥]; 龍慈路 [l u˧˥ ŋ tsʰ z̩˧˥ l u˥˩]; 弗内斯 [f u˧˥ n ej˥˩ s z̩˥]; 魏公子 [w ej˥˩ kʷ u˥ ŋ ts z̩˩]
ʂ (retroflex): Occurrences: 2,389. Examples: 九十日 [tɕ ow˨˩˦ ʂ ʐ̩˧˥ ʐ̩˥˩]; 二十分 [ʔ o˥˩ ɻ ʂ ʐ̩˧˥ f ə˥˩ n]; 文化市 [w ə˧˥ n xʷ a˥˩ ʂ ʐ̩˥˩]; 名副其實 [mʲ i˧˥ ŋ f u˥˩ tɕʰ i˧˥ ʂ ʐ̩˧˥]
ʐ (retroflex): Occurrences: 626. Examples: 仁愛院 [ʐ ə˧˥ n ʔ aj˥˩ ɥ e˥˩ n]; 年輕人 [n j e˧˥ n tɕʰ i˥ ŋ ʐ ə˧˥ n]; 花花惹 [xʷ a˥ xʷ a˥ ʐ o˨˩˦]; 瑞隆路 [ʐ w ej˥˩ l u˧˥ ŋ l u˥˩]
ʐ̩ (retroflex): Occurrences: 1,481. Examples: 大口吃肉 [t a˥˩ kʰ ow˨˩˦ ʈʂʰ ʐ̩˥ ʐ ow˥˩]; 同安市 [tʷ u˧˥ ŋ ʔ a˥ n ʂ ʐ̩˥˩]; 隻不過 [ʈʂ ʐ̩˨˩˦ pʷ u˥ k w o˥˩]; 落井下石 [l w o˥˩ tɕ i˨˩˦ ŋ ɕ a˥˩ ʂ ʐ̩˧˥]
ɕ (palatal): Occurrences: 2,142. Examples: 一下子 [i˧˥ ɕ a˥˩ ts z̩˩]; 西西里岛 [ɕ i˥ ɕ i˥ ʎ i˨˩˦ t aw˨˩˦]; 匈牙利 [ɕ u˥ ŋ j a˧˥ ʎ i˥˩]; 忠孝橋 [ʈʂ u˥ ŋ ɕ aw˥˩ tɕʰ aw˧˥]
ɕʷ (palatal): Occurrences: 226. Examples: 學鋼琴 [ɕʷ e˧˥ k a˥ ŋ tɕʰ i˧˥ n]; 凱鏇路 [kʰ aj˨˩˦ ɕʷ e˧˥ n l u˥˩]; 心理学家 [ɕ i˥ n ʎ i˨˩˦ ɕʷ e˧˥ tɕ a˥]; 下雪天 [ɕ a˥˩ ɕʷ e˨˩˦ tʰ j e˥ n]
Fricative
f (labiodental): Occurrences: 1,118. Examples: 菲律賓 [f ej˥ ʎ y˥˩ pʲ i˥ n]; 豐原區 [f o˥ ŋ ɥ e˧˥ n tɕʰ y˥]; 保護法 [p aw˨˩˦ xʷ u˥˩ f a˨˩˦]; 福壽街 [f u˧˥ ʂ ow˥˩ tɕ e˥]
Approximant
w (labial): Occurrences: 3,411. Examples: 文心南路 [w ə˧˥ n ɕ i˥ n n a˧˥ n l u˥˩]; 模範生 [m w o˧˥ f a˥˩ n ʂ o˥ ŋ]; 專業品 [ʈʂ w a˥ n j e˥˩ pʲ i˨˩˦ n]; 電磁波 [t j e˥˩ n tsʰ z̩˧˥ p w o˥]
ɻ (retroflex): Occurrences: 131. Examples: 一哄而散 [i˥ xʷ u˥˩ ŋ ʔ o˧˥ ɻ s a˥˩ n]; 公二八 [kʷ u˥ ŋ ʔ o˥˩ ɻ p a˥]; 十全二路 [ʂ ʐ̩˧˥ tɕʷ e˧˥ n ʔ o˥˩ ɻ l u˥˩]; 二零一二年 [ʔ o˥˩ ɻ ʎ i˧˥ ŋ i˥ ʔ o˥˩ ɻ n j e˧˥ n]
j (palatal): Occurrences: 2,550. Examples: 今年初 [tɕ i˥ n n j e˧˥ n ʈʂʰ u˥]; 大不瞭 [t a˥˩ pʷ u˥˩ l j aw˨˩˦]; 連鎖店 [l j e˧˥ n s w o˨˩˦ tʲ e˥˩ n]; 顺天府 [ʂ w ə˥˩ n tʰ j e˥ n f u˨˩˦]
ɥ (palatal): Occurrences: 386. Examples: 设计院 [ʂ o˥˩ tɕ i˥˩ ɥ e˥˩ n]; 遠東站 [ɥ e˨˩˦ n tʷ u˥ ŋ ʈʂ a˥˩ n]; 王音樂 [w a˧˥ ŋ i˥ n ɥ e˥˩]; 研究院 [j e˧˥ n tɕ ow˥ ɥ e˥˩ n]
Lateral
l (alveolar): Occurrences: 1,414. Examples: 縣芬路 [ɕ e˥˩ n f ə˥ n l u˥˩]; 興東路 [ɕ i˥˩ ŋ tʷ u˥ ŋ l u˥˩]; 洛根县 [l w o˥˩ k ə˥ n ɕ e˥˩ n]; 復興南路 [f u˥˩ ɕ i˥˩ ŋ n a˧˥ n l u˥˩]
ʎ (palatal): Occurrences: 755. Examples: 既得利益 [tɕ i˥˩ t o˧˥ ʎ i˥˩ i˥˩]; 叙利亚 [ɕ y˥˩ ʎ i˥˩ j a˨˩˦]; 罹患率 [ʎ i˧˥ xʷ a˥˩ n ʎ y˥˩]; 一零三年 [i˥ ʎ i˧˥ ŋ s a˥ n n j e˧˥ n]
Vowels
Within each position, unrounded vowel symbols are listed first (on the left) and rounded ones after them (on the right).
Rows give vowel height; positions are Front, Near-front, Central, Near-back, and Back.
Close
i (front): Occurrences: 5,207. Examples: 好景不常 [x aw˨˩˦ tɕ i˨˩˦ ŋ pʷ u˥˩ ʈʂʰ a˧˥ ŋ]; 區桌椅 [tɕʰ y˥ ʈʂ w o˥ i˨˩˦]; 黎明路 [ʎ i˧˥ mʲ i˧˥ ŋ l u˥˩]; 關係人 [k w a˥ n ɕ i˩ ʐ ə˧˥ n]
y (front): Occurrences: 1,160. Examples: 平谷区 [pʲ i˧˥ ŋ kʷ u˨˩˦ tɕʰ y˨˩˦]; 五髒俱全 [u˨˩˦ ts a˥˩ ŋ tɕ y˥˩ tɕʷ e˧˥ n]; 语言学 [y˨˩˦ j e˧˥ n ɕʷ e˧˥]; 燕巢區 [j e˥˩ n ʈʂʰ aw˧˥ tɕʰ y˥]
u (back): Occurrences: 4,262. Examples: 差不多 [ʈʂʰ a˥˩ pʷ u˩ t w o˥]; 輔導人 [f u˨˩˦ t aw˨˩˦ ʐ ə˧˥ n]; 紅樹林 [xʷ u˧˥ ŋ ʂ u˥˩ ʎ i˧˥ n]; 福利部 [f u˧˥ ʎ i˥˩ pʷ u˥˩]
Close-mid
e (front): Occurrences: 3,517. Examples: 瓜拉雪兰 [k w a˥ l a˥ ɕʷ e˨˩˦ l a˧˥ n]; 星聚點 [ɕ i˥ ŋ tɕ y˥˩ t j e˨˩˦ n]; 遠離市 [ɥ e˨˩˦ n ʎ i˧˥ ʂ ʐ̩˥˩]; 自由權 [ts z̩˥˩ j ow˧˥ tɕʷ e˧˥ n]
Unlabeled front entry: Occurrences: 1,576. Examples: 營業稅 [i˧˥ ŋ j e˥˩ ʂ w ej˥˩]; 鬼鬼祟祟 [k w ej˨˩˦ k w ej˨˩˦ s w ej˥˩ s w ej˥˩]; 麵對麵 [m j e˥˩ n t w ej˥˩ m j e˥˩ n]; 淡水人 [t a˥˩ n ʂ w ej˨˩˦ ʐ ə˧˥ n]
o (back): Occurrences: 3,436. Examples: 可能性 [kʰ o˨˩˦ n o˧˥ ŋ ɕ i˥˩ ŋ]; 康特拉科 [kʰ a˥ ŋ tʰ o˥˩ l a˥ kʰ o˥]; 二鍋頭 [ʔ o˥˩ ɻ k w o˥ tʰ ow˧˥]; 所得稅 [s w o˨˩˦ t o˧˥ ʂ w ej˥˩]
Unlabeled back entry: Occurrences: 1,396. Examples: 修道院 [ɕ ow˥ t aw˥˩ ɥ e˥˩ n]; 苟延殘喘 [k ow˨˩˦ j e˧˥ n tsʰ a˧˥ n ʈʂʰ w a˨˩˦ n]; 宇宙人 [y˨˩˦ ʈʂ ow˥˩ ʐ ə˧˥ n]; 有朝一日 [j ow˨˩˦ ʈʂ aw˥ i˧˥ ʐ̩˥˩]
Mid
ə (central): Occurrences: 1,459. Examples: 敦化南路 [t w ə˥ n xʷ a˥˩ n a˧˥ n l u˥˩]; 照本宣科 [ʈʂ aw˥˩ p ə˨˩˦ n ɕʷ e˥ n kʰ o˥]; 三十分 [s a˥ n ʂ ʐ̩˧˥ f ə˥˩ n]; 年輕人 [n j e˧˥ n tɕʰ i˥ ŋ ʐ ə˧˥ n]
Open-mid: no entries.
Open
a: Occurrences: 6,376. Examples: 五髒俱全 [u˨˩˦ ts a˥˩ ŋ tɕ y˥˩ tɕʷ e˧˥ n]; 司法人 [s z̩˥ f a˨˩˦ ʐ ə˧˥ n]; 幸福感 [ɕ i˥˩ ŋ f u˧˥ k a˨˩˦ n]; 拉阿魯 [l a˥ ʔ a˥ l u˨˩˦]
Diphthongs
aj
aw
ej
ow
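Tones
˥ (high level; tone 1)
˥˩ (high falling; tone 4)
˧ (mid level)
˧˥ (mid rising; tone 2)
˨˩˦ (low dipping; tone 3)
˩ (low level)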
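In the chart examples above, each tone-bearing phone (a vowel, diphthong, or syllabic consonant) carries its tone as trailing Chao tone letters, as in ej˨˩˦ or ʐ̩˥˩, so the tones of a word can be read directly from its pronunciation. This is a minimal sketch. The mapping from contours to tone numbers 1 to 4 is the conventional Mandarin correspondence supplied here, not something read from the dictionary files, and the function names are hypothetical.

```python
TONE_LETTERS = "˥˦˧˨˩"  # Chao tone letters, U+02E5 to U+02E9

# Conventional Mandarin tone numbers; ˧ and ˩ are deliberately left unmapped.
TONE_NUMBERS = {"˥": 1, "˧˥": 2, "˨˩˦": 3, "˥˩": 4}


def split_tone(phone: str) -> tuple[str, str]:
    """Split a phone such as 'ej˨˩˦' into its segment ('ej') and tone ('˨˩˦')."""
    segment = phone.rstrip(TONE_LETTERS)
    return segment, phone[len(segment):]


def word_tones(pronunciation: str) -> list[str]:
    """Tone contours of a space-separated pronunciation, one per tone-bearing phone."""
    tones = []
    for phone in pronunciation.split():
        _, tone = split_tone(phone)
        if tone:
            tones.append(tone)
    return tones


pron = "x aw˨˩˦ ʂ ʐ̩˥˩ t w o˥ m w o˧˥"  # 好事多磨 from the consonant chart
contours = word_tones(pron)
print(contours)                                 # ['˨˩˦', '˥˩', '˥', '˧˥']
print([TONE_NUMBERS.get(c) for c in contours])  # [3, 4, 1, 2]
```

Stripping trailing tone letters with `str.rstrip` leaves combining marks such as the syllabic diacritic in ʐ̩ attached to the segment.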