Japanese MFA dictionary v3.0.0#
@techreport{mfa_japanese_mfa_dictionary_2024,
author={McAuliffe, Michael and Sonderegger, Morgan},
title={Japanese MFA dictionary v3.0.0},
address={\url{https://mfa-models.readthedocs.io/pronunciation dictionary/Japanese/Japanese MFA dictionary v3_0_0.html}},
year={2024},
month={Feb},
}
Installation#
Install from the MFA command line:
mfa model download dictionary japanese_mfa
Or download from the release page.
The dictionary available from the release page and from command-line installation has pronunciation and silence probabilities estimated as part of acoustic model training (see Silence probability format and training pronunciation probabilities for more information). If you would like to use the version of this dictionary without probabilities, please see the [plain dictionary](https://raw.githubusercontent.com/MontrealCorpusTools/mfa-models/main/dictionary/japanese/mfa/Japanese MFA dictionary v3_0_0.dict).
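To make the difference between the two versions concrete, here is a minimal Python sketch that reads one dictionary line and drops the probability columns. It assumes that fields are tab-separated, that the last field is the space-separated phone string, and that any numeric fields between the word and the phones are probabilities. The helper names (`parse_line`, `strip_probabilities`) and the numeric values in the usage example are illustrative and are not taken from the released dictionary.

```python
from typing import List, NamedTuple, Optional


class Entry(NamedTuple):
    word: str
    probabilities: List[float]  # empty for a plain dictionary line
    phones: List[str]


def _is_number(text: str) -> bool:
    """Return True if the field can be read as a float."""
    try:
        float(text)
        return True
    except ValueError:
        return False


def parse_line(line: str) -> Optional[Entry]:
    """Parse one tab-separated dictionary line; return None for blank lines."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 2 or not fields[0]:
        return None
    word, rest = fields[0], fields[1:]
    # Assumption: the final field holds the phones, and numeric fields
    # between the word and the phones are probabilities.
    phones = rest[-1].split()
    probabilities = [float(f) for f in rest[:-1] if _is_number(f)]
    return Entry(word, probabilities, phones)


def strip_probabilities(line: str) -> str:
    """Rewrite a line as 'word<TAB>phones', dropping any probability fields."""
    entry = parse_line(line)
    if entry is None:
        return ""
    return f"{entry.word}\t{' '.join(entry.phones)}"


if __name__ == "__main__":
    # Illustrative line; the four numbers are made up for this example.
    sample = "レポート\t0.99\t0.12\t1.05\t0.98\tɾ e p oː t o"
    print(parse_line(sample))
    print(strip_probabilities(sample))
```

The same parser accepts lines from the plain dictionary, in which case `probabilities` is simply empty.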
Intended use#
This dictionary is intended for forced alignment of Japanese transcripts.
This dictionary uses the MFA phone set for Japanese, and was used in training the Japanese MFA acoustic model. Pronunciations can be added on top of the dictionary, as long as no additional phones are introduced.
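The constraint that added pronunciations must not introduce new phones can be checked mechanically before alignment. The sketch below builds a phone inventory from existing pronunciations and reports any phone in a candidate pronunciation that falls outside it. The function names are hypothetical, and the inventory here is built from only two example entries on this page; in practice it would be built from the full dictionary.

```python
from typing import Iterable, List, Set


def phone_inventory(pronunciations: Iterable[List[str]]) -> Set[str]:
    """Collect every phone symbol used in the existing pronunciations."""
    inventory: Set[str] = set()
    for phones in pronunciations:
        inventory.update(phones)
    return inventory


def unknown_phones(candidate: List[str], inventory: Set[str]) -> List[str]:
    """Return the phones in `candidate` that are not in the inventory."""
    return [phone for phone in candidate if phone not in inventory]


if __name__ == "__main__":
    # Two pronunciations taken from the examples on this page.
    existing = [
        "k a i m a s ɨ".split(),     # 買います
        "j a ɾʲ i m a s ɨ̥".split(),  # やります
    ]
    inventory = phone_inventory(existing)

    # A candidate that reuses known phones passes the check.
    print(unknown_phones("k a i m a s".split(), inventory))
    # A candidate with a phone outside the inventory is flagged.
    print(unknown_phones("k a i m a θ".split(), inventory))
```

An empty result means the candidate can be added without extending the phone set; a non-empty result lists the symbols that the acoustic model has no corresponding phone for.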
Performance Factors#
When trying to get better alignment accuracy, adding pronunciations is generally helpful, especially for different styles and dialects. The most impactful improvements will generally be seen when adding reduced variants that involve deleting segments/syllables common in spontaneous speech. Alignment must include all phones specified in the pronunciation of a word, and each phone has a minimum duration (by default 10ms). If a speaker pronounces a multisyllabic word with just a single syllable, it can be hard for MFA to fit all the segments in, so it will lead to alignment errors on adjacent words as well.
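The arithmetic behind this is simple: with a 10 ms minimum per phone, a pronunciation with N phones needs at least N × 10 ms. The following sketch computes that lower bound and filters pronunciation variants by whether they can fit in a given stretch of speech. The function names and the 65 ms interval are hypothetical, and the reduced variant is an illustrative example that may or may not be in the dictionary. Only the 10 ms default comes from the text above.

```python
from typing import List

MIN_PHONE_DURATION_MS = 10  # default minimum phone duration stated above


def min_word_duration_ms(phones: List[str],
                         min_phone_ms: int = MIN_PHONE_DURATION_MS) -> int:
    """Shortest duration that can hold every phone of the pronunciation."""
    return len(phones) * min_phone_ms


def fitting_variants(variants: List[List[str]], available_ms: int,
                     min_phone_ms: int = MIN_PHONE_DURATION_MS) -> List[List[str]]:
    """Keep only the variants whose minimum duration fits the interval."""
    return [v for v in variants
            if min_word_duration_ms(v, min_phone_ms) <= available_ms]


if __name__ == "__main__":
    full = "k a i m a s ɨ".split()   # 買います as listed on this page
    reduced = "k a i m a s".split()  # illustrative variant without the final vowel

    print(min_word_duration_ms(full))     # 7 phones * 10 ms = 70
    print(min_word_duration_ms(reduced))  # 6 phones * 10 ms = 60

    # If the speaker spent only 65 ms on the word, only the reduced variant fits.
    print(fitting_variants([full, reduced], available_ms=65))
```

When no listed variant fits the time the speaker actually spent on the word, the aligner has to take the missing time from neighbouring words, which is why adding reduced variants tends to help.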
Ethical considerations#
Deploying any Speech-to-Text model into any production setting has ethical implications. You should consider these implications before use.
Demographic Bias#
You should assume every machine learning model has demographic bias unless proven otherwise. For pronunciation dictionaries, it is often the case that transcription accuracy and lexicon coverage are better for the prestige variety modeled in this dictionary than for other varieties. If you are using this dictionary in production, you should acknowledge this as a potential issue.
IPA Charts#
Consonants#
Within each cell, obstruent symbols on the left are unvoiced and those on the right are voiced.
Manner | Labial | Labiodental | Alveolar | Palatal | Velar | Uvular | Glottal |
---|---|---|---|---|---|---|---|
Nasal |
Occurrences: 124,858 Examples: * 求めたら: [m o t o m e t a ɾ a] * 楽しませて: [t a n o ɕ i m a s e t e] * まるや: [m a ɾ ɯ j a] * カンボジア: [k a m b o ʑ i a] Occurrences: 34,298 Examples: * 中国絡み: [tɕ ɨː ɡ o k ɯ ɡ a ɾ a mʲ i] * 磨きたい: [mʲ i ɡ a c i t a i] * 伏見区: [ɸ ɯ ɕ i mʲ i k ɯ] * 妙高山: [mʲ oː k oː s a ɴ] Occurrences: 766 Examples: * 一本道: [i pː o mʲː i tɕ i] * 難民たち: [n a mʲː i ɲ t a tɕ] * 近未来: [c i mʲː i ɾ a i] * 新民連: [ɕ i mʲː i n ɾ e ɴ] Occurrences: 3,663 Examples: * 外国産米: [ɡ a i k o k ɯ s a mː a i] * 専門店: [s e mː o n t e ɴ] * 何万倍: [n a mː a m b a i] * 半面高: [h a mː e ŋ k oː] |
Occurrences: 117,160 Examples: * nasa: [n a s a] * 鳴沢村: [n a ɾ ɯ s a w a m ɯ ɾ a] * かけこんで: [k a k e k o n d e] * レバノン人: [ɾ e b a n o ɰ̃ ɲ i ɴ] Occurrences: 4,069 Examples: * いろんな: [i ɾ o nː a] * 本能的: [h o nː oː t e c i] * 福島県内: [ɸ ɯ k ɯ ɕ i m a k e nː a i] * 天王寺区: [t e nː oː ʑ i k ɯ] |
Occurrences: 46,135 Examples: * 二段階目: [ɲ iː d a ŋ k a i m e] * 生産量: [s eː s a ɲ ɾʲ oː] * お役人: [o j a k ɯ ɲ i ɴ] * 三キロ: [s a ɲ c i ɾ o] Occurrences: 1,265 Examples: * にんにく: [ɲ i ɲː i k ɯ] * 不信任: [ɸ ɕ i ɲː i ɴ] * 新日鉄: [ɕ i ɲː i t e ts ɨ] * カンニング: [k a ɲː i ŋ ɡ ɯ] |
Occurrences: 27,375 Examples: * ドーピング: [d oː pʲ i ŋ ɡ ɯ] * 人間たち: [ɲ i ŋ ɡ e n t a tɕ] * 感覚的: [k a ŋ k a k ɯ t e c] * 文化祭: [b ɯ ŋ k a s a i] |
Occurrences: 57,108 Examples: * 研究員: [k e ɲ c ɨː i ɴ] * 歌い手さん: [ɯ t a i t e s a ɴ] * 鈴木君: [s ɨ z ɨ c k ɯ ɴ] * 未経験: [mʲ i k eː k e ɴ] Occurrences: 9 Examples: * うーん: [ɴː] |
||
Stop |
Occurrences: 21,264 Examples: * レポート: [ɾ e p oː t o] * 飼い猫ぷ: [k a i n e k o p ɯ] * 文法的: [b ɯ m p oː t e c i] * パロディ: [p a ɾ o dʲ i] Occurrences: 4,147 Examples: * ぴったり: [pʲ tː a ɾʲ i] * ガレスピー: [ɡ a ɾ e s ɨ pʲ iː] * スピノザ: [s ɨ pʲ i n o z a] * ピント: [pʲ i n t o] Occurrences: 532 Examples: * フロッピー: [ɸ ɯ ɾ o pʲː iː] * グッピー: [ɡ ɯ pʲː iː] * すっぴん: [s ɨ pʲː i ɴ] * 年月日: [n e ŋ ɡ a pʲː i] Occurrences: 5,578 Examples: * ストリップ: [s t o ɾʲ i pː] * カップ系: [k a pː ɯ k eː] * ひっ迫: [ç i pː a k ɯ] * きっぷ: [c i pː ɯ] Occurrences: 73,996 Examples: * ブチ切れ: [b ɯ tɕ i c i ɾ e] * コロンブス: [k o ɾ o m b ɯ s ɨ] * 植物園: [ɕ o k ɯ b ɯ ts ɨ e ɴ] * ブレン: [b ɯ ɾ e ɴ] Occurrences: 15,624 Examples: * kbs: [k eː bʲ iː e s ɨ] * 手引き: [t e bʲ i c i] * ビーバー: [bʲ iː b aː] * くびれ: [k ɯ bʲ i ɾ e] Occurrences: 4 Examples: Occurrences: 44 Examples: * web: [w e bː ɯ] * ビッベ: [bʲ i bː e] * やばい: [j a bː eː] |
Occurrences: 150,999 Examples: * 短プラ: [t a m p ɯ ɾ a] * 取り込める: [t o ɾʲ i k o m e ɾ ɯ] * 採って: [t o tː e] * だして: [d a ɕ t e] Occurrences: 3,857 Examples: * ティアラ: [tʲ i a ɾ a] * オーティス: [oː tʲ i s ɨ] * ティー: [tʲ iː] * ゴーティエ: [ɡ oː tʲ i e] Occurrences: 160 Examples: * メノッティ: [m e n o tʲː i̥] * ブガッティ: [b ɯ ɡ a tʲː i] * スコッティ: [s ɨ k o tʲː i] Occurrences: 16,804 Examples: * 当たったり: [a t a tː a ɾʲ i] * 突っ張って: [ts ɨ pː a tː e] * 目立って: [m e d a tː e] * っぽかった: [pː o k a tː a] Occurrences: 68,752 Examples: * キリン堂: [c i ɾʲ i n d oː] * 電線向け: [d e ɰ̃ s e mː ɯ k e] * 追い出す: [o i d a s ɨ] * 運んだり: [h a k o n d a ɾʲ i] Occurrences: 3,717 Examples: * レディース: [ɾ e dʲ iː s ɨ] * ディリップ: [dʲ i ɾʲ i pː ɯ] * デイトン: [dʲ i t o ɴ] * デュアル: [dʲ ɨ a ɾ ɯ] Occurrences: 3 Examples: Occurrences: 984 Examples: * レッド: [ɾ e dː o] * オッド: [o dː o] * リクッド: [ɾʲ i k ɯ dː o] * ヘッド: [h e dː o] |
Occurrences: 83,466 Examples: * 競争力: [c oː s oː ɾʲ o k ɯ] * 動き始めて: [ɯ ɡ o c i h a ʑ i m e t e] * 着替え: [c i ɡ a e] * プリキュア: [p ɯ ɾʲ i c ɨ a] Occurrences: 2,913 Examples: * 日記的: [ɲ i cː i t e c i] * 六キロ: [ɾ o cː i ɾ o] * くっきり: [k ɯ̥ cː i ɾʲ i] * 消去法: [ɕ oː cː o h oː] Occurrences: 18,348 Examples: * 営業力: [eː ɟ oː ɾʲ o k ɯ] * 岐阜県: [ɟ i ɸ ɯ k e ɴ] * 行革審: [ɟ oː k a k ɕ i ɴ] * 工業品: [k oː ɟ oː ç i ɴ] Occurrences: 10 Examples: * マッギー: [m a ɟː iː] |
Occurrences: 298,910 Examples: * 虫除け: [m ɯ ɕ i j o k e] * 合言葉: [a i k o t o b a] * 財テク: [dz a i t e k ɯ] * 空気等: [k ɯː c t oː] Occurrences: 12,247 Examples: * コサック: [k o s a kː ɯ] * 作家様: [s a kː a s a m a] * 三日目: [mʲ i kː a m e] * ギックリ: [ɟ i kː ɯ ɾʲ i] Occurrences: 86,852 Examples: * 科学者: [k a ɡ a k ɕ a] * 人ごみ: [ç t o ɡ o mʲ i] * グダグダ: [ɡ ɯ d a ɡ ɯ d a] * 慰められる: [n a ɡ ɯ s a m e ɾ a ɾ e ɾ ɯ] Occurrences: 479 Examples: * フラッグス: [ɸ ɯ ɾ a ɡː ɯ s ɨ] * レッグ: [ɾ e ɡː ɯ] * ドッグ: [d o ɡː ɯ] * タッグ: [t a ɡː ɯ] |
Occurrences: 207 Examples: * ニョロッ: [ɲ o ɾ o ʔ] * ポチッ: [p o tɕ i ʔ] * キャッ: [c a ʔ] * しよっ: [ɕ i j o ʔ] |
||
Affricate |
Occurrences: 55,275 Examples: * 詰め寄った: [ts ɨ m e j o tː a] * 八カ月: [h a tɕ i̥ k a ɡ e ts ɨ] * 郵便物: [j ɨː bʲ i m b ɯ ts ɨ] * ついつい: [ts ɨ i ts ɨ i] Occurrences: 507 Examples: * 引っ掴んで: [ç i tsː ɨ k a n d e] * しかめっ面: [ɕ i k a m e tsː ɨ ɾ a] * くっついて: [k ɯ tsː ɨ i t e] Occurrences: 10,688 Examples: * 造幣局: [dz oː h eː c o k ɯ] * ズオウ: [dz ɨ oː] * メンズ: [m e n dz ɨ] * 属した: [dz o k ɯ ɕ i t a] Occurrences: 63 Examples: * ぜんぜん: [dz e n dzː e ɴ] * ドッズ: [d o dzː ɨ] * グッズ: [ɡ ɯ dzː ɨ] * レッズ: [ɾ e dzː ɨ] |
Occurrences: 47,557 Examples: * 漁民たち: [ɟ o mʲ i n t a tɕ] * 非嫡出子: [ç i tɕ a k ɯ̥ ɕ ts ɨ ɕ i] * 住民たち: [dʑ ɨː mʲ i n t a tɕ i] * 口応え: [k ɯ tɕ i ɡ o t a e] Occurrences: 2,784 Examples: * ちっちゃく: [tɕ i tɕː a k ɯ] * 一人ぽっち: [ç i t o ɾʲ i p o tɕː i] * 殺虫剤: [s a tɕː ɨː z a i] * スケッチ: [s ɨ k e tɕː i] Occurrences: 22,009 Examples: * 自律的: [dʑ i ɾʲ i ts ɨ t e c i] * ジャスパー: [dʑ a s p aː] * ジレンマ: [dʑ i ɾ e mː a] * 純文学: [dʑ ɨ m b ɯ ŋ ɡ a k ɯ] Occurrences: 266 Examples: * ビレッジ: [bʲ i ɾ e dʑː i] * エッジ: [e dʑː i] * ヘッジ売り: [h e dʑː i ɯ ɾʲ i] * カレッジ: [k a ɾ e dʑː i] |
|||||
Sibilant |
Occurrences: 178,174 Examples: * 励みます: [h a ɡ e mʲ i m a s ɨ] * 買います: [k a i m a s ɨ] * 促そう: [ɯ n a ɡ a s oː] * 現れます: [a ɾ a w a ɾ e m a s] Occurrences: 2,887 Examples: * 達する: [t a sː ɨ ɾ ɯ] * ベッセマー: [b e sː e m aː] * 別世界: [b e sː e k a i] * 一センチ: [i sː e ɲ tɕ i] Occurrences: 37,143 Examples: * 行わず: [o k o n a w a z ɨ] * 宝塚市: [t a k a ɾ a z ɨ k a ɕ i] * 貴族たち: [c i z o k t a tɕ] * 尋ねました: [t a z ɨ n e m a ɕ t a] |
Occurrences: 125,822 Examples: * 小沢氏: [o z a w a ɕ] * 経済誌: [k eː z a i ɕ i] * 歯医者: [h a i ɕ a] * 観音寺市: [k a ɰ̃ o ɲ ʑ i ɕ i] Occurrences: 3,790 Examples: * らっしゃい: [ɾ a ɕː a i] * アッシャー: [a ɕː aː] * がっしり: [ɡ a ɕː i ɾʲ i] * 合衆国: [ɡ a ɕː ɨː k o k ɯ̥] Occurrences: 34,150 Examples: * 持ち味: [m o tɕ i a ʑ i] * 王子たち: [oː ʑ i t a tɕ i] * 平城宮: [h eː ʑ oː c ɨː] * グルジア語: [ɡ ɯ ɾ ɯ ʑ i a ɡ o] |
|||||
Fricative |
Occurrences: 26,333 Examples: * 含まね: [ɸ k ɯ m a n e] * 振り分け: [ɸ ɯ ɾʲ i w a k e] * 不道徳: [ɸ ɯ d oː t o k ɯ] * 後円墳: [k oː e ɴ ɸ ɯ ɴ] Occurrences: 1,793 Examples: * アルフィー: [a ɾ ɯ ɸʲ iː] * アラフィフ: [a ɾ a ɸʲ i ɸ ɯ̥] * フィア: [ɸʲ i a] * フィリップ: [ɸʲ i ɾʲ i pː ɯ] Occurrences: 8 Examples: * ミッフィー: [mʲ i ɸʲː iː] * バッフィー: [b a ɸʲː iː] Occurrences: 152 Examples: * ベグリッフ: [b e ɡ ɯ ɾʲ i ɸː ɯ] * スタッフ: [s ɨ t a ɸː ɯ] * ラッフルズ: [ɾ a ɸː ɯ ɾ ɯ z ɨ] |
Occurrences: 675 Examples: * バーサス: [v aː s a s ɨ] * ノヴェロ: [n o v e ɾ o] * アルヴァ: [a ɾ ɯ v a] * ヴァン: [v a ɴ] Occurrences: 281 Examples: * ヴィーナス: [vʲ iː n a s ɨ] * ヴィッツ: [vʲ i tsː ɨ] * ヴィオラ: [vʲ i o ɾ a] * ヴャジマ公: [vʲ a ʑ i m a k oː] |
Occurrences: 23,438 Examples: * 百ドル: [ç a k ɯ d o ɾ ɯ] * ひとまず: [ç t o m a z ɨ] * 光輝いて: [ç i k a ɾʲ i k a ɡ a j a i t e] * 左わき: [ç i d a ɾʲ i w a c i] Occurrences: 8 Examples: |
Occurrences: 55,324 Examples: * 入れば: [h a i ɾ e b a] * 話し方: [h a n a ɕ i k a t a] * 情報等: [dʑ oː h oː t oː] * 開放的: [k a i h oː t e c i̥] Occurrences: 59 Examples: * ゲルラッハ: [ɡ e ɾ ɯ ɾ a hː a] * ゼンパッハ: [dz e m p a hː a] * バッハ: [b a hː a] |
|||
Approximant |
Occurrences: 29,803 Examples: * 救われる: [s ɨ k ɯ w a ɾ e ɾ ɯ] * 使わず: [ts ɨ k a w a z ɨ] * ワンちゃん: [w a ɲ tɕ a ɴ] * 加古川市: [k a k o ɡ a w a ɕ i] Occurrences: 1 Examples: |
Occurrences: 53,417 Examples: * 易しく: [j a s a ɕ i k ɯ] * ヤフー: [j a ɸ ɯː] * 吉賀町: [j o ɕ i k a tɕ oː] * 偏った: [k a t a j o tː a] |
Occurrences: 36,021 Examples: * 万円強: [m a ɰ̃ e ɲ c oː] * ラーメン屋: [ɾ aː m e ɰ̃ j a] * ヴァンス: [v a ɰ̃ s ɨ] * 本社屋: [h o ɰ̃ ɕ a j a] |
||||
Tap |
Occurrences: 189,362 Examples: * 序列的: [dʑ o ɾ e ts ɨ t e c i] * 早良区: [s a w a ɾ a k ɯ] * よろこび: [j o ɾ o k o bʲ i] * エレアコ: [e ɾ e a k o] Occurrences: 65,243 Examples: * がっかり: [ɡ a kː a ɾʲ i] * 支給率: [ɕ i c ɨː ɾʲ i ts ɨ] * 建てたり: [t a t e t a ɾʲ i] * チューリヒ: [tɕ ɨː ɾʲ i ç i] Occurrences: 13 Examples: Occurrences: 35 Examples: * バガレッラ: [b a ɡ a ɾ e ɾː a] |
Vowels#
Within each cell, vowel symbols on the left are unrounded and those on the right are rounded.
 | Front | Near-Front | Central | Near-Back | Back |
---|---|---|---|---|---|
Close |
Occurrences: 477,954 Examples: * 命じられた: [m eː ʑ i ɾ a ɾ e t a] * カチッと: [k a tɕ i tː o] * いりません: [i ɾʲ i m a s e ɴ] * 国東市: [k ɯ ɲ i s a c i ɕ i] Occurrences: 21,669 Examples: * チーム名: [tɕ iː m ɯ m eː] * 引いた: [ç iː t a] * リール: [ɾʲ iː ɾ ɯ] * チーム力: [tɕ iː m ɯ ɾʲ o k ɯ] Occurrences: 1,339 Examples: * 開放的: [k a i h oː t e c i̥] * 生徒たち: [s eː t o t a tɕ i̥] * 落とし: [o t o ɕ i̥] * 支払う: [ɕ i̥ h a ɾ a ɯ] |
Occurrences: 145,870 Examples: * イラついて: [i ɾ a ts ɨ i t e] * 脱ぎます: [n ɯ ɟ i m a s ɨ] * ひきずって: [ç i c i z ɨ tː e] * 支払います: [ɕ i h a ɾ a i m a s ɨ] Occurrences: 35,411 Examples: * 支給額: [ɕ i c ɨː ɡ a k ɯ] * 充電池: [dʑ ɨː d e ɲ tɕ i] * 遊歩道: [j ɨː h o d oː] * 真最中: [m a sː a i tɕ ɨː] Occurrences: 1,306 Examples: * やります: [j a ɾʲ i m a s ɨ̥] * みます: [mʲ i m a s ɨ̥] * 特殊的: [t o k ɕ ɨ̥ t e c i] * 出資者: [ɕ ɨ̥ ɕː i̥ ɕ a] |
Occurrences: 278,480 Examples: * パスカル: [p a s k a ɾ ɯ] * ふれさせ: [ɸ ɯ ɾ e s a s e] * 近づく: [tɕ i k a z ɨ k ɯ] * 買い込む: [k a i k o m ɯ] Occurrences: 5,603 Examples: * 制空権: [s eː k ɯː k e ɴ] * トゥルー: [t ɯ ɾ ɯː] * クーン: [k ɯː ɴ] * ムーン: [m ɯː ɴ] Occurrences: 1,202 Examples: * 九カ国: [c ɨː k a k o k ɯ̥] * 食ってる: [k ɯ̥ tː e ɾ ɯ] * ガクガク: [ɡ a k ɯ ɡ a k ɯ̥] * サクサク: [s a k ɯ̥ s a k ɯ̥] |
||
Close-Mid |
Occurrences: 248,778 Examples: * 渡された: [w a t a s a ɾ e t a] * 追い掛ける: [o i k a k e ɾ ɯ] * 挙げられる: [a ɡ e ɾ a ɾ e ɾ ɯ] * かかって: [k a k a tː e] Occurrences: 57,741 Examples: * チェーホフ: [tɕ eː h o ɸ ɯ] * 脱力系: [d a ts ɨ ɾʲ o k ɯ k eː] * 生徒さん: [s eː t o s a ɴ] * 十世紀: [dʑ ɨː s eː c i] |
Occurrences: 305,162 Examples: * 同じく: [o n a ʑ i k] * キリスト: [c i ɾʲ i s ɨ t o] * アストル: [a s ɨ t o ɾ ɯ] * 越生町: [k o ɕ i o tɕ oː] Occurrences: 155,521 Examples: * 武豊町: [t a k e t o j o tɕ oː] * 衝動的: [ɕ oː d oː t e c i] * 甲良町: [k oː ɾ a tɕ oː] * し庁内: [ɕ i tɕ oː n a i] |
|||
Open-Mid |
|||||
Open |
Occurrences: 608,366 Examples: * 共和党寄り: [c oː w a t oː j o ɾʲ i] * つくられた: [ts ɨ k ɯ ɾ a ɾ e t a] * 五十人弱: [ɡ o ʑ ɨː ɲ i ɲ dʑ a k ɯ] * 七メートル: [n a n a m eː t o ɾ ɯ] Occurrences: 19,644 Examples: * オギャー: [o ɟ aː] * フッカー: [ɸ ɯ kː aː] * アメーバー: [a m eː b aː] * ターナー: [t aː n aː] |