Japanese MFA dictionary v2.0.1a#
@techreport{mfa_japanese_mfa_dictionary_2023,
author={McAuliffe, Michael and Sonderegger, Morgan},
title={Japanese MFA dictionary v2.0.1a},
address={\url{https://mfa-models.readthedocs.io/pronunciation dictionary/Japanese/Japanese MFA dictionary v2_0_1a.html}},
year={2023},
month={Jan},
}
Installation#
Install from the MFA command line:
mfa model download dictionary japanese_mfa
Or download from the release page.
The dictionary available from the release page and from command line installation includes pronunciation and silence probabilities estimated as part of acoustic model training (see Silence probability format and training pronunciation probabilities for more information). If you would like to use the version of this dictionary without probabilities, please see the [plain dictionary](https://raw.githubusercontent.com/MontrealCorpusTools/mfa-models/main/dictionary/japanese/mfa/Japanese MFA dictionary v2_0_1a.dict).
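If only the release version is at hand, the probability columns can also be dropped locally. The following is a minimal sketch, not the project's own tooling. It assumes each line is tab-delimited, with the word first, the space-separated pronunciation last, and only numeric probability columns in between. The function name `strip_probabilities` and the sample probabilities are made up for illustration.

```python
def strip_probabilities(line: str) -> str:
    """Return 'word<TAB>pronunciation', dropping numeric columns in between.

    Assumption: tab-delimited fields, word first, pronunciation last.
    """
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 2:
        raise ValueError(f"expected at least a word and a pronunciation: {line!r}")
    word, pronunciation = fields[0], fields[-1]
    for column in fields[1:-1]:
        float(column)  # raises ValueError if a middle column is not numeric
    return f"{word}\t{pronunciation}"


# The probabilities below are invented; only the word and phones come from this page.
sample = "どんな\t0.99\t0.12\t1.05\t0.98\td o nː a"
print(strip_probabilities(sample))
```

A line that already has only two fields passes through unchanged, so the function can be applied to either version of the dictionary.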
Intended use#
This dictionary is intended for forced alignment of Japanese transcripts.
This dictionary uses the MFA phone set for Japanese, and was used in training the Japanese MFA acoustic model. Pronunciations can be added on top of the dictionary, as long as no additional phones are introduced.
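One way to enforce the "no additional phones" constraint is to compare new entries against the phone inventory of the existing dictionary before merging them. The sketch below assumes entries are available as (word, pronunciation) pairs with space-separated phones. The helper names and the added pronunciations are hypothetical, and the three existing entries are copied from the charts on this page as a stand-in for the full dictionary.

```python
def phone_inventory(entries):
    """Collect every phone used in (word, pronunciation) pairs."""
    phones = set()
    for _word, pronunciation in entries:
        phones.update(pronunciation.split())
    return phones


def find_new_phones(additions, inventory):
    """Map each added word to the phones it uses that are not in the inventory."""
    problems = {}
    for word, pronunciation in additions:
        unknown = [p for p in pronunciation.split() if p not in inventory]
        if unknown:
            problems.setdefault(word, []).extend(unknown)
    return problems


# Stand-in for the full dictionary: three entries taken from the charts on this page.
existing = [
    ("どんな", "d o nː a"),
    ("とんでも", "t o n d e m o"),
    ("まるで", "m a ɾ ɯ d e"),
]
# Hypothetical additions: the second uses "r" and "u", which the stand-in inventory lacks.
additions = [("どんな", "d o n a"), ("まるで", "m a r u d e")]

inventory = phone_inventory(existing)
print(find_new_phones(additions, inventory))
```

An empty result means every added pronunciation stays within the existing phone set.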
Performance Factors#
When trying to improve alignment accuracy, adding pronunciations is generally helpful, especially for different styles and dialects. The largest improvements generally come from adding reduced variants, common in spontaneous speech, in which segments or syllables are deleted. Alignment must include all phones specified in the pronunciation of a word, and each phone has a minimum duration (by default 10ms). If a speaker pronounces a multisyllabic word as a single syllable, MFA may be unable to fit all the segments into the available audio, which leads to alignment errors on adjacent words as well.
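The minimum-duration constraint can be made concrete with a little arithmetic: a pronunciation with N phones needs at least N times the per-phone minimum. The sketch below uses the 10 ms default stated above. The helper names, the reduced variant, and the 60 ms interval are hypothetical and serve only to illustrate why a shorter variant can fit where the full form cannot.

```python
MIN_PHONE_DURATION_MS = 10  # default minimum per phone, as stated above


def minimum_word_duration_ms(pronunciation: str, min_phone_ms: int = MIN_PHONE_DURATION_MS) -> int:
    """Shortest interval that can hold every phone of a space-separated pronunciation."""
    return len(pronunciation.split()) * min_phone_ms


def variants_that_fit(variants, available_ms, min_phone_ms=MIN_PHONE_DURATION_MS):
    """Keep only the variants whose minimum duration fits the available interval."""
    return [v for v in variants if minimum_word_duration_ms(v, min_phone_ms) <= available_ms]


full = "t e ts ɨ d a tː a"  # 手伝った, as listed in the consonant chart
reduced = "t e ts tː a"     # hypothetical reduced variant, for illustration only

print(minimum_word_duration_ms(full), minimum_word_duration_ms(reduced))
print(variants_that_fit([full, reduced], available_ms=60))
```

With eight phones the full form needs at least 80 ms, so only the five-phone variant fits a 60 ms stretch of audio.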
Ethical considerations#
Deploying any Speech-to-Text model into any production setting has ethical implications. You should consider these implications before use.
Demographic Bias#
You should assume every machine learning model has demographic bias unless proven otherwise. For pronunciation dictionaries, it is often the case that transcription accuracy and lexicon coverage are better for the prestige variety modeled in this dictionary than for other variants. If you are using this dictionary in production, you should acknowledge this as a potential issue.
IPA Charts#
Consonants#
Within each cell, obstruent symbols on the left are unvoiced and those on the right are voiced.
Manner | Labial | Labiodental | Alveolar | Palatal | Velar | Uvular | Glottal |
---|---|---|---|---|---|---|---|
Nasal | m (6,024 occurrences: まくる [m a k ɯ ɾ ɯ], 近江八幡 [oː mʲ i h a tɕ i m a ɴ], とんでも [t o n d e m o], 如きもの [ɡ o t o c i m o n o]); mʲ (1,452 occurrences: 海沿い [ɯ mʲ i z o i], 南相馬 [mʲ i n a mʲ i s oː m a], 見守り [mʲ i m a m o ɾʲ i], 明るみ [a k a ɾ ɯ mʲ i]); mʲː (43 occurrences: シンミリ [ɕ i mʲː i ɾʲ i]); mː (119 occurrences: まんま [m a mː a], ジレンマ [dʑ i ɾ e mː a], 真ん前 [m a mː a e], ヴェンメン [v e mː e ɰ̃]) | | n (5,210 occurrences: モーガン [m oː ɡ a n], マグネット [m a ɡ ɯ n e tː o], guns [ɡ a n dz ɨ], 狙って [n e ɾ a tː e]); nː (168 occurrences: すんな [s ɨ nː a], 大なれ [d a nː a ɾ e], まん中 [m a nː a k a], どんな [d o nː a]) | ɲ (2,562 occurrences: ワクチン [w a k ɯ̥ tɕ i ɲ], アンチ [a ɲ tɕ i], のんき [n o ɲ c i], シーニック [ɕ iː ɲ i kː ɯ]); ɲː (79 occurrences: ニンニク [ɲ i ɲː i k], コンニャク [k o ɲː a k ɯ], にんにく [ɲ i ɲː i k ɯ], なんに [n a ɲː i]) | ŋ (2,101 occurrences: スーザン [s ɨː z a ŋ], パンク [p a ŋ k ɯ], カンニング [k a ɲː i ŋ ɡ ɯ], クウェーン [k ɯ w eː ŋ]) | ɴ (2,625 occurrences: 一トン [i tː o ɴ], オリン [o ɾʲ i ɴ], アメリカン [a m e ɾʲ i k a ɴ], シャイアン [ɕ a i a ɴ]); ɴː (5 occurrences: うーん [ɴː]) | |
Stop | p (1,115 occurrences: スポンサー [s ɨ p o ɰ̃ s aː], ピレネー [pʲ i ɾ e nː eː], ピオリア [pʲ i o ɾʲ i a], パトリシア [p a t o ɾʲ i ɕ i a]); pː (305 occurrences: 酔っぱらい [j o pː a ɾ a i], ストップ [s ɨ̥ t o pː ɯ̥], ホップ [h o pː ɯ̥], いっぺん [i pː e ɰ̃]); b (2,817 occurrences: バルーン [b a ɾ ɯː ŋ], ビッグ [bʲ i ɡː ɯ], アリバイ [a ɾʲ i b a i], ブレーク [b ɯ ɾ eː k ɯ̥]); bː (1 occurrence: やばい [j a bː eː]) | | t (6,811 occurrences: ニチイ [ɲ i tɕ iː], 子育て [k o s o d a t e], 行って [i tː e], とろろ [t o ɾ o ɾ o]); tː (1,060 occurrences: のぼった [n o b o tː a], 貯まった [t a m a tː a], まちがって [m a tɕ i ɡ a tː e], ありったけ [a ɾʲ i tː a k e]); d (2,931 occurrences: デビュー [d e bʲ ɨː], タンザニア [t a n dz a ɲ i a], ジャリ [dʑ a ɾʲ i], まるで [m a ɾ ɯ d e]); dː (29 occurrences: ピラミッド [pʲ i ɾ a mʲ i dː o], デッドキー [d e dː o c iː], レッド [ɾ e dː o], ゴッド [ɡ o dː o]) | c (4,140 occurrences: スキイ [s ɨ c i i], sqb [e s ɨ c ɨː bʲ iː], メキシコ [m e c ɕ k o], 教科書 [c oː k a ɕ o]); cː (142 occurrences: すっきり [s ɨ̥ cː i ɾʲ i], ひっきり [ç i cː i ɾʲ i], 大っきく [o cː i k ɯ], っきり [cː i ɾʲ i]); ɟ (604 occurrences: ギレスピー [ɟ i ɾ e s ɨ pʲ iː], ぎょっと [ɟ o tː o], ギテガ [ɟ i t e ɡ a], つなぎ [ts ɨ n a ɟ i]); ɟː (1 occurrence: マッギー [m a ɟː iː]) | k (14,171 occurrences: 政財界 [s eː z a i k a i], クール [k ɯː ɾ ɯ], コンウェイ [k o ɰ̃ w e i], 替わる [k a w a ɾ ɯ]); kː (747 occurrences: どっかり [d o kː a ɾʲ i], 引っかき [ç i kː a c i], 撤回し [t e kː a i ɕ i], スナック [s ɨ n a kː ɯ̥]); ɡ (3,340 occurrences: 大げさ [oː ɡ e s a], 手掛かり [t e ɡ a k a ɾʲ i], モデルガン [m o d e ɾ ɯ ɡ a m], すぐれ [s ɨ ɡ ɯ ɾ e]); ɡː (21 occurrences: すごい [s ɨ ɡː eː], エッグ [e ɡː ɯ], ウィッグ [w i ɡː ɯ], ドッグ [d o ɡː ɯ]) | | ʔ (17 occurrences: わるっ [w a ɾ ɯ ʔ], ギュッ [ɟ ɨ ʔ], おおっ [oː ʔ], うまい [ɯ m a ʔ]) |
Affricate | | | ts (3,406 occurrences: そいつ [s o i ts ɨ̥], つけよう [ts ɨ k e j oː], 手伝った [t e ts ɨ d a tː a], アーツ [aː ts ɨ]); tsː (47 occurrences: 突っつく [ts ɨ tsː ɨ k ɯ], やっつけ [j a tsː ɨ̥ k e], フィッツ [ɸʲ i tsː], いっつ [i tsː ɨ]); dz (649 occurrences: づきあい [dz ɨ c i a i], 半蔵門 [h a n dz oː m o ɴ], ずっしり [dz ɨ ɕː i ɾʲ i], ざんまい [dz a mː a i]); dzː (3 occurrences: キッズ [c i dzː ɨ], グッズ [ɡ ɯ dzː ɨ], ぜんぜん [dz e n dzː e ɴ]) | tɕ (2,443 occurrences: リチャード [ɾʲ i tɕ aː d o], 賃上げ [tɕ i ɰ̃ a ɡ e], チューリヒ [tɕ ɨː ɾʲ i ç i], 飛ばっちり [t o b a tɕː i ɾʲ i]); tɕː (154 occurrences: 一長一短 [i tɕː oː i tː a ŋ], マッチング [m a tɕː i ŋ ɡ ɯ], 小っちゃい [tɕ i̥ tɕː eː], スイッチ [s ɨ i tɕː]); dʑ (984 occurrences: ジェリー [dʑ e ɾʲ iː], ジグソー [dʑ i ɡ ɯ s oː], ジェラルド [dʑ e ɾ a ɾ ɯ d o], ジャック [dʑ a kː ɯ]); dʑː (9 occurrences: カレッジ [k a ɾ e dʑː i], ヘッジ [h e dʑː i], ビレッジ [bʲ i ɾ e dʑː i], レッジ [ɾ e dʑː i]) | | | |
Sibilant | | | s (8,206 occurrences: マスタード [m a s ɨ t aː d o], マックス [m a kː ɯ s ɨ], スポーン [s ɨ p oː ɴ], ストーカー [s ɨ̥ t oː k aː]); sː (231 occurrences: くっついて [k ɯ tsː ɨ i t e], 一センチ [i sː e ɲ tɕ i], フィッツ [ɸʲ i tsː], 没する [b o sː ɨ ɾ ɯ]); z (1,651 occurrences: ブラザーズ [b ɯ ɾ a z aː z ɨ], 恥ずかしい [h a z ɨ k a ɕ iː], バンザイ [b a n dz a i], バイザー [b a i z aː]) | ɕ (6,728 occurrences: シュー [ɕ ɨː], 台無し [d a i n a ɕ i], ピンチ [pʲ i ɲ tɕ i̥], チャーター [tɕ aː t aː]); ɕː (265 occurrences: ボッシ [b o ɕː i], スケッチ [s k e tɕː i̥], フィッシュ [ɸʲ i ɕː ɨ], 独りぼっち [ç i t o ɾʲ i b o tɕː i]); ʑ (1,814 occurrences: 初めて [h a ʑ i m e t e], ngo [e n ɯ ʑ iː oː], おやじ [o j a ʑ i], じょうぶ [ʑ oː b ɯ]) | | | |
Fricative | ɸ (1,494 occurrences: 踏んで [ɸ ɯ n d e], アウフゲ [a ɯ ɸ ɯ ɡ e], ファスト [ɸ a s ɨ t o], ファイヴ [ɸ a i v ɯ]); ɸʲ (69 occurrences: フィル [ɸʲ i ɾ ɯ], フィジー [ɸʲ i ʑ iː], フィッシュ [ɸʲ i ɕː ɨ], カダフィ [k a d a ɸʲ i]); ɸʲː (1 occurrence: ダッフィ [d a ɸʲː i]); ɸː (10 occurrences: ベグリッフ [b e ɡ ɯ ɾʲ i ɸː ɯ̥], ラッフルズ [ɾ a ɸː ɯ ɾ ɯ z ɨ], スタッフ [s t a ɸː]) | v (32 occurrences: ヴャジマ [vʲ a ʑ i m a], レヴァン [ɾ e v a ɴ], ヴェーザー [v eː z aː], ヴィシュヌ [vʲ i ɕ ɨ n ɯ]); vʲ (19 occurrences: catv [ɕ iː eː tʲ iː vʲ i], リヴィウ [ɾʲ i vʲ i ɯ], ヴャジマ [vʲ a ʑ i m a], ヴィシュヌ [vʲ i ɕ ɨ n ɯ]) | | ç (1,151 occurrences: ひろげ [ç i ɾ o ɡ e], ひいて [ç iː t e], ヒーター [ç iː t aː], 引いた [ç iː t a]) | | | h (2,516 occurrences: はいり [h a i ɾʲ i], ほんと [h o n t o], 法省令 [h oː s eː ɾ eː], 懸け橋 [k a k e h a ɕ i]); hː (2 occurrences: ゼンパッハ [dz e m p a hː a], バッハ [b a hː a]) |
Approximant | w (1,474 occurrences: 申し訳 [m oː ɕ i w a k e], スウェット [s ɨ w e tː o], まわって [m a w a tː e], 壊れる [k o w a ɾ e ɾ ɯ]) | | | j (2,406 occurrences: やたら [j a t a ɾ a], 見よう [mʲ i j oː], 沈みゆく [ɕ i z ɨ mʲ i j ɨ k ɯ], かゆい [k a j ɨ i]) | ɰ̃ (2,755 occurrences: エチレン [e tɕ i ɾ e ɰ̃], ベルグソン [b e ɾ ɯ ɡ ɯ s o ɰ̃], エッセンス [e sː e ɰ̃ s ɨ], オレゴン [o ɾ e ɡ o ɰ̃]) | | |
Tap | | | ɾ (6,087 occurrences: ラテン [ɾ a t e ɲ], 繰返さ [k ɯ ɾʲ i k a e s a], イコール [i k oː ɾ ɯ], エリオット [e ɾʲ i o tː o]); ɾʲ (2,568 occurrences: チャリング [tɕ a ɾʲ i ŋ ɡ ɯ], エミリー [e mʲ i ɾʲ iː], 盛りあげる [m o ɾʲ i a ɡ e ɾ ɯ], 送り状 [o k ɯ ɾʲ i ʑ oː]) | | | | |
Vowels#
Within each cell, vowel symbols on the left are unrounded and those on the right are rounded.
Height | Front | Near-Front | Central | Near-Back | Back |
---|---|---|---|---|---|
Close | i (19,507 occurrences: 追い詰め [o i ts ɨ m e], 拾った [ç i ɾ o tː a], 東広島 [ç i ɡ a ɕ i ç i ɾ o ɕ i m a], ワサビ [w a s a bʲ i]); iː (879 occurrences: リピート [ɾʲ i pʲ iː t o], 新しい [a t a ɾ a ɕ iː], ディーラー [dʲ iː ɾ aː], アリーナ [a ɾʲ iː n a]); i̥ (2,349 occurrences: 執よう [ɕ i̥ ts ɨ j oː], 然らざれ [ɕ i̥ k a ɾ a z a ɾ e], 押し掛け [o ɕ i̥ k a k e], きたして [c i̥ t a ɕ i̥ t e]) | | ɨ (5,721 occurrences: すっごい [s ɨ ɡː o i], 起こす [o k o s ɨ̥], けん銃 [k e ɲ dʑ ɨː], リンス [ɾʲ i ɰ̃ s ɨ]); ɨː (1,542 occurrences: ヒューズ [ç ɨː z ɨ], コミューン [k o mʲ ɨː ŋ], 富士通 [ɸ ɯ ʑ i ts ɨː], スーダン [s ɨː d a ɴ]); ɨ̥ (1,709 occurrences: ホノリウス [h o n o ɾʲ i ɯ s ɨ̥], スペシャル [s ɨ̥ p e ɕ a ɾ ɯ], ハリス [h a ɾʲ i s ɨ̥], マスタード [m a s ɨ̥ t aː d o]) | | ɯ (9,690 occurrences: 炒める [i t a m e ɾ ɯ], 受け取る [ɯ k e t o ɾ ɯ], うなずい [ɯ n a z ɨ i], ぬるぬる [n ɯ ɾ ɯ n ɯ ɾ ɯ]); ɯː (232 occurrences: クープ [k ɯː p ɯ], 封じ込めれ [ɸ ɯː ʑ i k o m e ɾ e], ループ [ɾ ɯː p ɯ], フード [ɸ ɯː d o]); ɯ̥ (2,264 occurrences: 福知山 [ɸ ɯ̥ k ɯ tɕ i j a m a], 振って [ɸ ɯ̥ tː e], 読みふける [j o mʲ i ɸ ɯ̥ k e ɾ ɯ], 草分け [k ɯ̥ s a w a k e]) |
Close-Mid | e (10,685 occurrences: けんすい [k e ɰ̃ s ɨ i], メージャー [m eː ʑ aː], ゆがめ [j ɨ ɡ a m e], はかれ [h a k a ɾ e]); eː (2,247 occurrences: ケージ [k eː ʑ i], ベーシス [b eː ɕ i s ɨ], 安芸高田 [a ɡ eː a t a], グレーター [ɡ ɯ ɾ eː t aː]) | | | | o (12,811 occurrences: 乏しい [t o b o ɕ iː], クボタ [k ɯ b o t a], フォルク [ɸ o ɾ ɯ k ɯ], スローガン [s ɨ ɾ oː ɡ a ɰ̃]); oː (5,471 occurrences: レチノール [ɾ e tɕ i n oː ɾ ɯ], なろう [n a ɾ oː], きのう [c i n oː], コービー [k oː bʲ iː]) |
Open-Mid | | | | | |
Open | a (26,310 occurrences: 日高川 [ç i d a k a k a w a], せがむ [s e ɡ a m ɯ], ハーベイ [h aː b e i], 誤って [a j a m a tː e]); aː (954 occurrences: ミスター [mʲ i s ɨ t aː], 棚上げ [t a n aː ɡ e], オファー [o ɸ aː], ワーナー [w aː n aː]) | | | | |
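
Occurrence counts like those in the charts can be approximated for a local copy of the dictionary. This is a minimal sketch under one stated assumption: an occurrence is counted as one dictionary entry whose space-separated pronunciation contains the phone as a whole token. This page does not say how its counts were computed, so the results need not match the figures above. The function name is hypothetical, and the three entries are copied from the charts as a small sample.

```python
from collections import Counter


def count_phone_entries(entries):
    """Count, for each phone, how many entries contain it at least once.

    Assumption: phones are whole space-separated tokens, so "m" and "mː"
    are counted separately.
    """
    counts = Counter()
    for _word, pronunciation in entries:
        counts.update(set(pronunciation.split()))
    return counts


# A small sample copied from the charts on this page, not the full dictionary.
entries = [
    ("まんま", "m a mː a"),
    ("どんな", "d o nː a"),
    ("まるで", "m a ɾ ɯ d e"),
]
counts = count_phone_entries(entries)
print(counts["a"], counts["m"], counts["mː"], counts["nː"])
```

Counting each phone once per entry means a word such as まんま adds one to the count for "a" even though the vowel appears twice.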