Korean MFA dictionary v2.0.0#
@techreport{mfa_korean_mfa_dictionary_2022,
author={McAuliffe, Michael and Sonderegger, Morgan},
title={Korean MFA dictionary v2.0.0},
address={\url{https://mfa-models.readthedocs.io/pronunciation dictionary/Korean/Korean MFA dictionary v2_0_0.html}},
year={2022},
month={Mar},
}
Installation#
Install from the MFA command line:
mfa model download dictionary korean_mfa
Or download from the release page.
The dictionary available from the release page and through command-line installation has pronunciation and silence probabilities estimated as part of acoustic model training (see Silence probability format and training pronunciation probabilities for more information). If you would like to use the version of this dictionary without probabilities, please see the plain dictionary.
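To make the difference between the two versions concrete, the sketch below splits one dictionary line into its word, its numeric probability fields, and its phones. It assumes a tab-separated layout in which the word is followed by up to four numeric fields (pronunciation probability, probability of silence following, and two correction terms) and then the space-separated pronunciation; confirm this against the Silence probability format documentation before relying on it. The function names and the sample probability values are hypothetical and are not taken from the dictionary.

```python
from typing import List, Tuple


def parse_entry(line: str) -> Tuple[str, List[float], List[str]]:
    """Split a dictionary line into (word, probability fields, phones).

    Assumption: fields are tab-separated and the last field is always the
    pronunciation, so it is never interpreted as a number.
    """
    fields = line.rstrip("\n").split("\t")
    word, rest = fields[0], fields[1:]
    numbers: List[float] = []
    while len(rest) > 1:  # keep the final field for the pronunciation
        try:
            numbers.append(float(rest[0]))
        except ValueError:
            break
        rest = rest[1:]
    phones = " ".join(rest).split()
    return word, numbers, phones


def to_plain(line: str) -> str:
    """Drop the probability fields, keeping only word and pronunciation."""
    word, _, phones = parse_entry(line)
    return word + "\t" + " ".join(phones)


# Illustrative line: the probabilities are made up, the phones come from
# the example transcription of 일베에 in the chart on this page.
sample = "일베에\t0.99\t0.12\t1.5\t0.8\ti ɭ b e e"
print(parse_entry(sample))
print(to_plain(sample))
```

The same parser accepts a plain line such as `일베에<TAB>i ɭ b e e`, in which case the list of probability fields is simply empty.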
Intended use#
This dictionary is intended for forced alignment of Korean transcripts.
This dictionary uses the MFA phone set for Korean, and was used in training the Korean MFA acoustic model. Pronunciations can be added on top of the dictionary, as long as no additional phones are introduced.
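Before adding pronunciations, it can help to check that they stay within the existing phone set. The following is a minimal sketch, assuming pronunciations are space-separated phone strings; the helper names are hypothetical, the two existing pronunciations are taken from the examples on this page, and the candidate pronunciations are made up for illustration.

```python
from typing import Iterable, List, Set


def phone_inventory(pronunciations: Iterable[str]) -> Set[str]:
    """Collect every phone used in the existing pronunciations."""
    return {phone for pron in pronunciations for phone in pron.split()}


def unknown_phones(candidate: str, inventory: Set[str]) -> List[str]:
    """Return the phones in a candidate pronunciation that are not in the inventory."""
    return [phone for phone in candidate.split() if phone not in inventory]


# Existing pronunciations of 니트릴 and 일베에 from this page.
existing = ["n i tʰ ɨ ɾ i ɭ", "i ɭ b e e"]
inventory = phone_inventory(existing)

# Hypothetical reduced variant: uses only known phones, so it is safe to add.
print(unknown_phones("n i tʰ ɾ i ɭ", inventory))  # []

# Hypothetical variant with a phone outside the inventory: should be rejected.
print(unknown_phones("n i θ ɾ i ɭ", inventory))  # ['θ']
```

In practice the inventory would be built from every pronunciation in the downloaded dictionary rather than from two entries.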
Performance Factors#
When trying to get better alignment accuracy, adding pronunciations is generally helpful, especially for different styles and dialects. The most impactful improvements will generally be seen when adding reduced variants that involve deleting segments/syllables common in spontaneous speech. Alignment must include all phones specified in the pronunciation of a word, and each phone has a minimum duration (by default 10ms). If a speaker pronounces a multisyllabic word with just a single syllable, it can be hard for MFA to fit all the segments in, so it will lead to alignment errors on adjacent words as well.
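The minimum-duration constraint can be expressed as simple arithmetic: a pronunciation with N phones cannot be aligned to less than N times the minimum phone duration. The sketch below assumes the default of 10 ms mentioned above; the function names and the reduced variant are hypothetical and only illustrate the calculation.

```python
MIN_PHONE_MS = 10  # default minimum phone duration stated in the text above


def min_duration_ms(pronunciation: str, min_phone_ms: int = MIN_PHONE_MS) -> int:
    """Shortest stretch of audio that can hold every phone of a pronunciation."""
    return len(pronunciation.split()) * min_phone_ms


def fits(pronunciation: str, available_ms: int, min_phone_ms: int = MIN_PHONE_MS) -> bool:
    """True if the pronunciation can fit in the available audio."""
    return min_duration_ms(pronunciation, min_phone_ms) <= available_ms


# Pronunciation of 보도자료만 from this page: 12 phones.
full = "p o d o dʑ ɐ ɾ j o m ɐ n"
# Hypothetical reduced variant with one syllable dropped: 10 phones.
reduced = "p o dʑ ɐ ɾ j o m ɐ n"

print(min_duration_ms(full))     # 120
print(fits(full, 100))           # False
print(fits(reduced, 100))        # True
```

If the speaker actually produced the word in 100 ms, only the reduced variant can be aligned without pushing phones into neighbouring words, which is why adding reduced variants tends to help.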
Ethical considerations#
Deploying any Speech-to-Text model into any production setting has ethical implications. You should consider these implications before use.
Demographic Bias#
You should assume every machine learning model has demographic bias unless proven otherwise. For pronunciation dictionaries, it is often the case that transcription accuracy and lexicon coverage are better for the prestige variety modeled in this dictionary than for other variants. If you are using this dictionary in production, you should acknowledge this as a potential issue.
IPA Charts#
Consonants#
Where a cell holds both an unvoiced and a voiced obstruent, the unvoiced symbol is listed first (left) and the voiced symbol second (right).
| Manner | Labial | Alveolar | Retroflex | Palatal | Velar | Glottal |
|---|---|---|---|---|---|---|
| Nasal | [m] Occurrences: 15,980. Examples: 보도자료만: [p o d o dʑ ɐ ɾ j o m ɐ n]; 비망록: [p iː m ɐ ŋ n o k̚]; 몰아갈: [m o ɾ ɐ ɡ ɐ ɭ]; 예상됩니다: [j eː sʰ ɐ ŋ d w e m n i d ɐ] | [n] Occurrences: 30,413. Examples: 서북청년단: [sʰ ʌ b u k̚ tɕʰ ʌ ŋ ɲ ʌ n d ɐ n]; 발전기: [p ɐ ʎ tɕ ʌ n ɡ i]; 니트릴: [n i tʰ ɨ ɾ i ɭ]; 시키는거죠: [ɕʰ iː kʰ i n ɨ n ɡ ʌ tɕ o] | | [ɲ] Occurrences: 3,657. Examples: 서북청년단: [sʰ ʌ b u k̚ tɕʰ ʌ ŋ ɲ ʌ n d ɐ n]; 산적한: [sʰ ɐ ɲ dʑ ʌ kʰ ɐ n]; 순자산: [sʰ u ɲ tɕ ɐ sʰ ɐ n]; 뉴스타파: [ɲ u s ɨ tʰ ɐ pʰ ɐ] | [ŋ] Occurrences: 16,306. Examples: 서북청년단: [sʰ ʌ b u k̚ tɕʰ ʌ ŋ ɲ ʌ n d ɐ n]; 비망록: [p iː m ɐ ŋ n o k̚]; 예상됩니다: [j eː sʰ ɐ ŋ d w e m n i d ɐ]; 복용량: [p o ɡ j o ŋ ɲ ɐ ŋ] | |
| Stop | [p] Occurrences: 5,660. Examples: 발전기: [p ɐ ʎ tɕ ʌ n ɡ i]; 불란서: [p u ɭ ɭ ɐ n sʰ ʌ]; 보도자료만: [p o d o dʑ ɐ ɾ j o m ɐ n]; 비망록: [p iː m ɐ ŋ n o k̚]. [b] Occurrences: 4,326. Examples: 서북청년단: [sʰ ʌ b u k̚ tɕʰ ʌ ŋ ɲ ʌ n d ɐ n]; 일베에: [i ɭ b e e]; 사법처리: [sʰ ɐ b ʌ p̚ tɕʰ ʌ ɾ i]; 출발해: [tɕʰ u ɭ b ɐ ɾ h eː] | [t] Occurrences: 4,120. Examples: 던져줌: [t ʌː ɲ dʑ ʌ dʑ u m]; 다하십시오: [t ɐ ɦ ɐ ɕʰ i p ɕ͈ i o]; 들어가다: [t ɨ ɾ ʌ ɡ ɐ d ɐ]; 돗자리: [t o t̚ tɕʰ ɐ ɾ i]. [d] Occurrences: 10,317. Examples: 서북청년단: [sʰ ʌ b u k̚ tɕʰ ʌ ŋ ɲ ʌ n d ɐ n]; 지적하다: [tɕ i dʑ ʌ kʰ ɐ d ɐ]; 따라가다: [t͈ ɐ ɾ ɐ ɡ ɐ d ɐ]; 보도자료만: [p o d o dʑ ɐ ɾ j o m ɐ n] | | | [k] Occurrences: 9,368. Examples: 가지게: [k ɐ dʑ i ɡ e]; 경험치: [k j ʌ ŋ h ʌ m tɕʰ i]; 건너갔: [k ʌː n n ʌ ɡ ɐ t̚]; 목소리도: [m o k s o ɾ i d o]. [ɡ] Occurrences: 13,510. Examples: 발전기: [p ɐ ʎ tɕ ʌ n ɡ i]; 시키는거죠: [ɕʰ iː kʰ i n ɨ n ɡ ʌ tɕ o]; 따라가다: [t͈ ɐ ɾ ɐ ɡ ɐ d ɐ]; 털기가: [tʰ ʌ ɭ ɡ i ɡ ɐ] | |
| Affricate | | | | [tɕ] Occurrences: 6,818. Examples: 발전기: [p ɐ ʎ tɕ ʌ n ɡ i]; 시키는거죠: [ɕʰ iː kʰ i n ɨ n ɡ ʌ tɕ o]; 지적하다: [tɕ i dʑ ʌ kʰ ɐ d ɐ]; 순자산: [sʰ u ɲ tɕ ɐ sʰ ɐ n]. [dʑ] Occurrences: 8,815. Examples: 시키는거죠: [ɕʰ iː kʰ iː n ɨ n ɡ ʌ dʑ o]; 산적한: [sʰ ɐ ɲ dʑ ʌ kʰ ɐ n]; 지적하다: [tɕ i dʑ ʌ kʰ ɐ d ɐ]; 보도자료만: [p o d o dʑ ɐ ɾ j o m ɐ n] | | |
| Sibilant Plain | | [s] Occurrences: 2,334. Examples: 뉴스타파: [ɲ u s ɨ tʰ ɐ pʰ ɐ]; 했을까: [h eː s ɨ ɭ k͈ ɐ]; 목소리도: [m o k s o ɾ i d o]; 발사된: [p ɐ ɭ s ɐ d w e n] | | | | |
| Sibilant Tense | | [s͈] Occurrences: 1,120. Examples: 주고받았고: [tɕ u ɡ o b ɐ d ɐ s͈ ɡ o]; 글쓴이: [k ɨ ɭ s͈ ɨ n i]; 쌍동이: [s͈ ɐ ŋ d o ŋ i]; 취했기: [tɕʰ ɥ iː ɦ eː s͈ ɡ i] | | [ɕ͈] Occurrences: 800. Examples: 리터씩: [ɾ i tʰ ʌ ɕ͈ i k̚]; 다하십시오: [t ɐ ɦ ɐ ɕʰ i p ɕ͈ i o]; 절약시킬: [tɕ ʌ ɾ j ɐ k ɕ͈ i kʰ i ɭ]; 한식집을: [h ɐ n ɕ͈ i k̚ tɕʰ i m ɨ ɭ] | | |
| Sibilant Aspirated | | [sʰ] Occurrences: 12,406. Examples: 서북청년단: [sʰ ʌ b u k̚ tɕʰ ʌ ŋ ɲ ʌ n d ɐ n]; 산적한: [sʰ ɐ ɲ dʑ ʌ kʰ ɐ n]; 불란서: [p u ɭ ɭ ɐ n sʰ ʌ]; 순자산: [sʰ u ɲ tɕ ɐ sʰ ɐ n] | | [ɕʰ] Occurrences: 3,660. Examples: 시키는거죠: [ɕʰ iː kʰ i n ɨ n ɡ ʌ tɕ o]; 곳이다: [k o ɕʰ i d ɐ]; 다하십시오: [t ɐ ɦ ɐ ɕʰ i p ɕ͈ i o]; 감시를: [k ɐ m ɕʰ i ɾ ɨ ɭ] | | |
| Fricative | [ɸ] Occurrences: 1,021. Examples: 획득하다: [ɸ w e k̚ tʰ ɨ kʰ ɐ d ɐ]; 호가호위를: [ɸ o ɡ ɐ β o ɥ i ɾ ɨ ɭ]; 홀로서기: [ɸ o ɭ ɭ o sʰ ʌ ɡ i]; 환경을: [ɸ w ɐ n ɡ j ʌ ŋ ɨ ɭ] | | | [ç] Occurrences: 477. Examples: 희생양: [ç i sʰ eː ŋ j ɐ ŋ]; 향하게: [ç ɐ ŋ h ɐ ɡ e]; 현상은: [ç ʌ n sʰ ɐ ŋ ɨ n]; 현명할: [ç ʌ n m j ʌ ŋ h ɐ ɭ]. [ʝ] Occurrences: 859. Examples: 고혈압처럼: [k o ʝ ʌ ɾ ɐ p̚ tɕʰ ʌ ɾ ʌ m]; 무수히: [m uː sʰ u ʝ i]; 표현은: [pʰ j o ʝ ʌ n ɨ n]; 대한변협은: [t eː ɦ ɐ n b j ʌ n ʝ ʌ pʰ ɨ n] | | [h] Occurrences: 4,632. Examples: 출발해: [tɕʰ u ɭ b ɐ ɾ h eː]; 경험치: [k j ʌ ŋ h ʌ m tɕʰ i]; 했을까: [h eː s ɨ ɭ k͈ ɐ]; 헤르츠: [h e ɾ ɨ tɕʰ ɨ]. [ɦ] Occurrences: 2,207. Examples: 불가피한: [p u ɭ ɡ ɐ pʰ i ɦ ɐ n]; 위치하고: [ɥ i tɕʰ i ɦ ɐ ɡ o]; 다하십시오: [t ɐ ɦ ɐ ɕʰ i p ɕ͈ i o]; 연기한다고: [j ʌː n ɡ i ɦ ɐ n d ɐ ɡ o] |
| Approximant | [w] Occurrences: 7,410. Examples: 발견되다: [p ɐ ɭ ɡ j ʌ n d w e d ɐ]; 예상됩니다: [j eː sʰ ɐ ŋ d w e m n i d ɐ]; 부위원장: [p u ɥ i w ʌ ɲ dʑ ɐ ŋ]; 발사된: [p ɐ ɭ s ɐ d w e n] | | | [j] Occurrences: 12,379. Examples: 보도자료만: [p o d o dʑ ɐ ɾ j o m ɐ n]; 발견되다: [p ɐ ɭ ɡ j ʌ n d w e d ɐ]; 예상됩니다: [j eː sʰ ɐ ŋ d w e m n i d ɐ]; 보이려고: [p o i ɾ j ʌ ɡ o]. [ɥ] Occurrences: 1,129. Examples: 위치하고: [ɥ i tɕʰ i ɦ ɐ ɡ o]; 부위원장: [p u ɥ i w ʌ ɲ dʑ ɐ ŋ]; 호가호위를: [ɸ o ɡ ɐ β o ɥ i ɾ ɨ ɭ]; 방위산업체: [p ɐ ŋ ɥ i sʰ ɐ n ʌ p̚ tɕʰ e] | [ɰ] Occurrences: 1,777. Examples: 미립자의: [m i ɾ i p̚ tɕʰ ɐ ɰ i]; 몰도바의: [m o ɭ d o b ɐ ɰ i]; 강남의: [k ɐ ŋ n ɐ m ɰ i]; 면발의: [m j ʌ n b ɐ ɾ ɰ i] | |
| Tap | | [ɾ] Occurrences: 12,333. Examples: 니트릴: [n i tʰ ɨ ɾ i ɭ]; 따라가다: [t͈ ɐ ɾ ɐ ɡ ɐ d ɐ]; 보도자료만: [p o d o dʑ ɐ ɾ j o m ɐ n]; 사법처리: [sʰ ɐ b ʌ p̚ tɕʰ ʌ ɾ i] | | | | |
| Lateral | | | | [ʎ] Occurrences: 3,338. Examples: 발전기: [p ɐ ʎ tɕ ʌ n ɡ i]; 출점을: [tɕʰ u ʎ dʑ ʌ m ɨ ɭ]; 글리세린: [k ɨ ʎ ʎ i sʰ e ɾ i n]; 반려동물을: [p ɐ ʎ ʎ ʌ d o ŋ m u ɾ ɨ ɭ] | | |
| Lateral Tap | | | [ɭ] Occurrences: 11,579. Examples: 일베에: [i ɭ b e e]; 니트릴: [n i tʰ ɨ ɾ i ɭ]; 불란서: [p u ɭ ɭ ɐ n sʰ ʌ]; 털기가: [tʰ ʌ ɭ ɡ i ɡ ɐ] | | | |
Vowels#
Where a cell holds both an unrounded and a rounded vowel, the unrounded symbol is listed first (left) and the rounded symbol second (right).
| Height | Front | Near-Front | Central | Near-Back | Back |
|---|---|---|---|---|---|
| Close | [i] Occurrences: 29,148. Examples: 발전기: [p ɐ ʎ tɕ ʌ n ɡ i]; 일베에: [i ɭ b e e]; 니트릴: [n i tʰ ɨ ɾ i ɭ]; 시키는거죠: [ɕʰ iː kʰ i n ɨ n ɡ ʌ tɕ o]. [iː] Occurrences: 1,829. Examples: 시키는거죠: [ɕʰ iː kʰ i n ɨ n ɡ ʌ tɕ o]; 비망록: [p iː m ɐ ŋ n o k̚]; 이츠키: [iː tɕʰ ɨ kʰ i]; 취했기: [tɕʰ ɥ iː ɦ eː s͈ ɡ i] | | [ɨ] Occurrences: 18,341. Examples: 니트릴: [n i tʰ ɨ ɾ i ɭ]; 시키는거죠: [ɕʰ iː kʰ i n ɨ n ɡ ʌ tɕ o]; 뉴스타파: [ɲ u s ɨ tʰ ɐ pʰ ɐ]; 얼스터: [ʌ ɭ sʰ ɨ tʰ ʌ]. [ɨː] Occurrences: 242. Examples: 금지시켜: [k ɨː m dʑ i ɕʰ i kʰ j ʌ]; 끌린다: [k͈ ɨː ʎ ʎ i n d ɐ]; 긍정적으로: [k ɨː ŋ dʑ ʌ ŋ dʑ ʌ ɡ ɨ ɾ o]; 끌어왔: [k͈ ɨː ɾ ʌ w ɐ t̚] | | [u] Occurrences: 14,492. Examples: 서북청년단: [sʰ ʌ b u k̚ tɕʰ ʌ ŋ ɲ ʌ n d ɐ n]; 불란서: [p u ɭ ɭ ɐ n sʰ ʌ]; 순자산: [sʰ u ɲ tɕ ɐ sʰ ɐ n]; 뉴스타파: [ɲ u s ɨ tʰ ɐ pʰ ɐ]. [uː] Occurrences: 1,086. Examples: 무수히: [m uː sʰ u ʝ i]; 경우는: [k j ʌː ŋ uː n ɨ n]; 준호씨: [tɕ uː n β o ɕ͈ i]; 부가가치세: [p uː ɡ ɐ ɡ ɐ tɕʰ i sʰ e] |
| Close-Mid | [e] Occurrences: 9,604. Examples: 일베에: [i ɭ b e e]; 가지게: [k ɐ dʑ i ɡ e]; 발견되다: [p ɐ ɭ ɡ j ʌ n d w e d ɐ]; 예상됩니다: [j eː sʰ ɐ ŋ d w e m n i d ɐ]. [eː] Occurrences: 9,168. Examples: 예상됩니다: [j eː sʰ ɐ ŋ d w e m n i d ɐ]; 출발해: [tɕʰ u ɭ b ɐ ɾ h eː]; 했을까: [h eː s ɨ ɭ k͈ ɐ]; 잘못했다는: [tɕ ɐ ɭ m o tʰ eː t̚ tʰ ɐ n ɨ n] | | | | [o] Occurrences: 19,381. Examples: 시키는거죠: [ɕʰ iː kʰ i n ɨ n ɡ ʌ tɕ o]; 보도자료만: [p o d o dʑ ɐ ɾ j o m ɐ n]; 비망록: [p iː m ɐ ŋ n o k̚]; 흑고니: [x ɨ k̚ kʰ o n i]. [oː] Occurrences: 1,819. Examples: 공천에서: [k oː ŋ tɕʰ ʌ n e sʰ ʌ]; 요인들을: [j oː i n d ɨ ɾ ɨ ɭ]; 공급하기: [k oː ŋ ɡ ɨ p̚ h ɐ ɡ i]; 소득공제를: [sʰ oː d ɨ k̚ kʰ o ŋ dʑ e ɾ ɨ ɭ] |
| Open-Mid | | | | | [ʌ] Occurrences: 26,002. Examples: 서북청년단: [sʰ ʌ b u k̚ tɕʰ ʌ ŋ ɲ ʌ n d ɐ n]; 발전기: [p ɐ ʎ tɕ ʌ n ɡ i]; 시키는거죠: [ɕʰ iː kʰ i n ɨ n ɡ ʌ tɕ o]; 산적한: [sʰ ɐ ɲ dʑ ʌ kʰ ɐ n]. [ʌː] Occurrences: 2,526. Examples: 건너갔: [k ʌː n n ʌ ɡ ɐ t̚]; 던져줌: [t ʌː ɲ dʑ ʌ dʑ u m]; 연구관이: [j ʌː n ɡ u ɡ w ɐ n i]; 없었을: [ʌː p s ʌ s ɨ ɭ] |
| | | | [ɐ] Occurrences: 44,370. Examples: 서북청년단: [sʰ ʌ b u k̚ tɕʰ ʌ ŋ ɲ ʌ n d ɐ n]; 발전기: [p ɐ ʎ tɕ ʌ n ɡ i]; 산적한: [sʰ ɐ ɲ dʑ ʌ kʰ ɐ n]; 지적하다: [tɕ i dʑ ʌ kʰ ɐ d ɐ] | | |
Open |