Korean MFA dictionary v2.0.0a#

  • Maintainer: Montreal Forced Aligner

  • Language: Korean

  • Dialect: N/A

  • Phone set: MFA

  • Number of words: 55,190

  • Phones: b d e h i j k m n o p s t tɕʰ tɕ͈ u w x ç ŋ ɐ ɕʰ ɕ͈ ɛ ɛː ɡ ɣ ɥ ɦ ɨ ɨː ɭ ɰ ɲ ɸ ɾ ʌ ʌː ʎ ʝ β

  • License: CC BY 4.0

  • Compatible MFA version: v2.0.0

  • Citation:

@techreport{mfa_korean_mfa_dictionary_2022,
	author={McAuliffe, Michael and Sonderegger, Morgan},
	title={Korean MFA dictionary v2.0.0a},
	address={\url{https://mfa-models.readthedocs.io/pronunciation dictionary/Korean/Korean MFA dictionary v2_0_0a.html}},
	year={2022},
	month={May},
}

Installation#

Install from the MFA command line:

mfa model download dictionary korean_mfa

Or download from the release page.

The dictionary available from the release page and from command line installation has pronunciation and silence probabilities estimated as part of acoustic model training (see Silence probability format and training pronunciation probabilities for more information). If you would like to use the version of this dictionary without probabilities, please see the plain dictionary.
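For readers who want to inspect the difference between the two versions, the following is a minimal sketch of reducing an entry with probabilities to a plain word-plus-phones entry. It assumes only that each line starts with the word, is optionally followed by one or more numeric columns, and ends with space-separated phones; the exact column layout is described in the MFA documentation referenced above. The function names and the sample line (including its numbers) are hypothetical.

```python
def is_number(token):
    """True if the token parses as a float (used to spot probability columns)."""
    try:
        float(token)
        return True
    except ValueError:
        return False


def strip_probabilities(line):
    """Return (word, phones), dropping any numeric columns that follow the word."""
    word, *rest = line.split()
    while rest and is_number(rest[0]):
        rest = rest[1:]
    return word, rest


# Made-up entry for illustration; the numeric values are not from the real dictionary.
sample = "word\t0.99\t0.12\t1.05\t0.98\tk i m"
print(strip_probabilities(sample))
```

The same function leaves an entry without numeric columns unchanged, so it can be run over either version of the file.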

Intended use#

This dictionary is intended for forced alignment of Korean transcripts.

This dictionary uses the MFA phone set for Korean, and was used in training the Korean MFA acoustic model. Pronunciations can be added on top of the dictionary, as long as no additional phones are introduced.
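One way to check the "no additional phones" constraint before adding entries is sketched below. The inventory is copied from the Phones field at the top of this page; the function name `unknown_phones` and the candidate entries are hypothetical, and pronunciations are assumed to be space-separated phone strings.

```python
# Phone inventory copied from the "Phones" field of this page.
KOREAN_MFA_PHONES = frozenset(
    "b d e h i j k m n o p s t tɕʰ tɕ͈ u w x ç ŋ ɐ ɕʰ ɕ͈ ɛ ɛː ɡ ɣ ɥ ɦ ɨ ɨː "
    "ɭ ɰ ɲ ɸ ɾ ʌ ʌː ʎ ʝ β".split()
)


def unknown_phones(pronunciation, inventory=KOREAN_MFA_PHONES):
    """Return the phones in a space-separated pronunciation that are not in the inventory."""
    return [phone for phone in pronunciation.split() if phone not in inventory]


# Hypothetical candidate entries (word -> pronunciation), for illustration only.
candidates = {
    "example_ok": "k i m",
    "example_bad": "k i m z",  # "z" is not in the inventory above
}

for word, pron in candidates.items():
    extra = unknown_phones(pron)
    status = "ok" if not extra else f"rejected, unknown phones: {extra}"
    print(f"{word}: {status}")
```

Entries that come back with a non-empty list would need to be rewritten with phones from the existing inventory before being added.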

Performance Factors#

When trying to get better alignment accuracy, adding pronunciations is generally helpful, especially for different styles and dialects. The most impactful improvements will generally be seen when adding reduced variants that delete segments or syllables, which are common in spontaneous speech. Alignment must include all phones specified in the pronunciation of a word, and each phone has a minimum duration (by default 10 ms). If a speaker pronounces a multisyllabic word as a single syllable, MFA may be unable to fit all the segments into the available time, which leads to alignment errors on adjacent words as well.
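The duration constraint above amounts to simple arithmetic: a pronunciation with N phones cannot be aligned to an interval shorter than N times the minimum phone duration. A minimal sketch, assuming the default 10 ms minimum stated above; the function names and the example pronunciations are hypothetical.

```python
MIN_PHONE_MS = 10  # default minimum phone duration stated in the text above


def min_word_duration_ms(pronunciation, min_phone_ms=MIN_PHONE_MS):
    """Lower bound on the aligned duration of a space-separated pronunciation."""
    return len(pronunciation.split()) * min_phone_ms


def fitting_variants(variants, spoken_ms, min_phone_ms=MIN_PHONE_MS):
    """Keep only the variants whose lower bound fits in the time actually spoken."""
    return [v for v in variants if min_word_duration_ms(v, min_phone_ms) <= spoken_ms]


# Hypothetical full and reduced variants of one word, for illustration only.
full = "k ɨ ɾ ʌ n d e"  # 7 phones, so at least 70 ms
reduced = "k ɨ n d e"  # 5 phones, so at least 50 ms

print(min_word_duration_ms(full))
print(fitting_variants([full, reduced], spoken_ms=60))
```

If only the full form is in the dictionary and the speaker produces the word in 60 ms, the aligner has to take time from neighboring words, which is the error pattern described above. Listing the reduced variant gives it a pronunciation that fits.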

Ethical considerations#

Deploying any Speech-to-Text model into any production setting has ethical implications. You should consider these implications before use.

Demographic Bias#

You should assume every machine learning model has demographic bias unless proven otherwise. For pronunciation dictionaries, it is often the case that transcription accuracy and lexicon coverage are better for the prestige variety modeled in the dictionary than for other variants. If you are using this dictionary in production, you should acknowledge this as a potential issue.

IPA Charts#

Consonants#

Where obstruent symbols appear in pairs, the one on the left is unvoiced and the one on the right is voiced.

Manner

Labial

Alveolar

Retroflex

Palatal

Velar

Glottal

Nasal

Occurrences:
15,386
Examples:
* 전남청장과:
[t ɕ ʌ n n ɐ m t ɕ ʰ ʌ ŋ d ʑ ɐ ŋ ɡ w ɐ]
* 깎으면서다:
[k ͈ ɐ k ͈ ɨ m j ʌ n s ʰ ʌ d ɐ]
* 연합뉴스:
[j ʌ ː n h ɐ m ɲ u s ɨ]
* 명품주연이:
[m j ʌ ː ŋ p ʰ u m d ʑ u j ʌ n i]
Occurrences:
29,049
Examples:
* 떠오르지만:
[t ͈ ʌ o ɾ ɨ d ʑ i m ɐ n]
* 신종관:
[ɕ ʰ i ː ɲ d ʑ o ŋ ɡ w ɐ n]
* 군인법:
[k u n i n p ʌ p ̚]
* 지나다:
[t ɕ i n ɐ d ɐ]
Occurrences:
3,504
Examples:
* 더민주로:
[t ʌ m i ɲ d ʑ u ɾ o]
* 산양을:
[s ʰ ɐ n ɲ ɐ ŋ ɨ ɭ]
* 원자폭탄:
[w ʌ ɲ d ʑ ɐ p ʰ o k ̚ t ʰ ɐ n]
* 찬장처럼:
[t ɕ ʰ ɐ ɲ d ʑ ɐ ŋ t ɕ ʰ ʌ ɾ ʌ m]
Occurrences:
16,063
Examples:
* 대우중공업:
[t ɛ ː u d ʑ u ŋ ɡ o ŋ ʌ p ̚]
* 도망하다:
[t o m ɐ ŋ h ɐ d ɐ]
* 행위자는:
[h ɛ ː ŋ ɥ i d ʑ ɐ n ɨ n]
* 우왕좌왕:
[u w ɐ ŋ d ʑ w ɐ w ɐ ŋ]

Stop

Occurrences:
5,477
Examples:
* 아청법이:
[ɐ t ɕ ʰ ʌ ŋ p ʌ b i]
* 보전하지:
[p o ː d ʑ ʌ n h ɐ d ʑ i]
* 연습장:
[j ʌ ː n s ʰ ɨ p ̚ t ɕ ʰ ɐ ŋ]
* 뻔했다:
[p ͈ ʌ ː n h ɛ ː t ̚ t ʰ ɐ]
Occurrences:
4,218
Examples:
* 소비절벽:
[s ʰ o b i d ʑ ʌ ɭ b j ʌ k ̚]
* 절반가량인:
[t ɕ ʌ ɭ b ɐ n ɡ ɐ ɾ j ɐ ŋ i n]
* 중수부로:
[t ɕ u ŋ s ʰ u b u ɾ o]
* 수사받고:
[s ʰ u s ʰ ɐ b ɐ t ̚ k ʰ o]
Occurrences:
3,988
Examples:
* 주지사:
[t ɕ u d ʑ i s ʰ ɐ]
* 사면대상:
[s ʰ ɐ m j ʌ n t ɛ ː s ʰ ɐ ŋ]
* 질환의:
[t ɕ i ɾ β w ɐ n ɰ i]
* 주소를:
[t ɕ u ː s ʰ o ɾ ɨ ɭ]
Occurrences:
9,674
Examples:
* 무장하다:
[m u ː d ʑ ɐ ŋ h ɐ d ɐ]
* 정동영:
[t ɕ ʌ ŋ d o ŋ j ʌ ŋ]
* 고영주가:
[k o j ʌ ŋ d ʑ u ɡ ɐ]
* 직원들을:
[t ɕ i ɡ w ʌ n d ɨ ɾ ɨ ɭ]
Occurrences:
9,060
Examples:
* 중학교까지:
[t ɕ u ŋ h ɐ k ̚ k ʰ j o k ͈ ɐ d ʑ i]
* 관행이다:
[k w ɐ n h ɛ ː ŋ i d ɐ]
* 구글이나:
[k u ɡ ɨ ɾ i n ɐ]
* 호흡기:
[ɸ o ɣ ɨ p ̚ k ʰ i]
Occurrences:
12,893
Examples:
* 다지기:
[t ɐ d ʑ i ɡ i]
* 수술하기로:
[s ʰ u s ʰ u ɾ h ɐ ɡ i ɾ o]
* 가게는:
[k ɐ ɡ e n ɨ n]
* 헝가리어:
[h ʌ ŋ ɡ ɐ ɾ i ʌ]

Affricate

Occurrences:
6,517
Examples:
* 입장입니다:
[i p ̚ t ɕ ʰ ɐ ŋ i m n i d ɐ]
* 접하기는:
[t ɕ ʌ p ʰ ɐ ɡ i n ɨ n]
* 자들을:
[t ɕ ɐ d ɨ ɾ ɨ ɭ]
* 잡스의:
[t ɕ ɐ p s ɨ ɰ i]
Occurrences:
8,545
Examples:
* 남자는:
[n ɐ m d ʑ ɐ n ɨ n]
* 많아지면:
[m ɐ n ɐ d ʑ i m j ʌ n]
* 달아줄:
[t ɐ ɾ ɐ d ʑ u ɭ]
* 수요자가:
[s ʰ u j o d ʑ ɐ ɡ ɐ]

Sibilant Plain

Occurrences:
2,056
Examples:
* 머릿속에:
[m ʌ ɾ i t ̚ s ʰ o ɡ e]
* 사로잡혔:
[s ʰ ɐ ɾ o d ʑ ɐ p ʰ j ʌ t ̚]
* 발산하고:
[p ɐ ɭ s ʰ ɐ n h ɐ ɡ o]
* 도서관까지:
[t o s ʰ ʌ ɡ w ɐ n k ͈ ɐ d ʑ i]

Tense

Occurrences:
1,132
Examples:
* 맡았던:
[m ɐ t ʰ ɐ s ͈ d ʌ n]
* 안했는데:
[ɐ n h ɛ ː s ͈ n ɨ n d e]
* 판단했다:
[p ʰ ɐ n d ɐ n h ɛ ː s ͈ d ɐ]
* 싸우면:
[s ͈ ɐ u m j ʌ n]
Occurrences:
768
Examples:
* 반짝반짝:
[p ɐ ɲ t ɕ ͈ ɐ k ̚ p ʰ ɐ ɲ t ɕ ͈ ɐ k ̚]
* 최씨를:
[t ɕ ʰ w e ː ɕ ͈ i ɾ ɨ ɭ]
* 졸업식:
[t ɕ o ɾ ʌ p ɕ ͈ i k ̚]
* 대학생이:
[t ɛ ː ɦ ɐ k ɕ ͈ ɛ ː ŋ i]

Aspirated

Occurrences:
11,969
Examples:
* 박종서:
[p ɐ k ̚ t ɕ ʰ o ŋ s ʰ ʌ]
* 불가사리:
[p u ɭ ɡ ɐ s ʰ ɐ ɾ i]
* 우주에서:
[u ː d ʑ u e s ʰ ʌ]
* 성서의:
[s ʰ ʌ ː ŋ s ʰ ʌ ɰ i]
Occurrences:
3,479
Examples:
* 고친다고:
[k o t ɕ ʰ i n d ɐ ɡ o]
* 등록취소:
[t ɨ ŋ n o k ̚ t ɕ ʰ ɥ i s ʰ o]
* 동하시면:
[t o ŋ h ɐ ɕ ʰ i m j ʌ n]
* 소방차:
[s ʰ o b ɐ ŋ t ɕ ʰ ɐ]

Fricative

Occurrences:
1,006
Examples:
* 화합물:
[ɸ w ɐ ɦ ɐ m m u ɭ]
* 황금색:
[ɸ w ɐ ŋ ɡ ɨ m s ʰ ɛ ː k]
* 화장실:
[ɸ w ɐ d ʑ ɐ ŋ ɕ ʰ i ɭ]
* 활용해:
[ɸ w ɐ ɾ j o ŋ h ɛ ː]
Occurrences:
454
Examples:
* 현실이다:
[ç ʌ n ɕ ʰ i ɾ i d ɐ]
* 협상에서:
[ç ʌ p s ɐ ŋ e s ʰ ʌ]
* 히치하이크:
[ç i t ɕ ʰ i ɦ ɐ i k ʰ ɨ]
* 히틀러의:
[ç i t ʰ ɨ ɭ ɭ ʌ ɰ i]
Occurrences:
831
Examples:
* 예상했다:
[j e ː s ʰ ɐ ŋ ʝ ɛ ː t ̚ t ʰ ɐ]
* 경건히:
[k j ʌ ː ŋ ɡ ʌ n ʝ i]
* 대혁명:
[t ɛ ː ʝ ʌ ŋ m j ʌ ŋ]
* 소형차:
[s ʰ o ː ʝ ʌ ŋ t ɕ ʰ ɐ]
Occurrences:
4,360
Examples:
* 학생들이랑:
[h ɐ k ɕ ͈ ɛ ː ŋ d ɨ ɾ i ɾ ɐ ŋ]
* 한다든지:
[h ɐ n d ɐ d ɨ ɲ d ʑ i]
* 하이픈:
[h ɐ i p ʰ ɨ n]
* 직면했기:
[t ɕ i ŋ m j ʌ n h ɛ ː s ͈ ɡ i]
Occurrences:
2,087
Examples:
* 파괴하다:
[p ʰ ɐ ɡ w e ɦ ɐ d ɐ]
* 맞이한다는:
[m ɐ d ʑ i ɦ ɐ n d ɐ n ɨ n]
* 공고한:
[k o ŋ ɡ o ɦ ɐ n]
* 지하철이:
[t ɕ i ɦ ɐ t ɕ ʰ ʌ ɾ i]

Approximant

Occurrences:
7,169
Examples:
* 기회라고:
[k i β w e ɾ ɐ ɡ o]
* 배워서:
[p ɛ ː w ʌ s ʰ ʌ]
* 운동권:
[u ː n d o ŋ ɡ w ʌ n]
* 일깨워주고:
[i ɭ k ͈ ɛ ː w ʌ d ʑ u ɡ o]
Occurrences:
11,865
Examples:
* 옮겨서:
[o m ɡ j ʌ s ʰ ʌ]
* 자장면:
[t ɕ ɐ d ʑ ɐ ŋ m j ʌ n]
* 정육면체:
[t ɕ ʌ ŋ j u ŋ m j ʌ ɲ t ɕ ʰ e]
* 유니코드:
[j u n i k ʰ o d ɨ]
Occurrences:
1,113
Examples:
* 뒤집기:
[t ɥ i d ʑ i p ̚ k ʰ i]
* 뛰어가다:
[t ͈ ɥ i ʌ ɡ ɐ d ɐ]
* 뛰었어요:
[t ͈ ɥ i ʌ s ʌ j o]
* 최고위원이:
[t ɕ ʰ w e ː ɡ o ɥ i w ʌ n i]
Occurrences:
1,685
Examples:
* 우리들의:
[u ɾ i d ɨ ɾ ɰ i]
* 모두의:
[m o d u ɰ i]
* 예술계의:
[j e ː s ʰ u ɭ ɡ j e ɰ i]
* 소수의:
[s ʰ o ː s ʰ u ɰ i]

Tap

Occurrences:
11,901
Examples:
* 결의문:
[k j ʌ ɾ ɰ i m u n]
* 배상하라고:
[p ɛ ː s ʰ ɐ ŋ h ɐ ɾ ɐ ɡ o]
* 오늘의:
[o n ɨ ɾ ɰ i]
* 곡소리가:
[k o k s o ɾ i ɡ ɐ]

Lateral

Occurrences:
3,276
Examples:
* 전략가:
[t ɕ ʌ ː ʎ ʎ ɐ k ̚ k ʰ ɐ]
* 실린다:
[ɕ ʰ i ʎ ʎ i n d ɐ]
* 올려주지:
[o ʎ ʎ ʌ d ʑ u d ʑ i]
* 빈소설치:
[p i n s o s ʰ ʌ ʎ t ɕ ʰ i]

Lateral Tap

Occurrences:
11,020
Examples:
* 언론학:
[ʌ ɭ ɭ o n h ɐ k ̚]
* 월세소득이:
[w ʌ ɭ s e s ʰ o d ɨ ɡ i]
* 국회를:
[k u k ʰ w e ɾ ɨ ɭ]
* 거기를:
[k ʌ ː ɡ i ɾ ɨ ɭ]

Vowels#

Where vowel symbols appear in pairs, the one on the left is unrounded and the one on the right is rounded.

Front

Near-Front

Central

Near-Back

Back

Close

Occurrences:
27,922
Examples:
* 비타민:
[p i t ʰ ɐ m i n]
* 얼룩진:
[ʌ ɭ ɭ u k ̚ t ɕ ʰ i n]
* 찢어지는:
[t ɕ ͈ i d ʑ ʌ d ʑ i n ɨ n]
* 트로이카:
[t ʰ ɨ ɾ o i k ʰ ɐ]
Occurrences:
1,756
Examples:
* 피해서:
[p ʰ i ː ɦ ɛ ː s ʰ ʌ]
* 의용군:
[ɰ i ː j o ŋ ɡ u n]
* 시효로:
[ɕ ʰ i ː ʝ o ɾ o]
* 이쑤시개:
[i ː s ͈ u ɕ ʰ i ɡ ɛ ː]
Occurrences:
17,136
Examples:
* 제어력을:
[t ɕ e ʌ ɾ j ʌ ɡ ɨ ɭ]
* 법안은:
[p ʌ b ɐ n ɨ n]
* 받아들이:
[p ɐ d ɐ d ɨ ɾ i]
* 타타르:
[t ʰ ɐ t ʰ ɐ ɾ ɨ]
Occurrences:
239
Examples:
* 응모자:
[ɨ ː ŋ m o d ʑ ɐ]
* 긍정적:
[k ɨ ː ŋ d ʑ ʌ ŋ d ʑ ʌ k ̚]
* 금지해놔서:
[k ɨ ː m d ʑ i ɦ ɛ ː n w ɐ s ʰ ʌ]
* 끌리다:
[k ͈ ɨ ː ʎ ʎ i d ɐ]
Occurrences:
14,232
Examples:
* 북아메리카:
[p u ɡ ɐ m e ɾ i k ʰ ɐ]
* 모체로부터:
[m o t ɕ ʰ e ɾ o b u t ʰ ʌ]
* 주택수요를:
[t ɕ u t ̚ ɛ ː k s u j o ɾ ɨ ɭ]
* 중남미:
[t ɕ u ŋ n ɐ m m i]
Occurrences:
1,047
Examples:
* 구급대원을:
[k u ː ɡ ɨ p ̚ t ʰ ɛ ː w ʌ n ɨ ɭ]
* 우정국:
[u ː d ʑ ʌ ŋ ɡ u k ̚]
* 주민센터:
[t ɕ u ː m i n s e n t ʰ ʌ]
* 구경거리:
[k u ː ɡ j ʌ ŋ ɡ ʌ ɾ i]

Close-Mid

Occurrences:
9,175
Examples:
* 세실의:
[s ʰ e ɕ ʰ i ɾ ɰ i]
* 정부에:
[t ɕ ʌ ŋ b u e]
* 제안하다:
[t ɕ e ɐ n h ɐ d ɐ]
* 청결하게:
[t ɕ ʰ ʌ ŋ ɡ j ʌ ɾ h ɐ ɡ e]
Occurrences:
1,251
Examples:
* 최고위:
[t ɕ ʰ w e ː ɡ o ɥ i]
* 최고위원을:
[t ɕ ʰ w e ː ɡ o ɥ i w ʌ n ɨ ɭ]
* 제작하다:
[t ɕ e ː d ʑ ɐ k ʰ ɐ d ɐ]
* 인쇄업자:
[i n s ʰ w e ː ʌ p ̚ t ɕ ʰ ɐ]
Occurrences:
18,471
Examples:
* 노인병학:
[n o ː i n b j ʌ ŋ h ɐ k ̚]
* 빠지고:
[p ͈ ɐ d ʑ i ɡ o]
* 신동욱:
[ɕ ʰ i n d o ŋ u k ̚]
* 고민만:
[k o m i n m ɐ n]
Occurrences:
1,770
Examples:
* 소득세:
[s ʰ o ː d ɨ k s e]
* 소상공인:
[s ʰ o ː s ʰ ɐ ŋ ɡ o ŋ i n]
* 통행인:
[t ʰ o ː ŋ h ɛ ː ŋ i n]
* 보신탕:
[p o ː ɕ ʰ i n t ʰ ɐ ŋ]

Open-Mid

Occurrences:
7
Examples:
* 재래시장:
[t ɕ ɛ ː ɾ ɛ ː ɕ ʰ i d ʑ ɐ ŋ]
* 답했다:
[t ɐ p ʰ ɛ ː t ̚ t ʰ ɐ]
* 직거래:
[t ɕ i k ̚ k ʰ ʌ ɾ ɛ ː]
* 개국에서:
[k ɛ ː ɡ u ɡ e s ʰ ʌ]
Occurrences:
7,620
Examples:
* 무턱대고:
[m u t ʰ ʌ k ̚ t ʰ ɛ ː ɡ o]
* 백업으로:
[p ɛ ː ɡ ʌ b ɨ ɾ o]
* 대란이:
[t ɛ ː ɾ ɐ n i]
* 처해졌:
[t ɕ ʰ ʌ ː ɦ ɛ ː d ʑ ʌ t ̚]
Occurrences:
25,045
Examples:
* 스러웠:
[s ʰ ɨ ɾ ʌ w ʌ t ̚]
* 최욱철:
[t ɕ ʰ w e u k ̚ t ɕ ʰ ʌ ɭ]
* 현금지급기:
[ç ʌ n ɡ ɨ m d ʑ i ɡ ɨ p ̚ k ʰ i]
* 수개월이:
[s ʰ u k ɛ ː w ʌ ɾ i]
Occurrences:
2,436
Examples:
* 어휘장:
[ʌ ː h ɥ i d ʑ ɐ ŋ]
* 전경련:
[t ɕ ʌ ː n ɡ j ʌ ŋ ɲ ʌ n]
* 검찰은:
[k ʌ ː m t ɕ ʰ ɐ ɾ ɨ n]
* 서씨가:
[s ʰ ʌ ː ɕ ͈ i ɡ ɐ]
Occurrences:
42,801
Examples:
* 오스트리아:
[o s ʰ ɨ t ʰ ɨ ɾ i ɐ]
* 오염되다:
[o ː j ʌ m d w e d ɐ]
* 청취했다:
[t ɕ ʰ ʌ ŋ t ɕ ʰ ɥ i ɦ ɛ ː s ͈ d ɐ]
* 세포학:
[s ʰ e ː p ʰ o ɦ ɐ k ̚]

Open