Korean MFA dictionary v2.0.0#

  • Maintainer: Montreal Forced Aligner

  • Language: Korean

  • Dialect: N/A

  • Phone set: MFA

  • Number of words: 54,074

  • Phones: b d e h i j k m n o p s t tɕʰ tɕ͈ u w x ç ŋ ɐ ɕʰ ɕ͈ ɡ ɣ ɥ ɦ ɨ ɨː ɭ ɰ ɲ ɸ ɾ ʌ ʌː ʎ ʝ β

  • License: CC BY 4.0

  • Compatible MFA version: v2.0.0

  • Citation:

@techreport{mfa_korean_mfa_dictionary_2022,
	author={McAuliffe, Michael and Sonderegger, Morgan},
	title={Korean MFA dictionary v2.0.0},
	address={\url{https://mfa-models.readthedocs.io/pronunciation dictionary/Korean/Korean MFA dictionary v2_0_0.html}},
	year={2022},
	month={Mar},
}

Installation#

Install from the MFA command line:

mfa model download dictionary korean_mfa

Or download from the release page.

The dictionary available from the release page and from command line installation includes pronunciation and silence probabilities estimated as part of acoustic model training (see Silence probability format and training pronunciation probabilities for more information). If you would like to use a version of this dictionary without probabilities, please see the plain dictionary.

Intended use#

This dictionary is intended for forced alignment of Korean transcripts.

This dictionary uses the MFA phone set for Korean, and was used in training the Korean MFA acoustic model. Pronunciations can be added on top of the dictionary, as long as no additional phones are introduced.
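As a rough illustration of the "no additional phones" constraint, the sketch below checks candidate entries against an allowed phone set before they are added. The function name `find_unknown_phones`, the entry layout (a word paired with space-separated phones), the small `allowed` set, and the candidate entries are all assumptions made for this example. In practice the allowed set should be read from the dictionary you actually downloaded.

```python
def find_unknown_phones(entries, allowed_phones):
    """Return {word: [phones not in allowed_phones]} for offending entries.

    entries: iterable of (word, pronunciation) pairs, where the pronunciation
    is a string of space-separated phones (an assumed layout for this sketch).
    """
    problems = {}
    for word, pronunciation in entries:
        unknown = [p for p in pronunciation.split() if p not in allowed_phones]
        if unknown:
            problems.setdefault(word, []).extend(unknown)
    return problems


# Hypothetical subset of the phone set, for illustration only.
allowed = {"k", "ɐ", "m", "n", "i", "d"}

# Made-up candidate entries; they are not taken from the dictionary.
candidates = [
    ("가나", "k ɐ n ɐ"),  # uses only allowed phones
    ("가미", "k ɐ m i"),  # uses only allowed phones
    ("가자", "k ɐ z ɐ"),  # "z" is not in the allowed set
]

problems = find_unknown_phones(candidates, allowed)
print(problems)
```

Any entry reported here would need its pronunciation rewritten with existing phones before being added, since the acoustic model has no trained representation for a new symbol.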

Performance Factors#

When trying to get better alignment accuracy, adding pronunciations is generally helpful, especially for different styles and dialects. The most impactful improvements will generally come from adding reduced variants that delete segments or syllables, since such reductions are common in spontaneous speech. Alignment must include every phone specified in the pronunciation of a word, and each phone has a minimum duration (10 ms by default). If a speaker pronounces a multisyllabic word as a single syllable, it can be hard for MFA to fit all the segments in, which leads to alignment errors on adjacent words as well.
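The arithmetic behind this is simple: a pronunciation with N phones cannot be aligned to less than N times the minimum phone duration. The sketch below is only an illustration under that assumption. The helper names are invented, the reduced variant is hypothetical, and the 70 ms interval is contrived to make the difference visible.

```python
def min_word_duration_ms(pronunciation, min_phone_ms=10):
    """Lower bound on a word's aligned duration, given one minimum per phone."""
    return len(pronunciation.split()) * min_phone_ms


def fits(pronunciation, spoken_ms, min_phone_ms=10):
    """True if every phone can receive its minimum duration within spoken_ms."""
    return min_word_duration_ms(pronunciation, min_phone_ms) <= spoken_ms


full_form = "t ɨ ɾ ʌ ɡ ɐ d ɐ"  # 들어가다, as listed in the charts in this document
reduced_form = "t ɨ ɡ ɐ d ɐ"    # hypothetical reduced variant, for illustration

print(min_word_duration_ms(full_form))     # 8 phones * 10 ms
print(min_word_duration_ms(reduced_form))  # 6 phones * 10 ms
print(fits(full_form, spoken_ms=70))
print(fits(reduced_form, spoken_ms=70))
```

When only the full form is available and the spoken interval is shorter than its lower bound, the aligner has to take time from neighbouring words, which is why adding the reduced variant tends to help.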

Ethical considerations#

Deploying any Speech-to-Text model into any production setting has ethical implications. You should consider these implications before use.

Demographic Bias#

You should assume every machine learning model has demographic bias unless proven otherwise. For pronunciation dictionaries, it is often the case that transcription accuracy and lexicon coverage are better for the prestige variety modeled in the dictionary than for other varieties. If you are using this dictionary in production, you should acknowledge this as a potential issue.

IPA Charts#

Consonants#

Where obstruent symbols appear in pairs, the one on the left is unvoiced and the one on the right is voiced.

Columns (place of articulation): Labial, Alveolar, Retroflex, Palatal, Velar, Glottal. Rows are listed by manner below.

Nasal

Occurrences: 15,980
Examples:
* 보도자료만: [p o d o ɐ ɾ j o m ɐ n]
* 비망록: [p m ɐ ŋ n o ]
* 몰아갈: [m o ɾ ɐ ɡ ɐ ɭ]
* 예상됩니다: [j ɐ ŋ d w e m n i d ɐ]

Occurrences: 30,413
Examples:
* 서북청년단: [ ʌ b u tɕʰ ʌ ŋ ɲ ʌ n d ɐ n]
* 발전기: [p ɐ ʎ ʌ n ɡ i]
* 니트릴: [n i ɨ ɾ i ɭ]
* 시키는거죠: [ɕʰ i n ɨ n ɡ ʌ o]

Occurrences: 3,657
Examples:
* 서북청년단: [ ʌ b u tɕʰ ʌ ŋ ɲ ʌ n d ɐ n]
* 산적한: [ ɐ ɲ ʌ ɐ n]
* 순자산: [ u ɲ ɐ ɐ n]
* 뉴스타파: [ɲ u s ɨ ɐ ɐ]

Occurrences: 16,306
Examples:
* 서북청년단: [ ʌ b u tɕʰ ʌ ŋ ɲ ʌ n d ɐ n]
* 비망록: [p m ɐ ŋ n o ]
* 예상됩니다: [j ɐ ŋ d w e m n i d ɐ]
* 복용량: [p o ɡ j o ŋ ɲ ɐ ŋ]

Stop

Occurrences: 5,660
Examples:
* 발전기: [p ɐ ʎ ʌ n ɡ i]
* 불란서: [p u ɭ ɭ ɐ n ʌ]
* 보도자료만: [p o d o ɐ ɾ j o m ɐ n]
* 비망록: [p m ɐ ŋ n o ]

Occurrences: 4,326
Examples:
* 서북청년단: [ ʌ b u tɕʰ ʌ ŋ ɲ ʌ n d ɐ n]
* 일베에: [i ɭ b e e]
* 사법처리: [ ɐ b ʌ tɕʰ ʌ ɾ i]
* 출발해: [tɕʰ u ɭ b ɐ ɾ h ]

Occurrences: 4,120
Examples:
* 던져줌: [t ʌː ɲ ʌ u m]
* 다하십시오: [t ɐ ɦ ɐ ɕʰ i p ɕ͈ i o]
* 들어가다: [t ɨ ɾ ʌ ɡ ɐ d ɐ]
* 돗자리: [t o tɕʰ ɐ ɾ i]

Occurrences: 10,317
Examples:
* 서북청년단: [ ʌ b u tɕʰ ʌ ŋ ɲ ʌ n d ɐ n]
* 지적하다: [ i ʌ ɐ d ɐ]
* 따라가다: [ ɐ ɾ ɐ ɡ ɐ d ɐ]
* 보도자료만: [p o d o ɐ ɾ j o m ɐ n]

Occurrences: 9,368
Examples:
* 가지게: [k ɐ i ɡ e]
* 경험치: [k j ʌ ŋ h ʌ m tɕʰ i]
* 건너갔: [k ʌː n n ʌ ɡ ɐ ]
* 목소리도: [m o k s o ɾ i d o]

Occurrences: 13,510
Examples:
* 발전기: [p ɐ ʎ ʌ n ɡ i]
* 시키는거죠: [ɕʰ i n ɨ n ɡ ʌ o]
* 따라가다: [ ɐ ɾ ɐ ɡ ɐ d ɐ]
* 털기가: [ ʌ ɭ ɡ i ɡ ɐ]

Affricate

Occurrences: 6,818
Examples:
* 발전기: [p ɐ ʎ ʌ n ɡ i]
* 시키는거죠: [ɕʰ i n ɨ n ɡ ʌ o]
* 지적하다: [ i ʌ ɐ d ɐ]
* 순자산: [ u ɲ ɐ ɐ n]

Occurrences: 8,815
Examples:
* 시키는거죠: [ɕʰ n ɨ n ɡ ʌ o]
* 산적한: [ ɐ ɲ ʌ ɐ n]
* 지적하다: [ i ʌ ɐ d ɐ]
* 보도자료만: [p o d o ɐ ɾ j o m ɐ n]

Sibilant Plain

Occurrences: 2,334
Examples:
* 뉴스타파: [ɲ u s ɨ ɐ ɐ]
* 했을까: [h s ɨ ɭ ɐ]
* 목소리도: [m o k s o ɾ i d o]
* 발사된: [p ɐ ɭ s ɐ d w e n]

Tense

Occurrences: 1,120
Examples:
* 주고받았고: [ u ɡ o b ɐ d ɐ ɡ o]
* 글쓴이: [k ɨ ɭ ɨ n i]
* 쌍동이: [ ɐ ŋ d o ŋ i]
* 취했기: [tɕʰ ɥ ɦ ɡ i]

Occurrences: 800
Examples:
* 리터씩: [ɾ i ʌ ɕ͈ i ]
* 다하십시오: [t ɐ ɦ ɐ ɕʰ i p ɕ͈ i o]
* 절약시킬: [ ʌ ɾ j ɐ k ɕ͈ i i ɭ]
* 한식집을: [h ɐ n ɕ͈ i tɕʰ i m ɨ ɭ]

Aspirated

Occurrences: 12,406
Examples:
* 서북청년단: [ ʌ b u tɕʰ ʌ ŋ ɲ ʌ n d ɐ n]
* 산적한: [ ɐ ɲ ʌ ɐ n]
* 불란서: [p u ɭ ɭ ɐ n ʌ]
* 순자산: [ u ɲ ɐ ɐ n]

Occurrences: 3,660
Examples:
* 시키는거죠: [ɕʰ i n ɨ n ɡ ʌ o]
* 곳이다: [k o ɕʰ i d ɐ]
* 다하십시오: [t ɐ ɦ ɐ ɕʰ i p ɕ͈ i o]
* 감시를: [k ɐ m ɕʰ i ɾ ɨ ɭ]

Fricative

Occurrences: 1,021
Examples:
* 획득하다: [ɸ w e ɨ ɐ d ɐ]
* 호가호위를: [ɸ o ɡ ɐ β o ɥ i ɾ ɨ ɭ]
* 홀로서기: [ɸ o ɭ ɭ o ʌ ɡ i]
* 환경을: [ɸ w ɐ n ɡ j ʌ ŋ ɨ ɭ]

Occurrences: 477
Examples:
* 희생양: [ç i ŋ j ɐ ŋ]
* 향하게: [ç ɐ ŋ h ɐ ɡ e]
* 현상은: [ç ʌ n ɐ ŋ ɨ n]
* 현명할: [ç ʌ n m j ʌ ŋ h ɐ ɭ]

Occurrences: 859
Examples:
* 고혈압처럼: [k o ʝ ʌ ɾ ɐ tɕʰ ʌ ɾ ʌ m]
* 무수히: [m u ʝ i]
* 표현은: [ j o ʝ ʌ n ɨ n]
* 대한변협은: [t ɦ ɐ n b j ʌ n ʝ ʌ ɨ n]

Occurrences: 4,632
Examples:
* 출발해: [tɕʰ u ɭ b ɐ ɾ h ]
* 경험치: [k j ʌ ŋ h ʌ m tɕʰ i]
* 했을까: [h s ɨ ɭ ɐ]
* 헤르츠: [h e ɾ ɨ tɕʰ ɨ]

Occurrences: 2,207
Examples:
* 불가피한: [p u ɭ ɡ ɐ i ɦ ɐ n]
* 위치하고: [ɥ i tɕʰ i ɦ ɐ ɡ o]
* 다하십시오: [t ɐ ɦ ɐ ɕʰ i p ɕ͈ i o]
* 연기한다고: [j ʌː n ɡ i ɦ ɐ n d ɐ ɡ o]

Approximant

Occurrences: 7,410
Examples:
* 발견되다: [p ɐ ɭ ɡ j ʌ n d w e d ɐ]
* 예상됩니다: [j ɐ ŋ d w e m n i d ɐ]
* 부위원장: [p u ɥ i w ʌ ɲ ɐ ŋ]
* 발사된: [p ɐ ɭ s ɐ d w e n]

Occurrences: 12,379
Examples:
* 보도자료만: [p o d o ɐ ɾ j o m ɐ n]
* 발견되다: [p ɐ ɭ ɡ j ʌ n d w e d ɐ]
* 예상됩니다: [j ɐ ŋ d w e m n i d ɐ]
* 보이려고: [p o i ɾ j ʌ ɡ o]

Occurrences: 1,129
Examples:
* 위치하고: [ɥ i tɕʰ i ɦ ɐ ɡ o]
* 부위원장: [p u ɥ i w ʌ ɲ ɐ ŋ]
* 호가호위를: [ɸ o ɡ ɐ β o ɥ i ɾ ɨ ɭ]
* 방위산업체: [p ɐ ŋ ɥ i ɐ n ʌ tɕʰ e]

Occurrences: 1,777
Examples:
* 미립자의: [m i ɾ i tɕʰ ɐ ɰ i]
* 몰도바의: [m o ɭ d o b ɐ ɰ i]
* 강남의: [k ɐ ŋ n ɐ m ɰ i]
* 면발의: [m j ʌ n b ɐ ɾ ɰ i]

Tap

Occurrences: 12,333
Examples:
* 니트릴: [n i ɨ ɾ i ɭ]
* 따라가다: [ ɐ ɾ ɐ ɡ ɐ d ɐ]
* 보도자료만: [p o d o ɐ ɾ j o m ɐ n]
* 사법처리: [ ɐ b ʌ tɕʰ ʌ ɾ i]

Lateral

Occurrences: 3,338
Examples:
* 발전기: [p ɐ ʎ ʌ n ɡ i]
* 출점을: [tɕʰ u ʎ ʌ m ɨ ɭ]
* 글리세린: [k ɨ ʎ ʎ i e ɾ i n]
* 반려동물을: [p ɐ ʎ ʎ ʌ d o ŋ m u ɾ ɨ ɭ]

Lateral Tap

Occurrences: 11,579
Examples:
* 일베에: [i ɭ b e e]
* 니트릴: [n i ɨ ɾ i ɭ]
* 불란서: [p u ɭ ɭ ɐ n ʌ]
* 털기가: [ ʌ ɭ ɡ i ɡ ɐ]

Vowels#

Where vowel symbols appear in pairs, the one on the left is unrounded and the one on the right is rounded.

Columns (backness): Front, Near-Front, Central, Near-Back, Back. Rows are listed by height below.

Close

Occurrences: 29,148
Examples:
* 발전기: [p ɐ ʎ ʌ n ɡ i]
* 일베에: [i ɭ b e e]
* 니트릴: [n i ɨ ɾ i ɭ]
* 시키는거죠: [ɕʰ i n ɨ n ɡ ʌ o]

Occurrences: 1,829
Examples:
* 시키는거죠: [ɕʰ i n ɨ n ɡ ʌ o]
* 비망록: [p m ɐ ŋ n o ]
* 이츠키: [ tɕʰ ɨ i]
* 취했기: [tɕʰ ɥ ɦ ɡ i]

Occurrences: 18,341
Examples:
* 니트릴: [n i ɨ ɾ i ɭ]
* 시키는거죠: [ɕʰ i n ɨ n ɡ ʌ o]
* 뉴스타파: [ɲ u s ɨ ɐ ɐ]
* 얼스터: [ʌ ɭ ɨ ʌ]

Occurrences: 242
Examples:
* 금지시켜: [k ɨː m i ɕʰ i j ʌ]
* 끌린다: [ ɨː ʎ ʎ i n d ɐ]
* 긍정적으로: [k ɨː ŋ ʌ ŋ ʌ ɡ ɨ ɾ o]
* 끌어왔: [ ɨː ɾ ʌ w ɐ ]

Occurrences: 14,492
Examples:
* 서북청년단: [ ʌ b u tɕʰ ʌ ŋ ɲ ʌ n d ɐ n]
* 불란서: [p u ɭ ɭ ɐ n ʌ]
* 순자산: [ u ɲ ɐ ɐ n]
* 뉴스타파: [ɲ u s ɨ ɐ ɐ]

Occurrences: 1,086
Examples:
* 무수히: [m u ʝ i]
* 경우는: [k j ʌː ŋ n ɨ n]
* 준호씨: [ n β o ɕ͈ i]
* 부가가치세: [p ɡ ɐ ɡ ɐ tɕʰ i e]

Close-Mid

Occurrences: 9,604
Examples:
* 일베에: [i ɭ b e e]
* 가지게: [k ɐ i ɡ e]
* 발견되다: [p ɐ ɭ ɡ j ʌ n d w e d ɐ]
* 예상됩니다: [j ɐ ŋ d w e m n i d ɐ]

Occurrences: 9,168
Examples:
* 예상됩니다: [j ɐ ŋ d w e m n i d ɐ]
* 출발해: [tɕʰ u ɭ b ɐ ɾ h ]
* 했을까: [h s ɨ ɭ ɐ]
* 잘못했다는: [ ɐ ɭ m o ɐ n ɨ n]

Occurrences: 19,381
Examples:
* 시키는거죠: [ɕʰ i n ɨ n ɡ ʌ o]
* 보도자료만: [p o d o ɐ ɾ j o m ɐ n]
* 비망록: [p m ɐ ŋ n o ]
* 흑고니: [x ɨ o n i]

Occurrences: 1,819
Examples:
* 공천에서: [k ŋ tɕʰ ʌ n e ʌ]
* 요인들을: [j i n d ɨ ɾ ɨ ɭ]
* 공급하기: [k ŋ ɡ ɨ h ɐ ɡ i]
* 소득공제를: [ d ɨ o ŋ e ɾ ɨ ɭ]

Open-Mid

Occurrences: 26,002
Examples:
* 서북청년단: [ ʌ b u tɕʰ ʌ ŋ ɲ ʌ n d ɐ n]
* 발전기: [p ɐ ʎ ʌ n ɡ i]
* 시키는거죠: [ɕʰ i n ɨ n ɡ ʌ o]
* 산적한: [ ɐ ɲ ʌ ɐ n]

Occurrences: 2,526
Examples:
* 건너갔: [k ʌː n n ʌ ɡ ɐ ]
* 던져줌: [t ʌː ɲ ʌ u m]
* 연구관이: [j ʌː n ɡ u ɡ w ɐ n i]
* 없었을: [ʌː p s ʌ s ɨ ɭ]

Occurrences: 44,370
Examples:
* 서북청년단: [ ʌ b u tɕʰ ʌ ŋ ɲ ʌ n d ɐ n]
* 발전기: [p ɐ ʎ ʌ n ɡ i]
* 산적한: [ ɐ ɲ ʌ ɐ n]
* 지적하다: [ i ʌ ɐ d ɐ]

Open