Korean (Jamo) MFA dictionary v2.0.0#

  • Maintainer: Montreal Forced Aligner

  • Language: Korean

  • Dialect: N/A

  • Phone set: MFA

  • Number of words: 54,093

  • Phones: b d e h i j k m n o p s t tɕʰ tɕ͈ u w x ç ŋ ɐ ɕʰ ɕ͈ ɡ ɣ ɥ ɦ ɨ ɨː ɭ ɰ ɲ ɸ ɾ ʌ ʌː ʎ ʝ β

  • License: CC BY 4.0

  • Compatible MFA version: v2.0.0

  • Citation:

@techreport{mfa_korean_jamo_mfa_dictionary_2022,
	author={McAuliffe, Michael and Sonderegger, Morgan},
	title={Korean (Jamo) MFA dictionary v2.0.0},
	address={\url{https://mfa-models.readthedocs.io/pronunciation dictionary/Korean/Korean (Jamo) MFA dictionary v2_0_0.html}},
	year={2022},
	month={Mar},
}

Installation#

Install from the MFA command line:

mfa model download dictionary korean_jamo_mfa

Or download from the release page.

The dictionary available from the release page and command line installation has pronunciation and silence probabilities estimated as part of acoustic model training (see Silence probability format and training pronunciation probabilities for more information). If you would like to use the version of this dictionary without probabilities, please see the plain dictionary.

Intended use#

This dictionary is intended for forced alignment of Korean transcripts.

This dictionary uses the MFA phone set for Korean, and was used in training the Korean MFA acoustic model. Pronunciations can be added on top of the dictionary, as long as no additional phones are introduced.
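The constraint above (new pronunciations may not introduce new phones) can be checked before extending the dictionary. The following is a minimal sketch, not part of MFA: the helper name `find_unknown_phones` and the sample entries are hypothetical, and the phone inventory is copied from the "Phones" entry of this page, so it should be replaced if the dictionary you use lists a different set.

```python
# Phone inventory copied from the "Phones" entry of this page (an assumption:
# replace it with the inventory of the dictionary you actually use if it differs).
MFA_KOREAN_PHONES = set(
    "b d e h i j k m n o p s t tɕʰ tɕ͈ u w x ç ŋ ɐ ɕʰ ɕ͈ ɡ ɣ ɥ ɦ ɨ ɨː ɭ ɰ ɲ ɸ ɾ ʌ ʌː ʎ ʝ β".split()
)


def find_unknown_phones(entries, allowed_phones):
    """Return {word: [unknown phones]} for entries using phones outside the set.

    `entries` is an iterable of (word, pronunciation) pairs, where the
    pronunciation is a string of space-separated phones.
    """
    problems = {}
    for word, pronunciation in entries:
        unknown = [p for p in pronunciation.split() if p not in allowed_phones]
        if unknown:
            problems.setdefault(word, []).extend(unknown)
    return problems


# Hypothetical additions: the first pronunciation appears in the examples on
# this page; the second deliberately uses a phone ("q") that is not in the set.
new_entries = [
    ("ㅁㅗㄷㅜ", "m o d u"),
    ("ㅁㅗㄷㅜ", "m o q u"),
]

problems = find_unknown_phones(new_entries, MFA_KOREAN_PHONES)
print(problems)
```

Any word reported by the check would need its pronunciation rewritten with phones already in the inventory before being added.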

Performance Factors#

When trying to improve alignment accuracy, adding pronunciations is generally helpful, especially for different styles and dialects. The largest improvements generally come from adding reduced variants, common in spontaneous speech, in which segments or syllables are deleted. Alignment must include every phone specified in a word's pronunciation, and each phone has a minimum duration (10 ms by default). If a speaker pronounces a multisyllabic word as a single syllable, MFA may be unable to fit all the segments into the available time, which also causes alignment errors on adjacent words.
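The arithmetic behind the minimum-duration constraint can be sketched as follows. This is an illustration only, not MFA's alignment procedure: the function names are hypothetical, the 10 ms value is the default stated above, and the reduced variant is invented for the example rather than an attested pronunciation.

```python
MIN_PHONE_DURATION_MS = 10  # default minimum duration per phone stated above


def minimum_word_duration_ms(pronunciation, min_phone_ms=MIN_PHONE_DURATION_MS):
    """Lower bound on a word's aligned duration: every phone needs its minimum."""
    return len(pronunciation.split()) * min_phone_ms


def variants_that_fit(variants, available_ms):
    """Keep the pronunciation variants whose lower bound fits in the interval."""
    return [v for v in variants if minimum_word_duration_ms(v) <= available_ms]


# The full form is taken from the examples on this page; the reduced form is a
# hypothetical variant made up purely to illustrate the arithmetic.
full_form = "k w ɐ ʎ ʎ i"  # 6 phones -> at least 60 ms
reduced_form = "k ɐ ʎ i"   # 4 phones -> at least 40 ms

# Suppose the speaker produced the word in only 50 ms (an assumed figure).
fitting = variants_that_fit([full_form, reduced_form], available_ms=50)
print(minimum_word_duration_ms(full_form), fitting)
```

With only the full form in the dictionary, the aligner would have to take the missing time from neighbouring words, which is why listing a reduced variant can help.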

Ethical considerations#

Deploying any Speech-to-Text model into any production setting has ethical implications. You should consider these implications before use.

Demographic Bias#

You should assume every machine learning model has demographic bias unless proven otherwise. For pronunciation dictionaries, it is often the case that transcription accuracy and lexicon coverage are better for the prestige variety modeled in the dictionary than for other variants. If you are using this dictionary in production, you should acknowledge this as a potential issue.

IPA Charts#

Consonants#

Where obstruent symbols appear in pairs, the one on the left is unvoiced and the one on the right is voiced.

Manner (rows) by place of articulation (columns): Labial, Alveolar, Retroflex, Palatal, Velar, Glottal

Nasal

Occurrences:
15,143
Examples:
* ㅁㅗㄷㅜ:
[m o d u]
* ㅁㅗㅍㅣ:
[m o i]
* ㅁㅣㄴㅇㅛ:
[m i ɲ o]
* ㄴㅏㅁㅁㅐ:
[n ɐ m m ]
Occurrences:
28,729
Examples:
* ㄴㅗㅇㅈㅣ:
[n o ŋ i]
* ㄴㅏㅁㅁㅐ:
[n ɐ m m ]
* ㄴㅐㄱㅓ:
[n ɡ ʌ]
* ㄴㅐㄱㅏㄱ:
[n ɡ ɐ ]
Occurrences:
3,463
Examples:
* ㅁㅣㄴㅇㅛ:
[m i ɲ o]
* ㅋㅗㄴㅑㄱ:
[ o ɲ ɐ ]
* ㅈㅣㄴㅈㅏ:
[ i ɲ ɐ]
* ㄴㅏㅁㄴㅕ:
[n ɐ m ɲ ʌ]
Occurrences:
15,801
Examples:
* ㄴㅗㅇㅈㅣ:
[n o ŋ i]
* ㄸㅏㅇㅇㅣ:
[ ɐ ŋ i]
* ㅇㅘㅇㅈㅏ:
[w ɐ ŋ ɐ]
* ㅂㅣㅇㅅㅜ:
[p i ŋ u]

Stop

Occurrences:
5,399
Examples:
* ㅂㅣㅇㅅㅜ:
[p i ŋ u]
* ㅂㅏㅇㅟ:
[p ɐ ɥ i]
* ㅂㅗㄴㅐㅆ:
[p o n ]
* ㅂㅏㄷㅡㄴ:
[p ɐ d ɨ n]
Occurrences:
4,167
Examples:
* ㅇㅖㅂㅗ:
[j e b o]
* ㄲㅏㅂㅔㄹ:
[ ɐ b e ɭ]
* ㅈㅜㄴㅂㅣ:
[ n b i]
* ㅅㅔㅂㅜㄴ:
[ e b u n]
Occurrences:
3,929
Examples:
* ㄷㅓㄹㄷㅏ:
[t ʌː ɭ d ɐ]
* ㄷㅔㄴㅣ:
[t e n i]
* ㄷㅙㄷㅗ:
[t w d o]
* ㄷㅡㅇㅍㅣ:
[t ɨ ŋ i]
Occurrences:
9,538
Examples:
* ㅁㅗㄷㅜ:
[m o d u]
* ㅈㅝㄷㅗ:
[ w ʌ d o]
* ㄴㅏㄷㅏㅁ:
[n ɐ d ɐ m]
* ㅂㅏㄷㅡㄴ:
[p ɐ d ɨ n]
Occurrences:
8,928
Examples:
* ㄱㅣㅇㅏ:
[k i ɐ]
* ㄱㅣㄹㄹㅐ:
[k i ɭ ɭ ]
* ㄱㅡㄴㅔ:
[k ɨː n e]
* ㅅㅗㄱㅅㅔ:
[ o k s e]
Occurrences:
12,741
Examples:
* ㄴㅐㄱㅓ:
[n ɡ ʌ]
* ㄴㅐㄱㅏㄱ:
[n ɡ ɐ ]
* ㅈㅓㄱㅡㅁ:
[ ʌː ɡ ɨ m]
* ㅁㅏㄱㅜ:
[m ɐ ɡ u]

Affricate

Occurrences:
6,425
Examples:
* ㅈㅓㄱㅡㅁ:
[ ʌː ɡ ɨ m]
* ㅈㅝㄷㅗ:
[ w ʌ d o]
* ㅈㅣㄴㅈㅏ:
[ i ɲ ɐ]
* ㅈㅜㄴㅂㅣ:
[ n b i]
Occurrences:
8,444
Examples:
* ㄴㅗㅇㅈㅣ:
[n o ŋ i]
* ㅇㅘㅇㅈㅏ:
[w ɐ ŋ ɐ]
* ㅃㅏㅈㅣㅁ:
[ ɐ i m]
* ㅇㅠㅈㅗ:
[j u o]

Sibilant Plain

Occurrences:
2,055
Examples:
* ㅅㅗㄱㅅㅔ:
[ o k s e]
* ㅌㅏㄹㅅㅔ:
[ ɐ ɭ s e]
* ㅅㅓㄹㅅㅏ:
[ ʌ ɭ s ɐ]
* ㅎㅐㅆㅇㅓ:
[h s ʌ]

Tense

Occurrences:
1,094
Examples:
* ㅆㅡㄹㅜ:
[ ɨ ɾ u]
* ㅆㅡㅍㅜ:
[ ɨ u]
* ㅇㅝㄴㅆㅜ:
[w ʌ n u]
* ㅆㅏㅇㅁㅜ:
[ ɐ ŋ m u]
Occurrences:
765
Examples:
* ㅂㅜㄹㅆㅣ:
[p ɭ ɕ͈ i]
* ㅅㅣㅇㅔ:
[ɕ͈ i e]
* ㅁㅗㄳㅇㅣ:
[m o k ɕ͈ i]
* ㅁㅜㄴㅆㅣ:
[m ɭ ɕ͈ i]

Aspirated

Occurrences:
11,829
Examples:
* ㅂㅣㅇㅅㅜ:
[p i ŋ u]
* ㅎㅜㅅㅏㅁ:
[ɸ ɐ m]
* ㅅㅏㅅㅡㄹ:
[ ɐ ɨ ɭ]
* ㅊㅜㅅㅜ:
[tɕʰ u u]
Occurrences:
3,428
Examples:
* ㅅㅣㅅㅜㄹ:
[ɕʰ i u ɭ]
* ㅅㅣㄱㅂㅣ:
[ɕʰ i i]
* ㅁㅏㅅㅣ:
[m ɐ ɕʰ i]
* ㅅㅣㅊㅏ:
[ɕʰ tɕʰ ɐ]

Fricative

Occurrences:
1,000
Examples:
* ㅎㅜㅅㅏㅁ:
[ɸ ɐ m]
* ㅎㅗㄹㅗ:
[ɸ o ɾ o]
* ㅎㅘㅅㅡㅇ:
[ɸ w ɐ ɨ ŋ]
* ㅎㅘㅇㅎㅜ:
[ɸ w ɐ ŋ β u]
Occurrences:
447
Examples:
* ㅎㅠㅇㄱㅣ:
[ç u ŋ ɡ i]
* ㅎㅕㄴㅈㅐ:
[ç ʌ ɲ ]
* ㅎㅕㄴㅈㅣ:
[ç ʌː ɲ i]
* ㅎㅢㅎㅘ:
[ç i β w ɐ]
Occurrences:
829
Examples:
* ㅇㅕㅇㅎㅢ:
[j ʌ ŋ ʝ i]
* ㄱㅣㅎㅕㅇ:
[k i ʝ ʌ ŋ]
* ㅅㅣㅎㅖ:
[ɕʰ i ʝ e]
* ㄱㅏㅎㅐㅆ:
[k ɐ ʝ ]
Occurrences:
4,340
Examples:
* ㅌㅐㄱㅎㅐ:
[ k h ]
* ㅎㅏㄱㅣㄴ:
[h ɐ ɡ i n]
* ㅎㅏㅇㅇㅢ:
[h ɐ ŋ ɰ i]
* ㅎㅟㅂㅏㄹ:
[h ɥ i b ɐ ɭ]
Occurrences:
2,079
Examples:
* ㅊㅣㅎㅏ:
[tɕʰ i ɦ ɐ]
* ㅇㅏㅎㅏ:
[ɐ ɦ ɐ]
* ㄷㅐㅎㅏㄴ:
[t ɦ ɐ n]
* ㄷㅗㅎㅏ:
[t o ɦ ɐ]

Approximant

Occurrences:
7,116
Examples:
* ㅇㅘㅇㅈㅏ:
[w ɐ ŋ ɐ]
* ㅈㅝㄷㅗ:
[ w ʌ d o]
* ㅇㅚㅊㅣ:
[w tɕʰ i]
* ㅇㅏㄴㅇㅘ:
[ɐ n w ɐ]
Occurrences:
11,681
Examples:
* ㅇㅖㅂㅗ:
[j e b o]
* ㅇㅖㅎㅜ:
[j β u]
* ㅁㅕㄴㅇㅢ:
[m j ʌː n ɰ i]
* ㅁㅓㄱㅇㅕ:
[m ʌ ɡ j ʌ]
Occurrences:
1,108
Examples:
* ㅂㅏㅇㅟ:
[p ɐ ɥ i]
* ㅅㅏㅇㅟ:
[ ɐ ɥ i]
* ㄱㅟㅊㅜ:
[k ɥ i tɕʰ u]
* ㅇㅟㄱㅏ:
[ɥ i ɡ ɐ]
Occurrences:
1,682
Examples:
* ㅁㅕㄴㅇㅢ:
[m j ʌː n ɰ i]
* ㄱㅗㅇㅢ:
[k o ɰ i]
* ㅇㅢㅁㅜㄴ:
[ɰ i m u n]
* ㅎㅏㅇㅇㅢ:
[h ɐ ŋ ɰ i]

Tap

Occurrences:
11,736
Examples:
* ㅆㅡㄹㅜ:
[ ɨ ɾ u]
* ㅇㅏㅀㅇㅣ:
[ɐ ɾ i]
* ㅎㅗㄹㅗ:
[ɸ o ɾ o]
* ㅇㅜㄹㅣㅁ:
[u ɾ i m]

Lateral

Occurrences:
3,213
Examples:
* ㅂㅜㄹㄹㅕ:
[p u ʎ ʎ ʌ]
* ㅁㅓㄹㄹㅣ:
[m ʌː ʎ ʎ i]
* ㄱㅘㄴㄹㅣ:
[k w ɐ ʎ ʎ i]
* ㅅㅣㄹㄴㅐ:
[ɕʰ i ʎ ʎ ]

Lateral Tap

Occurrences:
10,883
Examples:
* ㄱㅣㄹㄹㅐ:
[k i ɭ ɭ ]
* ㅅㅏㅅㅡㄹ:
[ ɐ ɨ ɭ]
* ㅅㅣㅅㅜㄹ:
[ɕʰ i u ɭ]
* ㅇㅓㄹㄷㅏ:
[ʌː ɭ d ɐ]

Vowels#

Where vowel symbols appear in pairs, the one on the left is unrounded and the one on the right is rounded.

Columns (backness): Front, Near-Front, Central, Near-Back, Back

Close

Occurrences:
27,539
Examples:
* ㄴㅗㅇㅈㅣ:
[n o ŋ i]
* ㄸㅏㅇㅇㅣ:
[ ɐ ŋ i]
* ㅁㅗㅍㅣ:
[m o i]
* ㅁㅣㄴㅇㅛ:
[m i ɲ o]
Occurrences:
1,730
Examples:
* ㅇㅣㅎㅗ:
[ β o]
* ㅅㅣㅊㅏ:
[ɕʰ tɕʰ ɐ]
* ㅅㅟㅇㅜㄴ:
[ ɥ u n]
* ㅇㅣㅁㄴㅏ:
[ m n ɐ]
Occurrences:
16,928
Examples:
* ㅆㅡㄹㅜ:
[ ɨ ɾ u]
* ㅈㅓㄱㅡㅁ:
[ ʌː ɡ ɨ m]
* ㅇㅓㄴㅡ:
[ʌ n ɨ]
* ㅅㅏㅅㅡㄹ:
[ ɐ ɨ ɭ]
Occurrences:
233
Examples:
* ㄱㅡㄴㅔ:
[k ɨː n e]
* ㅇㅡㅇㅡㅁ:
[ɨː ɨ m]
* ㅇㅡㅇㅣㅇ:
[ɨː i ŋ]
* ㄱㅡㅇㅈㅣ:
[k ɨː ŋ i]
Occurrences:
13,991
Examples:
* ㅆㅡㄹㅜ:
[ ɨ ɾ u]
* ㅁㅗㄷㅜ:
[m o d u]
* ㅂㅣㅇㅅㅜ:
[p i ŋ u]
* ㅇㅖㅎㅜ:
[j β u]
Occurrences:
1,038
Examples:
* ㅎㅜㅅㅏㅁ:
[ɸ ɐ m]
* ㅈㅜㄴㅂㅣ:
[ n b i]
* ㅅㅜㅂㅏㄴ:
[ b ɐ n]
* ㅈㅜㅇㅣㄹ:
[ i ɭ]

Close-Mid

Occurrences:
9,056
Examples:
* ㅇㅖㅂㅗ:
[j e b o]
* ㄱㅡㄴㅔ:
[k ɨː n e]
* ㅅㅗㄱㅅㅔ:
[ o k s e]
* ㄲㅏㅂㅔㄹ:
[ ɐ b e ɭ]
Occurrences:
8,707
Examples:
* ㄴㅏㅁㅁㅐ:
[n ɐ m m ]
* ㄴㅐㄱㅓ:
[n ɡ ʌ]
* ㄴㅐㄱㅏㄱ:
[n ɡ ɐ ]
* ㅇㅖㅎㅜ:
[j β u]
Occurrences:
18,232
Examples:
* ㄴㅗㅇㅈㅣ:
[n o ŋ i]
* ㅁㅗㄷㅜ:
[m o d u]
* ㅁㅗㅍㅣ:
[m o i]
* ㅁㅣㄴㅇㅛ:
[m i ɲ o]
Occurrences:
1,742
Examples:
* ㄱㅛㅍㅕㄴ:
[k j j ʌ n]
* ㅇㅗㅎㅗ:
[ β o]
* ㄱㅗㅇㅈㅏ:
[k ŋ ɐ]
* ㅇㅗㅇㅎㅗ:
[ ŋ β o]

Open-Mid

Occurrences:
24,653
Examples:
* ㄴㅐㄱㅓ:
[n ɡ ʌ]
* ㅇㅓㄴㅡ:
[ʌ n ɨ]
* ㅈㅝㄷㅗ:
[ w ʌ d o]
* ㄸㅓㅁㅏㅌ:
[ ʌ m ɐ ]
Occurrences:
2,398
Examples:
* ㅈㅓㄱㅡㅁ:
[ ʌː ɡ ɨ m]
* ㅁㅕㄴㅇㅢ:
[m j ʌː n ɰ i]
* ㅅㅓㄴㄴㅐ:
[ ʌː n n ]
* ㅂㅕㄴㅁㅗ:
[p j ʌː n m o]
Occurrences:
42,211
Examples:
* ㄸㅏㅇㅇㅣ:
[ ɐ ŋ i]
* ㄴㅏㅁㅁㅐ:
[n ɐ m m ]
* ㅇㅏㅀㅇㅣ:
[ɐ ɾ i]
* ㅇㅘㅇㅈㅏ:
[w ɐ ŋ ɐ]

Open