Ukrainian CV dictionary v2.0.0#
@misc{Ahn_Chodroff_2022,
author={Ahn, Emily and Chodroff, Eleanor},
title={VoxCommunis Corpus},
address={\url{https://osf.io/t957v}},
publisher={OSF},
year={2022},
month={Jan}
}
Installation#
Install from the MFA command line:
mfa model download dictionary ukrainian_cv
Or download from the release page.
Intended use#
This dictionary is intended for forced alignment of Ukrainian transcripts.
This dictionary uses the XPF phone set for Ukrainian, and was used in training the Ukrainian XPF acoustic model. Pronunciations can be added on top of the dictionary, as long as no additional phones are introduced.
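The constraint on added pronunciations can be checked before alignment. The following is a minimal sketch, assuming each dictionary line is a word, a tab, and space-separated phones; dictionaries with extra probability columns would need different parsing. The function names `read_inventory` and `unknown_phones` are hypothetical helpers and not part of MFA, and the three-entry dictionary is a toy subset built from entries shown on this page.

```python
def read_inventory(dictionary_lines):
    """Collect every phone used by existing dictionary entries.

    Assumes the simple layout "word<TAB>phone phone ..." (an assumption,
    not a guarantee about every MFA dictionary file).
    """
    phones = set()
    for line in dictionary_lines:
        line = line.strip()
        if not line:
            continue
        _word, pronunciation = line.split("\t", 1)
        phones.update(pronunciation.split())
    return phones


def unknown_phones(pronunciation, inventory):
    """Return the phones in a candidate pronunciation that are not in the inventory."""
    return [p for p in pronunciation.split() if p not in inventory]


# Toy subset of entries taken from the examples on this page.
existing = ["комар\tk ɔ m a r", "мені\tm ɛ nʲ i", "жито\tʒ ɪ t ɔ"]
inventory = read_inventory(existing)

accepted = unknown_phones("k ɔ m a r ɪ", inventory)  # only known phones
rejected = unknown_phones("k ɔ m ə r", inventory)    # "ə" is not in the inventory
print(accepted, rejected)
```

In practice the inventory would be read from the full downloaded dictionary rather than a toy subset, so that every phone the acoustic model knows is included.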
Performance Factors#
To improve alignment accuracy, adding pronunciations is generally helpful, especially for different styles and dialects. The largest improvements generally come from adding reduced variants that delete segments or syllables, which are common in spontaneous speech. Alignment must include every phone specified in a word's pronunciation, and each phone has a minimum duration (10 ms by default). If a speaker reduces a multisyllabic word to a single syllable, MFA may be unable to fit all of the segments into the available time, which leads to alignment errors on adjacent words as well.
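The minimum-duration constraint can be made concrete with a little arithmetic. The sketch below assumes only the default 10 ms minimum per phone stated above; `min_word_duration_ms` and `fits` are hypothetical helper names, and the reduced variant is an invented illustration, not an attested pronunciation or an entry in this dictionary.

```python
MIN_PHONE_MS = 10  # default minimum phone duration stated in the text


def min_word_duration_ms(pronunciation, min_phone_ms=MIN_PHONE_MS):
    """Shortest time a pronunciation can occupy: one minimum-length slot per phone."""
    return len(pronunciation.split()) * min_phone_ms


def fits(pronunciation, interval_ms, min_phone_ms=MIN_PHONE_MS):
    """True if every phone can receive its minimum duration within the interval."""
    return min_word_duration_ms(pronunciation, min_phone_ms) <= interval_ms


full = "m i s t sʲ a"  # dictionary pronunciation of "місця" (6 phones)
reduced = "m i sʲ a"   # hypothetical reduced variant (4 phones), for illustration only

spoken_ms = 50  # assumed duration of a heavily reduced token
print(min_word_duration_ms(full), fits(full, spoken_ms))
print(min_word_duration_ms(reduced), fits(reduced, spoken_ms))
```

Under these assumptions the full form needs at least 60 ms and cannot fit in a 50 ms token, so the aligner would have to borrow time from neighbouring words, while the shorter variant needs only 40 ms.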
Ethical considerations#
Deploying any Speech-to-Text model into any production setting has ethical implications. You should consider these implications before use.
Demographic Bias#
You should assume every machine learning model has demographic bias unless proven otherwise. For pronunciation dictionaries, transcription accuracy and lexicon coverage are often higher for the prestige variety modeled in the dictionary than for other varieties. If you are using this dictionary in production, you should acknowledge this as a potential issue.
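One way to gauge lexicon coverage for a particular variety is to measure how many transcript tokens the dictionary contains. This is a minimal sketch under stated assumptions: `lexicon_coverage` is a hypothetical helper, the dictionary is represented as a plain set of lowercase words, and the three-word set is a toy stand-in for the real word list.

```python
def lexicon_coverage(transcript_words, dictionary_words):
    """Return (share of tokens found in the dictionary, sorted list of missing words)."""
    tokens = [w.lower() for w in transcript_words]
    if not tokens:
        return 1.0, []
    covered = sum(1 for w in tokens if w in dictionary_words)
    missing = sorted({w for w in tokens if w not in dictionary_words})
    return covered / len(tokens), missing


# Toy stand-in for the dictionary's word list (words taken from this page).
dictionary_words = {"комар", "мені", "жито"}

coverage, missing = lexicon_coverage("мені жито треба".split(), dictionary_words)
print(f"coverage={coverage:.2f}, missing={missing}")
```

Comparing this figure across transcripts from different dialects or speaking styles gives a rough indication of where additional pronunciations are most needed.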
IPA Charts#
Consonants#
Within each pair of obstruents, the unvoiced phone is listed first and the voiced phone second.
Manner | Place | Phone | Occurrences | Examples
---|---|---|---|---
Nasal | Labial | m | 7,726 | місця [m i s t sʲ a], мiсця [m i s t sʲ a], гамка [ɦ a m k a], комар [k ɔ m a r]
Nasal | Alveolar | n | 12,410 | кияни [k ɪ j a n ɪ], ворон [v ɔ r ɔ n], вниз [v n ɪ z], віник [v i n ɪ k]
Nasal | Alveolar | nʲ | 3,275 | шумні [ʃ u m nʲ i], мені [m ɛ nʲ i], кузня [k u z nʲ a], їхніх [j i x nʲ i x]
Stop | Labial | p | 8,051 | спаду [s p a d u], плода [p l ɔ d a], порід [p ɔ rʲ i d], пощо [p ɔ ʃ tʃ ɔ]
Stop | Labial | b | 3,679 | біг [b i ɦ], білки [b i l k ɪ], бiльш [b i lʲ ʃ], робив [r ɔ b ɪ v]
Stop | Alveolar | t | 11,613 | атлас [a t l a s], місця [m i s t sʲ a], мiсця [m i s t sʲ a], отуди [ɔ t u d ɪ]
Stop | Alveolar | d | 6,665 | спаду [s p a d u], душ [d u ʃ], плода [p l ɔ d a], порід [p ɔ rʲ i d]
Stop | Velar | k | 9,041 | кияни [k ɪ j a n ɪ], гамка [ɦ a m k a], таки [t a k ɪ], комар [k ɔ m a r]
Stop | Velar | ɡ | 49 | манґи [m a n ɡ ɪ], ґазда [ɡ a z d a], фанґ [f a n ɡ], ґанок [ɡ a n ɔ k]
Affricate | Alveopalatal | tʃ | 3,847 | пощо [p ɔ ʃ tʃ ɔ], череп [tʃ ɛ r ɛ p], чужий [tʃ u ʒ ɪ j], чутно [tʃ u t n ɔ]
Affricate | Alveopalatal | dʒ | 228 | бджіл [b dʒ i l], пейдж [p ɛ j dʒ], джекі [dʒ ɛ k i], джолі [dʒ ɔ lʲ i]
Sibilant | Alveolar | s | 8,439 | атлас [a t l a s], слів [s lʲ i v], спаду [s p a d u], місця [m i s t sʲ a]
Sibilant | Alveolar | sʲ | 5,102 | місця [m i s t sʲ a], мiсця [m i s t sʲ a], заєць [z a j ɛ t sʲ], якусь [j a k u sʲ]
Sibilant | Alveolar | z | 5,087 | вниз [v n ɪ z], зри [z r ɪ], заєць [z a j ɛ t sʲ], кузня [k u z nʲ a]
Sibilant | Alveolar | zʲ | 389 | крізь [k rʲ i zʲ], возів [v ɔ zʲ i v], крiзь [k r i zʲ], дузю [d u zʲ u]
Sibilant | Alveopalatal | ʃ | 2,785 | душ [d u ʃ], пощо [p ɔ ʃ tʃ ɔ], шумні [ʃ u m nʲ i], бiльш [b i lʲ ʃ]
Sibilant | Alveopalatal | ʒ | 1,666 | жито [ʒ ɪ t ɔ], чужий [tʃ u ʒ ɪ j], вражі [v r a ʒ i], живий [ʒ ɪ v ɪ j]
Fricative | Labiodental | f | 660 | ліфти [lʲ i f t ɪ], ефес [ɛ f ɛ s], феб [f ɛ b], ефект [ɛ f ɛ k t]
Fricative | Labiodental | v | 13,901 | слів [s lʲ i v], ворон [v ɔ r ɔ n], вниз [v n ɪ z], всi [v s i]
Fricative | Glottal | ɦ | 4,058 | ага [a ɦ a], гамка [ɦ a m k a], біг [b i ɦ], гасла [ɦ a s l a]
Approximant | Palatal | j | 8,303 | кияни [k ɪ j a n ɪ], лий [l ɪ j], чужий [tʃ u ʒ ɪ j], дію [dʲ i j u]
Trill | Alveolar | r | 11,612 | серце [s ɛ r t s ɛ], ворон [v ɔ r ɔ n], рогу [r ɔ ɦ u], комар [k ɔ m a r]
Trill | Alveolar | rʲ | 1,368 | порід [p ɔ rʲ i d], луарі [l u a rʲ i], крізь [k rʲ i zʲ], тряси [t rʲ a s ɪ]
Lateral | Alveolar | l | 6,762 | атлас [a t l a s], плода [p l ɔ d a], гасла [ɦ a s l a], лий [l ɪ j]
Lateral | Alveolar | lʲ | 3,089 | слів [s lʲ i v], бiльш [b i lʲ ʃ], ліфти [lʲ i f t ɪ], оселі [ɔ s ɛ lʲ i]
Vowels#
Where an unrounded and a rounded vowel share a position, the unrounded vowel is listed first and the rounded vowel second.
Height | Backness | Phone | Occurrences | Examples
---|---|---|---|---
Close | Front | i | 14,137 | слів [s lʲ i v], місця [m i s t sʲ a], мiсця [m i s t sʲ a], порід [p ɔ rʲ i d]
Close | Back | u | 10,945 | спаду [s p a d u], душ [d u ʃ], отуди [ɔ t u d ɪ], пуп [p u p]
Close | Near-Front | ɪ | 17,683 | кияни [k ɪ j a n ɪ], отуди [ɔ t u d ɪ], жито [ʒ ɪ t ɔ], вниз [v n ɪ z]
Close-Mid | Front | e | 1 | maace [m a s e]
Close-Mid | Back | o | 3 | пaблo [p a b l o], iдiть [i d o tʲ], joki [j o k i]
Open-Mid | Front | ɛ | 12,254 | серце [s ɛ r t s ɛ], череп [tʃ ɛ r ɛ p], мені [m ɛ nʲ i], єстві [j ɛ s t v i]
Open-Mid | Back | ɔ | 21,542 | плода [p l ɔ d a], порід [p ɔ rʲ i d], отуди [ɔ t u d ɪ], пощо [p ɔ ʃ tʃ ɔ]
Open | | a | 25,802 | атлас [a t l a s], спаду [s p a d u], кияни [k ɪ j a n ɪ], плода [p l ɔ d a]