Ukrainian CV dictionary v2.0.0#

  • Maintainer: Vox Communis

  • Language: Ukrainian

  • Dialect: N/A

  • Phone set: XPF

  • Number of words: 31,032

  • Phones: a b d e f i j k l m n o p r s t u v x z ɔ ɛ ɡ ɦ ɪ ʃ ʒ

  • License: CC-0

  • Compatible MFA version: v2.0.0

  • Citation:

@misc{Ahn_Chodroff_2022,
	author={Ahn, Emily and Chodroff, Eleanor},
	title={VoxCommunis Corpus},
	address={\url{https://osf.io/t957v}},
	publisher={OSF},
	year={2022},
	month={Jan}
}

Installation#

Install from the MFA command line:

mfa model download dictionary ukrainian_cv

Or download from the release page.

Intended use#

This dictionary is intended for forced alignment of Ukrainian transcripts.

This dictionary uses the XPF phone set for Ukrainian, and was used in training the Ukrainian XPF acoustic model. Pronunciations can be added on top of the dictionary, as long as no additional phones are introduced.
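The constraint above (no additional phones) can be checked mechanically before new entries are merged. The following is a minimal sketch, not part of MFA: it assumes the allowed phones are exactly those shown in the "Phones" field of this page, and that new entries are written as `word<TAB>space-separated phones`. The helper names and the sample entries are hypothetical.

```python
# Phones copied from the "Phones" field of this page (assumed to be the full set).
ALLOWED_PHONES = set(
    "a b d e f i j k l m n o p r s t u v x z ɔ ɛ ɡ ɦ ɪ ʃ ʒ".split()
)


def unknown_phones(pronunciation):
    """Return the phones in a space-separated pronunciation that are not allowed."""
    return [p for p in pronunciation.split() if p not in ALLOWED_PHONES]


def validate_lines(lines):
    """Return (word, bad_phones) for every 'word<TAB>phones' line with unknown phones."""
    problems = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        word, pronunciation = line.split("\t", 1)
        bad = unknown_phones(pronunciation)
        if bad:
            problems.append((word, bad))
    return problems


# Hypothetical additions: the second is a reduced variant, the third uses a new phone.
new_entries = [
    "комар\tk ɔ m a r",
    "комар\tk m a r",
    "комар\tk ɔ m a ʁ",
]
print(validate_lines(new_entries))
```

Only the last entry is reported, because `ʁ` is not in the listed phone set; the reduced variant passes since it reuses existing phones.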

Performance Factors#

To improve alignment accuracy, adding pronunciations is generally helpful, especially for different styles and dialects. The largest gains usually come from adding reduced variants, which delete segments or syllables as is common in spontaneous speech. Alignment must include every phone specified in a word's pronunciation, and each phone has a minimum duration (10 ms by default). If a speaker reduces a multisyllabic word to a single syllable, MFA can struggle to fit all of the segments into the available time, which leads to alignment errors on adjacent words as well.
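The arithmetic behind the minimum-duration constraint can be sketched as follows. This is an illustration only: it takes the 10 ms default stated above at face value, the function names are hypothetical, and the reduced pronunciation is an invented example rather than an entry from the dictionary.

```python
MIN_PHONE_DURATION_MS = 10  # default minimum per phone, as stated above


def min_word_duration_ms(pronunciation, min_phone_ms=MIN_PHONE_DURATION_MS):
    """Shortest interval that can hold every phone of a space-separated pronunciation."""
    return len(pronunciation.split()) * min_phone_ms


def fits(pronunciation, available_ms, min_phone_ms=MIN_PHONE_DURATION_MS):
    """True if all phones can be placed within the available time."""
    return min_word_duration_ms(pronunciation, min_phone_ms) <= available_ms


full = "k ɔ m a r"   # five phones: needs at least 50 ms
reduced = "k m a r"  # hypothetical reduced variant: needs at least 40 ms

print(fits(full, 40), fits(reduced, 40))
```

When the speaker's actual production is shorter than the full form's minimum, only the reduced variant fits; without it, the aligner has to take time from neighbouring words.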

Ethical considerations#

Deploying any Speech-to-Text model into any production setting has ethical implications. You should consider these implications before use.

Demographic Bias#

You should assume every machine learning model has demographic bias unless proven otherwise. For pronunciation dictionaries, it is often the case that transcription accuracy and lexicon coverage are better for the prestige variety modeled in this dictionary than for other variants. If you are using this dictionary in production, you should acknowledge this as a potential issue.

IPA Charts#

Consonants#

Where obstruent symbols appear in pairs, the one on the left is unvoiced and the one on the right is voiced.

Manner (rows) by place of articulation (columns): Labial, Labiodental, Alveolar, Alveopalatal, Palatal, Velar, Glottal

Nasal

  • Occurrences: 7,726. Examples: місця [m i s t a], мiсця [m i s t a], гамка [ɦ a m k a], комар [k ɔ m a r]
  • Occurrences: 12,410. Examples: кияни [k ɪ j a n ɪ], ворон [v ɔ r ɔ n], вниз [v n ɪ z], віник [v i n ɪ k]
  • Occurrences: 3,275. Examples: шумні [ʃ u m i], мені [m ɛ i], кузня [k u z a], їхніх [j i x i x]

Stop

  • Occurrences: 8,051. Examples: спаду [s p a d u], плода [p l ɔ d a], порід [p ɔ i d], пощо [p ɔ ʃ ɔ]
  • Occurrences: 3,679. Examples: біг [b i ɦ], білки [b i l k ɪ], бiльш [b i ʃ], робив [r ɔ b ɪ v]
  • Occurrences: 11,613. Examples: атлас [a t l a s], місця [m i s t a], мiсця [m i s t a], отуди [ɔ t u d ɪ]
  • Occurrences: 6,665. Examples: спаду [s p a d u], душ [d u ʃ], плода [p l ɔ d a], порід [p ɔ i d]
  • Occurrences: 9,041. Examples: кияни [k ɪ j a n ɪ], гамка [ɦ a m k a], таки [t a k ɪ], комар [k ɔ m a r]
  • Occurrences: 49. Examples: манґи [m a n ɡ ɪ], ґазда [ɡ a z d a], фанґ [f a n ɡ], ґанок [ɡ a n ɔ k]

Affricate

  • Occurrences: 3,847. Examples: пощо [p ɔ ʃ ɔ], череп [ ɛ r ɛ p], чужий [ u ʒ ɪ j], чутно [ u t n ɔ]
  • Occurrences: 228. Examples: бджіл [b i l], пейдж [p ɛ j ], джекі [ ɛ k i], джолі [ ɔ i]

Sibilant

  • Occurrences: 8,439. Examples: атлас [a t l a s], слів [s i v], спаду [s p a d u], місця [m i s t a]
  • Occurrences: 5,102. Examples: місця [m i s t a], мiсця [m i s t a], заєць [z a j ɛ t ], якусь [j a k u ]
  • Occurrences: 5,087. Examples: вниз [v n ɪ z], зри [z r ɪ], заєць [z a j ɛ t ], кузня [k u z a]
  • Occurrences: 389. Examples: крізь [k i ], возів [v ɔ i v], крiзь [k r i ], дузю [d u u]
  • Occurrences: 2,785. Examples: душ [d u ʃ], пощо [p ɔ ʃ ɔ], шумні [ʃ u m i], бiльш [b i ʃ]
  • Occurrences: 1,666. Examples: жито [ʒ ɪ t ɔ], чужий [ u ʒ ɪ j], вражі [v r a ʒ i], живий [ʒ ɪ v ɪ j]

Fricative

  • Occurrences: 660. Examples: ліфти [ i f t ɪ], ефес [ɛ f ɛ s], феб [f ɛ b], ефект [ɛ f ɛ k t]
  • Occurrences: 13,901. Examples: слів [s i v], ворон [v ɔ r ɔ n], вниз [v n ɪ z], всi [v s i]
  • Occurrences: 4,058. Examples: ага [a ɦ a], гамка [ɦ a m k a], біг [b i ɦ], гасла [ɦ a s l a]

Approximant

  • Occurrences: 8,303. Examples: кияни [k ɪ j a n ɪ], лий [l ɪ j], чужий [ u ʒ ɪ j], дію [ i j u]

Trill

  • Occurrences: 11,612. Examples: серце [s ɛ r t s ɛ], ворон [v ɔ r ɔ n], рогу [r ɔ ɦ u], комар [k ɔ m a r]
  • Occurrences: 1,368. Examples: порід [p ɔ i d], луарі [l u a i], крізь [k i ], тряси [t a s ɪ]

Lateral

  • Occurrences: 6,762. Examples: атлас [a t l a s], плода [p l ɔ d a], гасла [ɦ a s l a], лий [l ɪ j]
  • Occurrences: 3,089. Examples: слів [s i v], бiльш [b i ʃ], ліфти [ i f t ɪ], оселі [ɔ s ɛ i]

Vowels#

Where vowel symbols appear in pairs, the one on the left is unrounded and the one on the right is rounded.

Height (rows) by backness (columns): Front, Near-Front, Central, Near-Back, Back

Close

  • Occurrences: 14,137. Examples: слів [s i v], місця [m i s t a], мiсця [m i s t a], порід [p ɔ i d]
  • Occurrences: 10,945. Examples: спаду [s p a d u], душ [d u ʃ], отуди [ɔ t u d ɪ], пуп [p u p]
  • Occurrences: 17,683. Examples: кияни [k ɪ j a n ɪ], отуди [ɔ t u d ɪ], жито [ʒ ɪ t ɔ], вниз [v n ɪ z]

Close-Mid

  • Occurrences: 1. Examples: maace [m a s e]
  • Occurrences: 3. Examples: пaблo [p a b l o], iдiть [i d o ], joki [j o k i]

Open-Mid

  • Occurrences: 12,254. Examples: серце [s ɛ r t s ɛ], череп [ ɛ r ɛ p], мені [m ɛ i], єстві [j ɛ s t v i]
  • Occurrences: 21,542. Examples: плода [p l ɔ d a], порід [p ɔ i d], отуди [ɔ t u d ɪ], пощо [p ɔ ʃ ɔ]

Open

  • Occurrences: 25,802. Examples: атлас [a t l a s], спаду [s p a d u], кияни [k ɪ j a n ɪ], плода [p l ɔ d a]