Bulgarian MFA dictionary v2.0.0a
@techreport{mfa_bulgarian_mfa_dictionary_2022,
author={McAuliffe, Michael and Sonderegger, Morgan},
title={Bulgarian MFA dictionary v2.0.0a},
address={\url{https://mfa-models.readthedocs.io/pronunciation dictionary/Bulgarian/Bulgarian MFA dictionary v2_0_0a.html}},
year={2022},
month={May},
}
Installation
Install from the MFA command line:
mfa model download dictionary bulgarian_mfa
Or download from the release page.
The dictionary available from the release page and through command-line installation has pronunciation and silence probabilities estimated as part of acoustic model training (see Silence probability format and training pronunciation probabilities for more information). If you would like to use the version of this dictionary without probabilities, please see the plain dictionary.
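The Python sketch below parses one dictionary line to show the two formats mentioned above. It rests on several assumptions. Fields are tab-separated. The column order in the probabilistic version is: word, pronunciation probability, probability of silence after the word, silence-before correction, non-silence-before correction, then space-separated phones. Check the Silence probability format documentation before relying on this layout. `Entry` and `parse_line` are hypothetical names, and the probability values in the usage lines are invented.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Entry:
    word: str
    phones: List[str]
    # The four probability fields stay None for lines from the plain dictionary.
    pron_prob: Optional[float] = None
    silence_after_prob: Optional[float] = None
    silence_before_correction: Optional[float] = None
    non_silence_before_correction: Optional[float] = None


def parse_line(line: str) -> Entry:
    """Parse one tab-separated dictionary line (assumed layouts):

    plain:       word<TAB>phones
    with probs:  word<TAB>p<TAB>sil_after<TAB>sil_before_corr<TAB>nonsil_before_corr<TAB>phones
    """
    fields = line.rstrip("\n").split("\t")
    if len(fields) == 2:
        word, phones = fields
        return Entry(word, phones.split())
    if len(fields) == 6:
        word, p, sil, sil_corr, nonsil_corr, phones = fields
        return Entry(word, phones.split(), float(p), float(sil),
                     float(sil_corr), float(nonsil_corr))
    raise ValueError(f"expected 2 or 6 tab-separated fields, got {len(fields)}: {line!r}")


# Usage; the probability values are invented for illustration.
print(parse_line("куме\tk u m ɛ").phones)                      # ['k', 'u', 'm', 'ɛ']
print(parse_line("куме\t0.99\t0.12\t1.0\t1.0\tk u m ɛ").pron_prob)  # 0.99
```

Branching on the field count lets the same helper read both the plain dictionary and the version with probabilities.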
Intended use
This dictionary is intended for forced alignment of Bulgarian transcripts.
This dictionary uses the MFA phone set for Bulgarian, and was used in training the Bulgarian MFA acoustic model. Pronunciations can be added on top of the dictionary, as long as no additional phones are introduced.
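One way to enforce the rule that added pronunciations must not introduce new phones is to collect the phone inventory from the existing entries and check each candidate against it. Below is a minimal sketch. The function names and the three-entry excerpt are hypothetical, and both candidate pronunciations are made up for illustration.

```python
from typing import Iterable, List, Set


def phone_inventory(pronunciations: Iterable[List[str]]) -> Set[str]:
    """Collect every phone used anywhere in the existing pronunciations."""
    inventory: Set[str] = set()
    for phones in pronunciations:
        inventory.update(phones)
    return inventory


def unknown_phones(candidate: List[str], inventory: Set[str]) -> List[str]:
    """Return the phones in a candidate pronunciation that the dictionary never uses."""
    return [p for p in candidate if p not in inventory]


# Hypothetical three-entry excerpt; in practice, load every entry of the dictionary.
existing = {
    "куме": ["k", "u", "m", "ɛ"],
    "коня": ["k", "o", "ɲ", "ɤ"],
    "мацка": ["m", "a", "t̪s̪", "k", "ɤ"],
}
inventory = phone_inventory(existing.values())

print(unknown_phones(["k", "o", "m", "ɛ"], inventory))   # [] -> safe to add
print(unknown_phones(["kʰ", "u", "m", "ɛ"], inventory))  # ['kʰ'] -> new phone, reject
```

Building the inventory from the dictionary itself avoids hard-coding a phone list that might drift out of sync with the released model.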
Performance Factors
When trying to get better alignment accuracy, adding pronunciations is generally helpful, especially for different styles and dialects. The largest improvements usually come from adding reduced variants in which segments or syllables are deleted, as is common in spontaneous speech. Alignment must include every phone specified in a word's pronunciation, and each phone has a minimum duration (10 ms by default). If a speaker pronounces a multisyllabic word as a single syllable, MFA may be unable to fit all the segments into the available time, and the resulting misalignment can spill over into adjacent words.
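The duration constraint is simple arithmetic: a pronunciation with n phones needs at least n times 10 ms at the default minimum. The sketch below applies this to чокоя from the consonant chart, treating tʃ as a single phone. The helper names are hypothetical, and the 50 ms interval is an invented example.

```python
from typing import List

MIN_PHONE_MS = 10  # default minimum phone duration stated above


def min_word_duration_ms(phones: List[str], min_phone_ms: int = MIN_PHONE_MS) -> int:
    """Shortest interval the aligner can assign to a word with this pronunciation."""
    return len(phones) * min_phone_ms


def fits(phones: List[str], available_ms: int, min_phone_ms: int = MIN_PHONE_MS) -> bool:
    """True if every phone can receive at least its minimum duration."""
    return min_word_duration_ms(phones, min_phone_ms) <= available_ms


chokoya = ["tʃ", "ɔ", "k", "o", "j", "a"]  # чокоя, six phones
print(min_word_duration_ms(chokoya))  # 60
print(fits(chokoya, 50))              # False: six phones need at least 60 ms
```

A reduced variant with fewer phones lowers this floor, which is why such variants help with fast, spontaneous speech.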
Ethical considerations
Deploying any Speech-to-Text model into any production setting has ethical implications. You should consider these implications before use.
Demographic Bias
You should assume every machine learning model has demographic bias unless proven otherwise. For pronunciation dictionaries, transcription accuracy and lexicon coverage are often higher for the prestige variety modeled in this dictionary than for other varieties. If you are using this dictionary in production, you should acknowledge this as a potential issue.
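To check whether coverage is a problem for your own data, you can measure the share of transcript tokens that have an entry in the dictionary. Doing this separately for each dialect or speaker group shows whether some groups are covered less well. The sketch below assumes naive whitespace tokenization and a lowercase lexicon. The function name, toy lexicon, and transcript are hypothetical.

```python
from typing import Iterable, List, Set, Tuple


def lexicon_coverage(tokens: Iterable[str], lexicon: Set[str]) -> Tuple[float, List[str]]:
    """Return (fraction of tokens found in the lexicon, out-of-vocabulary tokens).

    Tokens are lowercased first, assuming the lexicon stores lowercase words.
    """
    lowered = [t.lower() for t in tokens]
    if not lowered:
        return 1.0, []
    oov = [t for t in lowered if t not in lexicon]
    return 1 - len(oov) / len(lowered), oov


# Toy lexicon and transcript, for illustration only.
lexicon = {"куме", "коня", "лято"}
coverage, oov = lexicon_coverage("Куме коня лято ходихме".split(), lexicon)
print(f"{coverage:.0%} covered; OOV: {oov}")  # 75% covered; OOV: ['ходихме']
```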
IPA Charts
Consonants
In cells that contain more than one obstruent, the symbols listed first (on the left) are voiceless and those listed after them (on the right) are voiced.
Manner | Labial | Labiodental | Dental | Alveolar | Alveopalatal | Palatal | Velar |
---|---|---|---|---|---|---|---|
Nasal |
m: Occurrences: 9,366. Examples: милчо [mʲ i ɫ tʃ ɔ], куме [k u m ɛ], омъжи [o m ɤ ʒ i], махни [m a x n̪ i]
mʲ: Occurrences: 1,911. Examples: умиха [o mʲ i x ɤ], смя [s̪ mʲ a], седми [s̪ ɛ d̪ mʲ i], мисли [mʲ i s̪ ʎ i] |
ɱ: Occurrences: 150. Examples: съмва [s̪ a ɱ v ɤ], амвон [a ɱ v ɔ n̪], инмат [i ɱ m ɤ t̪], инфо [i ɱ f o] |
n̪: Occurrences: 34,041. Examples: умни [u m n̪ i], тайна [t̪ a j n̪ ɤ], пияно [p i j a n̪ ɔ], тони [t̪ ɔ n̪ i] |
ɲ: Occurrences: 1,245. Examples: винят [vʲ i ɲ ɤ t̪], ранях [r a ɲ a x], коня [k o ɲ ɤ], звъня [z̪ v ɲ ɤ] |
ŋ: Occurrences: 450. Examples: анко [a ŋ k o], ринга [rʲ i ŋ ɡ a], ганке [ɡ a ŋ k ɛ], пенко [p ɛ ŋ k o] |
Stop |
p: Occurrences: 22,849. Examples: рап [r a p], успея [u s̪ p ɛ j ɤ], спира [s̪ p i r ɤ], порив [p o rʲ i f]
b: Occurrences: 8,692. Examples: нсбоп [n̪ z̪ b ɔ p], обем [o b ɛ m], бесът [b ɛ s̪ ɤ t̪], банк [b a ŋ k] |
t̪: Occurrences: 47,228. Examples: ахмед [x m ɛ t̪], лодка [ɫ ɔ t̪ k ɤ], тръне [t̪ r ɤ n̪ ɛ], порт [p o r t̪]
d̪: Occurrences: 11,799. Examples: дарик [d̪ a rʲ i k], дума [d̪ u m ɤ], редът [r ɛ d̪ ɤ t̪], отбил [o d̪ b i ɫ] |
c: Occurrences: 1,826. Examples: малки [m a ɫ c i], екипа [ɛ c i p a], майки [m a j c i], кино [c i n̪ o]
ɟ: Occurrences: 355. Examples: гинка [ɟ i ŋ k ɤ], гилзи [ɟ i ɫ z̪ i], багня [b a ɟ ɲ a], гьол [ɟ ɔ ɫ] |
k: Occurrences: 12,503. Examples: къса [k ɤ s̪ ɤ], чокоя [tʃ ɔ k o j a], днк [d̪ ɤ n̪ ɤ k ɤ], касае [k a s̪ a ɛ]
ɡ: Occurrences: 6,096. Examples: агасъ [a ɡ a s̪ ɤ], графе [ɡ r a f ɛ], ганчо [ɡ a n̪ tʃ ɔ], бряга [b rʲ a ɡ ɤ] |
Affricate |
t̪s̪: Occurrences: 3,717. Examples: бицам [b i t̪s̪ ɤ m], челце [tʃ ɛ ɫ t̪s̪ ɛ], майнц [m a j n̪ t̪s̪], мацка [m a t̪s̪ k ɤ] |
tʃ: Occurrences: 9,681. Examples: чудно [tʃ u d̪ n̪ o], уличи [u ʎ i tʃ i], мълчи [m a ɫ tʃ i], учещ [u tʃ ɛ ʃ t̪]
dʒ: Occurrences: 220. Examples: ходжа [x ɔ dʒ ɤ], джаз [dʒ a s̪], алчба [a ɫ dʒ b ɤ], индже [i n̪ dʒ ɛ] |
Sibilant |
s̪: Occurrences: 21,785. Examples: дейци [d̪ ɛ j t̪s̪ i], заспа [z̪ ɤ s̪ p a], лице [ʎ i t̪s̪ ɛ], излез [i z̪ ɫ ɛ s̪]
z̪: Occurrences: 13,488. Examples: знаеш [z̪ n̪ a ɛ ʃ], оазис [o ɤ z̪ i s̪], лози [ɫ ɔ z̪ i], заеме [z̪ a ɛ m ɛ] |
sʲ: Occurrences: 1,311. Examples: сяра [sʲ a r ɤ], бесят [b ɛ sʲ ɤ t̪], обеся [o b ɛ sʲ ɤ], сякаш [sʲ a k ɤ ʃ]
zʲ: Occurrences: 98. Examples: пазят [p a zʲ ɤ t̪], зярна [zʲ a r n̪ ɤ], пълзя [p ɤ ɫ zʲ ɤ], мразя [m r a zʲ ɤ] |
ʃ: Occurrences: 10,897. Examples: ушили [o ʃ i ɫ i], зрящ [z̪ rʲ a ʃ t̪], шибам [ʃ i b ɤ m], вечни [v ɛ tʃ n̪ i]
ʒ: Occurrences: 4,036. Examples: жанра [ʒ ɤ n̪ r ɤ], ръжен [r ɤ ʒ ɛ n̪], чуждо [tʃ u ʒ d̪ o], бдж [b ɤ d̪ ɤ ʒ ɤ] |
Fricative |
f: Occurrences: 2,577. Examples: пегав [p ɛ ɡ ɤ f], живко [ʒ i f k o], втора [f t̪ ɔ r ɤ], вплел [f p ɫ ɛ ɫ]
fʲ: Occurrences: 326. Examples: филип [fʲ i ʎ i p], шефик [ʃ ɛ fʲ i k], фирми [fʲ i r mʲ i], кафяв [k a fʲ a f]
v: Occurrences: 21,871. Examples: уловя [u ɫ o vʲ ɤ], врящ [v rʲ a ʃ t̪], бавим [b a vʲ i m], влача [v ɫ a tʃ ɤ]
vʲ: Occurrences: 3,182. Examples: вижда [vʲ i ʒ d̪ a], уловя [u ɫ o vʲ ɤ], дивят [dʲ i vʲ ɤ t̪], вия [vʲ i j ɤ] |
ç: Occurrences: 152. Examples: мухи [m u ç i], хилда [ç i ɫ d̪ ɤ], хитър [ç i t̪ ɤ r], тихи [t̪ i ç i] |
Approximant |
j: Occurrences: 10,155. Examples: шайка [ʃ a j k ɤ], юлен [j u ɫ ɛ n̪], язди [j a z̪ dʲ i], хай [x a j] |
Trill |
r: Occurrences: 24,971. Examples: рая [r a j ɤ], утро [u t̪ r o], реже [r ɛ ʒ ɛ], трио [t̪ rʲ i ɔ]
rʲ: Occurrences: 5,688. Examples: мери [m ɛ rʲ i], риа [rʲ i a], спри [s̪ p rʲ i], акрил [ɤ k rʲ i ɫ] |
Lateral |
ɫ: Occurrences: 23,640. Examples: лъгах [ɫ a ɡ a x], палец [p a ɫ ɛ t̪s̪], чуела [tʃ u ɛ ɫ ɤ], щолпе [ʃ t̪ ɔ ɫ p ɛ] |
ʎ: Occurrences: 3,280. Examples: лято [ʎ a t̪ o], вопли [v o p ʎ i], стеля [s̪ t̪ ɛ ʎ ɤ], млян [m ʎ ɤ n̪] |
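The chart does not say how Occurrences is computed. One plausible reading, which is an assumption, is the number of times each phone appears across the dictionary's pronunciations. The sketch below counts phones that way. Pronunciations are split on whitespace, so multi-character phones such as mʲ, tʃ and t̪s̪ stay single tokens. The sample entries are copied from the chart, and `phone_counts` is a hypothetical name.

```python
from collections import Counter
from typing import Dict, List


def phone_counts(dictionary: Dict[str, List[List[str]]]) -> Counter:
    """Count how often each phone appears across all pronunciations.

    A word with several pronunciation variants contributes every variant.
    """
    counts: Counter = Counter()
    for variants in dictionary.values():
        for phones in variants:
            counts.update(phones)
    return counts


# Entries copied from the chart; splitting on whitespace keeps "mʲ", "tʃ", "t̪s̪" intact.
sample = {
    "милчо": ["mʲ i ɫ tʃ ɔ".split()],
    "мацка": ["m a t̪s̪ k ɤ".split()],
    "куме": ["k u m ɛ".split()],
}
counts = phone_counts(sample)
print(counts["m"], counts["mʲ"], counts["tʃ"])  # 2 1 1
```

Splitting character by character instead would wrongly count ʲ, ʃ and the dental diacritic as separate phones.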
Vowels
In cells that contain two vowels, the symbol listed first (on the left) is unrounded and the one listed second (on the right) is rounded.
| Front | Near-Front | Central | Near-Back | Back |
---|---|---|---|---|---|
Close |
i: Occurrences: 50,015. Examples: взима [v z̪ i m ɤ], първи [p ɤ r vʲ i], изгря [i z̪ ɡ rʲ a], чешки [tʃ ɛ ʃ c i] |
u: Occurrences: 7,765. Examples: хулим [x u ʎ i m], чупят [tʃ u pʲ ɤ t̪], чужд [tʃ u ʃ t̪], чушле [tʃ u ʃ ɫ ɛ] |
Close-Mid |
ɤ: Occurrences: 62,396. Examples: убия [u b i j ɤ], имана [i m ɤ n̪ ɤ], лиска [ʎ i s̪ k ɤ], ремък [r ɛ m ɤ k]
o: Occurrences: 26,244. Examples: петно [p ɛ t̪ n̪ o], бяло [bʲ a ɫ o], почна [p o tʃ n̪ a], крало [k r a ɫ o] |
Open-Mid |
ɛ: Occurrences: 51,297. Examples: бера [b ɛ r ɤ], дошле [d̪ ɔ ʃ ɫ ɛ], бодем [b o d̪ ɛ m], поело [p o ɛ ɫ ɔ] |
ɔ: Occurrences: 19,581. Examples: водещ [v ɔ d̪ ɛ ʃ t̪], брош [b r ɔ ʃ], кьолн [k ɔ ɫ n̪], рома [r ɔ m ɤ] |
Open |
a: Occurrences: 37,485. Examples: дясна [dʲ a s̪ n̪ ɤ], ласка [ɫ a s̪ k ɤ], блат [b ɫ a t̪], ираде [i r a d̪ ɛ] |