Bulgarian MFA dictionary v3.0.0#
@techreport{mfa_bulgarian_mfa_dictionary_2024,
author={McAuliffe, Michael and Sonderegger, Morgan},
title={Bulgarian MFA dictionary v3.0.0},
address={\url{https://mfa-models.readthedocs.io/pronunciation dictionary/Bulgarian/Bulgarian MFA dictionary v3_0_0.html}},
year={2024},
month={Feb},
}
Installation#
Install from the MFA command line:
mfa model download dictionary bulgarian_mfa
Or download from the release page.
The dictionary available from the release page and command line installation has pronunciation and silence probabilities estimated as part of acoustic model training (see Silence probability format and training pronunciation probabilities for more information). If you would like to use the version of this dictionary without probabilities, please see the [plain dictionary](https://raw.githubusercontent.com/MontrealCorpusTools/mfa-models/main/dictionary/bulgarian/mfa/Bulgarian MFA dictionary v3_0_0.dict).
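To inspect either variant of the dictionary programmatically, a small line parser can help. The following is a minimal sketch, not part of MFA: `parse_entry` is a hypothetical helper, and it assumes each entry is tab-separated with the word first, the space-separated phones last, and any probability columns in between. That layout and the numeric values in the sample line are assumptions for illustration, so check them against the downloaded file and the MFA documentation before relying on them.

```python
def parse_entry(line):
    """Split one dictionary line into (word, probabilities, phones).

    Assumed layout (not confirmed by this page): tab-separated fields,
    word first, space-separated phones last, and zero or more numeric
    probability columns in between.
    """
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 2:
        raise ValueError(f"expected at least a word and a pronunciation: {line!r}")
    word = fields[0]
    probabilities = [float(value) for value in fields[1:-1]]
    phones = fields[-1].split()
    return word, probabilities, phones


# Hypothetical sample lines; the probability values are made up.
plain = "дома\td̪ o m a"
with_probs = "дома\t0.99\t0.2\t1.0\t1.0\td̪ o m a"

print(parse_entry(plain))
print(parse_entry(with_probs))
```

Because the phones are always read from the last field, the same function handles the plain dictionary (no probability columns) and the version with probabilities.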
Intended use#
This dictionary is intended for forced alignment of Bulgarian transcripts.
This dictionary uses the MFA phone set for Bulgarian, and was used in training the Bulgarian MFA acoustic model. Pronunciations can be added on top of the dictionary, as long as no additional phones are introduced.
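Before adding pronunciations, it can be worth checking that they stay within the existing phone set. The sketch below is one way to do that, under stated assumptions: `unknown_phones` is a hypothetical helper, the toy inventory is taken from two transcriptions shown on this page (дома and като), and both candidate pronunciations are invented for illustration. A real check would collect the inventory from the full dictionary file.

```python
def unknown_phones(known_phones, new_entries):
    """Return {word: phones not in known_phones} for entries that add new phones."""
    known = set(known_phones)
    problems = {}
    for word, phones in new_entries:
        extra = set(phones) - known
        if extra:
            problems.setdefault(word, set()).update(extra)
    return problems


# Toy inventory built from two transcriptions on this page (illustration only).
known = set("d̪ o m a k ɤ t̪ ɔ".split())

# Hypothetical additions: the first reuses known phones, the second introduces
# a phone that is not in the toy inventory.
candidates = [
    ("като", "k t̪ ɔ".split()),
    ("дома", "d̪ o m ə".split()),
]

print(unknown_phones(known, candidates))
```

An empty result means every candidate uses only known phones; any word that appears in the result should be re-transcribed before it is added.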
Performance Factors#
When trying to improve alignment accuracy, adding pronunciations is generally helpful, especially for different styles and dialects. The largest gains generally come from adding reduced variants that delete segments or syllables, as is common in spontaneous speech. An alignment must include every phone specified in a word's pronunciation, and each phone has a minimum duration (10 ms by default). If a speaker pronounces a multisyllabic word as a single syllable, MFA may be unable to fit all the segments into the available time, which leads to alignment errors on adjacent words as well.
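The minimum-duration constraint can be made concrete with a little arithmetic: a pronunciation with N phones needs at least N times the per-phone minimum. The sketch below uses the 10 ms default stated above; the function names, the reduced variant of улица, and the 45 ms interval are hypothetical values chosen only to illustrate the point.

```python
def min_word_duration_ms(phones, min_phone_ms=10):
    """Shortest interval that can hold every phone at its minimum duration."""
    return len(phones) * min_phone_ms


def fits(phones, available_ms, min_phone_ms=10):
    """True if the pronunciation can fit into the available interval."""
    return min_word_duration_ms(phones, min_phone_ms) <= available_ms


full = "u ʎ i t̪s̪ ɤ".split()    # улица, as transcribed on this page
reduced = "u ʎ t̪s̪ ɤ".split()   # hypothetical reduced variant (one vowel deleted)

available_ms = 45  # hypothetical duration of a heavily reduced token

print(min_word_duration_ms(full), fits(full, available_ms))
print(min_word_duration_ms(reduced), fits(reduced, available_ms))
```

In this toy case the full form needs 50 ms and cannot fit in 45 ms, while the reduced variant needs 40 ms and can. This is why reduced variants help with heavily reduced speech.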
Ethical considerations#
Deploying any Speech-to-Text model into any production setting has ethical implications. You should consider these implications before use.
Demographic Bias#
You should assume every machine learning model has demographic bias unless proven otherwise. For pronunciation dictionaries, transcription accuracy and lexicon coverage are often better for the prestige variety modeled in the dictionary than for other variants. If you are using this dictionary in production, you should acknowledge this as a potential issue.
IPA Charts#
Consonants#
Within each cell, entries for unvoiced obstruents come first (left) and entries for voiced obstruents follow (right).
Manner | Labial | Labiodental | Dental | Alveolar | Alveopalatal | Palatal | Velar |
---|---|---|---|---|---|---|---|
Nasal | Occurrences: 2,255 Examples: * дома: [d̪ o m a] * марта: [m a r t̪ ɤ] * мокро: [m ɔ k r o] * можа: [m o ʒ a] Occurrences: 517 Examples: * дами: [d̪ a mʲ i] * минус: [mʲ i n̪ u s̪] * амин: [a mʲ i n̪] * мича: [mʲ i tʃ ɤ] | Occurrences: 39 Examples: * инфо: [i ɱ f o] | Occurrences: 8,047 Examples: * нси: [n̪ ɤ s̪ ɤ i] * троян: [t̪ r o j a n̪] * оцени: [ɔ t̪s̪ ɛ n̪ i] * янев: [j a n̪ ɛ f] | | | Occurrences: 160 Examples: * някоя: [ɲ a k ɔ j ɤ] * сменя: [s̪ m ɛ ɲ ɤ] * синьо: [s̪ i ɲ ɔ] * сняг: [s̪ ɲ a k] | Occurrences: 190 Examples: * танко: [t̪ a ŋ k o] * сянка: [sʲ a ŋ k ɤ] * ланка: [ɫ a a ŋ k ɤ] * гинка: [ɟ i ŋ k ɤ] |
Stop | Occurrences: 4,098 Examples: * пеят: [p ɛ j ɤ t̪] * просо: [p r o s̪ ɔ] * поука: [p o u k ɤ] * общи: [o p ʃ t̪ i] Occurrences: 33 Examples: * търпя: [t̪ ɤ r pʲ ɤ] * спяха: [s̪ pʲ a x ɤ] * пяна: [pʲ a n̪ ɤ] * успя: [u s̪ pʲ ɤ] Occurrences: 1,735 Examples: * белия: [b ɛ ɫ i j ɤ] * биба: [b i b ɤ] * бивш: [b i f ʃ] * ръба: [r ɤ b ɤ] Occurrences: 74 Examples: * бягат: [bʲ a ɡ ɤ t̪] * бях: [bʲ a x] * бяга: [bʲ a ɡ ɤ] * бюро: [bʲ u r ɔ] | | Occurrences: 10,041 Examples: * кутев: [k u t̪ ɛ f] * там: [t̪ ɤ m] * тълпа: [t̪ ɤ ɫ p a] * бяхте: [bʲ a x t̪ ɛ] Occurrences: 2,743 Examples: * виден: [vʲ i d̪ ɛ n̪] * нидал: [n̪ i d̪ ɤ ɫ] * биде: [b i d̪ ɛ] * донт: [d̪ o n̪ t̪] | Occurrences: 0 Examples: Occurrences: 92 Examples: * братя: [b r a tʲ ɤ] * матю: [m a tʲ u] * щяхме: [ʃ tʲ a x m ɛ] * нощя: [n̪ o ʃ tʲ a] Occurrences: 0 Examples: Occurrences: 663 Examples: * видят: [vʲ i dʲ ɤ t̪] * видиш: [vʲ i dʲ i ʃ] * гуди: [ɡ u dʲ i] * лидия: [ʎ i dʲ i j ɤ] | | Occurrences: 523 Examples: * папки: [p ɤ p c i] * киото: [c i o t̪ o] * стоки: [s̪ t̪ ɔ c i] * руски: [r u s̪ c i] Occurrences: 123 Examples: * агент: [ɤ ɟ ɛ n̪ t̪] * гинат: [ɟ i n̪ ɤ t̪] * магия: [m ɤ ɟ i j ɤ] * гинка: [ɟ i ŋ k ɤ] | Occurrences: 3,519 Examples: * скъса: [s̪ k ɤ s̪ ɤ] * късно: [k ɤ s̪ n̪ o] * велик: [v ɛ ʎ i k] * мярка: [mʲ a r k ɤ] Occurrences: 1,515 Examples: * гюнер: [ɡ u n̪ ɛ r] * кого: [k o ɡ ɔ] * сега: [s̪ ɛ ɡ a] * гост: [ɡ o s̪ t̪] |
Affricate | | | Occurrences: 1,208 Examples: * перца: [p ɛ r t̪s̪ ɤ] * царев: [t̪s̪ a r ɛ f] * целят: [t̪s̪ ɛ ʎ ɤ t̪] * цвета: [t̪s̪ v ɛ t̪ ɤ] | Occurrences: 0 Examples: Occurrences: 9 Examples: * цял: [tsʲ a ɫ] * цяло: [tsʲ a ɫ o] * цяла: [tsʲ a ɫ ɤ] | Occurrences: 1,640 Examples: * чудо: [tʃ u d̪ o] * чудех: [tʃ u d̪ ɛ x] * личи: [ʎ i tʃ i] * чаша: [tʃ a ʃ ɤ] Occurrences: 64 Examples: * джан: [dʒ a n̪] * джон: [dʒ ɔ n̪] * джоба: [dʒ ɔ b ɤ] * индже: [i n̪ dʒ ɛ] | | |
Sibilant | | | Occurrences: 5,655 Examples: * царя: [t̪s̪ a rʲ ɤ] * същи: [s̪ ɤ ʃ t̪ i] * среща: [s̪ r ɛ ʃ t̪ a] * кацна: [k a t̪s̪ n̪ ɤ] Occurrences: 2,235 Examples: * звук: [z̪ v u k] * взети: [v z̪ ɛ t̪ i] * захар: [z̪ a x ɤ r] * зимен: [z̪ i m ɛ n̪] | Occurrences: 0 Examples: Occurrences: 51 Examples: * търся: [t̪ ɤ r sʲ ɤ] * сядат: [sʲ a d̪ ɤ t̪] * всяко: [f sʲ a k o] * внася: [v n̪ a sʲ ɤ] Occurrences: 0 Examples: Occurrences: 19 Examples: * мразя: [m r a zʲ ɤ] * пазят: [p a zʲ ɤ t̪] | Occurrences: 1,536 Examples: * неша: [n̪ ɛ ʃ ɤ] * вадиш: [v a dʲ i ʃ] * свеж: [s̪ v ɛ ʃ] * общия: [o p ʃ t̪ i j ɤ] Occurrences: 849 Examples: * важни: [v a ʒ n̪ i] * женда: [ʒ ɛ n̪ d̪ ɤ] * вежди: [v ɛ ʒ dʲ i] * лъже: [ɫ ɤ ʒ ɛ] | | |
Fricative | | Occurrences: 840 Examples: * цифра: [t̪s̪ i f r ɤ] * алфа: [ɤ ɫ f ɤ] * тачев: [t̪ a tʃ ɛ f] * авив: [a vʲ i f] Occurrences: 134 Examples: * фидел: [fʲ i d̪ ɛ ɫ] * афис: [a fʲ i s̪] * ефира: [ɛ fʲ i r ɤ] * офис: [ɔ fʲ i s̪] Occurrences: 3,779 Examples: * веда: [v ɛ d̪ ɤ] * нова: [n̪ ɔ v ɤ] * ливан: [ʎ i v a n̪] * дваж: [d̪ v a ʃ] Occurrences: 823 Examples: * вино: [vʲ i n̪ ɔ] * оживя: [o ʒ i vʲ a] * едвин: [ɛ d̪ vʲ i n̪] * сви: [s̪ vʲ i] | | | | Occurrences: 48 Examples: * архив: [ɤ r ç i f] * сухи: [s̪ u ç i] * хитър: [ç i t̪ ɤ r] * хитро: [ç i t̪ r ɔ] | |
Approximant | | | | | | Occurrences: 1,740 Examples: * живей: [ʒ i v ɛ j] * ястия: [j a s̪ t̪ i j ɤ] * ратай: [r a t̪ a j] * яхя: [j ɤ ç a] | |
Trill | | | | Occurrences: 6,031 Examples: * рота: [r o t̪ a] * зарад: [z̪ ɤ r a t̪] * вярна: [vʲ a r n̪ ɤ] * верни: [v ɛ r n̪ i] Occurrences: 1,212 Examples: * пирин: [p i rʲ i n̪] * царя: [t̪s̪ a rʲ ɤ] * одрин: [ɔ d̪ rʲ i n̪] * рязко: [rʲ a s̪ k o] | | | |
Lateral | | | | Occurrences: 3,854 Examples: * плещи: [p ɫ ɛ ʃ t̪ i] * зил: [z̪ i ɫ] * млади: [m ɫ a dʲ i] * блеър: [b ɫ ɛ ɤ r] | | Occurrences: 964 Examples: * леля: [ɫ ɛ ʎ ɤ] * велик: [v ɛ ʎ i k] * школи: [ʃ k ɔ ʎ i] * улица: [u ʎ i t̪s̪ ɤ] | |
Vowels#
Within each cell, entries for unrounded vowels come first (left) and entries for rounded vowels follow (right).
| | Front | Near-Front | Central | Near-Back | Back |
---|---|---|---|---|---|
Close | Occurrences: 11,453 Examples: * видим: [vʲ i dʲ i m] * жени: [ʒ ɛ n̪ i] * пиян: [p i j a n̪] * пита: [p i t̪ ɤ] | | | | Occurrences: 2,049 Examples: * шум: [ʃ u m] * скука: [s̪ k u k ɤ] * удар: [u d̪ ɤ r] * шутия: [ʃ u t̪ i j ɤ] |
Close-Mid | | | | | Occurrences: 11,459 Examples: * метна: [m ɛ t̪ n̪ ɤ] * платя: [p ɫ a tʲ ɤ] * гара: [ɡ a r ɤ] * наука: [n̪ a u k ɤ] Occurrences: 6,727 Examples: * дома: [d̪ o m a] * него: [n̪ ɛ ɡ o] * одеве: [o d̪ ɛ v ɛ] * идиот: [i dʲ i o t̪] |
Open-Mid | Occurrences: 10,737 Examples: * петре: [p ɛ t̪ r ɛ] * петна: [p ɛ t̪ n̪ ɤ] * цена: [t̪s̪ ɛ n̪ a] * боже: [b ɔ ʒ ɛ] | | | | Occurrences: 3,094 Examples: * като: [k ɤ t̪ ɔ] * сочи: [s̪ ɔ tʃ i] * вино: [vʲ i n̪ ɔ] * дошел: [d̪ ɔ ʃ ɛ ɫ] |
Open | Occurrences: 7,568 Examples: * ван: [v a n̪] * авел: [a v ɛ ɫ] * скала: [s̪ k a ɫ ɤ] * кат: [k a t̪] | | | | |