Arabic MFA dictionary v2.0.0#
@techreport{mfa_arabic_mfa_dictionary_2022,
author={Shmueli, Natalia and McAuliffe, Michael and Sonderegger, Morgan},
title={Arabic MFA dictionary v2.0.0},
address={\url{https://mfa-models.readthedocs.io/pronunciation dictionary/Arabic/Arabic MFA dictionary v2_0_0.html}},
year={2022},
month={Mar},
}
Installation#
Install from the MFA command line:
mfa model download dictionary arabic_mfa
Or download from the release page.
The dictionary available from the release page and command line installation has pronunciation and silence probabilities estimated as part of acoustic model training (see Silence probability format and training pronunciation probabilities for more information). If you would like to use the version of this dictionary without probabilities, please see the plain dictionary.
Intended use#
This dictionary is intended for forced alignment of Arabic transcripts.
This dictionary uses the MFA phone set for Arabic, and was used in training the Arabic MFA acoustic model. Pronunciations can be added on top of the dictionary, as long as no additional phones are introduced.
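The constraint that added pronunciations must not introduce new phones can be checked before running alignment. The following is a minimal sketch, not part of MFA itself: the helper names `phones_in` and `unknown_phones` are hypothetical, and the small phone inventory is built only from a handful of pronunciations listed in the IPA charts on this page. A real check would presumably collect the inventory from every entry in the dictionary.

```python
def phones_in(pronunciation: str) -> list[str]:
    """Split a space-delimited pronunciation such as 'm æ kː æ' into phones."""
    return pronunciation.split()


def unknown_phones(pronunciation: str, allowed: set[str]) -> list[str]:
    """Return the phones in `pronunciation` that are not in `allowed`, in order."""
    return [p for p in phones_in(pronunciation) if p not in allowed]


# Illustrative inventory taken from a few pronunciations shown on this page.
# Assumption: in practice this set is built from the full dictionary.
known_pronunciations = [
    "m æ kː æ",      # مكة
    "q ɑ l b",       # قلب
    "ʕ æ l iː l",    # عليل
    "d uː n æ",      # دون
    "k æ w k æ b",   # كوكب
]
allowed = {p for pron in known_pronunciations for p in phones_in(pron)}

# A candidate that reuses existing phones passes; a new symbol is flagged.
print(unknown_phones("m æ l iː k", allowed))  # []
print(unknown_phones("m æ l iː x", allowed))  # ['x']
```

Note that long consonants such as `kː` are separate phones from their short counterparts, so the check compares whole space-delimited tokens rather than individual characters.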
Performance Factors#
When trying to improve alignment accuracy, adding pronunciations is generally helpful, especially for different styles and dialects. The most impactful improvements generally come from adding reduced variants, in which segments or syllables are deleted, as is common in spontaneous speech. Alignment must include all phones specified in the pronunciation of a word, and each phone has a minimum duration (by default 10 ms). If a speaker pronounces a multisyllabic word as a single syllable, MFA may be unable to fit all the segments into the time the word actually takes, which causes alignment errors on adjacent words as well.
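To make the duration constraint concrete: because every phone in a pronunciation must be aligned and each occupies at least the minimum duration, a pronunciation with N phones cannot fit into an interval shorter than N times that minimum. The sketch below works through this arithmetic using the 10 ms default stated above. The function names `min_word_duration_ms` and `fits` are hypothetical illustrations, not MFA functions, and the pronunciations are taken from the charts on this page.

```python
MIN_PHONE_DURATION_MS = 10  # default minimum duration per phone, as stated above


def min_word_duration_ms(pronunciation: str,
                         min_phone_ms: int = MIN_PHONE_DURATION_MS) -> int:
    """Shortest interval (in ms) that can hold every phone of the pronunciation."""
    return len(pronunciation.split()) * min_phone_ms


def fits(pronunciation: str, interval_ms: int) -> bool:
    """True if all phones can be placed inside an interval of `interval_ms`."""
    return min_word_duration_ms(pronunciation) <= interval_ms


# Nine phones need at least 90 ms.
print(min_word_duration_ms("m ʊ t æ n æ wː ɪ ʕ"))  # 90

# In a 50 ms stretch of speech, a four-phone pronunciation fits
# but a six-phone pronunciation does not.
print(fits("ʕ ɑ r f", 50))         # True
print(fits("ʕ ɑ rː ɑ f æ", 50))    # False
```

This is why a shorter, reduced variant in the dictionary gives the aligner a pronunciation it can place without pushing boundaries into neighboring words.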
Ethical considerations#
Deploying any Speech-to-Text model into any production setting has ethical implications. You should consider these implications before use.
Demographic Bias#
You should assume every machine learning model has demographic bias unless proven otherwise. For pronunciation dictionaries, it is often the case that transcription accuracy and lexicon coverage are better for the prestige variety modeled in the dictionary than for other varieties. If you are using this dictionary in production, you should acknowledge this as a potential issue.
IPA Charts#
Consonants#
Within each cell, obstruent symbols on the left are unvoiced and those on the right are voiced.
Manner |
Labial |
Labiodental |
Dental |
Alveolar |
Alveopalatal |
Palatal |
Velar |
Uvular |
Pharyngeal |
Glottal |
---|---|---|---|---|---|---|---|---|---|---|
Nasal |
Occurrences: 3,663 Examples: * قامَ: [q oː m] * دمع: [d æ m æ ʕ æ] * مزر: [m ɪ z r] * متنوع: [m ʊ t æ n æ wː ɪ ʕ] Occurrences: 166 Examples: * تذمر: [t æ ð æ mː ɑ r ɑ] * طمرة: [tˤ ʊ mː ɑ r ɑ] * أعمى: [ʔ ʊ ʕ æ mː aː] * جمعي: [dʒ æ mː ɪ ʕ iː] |
Occurrences: 3,417 Examples: * زنخ: [z æ n ɪ χ æ] * انقلب: [ɪ n q ɑ l æ b æ] * دند: [d æ n d] * نورج: [n æ w r ɑ dʒ] Occurrences: 151 Examples: * مسن: [m ɪ s æ nː] * صنارة: [sˤ ɪ nː ɑː r ɑ] * بَنَّ: [b ɪ nː æ] * تمنى: [t æ m æ nː aː] |
||||||||
Stop |
Occurrences: 10 Examples: * بكين: [p ɪ k iː n] * نيبال: [n ɪ p aː l] Occurrences: 2,803 Examples: * بزاق: [b ʊ z ɑː q] * لعبة: [l æ ʕ b æ] * انقلب: [ɪ n q ɑ l æ b æ] * غباء: [ɣ æ b aː ʔ] Occurrences: 171 Examples: * أزْبّ: [ʔ æ z æ bː] * أطباء: [ʔ ɑ tˤ ɪ bː aː ʔ] * أبدp: [ʔ æ bː æ d æ] * صّبَر: [sˤ ɑ bː ɑ r ɑ] |
Occurrences: 2,151 Examples: * رزتاق: [r ʊ z t ɑː q] * تذمر: [t æ ð æ mː ɑ r ɑ] * متنوع: [m ʊ t æ n æ wː ɪ ʕ] * ثمن: [t æ m æ n] Occurrences: 65 Examples: * بُتةُ: [b æ tː æ] * فتة: [f æ tː aː t] * متحد: [m ʊ tː æ ħ ɪ d] * ستة: [s ɪ tː æ] Occurrences: 995 Examples: * طيهوج: [tˤ iː h uː dʒ] * طمرة: [tˤ ʊ mː ɑ r ɑ] * طاش: [tˤ ɑː ʃ æ] * سطع: [s ɑ tˤ ɑ ʕ æ] Occurrences: 38 Examples: * خطُةُ: [χ ʊ tˤː ɑ] * عطن: [ʕ ɑ tˤː ɑ n æ] * قطارة: [q ɑ tˤː ɑː r ɑ] * مسطح: [m ʊ s ɑ tˤː ɑ ħ] Occurrences: 1,902 Examples: * دمع: [d æ m æ ʕ æ] * دند: [d æ n d] * دون: [d uː n æ] * دِنو: [d ʊ n ʊ wː] Occurrences: 177 Examples: * جدا: [dʒ ɪ dː æ n] * استبد: [ɪ s t æ b æ dː æ] * مدع: [m ʊ dː æ ʕ ɪ n] * نددُ: [n æ dː æ d æ] Occurrences: 503 Examples: * أنُضب: [ʔ æ n dˤ ɑ b æ] * ضجة: [dˤ ɑ dʒː æ] * أجهض: [ʔ æ dʒ h ɑ dˤ ɑ] * عرمض: [ʕ ɑ r m ɑ dˤ] Occurrences: 31 Examples: * تقِضب: [t ɑ q ɑ dˤː æ b æ] * وضب: [w ɑ dˤː æ b æ] * وضح: [w ɑ dˤː æ ħ æ] * وضاح: [w ɑ dˤː aː ħ] |
Occurrences: 1,287 Examples: * إياكُ: [ʔ ɪ jː aː k æ] * كبَر: [k æ b ʊ r ɑ] * كوكب: [k æ w k æ b] * كعبة: [k æ ʕ b æ] Occurrences: 86 Examples: * مكة: [m æ kː æ] * أكار: [ʔ æ kː ɑː r] * فكُ: [f æ kː æ] * مبكر: [m ʊ b æ kː ɪ r] Occurrences: 196 Examples: * قامَ: [ɡ aː m] * قَاعة: [ɡ aː ʕ æ] * جنس: [ɡ æ n s] * قنبر: [ɡ a m b r ɪ] Occurrences: 6 Examples: * حجّ: [ħ æ ɡː] * شقة: [ʃ ɪ ɡː æ] |
Occurrences: 1,933 Examples: * رزتاق: [r ʊ z t ɑː q] * قامَ: [q oː m] * بزاق: [b ʊ z ɑː q] * انقلب: [ɪ n q ɑ l æ b æ] Occurrences: 76 Examples: * وقع: [w ɑ qː ɑ ʕ æ] * انُشق: [ɪ n ʃ ɑ qː ɑ] * زقوم: [z ɑ qː uː m] * حقا: [ħ ɑ qː ɑ n] |
Occurrences: 2,483 Examples: * إياكُ: [ʔ ɪ jː aː k æ] * غباء: [ɣ æ b aː ʔ] * إفرند: [ʔ ɪ f r ɪ n d] * أعمى: [ʔ æ ʕ m aː] |
|||||
Affricate |
Occurrences: 10 Examples: * تشيلي: [tʃ iː l iː] Occurrences: 1,467 Examples: * نورج: [n æ w r ɑ dʒ] * طيهوج: [tˤ iː h uː dʒ] * جدا: [dʒ ɪ dː æ n] * خجول: [χ æ dʒ uː l] Occurrences: 60 Examples: * ضجة: [dˤ ɑ dʒː æ] * عجور: [ʕ æ dʒː uː r] * لَجِّ: [l æ dʒː æ] * رجْعُ: [r ɑ dʒː æ ʕ æ] |
|||||||||
Sibilant |
Occurrences: 2,315 Examples: * مسيح: [m æ s iː ħ] * عيسى: [ʕ iː s aː] * نسرين: [n ɪ s r iː n] * سعةُ: [s æ ʕ æ] Occurrences: 101 Examples: * جسم: [dʒ æ sː æ m æ] * حساسي: [ħ æ sː aː s ɪ jː] * أَمسٌ: [ʔ æ m æ sː æ] * مماس: [m ʊ m aː sː] Occurrences: 792 Examples: * غصب: [ɣ ɑ sˤ ɑ b æ] * راقصُ: [r ɑː q ɑ sˤ ɑ] * حرقوص: [ħ ɑ r q uː sˤ] * صاروج: [sˤ ɑː r uː dʒ] Occurrences: 68 Examples: * حصل: [ħ ɑ sˤː ɑ l æ] * تعصب: [t æ ʕ ɑ sˤː ɑ b æ] * حُصاد: [ħ ɑ sˤː ɑː d] * لصق: [l ɑ sˤː ɑ q ɑ] Occurrences: 796 Examples: * رزتاق: [r ʊ z t ɑː q] * زنخ: [z æ n ɪ χ æ] * بزاق: [b ʊ z ɑː q] * مزر: [m ɪ z r] Occurrences: 59 Examples: * أرْزُ: [ʔ ɑ r ɑ zː æ] * أعزَ: [ʔ æ ʕ æ zː] * هزة: [h æ zː æ] * كزاز: [k ʊ zː aː z] |
Occurrences: 1,143 Examples: * شام: [ʃ aː m æ] * طاش: [tˤ ɑː ʃ æ] * خشخاش: [χ æ ʃ χ aː ʃ] * شياف: [ʃ ɪ j aː f] Occurrences: 31 Examples: * مرشةُ: [m ɪ r ɑ ʃː æ] * خشاب: [χ æ ʃː aː b] * تشالة: [t æ ʃː aː l æ] * تعشق: [t æ ʕ æ ʃː ɑ q ɑ] Occurrences: 26 Examples: * جنس: [ʒ ɪ n s] * جِو: [ʒ æ wː] * جنازة: [ʒ n aː z æ] * ترجم: [t ɑ r d ʒ æ m æ] Occurrences: 2 Examples: * حجّ: [ħ æ ʒː] |
||||||||
Fricative |
Occurrences: 1,696 Examples: * عرف: [ʕ ɑ r f] * إفرند: [ʔ ɪ f r ɪ n d] * قْافُ: [q ɑː f] * شياف: [ʃ ɪ j aː f] Occurrences: 85 Examples: * جاف: [dʒ aː fː] * تفاحة: [t ʊ fː aː ħ æ] * ظفر: [ðˤ æ fː ɑ r ɑ] * تفاح: [t ʊ fː aː ħ] Occurrences: 41 Examples: * فيديو: [v iː d j oː] * فيروس: [v aː j r uː s] * لاتفي: [l aː t v ɪ jː] |
Occurrences: 375 Examples: * ثمن: [θ æ m æ n] * ثُور: [θ æ w r] * ثعلُب: [θ æ ʕ l æ b] * ثورة: [θ æ w r ɑ] Occurrences: 26 Examples: * انبث: [ɪ m b æ θː æ] * جثة: [dʒ ʊ θː æ] * ممثل: [m ʊ m æ θː ɪ l] * خِث: [χ ʊ θː] Occurrences: 364 Examples: * تذمر: [t æ ð æ mː ɑ r ɑ] * بذلة: [b æ ð l æ] * أنقذ: [ʔ æ n q ɑ ð æ] * فولاذ: [f uː l aː ð] Occurrences: 22 Examples: * أذن: [ʔ æ ðː æ n æ] * مغذ: [m ʊ ɣ æ ðː ɪ n] * جذر: [dʒ æ ðː ɑ r ɑ] * رذُ: [r ɑ ðː æ] Occurrences: 178 Examples: * حنظل: [ħ æ n ðˤ æ l] * قّيظّ: [q ɑ j ðˤ] * حنظلة: [ħ æ n ðˤ æ l æ] * قرظُ: [q ɑ r ɑ ðˤ] Occurrences: 28 Examples: * موظفة: [m ʊ w ɑ ðˤː æ f æ] * عظم: [ʕ ɑ ðˤː æ m æ] * منظمة: [m ʊ n ɑ ðˤː æ m æ] * نظف: [n ɑ ðˤː æ f æ] |
Occurrences: 755 Examples: * زنخ: [z æ n ɪ χ æ] * خشخاش: [χ æ ʃ χ aː ʃ] * خجول: [χ æ dʒ uː l] * خطُةُ: [χ ʊ tˤː ɑ] Occurrences: 31 Examples: * أخذ: [ʔ æ χː æ ð æ] * جخى: [dʒ æ χː aː] * شخص: [ʃ æ χː ɑ sˤ ɑ] * نخل: [n æ χː æ l æ] |
Occurrences: 1,431 Examples: * حياة: [ħ æ j aː h] * مسيح: [m æ s iː ħ] * انحاز: [ɪ n ħ aː z æ] * لاح: [l aː ħ æ] Occurrences: 38 Examples: * صححَ: [sˤ ɑ ħː æ ħ æ] * سحِمَ: [s æ ħː æ m æ] * صُحّي: [sˤ ɪ ħː ɪ jː] * زحافة: [z æ ħː aː f æ] Occurrences: 1,886 Examples: * دمع: [d æ m æ ʕ æ] * لعبة: [l æ ʕ b æ] * عليل: [ʕ æ l iː l] * عرف: [ʕ ɑ r f] |
Occurrences: 1,099 Examples: * حياة: [ħ æ j aː h] * طيهوج: [tˤ iː h uː dʒ] * بهر: [b æ h ɑ r ɑ] * هابيل: [h aː b iː l] |
|||||
Approximant |
Occurrences: 1,283 Examples: * نورج: [n æ w r ɑ dʒ] * كوكب: [k æ w k æ b] * تزويد: [t æ z w iː d] * أوحد: [ʔ æ w ħ æ d] Occurrences: 131 Examples: * دون: [d æ wː æ n æ] * متنوع: [m ʊ t æ n æ wː ɪ ʕ] * دِنو: [d ʊ n ʊ wː] * نُورَ: [n æ wː ɑ r ɑ] |
Occurrences: 1,139 Examples: * حياة: [ħ æ j aː h] * بناية: [b ɪ n aː j æ] * طيهوج: [tˤ ɑ j h uː dʒ] * زانية: [z aː n ɪ j æ] Occurrences: 1,045 Examples: * إياكُ: [ʔ ɪ jː aː k æ] * دين: [d æ jː æ n æ] * حبشي: [ħ æ b æ ʃ ɪ jː] * مخيم: [m ʊ χ æ jː æ m] |
||||||||
Trill |
Occurrences: 4,245 Examples: * رزتاق: [r ʊ z t ɑː q] * تذمر: [t æ ð æ mː ɑ r ɑ] * مزر: [m ɪ z r] * عرف: [ʕ ɑ r f] Occurrences: 248 Examples: * عرف: [ʕ ɑ rː ɑ f æ] * أقر: [ʔ ɑ q ɑ rː ɑ] * زرشك: [z ɑ rː ɑ ʃ k] * مبرة: [m æ b ɑ rː ɑ] |
|||||||||
Lateral |
Occurrences: 2,807 Examples: * لعبة: [l æ ʕ b æ] * عليل: [ʕ æ l iː l] * انقلب: [ɪ n q ɑ l æ b æ] * قلب: [q ɑ l b] Occurrences: 268 Examples: * قلب: [q ɑ ɫː ɑ b æ] * محل: [m æ ħ ɑ ɫː] * أطل: [ʔ ɑ tˤ ɑ ɫː ɑ] * استدل: [ɪ s t æ d ɑ ɫː ɑ] |
Vowels#
Within each cell, vowel symbols on the left are unrounded and those on the right are rounded.
Front |
Near-Front |
Central |
Near-Back |
Back |
|
---|---|---|---|---|---|
Close |
Occurrences: 1,594 Examples: * عليل: [ʕ æ l iː l] * طيهوج: [tˤ iː h uː dʒ] * مسيح: [m æ s iː ħ] * عيسى: [ʕ iː s aː] |
Occurrences: 1,320 Examples: * دون: [d uː n æ] * طيهوج: [tˤ iː h uː dʒ] * خجول: [χ æ dʒ uː l] * حرقوص: [ħ ɑ r q uː sˤ] |
|||
Occurrences: 5,233 Examples: * زنخ: [z æ n ɪ χ æ] * إياكُ: [ʔ ɪ jː aː k æ] * مزر: [m ɪ z r] * انقلب: [ɪ n q ɑ l æ b æ] |
Occurrences: 3,049 Examples: * رزتاق: [r ʊ z t ɑː q] * بزاق: [b ʊ z ɑː q] * تذمر: [t æ ð æ mː ʊ r] * لعبة: [l ʊ ʕ b æ] |
||||
Close-Mid |
Occurrences: 48 Examples: * قارب: [ʔ ɑː r e b] * قهوة: [k æ h w e] * آخرة: [ʔ aː χ r e] * حلبة: [ħ ɪ l b e] Occurrences: 45 Examples: * بوصلة: [b uː sˤ l eː] * بيرو: [b eː r uː] * جنيه: [ɡ ʊ n eː h] * ويكة: [w eː k æ] |
Occurrences: 74 Examples: * لورد: [l o r d] * بوصلة: [b o sˤ l æ] * موسكو: [m o s k oː] * طرمبة: [tˤ r o m b æ] Occurrences: 103 Examples: * قامَ: [q oː m] * توغو: [t oː ɣ oː] * وارسو: [w ɑ r s oː] * يوليو: [j uː l ɪ j oː] |
|||
Open-Mid |
|||||
Occurrences: 12,024 Examples: * زنخ: [z æ n ɪ χ æ] * قامَ: [q ɑː m æ] * تذمر: [t æ ð æ mː ɑ r ɑ] * دمع: [d æ m æ ʕ æ] |
|||||
Open |
Occurrences: 43 Examples: * قنبر: [ɡ a m b r ɪ] * خنزير: [χ a n z iː r] * عبقري: [ʕ a b q ɑ r ɪ] * فهرس: [f a h r ɑ s] Occurrences: 3,663 Examples: * قامَ: [ɡ aː m] * إياكُ: [ʔ ɪ jː aː k æ] * حياة: [ħ æ j aː h] * غباء: [ɣ æ b aː ʔ] |
Occurrences: 5,823 Examples: * تذمر: [t æ ð æ mː ɑ r ɑ] * عرف: [ʕ ɑ r f] * انقلب: [ɪ n q ɑ l æ b æ] * نورج: [n æ w r ɑ dʒ] Occurrences: 1,855 Examples: * رزتاق: [r ʊ z t ɑː q] * قامَ: [q ɑː m æ] * بزاق: [b ʊ z ɑː q] * طاش: [tˤ ɑː ʃ æ] |