Arabic MFA dictionary v2.0.0#

  • Maintainer: Montreal Forced Aligner

  • Language: Arabic

  • Dialect: N/A

  • Phone set: MFA

  • Number of words: 10,753

  • Phones: a b d dʒː dˤː e f h j k l m n o p q r s sˤː t tˤː v w z æ ð ðː ðˤ ðˤː ħ ħː ɑ ɑː ɡ ɡː ɣ ɣː ɪ ɫː ʃ ʃː ʊ ʒ ʒː ʔ ʕ θ θː χ χː

  • License: CC BY 4.0

  • Compatible MFA version: v2.0.0

  • Citation:

@techreport{mfa_arabic_mfa_dictionary_2022,
	author={Shmueli, Natalia and McAuliffe, Michael and Sonderegger, Morgan},
	title={Arabic MFA dictionary v2.0.0},
	address={\url{https://mfa-models.readthedocs.io/pronunciation dictionary/Arabic/Arabic MFA dictionary v2_0_0.html}},
	year={2022},
	month={Mar},
}

Installation#

Install from the MFA command line:

mfa model download dictionary arabic_mfa

Or download from the release page.

The dictionary available from the release page and command line installation has pronunciation and silence probabilities estimated as part of acoustic model training (see Silence probability format and training pronunciation probabilities for more information). If you would like to use the version of this dictionary without probabilities, please see the plain dictionary.
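If only the release-page file is at hand, the probability columns can be dropped locally. The sketch below is a minimal illustration, not MFA's own tooling. It assumes each line is tab-delimited, with the word in the first field, the pronunciation in the last field, and only numeric probability columns in between. The function name `strip_probabilities` and the numeric values in the sample line are hypothetical.

```python
def strip_probabilities(line: str) -> str:
    """Return 'word<TAB>pronunciation', dropping any numeric columns in between.

    Assumption: fields are tab-delimited, the word comes first and the
    pronunciation comes last. Check this against the actual file before use.
    """
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 2:
        raise ValueError(f"expected at least a word and a pronunciation: {line!r}")
    word, middle, pronunciation = fields[0], fields[1:-1], fields[-1]
    for value in middle:
        float(value)  # raises ValueError if a middle column is not numeric
    return f"{word}\t{pronunciation}"


# Hypothetical line: the four numbers are made up for illustration only.
sample = "قلب\t0.99\t0.12\t1.05\t0.98\tq ɑ l b"
plain = strip_probabilities(sample)
```

A line that already has only a word and a pronunciation passes through unchanged, so the function can be run over a mixed file.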

Intended use#

This dictionary is intended for forced alignment of Arabic transcripts.

This dictionary uses the MFA phone set for Arabic, and was used in training the Arabic MFA acoustic model. Pronunciations can be added on top of the dictionary, as long as no additional phones are introduced.
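Before adding pronunciations, it can help to check that they stay within the phone set. The sketch below is one way to do that, assuming pronunciations are written as space-separated phones. The phone set is copied from the Phones line of this page as it appears here, so verify it against the dictionary file itself before relying on it. The names `MFA_ARABIC_PHONES` and `unknown_phones` are hypothetical.

```python
# Phone set copied from the "Phones" line of this page (assumption: complete).
MFA_ARABIC_PHONES = frozenset(
    "a b d dʒː dˤː e f h j k l m n o p q r s sˤː t tˤː v w z æ ð ðː ðˤ ðˤː "
    "ħ ħː ɑ ɑː ɡ ɡː ɣ ɣː ɪ ɫː ʃ ʃː ʊ ʒ ʒː ʔ ʕ θ θː χ χː".split()
)


def unknown_phones(pronunciation: str) -> list[str]:
    """Return the phones in a space-separated pronunciation that are not in the set."""
    return [phone for phone in pronunciation.split() if phone not in MFA_ARABIC_PHONES]


# "q ɑ l b" is listed on this page for قلب, so nothing is flagged.
ok = unknown_phones("q ɑ l b")
# ASCII "g" is not the IPA symbol "ɡ" used by the phone set, so it is flagged.
bad = unknown_phones("g æ n s")
```

An empty result means the pronunciation introduces no new phones. A common source of false additions is a look-alike character, such as ASCII `g` typed in place of IPA `ɡ`.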

Performance Factors#

When trying to get better alignment accuracy, adding pronunciations is generally helpful, especially for different styles and dialects. The most impactful improvements will generally be seen when adding reduced variants that delete segments or syllables, as is common in spontaneous speech. Alignment must include all phones specified in the pronunciation of a word, and each phone has a minimum duration (by default 10 ms). If a speaker pronounces a multisyllabic word with just a single syllable, MFA may be unable to fit all the segments into the available time, which leads to alignment errors on adjacent words as well.
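The arithmetic behind this constraint can be sketched as follows. This is an illustration of the stated rule only, not MFA's alignment code. It assumes space-separated phones and the default 10 ms minimum mentioned above, and the function names are hypothetical. Both pronunciations are listed on this page for قلب.

```python
MIN_PHONE_DURATION_MS = 10  # default minimum duration per phone, as stated above


def min_word_duration_ms(pronunciation: str, min_phone_ms: int = MIN_PHONE_DURATION_MS) -> int:
    """Shortest stretch of audio that can hold every phone of the pronunciation."""
    return len(pronunciation.split()) * min_phone_ms


def fits(pronunciation: str, available_ms: int) -> bool:
    """True if the pronunciation can be squeezed into the available audio."""
    return min_word_duration_ms(pronunciation) <= available_ms


short_variant = "q ɑ l b"       # 4 phones, needs at least 40 ms
long_variant = "q ɑ ɫː ɑ b æ"   # 6 phones, needs at least 60 ms

# If the speaker spent only 50 ms on the word, only the shorter variant fits.
short_fits = fits(short_variant, 50)
long_fits = fits(long_variant, 50)
```

When only the longer pronunciation is in the dictionary, the aligner has to take the missing time from neighbouring words, which is why shorter variants tend to help with fast or reduced speech.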

Ethical considerations#

Deploying any Speech-to-Text model into any production setting has ethical implications. You should consider these implications before use.

Demographic Bias#

You should assume every machine learning model has demographic bias unless proven otherwise. For pronunciation dictionaries, it is often the case that transcription accuracy and lexicon coverage are better for the prestige variety modeled in this dictionary than for other variants. If you are using this dictionary in production, you should acknowledge this as a potential issue.

IPA Charts#

Consonants#

Within each cell, obstruent symbols on the left are unvoiced and those on the right are voiced.

Manner | Labial | Labiodental | Dental | Alveolar | Alveopalatal | Palatal | Velar | Uvular | Pharyngeal | Glottal

Nasal

Occurrences:
3,663
Examples:
* قامَ:
[q m]
* دمع:
[d æ m æ ʕ æ]
* مزر:
[m ɪ z r]
* متنوع:
[m ʊ t æ n æ ɪ ʕ]
Occurrences:
166
Examples:
* تذمر:
[t æ ð æ ɑ r ɑ]
* طمرة:
[ ʊ ɑ r ɑ]
* أعمى:
[ʔ ʊ ʕ æ ]
* جمعي:
[ æ ɪ ʕ ]
Occurrences:
3,417
Examples:
* زنخ:
[z æ n ɪ χ æ]
* انقلب:
[ɪ n q ɑ l æ b æ]
* دند:
[d æ n d]
* نورج:
[n æ w r ɑ ]
Occurrences:
151
Examples:
* مسن:
[m ɪ s æ ]
* صنارة:
[ ɪ ɑː r ɑ]
* بَنَّ:
[b ɪ æ]
* تمنى:
[t æ m æ ]

Stop

Occurrences:
10
Examples:
* بكين:
[p ɪ k n]
* نيبال:
[n ɪ p l]
Occurrences:
2,803
Examples:
* بزاق:
[b ʊ z ɑː q]
* لعبة:
[l æ ʕ b æ]
* انقلب:
[ɪ n q ɑ l æ b æ]
* غباء:
[ɣ æ b ʔ]
Occurrences:
171
Examples:
* أزْبّ:
[ʔ æ z æ ]
* أطباء:
[ʔ ɑ ɪ ʔ]
* أبدp:
[ʔ æ æ d æ]
* صّبَر:
[ ɑ ɑ r ɑ]
Occurrences:
2,151
Examples:
* رزتاق:
[r ʊ z t ɑː q]
* تذمر:
[t æ ð æ ɑ r ɑ]
* متنوع:
[m ʊ t æ n æ ɪ ʕ]
* ثمن:
[t æ m æ n]
Occurrences:
65
Examples:
* بُتةُ:
[b æ æ]
* فتة:
[f æ t]
* متحد:
[m ʊ æ ħ ɪ d]
* ستة:
[s ɪ æ]
Occurrences:
995
Examples:
* طيهوج:
[ h ]
* طمرة:
[ ʊ ɑ r ɑ]
* طاش:
[ ɑː ʃ æ]
* سطع:
[s ɑ ɑ ʕ æ]
Occurrences:
38
Examples:
* خطُةُ:
[χ ʊ tˤː ɑ]
* عطن:
[ʕ ɑ tˤː ɑ n æ]
* قطارة:
[q ɑ tˤː ɑː r ɑ]
* مسطح:
[m ʊ s ɑ tˤː ɑ ħ]
Occurrences:
1,902
Examples:
* دمع:
[d æ m æ ʕ æ]
* دند:
[d æ n d]
* دون:
[d n æ]
* دِنو:
[d ʊ n ʊ ]
Occurrences:
177
Examples:
* جدا:
[ ɪ æ n]
* استبد:
[ɪ s t æ b æ æ]
* مدع:
[m ʊ æ ʕ ɪ n]
* نددُ:
[n æ æ d æ]
Occurrences:
503
Examples:
* أنُضب:
[ʔ æ n ɑ b æ]
* ضجة:
[ ɑ dʒː æ]
* أجهض:
[ʔ æ h ɑ ɑ]
* عرمض:
[ʕ ɑ r m ɑ ]
Occurrences:
31
Examples:
* تقِضب:
[t ɑ q ɑ dˤː æ b æ]
* وضب:
[w ɑ dˤː æ b æ]
* وضح:
[w ɑ dˤː æ ħ æ]
* وضاح:
[w ɑ dˤː ħ]
Occurrences:
1,287
Examples:
* إياكُ:
[ʔ ɪ k æ]
* كبَر:
[k æ b ʊ r ɑ]
* كوكب:
[k æ w k æ b]
* كعبة:
[k æ ʕ b æ]
Occurrences:
86
Examples:
* مكة:
[m æ æ]
* أكار:
[ʔ æ ɑː r]
* فكُ:
[f æ æ]
* مبكر:
[m ʊ b æ ɪ r]
Occurrences:
196
Examples:
* قامَ:
[ɡ m]
* قَاعة:
[ɡ ʕ æ]
* جنس:
[ɡ æ n s]
* قنبر:
[ɡ a m b r ɪ]
Occurrences:
6
Examples:
* حجّ:
[ħ æ ɡː]
* شقة:
[ʃ ɪ ɡː æ]
Occurrences:
1,933
Examples:
* رزتاق:
[r ʊ z t ɑː q]
* قامَ:
[q m]
* بزاق:
[b ʊ z ɑː q]
* انقلب:
[ɪ n q ɑ l æ b æ]
Occurrences:
76
Examples:
* وقع:
[w ɑ ɑ ʕ æ]
* انُشق:
[ɪ n ʃ ɑ ɑ]
* زقوم:
[z ɑ m]
* حقا:
[ħ ɑ ɑ n]
Occurrences:
2,483
Examples:
* إياكُ:
[ʔ ɪ k æ]
* غباء:
[ɣ æ b ʔ]
* إفرند:
[ʔ ɪ f r ɪ n d]
* أعمى:
[ʔ æ ʕ m ]

Affricate

Occurrences:
10
Examples:
* تشيلي:
[ l ]
Occurrences:
1,467
Examples:
* نورج:
[n æ w r ɑ ]
* طيهوج:
[ h ]
* جدا:
[ ɪ æ n]
* خجول:
[χ æ l]
Occurrences:
60
Examples:
* ضجة:
[ ɑ dʒː æ]
* عجور:
[ʕ æ dʒː r]
* لَجِّ:
[l æ dʒː æ]
* رجْعُ:
[r ɑ dʒː æ ʕ æ]

Sibilant

Occurrences:
2,315
Examples:
* مسيح:
[m æ s ħ]
* عيسى:
[ʕ s ]
* نسرين:
[n ɪ s r n]
* سعةُ:
[s æ ʕ æ]
Occurrences:
101
Examples:
* جسم:
[ æ æ m æ]
* حساسي:
[ħ æ s ɪ ]
* أَمسٌ:
[ʔ æ m æ æ]
* مماس:
[m ʊ m ]
Occurrences:
792
Examples:
* غصب:
[ɣ ɑ ɑ b æ]
* راقصُ:
[r ɑː q ɑ ɑ]
* حرقوص:
[ħ ɑ r q ]
* صاروج:
[ ɑː r ]
Occurrences:
68
Examples:
* حصل:
[ħ ɑ sˤː ɑ l æ]
* تعصب:
[t æ ʕ ɑ sˤː ɑ b æ]
* حُصاد:
[ħ ɑ sˤː ɑː d]
* لصق:
[l ɑ sˤː ɑ q ɑ]
Occurrences:
796
Examples:
* رزتاق:
[r ʊ z t ɑː q]
* زنخ:
[z æ n ɪ χ æ]
* بزاق:
[b ʊ z ɑː q]
* مزر:
[m ɪ z r]
Occurrences:
59
Examples:
* أرْزُ:
[ʔ ɑ r ɑ æ]
* أعزَ:
[ʔ æ ʕ æ ]
* هزة:
[h æ æ]
* كزاز:
[k ʊ z]
Occurrences:
1,143
Examples:
* شام:
[ʃ m æ]
* طاش:
[ ɑː ʃ æ]
* خشخاش:
[χ æ ʃ χ ʃ]
* شياف:
[ʃ ɪ j f]
Occurrences:
31
Examples:
* مرشةُ:
[m ɪ r ɑ ʃː æ]
* خشاب:
[χ æ ʃː b]
* تشالة:
[t æ ʃː l æ]
* تعشق:
[t æ ʕ æ ʃː ɑ q ɑ]
Occurrences:
26
Examples:
* جنس:
[ʒ ɪ n s]
* جِو:
[ʒ æ ]
* جنازة:
[ʒ n z æ]
* ترجم:
[t ɑ r d ʒ æ m æ]
Occurrences:
2
Examples:
* حجّ:
[ħ æ ʒː]

Fricative

Occurrences:
1,696
Examples:
* عرف:
[ʕ ɑ r f]
* إفرند:
[ʔ ɪ f r ɪ n d]
* قْافُ:
[q ɑː f]
* شياف:
[ʃ ɪ j f]
Occurrences:
85
Examples:
* جاف:
[ ]
* تفاحة:
[t ʊ ħ æ]
* ظفر:
[ðˤ æ ɑ r ɑ]
* تفاح:
[t ʊ ħ]
Occurrences:
41
Examples:
* فيديو:
[v d j ]
* فيروس:
[v j r s]
* لاتفي:
[l t v ɪ ]
Occurrences:
375
Examples:
* ثمن:
[θ æ m æ n]
* ثُور:
[θ æ w r]
* ثعلُب:
[θ æ ʕ l æ b]
* ثورة:
[θ æ w r ɑ]
Occurrences:
26
Examples:
* انبث:
[ɪ m b æ θː æ]
* جثة:
[ ʊ θː æ]
* ممثل:
[m ʊ m æ θː ɪ l]
* خِث:
[χ ʊ θː]
Occurrences:
364
Examples:
* تذمر:
[t æ ð æ ɑ r ɑ]
* بذلة:
[b æ ð l æ]
* أنقذ:
[ʔ æ n q ɑ ð æ]
* فولاذ:
[f l ð]
Occurrences:
22
Examples:
* أذن:
[ʔ æ ðː æ n æ]
* مغذ:
[m ʊ ɣ æ ðː ɪ n]
* جذر:
[ æ ðː ɑ r ɑ]
* رذُ:
[r ɑ ðː æ]
Occurrences:
178
Examples:
* حنظل:
[ħ æ n ðˤ æ l]
* قّيظّ:
[q ɑ j ðˤ]
* حنظلة:
[ħ æ n ðˤ æ l æ]
* قرظُ:
[q ɑ r ɑ ðˤ]
Occurrences:
28
Examples:
* موظفة:
[m ʊ w ɑ ðˤː æ f æ]
* عظم:
[ʕ ɑ ðˤː æ m æ]
* منظمة:
[m ʊ n ɑ ðˤː æ m æ]
* نظف:
[n ɑ ðˤː æ f æ]
Occurrences:
755
Examples:
* زنخ:
[z æ n ɪ χ æ]
* خشخاش:
[χ æ ʃ χ ʃ]
* خجول:
[χ æ l]
* خطُةُ:
[χ ʊ tˤː ɑ]
Occurrences:
31
Examples:
* أخذ:
[ʔ æ χː æ ð æ]
* جخى:
[ æ χː ]
* شخص:
[ʃ æ χː ɑ ɑ]
* نخل:
[n æ χː æ l æ]
Occurrences:
1,431
Examples:
* حياة:
[ħ æ j h]
* مسيح:
[m æ s ħ]
* انحاز:
[ɪ n ħ z æ]
* لاح:
[l ħ æ]
Occurrences:
38
Examples:
* صححَ:
[ ɑ ħː æ ħ æ]
* سحِمَ:
[s æ ħː æ m æ]
* صُحّي:
[ ɪ ħː ɪ ]
* زحافة:
[z æ ħː f æ]
Occurrences:
1,886
Examples:
* دمع:
[d æ m æ ʕ æ]
* لعبة:
[l æ ʕ b æ]
* عليل:
[ʕ æ l l]
* عرف:
[ʕ ɑ r f]
Occurrences:
1,099
Examples:
* حياة:
[ħ æ j h]
* طيهوج:
[ h ]
* بهر:
[b æ h ɑ r ɑ]
* هابيل:
[h b l]

Approximant

Occurrences:
1,283
Examples:
* نورج:
[n æ w r ɑ ]
* كوكب:
[k æ w k æ b]
* تزويد:
[t æ z w d]
* أوحد:
[ʔ æ w ħ æ d]
Occurrences:
131
Examples:
* دون:
[d æ æ n æ]
* متنوع:
[m ʊ t æ n æ ɪ ʕ]
* دِنو:
[d ʊ n ʊ ]
* نُورَ:
[n æ ɑ r ɑ]
Occurrences:
1,139
Examples:
* حياة:
[ħ æ j h]
* بناية:
[b ɪ n j æ]
* طيهوج:
[ ɑ j h ]
* زانية:
[z n ɪ j æ]
Occurrences:
1,045
Examples:
* إياكُ:
[ʔ ɪ k æ]
* دين:
[d æ æ n æ]
* حبشي:
[ħ æ b æ ʃ ɪ ]
* مخيم:
[m ʊ χ æ æ m]

Trill

Occurrences:
4,245
Examples:
* رزتاق:
[r ʊ z t ɑː q]
* تذمر:
[t æ ð æ ɑ r ɑ]
* مزر:
[m ɪ z r]
* عرف:
[ʕ ɑ r f]
Occurrences:
248
Examples:
* عرف:
[ʕ ɑ ɑ f æ]
* أقر:
[ʔ ɑ q ɑ ɑ]
* زرشك:
[z ɑ ɑ ʃ k]
* مبرة:
[m æ b ɑ ɑ]

Lateral

Occurrences:
2,807
Examples:
* لعبة:
[l æ ʕ b æ]
* عليل:
[ʕ æ l l]
* انقلب:
[ɪ n q ɑ l æ b æ]
* قلب:
[q ɑ l b]
Occurrences:
268
Examples:
* قلب:
[q ɑ ɫː ɑ b æ]
* محل:
[m æ ħ ɑ ɫː]
* أطل:
[ʔ ɑ ɑ ɫː ɑ]
* استدل:
[ɪ s t æ d ɑ ɫː ɑ]

Vowels#

Within each cell, vowel symbols on the left are unrounded and those on the right are rounded.

Front | Near-Front | Central | Near-Back | Back

Close

Occurrences:
1,594
Examples:
* عليل:
[ʕ æ l l]
* طيهوج:
[ h ]
* مسيح:
[m æ s ħ]
* عيسى:
[ʕ s ]
Occurrences:
1,320
Examples:
* دون:
[d n æ]
* طيهوج:
[ h ]
* خجول:
[χ æ l]
* حرقوص:
[ħ ɑ r q ]
Occurrences:
5,233
Examples:
* زنخ:
[z æ n ɪ χ æ]
* إياكُ:
[ʔ ɪ k æ]
* مزر:
[m ɪ z r]
* انقلب:
[ɪ n q ɑ l æ b æ]
Occurrences:
3,049
Examples:
* رزتاق:
[r ʊ z t ɑː q]
* بزاق:
[b ʊ z ɑː q]
* تذمر:
[t æ ð æ ʊ r]
* لعبة:
[l ʊ ʕ b æ]

Close-Mid

Occurrences:
48
Examples:
* قارب:
[ʔ ɑː r e b]
* قهوة:
[k æ h w e]
* آخرة:
[ʔ χ r e]
* حلبة:
[ħ ɪ l b e]
Occurrences:
45
Examples:
* بوصلة:
[b l ]
* بيرو:
[b r ]
* جنيه:
[ɡ ʊ n h]
* ويكة:
[w k æ]
Occurrences:
74
Examples:
* لورد:
[l o r d]
* بوصلة:
[b o l æ]
* موسكو:
[m o s k ]
* طرمبة:
[ r o m b æ]
Occurrences:
103
Examples:
* قامَ:
[q m]
* توغو:
[t ɣ ]
* وارسو:
[w ɑ r s ]
* يوليو:
[j l ɪ j ]

Open-Mid

Occurrences:
12,024
Examples:
* زنخ:
[z æ n ɪ χ æ]
* قامَ:
[q ɑː m æ]
* تذمر:
[t æ ð æ ɑ r ɑ]
* دمع:
[d æ m æ ʕ æ]

Open

Occurrences:
43
Examples:
* قنبر:
[ɡ a m b r ɪ]
* خنزير:
[χ a n z r]
* عبقري:
[ʕ a b q ɑ r ɪ]
* فهرس:
[f a h r ɑ s]
Occurrences:
3,663
Examples:
* قامَ:
[ɡ m]
* إياكُ:
[ʔ ɪ k æ]
* حياة:
[ħ æ j h]
* غباء:
[ɣ æ b ʔ]
Occurrences:
5,823
Examples:
* تذمر:
[t æ ð æ ɑ r ɑ]
* عرف:
[ʕ ɑ r f]
* انقلب:
[ɪ n q ɑ l æ b æ]
* نورج:
[n æ w r ɑ ]
Occurrences:
1,855
Examples:
* رزتاق:
[r ʊ z t ɑː q]
* قامَ:
[q ɑː m æ]
* بزاق:
[b ʊ z ɑː q]
* طاش:
[ ɑː ʃ æ]