Urdu CV dictionary v2.0.0#
@misc{Ahn_Chodroff_2022,
author={Ahn, Emily and Chodroff, Eleanor},
title={VoxCommunis Corpus},
address={\url{https://osf.io/t957v}},
publisher={OSF},
year={2022},
month={Jan}
}
Installation#
Install from the MFA command line:
mfa model download dictionary urdu_cv
Or download from the release page.
Intended use#
This dictionary is intended for forced alignment of Urdu transcripts.
This dictionary uses the Epitran phone set for Urdu, and was used in training the Urdu Epitran acoustic model. Pronunciations can be added on top of the dictionary, as long as no additional phones are introduced.
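The restriction on new phones can be checked mechanically before alignment. The sketch below is a minimal illustration, assuming a simplified layout of one entry per line with the word and its space-separated phones divided by a tab. Real MFA dictionary files may carry additional probability columns, which this sketch does not handle. The function names are hypothetical and are not part of MFA.

```python
def parse_entries(lines):
    """Parse 'word<TAB>phone phone ...' lines into (word, [phones]) pairs."""
    entries = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        word, _, pron = line.partition("\t")
        entries.append((word, pron.split()))
    return entries


def phone_set(entries):
    """Collect every phone used by the given entries."""
    return {phone for _, phones in entries for phone in phones}


def unknown_phones(base_entries, new_entries):
    """Map each new word to the phones it uses that the base dictionary lacks."""
    known = phone_set(base_entries)
    problems = {}
    for word, phones in new_entries:
        missing = [p for p in phones if p not in known]
        if missing:
            problems[word] = missing
    return problems


# Two entries taken from the examples on this page.
base = parse_entries(["مگر\tm ə ɡ ə ɾ", "نقل\tn ə q l"])
# Hypothetical additions: the first reuses known phones, the second adds "θ".
added = parse_entries(["مگر\tm ɡ ə ɾ", "نقل\tn ə θ l"])
print(unknown_phones(base, added))
```

Any word reported by `unknown_phones` would need its pronunciation rewritten with phones the dictionary already uses.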
Performance Factors#
Adding pronunciations generally improves alignment accuracy, especially for speech styles and dialects that differ from the one modeled here. The largest gains usually come from adding reduced variants that delete segments or syllables, as is common in spontaneous speech. Alignment must include every phone specified in a word's pronunciation, and each phone has a minimum duration (by default 10 ms). If a speaker pronounces a multisyllabic word as a single syllable, MFA may be unable to fit all the segments into the available time, which leads to alignment errors on adjacent words as well.
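The timing constraint can be made concrete with a little arithmetic. The sketch below assumes the 10 ms default minimum quoted above and treats the shortest possible word duration as one minimum per phone. The helper names and the reduced variant are hypothetical illustrations, not entries from the dictionary.

```python
MIN_PHONE_MS = 10  # default minimum phone duration, in milliseconds


def min_word_duration_ms(phones, min_phone_ms=MIN_PHONE_MS):
    """Shortest span a pronunciation can occupy: one minimum per phone."""
    return len(phones) * min_phone_ms


def fits(phones, available_ms, min_phone_ms=MIN_PHONE_MS):
    """True if every phone can receive its minimum duration in the span."""
    return min_word_duration_ms(phones, min_phone_ms) <= available_ms


# Full pronunciation of one example word from this page (6 phones).
full = "l ə ɡ ɑː n eː".split()
# Hypothetical reduced variant with the first vowel deleted (5 phones).
reduced = "l ɡ ɑː n eː".split()

spoken_ms = 55  # assumed duration of a very fast production of the word
print(min_word_duration_ms(full), fits(full, spoken_ms))
print(min_word_duration_ms(reduced), fits(reduced, spoken_ms))
```

With these assumed numbers the full form needs at least 60 ms and cannot fit into 55 ms, while the reduced form needs 50 ms and can, which is why listing reduced variants helps.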
Ethical considerations#
Deploying any Speech-to-Text model into any production setting has ethical implications. You should consider these implications before use.
Demographic Bias#
You should assume every machine learning model has demographic bias unless proven otherwise. For pronunciation dictionaries, transcription accuracy and lexicon coverage are often better for the prestige variety modeled in the dictionary than for other varieties. If you are using this dictionary in production, you should acknowledge this as a potential issue.
IPA Charts#
Consonants#
Within a cell, obstruent symbols on the left are unvoiced and those on the right are voiced.
Manner | Labial | Labiodental | Dental | Alveolar | Alveopalatal | Retroflex | Palatal | Velar | Uvular | Pharyngeal | Glottal |
---|---|---|---|---|---|---|---|---|---|---|---|
Nasal |
Occurrences: 386 Examples: * مخمصہ: [m ə x m ə s ɑː] * جمود: [d͡ʒ ʊ m uː d̪] * مانے: [m ɑ̃ː eː] * مگر: [m ə ɡ ə ɾ] |
Occurrences: 350 Examples: * نشین: [n ə ʃ iː n] * نقل: [n ə q l] * لگانے: [l ə ɡ ɑː n eː] * وضع: [ʋ ə z ɛː n] Occurrences: 3 Examples: * جنتی: [d͡ʒ ə nː ə t̪ iː] * جناح: [d͡ʒ ə nː ɑː] * جنونی: [d͡ʒ ə nː oː n iː] |
Occurrences: 18 Examples: * انڈا: [ə ɳ ɖ ɑː] * منڈی: [m ʊ ɳ ɖ iː] * ونڈوز: [ʋ ə ɳ ɖ ə ʋ z] * ٹھنڈا: [ʈʰ ɳ ɖ ɑː] |
Occurrences: 26 Examples: * جنگ: [d͡ʒ ə ŋ ɡ] * منکسر: [m ə ŋ k s ə ɾ] * بنگلہ: [b ə ŋ ɡ l ɑː] * رنگار: [ɾ ə ŋ ɡ ɑː ɾ] |
|||||||
Stop |
Occurrences: 115 Examples: * پارٹی: [p ɑ̃ː ɾ ʊ ʈ iː] * یورپ: [j oː ɾ p] * چھاپ: [t͡ʃʰ ɑː p] * پیکار: [p ɛː k ɑː ɾ] Occurrences: 1 Examples: * پیپلز: [p ɛː pː ə l z] Occurrences: 226 Examples: * عجیب: [ə d͡ʒ iː b] * سبزہ: [s ə b z ɑː] * باہمی: [b ɑː ɦ m iː] * نواب: [n ə ʋ ɑː b] |
Occurrences: 330 Examples: * لکھتے: [l ɪ kʰ t̪ eː] * سکتے: [s ə k t̪ eː] * ترقی: [t̪ ə ɾ q iː] * صحت: [s ɪ ɦ ə t̪] Occurrences: 1 Examples: * فاتح: [f ɪ t̪ː e ɦ] Occurrences: 213 Examples: * جمود: [d͡ʒ ʊ m uː d̪] * مقاصد: [m ə q ɑː s ə d̪] * دیا: [d̪ ə j ɑː] * مفاد: [m ʊ f ɑː d̪] |
Occurrences: 81 Examples: * پارٹی: [p ɑ̃ː ɾ ʊ ʈ iː] * ایٹ: [ə eː ʈ] * ٹاکرے: [ʈ ɑː k ɾ eː] * نوٹ: [n ə oː ʈ] Occurrences: 47 Examples: * انڈا: [ə ɳ ɖ ɑː] * ڈالے: [ɖ ɑː l ɪ eː] * فوڈ: [f ɔː ʊ ɖ] * منڈی: [m ʊ ɳ ɖ iː] |
Occurrences: 247 Examples: * یکس: [j ə k s] * پیکار: [p ɛː k ɑː ɾ] * سکتے: [s ə k t̪ eː] * کوئی: [k oː iː] Occurrences: 102 Examples: * مگر: [m ə ɡ ə ɾ] * لگانے: [l ə ɡ ɑː n eː] * کانگو: [k ɑː n ɡ uː] * گزر: [ɡ ʊ z ə ɾ] |
Occurrences: 91 Examples: * نقل: [n ə q l] * قوم: [q ɔː m] * مقاصد: [m ə q ɑː s ə d̪] * ترقی: [t̪ ə ɾ q iː] |
Occurrences: 33 Examples: * معیار: [m ə ʔ j ɑː ɾ] * واقع: [ʋ ɑː q ʔ] * بعض: [b ə ʔ z] * تعصب: [t̪ ʔ s ʊ b] |
|||||
Affricate |
Occurrences: 1 Examples: |
Occurrences: 60 Examples: * بچے: [b ə t͡ʃ eː] * چیزوں: [t͡ʃ iː z ũː] * چاہتے: [t͡ʃ ɑː ɦ ə t̪ eː] * کراچی: [k ə ɾ ɑː t͡ʃ iː] Occurrences: 123 Examples: * عجیب: [ə d͡ʒ iː b] * وجود: [ɔː d͡ʒ uː d̪ᵊ] * جمود: [d͡ʒ ʊ m uː d̪] * جیسی: [d͡ʒ iː s iː] |
|||||||||
Sibilant |
Occurrences: 314 Examples: * یکس: [j ə k s] * مخمصہ: [m ə x m ə s ɑː] * سبزہ: [s ə b z ɑː] * جیسی: [d͡ʒ iː s iː] Occurrences: 146 Examples: * سبزہ: [s ə b z ɑː] * فضل: [f z ə l] * وضع: [ʋ ə z ɛː n] * کلیمز: [k ə l iː m ə z] |
Occurrences: 85 Examples: * نشین: [n ə ʃ iː n] * اشارہ: [ɪ ʃ ɑː ɾ ɑː] * شریف: [ʃ ə ɾ iː f] * شہادت: [ʃ ə ɦ ɑː d̪ t̪] Occurrences: 1 Examples: * وژن: [ʋ ə ʒ eː ə n] |
|||||||||
Fricative |
Occurrences: 94 Examples: * فضل: [f z ə l] * شریف: [ʃ ə ɾ iː f] * فرحان: [f ə ɾ ɦ ɑː n] * مفاد: [m ʊ f ɑː d̪] |
Occurrences: 2 Examples: * الطبع: [ə l t̪ ə b ʕ] * موضوع: [m ə ʋ ʊ z uː ʕ] |
Occurrences: 1 Examples: * نہاد: [n ɪ h ɑː d̪] Occurrences: 222 Examples: * ہرا: [ɦ ɾ ɑː] * باہمی: [b ɑː ɦ m iː] * ہمارے: [ɦ ə m ɑː ɾ eː] * فرحان: [f ə ɾ ɦ ɑː n] |
||||||||
Approximant |
Occurrences: 1 Examples: |
Occurrences: 119 Examples: * وضع: [ʋ ə z ɛː n] * نواب: [n ə ʋ ɑː b] * واپس: [ʋ ɑː p ə s] * عوام: [ə ʋ ɑː m] |
Occurrences: 127 Examples: * یکس: [j ə k s] * یورپ: [j oː ɾ p] * معیار: [m ə ʔ j ɑː ɾ] * دیا: [d̪ ə j ɑː] |
||||||||
Tap Plain |
Occurrences: 531 Examples: * ہرا: [ɦ ɾ ɑː] * پارٹی: [p ɑ̃ː ɾ ʊ ʈ iː] * یورپ: [j oː ɾ p] * مگر: [m ə ɡ ə ɾ] Occurrences: 1 Examples: * تاریخ: [t̪ ɑː ˈɾ eː x] |
Occurrences: 21 Examples: * کھڑی: [kʰ ə ɽ iː] * تھپڑ: [t̪ʰ p ɽ] * پڑا: [p ɽ ɑː] * پڑوس: [p ɽ uː s] |
|||||||||
Aspirated |
Occurrences: 6 Examples: * بڑھے: [b ə ɽʱ eː] * چڑھتے: [t͡ʃ ə ɽʱ t̪ eː] * بڑھ: [b ə ɽʱ] * بڑھتے: [b ə ɽʱ t̪ eː] |
||||||||||
Trill |
Occurrences: 1 Examples: |
||||||||||
Lateral |
Occurrences: 2 Examples: * تلے: [t̪ ə l̪ eː] * مرحلے: [m ə ɾ ɦ ɛ l̪ eː] |
Occurrences: 307 Examples: * لکھتے: [l ɪ kʰ t̪ eː] * نقل: [n ə q l] * کھلا: [kʰ ə l ɑː] * فضل: [f z ə l] Occurrences: 1 Examples: * دلیل: [d̪ ɪ lː iː l] |
Vowels#
Within a cell, vowel symbols on the left are unrounded and those on the right are rounded.
Oral Vowels#
Front | Near-Front | Central | Near-Back | Back | |
---|---|---|---|---|---|
Close |
Occurrences: 2 Examples: * ایئر: [ə ə̯ i ə ɾ] Occurrences: 420 Examples: * عجیب: [ə d͡ʒ iː b] * کھڑی: [kʰ ə ɽ iː] * نشین: [n ə ʃ iː n] * پارٹی: [p ɑ̃ː ɾ ʊ ʈ iː] |
Occurrences: 108 Examples: * وجود: [ɔː d͡ʒ uː d̪ᵊ] * جمود: [d͡ʒ ʊ m uː d̪] * کانگو: [k ɑː n ɡ uː] * قومی: [q uː m iː] |
|||
Occurrences: 256 Examples: * لکھتے: [l ɪ kʰ t̪ eː] * اشارہ: [ɪ ʃ ɑː ɾ ɑː] * ڈالے: [ɖ ɑː l ɪ eː] * صحت: [s ɪ ɦ ə t̪] |
Occurrences: 117 Examples: * پارٹی: [p ɑ̃ː ɾ ʊ ʈ iː] * جمود: [d͡ʒ ʊ m uː d̪] * مفاد: [m ʊ f ɑː d̪] * فوڈ: [f ɔː ʊ ɖ] |
||||
Close-Mid |
Occurrences: 6 Examples: * اکشے: [ɑː k e ʃ eː] * فاتح: [f ɪ t̪ː e ɦ] * انتہا: [ɪ n t̪ e ɦ ɑː] Occurrences: 2 Examples: * بہتر: [b eʱ t̪ ə ɾ] Occurrences: 280 Examples: * لکھتے: [l ɪ kʰ t̪ eː] * مانے: [m ɑ̃ː eː] * تھے: [t̪ʰ eː] * ایٹ: [ə eː ʈ] |
Occurrences: 4 Examples: * معطل: [m o ɪ t̪ ə l] * معمہ: [m o ɪ m ɑː] * معرفت: [m o ɪ ɾ ɪ f ə t̪] Occurrences: 86 Examples: * یورپ: [j oː ɾ p] * کوئی: [k oː iː] * نوٹ: [n ə oː ʈ] * روئے: [ɾ oː eː] |
|||
Occurrences: 934 Examples: * عجیب: [ə d͡ʒ iː b] * یکس: [j ə k s] * مخمصہ: [m ə x m ə s ɑː] * انڈا: [ə ɳ ɖ ɑː] Occurrences: 2 Examples: * ایئر: [ə ə̯ i ə ɾ] |
|||||
Open-Mid |
Occurrences: 3 Examples: * تشہیر: [t̪ ʃ ɛ ɦ iː ɾ] * مرحلے: [m ə ɾ ɦ ɛ l̪ eː] Occurrences: 35 Examples: * پیکار: [p ɛː k ɑː ɾ] * وضع: [ʋ ə z ɛː n] * اپریل: [ə p ɾ ɛː l] * سیٹ: [s ɛː ʈ] |
Occurrences: 44 Examples: * وجود: [ɔː d͡ʒ uː d̪ᵊ] * قوم: [q ɔː m] * فوڈ: [f ɔː ʊ ɖ] * موسم: [m ɔː s ə m] |
|||
Occurrences: 1 Examples: * کینسر: [k æ n s ə ɾ] |
|||||
Open |
Occurrences: 876 Examples: * مخمصہ: [m ə x m ə s ɑː] * انڈا: [ə ɳ ɖ ɑː] * ہرا: [ɦ ɾ ɑː] * سبزہ: [s ə b z ɑː] |
Nasal Vowels#
Front | Near-Front | Central | Near-Back | Back | |
---|---|---|---|---|---|
Close |
Occurrences: 1 Examples: |
Occurrences: 39 Examples: * والوں: [ʋ ɑː l ũː] * چیزوں: [t͡ʃ iː z ũː] * فتووں: [f t̪ uː ũː] * کاموں: [k ɑː m ũː] |
|||
Occurrences: 1 Examples: |
|||||
Close-Mid |
Occurrences: 33 Examples: * لیں: [l ẽː] * ہوئیں: [ɦ oː ẽː] * سکیں: [s ə k ẽː] * گئیں: [ɡ ẽː] |
Occurrences: 16 Examples: * برسوں: [b ə ɾ s õː] * پہنچ: [p ə ɦ õː t͡ʃ] * ہونے: [ɦ õː eː] * گاؤں: [ɡ ɑː õː] |
|||
Occurrences: 2 Examples: |
|||||
Open-Mid |
Occurrences: 1 Examples: * فرینچ: [f ə ɾ ɛ̃ː t͡ʃ] |
||||
Open |
Occurrences: 25 Examples: * پارٹی: [p ɑ̃ː ɾ ʊ ʈ iː] * مانے: [m ɑ̃ː eː] * جہاں: [d͡ʒ ə ɦ ɑ̃ː] * یہاں: [j ə ɦ ɑ̃ː] |
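Counts like the ones in these charts could be tallied from a pronunciation list along the following lines. This is only a sketch: it assumes a count is the number of pronunciations in which a phone appears at least once, and this page does not state whether the published figures count entries or individual tokens. The function name is hypothetical.

```python
from collections import Counter


def phone_entry_counts(pronunciations):
    """Count, for each phone, how many pronunciations contain it."""
    counts = Counter()
    for phones in pronunciations:
        # set() so a phone repeated within one pronunciation counts once
        counts.update(set(phones))
    return counts


# Three pronunciations taken from the examples on this page.
prons = [
    "m ə ɡ ə ɾ".split(),
    "n ə q l".split(),
    "l ə ɡ ɑː n eː".split(),
]
counts = phone_entry_counts(prons)
for phone, n in counts.most_common(3):
    print(phone, n)
```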