Tamil MFA dictionary v2.0.0#
@techreport{mfa_tamil_mfa_dictionary_2022,
author={McAuliffe, Michael and Sonderegger, Morgan},
title={Tamil MFA dictionary v2.0.0},
address={\url{https://mfa-models.readthedocs.io/pronunciation dictionary/Tamil/Tamil MFA dictionary v2_0_0.html}},
year={2022},
month={Mar},
}
Installation#
Install from the MFA command line:
mfa model download dictionary tamil_mfa
Or download from the release page.
Intended use#
This dictionary is intended for forced alignment of Tamil transcripts.
This dictionary uses the MFA phone set for Tamil, and was used in training the Tamil MFA acoustic model. Pronunciations can be added on top of the dictionary, as long as no additional phones are introduced.
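The constraint above (new pronunciations may not introduce new phones) can be checked before alignment. The following is a minimal sketch, not part of MFA itself. It assumes each dictionary line has the form `word<TAB>space-separated phones`; the helper names are hypothetical, the `existing` list is a toy subset of entries taken from the examples on this page, and both added variants are invented purely for illustration. In practice the inventory would be built from the full dictionary file.

```python
def parse_entry(line):
    """Split one dictionary line into (word, list of phones).

    Assumes the form 'word<TAB>phone phone ...'; adjust if your copy of
    the dictionary carries extra columns.
    """
    word, pronunciation = line.rstrip("\n").split("\t", 1)
    return word, pronunciation.split()


def phone_inventory(lines):
    """Collect every phone used by the existing dictionary entries."""
    phones = set()
    for line in lines:
        if line.strip():
            phones.update(parse_entry(line)[1])
    return phones


def new_phones(candidate_lines, inventory):
    """Map each candidate word to the phones it would newly introduce."""
    problems = {}
    for line in candidate_lines:
        if not line.strip():
            continue
        word, phones = parse_entry(line)
        unknown = sorted(set(phones) - inventory)
        if unknown:
            problems.setdefault(word, []).extend(unknown)
    return problems


# Toy subset of existing entries (pronunciations copied from this page).
existing = [
    "மீன்\tm iː n",
    "பயம்\tp a j a m",
    "மனம்\tm a n a m",
]

# Hypothetical additions, for illustration only.
additions = [
    "மனம்\tm a n a",    # uses only phones already in the inventory
    "மனம்\tm a n ə m",  # introduces ə, which the inventory lacks
]

inventory = phone_inventory(existing)
problems = new_phones(additions, inventory)
print(problems)  # only the second variant is reported
```

Any word reported in `problems` would need its pronunciation rewritten with existing phones before being added.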
Performance Factors#
Adding pronunciations generally improves alignment accuracy, especially for different styles and dialects. The largest improvements usually come from adding reduced variants, which delete the segments or syllables that are commonly dropped in spontaneous speech. Alignment must include every phone specified in a word's pronunciation, and each phone has a minimum duration (10 ms by default). If a speaker pronounces a multisyllabic word as a single syllable, MFA may be unable to fit all the segments into the available time, which leads to alignment errors on adjacent words as well.
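The duration constraint can be made concrete with a little arithmetic. This is a rough sketch that assumes only what is stated above: every phone in a pronunciation must be aligned, and each takes at least 10 ms. The function names are hypothetical, and the reduced variant is invented for illustration rather than being an attested pronunciation.

```python
MIN_PHONE_DURATION_MS = 10  # default minimum per phone, as stated above


def minimum_duration_ms(pronunciation, min_phone_ms=MIN_PHONE_DURATION_MS):
    """Shortest stretch of audio that can hold every phone of a pronunciation."""
    return len(pronunciation.split()) * min_phone_ms


def fits(pronunciation, available_ms, min_phone_ms=MIN_PHONE_DURATION_MS):
    """True if all phones can be placed within the available audio."""
    return minimum_duration_ms(pronunciation, min_phone_ms) <= available_ms


full = "p a ɻ aj j a"  # பழைய as listed on this page: 6 phones
reduced = "p a ɻ a"    # hypothetical reduced variant: 4 phones

print(minimum_duration_ms(full))     # 60
print(fits(full, available_ms=50))   # False: the word would spill into its neighbours
print(fits(reduced, available_ms=50))  # True
```

When the spoken word is shorter than the minimum for its only listed pronunciation, the aligner has to borrow time from adjacent words, which is why a shorter variant helps.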
Ethical considerations#
Deploying any Speech-to-Text model into any production setting has ethical implications. You should consider these implications before use.
Demographic Bias#
You should assume every machine learning model has demographic bias unless proven otherwise. For pronunciation dictionaries, transcription accuracy and lexicon coverage are often better for the prestige variety modeled in the dictionary than for other variants. If you are using this dictionary in production, you should acknowledge this as a potential issue.
IPA Charts#
Consonants#
The chart's columns are Labial, Labiodental, Dental, Alveolar, Retroflex, Palatal, Velar, and Glottal. Within each manner, phones are listed in that order, with unvoiced obstruents before voiced ones at the same place of articulation.

Manner | Phone | Occurrences | Examples
---|---|---|---
Nasal | m | 1,115 | மோசம்: [m oː s a m]; பயம்: [p a j a m]; நாகம்: [n̪ aː ɡ a m]; மீன்: [m iː n]
Nasal | mː | 9 | அம்மா: [ʔ a mː aː]
Nasal | n̪ | 257 | நாகம்: [n̪ aː ɡ a m]; நேரம்: [n̪ eː ɾ a m]; நாம்: [n̪ aː m]; மந்து: [m a n̪ d̪ u]
Nasal |  | 2 |
Nasal | n | 324 | மீன்: [m iː n]; ஜனகம்: [dʑ a n a ɡ a m]; மனம்: [m a n a m]; பானை: [p aː n aj]
Nasal | nː | 31 | உன்னு: [ʔ u nː u]; சின்ன: [tɕ i nː a]; என்ன: [ʔ e nː a]
Nasal | ɳ | 205 | கருணை: [k a ɾ u ɳ aj]; ஆண்மை: [ʔ aː ɳ m aj]; ஆணை: [ʔ aː ɳ aj]; உணவு: [ʔ u ɳ a ʋ u]
Nasal | ɳː | 21 | அண்ணா: [ʔ a ɳː aː]; அண்ணி: [ʔ a ɳː i]; உண்ணி: [ʔ u ɳː i]
Nasal | ɲ | 22 | இஞ்சி: [ʔ i ɲ dʑ i]; ஞானம்: [ɲ aː n a m]; பஞ்சை: [p a ɲ dʑ aj]; கஞ்சா: [k a ɲ dʑ aː]
Nasal | ɲː | 2 | மஞ்ஞை: [m a ɲː aj]; அஞ்ஞை: [ʔ a ɲː aj]
Nasal | ŋ | 113 | அங்கு: [ʔ a ŋ ɡ u]; செங்: [tɕ e ŋ]; கங்கு: [k a ŋ ɡ u]; இங்கு: [ʔ i ŋ ɡ u]
Nasal |  | 2 |
Stop | p | 412 | படகு: [p a ɖ a ɡ u]; பயம்: [p a j a m]; பக்தி: [p a k t̪ i]; பாகு: [p aː ɡ u]
Stop | pː | 189 | ஒப்பு: [ʔ o pː u]; தப்பு: [t̪ a pː u]; அப்பா: [ʔ a pː aː]; உப்பு: [ʔ u pː u]
Stop | b | 122 | ஆபிச்: [ʔ aː b i tɕ]; தூபி: [t̪ uː b i]; ஜனாபா: [dʑ a n aː b aː]; காபி: [k aː b i]
Stop | t̪ | 224 | பக்தி: [p a k t̪ i]; தூவல்: [t̪ uː ʋ a l]; தோல்: [t̪ oː l]; தளம்: [t̪ a ɭ a m]
Stop | t̪ː | 134 | பத்து: [p a t̪ː u]; ஏத்து: [ʔ eː t̪ː u]; அத்தை: [ʔ a t̪ː aj]; கத்தி: [k a t̪ː i]
Stop | d̪ | 307 | காதை: [k aː d̪ aj]; விதவை: [ʋ i d̪ a ʋ aj]; உதவு: [ʔ u d̪ a ʋ u]; சிதமை: [tɕ i d̪ a m aj]
Stop | tː | 34 | ஊற்று: [ʔ uː tː ɾ u]
Stop | ʈ | 48 | சட்னி: [tɕ a ʈ n i]; ஔடதம்: [ʔ aw ʈ a d̪ a m]; ஆட்சி: [ʔ aː ʈ tɕ i]; பட்சி: [p a ʈ tɕ i]
Stop | ʈː | 109 | கிட்ட: [k i ʈː a]; கட்டி: [k a ʈː i]; ஊட்டி: [ʔ uː ʈː i]; சட்டி: [tɕ a ʈː i]
Stop | ɖ | 311 | சீடை: [tɕ iː ɖ aj]; படகு: [p a ɖ a ɡ u]; கிடை: [k i ɖ aj]; தடி: [t̪ a ɖ i]
Stop | k | 395 | கிட்ட: [k i ʈː a]; காதை: [k aː d̪ aj]; கல்வி: [k a l ʋ i]; பக்தி: [p a k t̪ i]
Stop | kː | 212 | அக்கு: [ʔ a kː u]; நக்கு: [n̪ a kː u]; அக்கை: [ʔ a kː aj]; இக்கு: [ʔ i kː u]
Stop | ɡ | 355 | படகு: [p a ɖ a ɡ u]; நாகம்: [n̪ aː ɡ a m]; மேகம்: [m eː ɡ a m]; ஜனகம்: [dʑ a n a ɡ a m]
Stop | ʔ | 582 | ஊற்று: [ʔ uː tː ɾ u]; ஏலம்: [ʔ eː l a m]; உளவு: [ʔ u ɭ a ʋ u]; அங்கு: [ʔ a ŋ ɡ u]
Affricate | tɕ | 259 | சீடை: [tɕ iː ɖ aj]; சேறு: [tɕ eː r u]; ஆபிச்: [ʔ aː b i tɕ]; சிதமை: [tɕ i d̪ a m aj]
Affricate | tɕː | 76 | உச்சி: [ʔ u tɕː i]; பச்சை: [p a tɕː aj]
Affricate | dʑ | 50 | ஜனகம்: [dʑ a n a ɡ a m]; இஞ்சி: [ʔ i ɲ dʑ i]; ஜீதம்: [dʑ iː d̪ a m]; ஜனாபா: [dʑ a n aː b aː]
Sibilant | s | 196 | மோசம்: [m oː s a m]; மாசம்: [m aː s a m]; கஸாயி: [k a s aː j i]; அசர்: [ʔ a s a ɾ]
Sibilant | ʂ | 24 | பவிஷு: [p a ʋ i ʂ u]; ரிஷி: [ɾ i ʂ i]; க்ஷ: [k ʂ a]; ரஷ்யா: [ɾ a ʂ j aː]
Fricative | ɦ | 8 | அஹம்: [ʔ a ɦ a m]; மஹால்: [m a ɦ aː l]
Approximant | ʋ | 577 | வயல்: [ʋ a j a l]; கல்வி: [k a l ʋ i]; விதவை: [ʋ i d̪ a ʋ aj]; உளவு: [ʔ u ɭ a ʋ u]
Approximant | ʋː | 16 | வவ்வு: [ʋ a ʋː u]; அவ்வோ: [ʔ a ʋː oː]
Approximant | ɻ | 166 | கெழு: [k e ɻ u]; வழி: [ʋ a ɻ i]; மழை: [m a ɻ aj]; பழைய: [p a ɻ aj j a]
Approximant | j | 353 | வயல்: [ʋ a j a l]; பயம்: [p a j a m]; கயிறு: [k a j i r u]; கஸாயி: [k a s aː j i]
Approximant |  | 6 |
Tap | ɾ | 749 | ஊற்று: [ʔ uː tː ɾ u]; நேரம்: [n̪ eː ɾ a m]; மார்: [m aː ɾ]; எரி: [ʔ e ɾ i]
Trill | r | 212 | கயிறு: [k a j i r u]; அறம்: [ʔ a r a m]; சேறு: [tɕ eː r u]; கறவை: [k a r a ʋ aj]
Lateral | l | 470 | வயல்: [ʋ a j a l]; கல்வி: [k a l ʋ i]; ஏலம்: [ʔ eː l a m]; கோல்: [k oː l]
Lateral | lː | 38 | வல்லை: [ʋ a lː aj]; எல்லை: [ʔ e lː aj]; வல்ல: [ʋ a lː a]; இல்லை: [ʔ i lː aj]
Lateral Tap | ɭ | 193 | உளவு: [ʔ u ɭ a ʋ u]; களவு: [k a ɭ a ʋ u]; இருள்: [ʔ i ɾ u ɭ]; மீள்: [m iː ɭ]
Lateral Tap | ɭː | 24 | தள்ளை: [t̪ a ɭː aj]; பள்ளி: [p a ɭː i]
Vowels#
The chart's columns are Front, Near-Front, Central, Near-Back, and Back. Within each height, phones are listed in that order, with unrounded vowels before rounded ones.

Height | Phone | Occurrences | Examples
---|---|---|---
Close | i | 1,301 | கிட்ட: [k i ʈː a]; கல்வி: [k a l ʋ i]; பக்தி: [p a k t̪ i]; விதவை: [ʋ i d̪ a ʋ aj]
Close | iː | 103 | சீடை: [tɕ iː ɖ aj]; மீன்: [m iː n]; மீள்: [m iː ɭ]; தீவு: [t̪ iː ʋ u]
Close | u | 1,173 | ஊற்று: [ʔ uː tː ɾ u]; படகு: [p a ɖ a ɡ u]; கெழு: [k e ɻ u]; கயிறு: [k a j i r u]
Close | uː | 107 | ஊற்று: [ʔ uː tː ɾ u]; தூவல்: [t̪ uː ʋ a l]; ஊர்: [ʔ uː ɾ]; ஊட்டி: [ʔ uː ʈː i]
Close-Mid | e | 205 | கெழு: [k e ɻ u]; எரி: [ʔ e ɾ i]; செங்: [tɕ e ŋ]; பெயலை: [p e j a l aj]
Close-Mid | eː | 167 | நேரம்: [n̪ eː ɾ a m]; மேகம்: [m eː ɡ a m]; ஏலம்: [ʔ eː l a m]; சேறு: [tɕ eː r u]
Close-Mid | o | 121 | ஒப்பு: [ʔ o pː u]; கொசு: [k o s u]; ஒழுகு: [ʔ o ɻ u ɡ u]; ஒலி: [ʔ o l i]
Close-Mid | oː | 126 | மோசம்: [m oː s a m]; கோல்: [k oː l]; தோல்: [t̪ oː l]; அதோள்: [ʔ a d̪ oː ɭ]
Open-Mid |  |  |
Open | a | 2,594 | வயல்: [ʋ a j a l]; கிட்ட: [k i ʈː a]; படகு: [p a ɖ a ɡ u]; மோசம்: [m oː s a m]
Open | aː | 724 | காதை: [k aː d̪ aj]; நாகம்: [n̪ aː ɡ a m]; மாசம்: [m aː s a m]; மார்: [m aː ɾ]
Diphthongs#
aj
aw
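The "Occurrences" figures in the charts above can be approximated for any set of pronunciations by tallying phones. The sketch below is an assumption about how such counts might be produced, not the documented method: this page does not say whether a phone is counted once per token or once per word, and the sketch counts every token. Phones are separated by whitespace, so the diphthongs `aj` and `aw` are counted as single phones distinct from `a`. The function name and sample list are hypothetical, with pronunciations copied from the examples on this page.

```python
from collections import Counter


def count_phones(pronunciations):
    """Tally every phone token across space-separated pronunciations."""
    counts = Counter()
    for pronunciation in pronunciations:
        counts.update(pronunciation.split())
    return counts


# A few pronunciations copied from the examples on this page.
sample = [
    "m iː n",
    "p a j a m",
    "p aː n aj",
    "ʔ aw ʈ a d̪ a m",
]

counts = count_phones(sample)
for phone, total in counts.most_common(3):
    print(phone, total)
```

Run over the full dictionary, a tally like this also shows which phones are rare enough that alignments involving them deserve a manual check.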