Thai CV dictionary v2.0.0
@misc{Ahn_Chodroff_2022,
author={Ahn, Emily and Chodroff, Eleanor},
title={VoxCommunis Corpus},
address={\url{https://osf.io/t957v}},
publisher={OSF},
year={2022},
month={Jan}
}
Installation
Install from the MFA command line:
mfa model download dictionary thai_cv
Or download from the release page.
Intended use
This dictionary is intended for forced alignment of Thai transcripts.
This dictionary uses the XPF phone set for Thai, and was used in training the Thai XPF acoustic model. Pronunciations can be added on top of the dictionary, as long as no additional phones are introduced.
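A simple way to respect the "no additional phones" rule is to collect the phone inventory from the existing dictionary and flag any new entry that uses a phone outside it. The sketch below assumes a tab-separated `word<TAB>phones` layout and works on in-memory lines. The helper names (`parse_line`, `phone_inventory`, `unknown_phones`) and the two added entries are hypothetical illustrations, not part of MFA.

```python
# Sketch only: assumes each dictionary line is "word<TAB>phone phone ...";
# extra columns such as pronunciation probabilities are not handled.


def parse_line(line: str) -> tuple[str, list[str]]:
    """Split a 'word<TAB>phones' line into the word and its list of phones."""
    word, _, pron = line.strip().partition("\t")
    return word, pron.split()


def phone_inventory(lines: list[str]) -> set[str]:
    """Collect every phone that already occurs in the dictionary."""
    inventory: set[str] = set()
    for line in lines:
        _, phones = parse_line(line)
        inventory.update(phones)
    return inventory


def unknown_phones(new_lines: list[str], inventory: set[str]) -> dict[str, set[str]]:
    """Map each new word to any phones it uses that are not in the inventory."""
    problems: dict[str, set[str]] = {}
    for line in new_lines:
        word, phones = parse_line(line)
        missing = set(phones) - inventory
        if missing:
            problems[word] = missing
    return problems


# Toy excerpt of existing entries, taken from the IPA chart examples.
existing = [
    "ทอม\ttʰ ɔː m",
    "ปิดจอ\tp i t t͡ɕ ɔː",
    "ซอย\ts ɔː j",
]
# Hypothetical additions: the first reuses known phones, the second
# introduces t͡ʃ, which this toy inventory does not contain.
additions = [
    "ทอม\tt ɔː m",
    "จอ\tt͡ʃ ɔː",
]

inventory = phone_inventory(existing)
result = unknown_phones(additions, inventory)
print(result)
```

Only the entry that introduces an unseen phone is reported, so it can be fixed or mapped onto an existing phone before the pronunciations are merged.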
Performance Factors
When trying to get better alignment accuracy, adding pronunciations is generally helpful, especially for different styles and dialects. The most impactful improvements generally come from adding reduced variants that delete segments or syllables, as is common in spontaneous speech. Alignment must include every phone specified in a word's pronunciation, and each phone has a minimum duration (10 ms by default). If a speaker pronounces a multisyllabic word as a single syllable, MFA may struggle to fit all of the segments into the available time, which can cause alignment errors on adjacent words as well.
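As a rough illustration of the duration constraint: with a floor of 10 ms per phone, a pronunciation of N phones needs at least N × 10 ms of audio, however quickly the word was actually spoken. The sketch treats the floor as a flat per-phone constant, which is a simplification; the reduced variant and the 40 ms interval are hypothetical toy values.

```python
# Sketch only: treats the minimum duration as a flat floor per phone
# (10 ms by default, per the text above); this is a simplification.


def min_duration_ms(phones: list[str], min_phone_ms: int = 10) -> int:
    """Shortest stretch of audio that can hold every phone of a pronunciation."""
    return len(phones) * min_phone_ms


def fits(phones: list[str], interval_ms: int, min_phone_ms: int = 10) -> bool:
    """True if all phones can be packed into an interval of the given length."""
    return min_duration_ms(phones, min_phone_ms) <= interval_ms


full = "pʰ a t n aː".split()   # พัฒนา as transcribed in this dictionary
reduced = "pʰ a n aː".split()  # hypothetical reduced variant without the coda /t/

print(min_duration_ms(full))   # 50
print(fits(full, 40))          # False
print(fits(reduced, 40))       # True
```

A shorter variant lowers the floor, which is why adding reduced pronunciations gives the aligner room to place very fast tokens without stealing time from neighboring words.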
Ethical considerations
Deploying any Speech-to-Text model into any production setting has ethical implications. You should consider these implications before use.
Demographic Bias
You should assume every machine learning model has demographic bias unless proven otherwise. For pronunciation dictionaries, transcription accuracy and lexicon coverage are often higher for the prestige variety modeled in the dictionary than for other varieties. If you are using this dictionary in production, you should acknowledge this as a potential issue.
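One way to check for this gap on your own data is to measure what share of your transcript tokens the dictionary covers, ideally broken down by speaker group or dialect. The sketch assumes transcripts are already segmented into space-separated words (Thai is normally written without spaces between words) and that the dictionary's headwords have been loaded into a set; the `coverage` helper and the word lists are hypothetical toy examples.

```python
# Sketch only: assumes word-segmented transcripts and a set of dictionary headwords.


def coverage(transcript_words: list[str], lexicon: set[str]) -> tuple[float, list[str]]:
    """Return the token-level coverage rate and the out-of-vocabulary tokens."""
    if not transcript_words:
        return 1.0, []
    oov = [w for w in transcript_words if w not in lexicon]
    rate = 1 - len(oov) / len(transcript_words)
    return rate, oov


lexicon = {"ทอม", "ซอย", "ปิดจอ"}          # toy subset of dictionary headwords
tokens = "ทอม ปิดจอ ซอย เดี๋ยว".split()     # hypothetical segmented transcript
rate, oov = coverage(tokens, lexicon)
print(f"{rate:.0%} covered; OOV: {oov}")
```

Running the same measurement separately for each group of speakers makes a coverage gap between varieties visible before it shows up as alignment errors.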
IPA Charts
Consonants
In cells that list two obstruents, the first is voiceless and the second is voiced.

| Manner | Labial | Labiodental | Alveolar | Palatal | Velar | Glottal |
|---|---|---|---|---|---|---|
| Nasal | m (48,307 occurrences): ทอม [tʰ ɔː m], มะแม [m a m ɛː], แมป [m ɛː p], มาร์ช [m aː t] | | n (99,695 occurrences): ชาน [t͡ɕʰ aː n], พัฒนา [pʰ a t n aː], นูโว [n uː w oː], กันย์ [k a n] | | ŋ (48,260 occurrences): ไฟแรง [f a j r ɛː ŋ], กล๊อง [k l ɔː ŋ], น่อง [n ɔː ŋ], ของกู [kʰ ɔː ŋ k uː] | |
| Stop | p (22,226 occurrences): ปะทะ [p a tʰ a ʔ], ปิดจอ [p i t t͡ɕ ɔː], แมป [m ɛː p], ปรีดี [p r iː d iː]; b (22,005 occurrences): burke [b u n k e ʔ], ครับ [kʰ r a b], บรรยง [b o n r a j o ŋ], บูทิค [b uː tʰ i k] | | t (43,221 occurrences): ปิดจอ [p i t t͡ɕ ɔː], สหรัฐ [s o h r a t], มาร์ช [m aː t], พัฒนา [pʰ a t n aː]; d (20,966 occurrences): ที่ใด [tʰ iː d a j], ปรีดี [p r iː d iː], โดร่า [d oː r aː], ระดับ [r a d a b] | 5 occurrences; no examples listed | k (56,925 occurrences): รักษา [r a k s aː], ก็คือ [k a kʰ ɯː ɔː], หกล้ม [h o k l o m], กันย์ [k a n] | ʔ (15,935 occurrences): ปะทะ [p a tʰ a ʔ], เอ้อ [ʔ ɤː], อุก [ʔ u k], burke [b u n k e ʔ] |
| Affricate | | | | t͡ɕ (16,598 occurrences): ปิดจอ [p i t t͡ɕ ɔː], โจโฉ [t͡ɕ oː t͡ɕʰ oː], ฝาจีบ [f aː t͡ɕ iː b], เอเจ [ʔ eː t͡ɕ eː] | | |
| Sibilant | | | s (27,622 occurrences): รักษา [r a k s aː], ซอย [s ɔː j], สหรัฐ [s o h r a t], สาลี่ [s aː l iː] | | | |
| Fricative | | f (2,979 occurrences): เลิฟ [l ɤ f], ไฟแรง [f a j r ɛː ŋ], สายไฟ [s aː j f a j], วูฟ? [w uː f a] | | | | h (26,144 occurrences): สหรัฐ [s o h r a t], หกล้ม [h o k l o m], เหลน [h eː l o n], หนอน [h a n ɔː n] |
| Approximant | w (31,760 occurrences): นูโว [n uː w oː], วิชัย [w i t͡ɕʰ a j], วอลต์ [w ɔː n], วูฟ? [w uː f a] | | | j (67,850 occurrences): ที่ใด [tʰ iː d a j], ซอย [s ɔː j], ไฟแรง [f a j r ɛː ŋ], วิชัย [w i t͡ɕʰ a j] | | |
| Trill | | | r (43,536 occurrences): รักษา [r a k s aː], สหรัฐ [s o h r a t], ไฟแรง [f a j r ɛː ŋ], ปรีดี [p r iː d iː] | | | |
| Lateral | | | l (25,565 occurrences): เลิฟ [l ɤ f], สาลี่ [s aː l iː], หกล้ม [h o k l o m], กล๊อง [k l ɔː ŋ] | | | |
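The occurrence figures in these charts are presumably tallies of phone tokens. The source does not say whether they are computed over dictionary entries or over a corpus, so the sketch below simply counts every phone token in a list of pronunciations, using the four examples listed for the labial nasal as input. Note that a phone repeated within one pronunciation (m in มะแม) is counted each time; the `phone_counts` helper is a hypothetical illustration.

```python
from collections import Counter

# Sketch only: counts every phone token in a list of pronunciations.
# The source does not say what data its own counts were taken from.


def phone_counts(prons: list[str]) -> Counter:
    """Tally phone tokens; a phone repeated within a word is counted each time."""
    counts: Counter = Counter()
    for pron in prons:
        counts.update(pron.split())
    return counts


# Example pronunciations listed for the labial nasal in the consonant chart.
prons = ["tʰ ɔː m", "m a m ɛː", "m ɛː p", "m aː t"]
counts = phone_counts(prons)
print(counts["m"])  # 5
print(counts.most_common(2))
```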
Vowels
In cells that list both unrounded and rounded vowels, the unrounded vowels come first.

| Height | Front | Near-Front | Central | Near-Back | Back |
|---|---|---|---|---|---|
| Close | i (19,658 occurrences): ปิดจอ [p i t t͡ɕ ɔː], วิชัย [w i t͡ɕʰ a j], เคยสิ [kʰ eː j s i ʔ], บูทิค [b uː tʰ i k]; iː (33,787 occurrences): ที่ใด [tʰ iː d a j], สาลี่ [s aː l iː], ปรีดี [p r iː d iː], ตีสาม [t iː s aː m]; y (2 occurrences): fury [f u r y], kenny [k e n n y] | | | | ɯ (5,360 occurrences): จึง [t͡ɕ ɯ ŋ], ถึง [tʰ ɯ ŋ], ตึก [t ɯ k], ฉึก [t͡ɕʰ ɯ k]; ɯː (13,130 occurrences): ก็คือ [k a kʰ ɯː ɔː], มะรืน [m a r ɯː n], ฮือ [h ɯː ɔː], ลืมไป [l ɯː m p a j]; u (13,088 occurrences): อุก [ʔ u k], burke [b u n k e ʔ], นพคุณ [n o p kʰ u n], กุฎิ [k u d i ʔ]; uː (15,779 occurrences): นูโว [n uː w oː], อยู่ [ɔː j uː], ของกู [kʰ ɔː ŋ k uː], ผู้ชม [pʰ uː t͡ɕʰ o m] |
| Close-Mid | e (63 occurrences): burke [b u n k e ʔ], เป๊ะ [p e ʔ], เอ๊ะ [ʔ e ʔ], เงะ [ŋ e ʔ]; eː (33,814 occurrences): เหลน [h eː l o n], เทพไท [tʰ eː p tʰ a j], เคยสิ [kʰ eː j s i ʔ], ทำเลย [tʰ a l eː j] | | | | ɤ (3,901 occurrences): เลิฟ [l ɤ f], เลิก [l ɤ k], เงิน [ŋ ɤ n], เริ่ม [r ɤ m]; ɤː (3,163 occurrences): เอ้อ [ʔ ɤː], เธอ [tʰ ɤː], มูเซอ [m uː s ɤː], เซอร์ [s ɤː]; o (28,530 occurrences): สหรัฐ [s o h r a t], หกล้ม [h o k l o m], เหลน [h eː l o n], ผู้ชม [pʰ uː t͡ɕʰ o m]; oː (6,466 occurrences): นูโว [n uː w oː], โจโฉ [t͡ɕ oː t͡ɕʰ oː], โธ่ [tʰ oː], โดร่า [d oː r aː] |
| Open-Mid | ɛ (3,202 occurrences): แฉะ [t͡ɕʰ ɛ ʔ], แพะ [pʰ ɛ ʔ], แกะ [k ɛ ʔ], และชา [l ɛ t͡ɕʰ aː]; ɛː (13,390 occurrences): มะแม [m a m ɛː], แมป [m ɛː p], ไฟแรง [f a j r ɛː ŋ], วีแชต [w iː t͡ɕʰ ɛː t] | | | | ɔ (272 occurrences): เกาะ [k ɔ ʔ], เงาะ [ŋ ɔ ʔ], เถาะ [tʰ ɔ ʔ]; ɔː (38,298 occurrences): ทอม [tʰ ɔː m], ปิดจอ [p i t t͡ɕ ɔː], ซอย [s ɔː j], ก็คือ [k a kʰ ɯː ɔː] |
| Open | a (194,035 occurrences): รักษา [r a k s aː], ปะทะ [p a tʰ a ʔ], ที่ใด [tʰ iː d a j], มะแม [m a m ɛː]; aː (85,654 occurrences): รักษา [r a k s aː], ชาน [t͡ɕʰ aː n], สาลี่ [s aː l iː], มาร์ช [m aː t]; a̯ (14,100 occurrences): เรียล [r iː a̯ n], เงียบ [ŋ iː a̯ b], เฮีย [h iː a̯], เลือด [l ɯː a̯ t] | | | | |