English MFA dictionary v3.1.0#

  • Maintainer: Montreal Forced Aligner

  • Language: English

  • Dialect: N/A

  • Phone set: MFA

  • Number of words: 42,352

  • Phones: a aj aw b c d e ej f h i j k kp l m n o ow p s t u v w z æ ç ð ŋ ɐ ɑ ɑː ɒ ɒː ɔ ɔj ɖ ə əw ɚ ɛ ɛː ɜ ɜː ɝ ɟ ɟʷ ɡ ɡʷ ɪ ɫ ɲ ɹ ɾ ʃ ʈ ʈʲ ʈʷ ʉ ʉː ʊ ʋ ʎ ʒ θ

  • License: CC BY 4.0

  • Compatible MFA version: v3.1.0

  • Citation:

@techreport{mfa_english_mfa_dictionary_2024,
	author={McAuliffe, Michael and Sonderegger, Morgan},
	title={English MFA dictionary v3.1.0},
	address={\url{https://mfa-models.readthedocs.io/pronunciation dictionary/English/English MFA dictionary v3_1_0.html}},
	year={2024},
	month={Jun},
}

Installation#

Install from the MFA command line:

mfa model download dictionary english_mfa

Or download from the release page.

The dictionary available from the release page and from command line installation has pronunciation and silence probabilities estimated as part of acoustic model training (see Silence probability format and training pronunciation probabilities for more information). If you would like to use the version of this dictionary without probabilities, please see the [plain dictionary](https://raw.githubusercontent.com/MontrealCorpusTools/mfa-models/main/dictionary/english/mfa/English MFA dictionary v3_1_0.dict).
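To illustrate the difference between the two versions, here is a minimal sketch that separates the probability columns from an entry. It assumes each line is tab-separated, with the word first, the space-separated phones last, and any numeric probability columns in between; that layout is an assumption here, so check it against the format documentation referenced above. `split_entry` and the sample lines are hypothetical and are not taken from the dictionary.

```python
def split_entry(line: str):
    """Split one dictionary line into (word, probabilities, pronunciation).

    Assumed layout (not confirmed by this page): tab-separated fields,
    word first, phones last, numeric probability columns in between.
    """
    fields = line.rstrip("\n").split("\t")
    word, pronunciation = fields[0], fields[-1]
    # Everything between the word and the phones is treated as a probability.
    probabilities = [float(f) for f in fields[1:-1]]
    return word, probabilities, pronunciation


# Hypothetical sample lines, for illustration only.
with_probs = "hello\t0.99\t0.12\t1.05\t0.98\th ə l ow"
plain = "hello\th ə l ow"

for line in (with_probs, plain):
    word, probs, pron = split_entry(line)
    # Re-joining word and phones gives the probability-free form of the entry.
    print("\t".join([word, pron]), probs)
```

Under that assumption, a line from the plain dictionary simply yields an empty probability list.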

Intended use#

This dictionary is intended for forced alignment of English transcripts.

This dictionary uses the MFA phone set for English, and was used in training the English MFA acoustic model. Pronunciations can be added on top of the dictionary, as long as no additional phones are introduced.
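Before adding pronunciations, it can help to check that they stay within the phone set. The sketch below copies the inventory from the Phones field at the top of this page and reports any symbol outside it. The helper `unknown_phones` and the sample pronunciations are hypothetical; they only illustrate the check.

```python
# Phone inventory copied from the "Phones" field of this page.
MFA_PHONES = set(
    "a aj aw b c d e ej f h i j k kp l m n o ow p s t u v w z æ ç ð ŋ ɐ ɑ ɑː "
    "ɒ ɒː ɔ ɔj ɖ ə əw ɚ ɛ ɛː ɜ ɜː ɝ ɟ ɟʷ ɡ ɡʷ ɪ ɫ ɲ ɹ ɾ ʃ ʈ ʈʲ ʈʷ ʉ ʉː ʊ ʋ ʎ ʒ θ".split()
)


def unknown_phones(pronunciation: str) -> list[str]:
    """Return the phones of a space-separated pronunciation not in MFA_PHONES."""
    return [p for p in pronunciation.split() if p not in MFA_PHONES]


# Hypothetical pronunciations to add (not taken from the dictionary).
candidates = {
    "cause": "k ə z",  # uses only phones from the inventory
    "loch": "l ɒ x",   # "x" is not in the inventory
}

for word, pron in candidates.items():
    bad = unknown_phones(pron)
    status = "ok" if not bad else "unknown phones: " + " ".join(bad)
    print(f"{word}\t{pron}\t{status}")
```

An entry flagged this way would need to be rewritten with phones from the inventory before it is added.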

Performance Factors#

When trying to get better alignment accuracy, adding pronunciations is generally helpful, especially for different styles and dialects. The most impactful improvements will generally be seen when adding reduced variants that involve deleting segments/syllables common in spontaneous speech. Alignment must include all phones specified in the pronunciation of a word, and each phone has a minimum duration (by default 10ms). If a speaker pronounces a multisyllabic word with just a single syllable, it can be hard for MFA to fit all the segments in, so it will lead to alignment errors on adjacent words as well.
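The minimum-duration constraint can be made concrete with a little arithmetic: a pronunciation with N phones needs at least N times the minimum phone duration. The sketch below uses the 10 ms default mentioned above. The function names and both pronunciations of "probably" are hypothetical illustrations, not entries from the dictionary.

```python
MIN_PHONE_MS = 10  # default minimum phone duration stated above


def min_word_duration_ms(pronunciation: str, min_phone_ms: int = MIN_PHONE_MS) -> int:
    """Shortest interval, in ms, that can hold every phone of the pronunciation."""
    return len(pronunciation.split()) * min_phone_ms


def fits(pronunciation: str, spoken_ms: int) -> bool:
    """True if all phones can be placed within the spoken interval."""
    return spoken_ms >= min_word_duration_ms(pronunciation)


# Hypothetical full and reduced variants of "probably".
full = "p ɹ ɒ b ə b l i"  # 8 phones
reduced = "p ɹ ɒ l i"     # 5 phones
spoken_ms = 60            # a very short, reduced production

for label, pron in (("full", full), ("reduced", reduced)):
    print(label, min_word_duration_ms(pron), fits(pron, spoken_ms))
```

With a 60 ms token, the full variant needs at least 80 ms and cannot fit, while the reduced variant needs only 50 ms. This is why adding reduced variants tends to help with spontaneous speech.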

Ethical considerations#

Deploying any Speech-to-Text model into any production setting has ethical implications. You should consider these implications before use.

Demographic Bias#

You should assume every machine learning model has demographic bias unless proven otherwise. For pronunciation dictionaries, it is often the case that transcription accuracy and lexicon coverage are better for the prestige variety modeled in this dictionary than for other varieties. If you are using this dictionary in production, you should acknowledge this as a potential issue.

IPA Charts#

Consonants#

Within each cell, obstruent symbols on the left are unvoiced and those on the right are voiced.

Columns (place of articulation): Labial, Labiodental, Dental, Alveolar, Alveopalatal, Retroflex, Palatal, Velar, Glottal

Occurrences per phone, by manner of articulation:

  • Nasal: 7,893; 1,448; 1; 16,223; 2,725; 3,399

  • Stop, plain: 6,672; 320; 3; 5,067; 855; 656; 93; 8,341; 2,662; 45; 7,826; 1,929; 4,719; 847; 3; 1,727; 2,580; 204; 658; 31; 8,663; 188; 3,146; 36

  • Stop, aspirated: 484; 780; 816; 445

  • Affricate: 1,279; 2,246

  • Sibilant: 16,346; 8,015; 3,321; 109

  • Fricative: 3,475; 924; 2,578; 231; 325; 155; 480; 1,765

  • Approximant: 504; 2,194; 14,295; 590

  • Tap: 143

  • Lateral: 7,854; 3,204; 3,874

Vowels#

Within each cell, vowel symbols on the left are unrounded and those on the right are rounded.

Columns (backness): Front, Near-Front, Central, Near-Back, Back

Occurrences per phone, by height:

  • Close: 11,109; 2,844; 528; 1,402; 323; 735; 14,178; 1,180

  • Close-Mid: 1,923; 1,174; 2,017; 262; 621; 892; 18,989; 1,139

  • Open-Mid: 9,479; 217; 60; 618; 451; 2,180; 1,277; 2,583

  • Open: 11,153; 590; 2,490; 1,211; 4,054; 1,366

Diphthongs#

  • aj

  • aw

  • ej

  • ow

  • ɔj

  • əw