CMU Pronouncing Dictionary

The CMU Pronouncing Dictionary (also known as CMUdict) is an open-source pronouncing dictionary originally created by the Speech Group at Carnegie Mellon University (CMU) for use in speech recognition research.

CMUdict maps the orthographic form of English words to a phonetic transcription of their North American pronunciation. It is commonly used to generate pronunciation representations for automatic speech recognition (ASR), e.g. in the CMU Sphinx system, and for text-to-speech synthesis (TTS), e.g. in the Festival system. CMUdict can also be used as a training corpus for building statistical grapheme-to-phoneme (g2p) models[1] that generate pronunciations for words not yet included in the dictionary.

The most recent numbered release is 0.7b; it contains over 134,000 entries. An interactive lookup version is available.[2]

Database format

The database is distributed as a plain text file with one entry per line in the format "WORD  <pronunciation>", with a two-space separator between the two parts. If a word has multiple pronunciations, the variants are marked with a parenthesized number (e.g. WORD(1)). The pronunciation is encoded using a modified form of the ARPABET system, with stress marks of level 0, 1, or 2 added to vowels. A line-initial ;;; token indicates a comment. A derived format directly suitable for speech recognition engines is also available as part of the distribution; this format collapses stress distinctions, which are typically not used in ASR.
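The format described above is simple enough to parse by hand. The following is a minimal sketch in Python, not the distribution's own tooling: the function names (parse_line, strip_stress, load_entries) are hypothetical, and the sample lines are illustrative entries written in the described format rather than quotations from a particular release.

```python
import re
from collections import defaultdict

# Headword with an optional variant suffix such as "(1)".
HEADWORD = re.compile(r"^(.+?)(?:\((\d+)\))?$")


def parse_line(line):
    """Return (word, variant, phones) for an entry, or None for comments and blank lines."""
    line = line.strip()
    if not line or line.startswith(";;;"):
        return None
    # The headword and the pronunciation are separated by two spaces.
    head, sep, pron = line.partition("  ")
    if not sep:
        raise ValueError(f"missing two-space separator: {line!r}")
    word, variant = HEADWORD.match(head).groups()
    # An entry without a "(n)" suffix is treated as variant 0 (an assumption of this sketch).
    return word, int(variant or 0), pron.split()


def strip_stress(phones):
    """Drop the trailing stress digit (0, 1, or 2) from each phone."""
    return [phone.rstrip("012") for phone in phones]


def load_entries(lines):
    """Group all pronunciations by headword, in file order."""
    entries = defaultdict(list)
    for line in lines:
        parsed = parse_line(line)
        if parsed is not None:
            word, _variant, phones = parsed
            entries[word].append(phones)
    return dict(entries)


# Illustrative input in the described format.
sample = [
    ";;; a comment line",
    "READ  R EH1 D",
    "READ(1)  R IY1 D",
]

entries = load_entries(sample)
print(entries["READ"])
print(strip_stress(entries["READ"][1]))
```

Here strip_stress approximates what the derived, stress-free format does to each pronunciation; the actual derived file in the distribution may differ in other details.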

The following is a table of the phonemes used by the CMU Pronouncing Dictionary.[2]

Vowels
ARPABET  Respelling  IPA    Example
AA       ah          ɑ      odd
AE       a           æ      at
AH0      ə           ə      about
AH       uh          ʌ      hut
AO       aw          ɔ      ought, story
AW       ow          aʊ     cow
AY       eye         aɪ     hide
EH       eh          ɛ      Ed
ER       ur, ər      ɝ, ɚ   hurt
EY       ay          eɪ     ate
IH       i, ih       ɪ      it
IY       ee          i      eat
OW       oh          oʊ     oat
OY       oy          ɔɪ     toy
UH       uu          ʊ      hood
UW       oo          u      two

Stress
ARPABET  Description
0        No stress
1        Primary stress
2        Secondary stress

Consonants
ARPABET  Respelling  IPA    Example
B        b           b      be
CH       ch, tch     tʃ     cheese
D        d           d      dee
DH       dh          ð      thee
F        f           f      fee
G        g           ɡ      green
HH       h           h      he
JH       j           dʒ     gee
K        k           k      key
L        l           l      lee
M        m           m      me
N        n           n      knee
NG       ng          ŋ      ping
P        p           p      pee
R        r           ɹ      read
S        s, ss       s      sea
SH       sh          ʃ      she
T        t           t      tea
TH       th          θ      theta
V        v           v      vee
W        w, wh       w      we
Y        y           j      yield
Z        z           z      zee
ZH       zh          ʒ      seizure
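
The phoneme inventory lends itself to a lookup table. The sketch below converts an ARPABET pronunciation to a rough IPA string using the usual IPA equivalents of the ARPABET symbols. The names ARPABET_TO_IPA and to_ipa are hypothetical, stress marks are dropped rather than rendered, and mapping unstressed ER to ɚ is an assumption of this sketch.

```python
# ARPABET symbol (without stress digit) -> IPA.
ARPABET_TO_IPA = {
    # Vowels
    "AA": "ɑ", "AE": "æ", "AH": "ʌ", "AO": "ɔ", "AW": "aʊ", "AY": "aɪ",
    "EH": "ɛ", "ER": "ɝ", "EY": "eɪ", "IH": "ɪ", "IY": "i", "OW": "oʊ",
    "OY": "ɔɪ", "UH": "ʊ", "UW": "u",
    # Consonants
    "B": "b", "CH": "tʃ", "D": "d", "DH": "ð", "F": "f", "G": "ɡ",
    "HH": "h", "JH": "dʒ", "K": "k", "L": "l", "M": "m", "N": "n",
    "NG": "ŋ", "P": "p", "R": "ɹ", "S": "s", "SH": "ʃ", "T": "t",
    "TH": "θ", "V": "v", "W": "w", "Y": "j", "Z": "z", "ZH": "ʒ",
}

# Unstressed variants that use a different IPA symbol.
# AH0 -> ə follows the table; ER0 -> ɚ is an assumption of this sketch.
UNSTRESSED = {"AH": "ə", "ER": "ɚ"}


def to_ipa(phones):
    """Convert a list of ARPABET phones (e.g. ["HH", "AH0", "L", "OW1"]) to an IPA string."""
    out = []
    for phone in phones:
        base = phone.rstrip("012")      # symbol without its stress digit
        stress = phone[len(base):]      # "", "0", "1", or "2"
        if stress == "0" and base in UNSTRESSED:
            out.append(UNSTRESSED[base])
        else:
            out.append(ARPABET_TO_IPA[base])
    return "".join(out)


print(to_ipa(["HH", "AH0", "L", "OW1"]))
```

An unknown symbol raises KeyError, which is usually preferable to silently emitting a wrong transcription.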

History

Version               Release date[3]      License
0.1                   16 September 1993    Public Domain
0.2                   10 March 1994        Public Domain
0.3                   28 September 1994    Public Domain
0.4                   8 November 1995      Public Domain
0.5                   No public release    Public Domain
0.6                   11 August 1998       Public Domain
0.7                   No public release    Public Domain
0.7a                  18 February 2008     2-clause BSD
0.7b                  19 November 2014[4]  2-clause BSD
GitHub (unversioned)  26 May 2021          2-clause BSD

Applications

  • The Unifon converter is based on the CMU Pronouncing Dictionary.
  • The Natural Language Toolkit contains an interface to the CMU Pronouncing Dictionary.
  • The Carnegie Mellon Logios[5] tool incorporates the CMU Pronouncing Dictionary.
  • PronunDict, a pronunciation dictionary of American English, uses the CMU Pronouncing Dictionary as its data source. Pronunciation is transcribed in IPA symbols. This dictionary also supports searching by pronunciation.
  • Some singing voice synthesizer software, such as CeVIO Creative Studio and Synthesizer V, uses a modified version of the CMU Pronouncing Dictionary to synthesize English singing voices.
  • Transcriber, a tool for full-text phonetic transcription, uses the CMU Pronouncing Dictionary.
  • 15.ai, a real-time text-to-speech tool using artificial intelligence, uses the CMU Pronouncing Dictionary.

References

  1. Script error: No such module "citation/CS1".
  2. a b Script error: No such module "citation/CS1".
  3. Template:Cite FTP
  4. Script error: No such module "citation/CS1".
  5. Script error: No such module "citation/CS1".
