Center of Speech Technology, Tsinghua University
Slide 6. SAMPA-C: Machine Readable IPA
❑ Phonologic Consonants - 23
❑ Phonologic Vowels - 9
❑ Initials - 21
❑ Finals - 38
❑ Retroflexed finals - 38
❑ Tones and Silences
❑ Sound Changes
❑ Spontaneous Phenomenon Labels
Slide 7. Key Points in PM (1)
❑ Choosing and generating the speech recognition unit (SRU) set
  ❖ Chosen so as to describe the phone changes and sound changes well
  ❖ Could be syllable, semi-syllable, or INITIAL/FINAL
❑ Constructing a multi-pronunciation lexicon (MPL)
  ❖ A syllable-to-SRU lexicon that reflects the relation between the grammatical units and the acoustic models
❑ Acoustically modeling spontaneous speech
  ❖ Theoretical framework
  ❖ CD modeling; confusion matrix; data-driven
Slide 8. Key Points in PM (2)
❑ Customizing the decoding algorithm according to the new lexicon
  ❖ Improved time-synchronous search algorithm to reduce the path expansion (caused by CD modeling)
  ❖ A*-based tree-trellis search algorithm to score multiple pronunciation variations simultaneously in the path
❑ Modifying the statistical language model
  $\hat{W} = \arg\max_{W} P(X \mid W)\, P(W)$
  $\hat{W} = \arg\max_{W = \mathrm{Baseform}(V)} P(X \mid V)\, P(V)$
  $\hat{W} = \arg\max_{W = \mathrm{Baseform}(V)} P(X \mid V)\, P(V \mid W)\, P(W)$
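The step from the second criterion to the third can be read as follows, assuming Baseform is a deterministic mapping from a surface form V to its base form W: then P(V) = P(V, W) = P(V | W) P(W), so the pronunciation probability P(V | W) comes from the lexicon and P(W) from the ordinary language model. The sketch below scores a single unit this way in the log domain. It is only an illustration of the criterion, not the search algorithm from the slide. The function name `decode`, the acoustic log-likelihoods, the language model values, and the entries for "zhang" are hypothetical; only the two "chang" probabilities are taken from the lexicon example in this deck.

```python
import math

def decode(acoustic_logp, pron_prob, lm_prob):
    """Return (score, word, variant) maximizing
    log P(X|V) + log P(V|W) + log P(W) over all listed (W, V) pairs.

    acoustic_logp: surface form V -> log P(X|V)   (hypothetical values)
    pron_prob:     word W -> {surface form V -> P(V|W)}
    lm_prob:       word W -> P(W)
    """
    best = None
    for word, variants in pron_prob.items():
        for variant, p_v_given_w in variants.items():
            if variant not in acoustic_logp:
                continue  # no acoustic score available for this variant
            score = (acoustic_logp[variant]
                     + math.log(p_v_given_w)
                     + math.log(lm_prob[word]))
            if best is None or score > best[0]:
                best = (score, word, variant)
    return best

# Toy inputs. Everything except the two "chang" probabilities is made up.
acoustic_logp = {"ts`_h AN": -12.0, "ts`_v AN": -9.0, "ts` AN": -10.0}
pron_prob = {
    "chang": {"ts`_h AN": 0.7850, "ts`_v AN": 0.0280},
    "zhang": {"ts` AN": 0.9, "ts`_v AN": 0.1},  # hypothetical entries
}
lm_prob = {"chang": 0.6, "zhang": 0.4}

score, word, variant = decode(acoustic_logp, pron_prob, lm_prob)
print(word, variant, round(score, 3))
```

Note that the variant with the best acoustic score ("ts`_v AN") does not win here, because both words assign it a low pronunciation probability. That is the effect the P(V | W) term is meant to have.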
Slide 9. Establishment of Multi-Pron. Lexicon
❑ Two major approaches
  ❖ Defined by linguists and phoneticians
  ❖ Data-driven: confusion matrix, rewrite rules, decision tree ...
❑ Our method:
  ❖ Find all possible pronunciations in SAMPA-C from the database
  ❖ Reduce the size according to occurrence frequencies
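The two steps of the method above (collect every observed pronunciation, then prune by frequency) can be sketched as a count-and-threshold pass. This is a minimal sketch under stated assumptions: the slide does not say how the pruning is done, so the relative-frequency threshold `min_prob`, the function name `build_lexicon`, and the toy counts are all hypothetical.

```python
from collections import Counter, defaultdict

def build_lexicon(observations, min_prob=0.01):
    """Build a pruned multi-pronunciation lexicon.

    observations: iterable of (syllable, surface_form) pairs, e.g. taken
                  from a SAMPA-C transcribed database.
    min_prob:     assumed pruning rule; variants whose relative frequency
                  falls below this value are dropped.
    Returns {syllable: {surface_form: probability}}, renormalized so the
    kept variants of each syllable sum to 1.
    """
    counts = defaultdict(Counter)
    for syllable, surface in observations:
        counts[syllable][surface] += 1

    lexicon = {}
    for syllable, forms in counts.items():
        total = sum(forms.values())
        kept = [(s, c) for s, c in forms.most_common() if c / total >= min_prob]
        if not kept:
            # Always keep at least the most frequent variant.
            kept = forms.most_common(1)
        kept_total = sum(c for _, c in kept)
        lexicon[syllable] = {s: c / kept_total for s, c in kept}
    return lexicon

# Toy data with invented counts, for illustration only.
toy_observations = (
    [("chang", "ts`_h AN")] * 90
    + [("chang", "ts`_h_v AN")] * 9
    + [("chang", "ts_h AN")] * 1
)
toy_lexicon = build_lexicon(toy_observations, min_prob=0.05)
print(toy_lexicon)
```

A top-N rule per syllable would be an equally plausible reading of "reduce the size"; the threshold is used here only because it is the shortest to write down.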
Slide 10. Surface Form for IF and Syllable
❑ Learning pronunciations
  ❖ Definition of Generalized Initial-Finals (GIFs): collect all of them and choose the most frequent ones as GIFs.
    ➢ z  ts : canonical
    ➢ z  ts_v : voiced
    ➢ z  ts` : changed to ‘zh’
    ➢ z  ts`_v : changed to voiced ‘zh’
    ➢ e  7 : canonical
    ➢ e  7` : retroflexed or changed to ‘er’
    ➢ e  @ : changed
  ❖ Definition of Generalized Syllables (GSs), the lexicon: define them according to the GIF set.
    ➢ chang [0.7850] ts`_h AN
    ➢ chang [0.1215] ts`_h_v AN
    ➢ chang [0.0280] ts`_v AN
    ➢ chang [0.0187] <deletion> AN
    ➢ chang [0.0187] z` AN
    ➢ chang [0.0093] <deletion> iAN
    ➢ chang [0.0093] ts_h AN
    ➢ chang [0.0093] ts`_h UN
  ❖ Probabilistic lexicon: $P([\mathrm{GIF}_i]\ \mathrm{GIF}_f \mid \mathrm{Syllable})$
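As a rough illustration of how such a probabilistic lexicon might be held in memory, the sketch below parses the eight "chang" entries listed on this slide into (probability, initial, final) triples, treating `<deletion>` as a missing initial, and keeps the k most probable variants with renormalized probabilities. The entry strings are copied from the slide; the names `parse_lexicon` and `top_k`, the one-entry-per-line text format, and the top-k selection are assumptions made for the example.

```python
import re

# One entry per line: syllable [probability] initial final
ENTRY = re.compile(r"^(\S+)\s+\[([0-9.]+)\]\s+(\S+)\s+(\S+)$")

SLIDE_ENTRIES = """\
chang [0.7850] ts`_h AN
chang [0.1215] ts`_h_v AN
chang [0.0280] ts`_v AN
chang [0.0187] <deletion> AN
chang [0.0187] z` AN
chang [0.0093] <deletion> iAN
chang [0.0093] ts_h AN
chang [0.0093] ts`_h UN
"""

def parse_lexicon(text):
    """Parse lexicon text into {syllable: [(prob, initial, final), ...]}.

    A deleted initial is stored as None.
    """
    lexicon = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        match = ENTRY.match(line)
        if match is None:
            raise ValueError(f"unrecognized lexicon entry: {line!r}")
        syllable, prob, initial, final = match.groups()
        if initial == "<deletion>":
            initial = None
        lexicon.setdefault(syllable, []).append((float(prob), initial, final))
    return lexicon

def top_k(lexicon, syllable, k):
    """Return the k most probable variants, renormalized to sum to 1."""
    entries = sorted(lexicon[syllable], key=lambda e: e[0], reverse=True)[:k]
    total = sum(p for p, _, _ in entries)
    return [(p / total, initial, final) for p, initial, final in entries]

lexicon = parse_lexicon(SLIDE_ENTRIES)
for prob, initial, final in top_k(lexicon, "chang", 3):
    print(f"{prob:.4f}", initial, final)
```

The listed probabilities sum to slightly less than 1 (the values are rounded to four decimals on the slide), which is one reason the sketch renormalizes after selecting variants.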