Center of Speech Technology, Tsinghua University
Slide 11: Probabilistic Pronunciation Modeling
❑ Theory
  ❖ Recognizer goal
    ➢ K* = argmax_K P(K|A) = argmax_K P(A|K) P(K)
      ▪ P(A|K): AM, P(K): LM
  ❖ Applying the independence assumption
    ➢ P(A|K) = ∏_n P(a_n|k_n)
  ❖ Pronunciation modeling part, via introducing the surface form s
    ➢ P(a|k) = ∑_s P(a|k,s) P(s|k)
      ▪ P(a|k,s): Refined AM, P(s|k): Output Prob.
  ❖ Symbols
    ➢ a: Acoustic signal, k: IF, s: GIF
    ➢ A, K, S: corresponding strings
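The marginalization over surface forms on this slide can be illustrated with a small numeric sketch. All unit names and probability values below are hypothetical toy numbers chosen for illustration; they do not come from the slides, and the function name `acoustic_likelihood` is likewise an assumption.

```python
# Toy illustration of P(a|k) = sum_s P(a|k,s) * P(s|k).
# All names and numbers are hypothetical, not taken from the slides.

# P(s|k): output probability of surface form s given base form k.
output_prob = {
    "k1": {"s1": 0.7, "s2": 0.3},
}

# P(a|k,s): refined acoustic likelihood of one observed signal a.
refined_am = {
    ("k1", "s1"): 0.02,
    ("k1", "s2"): 0.10,
}


def acoustic_likelihood(k):
    """Return P(a|k) by summing P(a|k,s) * P(s|k) over all surface forms s."""
    return sum(refined_am[(k, s)] * p_s for s, p_s in output_prob[k].items())


p = acoustic_likelihood("k1")  # 0.02 * 0.7 + 0.10 * 0.3
```

In this sketch the less likely surface form `s2` still contributes most of the total, because its refined acoustic likelihood is higher; that is the effect the pronunciation model is meant to capture.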
Slide 12: Refined Acoustic Modeling (RAM)
❑ P(a|k, s) -- RAM
  ❖ It cannot be trained directly, the solutions could be:
    ➢ Use P(a|k) instead -- IF modeling
    ➢ Use P(a|s) instead -- GIF modeling
    ➢ Adapt P(a|k) to P(a|k, s) -- B-GIF modeling
    ➢ Adapt P(a|s) to P(a|k, s) -- S-GIF modeling ☺
      ▪ IF-GIF transcription should be generated from the IF and GIF transcriptions
      ▪ Need more data, but the data amount is fixed
      ▪ Using adaptation

IF (Syllable)    i1      f1      i2      f2      i3        f3      …
GIF (SAMPA-C)    gi1     gf1     gi2     gi3     gif4      gf3     …
Actual IF        i1      f1      i2      i3      if4       f3      …
IF-GIF           i1-gi1  f1-gf1  i2-gi2  i3-gi3  if4-gif4  f3-gf3  …
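The IF-GIF row of the table can be reproduced with a minimal sketch that pairs each unit of the actual IF transcription with the GIF unit at the same position. It assumes the two transcriptions are already aligned one-to-one, which the slide does not state explicitly; the function name `make_if_gif` is hypothetical, and only the six units shown on the slide are used.

```python
def make_if_gif(actual_if, gif):
    """Pair each actual IF unit with its aligned GIF unit as 'IF-GIF' labels.

    Assumes both sequences are already aligned one-to-one (an assumption
    made for this sketch, not something the slide specifies).
    """
    if len(actual_if) != len(gif):
        raise ValueError("IF and GIF transcriptions must have the same length")
    return [f"{k}-{s}" for k, s in zip(actual_if, gif)]


# The six units shown on the slide (the elided remainder is left out).
actual_if = ["i1", "f1", "i2", "i3", "if4", "f3"]
gif = ["gi1", "gf1", "gi2", "gi3", "gif4", "gf3"]

labels = make_if_gif(actual_if, gif)
```

Each resulting label names one P(a|k, s) model, so the number of distinct labels in the corpus shows how thinly the fixed amount of training data would be spread without adaptation.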