Motivation Goal Knowledge Data Collection Workshop Conclusion I SDPBMM Conclusion II

Research Goal
❑ To develop a general framework for modeling three kinds of variability in dialectal Chinese ASR tasks:
  ❖ Phonetic variability,
  ❖ Lexical variability, and
  ❖ Pronunciation variability.
❑ To find suitable methods for modifying the baseline PTH recognizer to obtain a dialectal Chinese recognizer for the specific dialect of interest, using:
  ❖ dialect-related knowledge (syllable mapping, cross-dialect synonyms, …), and
  ❖ training/adaptation data (in relatively small quantities).
❑ Expectation: the resulting recognizer should also work for PTH; in other words, it should perform well on a mixture of PTH and dialectal Chinese.
❑ This proposal was selected as one of three projects for the 2003 Johns Hopkins University Summer Workshop from tens of proposals collected from universities and companies around the world; the workshop was postponed to 2004 due to SARS.
Dialectal Chinese Speech Recognition Framework
[Diagram: Standard Chinese Speech Recognizer + Dialectal Chinese Related Knowledge & Resources → Dialectal Chinese Speech Recognizer]
❑ For practical reasons, during the summer we focused on only one specific dialect, the Wu dialect (Shanghai area); the target language was Wu dialectal Chinese (WDC for short).
❑ Why the Wu dialect?
  ❖ Population: more than 70 million people speak the Wu dialect, the second most widely spoken dialect in China.
  ❖ Economy: Shanghai is one of the most advanced cities in China.
  ❖ The Wu dialect is a fully developed language:
    ▪ The syntax of the Wu dialect is very complex.
    ▪ Its vocabulary is even larger than that of Mandarin.
    ▪ Many literary masterpieces were historically influenced by the Wu dialect.

              Wu    Mandarin    Cantonese
  Phoneme#    50    37          <33
Useful Dialect-Related Knowledge
❑ Chinese Syllable Mapping (CSM)
  ❖ The CSM is dialect-related.
  ❖ Two types:
    ▪ Word-independent CSM: e.g. in Southern Chinese, Initial mappings include zh→z, ch→c, sh→s, n→l, and so on, and Final mappings include eng→en, ing→in, and so on.
    ▪ Word-dependent CSM: e.g. in dialectal Chuan Chinese, the pinyin 'guo2' is changed into 'gui0' in the word '中国' (China), but only the tone is changed in the word '过去' (past).
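The two CSM types above can be sketched as lookup tables applied to toned pinyin syllables. This is a minimal illustration, not the workshop's implementation: the rule tables contain only the examples listed on the slide, and the names `INITIAL_MAP`, `FINAL_MAP`, `WORD_DEPENDENT`, `split_syllable`, and `map_syllable` are hypothetical. The syllable splitter is a simplification that treats `y` and `w` as initials.

```python
# Word-independent CSM rules, limited to the Southern Chinese examples on the slide.
INITIAL_MAP = {"zh": "z", "ch": "c", "sh": "s", "n": "l"}
FINAL_MAP = {"eng": "en", "ing": "in"}

# Word-dependent CSM: (word, standard syllable) -> dialectal syllable.
# Only the Chuan example from the slide is listed; the table layout is an assumption.
WORD_DEPENDENT = {("中国", "guo2"): "gui0"}


def split_syllable(syllable):
    """Split a non-empty toned pinyin syllable into (initial, final, tone)."""
    tone = syllable[-1] if syllable[-1].isdigit() else ""
    body = syllable[:-1] if tone else syllable
    # Two-letter initials must be checked before single-letter ones.
    for initial in ("zh", "ch", "sh"):
        if body.startswith(initial):
            return initial, body[len(initial):], tone
    if body and body[0] in "bpmfdtnlgkhjqxrzcsyw":
        return body[0], body[1:], tone
    return "", body, tone  # zero-initial syllable


def map_syllable(syllable, word=None):
    """Map a standard syllable to its dialectal form.

    A word-dependent rule, if one exists for (word, syllable), takes
    precedence over the word-independent Initial/Final rules.
    """
    if word is not None and (word, syllable) in WORD_DEPENDENT:
        return WORD_DEPENDENT[(word, syllable)]
    initial, final, tone = split_syllable(syllable)
    return INITIAL_MAP.get(initial, initial) + FINAL_MAP.get(final, final) + tone
```

For example, `map_syllable("sheng1")` applies both an Initial rule (sh→s) and a Final rule (eng→en), while `map_syllable("guo2", word="中国")` is resolved by the word-dependent table before any word-independent rule is tried.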
  ❖ The CSM is not exact. For any mapping A→B, the resulting pronunciation is usually not exactly B but something quite similar to B, more similar to B than to any other syllable.
    [Diagram: A maps into a cluster of variants B1, B2, B3, B4 around B.]
    ▪ Bi is a variation of B, such as nasalization, centralization, voiced, voiceless, rounding, syllabic, pharyngealization, or aspiration.
  ❖ The CSM could be N→1, 1→N, or crossed.
    [Diagram: Standard Chinese Syllable Set (kei, kuo, kui, ...) versus Chuan Dialect, with the examples ke in [克]服 and 上[课], and kuo, kui in [扩]大 and [魁]梧.]
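One way to check whether a mapping table is N→1, 1→N, or crossed is to store the CSM as a set of (standard, dialectal) pairs and count fan-in and fan-out. The sketch below is illustrative only: `classify`, `INITIAL_PAIRS`, and `SYLLABLE_PAIRS` are hypothetical names, the identity pair z→z is an assumption (standard z is taken to stay z), and the toneless guo→gui / guo→guo pairs are derived from the word-dependent Chuan example.

```python
from collections import defaultdict

# (standard unit, dialectal unit) pairs.
# zh->z is from the slide; z->z is an assumed identity mapping.
INITIAL_PAIRS = [("zh", "z"), ("z", "z")]
# Toneless forms of the Chuan example: guo becomes gui in 中国 but stays guo in 过去.
SYLLABLE_PAIRS = [("guo", "gui"), ("guo", "guo")]


def classify(pairs):
    """Return (one_to_n, n_to_one) for a CSM given as (standard, dialectal) pairs.

    one_to_n: standard units that map to more than one dialectal unit.
    n_to_one: dialectal units reached from more than one standard unit.
    A table where both are non-empty contains crossed mappings.
    """
    forward = defaultdict(set)   # standard unit -> dialectal units
    backward = defaultdict(set)  # dialectal unit -> standard units
    for standard, dialectal in pairs:
        forward[standard].add(dialectal)
        backward[dialectal].add(standard)
    one_to_n = {s: sorted(d) for s, d in forward.items() if len(d) > 1}
    n_to_one = {d: sorted(s) for d, s in backward.items() if len(s) > 1}
    return one_to_n, n_to_one
```

With these tables, `classify(INITIAL_PAIRS)` reports an N→1 merger onto dialectal `z`, and `classify(SYLLABLE_PAIRS)` reports a 1→N split of standard `guo`.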