Dialectal Chinese Speech Recognition
Thomas Fang Zheng
Center for Speech and Language Technologies, Tsinghua University
Aug. 24, 2007 @ Cambridge University, UK
Outline
❑ Motivation
❑ Dialectal Chinese database collection
  ❖ Wu
  ❖ Min
  ❖ Chuan
❑ Approaches
  ❖ Chinese syllable mapping
  ❖ Lexicon adaptation
  ❖ State-dependent phoneme-based model merging (SDPBMM)
  ❖ Integration of SDPBMM with adaptation
❑ Remarks
Motivation
❑ Chinese ASR faces an issue that is more severe than in any other language: dialect.
❑ There are 8 major dialectal regions in addition to Mandarin (Northern China):
  ❖ Wu (Southern Jiangsu, Zhejiang, and Shanghai);
  ❖ Yue (Guangdong, Hong Kong, Nanning Guangxi);
  ❖ Min (Fujian, Shantou Guangdong, Haikou Hainan, Taipei Taiwan);
  ❖ Hakka (Meixian Guangdong, Hsin-chu Taiwan);
  ❖ Gan (Jiangxi);
  ❖ Xiang (Hunan);
  ❖ Hui (Anhui);
  ❖ Jin (Shanxi, Hohhot Inner Mongolia).
❑ These can be further divided into over 40 sub-categories.
[Figure: Map of Chinese dialects (中国汉语方言图)]
❑ Chinese dialects share the same written language:
  ❖ The same Chinese pinyin set (canonically),
  ❖ The same Chinese character set (canonically), and
  ❖ The same vocabulary (canonically).
❑ Standard Chinese (known as Putonghua, or PTH) is widely spoken across most regions of China.
❑ However, speech is strongly influenced by the native dialects. Most Chinese people speak both standard Chinese and their own dialect, resulting in dialectal Chinese: Putonghua influenced by the native dialect.
❑ In dialectal Chinese:
  ❖ Word usage, pronunciation, syntax, and grammar vary depending on the speaker's dialect.
❑ ASR relies to a great extent on the consistent pronunciation and usage of words within a language.
  ❖ ASR systems built to process PTH perform poorly for the great majority of the population.