❑ Lexicon
❖ Linguists estimate the vocabulary similarity rate between PTH (Putonghua, standard Chinese) and the Wu dialect at about 60–70%.
❖ A dialect-related lexicon contains two parts:
▪ a common part shared by standard Chinese and most dialectal Chinese languages (over 50k words), and
▪ a dialect-related part (several hundred words).
❖ In this lexicon:
▪ each word has one pinyin string for its standard Chinese pronunciation and a representation of its dialectal Chinese pronunciation, and
▪ each dialect-related word corresponds to a word in the common part with the same meaning.
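The two-part lexicon described above can be sketched as a small Python data structure. This is a minimal sketch under assumptions: the class and method names (`LexEntry`, `DialectLexicon`, `pth_equivalent`) are hypothetical, and because the slide does not specify how dialectal pronunciations are written, the `dialect_pron` values are placeholders.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class LexEntry:
    word: str                            # written form in Chinese characters
    pinyin: str                          # standard Chinese (PTH) pronunciation
    dialect_pron: str                    # dialectal pronunciation (placeholder notation, assumed)
    common_equiv: Optional[str] = None   # set only for dialect-related words


@dataclass
class DialectLexicon:
    common: Dict[str, LexEntry] = field(default_factory=dict)   # shared part (50k+ words)
    dialect: Dict[str, LexEntry] = field(default_factory=dict)  # dialect-related part

    def add_common(self, entry: LexEntry) -> None:
        self.common[entry.word] = entry

    def add_dialect(self, entry: LexEntry) -> None:
        # Every dialect-related word must point to a common-part word with the same meaning.
        if entry.common_equiv not in self.common:
            raise KeyError(f"no common-part equivalent for {entry.word!r}")
        self.dialect[entry.word] = entry

    def pth_equivalent(self, word: str) -> str:
        # Dialect-related words map to their common-part counterpart;
        # any other word is returned unchanged.
        entry = self.dialect.get(word)
        return entry.common_equiv if entry is not None else word


lex = DialectLexicon()
lex.add_common(LexEntry("做饭", "zuo4 fan4", "<Wu pron. of 做饭>"))
lex.add_dialect(LexEntry("烧饭", "shao1 fan4", "<Wu pron. of 烧饭>", common_equiv="做饭"))
print(lex.pth_equivalent("烧饭"))  # prints 做饭
```

Validating `common_equiv` on insertion enforces the slide's constraint that each dialect-related word has a same-meaning counterpart in the common part.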
❑ Language
❖ Although dialect texts are difficult to collect, dialect-related lexical entry replacement rules can be learned in advance; therefore
❖ language post-processing or language model adaptation techniques can be adopted.
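One simple way to turn learned replacement rules into language model adaptation is to move part of each PTH word's probability mass to its dialectal replacement. The sketch below does this for a toy unigram model; the function name `adapt_unigram`, the replacement probability, and all numbers are hypothetical, and a practical system would apply such rules within an n-gram or other context-dependent model rather than to unigrams.

```python
from typing import Dict, List, Tuple


def adapt_unigram(probs: Dict[str, float],
                  rules: List[Tuple[str, str, float]]) -> Dict[str, float]:
    """Shift probability mass from PTH words to their dialectal replacements.

    Each rule is (pth_word, dialect_word, p), where p is an assumed, learned
    probability that a Wu speaker uses dialect_word where PTH text has pth_word.
    Rules are applied in order; the total probability mass is preserved.
    """
    adapted = dict(probs)
    for pth_word, dia_word, p in rules:
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"replacement probability out of range: {p}")
        moved = adapted.get(pth_word, 0.0) * p
        adapted[pth_word] = adapted.get(pth_word, 0.0) - moved
        adapted[dia_word] = adapted.get(dia_word, 0.0) + moved
    return adapted


# Toy PTH unigram probabilities (illustrative numbers only).
pth_lm = {"我": 0.3, "做饭": 0.2, "给": 0.2, "你": 0.2, "吃": 0.1}
wu_lm = adapt_unigram(pth_lm, [("做饭", "烧饭", 0.5)])
print(wu_lm["做饭"], wu_lm["烧饭"])
```

Because mass is only moved between words, the adapted model still sums to one without renormalization.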
❖ Dialectal words substitute for some words (… w1 w2 w3 … becomes … w1 w2′ w3 …):
▪ 我 做饭 给 你 吃 (PTH) → 我 烧饭 给 你 吃 (Wu), "I'll cook for you"
❖ Word-order changes (… w1 w2 w3 … becomes … w1 w3 w2 …):
▪ 你 先 走 (PTH) → 你 走 先 (Wu), "You go first"
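Both kinds of difference, word substitution and word-order change, can be expressed as rewrite rules on word-segmented text. This sketch assumes hand-written rule tables; `SUBST`, `SWAPS`, and `apply_rules` are hypothetical names, not taken from the source. It turns the PTH examples into their Wu forms, and inverting the tables (烧饭 → 做饭, swapping 走 先 back to 先 走) gives the opposite direction, as would be needed to post-process dialectal text into standard Chinese.

```python
from typing import Dict, List, Tuple

# Hypothetical rule tables built from the two examples.
SUBST: Dict[str, str] = {"做饭": "烧饭"}          # PTH word -> Wu dialectal word
SWAPS: List[Tuple[str, str]] = [("先", "走")]     # adjacent PTH pair to reverse


def apply_rules(tokens: List[str],
                subst: Dict[str, str],
                swaps: List[Tuple[str, str]]) -> List[str]:
    # 1) Lexical substitution: w2 -> w2'
    out = [subst.get(t, t) for t in tokens]
    # 2) Word-order change: adjacent pair (w2, w3) -> (w3, w2)
    swap_set = set(swaps)
    i = 0
    while i < len(out) - 1:
        if (out[i], out[i + 1]) in swap_set:
            out[i], out[i + 1] = out[i + 1], out[i]
            i += 2  # skip the swapped pair so it is not reconsidered
        else:
            i += 1
    return out


print(" ".join(apply_rules(["我", "做饭", "给", "你", "吃"], SUBST, SWAPS)))  # 我 烧饭 给 你 吃
print(" ".join(apply_rules(["你", "先", "走"], SUBST, SWAPS)))              # 你 走 先
```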
AM: acoustic model; LM: language model
❖ AM0 = AM for standard Chinese
❖ AM1 = AM with accent
❖ AM2 = AM with dialect
❖ LM0 = LM for standard Chinese
❖ LM1 = LM with dialectal lexicon
❖ LM2 = LM with dialectal lexicon/syntax
[Figure: AM0–AM2 set against LM0–LM2, ranging from standard Chinese to dialect; one region is marked "Our focus" and another "Seldom-seen in dialectal Chinese".]
❑ Database Collection
Data creation for the WDC database:
❖ e-Dictionary: PTH words and Wu dialect words, with PTH pronunciations, Wu dialect pronunciations, PTH synonyms, and miscellaneous information
❖ IF & syllable set definition: IFs/GIFs, syllables, and C-Chars, with Wu dialect and PTH pronunciations
❖ Database collection: speech and transcription
▪ read speech: PTH words only, or PTH + Wu words
▪ spontaneous speech: topics
IF: a Chinese Initial or Final; GIF: generalized IF; PTH: Putonghua (standard Chinese); WDC: Wu Dialectal Chinese