❑ Wu Dialectal Chinese (WDC) Database Collection (1)
❖ Collection:
▪ Total: 11 hours, roughly half read (R) and half spontaneous (S):
– 100 Shanghai speakers × (3 min R + 3 min S) per speaker
– 10 Beijing speakers × 6 min S per speaker
▪ Read speech with well-balanced prompting sentences:
– Type I: each sentence contains PTH words only (5-6k)
– Type II: each sentence contains one or two of the most commonly used Wu dialectal words, with all other words in PTH
▪ Spontaneous speech on pre-defined talking topics:
– Conversations with a PTH speaker on a self-selected topic from: sports, policy/economy, entertainment, lifestyles, technology
▪ Balanced speakers (gender, age, education, PTH level, …)
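As a quick arithmetic check of the 11-hour total, the Python sketch below multiplies out the per-speaker durations listed above. It assumes the listed minutes are exact per-speaker targets; the constant names and `corpus_hours` are illustrative only, not part of the WDC tooling.

```python
# Per-speaker durations in minutes, as listed on the slide (assumed exact targets).
SHANGHAI_SPEAKERS = 100
SHANGHAI_READ_MIN = 3
SHANGHAI_SPONT_MIN = 3
BEIJING_SPEAKERS = 10
BEIJING_SPONT_MIN = 6


def corpus_hours():
    """Return (read_hours, spontaneous_hours, total_hours) for the planned collection."""
    read_min = SHANGHAI_SPEAKERS * SHANGHAI_READ_MIN
    spont_min = (SHANGHAI_SPEAKERS * SHANGHAI_SPONT_MIN
                 + BEIJING_SPEAKERS * BEIJING_SPONT_MIN)
    return read_min / 60, spont_min / 60, (read_min + spont_min) / 60


if __name__ == "__main__":
    read_h, spont_h, total_h = corpus_hours()
    print(f"read: {read_h:.1f} h, spontaneous: {spont_h:.1f} h, total: {total_h:.1f} h")
```

Multiplied out, this gives 5 h of read and 6 h of spontaneous speech (the Beijing speakers contribute only spontaneous speech), so the "half read, half spontaneous" split is approximate while the 11-hour total is exact.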
❖ Goal:
▪ Gender: Male 50%, Female 50%
▪ Age: 26-40 50%, 41-50 50%
▪ Education: Ordinary 20%, Well 80%

❖ Actual WDC Data Diversity (number of speakers):

                        Male   Female   Total
Age         26-40        27      25       52
            41-50        23      25       48
Education   Well         41      41       82
            Ordinary      9       9       18
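To compare the actual speaker distribution with the goal, the Python sketch below converts the counts in the table into percentages. It assumes the age rows and the education rows each partition the same speaker pool (both sum to 100 in the table); the dictionary names and helper functions are hypothetical.

```python
# Speaker counts from the "Actual WDC Data Diversity" table, as (male, female).
AGE = {"26-40": (27, 25), "41-50": (23, 25)}
EDUCATION = {"Well": (41, 41), "Ordinary": (9, 9)}
# Target shares from the "Goal" list, in percent.
GOAL = {"26-40": 50, "41-50": 50, "Well": 80, "Ordinary": 20}


def shares(rows):
    """Percentage of all speakers falling in each row of one dimension."""
    total = sum(m + f for m, f in rows.values())
    return {group: 100 * (m + f) / total for group, (m, f) in rows.items()}


def gender_split(rows):
    """(male %, female %) computed from the column sums of one dimension."""
    male = sum(m for m, _ in rows.values())
    female = sum(f for _, f in rows.values())
    return 100 * male / (male + female), 100 * female / (male + female)


if __name__ == "__main__":
    for table in (AGE, EDUCATION):
        for group, pct in shares(table).items():
            print(f"{group}: actual {pct:.0f}% vs goal {GOAL[group]}%")
    print("male {:.0f}%, female {:.0f}%".format(*gender_split(AGE)))
```

Because the pool is exactly 100 speakers, the counts read directly as percentages: every age and education share lands within 2 percentage points of its goal, and the gender split matches the 50/50 goal exactly.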
Accent Assessment by experts
[Figure: bar chart of expert accent ratings over levels 1A to 3B]
▪ 1A. CCTV-level radio broadcaster
▪ 1B. Province-level radio broadcaster
▪ 2A. Quite good
▪ 2B. Less accented
▪ 3A. More accented
▪ 3B. Hard to understand, but recognizable as PTH
Accent Assessment according to age
[Figure: bar chart of accent levels 1A to 3B, grouped by speaker age (26-40, 41-50)]
Accent Assessment according to education level
[Figure: bar chart of accent levels 1A to 3B, grouped by education level (Ordinary, Well)]