得意音通技术: Your Partner in the Century of Speech
[Figure: distribution map of the Mandarin (官话) dialects. The slide note credits map A2 of an atlas whose title is illegible in this extraction; the region labels are not recoverable.]
[Figure: map of the subdivisions of modern Wu dialects. Labeled areas: Taihu group (including the Su-Hu-Jia, Hangzhou, and Lin-Shao subgroups), Taizhou group, Oujiang group, Chuqu group, and Xuanzhou group, bordered by Jianghuai Mandarin, the Hui dialects, and Fujian.]
Factors to be considered in data creation (2)
❑ Speaking style:
  ❖ Read speech: used for ASR in earlier research, and for TTS
  ❖ Spontaneous/conversational speech: used for ASR nowadays
❑ Recording channel:
  ❖ Depends on the goal of the task or application, or on the application environment:
    ▪ Close-talk microphones: for personal computers (PCs)
    ▪ Telephone and/or cellular phone: for telephony applications
    ▪ Specific channels: for embedded applications (PDA, digital recorder, ...), or for broadcast news and TV news
  ❖ Normally a mono channel rather than a stereo channel
  ❖ However, a microphone array may be used for some research purposes
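As an illustration of the mono-channel convention above, the following is a minimal sketch of how a stereo recording could be reduced to mono by averaging the channels of each frame. It assumes interleaved integer PCM samples (left, right, left, right, ...); the function name `downmix_to_mono` is hypothetical and not part of the slides.

```python
def downmix_to_mono(interleaved, num_channels=2):
    """Average each frame of interleaved integer PCM samples into one mono sample.

    Assumption: samples are interleaved per frame, e.g. [L0, R0, L1, R1, ...].
    """
    if num_channels < 1 or len(interleaved) % num_channels != 0:
        raise ValueError("sample count must be a multiple of num_channels")
    mono = []
    for i in range(0, len(interleaved), num_channels):
        frame = interleaved[i:i + num_channels]
        # The average of in-range samples is itself in range, so no clipping is needed.
        mono.append(sum(frame) // num_channels)
    return mono


stereo = [100, 300, -200, -400, 32767, 32767]
print(downmix_to_mono(stereo))  # [200, -300, 32767]
```

Averaging is only one possible choice; a corpus could equally keep a single channel if the two microphones differ in quality.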
Factors to be considered in data creation (3)
❑ Sampling rate:
  ❖ 8 kHz: for the telephone/mobile-phone channel, where the bandwidth is about 3.4 kHz
  ❖ 16 kHz: for the close-talk microphone (PC) channel, even though the channel bandwidth is higher than the 8 kHz that this rate can represent
❑ Sampling precision:
  ❖ 16 bits, normally
  ❖ 8-bit A-law or μ-law (expanding to 13-bit and 14-bit linear samples, respectively, after decompression)
❑ Signal-to-noise ratio (SNR) level:
  ❖ Data was/is often collected in a good environment (clean speech database)
  ❖ For noise-related research, noisy data is obtained either by:
    ▪ Mixing noises (e.g. NOISEX-92) with clean speech; or
    ▪ Collecting speech in real-world noisy environments
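The first way of obtaining noisy data, mixing a noise recording with clean speech, is usually done at a chosen SNR. The sketch below shows one common way to do this: scale the noise so that the ratio of clean-speech power to noise power matches the target in dB. The function names, the synthetic sine "speech", and the Gaussian "noise" are illustrative assumptions, not material from the slides.

```python
import math
import random


def mean_power(samples):
    """Mean squared amplitude of a sequence of samples."""
    return sum(s * s for s in samples) / len(samples)


def mix_at_snr(clean, noise, snr_db):
    """Add noise to clean speech, scaled so the mixture has the requested SNR in dB."""
    if len(noise) < len(clean):
        raise ValueError("noise must be at least as long as the clean signal")
    noise = noise[:len(clean)]
    p_clean = mean_power(clean)
    p_noise = mean_power(noise)
    if p_clean == 0 or p_noise == 0:
        raise ValueError("clean and noise signals must have non-zero power")
    # SNR_dB = 10 * log10(P_clean / P_noise), so solve for the noise power we need.
    target_noise_power = p_clean / (10 ** (snr_db / 10))
    gain = math.sqrt(target_noise_power / p_noise)
    return [c + gain * n for c, n in zip(clean, noise)]


def measured_snr_db(clean, mixed):
    """SNR of a mixture, given the clean reference it was built from."""
    residual = [m - c for c, m in zip(clean, mixed)]
    return 10 * math.log10(mean_power(clean) / mean_power(residual))


# Stand-ins for real recordings: one second at 16 kHz.
rng = random.Random(0)
clean = [math.sin(2 * math.pi * 440 * t / 16000) for t in range(16000)]
noise = [rng.gauss(0.0, 1.0) for _ in range(16000)]

noisy = mix_at_snr(clean, noise, snr_db=10)
print(round(measured_snr_db(clean, noisy), 2))
```

In practice the power would typically be measured over speech-active regions only, and the mixture checked for clipping before it is stored as 16-bit samples.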
Factors to be considered in data creation (4)
❑ Number of speakers and speaker balance:
  ❖ The more speakers, the better, with good speaker diversity in terms of:
    ▪ Gender
    ▪ Age
    ▪ Education
    ▪ Birthplace (or dialectal background)
    ▪ Occupation
    ▪ and so on
❑ Corpus size:
  ❖ Measured by the number of speakers, by the length of valid speech in hours, or by both
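Both size measures, and a simple view of speaker balance, can be computed from per-utterance metadata. The sketch below assumes a hypothetical record layout with `speaker`, `gender`, and `seconds` fields; the field names and the sample data are invented for illustration only.

```python
from collections import Counter

# Hypothetical per-utterance metadata; real corpora define their own fields.
utterances = [
    {"speaker": "S001", "gender": "F", "seconds": 1800.0},
    {"speaker": "S001", "gender": "F", "seconds": 1800.0},
    {"speaker": "S002", "gender": "M", "seconds": 3600.0},
    {"speaker": "S003", "gender": "F", "seconds": 5400.0},
]


def corpus_summary(records):
    """Return the speaker count, hours of valid speech, and speakers per gender."""
    gender_of = {}
    for rec in records:
        # Count each speaker once, however many utterances they recorded.
        gender_of[rec["speaker"]] = rec["gender"]
    hours = sum(rec["seconds"] for rec in records) / 3600.0
    return {
        "num_speakers": len(gender_of),
        "hours": hours,
        "by_gender": dict(Counter(gender_of.values())),
    }


summary = corpus_summary(utterances)
print(summary)
```

The same per-speaker counting could be repeated for age band, education, birthplace, or occupation to check balance along each factor listed above.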