得意音通技术: Your Partner in the Century of Speech

Outline
❑ Purpose of speech corpora
❑ Factors to be considered in data creation
❑ Data creation
❑ Data transcription
❑ Learning from corpora
❑ Chinese Corpus Consortium (CCC)

Data Creation
❑ Purposes of an ASR corpus:
  ❖ Acoustic training: the training set
  ❖ Speech recognition evaluation (testing): the test set
❑ Categories of ASR corpus:
  ❖ Read speech
  ❖ Spontaneous/conversational speech
❑ Design of a speech database before creation:
  ❖ Aspects as mentioned above:
    ▪ Language, speaking style, recording channel, sampling rate and precision, and corpus size: chosen according to the application/task background
    ▪ SNR levels, number of speakers, and speaker balance: chosen for diversity
  ❖ Speaking content balance, for content diversity, so that the corpus provides a good training set:
    ▪ For read speech, the balance can be based on
      – phone, di-phone, tri-phone, and so on
      – IF, di-IF, tri-IF, syllable, di-syllable, tri-syllable for Chinese
    ▪ For spontaneous speech, the balance comes from topic design
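
The design aspects listed on this slide can be written down as a small specification record before any recording starts. The following is only an illustrative sketch: the class name `CorpusDesign`, its field names, and the example values are assumptions made here and do not describe any particular corpus.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class CorpusDesign:
    """Hypothetical record of the design decisions made before data creation."""
    # Chosen according to the application/task background
    language: str
    speaking_style: str            # "read" or "spontaneous"
    recording_channel: str
    sampling_rate_hz: int
    sample_bits: int               # sampling precision
    # Chosen for diversity
    snr_levels_db: Tuple[int, ...]
    n_speakers: int
    female_ratio: float            # speaker balance, between 0.0 and 1.0
    # Content balance: a unit for read speech, a topic list for spontaneous speech
    balance_unit: str

    def problems(self) -> List[str]:
        """Return a list of obviously inconsistent settings (empty if none)."""
        issues = []
        if self.speaking_style not in ("read", "spontaneous"):
            issues.append("speaking_style must be 'read' or 'spontaneous'")
        if self.n_speakers <= 0:
            issues.append("n_speakers must be positive")
        if not 0.0 <= self.female_ratio <= 1.0:
            issues.append("female_ratio must lie in [0, 1]")
        return issues


# Illustrative values only.
design = CorpusDesign(
    language="Mandarin", speaking_style="read",
    recording_channel="desktop microphone",
    sampling_rate_hz=16000, sample_bits=16,
    snr_levels_db=(20, 30), n_speakers=100, female_ratio=0.5,
    balance_unit="di-IF",
)
print(design.problems())
```

Keeping the record immutable (`frozen=True`) is a design choice made for this sketch, so that the agreed design cannot be changed silently during collection.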

Data Creation – Read Speech (1)
❑ Although spontaneous ASR is becoming a research focus, collecting read speech databases is still necessary.
❑ A high-quality read speech corpus helps to train a good initial acoustic model.
❑ For spontaneous ASR, pronunciation modelling techniques and pronunciation lexicons are then applied to that initial model to obtain a final acoustic model that works well in practice.

Data Creation – Read Speech (2)
❑ The goal of read speech corpus design is often to balance the speech recognition units (modelling units), so as to cover as many co-articulations as possible with as small a set of sentences as possible.
❑ Such a minimal sentence set can be used not only for training acoustic models but also for speaker adaptation.
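
Covering many co-articulations with few sentences is a set-cover problem, and a common approximation is a greedy choice. The sketch below is one possible illustration, not the method used by the authors: it assumes that each sentence has already been converted into a list of modelling units, that a co-articulation is represented by a pair of adjacent units, and the function names `adjacent_pairs` and `greedy_cover` are hypothetical.

```python
from typing import Dict, List, Set, Tuple

Pair = Tuple[str, str]  # two adjacent modelling units, e.g. a di-phone or di-IF


def adjacent_pairs(units: List[str]) -> Set[Pair]:
    """All distinct pairs of neighbouring units in one sentence."""
    return set(zip(units, units[1:]))


def greedy_cover(sentences: Dict[str, List[str]]) -> List[str]:
    """Pick sentence ids until every pair seen in the pool is covered.

    At each step the sentence that adds the most uncovered pairs is chosen.
    This is the standard greedy set-cover heuristic, so the result is small
    but not guaranteed to be the smallest possible set.
    """
    pairs = {sid: adjacent_pairs(units) for sid, units in sentences.items()}
    uncovered: Set[Pair] = set()
    for p in pairs.values():
        uncovered |= p

    chosen: List[str] = []
    while uncovered:
        best = max(pairs, key=lambda sid: len(pairs[sid] & uncovered))
        gain = pairs[best] & uncovered
        if not gain:  # defensive: nothing left that any sentence can add
            break
        chosen.append(best)
        uncovered -= gain
    return chosen


# Toy pool; the unit symbols are placeholders, not real phones or IFs.
pool = {
    "s1": ["a", "b", "c", "d"],
    "s2": ["a", "b"],
    "s3": ["c", "d", "e"],
    "s4": ["b", "c"],
}
print(greedy_cover(pool))
```

In the toy pool, the sentences `s2` and `s4` add no pair that `s1` does not already contain, so they are never selected.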

Data Creation – Read Speech (3)
❑ Sentence design example
  ❖ The goal is to choose 6,000 sentences (about 0.75%) from 800,000 sentences taken from the People's Daily, with a balanced di-IF distribution.
  ❖ Several criteria:
    ▪ Natural selection: choose sentences at random, which gives an almost natural di-IF distribution.
    ▪ Restraining high-frequency di-IFs (RHF): keep high-frequency di-IFs from occurring even more often, so that each di-IF occurs almost equally often and the acoustic model is trained well on all of them.
    ▪ Encouraging low-frequency di-IFs (ELF): as an alternative, make low-frequency di-IFs occur as often as possible. The number of occurrences of any di-IF should be greater than a feasible pre-defined threshold.
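
The RHF and ELF criteria can both be approximated with a single greedy scoring rule. The sketch below is an assumption made for illustration, not the selection procedure actually used in the People's Daily example: each candidate sentence is scored by the average of 1 / (1 + current count) over its di-units, so pairs that are already frequent contribute little (restraining) and pairs that are still rare contribute a lot (encouraging). The names `select_balanced` and `under_threshold` are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple

Pair = Tuple[str, str]


def di_units(units: List[str]) -> List[Pair]:
    """Adjacent unit pairs of one sentence, repeats included."""
    return list(zip(units, units[1:]))


def select_balanced(corpus: Dict[str, List[str]], n: int) -> Tuple[List[str], Counter]:
    """Greedily pick up to n sentence ids, favouring rarely seen di-units."""
    remaining = {sid: di_units(units) for sid, units in corpus.items()}
    counts: Counter = Counter()   # occurrences of each di-unit selected so far
    chosen: List[str] = []

    def score(sid: str) -> float:
        pairs = remaining[sid]
        if not pairs:
            return 0.0
        # Average, not sum, so that long sentences are not favoured by length alone.
        return sum(1.0 / (1 + counts[p]) for p in pairs) / len(pairs)

    while remaining and len(chosen) < n:
        best = max(remaining, key=score)
        chosen.append(best)
        counts.update(remaining.pop(best))
    return chosen, counts


def under_threshold(counts: Counter, inventory: Iterable[Pair], threshold: int) -> List[Pair]:
    """Di-units whose selected count is still below the pre-defined threshold."""
    return [p for p in inventory if counts[p] < threshold]


# Toy corpus; the unit symbols are placeholders, not real IFs.
toy = {
    "s1": ["a", "b", "a", "b"],
    "s2": ["a", "b", "a"],
    "s3": ["c", "d", "e"],
}
picked, seen = select_balanced(toy, n=2)
print(picked, under_threshold(seen, [("a", "b"), ("c", "d")], threshold=2))
```

This rescoring loop costs roughly one pass over the remaining pool per selected sentence. For a pool of 800,000 sentences a real implementation would probably need incremental score updates or sampling, which this sketch leaves out.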