Landmark-Based Speech Recognition
The Marriage of High-Dimensional Machine Learning Techniques with Modern Linguistic Representations

Mark Hasegawa-Johnson
jhasegaw@uiuc.edu

Research performed in collaboration with James Baker (Carnegie Mellon), Sarah Borys (Illinois), Ken Chen (Illinois), Emily Coogan (Illinois), Steven Greenberg (Berkeley), Amit Juneja (Maryland), Katrin Kirchhoff (Washington), Karen Livescu (MIT), Srividya Mohan (Johns Hopkins), Jen Muller (Dept. of Defense), Kemal Sonmez (SRI), and Tianyu Wang (Georgia Tech)
What are Landmarks?
• Time-frequency regions of high mutual information between phone and signal (maxima of I(phone label; acoustics(t,f)))
• Acoustic events with similar importance in all languages, and across all speaking styles
• Acoustic events that can be detected even in extremely noisy environments

Where do these things happen?
• Syllable Onset ≈ Consonant Release
• Syllable Nucleus ≈ Vowel Center
• Syllable Coda ≈ Consonant Closure

I(phone; acoustics) experiment: Hasegawa-Johnson, 2000
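The mutual-information criterion on this slide can be estimated directly from phone-aligned data. Below is a minimal sketch, not the experiment from Hasegawa-Johnson (2000): it discretizes one scalar acoustic measurement (standing in for the energy in a single time-frequency cell) and computes I(phone; acoustics(t,f)) from the joint histogram. Scanning such an estimate over all (t,f) cells relative to phone boundaries is what locates the maxima that define landmarks. The toy data and all names here are illustrative.

```python
import numpy as np

def mutual_information(phones, acoustics, n_bins=16):
    """Estimate I(phone; acoustic value) in bits from paired samples.

    phones    : integer phone label per frame, shape (n_frames,)
    acoustics : scalar acoustic measurement per frame (e.g. log energy
                in one time-frequency cell), shape (n_frames,)
    """
    # Discretize the continuous acoustic value into histogram bins.
    bins = np.digitize(acoustics, np.histogram_bin_edges(acoustics, n_bins))
    joint = np.zeros((phones.max() + 1, bins.max() + 1))
    for p, b in zip(phones, bins):
        joint[p, b] += 1
    joint /= joint.sum()                       # joint pmf P(phone, bin)
    p_phone = joint.sum(axis=1, keepdims=True) # marginal P(phone)
    p_bin = joint.sum(axis=0, keepdims=True)   # marginal P(bin)
    nz = joint > 0                             # avoid log(0)
    return float((joint[nz] * np.log2(joint[nz] / (p_phone @ p_bin)[nz])).sum())

# Toy stand-in data: 3 phone classes whose "energy" distributions differ,
# so the estimated mutual information should be clearly above zero.
rng = np.random.default_rng(0)
phones = rng.integers(0, 3, size=5000)
acoustics = rng.normal(loc=phones.astype(float), scale=0.8)
print(f"I(phone; acoustics) ~ {mutual_information(phones, acoustics):.2f} bits")
```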
Landmark-Based Speech Recognition

Lattice hypothesis (words, times, scores): … backed up …

Pronunciation variants:
… backed up …
… backt up …
… back up …
… backt ihp …
… wackt ihp …

Syllable structure: ONSET NUCLEUS CODA, ONSET NUCLEUS CODA (one triple per syllable, aligned to the word hypothesis)
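This figure can be read as a rescoring step: each word edge in the lattice carries a time span and an HMM score, and the landmark/syllable-structure models score competing pronunciation variants for that span. The sketch below is a schematic of that combination only, not the WS04 code; `Edge`, `rescore`, the interpolation weight, and all scores are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Edge:
    word: str      # lattice word hypothesis
    start: float   # start time (s)
    end: float     # end time (s)
    score: float   # original HMM log-probability

def rescore(edge, variant_scores, weight=1.0):
    """Combine the lattice score with the best landmark-based
    pronunciation-variant score for this word and time span.

    variant_scores: {pronunciation string: log-probability}, as produced
    by some landmark/syllable-structure scorer (hypothetical here).
    """
    best_variant = max(variant_scores, key=variant_scores.get)
    return edge.score + weight * variant_scores[best_variant], best_variant

# Toy numbers: "backed up" with three competing pronunciation variants.
edge = Edge("backed up", 0.20, 0.80, score=-42.0)
variants = {"backt up": -3.1, "back up": -4.5, "backt ihp": -2.7}
new_score, variant = rescore(edge, variants, weight=0.5)
print(variant, new_score)   # backt ihp -43.35
```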
Talk Outline

Overview
1. Acoustic Modeling
– Speech data and acoustic features
– Landmark detection
– Estimation of real-valued “distinctive features” using support vector machines (SVM) (see the sketch after this outline)
2. Pronunciation Modeling
– A Dynamic Bayesian Network (DBN) implementation of Articulatory Phonology
– A discriminative pronunciation model implemented using Maximum Entropy (MaxEnt)
3. Technological Evaluation
– Rescoring of word lattice output from an HMM-based recognizer
– Errors that we fixed: channel noise, laughter, etcetera
– New errors that we caused: pronunciation models trained on 3 hours can’t compete with triphone models trained on 3000 hours
– Future Plans
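The last step of item 1 maps acoustics to graded rather than binary distinctive-feature values. A minimal illustration of that idea, assuming scikit-learn and toy vectors in place of the WS04 acoustic observations: the SVM's signed distance to its decision boundary is taken as the real-valued feature score.

```python
import numpy as np
from sklearn.svm import SVC

# Toy stand-in for frame-level acoustic feature vectors labeled with one
# binary distinctive feature (e.g. +/- sonorant); real inputs would be
# spectral/cepstral observations around a candidate landmark.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(-1.0, 1.0, (200, 12)),
               rng.normal(+1.0, 1.0, (200, 12))])
y = np.array([0] * 200 + [1] * 200)   # 0 = [-feature], 1 = [+feature]

# An RBF-kernel SVM; its signed distance to the decision boundary serves
# as a real-valued (soft) distinctive-feature score rather than a hard
# binary decision.
svm = SVC(kernel="rbf", gamma="scale").fit(X, y)
soft_scores = svm.decision_function(X[:5])
print(np.round(soft_scores, 2))   # graded +/- feature evidence per frame
```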
Overview

• History
– Research described in this talk was performed between June 30 and August 17, 2004, at the Johns Hopkins summer workshop WS04

• Scientific Goal
– To use high-dimensional machine learning technologies (SVM, DBN) to create representations capable of learning, from data, the types of speech knowledge that humans exhibit in psychophysical speech perception experiments

• Technological Goal
– Long-term: to create a better speech recognizer
– Short-term: lattice rescoring, applied to word lattices produced by SRI’s NN/HMM hybrid