
Landmark-Based Speech Recognition

The Marriage of High-Dimensional Machine Learning Techniques with Modern Linguistic Representations


Document contents (about 40 pages)

Overview of Systems to be Described
• Acoustic features: MFCC (5ms & 1ms frame period), Formants, Phonetic & Auditory Model Parameters
• concatenate 4-15 frames
• Acoustic Model: SVMs, p(landmark|SVM)
• Pronunciation Model (DBN or MaxEnt), p(SVM|word)
• First-Pass ASR Word Lattice: word label, start & end times; p(MFCC,PLP|word), p(word|words)
• Rescoring: Log-Linear Score Combination
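The rescoring step on the overview slide can be sketched as a weighted sum of log-domain scores. This is a minimal illustration, not the system's actual implementation: the function name, the three stream names, the weights, and all probabilities below are hypothetical values chosen for the example.

```python
import math

def log_linear_score(log_scores, weights):
    """Weighted sum of log-domain scores for one word hypothesis."""
    return sum(weights[name] * log_scores[name] for name in weights)

# Hypothetical scores for two competing words in the same lattice slot.
# "acoustic" stands in for p(MFCC,PLP|word), "language" for p(word|words),
# and "landmark" for p(SVM|word); the numbers are made up.
hypotheses = {
    "bat": {"acoustic": math.log(0.020), "language": math.log(0.10),
            "landmark": math.log(0.30)},
    "pat": {"acoustic": math.log(0.025), "language": math.log(0.05),
            "landmark": math.log(0.10)},
}

# Hypothetical stream weights; in practice these would be tuned on held-out data.
weights = {"acoustic": 1.0, "language": 1.0, "landmark": 0.5}

best = max(hypotheses, key=lambda w: log_linear_score(hypotheses[w], weights))
print(best)
```

Working in the log domain turns the product of weighted probabilities into a sum, so each knowledge source contributes one additive term whose weight controls its influence.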

I. Acoustic Modeling
• Goal: Learn precise and generalizable models of the acoustic boundary associated with each distinctive feature.
• Methods:
– Large input vector space (many acoustic feature types)
– Regularized binary classifiers (SVMs)
– SVM outputs “smoothed” using dynamic programming
– SVM outputs converted to posterior probability estimates once every 5 ms using a histogram
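The last method above, converting SVM outputs to posterior estimates with a histogram, can be sketched as follows. This is one plausible reading under stated assumptions, not the authors' code: the equal-width bins, the add-one smoothing, and the function names are all choices made for the illustration.

```python
import numpy as np

def fit_histogram_posterior(scores, labels, n_bins=10):
    """Estimate P(label = 1 | SVM score) per score bin from labeled data."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)
    # Equal-width bins over the observed score range (an assumption).
    edges = np.linspace(scores.min(), scores.max(), n_bins + 1)
    # Using interior edges only maps every score to an index in 0..n_bins-1.
    bins = np.digitize(scores, edges[1:-1])
    positives = np.bincount(bins, weights=labels, minlength=n_bins)
    totals = np.bincount(bins, minlength=n_bins)
    # Add-one smoothing (an assumption): an empty bin gets posterior 0.5.
    return edges, (positives + 1.0) / (totals + 2.0)

def posterior(score, edges, table):
    """Look up the posterior for a new SVM score; out-of-range scores clamp."""
    return table[np.digitize(score, edges[1:-1])]

# Hypothetical held-out SVM discriminant values and true feature labels.
edges, table = fit_histogram_posterior(
    [-2.0, -1.5, -1.0, 1.0, 1.5, 2.0], [0, 0, 0, 1, 1, 1], n_bins=2)
print(posterior(-1.0, edges, table), posterior(5.0, edges, table))
```

In a setup like the one on the slide, `posterior` would be called on the smoothed SVM output once per 5 ms frame.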

Speech Databases
Database            Size     Phonetic Transcr.   Word Lattices
NTIMIT              14hrs    manual              -
WS96&97             3.5hrs   manual              -
SWB1 WS04 subset    12hrs    auto-SRI            BBN
Eval01              10hrs    -                   BBN & SRI
RT03 Dev            6hrs     -                   SRI
RT03 Eval           6hrs     -                   SRI

Acoustic and Auditory Features
• MFCCs, 25ms window (standard ASR features)
• Spectral shape: energy, spectral tilt, and spectral compactness, once/millisecond
• Noise-robust MUSIC-based formant frequencies, amplitudes, and bandwidths (Zheng & Hasegawa-Johnson, ICSLP 2004)
• Acoustic-phonetic parameters (Formant-based relative spectral measures and time-domain measures; Bitar & Espy-Wilson, 1996)
• Rate-place model of neural response fields in the cat auditory cortex (Carlyon & Shamma, JASA 2003)

What are Distinctive Features? What are Landmarks?
• Distinctive feature =
– a binary partition of the phonemes (Jakobson, 1952)
– … that compactly describes pronunciation variability (Halle)
– … and correlates with distinct acoustic cues (Stevens)
• Landmark = Change in the value of a Manner Feature
– [+sonorant] to [-sonorant], [-sonorant] to [+sonorant]
– 5 manner features: [consonantal, continuant, syllabic, silence]
• Place and Voicing features: SVMs are only trained at landmarks
– Primary articulator: lips, tongue blade, or tongue body
– Features of primary articulator: anterior, strident
– Features of secondary articulator: nasal, voiced
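The landmark definition above (a change in the value of a manner feature) can be illustrated with a small sketch. It assumes one hard binary decision per frame for a single manner feature, which is a simplification for illustration; the function name and the example track are hypothetical.

```python
def find_landmarks(feature_track, name="sonorant"):
    """Return (frame_index, label) wherever a binary feature changes value.

    frame_index is the first frame that carries the new value.
    """
    landmarks = []
    for t in range(1, len(feature_track)):
        prev, cur = feature_track[t - 1], feature_track[t]
        if prev != cur:
            sign_from = "+" if prev else "-"
            sign_to = "+" if cur else "-"
            landmarks.append((t, f"[{sign_from}{name}] to [{sign_to}{name}]"))
    return landmarks

# Hypothetical frame-level [sonorant] decisions (1 = +sonorant, 0 = -sonorant).
track = [1, 1, 1, 0, 0, 1, 1]
for frame, label in find_landmarks(track):
    print(frame, label)
```

Frames returned by such a detector are the points where, per the slide, place and voicing classifiers would be applied.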
