Natural Language Processing
XIDIAN UNIVERSITY, 2017/10/25, Software Engineering
→ Language Modeling
Language Modeling (a few) ≈ Text Mining
■ Text/Document Clustering
■ Text/Document Classification
■ Topic Modeling
  ➢ Hierarchical topic modeling
■ Sentiment Classification
  ➢ Aspect-level sentiment classification
■ Entity (Relation) Extraction
■ …etc

Natural Language Processing
→ Language Generation
Language Generation (a few)
■ Machine Translation
■ Document Summarization
■ Q&A (小冰 XiaoIce, 小娜 Cortana)
■ Poetry Generation
■ News Generation
■ Short Text Generation (sentence, weibo)
■ …etc
Topic Modeling

Topic Modeling
Information Overloading
■ Big Data
■ Cloud Computing
■ Artificial Intelligence
■ Deep Learning
■ …, etc
[Image: Hotels.com "Chinese International Travel Monitor 2015", at a glance]
We need summarization
■ Visualization
■ Dimensionality Reduction

Background
▣ Dimensionality Reduction (Text)
■ Document Summarization
  What do these docs (or this doc) talk about?
■ Sentiment Analysis (e.g., laptop reviews)
  What do these consumers care about or complain about?
■ Short Text/Tweets Mining
  What are people discussing?
▣ Basic tool
  Topic modeling: learn latent semantic topics from a corpus (text collection)
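
To make "learn latent semantic topics from a corpus" concrete, here is a minimal sketch of one common topic model, Latent Dirichlet Allocation (LDA), fitted with collapsed Gibbs sampling. The slides do not name a specific model at this point, so the choice of LDA is an assumption, as are the function name `lda_gibbs`, the hyperparameters `alpha` and `beta`, and the toy English corpus, which is invented purely for illustration.

```python
import random

def lda_gibbs(docs, num_topics, iters=200, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampling for LDA (illustrative sketch, not tuned).

    docs: list of token lists. Returns (vocab, theta, phi) where
    theta[d][k] is the topic mixture of document d and
    phi[k][v] is the word distribution of topic k.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    V = len(vocab)

    doc_topic = [[0] * num_topics for _ in docs]        # count of topic k in doc d
    topic_word = [[0] * V for _ in range(num_topics)]   # count of word v in topic k
    topic_total = [0] * num_topics                      # total tokens assigned to topic k
    assignments = []

    # Random initial topic assignment for every token.
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(num_topics)
            z_d.append(k)
            doc_topic[d][k] += 1
            topic_word[k][word_id[w]] += 1
            topic_total[k] += 1
        assignments.append(z_d)

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v = word_id[w]
                k = assignments[d][i]
                # Remove the token's current assignment from the counts.
                doc_topic[d][k] -= 1
                topic_word[k][v] -= 1
                topic_total[k] -= 1
                # Resample its topic from the conditional distribution.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][v] + beta)
                    / (topic_total[t] + V * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][v] += 1
                topic_total[k] += 1

    theta = [
        [(n + alpha) / (len(doc) + num_topics * alpha) for n in doc_topic[d]]
        for d, doc in enumerate(docs)
    ]
    phi = [
        [(n + beta) / (topic_total[k] + V * beta) for n in topic_word[k]]
        for k in range(num_topics)
    ]
    return vocab, theta, phi

# Hypothetical toy corpus: two finance-like and two military-like documents.
docs = [
    "currency policy bank currency rate".split(),
    "bank rate policy currency".split(),
    "army reform troops army".split(),
    "troops reform army structure".split(),
]
vocab, theta, phi = lda_gibbs(docs, num_topics=2)
for k, row in enumerate(phi):
    top_words = [w for _, w in sorted(zip(row, vocab), reverse=True)[:3]]
    print("topic", k, top_words)
```

Each row of `theta` is a low-dimensional summary of one document (its mixture over topics), which is the sense in which topic modeling acts as dimensionality reduction for text. In practice one would likely use a library implementation rather than this hand-written sampler.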
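
Topic Modeling
▣ Topic modeling: an example in Chinese (from my doctoral thesis)
Corpus: Doc1, Doc2, Doc3, Doc4 (English translations of the Chinese texts)
■ Continue to implement a prudent monetary policy, keep it appropriately balanced between tight and loose with timely anticipatory adjustments and fine-tuning, align with the supply-side structure, and make combined use of quantity-based, price-based, and other monetary policy tools.
■ In terms of personnel numbers, this reform goes far beyond the size of the troop cut; it is a structural reform and a key step in modernizing the military's organizational structure.
■ The US dollar's status as the main international currency will remain irreplaceable for the foreseeable future; the only way forward is to steer global governance in a more balanced direction. Speaking recently at the University of Maryland in the United States, IMF Managing Director Lagarde urged that reform of international governance recognize the reality that emerging economies are becoming ever more important.
■ After being "weaned" from their parent universities, independent colleges may face growing pains in areas such as branding and enrollment; but once the national, provincial, and municipal governments issued implementation guidelines encouraging private capital to enter education, some independent colleges decisively cut the "umbilical cord" to their parent universities and set out on their own.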
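
Before a topic model can be fitted to a corpus like this one, each document has to be turned into a bag of terms. The sketch below uses shortened excerpts of the original Chinese text of the four documents and represents each as a bag of overlapping character bigrams. The bigram step is only a rough stand-in for proper Chinese word segmentation, which the slide does not describe; the dictionary keys and the helper names `char_bigrams` and `shared_terms` are likewise hypothetical.

```python
import re
from collections import Counter

# Shortened excerpts of the four slide documents (original Chinese text).
# The keys are illustrative labels, not the slide's Doc1..Doc4 numbering.
corpus = {
    "monetary": "继续实施稳健的货币政策,保持松紧适度适时预调微调",
    "military": "这次改革远远超过了裁军的数量,它是一种结构性的改革",
    "dollar": "美元作为主要国际货币的地位在可预见的将来仍无可取代",
    "college": "一些独立学院果断切割连接母体大学的脐带,自立门户发展",
}

def char_bigrams(text):
    """Return overlapping character bigrams from each run of CJK characters.

    Punctuation and other non-CJK characters act as separators, so no
    bigram spans a comma.
    """
    bigrams = []
    for run in re.findall(r"[\u4e00-\u9fff]+", text):
        bigrams.extend(run[i:i + 2] for i in range(len(run) - 1))
    return bigrams

# Bag-of-bigrams representation: one term-count table per document.
bags = {name: Counter(char_bigrams(text)) for name, text in corpus.items()}

def shared_terms(a, b):
    """Bigrams that occur in both documents a and b."""
    return sorted(set(bags[a]) & set(bags[b]))

print(shared_terms("monetary", "dollar"))
print(bags["military"].most_common(3))
```

Terms shared across documents, such as 货币 ("currency") in the monetary-policy and US-dollar excerpts, are the kind of co-occurrence signal a topic model can use to group documents under a common latent topic.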