Information Retrieval and Data Mining, 2019/4/29
Slide 18: Common models, illustrated
[Figure: example diagrams of the following model types]
• Hidden Markov Models
• Markov Random Fields
• Bayesian Networks
• Decision Graphs
• Markov Decision Processes
• Relational Probabilistic Graphical Models
• Graphical Causal Models
Slide 19: Summary: what is a graphical model?
• Graphical models, in full Probabilistic Graphical Models (PGMs)
• The theory of probabilistic graphical models has three parts
  • Representation, Inference, Learning
• Common types of probabilistic graphical models
  • Bayesian networks use a directed acyclic graph (DAG)
  • Markov random fields use an undirected graph
[Figure: a model drawn as an unrolled graph over X1, X2, ..., XN, next to the same model in plate notation]
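To make the "directed acyclic graph" representation concrete: in a Bayesian network the joint distribution factorizes into one conditional term per node, P(x1, ..., xn) = Π P(xi | parents(xi)). The following is a minimal sketch of that factorization, not taken from the slides. The three-node network (Rain, Sprinkler, WetGrass) and all of its probabilities are hypothetical, chosen only for illustration.

```python
from itertools import product

# Hypothetical network: Rain -> WetGrass <- Sprinkler.
# Each node lists its parents and a table P(node = True | parent values).
# All numbers are made up for illustration.
network = {
    "Rain": {"parents": [], "p_true": {(): 0.2}},
    "Sprinkler": {"parents": [], "p_true": {(): 0.1}},
    "WetGrass": {
        "parents": ["Rain", "Sprinkler"],
        "p_true": {
            (True, True): 0.99,
            (True, False): 0.90,
            (False, True): 0.80,
            (False, False): 0.05,
        },
    },
}


def joint_probability(assignment):
    """Return P(assignment) as the product of P(node | parents) over all nodes."""
    prob = 1.0
    for name, node in network.items():
        parent_values = tuple(assignment[p] for p in node["parents"])
        p_true = node["p_true"][parent_values]
        prob *= p_true if assignment[name] else 1.0 - p_true
    return prob


# Summing the factorized joint over every assignment should give 1.
names = list(network)
total = sum(
    joint_probability(dict(zip(names, values)))
    for values in product([True, False], repeat=len(names))
)
```

The representation question (which graph, which local tables) is separate from inference (computing marginals or posteriors from the joint) and learning (estimating the tables from data); this sketch covers only the first.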
Slide 20: Probabilistic Graphical Models / Topic Model
• What is a graphical model?
  • Definition, examples
  • Representation, Inference, Learning
• Topic models and classification
  • LSA (Latent Semantic Analysis), 1990
  • pLSA (probabilistic Latent Semantic Analysis), 1999
  • LDA (Latent Dirichlet Allocation), 2003
  • Hierarchical Bayesian model
• Example implementation of topic models in R
Slide 21: What is a topic model? Concept illustration
[Figure with three panels: "Topics", "Documents", "Topic proportions and assignments". The document panel shows the Science article "Seeking Life's Bare (Genetic) Necessities". The topics panel lists four topics with their top words:]
• gene 0.04, dna 0.02, genetic 0.01
• life 0.02, evolve 0.01, organism 0.01
• brain 0.04, neuron 0.02, nerve 0.01
• data 0.02, number 0.02, computer 0.01
Caption: We assume that some number of "topics," which are distributions over words, exist for the whole collection (far left). Each document is assumed to be generated as follows. First choose a distribution over the topics (the histogram at right); then, for each word, choose a topic assignment (the colored coins) and choose the word from the corresponding topic. The topics and topic assignments in this figure are illustrative; they are not fit from real data.
Source: Probabilistic topic models, D. M. Blei, Communications of the ACM, 2012. Retrieved: 2017.04.06. Google citations: 1622.
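The generative process in the caption can be sketched in a few lines. This is an illustrative sketch only, under stated assumptions: the per-document topic distribution is drawn from a symmetric Dirichlet (as the name Latent Dirichlet Allocation suggests, though the slide does not state the prior), the concentration value 0.5 is arbitrary, and each topic is truncated to the three words shown in the figure, so the word weights are relative rather than a full distribution over a vocabulary. The names `TOPICS`, `sample_dirichlet`, and `generate_document` are hypothetical.

```python
import random

# Toy topics using only the top three words shown in the figure.
# The weights are relative; random.choices normalizes them internally.
TOPICS = [
    {"gene": 0.04, "dna": 0.02, "genetic": 0.01},
    {"life": 0.02, "evolve": 0.01, "organism": 0.01},
    {"brain": 0.04, "neuron": 0.02, "nerve": 0.01},
    {"data": 0.02, "number": 0.02, "computer": 0.01},
]


def sample_dirichlet(alpha, rng):
    """Draw from a Dirichlet by normalizing independent gamma variates."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def generate_document(num_words, alpha, rng):
    # Step 1: choose this document's distribution over topics.
    theta = sample_dirichlet(alpha, rng)
    assignments, words = [], []
    for _ in range(num_words):
        # Step 2a: choose a topic assignment for this word position.
        z = rng.choices(range(len(TOPICS)), weights=theta)[0]
        # Step 2b: choose the word from the corresponding topic.
        topic = TOPICS[z]
        w = rng.choices(list(topic), weights=list(topic.values()))[0]
        assignments.append(z)
        words.append(w)
    return theta, assignments, words


rng = random.Random(0)  # fixed seed so the run is repeatable
theta, assignments, words = generate_document(10, [0.5] * len(TOPICS), rng)
```

Fitting a topic model runs this process in reverse: only `words` is observed, and the topics, `theta`, and `assignments` are inferred.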
Slide 22: What is a topic model? Example
[Figure, left: bar chart of the inferred topic proportions over the 100 topics for the example article. Right: the top 15 words of the four most frequent topics.]
• "Genetics": human, genome, dna, genetic, genes, sequence, gene, molecular, sequencing, map, information, genetics, mapping, project, sequences
• "Evolution": evolution, evolutionary, species, organisms, life, origin, biology, groups, phylogenetic, living, diversity, group, new, two, common
• "Disease": disease, host, bacteria, diseases, resistance, bacterial, new, strains, control, infectious, malaria, parasite, parasites, united, tuberculosis
• "Computers": computer, models, information, data, computers, system, network, systems, model, parallel, methods, networks, software, new, simulations
Caption: We fit a 100-topic LDA model to 17,000 articles from the journal Science. At left are the inferred topic proportions for the example article (the article shown in the figure on the previous slide). At right are the top 15 most frequent words from the most frequent topics found in this article.
Source: Probabilistic topic models, D. M. Blei, Communications of the ACM, 2012.
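Once a model is fit, a summary like the one in this figure is read off two fitted quantities: the document's topic proportions and each topic's word distribution. The sketch below shows that read-off step only; it does not fit a model. The topic labels and words come from the figure, but every number is made up for illustration, and the names `topic_word_probs`, `doc_topic_proportions`, and `top_items` are hypothetical.

```python
# Hypothetical fitted quantities; all numbers are invented for illustration.
topic_word_probs = {
    "Genetics": {"human": 0.05, "genome": 0.04, "dna": 0.03, "genetic": 0.02},
    "Evolution": {"evolution": 0.05, "evolutionary": 0.04, "species": 0.03, "organisms": 0.02},
    "Disease": {"disease": 0.05, "host": 0.04, "bacteria": 0.03, "diseases": 0.02},
    "Computers": {"computer": 0.05, "models": 0.04, "information": 0.03, "data": 0.02},
}
doc_topic_proportions = {
    "Genetics": 0.45,
    "Evolution": 0.30,
    "Disease": 0.15,
    "Computers": 0.10,
}


def top_items(weights, k):
    """Return the k keys with the largest weights, in descending order."""
    ranked = sorted(weights.items(), key=lambda kv: kv[1], reverse=True)
    return [name for name, _ in ranked[:k]]


# The two most frequent topics in the document, each with its top three words.
summary = {
    topic: top_items(topic_word_probs[topic], 3)
    for topic in top_items(doc_topic_proportions, 2)
}
```

In the slide's setting the same step would use k = 15 words per topic over a 100-topic model.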