Interpretation of Bayes' Rule

• Hypothesis space: H = {H1, …, Hn};  Evidence: E

      P(Hi|E) = P(E|Hi) P(Hi) / P(E)

  – P(Hi|E): posterior probability of Hi
  – P(Hi): prior probability of Hi
  – P(E|Hi): likelihood of the data/evidence if Hi is true

• If we want to pick the most likely hypothesis H*, we can drop P(E):

      P(Hi|E) ∝ P(E|Hi) P(Hi)
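A minimal Python sketch of the point above (not from the slides; the hypothesis names and the prior/likelihood numbers are made up): P(E) is the same normalizer for every hypothesis, so dropping it does not change which Hi wins.

```python
# Toy hypothesis space with made-up priors P(Hi) and likelihoods P(E|Hi).
priors = {"H1": 0.5, "H2": 0.3, "H3": 0.2}
likelihoods = {"H1": 0.10, "H2": 0.40, "H3": 0.25}

# Full posterior: P(Hi|E) = P(E|Hi) P(Hi) / P(E), with P(E) = sum_i P(E|Hi) P(Hi)
p_e = sum(likelihoods[h] * priors[h] for h in priors)
posterior = {h: likelihoods[h] * priors[h] / p_e for h in priors}

# Picking H*: the normalizer P(E) is shared by all Hi, so the argmax is the same
# whether or not we divide by it.
h_star = max(priors, key=lambda h: likelihoods[h] * priors[h])

print(posterior)   # normalized posteriors
print(h_star)      # "H2" here -- same winner with or without P(E)
```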
Random Variable

• X: S → R ("measure" of outcome)
  – E.g., number of heads, all same face?, …

• Events can be defined according to X
  – E(X=a) = {si | X(si) = a}
  – E(X≥a) = {si | X(si) ≥ a}

• So, probabilities can be defined on X
  – P(X=a) = P(E(X=a))
  – P(X≥a) = P(E(X≥a))

• Discrete vs. continuous random variables (think of "partitioning the sample space")
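A small illustrative sketch of these definitions (my own toy example, not from the slides): three fair coin tosses, with the random variable X = number of heads inducing events and probabilities on the sample space.

```python
from itertools import product

# Sample space: 8 equally likely outcomes of three coin tosses.
S = list(product("HT", repeat=3))
p = {s: 1 / len(S) for s in S}             # P(s) for each outcome s

X = lambda s: s.count("H")                 # X: S -> R, the "measure" of an outcome

# Events defined through X ...
E_eq_2 = [s for s in S if X(s) == 2]       # E(X=2)  = {s | X(s) = 2}
E_ge_2 = [s for s in S if X(s) >= 2]       # E(X>=2) = {s | X(s) >= 2}

# ... and the probabilities defined on X through those events.
print(sum(p[s] for s in E_eq_2))           # P(X=2)  = 3/8
print(sum(p[s] for s in E_ge_2))           # P(X>=2) = 4/8
```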
An Example: Doc Classification

• Sample space: S = {x1, …, xn}.  For 3 topics and four words, n = ?

        Topic      the  computer  game  baseball
  X1:   sport       1      0       1       1
  X2:   sport       1      1       1       1
  X3:   computer    1      1       0       0
  X4:   computer    1      1       1       0
  X5:   other       0      0       1       1
  …

• Events
  – Esport = {xi | topic(xi) = "sport"}
  – Ebaseball = {xi | baseball(xi) = 1}
  – Ebaseball,computer = {xi | baseball(xi) = 1 & computer(xi) = 0}

• Conditional probabilities: P(Esport | Ebaseball), P(Ebaseball | Esport), P(Esport | Ebaseball,computer), …

• Thinking in terms of random variables
  – Topic: T ∈ {"sport", "computer", "other"};  "Baseball": B ∈ {0, 1}, …
  – P(T="sport" | B=1), P(B=1 | T="sport"), …

• An inference problem: suppose we observe that "baseball" is mentioned; how likely is it that the topic is "sport"?
  – P(T="sport" | B=1) ∝ P(B=1 | T="sport") P(T="sport")
  – But P(B=1 | T="sport") = ?  P(T="sport") = ?
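As a sketch, the inference can be carried out directly on the five example documents above, treating this tiny table as if it were the whole sample space (purely for illustration).

```python
# The five example documents: (topic, the, computer, game, baseball).
docs = [
    ("sport",    1, 0, 1, 1),
    ("sport",    1, 1, 1, 1),
    ("computer", 1, 1, 0, 0),
    ("computer", 1, 1, 1, 0),
    ("other",    0, 0, 1, 1),
]

# P(T="sport" | B=1) directly from the event definitions.
E_baseball = [d for d in docs if d[4] == 1]
E_sport_and_baseball = [d for d in E_baseball if d[0] == "sport"]
print(len(E_sport_and_baseball) / len(E_baseball))             # 2/3

# Same answer via Bayes' rule: P(T="sport"|B=1) = P(B=1|T="sport") P(T="sport") / P(B=1).
E_sport = [d for d in docs if d[0] == "sport"]
p_b1_given_sport = sum(d[4] for d in E_sport) / len(E_sport)   # P(B=1|T="sport") = 1
p_sport = len(E_sport) / len(docs)                             # P(T="sport")     = 2/5
p_b1 = sum(d[4] for d in docs) / len(docs)                     # P(B=1)           = 3/5
print(p_b1_given_sport * p_sport / p_b1)                       # 2/3 again
```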
Getting to Statistics ...

• P(B=1|T="sport") = ?  (parameter estimation)
  – If we see the results of a huge number of random experiments, then

        P̂(B=1|T="sport") = count(B=1, T="sport") / count(T="sport")

  – But what if we only see a small sample (e.g., 2)? Is this estimate still reliable?

• In general, statistics has to do with drawing conclusions about the whole population based on observations of a sample (data)
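A hedged sketch of the count-based estimate above; the sample is hypothetical labeled data, each item being (topic, whether "baseball" is mentioned).

```python
# Hypothetical observed sample of (topic, baseball_mentioned) pairs.
sample = [("sport", 1), ("sport", 1), ("sport", 0),
          ("computer", 0), ("other", 1)]

n_sport = sum(1 for t, b in sample if t == "sport")
n_sport_and_b1 = sum(1 for t, b in sample if t == "sport" and b == 1)

# P_hat(B=1 | T="sport") = count(B=1, T="sport") / count(T="sport")
p_hat = n_sport_and_b1 / n_sport
print(p_hat)   # 2/3 here; with only two or three "sport" documents the estimate is very noisy
```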
Parameter Estimation

• General setting:
  – Given a (hypothesized & probabilistic) model that governs the random experiment
  – The model gives a probability of any data, p(D|θ), that depends on the parameter θ
  – Now, given actual sample data X = {x1, …, xn}, what can we say about the value of θ?

• Intuitively, take your best guess of θ -- "best" means "best explaining/fitting the data"

• Generally an optimization problem
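As a sketch of "best guess = best fit", the following treats estimation of a single Bernoulli parameter θ = P(B=1|T="sport") as an optimization problem, maximizing the log-likelihood over a grid of candidate values. The data vector is hypothetical.

```python
import math

# Hypothetical observations of B for documents known to be about "sport".
data = [1, 1, 1, 0, 1, 1, 0, 1, 1, 1]

def log_likelihood(theta, xs):
    # log p(D|theta) = sum_i [ x_i log(theta) + (1 - x_i) log(1 - theta) ]
    return sum(x * math.log(theta) + (1 - x) * math.log(1 - theta) for x in xs)

# "Best explaining the data": search candidate theta values and keep the maximizer.
grid = [i / 1000 for i in range(1, 1000)]
theta_hat = max(grid, key=lambda t: log_likelihood(t, data))

print(theta_hat)   # 0.8, i.e. the count ratio 8/10, which is the closed-form MLE
```

The brute-force search is only for illustration; for this model the optimization has the familiar closed-form answer, the relative frequency from the previous slide.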