Interpretation of Bayes' Rule

• Hypothesis space: H = {H1, …, Hn};  Evidence: E

      P(Hi|E) = P(E|Hi) P(Hi) / P(E)

  – P(Hi|E): posterior probability of Hi
  – P(Hi): prior probability of Hi
  – P(E|Hi): likelihood of the data/evidence if Hi is true

• If we want to pick the most likely hypothesis H*, we can drop P(E):

      P(Hi|E) ∝ P(E|Hi) P(Hi)
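A minimal Python sketch of the point above (not from the slides; the hypothesis names and the prior/likelihood numbers are made up): P(E) is the same normalizer for every hypothesis, so dropping it does not change which Hi wins.

```python
# Toy hypothesis space with made-up priors P(Hi) and likelihoods P(E|Hi).
priors = {"H1": 0.5, "H2": 0.3, "H3": 0.2}
likelihoods = {"H1": 0.10, "H2": 0.40, "H3": 0.25}

# Full posterior: P(Hi|E) = P(E|Hi) P(Hi) / P(E), with P(E) = sum_i P(E|Hi) P(Hi)
p_e = sum(likelihoods[h] * priors[h] for h in priors)
posterior = {h: likelihoods[h] * priors[h] / p_e for h in priors}

# Picking H*: the normalizer P(E) is shared by all Hi, so the argmax is the same
# whether or not we divide by it.
h_star = max(priors, key=lambda h: likelihoods[h] * priors[h])

print(posterior)   # normalized posteriors
print(h_star)      # "H2" here -- same winner with or without P(E)
```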
Random Variable

• X: S → R ("measure" of outcome)
  – E.g., number of heads, all same face?, …

• Events can be defined according to X
  – E(X=a) = {si | X(si) = a}
  – E(X≥a) = {si | X(si) ≥ a}

• So, probabilities can be defined on X
  – P(X=a) = P(E(X=a))
  – P(X≥a) = P(E(X≥a))

• Discrete vs. continuous random variables (think of "partitioning the sample space")
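A small illustrative sketch of these definitions (my own toy example, not from the slides): three fair coin tosses, with the random variable X = number of heads inducing events and probabilities on the sample space.

```python
from itertools import product

# Sample space: 8 equally likely outcomes of three coin tosses.
S = list(product("HT", repeat=3))
p = {s: 1 / len(S) for s in S}             # P(s) for each outcome s

X = lambda s: s.count("H")                 # X: S -> R, the "measure" of an outcome

# Events defined through X ...
E_eq_2 = [s for s in S if X(s) == 2]       # E(X=2)  = {s | X(s) = 2}
E_ge_2 = [s for s in S if X(s) >= 2]       # E(X>=2) = {s | X(s) >= 2}

# ... and the probabilities defined on X through those events.
print(sum(p[s] for s in E_eq_2))           # P(X=2)  = 3/8
print(sum(p[s] for s in E_ge_2))           # P(X>=2) = 4/8
```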
An Example: Doc Classification

• Sample space: S = {x1, …, xn}.  For 3 topics and four words, n = ?

        Topic      the  computer  game  baseball
  X1:   sport       1      0       1       1
  X2:   sport       1      1       1       1
  X3:   computer    1      1       0       0
  X4:   computer    1      1       1       0
  X5:   other       0      0       1       1
  …

• Events
  – Esport = {xi | topic(xi) = "sport"}
  – Ebaseball = {xi | baseball(xi) = 1}
  – Ebaseball,computer = {xi | baseball(xi) = 1 & computer(xi) = 0}

• Conditional probabilities: P(Esport | Ebaseball), P(Ebaseball | Esport), P(Esport | Ebaseball,computer), …

• Thinking in terms of random variables
  – Topic: T ∈ {"sport", "computer", "other"};  "Baseball": B ∈ {0, 1}, …
  – P(T="sport" | B=1), P(B=1 | T="sport"), …

• An inference problem: suppose we observe that "baseball" is mentioned; how likely is it that the topic is "sport"?
  – P(T="sport" | B=1) ∝ P(B=1 | T="sport") P(T="sport")
  – But P(B=1 | T="sport") = ?  P(T="sport") = ?
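As a sketch, the inference can be carried out directly on the five example documents above, treating this tiny table as if it were the whole sample space (purely for illustration).

```python
# The five example documents: (topic, the, computer, game, baseball).
docs = [
    ("sport",    1, 0, 1, 1),
    ("sport",    1, 1, 1, 1),
    ("computer", 1, 1, 0, 0),
    ("computer", 1, 1, 1, 0),
    ("other",    0, 0, 1, 1),
]

# P(T="sport" | B=1) directly from the event definitions.
E_baseball = [d for d in docs if d[4] == 1]
E_sport_and_baseball = [d for d in E_baseball if d[0] == "sport"]
print(len(E_sport_and_baseball) / len(E_baseball))             # 2/3

# Same answer via Bayes' rule: P(T="sport"|B=1) = P(B=1|T="sport") P(T="sport") / P(B=1).
E_sport = [d for d in docs if d[0] == "sport"]
p_b1_given_sport = sum(d[4] for d in E_sport) / len(E_sport)   # P(B=1|T="sport") = 1
p_sport = len(E_sport) / len(docs)                             # P(T="sport")     = 2/5
p_b1 = sum(d[4] for d in docs) / len(docs)                     # P(B=1)           = 3/5
print(p_b1_given_sport * p_sport / p_b1)                       # 2/3 again
```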
Getting to Statistics ...

• P(B=1|T="sport") = ?  (parameter estimation)
  – If we see the results of a huge number of random experiments, then

        P̂(B=1|T="sport") = count(B=1, T="sport") / count(T="sport")

  – But what if we only see a small sample (e.g., 2)? Is this estimate still reliable?

• In general, statistics has to do with drawing conclusions about the whole population based on observations of a sample (data)
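A hedged sketch of the count-based estimate above; the sample is hypothetical labeled data, each item being (topic, whether "baseball" is mentioned).

```python
# Hypothetical observed sample of (topic, baseball_mentioned) pairs.
sample = [("sport", 1), ("sport", 1), ("sport", 0),
          ("computer", 0), ("other", 1)]

n_sport = sum(1 for t, b in sample if t == "sport")
n_sport_and_b1 = sum(1 for t, b in sample if t == "sport" and b == 1)

# P_hat(B=1 | T="sport") = count(B=1, T="sport") / count(T="sport")
p_hat = n_sport_and_b1 / n_sport
print(p_hat)   # 2/3 here; with only two or three "sport" documents the estimate is very noisy
```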
Parameter Estimation

• General setting:
  – Given a (hypothesized & probabilistic) model that governs the random experiment
  – The model gives a probability of any data, p(D|θ), that depends on the parameter θ
  – Now, given actual sample data X = {x1, …, xn}, what can we say about the value of θ?

• Intuitively, take your best guess of θ -- "best" means "best explaining/fitting the data"

• Generally an optimization problem
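As a sketch of "best guess = best fit", the following treats estimation of a single Bernoulli parameter θ = P(B=1|T="sport") as an optimization problem, maximizing the log-likelihood over a grid of candidate values. The data vector is hypothetical.

```python
import math

# Hypothetical observations of B for documents known to be about "sport".
data = [1, 1, 1, 0, 1, 1, 0, 1, 1, 1]

def log_likelihood(theta, xs):
    # log p(D|theta) = sum_i [ x_i log(theta) + (1 - x_i) log(1 - theta) ]
    return sum(x * math.log(theta) + (1 - x) * math.log(1 - theta) for x in xs)

# "Best explaining the data": search candidate theta values and keep the maximizer.
grid = [i / 1000 for i in range(1, 1000)]
theta_hat = max(grid, key=lambda t: log_likelihood(t, data))

print(theta_hat)   # 0.8, i.e. the count ratio 8/10, which is the closed-form MLE
```

The brute-force search is only for illustration; for this model the optimization has the familiar closed-form answer, the relative frequency from the previous slide.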