Outline
• Probability & statistics
• Basic concepts in information theory
• Linear Algebra
CCF-ADL at Zhengzhou University, June 25-27, 2010
Essential Background 1: Probability & Statistics
Prob/Statistics & Text Management
• Probability & statistics provide a principled way to quantify the uncertainties associated with natural language
• Allow us to answer questions like:
– Given that we observe “baseball” three times and “game” once in a news article, how likely is it about “sports”? (text categorization, information retrieval)
– Given that a user is interested in sports news, how likely would the user use “baseball” in a query? (information retrieval)
Basic Concepts in Probability
• Random experiment: an experiment with uncertain outcome (e.g., tossing a coin, picking a word from text)
• Sample space: all possible outcomes, e.g.,
– Tossing 2 fair coins, S = {HH, HT, TH, TT}
• Event: E ⊆ S, E happens iff outcome is in E, e.g.,
– E = {HH} (all heads)
– E = {HH, TT} (same face)
– Impossible event ({}), certain event (S)
• Probability of Event: 1 ≥ P(E) ≥ 0, s.t.
– P(S) = 1 (outcome always in S)
– P(A ∪ B) = P(A) + P(B) if A ∩ B = ∅ (e.g., A = same face, B = different face)
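
The two-coin example can be checked by enumerating the sample space. The following is a minimal Python sketch, not part of the original slides; the helper `prob` and the event names are illustrative assumptions, and it relies on all four outcomes being equally likely (fair coins).

```python
from fractions import Fraction
from itertools import product

# Sample space for tossing 2 fair coins; all outcomes are equally likely.
S = {"".join(outcome) for outcome in product("HT", repeat=2)}

def prob(event):
    """P(E) = |E| / |S| for equally likely outcomes (hypothetical helper)."""
    if not event <= S:
        raise ValueError("an event must be a subset of the sample space")
    return Fraction(len(event), len(S))

all_heads = {"HH"}
same_face = {"HH", "TT"}
different_face = {"HT", "TH"}

print(prob(all_heads))   # P(all heads)
print(prob(S))           # certain event
print(prob(set()))       # impossible event

# Additivity holds because the two events are disjoint.
assert same_face & different_face == set()
print(prob(same_face | different_face) == prob(same_face) + prob(different_face))
```

Using `Fraction` keeps the probabilities exact, so the additivity check compares equal values rather than rounded floats.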
Basic Concepts of Prob. (cont.)
• Conditional Probability: P(B|A) = P(A ∩ B)/P(A)
– P(A ∩ B) = P(A)P(B|A) = P(B)P(A|B)
– So, P(A|B) = P(B|A)P(A)/P(B) (Bayes’ Rule)
– For independent events, P(A ∩ B) = P(A)P(B), so P(A|B) = P(A)
• Total probability: If A1, …, An form a partition of S, then
– P(B) = P(B ∩ S) = P(B ∩ A1) + … + P(B ∩ An) (why?)
– So, P(Ai|B) = P(B|Ai)P(Ai)/P(B) = P(B|Ai)P(Ai)/[P(B|A1)P(A1) + … + P(B|An)P(An)]
– This allows us to compute P(Ai|B) based on P(B|Ai)
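
The total-probability step works because the events B ∩ A1, …, B ∩ An are pairwise disjoint and their union is B, so additivity applies. The sketch below checks this and Bayes' Rule on the two-coin sample space. It is an illustration under assumptions, not taken from the slides: the helpers `prob` and `cond`, the partition (same face / different face), and the event B ("at least one head") are all chosen here for the example.

```python
from fractions import Fraction

# Two fair coins; all four outcomes are equally likely.
S = {"HH", "HT", "TH", "TT"}

def prob(event):
    """P(E) for equally likely outcomes (hypothetical helper)."""
    return Fraction(len(event & S), len(S))

def cond(b, a):
    """Conditional probability P(B|A) = P(A ∩ B) / P(A)."""
    return prob(a & b) / prob(a)

# A1 = same face, A2 = different face: a partition of S.
partition = [{"HH", "TT"}, {"HT", "TH"}]
B = {"HH", "HT", "TH"}  # at least one head

# Total probability: P(B) = sum_i P(B|Ai) P(Ai)
total = sum(cond(B, a) * prob(a) for a in partition)
assert total == prob(B)

# Bayes' Rule: P(Ai|B) = P(B|Ai) P(Ai) / P(B)
posterior = [cond(B, a) * prob(a) / total for a in partition]
assert posterior[0] == cond(partition[0], B)  # agrees with the direct definition
print(posterior[0], posterior[1])

# Independence: "first coin is heads" and "second coin is heads".
first_heads, second_heads = {"HH", "HT"}, {"HH", "TH"}
assert prob(first_heads & second_heads) == prob(first_heads) * prob(second_heads)
assert cond(first_heads, second_heads) == prob(first_heads)
```

The posteriors sum to 1 because the denominator is the total probability P(B) taken over the whole partition.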