2. Naïve Bayes

Given training data X, the posterior probability of a hypothesis H, P(H|X), follows Bayes' theorem:

$$P(H \mid X) = \frac{P(X \mid H)\,P(H)}{P(X)}$$

Predict that X belongs to class $C_i$ iff the probability $P(C_i \mid X)$ is the highest among all the $P(C_k \mid X)$ for all the k classes.

Practical difficulty: requires initial knowledge of many probabilities and incurs significant computational cost.
Class conditional independence

$$P(X \mid C_i) = \prod_{k=1}^{n} P(x_k \mid C_i)$$

$$P(C_i \mid X) = \frac{P(X \mid C_i)\,P(C_i)}{P(X)} = \frac{P(C_i)\prod_{k=1}^{n} P(x_k \mid C_i)}{P(X)}$$

$$\arg\max_i P(C_i \mid X) = \arg\max_i P(C_i)\prod_{k=1}^{n} P(x_k \mid C_i)$$
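The decision rule under class conditional independence can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function names `train` and `predict`, the toy attribute values, and the use of plain relative frequencies without smoothing are all assumptions made for the example.

```python
from collections import Counter, defaultdict

def train(rows, labels):
    """Estimate P(C_i) and P(x_k | C_i) by relative frequency (no smoothing)."""
    n = len(labels)
    class_counts = Counter(labels)
    priors = {c: class_counts[c] / n for c in class_counts}
    # counts[(k, value, c)] = number of rows of class c whose k-th attribute is value
    counts = defaultdict(int)
    for row, c in zip(rows, labels):
        for k, value in enumerate(row):
            counts[(k, value, c)] += 1
    likelihood = {key: cnt / class_counts[key[2]] for key, cnt in counts.items()}
    return priors, likelihood

def predict(x, priors, likelihood):
    """Return the class maximizing P(C_i) * prod_k P(x_k | C_i), plus all scores."""
    scores = {}
    for c, prior in priors.items():
        score = prior
        for k, value in enumerate(x):
            # An attribute value never seen with class c gets probability 0 here.
            score *= likelihood.get((k, value, c), 0.0)
        scores[c] = score
    return max(scores, key=scores.get), scores

# Hypothetical toy data: two categorical attributes per row, two classes.
rows = [("sunny", "hot"), ("sunny", "mild"), ("rainy", "mild"),
        ("rainy", "hot"), ("sunny", "hot")]
labels = ["no", "no", "yes", "yes", "yes"]

priors, likelihood = train(rows, labels)
label, scores = predict(("sunny", "mild"), priors, likelihood)
print(label, scores)
```

The scores are unnormalized, because P(X) is the same for every class and does not affect the argmax. In practice the product is usually replaced by a sum of logarithms, and the counts are smoothed so that a single unseen attribute value does not force a score to zero.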
Case Study: Spam Email

Problem: classifying documents by their content: is a document a spam email or a non-spam email? Namely, what is the probability that a given document D belongs to a given class C? In other words, what is $\Pr(C \mid D)$?

For spam email investigation, by Bayes' theorem, we have

$$\Pr(S \mid D) = \frac{\Pr(S)\,\Pr(D \mid S)}{\Pr(D)}$$

$$\Pr(S^c \mid D) = \frac{\Pr(S^c)\,\Pr(D \mid S^c)}{\Pr(D)}$$

where $S$ is the class of spam email and $S^c$ is the class of normal email.
The problem then becomes determining which posterior probability is higher:

$$\arg\max_j \Pr(S_j \mid D) = \arg\max_j \frac{\Pr(S_j)\,\Pr(D \mid S_j)}{\Pr(D)}$$

Since $\Pr(D)$ is a constant that does not depend on $S_j$, the equation can be further written in its most common form:

$$\arg\max_j \Pr(S_j)\,\Pr(D \mid S_j)$$

Given a document D, we can then use this formula to determine whether it is a spam email or not.
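A quick numerical check shows why dropping $\Pr(D)$ is safe. In the sketch below, the prior and likelihood values are made-up numbers chosen only for illustration, and `classify` is a hypothetical helper; $\Pr(D)$ is recovered with the law of total probability.

```python
def classify(priors, likelihoods):
    """Compare the argmax of Pr(S_j) Pr(D | S_j) with the argmax of Pr(S_j | D)."""
    # Unnormalized scores: Pr(S_j) * Pr(D | S_j)
    scores = {c: priors[c] * likelihoods[c] for c in priors}
    # Pr(D) = sum_j Pr(S_j) * Pr(D | S_j), identical for every class
    evidence = sum(scores.values())
    posteriors = {c: s / evidence for c, s in scores.items()}
    best_unnormalized = max(scores, key=scores.get)
    best_posterior = max(posteriors, key=posteriors.get)
    return best_unnormalized, best_posterior, posteriors

# Assumed values for one document D (illustrative only).
priors = {"spam": 0.3, "normal": 0.7}          # Pr(S), Pr(S^c)
likelihoods = {"spam": 0.02, "normal": 0.001}  # Pr(D | S), Pr(D | S^c)

best_unnormalized, best_posterior, posteriors = classify(priors, likelihoods)
print(best_unnormalized, best_posterior, posteriors)
```

Dividing every score by the same positive constant rescales them without changing their order, so both rules select the same class. The normalized posteriors are only needed when the probability itself is of interest, for example to apply a confidence threshold.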
To compute the posterior probability, we must first compute the prior probability $\Pr(S_j)$ and the conditional probability $\Pr(D \mid S_j)$.

Suppose we already know the class (spam or non-spam) of some emails (called the "training data"). $\Pr(S_j)$ can then be obtained easily from the training data:

$$\Pr(S) = \frac{\#\text{spam}}{\#\text{total}}, \qquad \Pr(S^c) = 1 - \frac{\#\text{spam}}{\#\text{total}}$$

$\Pr(D \mid S_j)$ can be computed as follows. As each document can be modelled as a set of words, the probability that a given document D occurs in class $S_j$ can be written as