Naive Bayes: Classifying
▪ positions ← all word positions in the current document which contain tokens found in Vocabulary
▪ Return c_NB, where
  $c_{NB} = \arg\max_{c_j \in C} \; P(c_j) \prod_{i \in \text{positions}} P(x_i \mid c_j)$
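As a concrete reference, here is a minimal Python sketch of this classification rule, assuming the parameters P(c) and P(w | c) have already been estimated and live in hypothetical dictionaries prior[c] and cond_prob[c][w]; positions whose tokens are not in the vocabulary are simply skipped.

```python
def classify_nb(doc_tokens, classes, vocab, prior, cond_prob):
    # prior[c] = P(c); cond_prob[c][w] = P(w | c)  (hypothetical, pre-estimated)
    best_class, best_score = None, -1.0
    for c in classes:
        score = prior[c]
        for w in doc_tokens:            # iterate over word positions
            if w in vocab:              # keep only positions with in-vocabulary tokens
                score *= cond_prob[c][w]
        if score > best_score:
            best_class, best_score = c, score
    return best_class
```

Note that this direct product form underflows on long documents; the log-space fix appears two slides later.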
Naive Bayes: Time Complexity
For document classification:
▪ Training Time: O(|D| L_ave + |C||V|), where L_ave is the average length of a document in D.
  ▪ Assumes all counts are pre-computed in O(|D| L_ave) time during one pass through all of the data.
  ▪ Generally just O(|D| L_ave), since usually |C||V| < |D| L_ave.
▪ Test Time: O(|C| L_t), where L_t is the average length of a test document.
▪ Very efficient overall: linearly proportional to the time needed to just read in all the data.
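A sketch of how that single counting pass might look, using the same hypothetical parameter names as above: one loop over all (tokens, label) pairs accumulates the counts in O(|D| L_ave), and a separate O(|C||V|) step turns them into probabilities. The add-one smoothing in the normalization step is an assumption, not something stated on this slide.

```python
from collections import Counter, defaultdict

def train_nb(labeled_docs):
    # labeled_docs: iterable of (tokens, class) pairs (hypothetical input format)
    class_count = Counter()
    token_count = defaultdict(Counter)          # token_count[c][w]
    vocab = set()
    for tokens, c in labeled_docs:              # one pass: O(|D| * L_ave)
        class_count[c] += 1
        for w in tokens:
            token_count[c][w] += 1
            vocab.add(w)
    n_docs = sum(class_count.values())
    prior = {c: class_count[c] / n_docs for c in class_count}
    cond_prob = {}
    for c in class_count:                       # normalization: O(|C| * |V|)
        denom = sum(token_count[c].values()) + len(vocab)   # add-one smoothing (assumed)
        cond_prob[c] = {w: (token_count[c][w] + 1) / denom for w in vocab}
    return prior, cond_prob, vocab
```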
Underflow Prevention: Using Logs
▪ Multiplying lots of probabilities, which are between 0 and 1 by definition, can result in floating-point underflow.
▪ Since log(xy) = log(x) + log(y), it is better to perform all computations by summing logs of probabilities rather than multiplying probabilities.
▪ Class with highest final un-normalized log probability score is still the most probable:
  $c_{NB} = \arg\max_{c_j \in C} \left[\, \log P(c_j) + \sum_{i \in \text{positions}} \log P(x_i \mid c_j) \right]$
▪ Note that the model is now just a max of a sum of weights…
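The earlier classification sketch, rewritten in log space under the same assumptions: only the accumulator changes, and the argmax decision is unchanged because log is monotonic.

```python
import math

def classify_nb_log(doc_tokens, classes, vocab, prior, cond_prob):
    best_class, best_score = None, float("-inf")
    for c in classes:
        score = math.log(prior[c])                   # start from the log prior
        for w in doc_tokens:
            if w in vocab:
                score += math.log(cond_prob[c][w])   # log(xy) = log x + log y
        if score > best_score:                       # un-normalized log score; argmax unchanged
            best_class, best_score = c, score
    return best_class
```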
Naive Bayes Classifier
$c_{NB} = \arg\max_{c_j \in C} \left[\, \log P(c_j) + \sum_{i \in \text{positions}} \log P(x_i \mid c_j) \right]$
▪ Simple interpretation: each conditional parameter log P(x_i | c_j) is a weight that indicates how good an indicator x_i is for c_j.
▪ The prior log P(c_j) is a weight that indicates the relative frequency of c_j.
▪ The sum is then a measure of how much evidence there is for the document being in the class.
▪ We select the class with the most evidence for it.
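A toy two-class illustration of this sum-of-weights view, with made-up numbers: each log P(w | c) acts as a per-word weight, log P(c) as a bias, and the class score is their sum.

```python
import math

# Hypothetical parameters for a two-class spam/ham example.
weights = {                                    # weights[c][w] = log P(w | c)
    "spam": {"free": math.log(0.10), "meeting": math.log(0.01)},
    "ham":  {"free": math.log(0.01), "meeting": math.log(0.10)},
}
bias = {"spam": math.log(0.4), "ham": math.log(0.6)}   # bias[c] = log P(c)

doc = ["free", "free", "meeting"]
scores = {c: bias[c] + sum(weights[c][w] for w in doc) for c in bias}
print(max(scores, key=scores.get))             # -> spam (two "free"s outweigh one "meeting")
```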
Classification Methods
▪ Perceptrons
▪ Naïve Bayes
▪ kNN
▪ Support vector machine (SVM)