Typical Applications
Credit approval – classes can be High Risk, Low Risk.
Target marketing – what are the classes?
Medical diagnosis – classes can be patients with different diseases.
Treatment effectiveness analysis – classes can be patients with different degrees of recovery.
Techniques for Discovering Classification Rules
– The k-Nearest Neighbor Algorithm
– The Linear Discriminant Function
– The Bayesian Approach
– The Decision Tree Approach
– The Neural Network Approach
– The Genetic Algorithm Approach
Example Using the k-NN Algorithm

Salary | Age | Insurance
-------|-----|----------
15K    | 28  | Buy
31K    | 39  | Buy
41K    | 53  | Buy
10K    | 45  | Buy
14K    | 55  | Buy
25K    | 27  | Not Buy
42K    | 32  | Not Buy
18K    | 38  | Not Buy
33K    | 44  | Not Buy

John earns 24K per month and is 42 years old. Will he buy insurance?
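The slide's question can be answered mechanically: measure the distance from John's (salary, age) point to every training record and take a majority vote among the k closest. A minimal sketch in Python, using raw Euclidean distance on the unscaled salary (in K) and age values as the slide implies (in practice the attributes would usually be normalized first; the choice k=3 here is an assumption, and the answer can change with k):

```python
import math
from collections import Counter

# Training data from the slide: (salary in K, age, class label)
records = [
    (15, 28, "Buy"), (31, 39, "Buy"), (41, 53, "Buy"),
    (10, 45, "Buy"), (14, 55, "Buy"),
    (25, 27, "Not Buy"), (42, 32, "Not Buy"),
    (18, 38, "Not Buy"), (33, 44, "Not Buy"),
]

def knn_classify(query, records, k=3):
    """Majority vote among the k records closest to the query point."""
    # Sort records by Euclidean distance in the (salary, age) plane
    by_distance = sorted(records,
                         key=lambda r: math.dist(query, (r[0], r[1])))
    votes = Counter(label for _, _, label in by_distance[:k])
    return votes.most_common(1)[0][0]

# John: salary 24K, age 42
print(knn_classify((24, 42), records))  # -> Not Buy (for k=3)
```

With k=3 the three nearest records are (18K, 38, Not Buy), (31K, 39, Buy) and (33K, 44, Not Buy), so the vote goes 2–1 to Not Buy.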
The k-Nearest Neighbor Algorithm
– All data records correspond to points in the n-dimensional space.
– The nearest neighbor is defined in terms of Euclidean distance.
– k-NN returns the most common class label among the k training examples nearest to the query point x_q.
[Figure: + and - labeled points scattered around the query point x_q]
The k-NN Algorithm (2)
k-NN can also be used for continuous-valued labels.
– Predict the mean value of the k nearest neighbors.
Distance-weighted nearest neighbor algorithm
– Weight the contribution of each of the k neighbors according to its distance to the query point x_q, e.g. w = 1 / d(x_q, x_i)^2.
Advantage:
– Robust to noisy data by averaging the k nearest neighbors.
Disadvantage:
– The distance between neighbors can be dominated by irrelevant attributes.
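The two ideas above combine naturally: for a continuous-valued label, predict a weighted average of the k nearest neighbors, with each neighbor weighted by w_i = 1/d(x_q, x_i)^2 so closer points dominate. A minimal 1-D sketch (the data points and k=3 are illustrative assumptions, not from the slides):

```python
# Toy 1-D regression data: (feature, continuous target) -- illustrative values
points = [(1.0, 2.0), (2.0, 3.9), (3.0, 6.1), (4.0, 8.0), (5.0, 9.8)]

def weighted_knn_predict(xq, points, k=3):
    """Distance-weighted k-NN prediction for a continuous-valued label.

    Each of the k nearest neighbors contributes with weight
    w_i = 1 / d(x_q, x_i)^2, so nearer neighbors dominate the average.
    """
    neighbors = sorted(points, key=lambda p: abs(p[0] - xq))[:k]
    num = den = 0.0
    for x, y in neighbors:
        d = abs(x - xq)
        if d == 0:             # query coincides with a training point
            return y
        w = 1.0 / d**2
        num += w * y
        den += w
    return num / den

print(weighted_knn_predict(2.5, points))
```

For x_q = 2.5 the two neighbors at distance 0.5 carry weight 4 each while the third (distance 1.5) carries only about 0.44, so the prediction stays close to the average of the two nearest targets.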