Chapter 7. Classification: Basic Concepts
◼ Classification: Basic Concepts
◼ Decision Tree Induction
◼ Bayes Classification Methods
◼ Rule-Based Classification
◼ Model Evaluation and Selection
◼ Summary
Classification vs. Prediction
◼ Classification
  ◼ predicts categorical class labels (discrete or nominal)
  ◼ classifies data (constructs a model) based on the training set and the values (class labels) in a classifying attribute, and uses it in classifying new data
◼ Prediction
  ◼ models continuous-valued functions, i.e., predicts unknown or missing values
Classification: Definition
◼ Given a collection of records (training set)
  ◼ Each record contains a set of attributes; one of the attributes is the class.
◼ Find a model for the class attribute as a function of the values of the other attributes.
◼ Goal: previously unseen records should be assigned a class as accurately as possible.
  ◼ A test set is used to determine the accuracy of the model. Usually, the given data set is divided into training and test sets, with the training set used to build the model and the test set used to validate it.
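The split into training and test sets can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the records are made up, the helper names (`split_train_test`, `predict_1nn`, `accuracy`) are hypothetical, and a 1-nearest-neighbor rule stands in for "a model" only because it is short, not because the slides prescribe it.

```python
import random

def split_train_test(records, test_fraction=0.3, seed=0):
    """Shuffle the records and split them into (training set, test set)."""
    rng = random.Random(seed)
    shuffled = records[:]
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

def predict_1nn(train, attributes):
    """Assign the class of the closest training record (squared Euclidean distance)."""
    def distance(record):
        return sum((a - b) ** 2 for a, b in zip(record[0], attributes))
    return min(train, key=distance)[1]

def accuracy(train, test):
    """Fraction of test records whose predicted class matches the true class."""
    correct = sum(1 for attrs, label in test if predict_1nn(train, attrs) == label)
    return correct / len(test)

# Made-up records: ((attribute values), class label). The class is one of the attributes.
records = [((float(x), x + 1.0), "low") for x in range(0, 10)] + \
          [((float(x), x + 1.0), "high") for x in range(20, 30)]

train_set, test_set = split_train_test(records)
print(len(train_set), len(test_set))
print(accuracy(train_set, test_set))
```

The model only ever sees `train_set`; `test_set` plays the role of previously unseen records, so the reported accuracy is an estimate of how well the model generalizes.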
Supervised vs. Unsupervised Learning
◼ Supervised learning (classification)
  ◼ Supervision: The training data (observations, measurements, etc.) are accompanied by labels indicating the class of the observations
  ◼ New data is classified based on the training set
◼ Unsupervised learning (clustering)
  ◼ The class labels of the training data are unknown
  ◼ Given a set of measurements, observations, etc., the aim is to establish the existence of classes or clusters in the data
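The contrast above can be sketched on the same made-up one-dimensional measurements. This is only an illustration under stated assumptions: the data, the function names, and the choice of a nearest-centroid rule (supervised) and a basic two-cluster k-means loop (unsupervised) are all hypothetical stand-ins, not methods named in the slides.

```python
def nearest_centroid_fit(values, labels):
    """Supervised: labels are given, so compute one mean per known class."""
    sums, counts = {}, {}
    for v, y in zip(values, labels):
        sums[y] = sums.get(y, 0.0) + v
        counts[y] = counts.get(y, 0) + 1
    return {y: sums[y] / counts[y] for y in sums}

def nearest_centroid_predict(centroids, v):
    """Classify a new value by the class whose mean is closest."""
    return min(centroids, key=lambda y: abs(centroids[y] - v))

def two_means(values, iterations=10):
    """Unsupervised: no labels; search for two clusters with a basic k-means loop.

    Assumes `values` contains at least two distinct numbers.
    """
    c0, c1 = min(values), max(values)
    for _ in range(iterations):
        g0 = [v for v in values if abs(v - c0) <= abs(v - c1)]
        g1 = [v for v in values if abs(v - c0) > abs(v - c1)]
        c0, c1 = sum(g0) / len(g0), sum(g1) / len(g1)
    return sorted(g0), sorted(g1)

# Made-up measurements
values = [1.0, 1.2, 0.8, 5.0, 5.3, 4.9]
labels = ["A", "A", "A", "B", "B", "B"]  # available only in the supervised case

centroids = nearest_centroid_fit(values, labels)
print(nearest_centroid_predict(centroids, 4.5))  # uses the labeled training set
print(two_means(values))                         # discovers groups without labels
```

In the supervised case the output is one of the known class names; in the unsupervised case the output is only a grouping, and deciding what each group means is left to the analyst.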
Prediction Problems: Classification vs. Numeric Prediction
◼ Classification
  ◼ predicts categorical class labels (discrete or nominal)
  ◼ classifies data (constructs a model) based on the training set and the values (class labels) in a classifying attribute, and uses it in classifying new data
◼ Numeric Prediction
  ◼ models continuous-valued functions, i.e., predicts unknown or missing values
◼ Typical applications
  ◼ Credit/loan approval
  ◼ Medical diagnosis: if a tumor is cancerous or benign
  ◼ Fraud detection: if a transaction is fraudulent
  ◼ Web page categorization: which category it is
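The difference in output type can be shown with a small sketch. Both functions are hypothetical illustrations: `classify_transaction` uses an arbitrary, made-up threshold rather than a learned model, and `fit_line` uses ordinary least squares as one possible way to model a continuous-valued function.

```python
def classify_transaction(amount, threshold=1000.0):
    """Classification: the output is a categorical label.

    The threshold is an arbitrary value chosen for illustration only.
    """
    return "fraudulent" if amount > threshold else "legitimate"

def fit_line(xs, ys):
    """Numeric prediction: ordinary least squares fit of y = a * x + b."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    a = sxy / sxx
    b = mean_y - a * mean_x
    return a, b

# Made-up data that lies on the line y = 2x
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]
a, b = fit_line(xs, ys)

print(classify_transaction(2500.0))  # a discrete class label
print(a * 5.0 + b)                   # a continuous value for an unseen x
```

A classifier can only return one of a fixed set of labels, whereas the numeric predictor can return any real value, including values that never appeared in the training data.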