Big Data Analysis and Mining
Decision Tree
Qinpei Zhao 赵钦佩 (qinpeizhao@tongji.edu.cn)
2015 Fall
2021/2/9
Illustrating Classification Task
Figure: a learning algorithm is run on the training set to learn a model (induction); the model is then applied to the test set to assign the unknown classes (deduction).

Training Set
Tid  Attrib1  Attrib2  Attrib3  Class
1    Yes      Large    125K     No
2    No       Medium   100K     No
3    No       Small    70K      No
4    Yes      Medium   120K     No
5    No       Large    95K      Yes
6    No       Medium   60K      No
7    Yes      Large    220K     No
8    No       Small    85K      Yes
9    No       Medium   75K      No
10   No       Small    90K      Yes

Test Set
Tid  Attrib1  Attrib2  Attrib3  Class
11   No       Small    55K      ?
12   Yes      Medium   80K      ?
13   Yes      Large    110K     ?
14   No       Small    95K      ?
15   No       Large    67K      ?
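The slide leaves the learning algorithm unspecified. As a minimal sketch, assuming the simplest possible tree, a one-level "decision stump" that picks the single Boolean test with the fewest training errors, the induction and deduction steps on the table above could look like this. The helper names `candidate_tests`, `learn_stump` and `apply_stump` are hypothetical, and the stump is a stand-in, not the algorithm the course develops.

```python
from collections import Counter

# Records from the slide: (Tid, Attrib1, Attrib2, Attrib3 in thousands, Class).
TRAIN = [
    (1, "Yes", "Large", 125, "No"),
    (2, "No", "Medium", 100, "No"),
    (3, "No", "Small", 70, "No"),
    (4, "Yes", "Medium", 120, "No"),
    (5, "No", "Large", 95, "Yes"),
    (6, "No", "Medium", 60, "No"),
    (7, "Yes", "Large", 220, "No"),
    (8, "No", "Small", 85, "Yes"),
    (9, "No", "Medium", 75, "No"),
    (10, "No", "Small", 90, "Yes"),
]
# Test records carry no class label (the "?" column on the slide).
TEST = [
    (11, "No", "Small", 55),
    (12, "Yes", "Medium", 80),
    (13, "Yes", "Large", 110),
    (14, "No", "Small", 95),
    (15, "No", "Large", 67),
]


def candidate_tests(rows):
    """Yield (description, predicate) pairs for every Boolean test considered."""
    # Equality tests on the two categorical attributes.
    for col, name in ((1, "Attrib1"), (2, "Attrib2")):
        for value in sorted({r[col] for r in rows}):
            yield f"{name} == {value}", (lambda r, c=col, v=value: r[c] == v)
    # Threshold tests on the numeric attribute, at midpoints of sorted values.
    incomes = sorted({r[3] for r in rows})
    for lo, hi in zip(incomes, incomes[1:]):
        t = (lo + hi) / 2
        yield f"Attrib3 <= {t}K", (lambda r, t=t: r[3] <= t)


def majority(labels):
    """Most frequent label; ties go to the label seen first."""
    return Counter(labels).most_common(1)[0][0]


def learn_stump(rows):
    """Induction: return (errors, description, test, label_if_true, label_if_false)."""
    best = None
    for desc, test in candidate_tests(rows):
        true_labels = [r[4] for r in rows if test(r)]
        false_labels = [r[4] for r in rows if not test(r)]
        if not true_labels or not false_labels:
            continue  # a test that does not split the data is useless
        on_true, on_false = majority(true_labels), majority(false_labels)
        errors = (sum(lab != on_true for lab in true_labels)
                  + sum(lab != on_false for lab in false_labels))
        if best is None or errors < best[0]:  # keep the first minimum
            best = (errors, desc, test, on_true, on_false)
    return best


def apply_stump(stump, row):
    """Deduction: answer the stump's test and return that branch's label."""
    _, _, test, on_true, on_false = stump
    return on_true if test(row) else on_false


if __name__ == "__main__":
    stump = learn_stump(TRAIN)
    print(stump[1], "with", stump[0], "training errors")
    for row in TEST:
        print(row[0], apply_stump(stump, row))
```

On this data the chosen test is `Attrib2 == Small` with 2 training errors, so records 11 and 14 are predicted Yes and records 12, 13 and 15 No. No single test among those considered separates the three Yes records perfectly, which is why a full decision tree chains several tests.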
Classification: Definition
◼ Given a collection of records (training set):
  ◆ Each record contains a set of attributes; one of the attributes is the class.
◼ Find a model for the class attribute as a function of the values of the other attributes.
◼ Goal: previously unseen records should be assigned a class as accurately as possible.
  ◆ A test set is used to determine the accuracy of the model. Usually, the given data set is divided into training and test sets, with the training set used to build the model and the test set used to validate it.
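The slide does not say how the data set is divided. One common choice, assumed here, is a random hold-out split: shuffle the records, keep a fixed fraction (30% below, an arbitrary value) as the test set, and report accuracy as the fraction of test records whose predicted class equals the true class. `split_train_test` and `accuracy` are hypothetical helper names for this sketch, not a library API.

```python
import random


def split_train_test(records, test_fraction=0.3, seed=0):
    """Random hold-out split: returns (training_set, test_set).

    test_fraction and seed are illustrative defaults, not values from the slides.
    """
    if not 0 < test_fraction < 1:
        raise ValueError("test_fraction must be strictly between 0 and 1")
    shuffled = list(records)                 # copy so the caller's data is untouched
    random.Random(seed).shuffle(shuffled)    # seeded for a reproducible split
    n_test = round(len(shuffled) * test_fraction)
    cut = len(shuffled) - n_test
    return shuffled[:cut], shuffled[cut:]


def accuracy(predicted, actual):
    """Fraction of test records whose predicted class matches the true class."""
    if not actual or len(predicted) != len(actual):
        raise ValueError("need two non-empty sequences of equal length")
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)


if __name__ == "__main__":
    train, test = split_train_test(range(1, 11))   # e.g. ten record ids
    print("train:", train, "test:", test)
    print(accuracy(["Yes", "No", "No"], ["Yes", "No", "Yes"]))  # 2 of 3 correct
```

Keeping the test records out of training is what makes the accuracy estimate meaningful: the model is judged on records it has not seen, which matches the goal stated on the slide.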
Examples of Classification Tasks
◼ Predicting tumor cells as benign or malignant
◼ Classifying credit card transactions as legitimate or fraudulent
◼ Classifying secondary structures of proteins as alpha-helix, beta-sheet, or random coil
◼ Categorizing news stories as finance, weather, entertainment, sports, etc.
What is a Decision Tree?
◼ An inductive learning task
  ◆ Use particular facts to draw more general conclusions
◼ A predictive model based on a branching series of Boolean tests
  ◆ These smaller Boolean tests are less complex than a one-stage classifier
◼ Let’s look at a sample decision tree…
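A decision tree of this kind can be represented as internal nodes that each hold one Boolean test and leaves that each hold a class label; classifying a record means answering one test per level until a leaf is reached. The Python sketch below assumes that representation. The tree `TREE` is hypothetical and not taken from the slides: it is hand-built over the Attrib1 to Attrib3 columns of the earlier classification example, and it happens to classify all ten of those training records correctly.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Union

Record = Dict[str, object]


@dataclass
class Leaf:
    label: str  # class assigned to every record that reaches this leaf


@dataclass
class Node:
    question: str                    # readable form of the Boolean test
    test: Callable[[Record], bool]   # the Boolean test itself
    if_true: "Tree"                  # subtree followed when the test is True
    if_false: "Tree"                 # subtree followed when the test is False


Tree = Union[Leaf, Node]


def classify(tree: Tree, record: Record) -> str:
    """Walk one root-to-leaf path, answering one Boolean test per node."""
    while isinstance(tree, Node):
        tree = tree.if_true if tree.test(record) else tree.if_false
    return tree.label


def trace(tree: Tree, record: Record) -> List[str]:
    """Same walk as classify, but record each test and its answer."""
    steps = []
    while isinstance(tree, Node):
        answer = tree.test(record)
        steps.append(f"{tree.question} {'yes' if answer else 'no'}")
        tree = tree.if_true if answer else tree.if_false
    steps.append(f"class = {tree.label}")
    return steps


# Hypothetical tree, hand-built for illustration (Attrib3 in thousands).
TREE = Node(
    "Attrib1 == Yes?", lambda r: r["Attrib1"] == "Yes",
    if_true=Leaf("No"),
    if_false=Node(
        "Attrib3 <= 80K?", lambda r: r["Attrib3"] <= 80,
        if_true=Leaf("No"),
        if_false=Node(
            "Attrib3 <= 97.5K?", lambda r: r["Attrib3"] <= 97.5,
            if_true=Leaf("Yes"),
            if_false=Leaf("No"),
        ),
    ),
)

if __name__ == "__main__":
    record = {"Attrib1": "No", "Attrib2": "Small", "Attrib3": 85}
    print(classify(TREE, record))
    for step in trace(TREE, record):
        print(step)
```

For the record (No, Small, 85K) the walk answers three tests (no, no, yes) and ends at a Yes leaf. Each node asks one simple question, which is the sense in which the tree's Boolean tests are individually less complex than a single one-stage classifier; a tree also need not test every attribute (Attrib2 is never used here).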