11  Example of a Decision Tree

Training data:

Tid  Refund  Marital Status  Taxable Income  Cheat
1    Yes     Single          125K            No
2    No      Married         100K            No
3    No      Single          70K             No
4    Yes     Married         120K            No
5    No      Divorced        95K             Yes
6    No      Married         60K             No
7    Yes     Divorced        220K            No
8    No      Single          85K             Yes
9    No      Married         75K             No
10   No      Single          90K             Yes

Model (decision tree; Refund, MarSt, and TaxInc are the splitting attributes):

Refund?
  Yes -> NO
  No  -> MarSt?
           Married          -> NO
           Single, Divorced -> TaxInc?
                                 < 80K -> NO
                                 > 80K -> YES
12  Another Example of a Decision Tree

Training data: the same ten records as on the previous slide.

Model (decision tree; this time MarSt is the first splitting attribute):

MarSt?
  Married          -> NO
  Single, Divorced -> Refund?
                        Yes -> NO
                        No  -> TaxInc?
                                 < 80K -> NO
                                 > 80K -> YES

There could be more than one tree that fits the same data!
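The claim that both trees fit the same data can be checked directly. A minimal sketch in Python (attribute names follow the slides) encodes each tree as a function and verifies that both reproduce every training label:

```python
# Training data from the slides: (Refund, Marital Status, Taxable Income in K, Cheat)
records = [
    ("Yes", "Single",   125, "No"),
    ("No",  "Married",  100, "No"),
    ("No",  "Single",    70, "No"),
    ("Yes", "Married",  120, "No"),
    ("No",  "Divorced",  95, "Yes"),
    ("No",  "Married",   60, "No"),
    ("Yes", "Divorced", 220, "No"),
    ("No",  "Single",    85, "Yes"),
    ("No",  "Married",   75, "No"),
    ("No",  "Single",    90, "Yes"),
]

def tree1(refund, marst, income):
    # Slide 11: split on Refund first, then MarSt, then TaxInc.
    if refund == "Yes":
        return "No"
    if marst == "Married":
        return "No"
    return "No" if income < 80 else "Yes"

def tree2(refund, marst, income):
    # Slide 12: split on MarSt first, then Refund, then TaxInc.
    if marst == "Married":
        return "No"
    if refund == "Yes":
        return "No"
    return "No" if income < 80 else "Yes"

# Both trees reproduce every training label, so both fit the data.
for refund, marst, income, cheat in records:
    assert tree1(refund, marst, income) == cheat
    assert tree2(refund, marst, income) == cheat
```

Running the loop raises no assertion error: two structurally different trees, one identical labeling of the training set.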
13  Decision Tree Induction: Training Dataset

age     income  student  credit_rating  buys_computer
<=30    high    no       fair           no
<=30    high    no       excellent      no
31..40  high    no       fair           yes
>40     medium  no       fair           yes
>40     low     yes      fair           yes
>40     low     yes      excellent      no
31..40  low     yes      excellent      yes
<=30    medium  no       fair           no
<=30    low     yes      fair           yes
>40     medium  yes      fair           yes
<=30    medium  yes      excellent      yes
31..40  medium  no       excellent      yes
31..40  high    yes      fair           yes
>40     medium  no       excellent      no
14  Output: A Decision Tree for "buys_computer"

❑ Training data set: Buys_computer
❑ The data set follows an example of Quinlan's ID3 (Playing Tennis)
❑ Resulting tree:

age?
  <=30   -> student?
              no  -> no
              yes -> yes
  31..40 -> yes
  >40    -> credit_rating?
              excellent -> no
              fair      -> yes
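ID3 places age at the root because it has the highest information gain on this data. A short sketch (pure Python, attribute and value names taken from the slide) computes Info(D), the gain of each attribute, and confirms the choice:

```python
from math import log2

# buys_computer training set from the previous slide:
# (age, income, student, credit_rating, buys_computer)
data = [
    ("<=30",   "high",   "no",  "fair",      "no"),
    ("<=30",   "high",   "no",  "excellent", "no"),
    ("31..40", "high",   "no",  "fair",      "yes"),
    (">40",    "medium", "no",  "fair",      "yes"),
    (">40",    "low",    "yes", "fair",      "yes"),
    (">40",    "low",    "yes", "excellent", "no"),
    ("31..40", "low",    "yes", "excellent", "yes"),
    ("<=30",   "medium", "no",  "fair",      "no"),
    ("<=30",   "low",    "yes", "fair",      "yes"),
    (">40",    "medium", "yes", "fair",      "yes"),
    ("<=30",   "medium", "yes", "excellent", "yes"),
    ("31..40", "medium", "no",  "excellent", "yes"),
    ("31..40", "high",   "yes", "fair",      "yes"),
    (">40",    "medium", "no",  "excellent", "no"),
]
attrs = ["age", "income", "student", "credit_rating"]

def entropy(rows):
    # Info(D) = -sum_i p_i * log2(p_i) over the class labels
    n = len(rows)
    counts = {}
    for r in rows:
        counts[r[-1]] = counts.get(r[-1], 0) + 1
    return -sum(c / n * log2(c / n) for c in counts.values())

def info_gain(rows, i):
    # Gain(A) = Info(D) - sum_v (|D_v| / |D|) * Info(D_v)
    n = len(rows)
    parts = {}
    for r in rows:
        parts.setdefault(r[i], []).append(r)
    split_info = sum(len(p) / n * entropy(p) for p in parts.values())
    return entropy(rows) - split_info

gains = {a: info_gain(data, i) for i, a in enumerate(attrs)}
best = max(gains, key=gains.get)   # 'age' wins, with gain of about 0.247 bits
```

With 9 "yes" and 5 "no" labels, Info(D) is about 0.940 bits; splitting on age leaves a pure 31..40 partition, which is why its gain dominates the other three attributes.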
15  Decision Tree Classification Task

Induction: a tree induction algorithm learns a model (the decision tree) from the training set.
Deduction: the learned model is applied to the test set to assign class labels.

Training set:

Tid  Attrib1  Attrib2  Attrib3  Class
1    Yes      Large    125K     No
2    No       Medium   100K     No
3    No       Small    70K      No
4    Yes      Medium   120K     No
5    No       Large    95K      Yes
6    No       Medium   60K      No
7    Yes      Large    220K     No
8    No       Small    85K      Yes
9    No       Medium   75K      No
10   No       Small    90K      Yes

Test set:

Tid  Attrib1  Attrib2  Attrib3  Class
11   No       Small    55K      ?
12   Yes      Medium   80K      ?
13   Yes      Large    110K     ?
14   No       Small    95K      ?
15   No       Large    67K      ?
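The induction/deduction loop on this slide can be sketched end to end. This is a minimal illustration, not the exact algorithm from the slides: the fixed split order and the 80K discretization threshold for Attrib3 are assumptions borrowed from the earlier example tree.

```python
from collections import Counter

# Training set from the slide (Attrib3 in thousands); class labels are known.
train = [
    ("Yes", "Large",  125, "No"),
    ("No",  "Medium", 100, "No"),
    ("No",  "Small",   70, "No"),
    ("Yes", "Medium", 120, "No"),
    ("No",  "Large",   95, "Yes"),
    ("No",  "Medium",  60, "No"),
    ("Yes", "Large",  220, "No"),
    ("No",  "Small",   85, "Yes"),
    ("No",  "Medium",  75, "No"),
    ("No",  "Small",   90, "Yes"),
]
# Test set from the slide; class labels are "?" and must be predicted.
test = [
    ("No",  "Small",   55),
    ("Yes", "Medium",  80),
    ("Yes", "Large",  110),
    ("No",  "Small",   95),
    ("No",  "Large",   67),
]

def featurize(a1, a2, a3):
    # Discretize Attrib3 at 80K (threshold assumed from the earlier slides).
    return {"Attrib1": a1, "Attrib2": a2, "Attrib3": "<80K" if a3 < 80 else ">=80K"}

def induce(rows, attrs):
    """Induction: grow the tree by recursive splitting (Hunt's-algorithm style)."""
    labels = [label for _, label in rows]
    if len(set(labels)) == 1 or not attrs:        # pure node, or no attributes left
        return Counter(labels).most_common(1)[0][0]
    attr = attrs[0]                               # naive fixed order; ID3 would pick by gain
    branches = {}
    for feats, label in rows:
        branches.setdefault(feats[attr], []).append((feats, label))
    return (attr,
            {v: induce(sub, attrs[1:]) for v, sub in branches.items()},
            Counter(labels).most_common(1)[0][0])  # majority fallback for unseen values

def deduce(tree, feats):
    """Deduction: route a record down the tree to a leaf label."""
    while isinstance(tree, tuple):
        attr, branches, default = tree
        tree = branches.get(feats[attr], default)
    return tree

rows = [(featurize(a1, a2, a3), label) for a1, a2, a3, label in train]
model = induce(rows, ["Attrib1", "Attrib3", "Attrib2"])
preds = [deduce(model, featurize(*r)) for r in test]
```

The model reproduces every training label, and `preds` holds one Yes/No answer per test record, filling in the "?" column of the test set.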