17 Information Gain
⚫ When an attribute A splits the set S into subsets Si, the information gain for A is
  Gain(S, A) = E(S) − Σi (|Si| / |S|) · E(Si)
⚫ The attribute that maximizes the information gain is selected
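The gain formula above can be sketched in a few lines of Python (a minimal illustration; the helper names `entropy` and `information_gain` are my own, not from the slides):

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy E(S) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(labels, attribute_values):
    """Gain(S, A) = E(S) - sum_i |Si|/|S| * E(Si),
    where Si groups the labels by each distinct value of attribute A."""
    n = len(labels)
    subsets = {}
    for label, value in zip(labels, attribute_values):
        subsets.setdefault(value, []).append(label)
    remainder = sum(len(s) / n * entropy(s) for s in subsets.values())
    return entropy(labels) - remainder
```

An attribute that splits the set into pure subsets recovers the full entropy as gain, while a constant attribute yields zero gain.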
18 Information Gain
⚫ Splitting stops when any of the following holds:
  ⚫ All the records at the node belong to one class
  ⚫ A significant majority fraction of the records belong to a single class
  ⚫ The segment contains only one or a very small number of records
  ⚫ The improvement is not substantial enough to warrant making the split
Outline ▪ Introduction ▪ Constructing a Decision Tree ▪ ID3 ▪ C4.5 ▪ Regression Trees ▪ CART ▪ Gradient Boosting
20 Iterative Dichotomiser 3 (ID3)
⚫ Quinlan (1986)
⚫ Each node corresponds to a splitting attribute
⚫ Each arc is a possible value of that attribute
⚫ At each node the splitting attribute is selected to be the most informative among the attributes not yet considered in the path from the root
⚫ The algorithm uses the criterion of information gain to determine the goodness of a split
⚫ The attribute with the greatest information gain is taken as the splitting attribute, and the data set is split for all distinct values of the attribute
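The recursive procedure described above can be sketched as a short function (an illustrative implementation following the slide's description, not Quinlan's original code; the helper names are hypothetical):

```python
from collections import Counter
from math import log2

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def gain(rows, labels, attr):
    """Information gain of splitting the rows on attribute attr."""
    n = len(labels)
    subsets = {}
    for row, label in zip(rows, labels):
        subsets.setdefault(row[attr], []).append(label)
    return entropy(labels) - sum(len(s) / n * entropy(s) for s in subsets.values())

def id3(rows, labels, attrs):
    """Return a nested-dict tree: leaves are class labels,
    internal nodes are {attribute: {value: subtree}}."""
    if len(set(labels)) == 1:          # all records belong to one class
        return labels[0]
    if not attrs:                      # no attributes left: majority class
        return Counter(labels).most_common(1)[0][0]
    # pick the attribute with the greatest information gain
    best = max(attrs, key=lambda a: gain(rows, labels, a))
    tree = {best: {}}
    # split the data set for all distinct values of the attribute
    for value in set(row[best] for row in rows):
        idx = [i for i, row in enumerate(rows) if row[best] == value]
        sub_rows = [rows[i] for i in idx]
        sub_labels = [labels[i] for i in idx]
        remaining = [a for a in attrs if a != best]
        tree[best][value] = id3(sub_rows, sub_labels, remaining)
    return tree
```

Each recursive call considers only attributes not yet used on the path from the root, matching the slide's description.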
21 Training Dataset
⚫ In this dataset, there are five categorical attributes: outlook, temperature, humidity, windy, and play.
⚫ We are interested in building a system which will enable us to decide whether or not to play the game on the basis of the weather conditions, i.e. we wish to predict the value of play using outlook, temperature, humidity and windy.
⚫ We can think of the attribute we wish to predict, i.e. play, as the output attribute, and the other attributes as input attributes.
⚫ In this problem we have 14 examples: 9 examples with play = yes and 5 examples with play = no
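With 9 yes and 5 no examples out of 14, the class entropy of the whole training set can be checked directly (assuming base-2 logarithms, as is standard for information gain):

```python
from math import log2

# Class distribution of play: 9 "yes" and 5 "no" out of 14 examples
p_yes, p_no = 9 / 14, 5 / 14

# E(S) = -p_yes * log2(p_yes) - p_no * log2(p_no)
e_s = -(p_yes * log2(p_yes) + p_no * log2(p_no))
print(f"{e_s:.3f}")  # entropy of the set before any split, ≈ 0.940
```

This value is the E(S) term that each candidate attribute's information gain is measured against.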