Outline
▪ Introduction
▪ Constructing a Decision Tree
▪ ID3
▪ C4.5
▪ Regression Trees
▪ CART
▪ Gradient Boosting
Constructing a Decision Tree
⚫ Two Aspects
◆ Which attribute to choose? ◼ Information Gain ➢ ENTROPY
◆ Where to stop? ◼ Termination criteria
Calculation of Entropy
⚫ Entropy is a measure of uncertainty in the data:
Entropy(S) = −∑_{i=1}^{l} (|S_i| / |S|) · log2(|S_i| / |S|)
◆ S = set of examples
◆ S_i = subset of S with value v_i under the target attribute
◆ l = size of the range of the target attribute
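As a concrete illustration, here is a minimal Python sketch of this formula (the function name and the sample labels are invented for the example):

```python
import math
from collections import Counter

def entropy(labels):
    """Entropy(S) = -sum_i (|S_i|/|S|) * log2(|S_i|/|S|),
    where each S_i groups the examples sharing one target value."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())

# A 50/50 mix is maximally uncertain; a pure set has zero entropy.
print(entropy(["yes", "no", "yes", "no"]))    # 1.0
print(entropy(["yes", "yes", "yes", "yes"]))  # 0.0 (Python prints -0.0)
```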
Entropy
⚫ Let us say I am considering an action like a coin toss. Say I have five coins with probabilities of heads 0, 0.25, 0.5, 0.75, and 1. When I toss them, which one has the highest uncertainty and which one has the least? (See the worked sketch below.)
⚫ Information gain = entropy of the system before the split − entropy of the system after the split, where the entropy after the split is the size-weighted average entropy of the resulting subsets
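To make the coin question concrete, here is a self-contained Python sketch (names like binary_entropy and information_gain are my own for illustration):

```python
import math
from collections import Counter

def binary_entropy(p):
    """H(p) = -p*log2(p) - (1-p)*log2(1-p); 0*log2(0) is taken as 0."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

# The five coins from the example above.
for p in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(f"P(heads) = {p:4.2f} -> entropy = {binary_entropy(p):.3f} bits")
# Output: 0.000, 0.811, 1.000, 0.811, 0.000 -- the fair coin is the
# most uncertain; the always-heads and always-tails coins the least.

def entropy(labels):
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())

def information_gain(parent_labels, child_label_sets):
    """Entropy before the split minus the size-weighted
    average entropy of the subsets after the split."""
    total = len(parent_labels)
    after = sum(len(s) / total * entropy(s) for s in child_label_sets)
    return entropy(parent_labels) - after

# Splitting a mixed node into two pure children yields maximal gain.
parent = ["yes", "yes", "yes", "no", "no", "no"]
children = [["yes", "yes", "yes"], ["no", "no", "no"]]
print(information_gain(parent, children))  # 1.0: a perfect split
```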
Entropy: Measure of Randomness
[Figure: total entropy (y-axis, 0 to 1.2) plotted against the probability of one outcome (x-axis, 0 to 1)]
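The curve in the figure can be reproduced with a short matplotlib sketch (assuming matplotlib is available; this is an illustrative reconstruction, not the original plot). Entropy is 0 for the certain outcomes and peaks at 1 bit when p = 0.5:

```python
import math
import matplotlib.pyplot as plt

# Binary entropy H(p) over (0, 1); endpoints are excluded
# to avoid log2(0), and H(0) = H(1) = 0 by convention.
ps = [i / 100 for i in range(1, 100)]
hs = [-p * math.log2(p) - (1 - p) * math.log2(1 - p) for p in ps]

plt.plot(ps, hs)
plt.xlabel("p")
plt.ylabel("Total entropy")
plt.title("Entropy: Measure of Randomness")
plt.show()
```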