© Tan, Steinbach, Kumar Introduction to Data Mining 4/18/2004

How to Specify Test Condition?
Depends on attribute types
– Nominal
– Ordinal
– Continuous
Depends on number of ways to split
– 2-way split
– Multi-way split

Splitting Based on Nominal Attributes
Multi-way split: Use as many partitions as distinct values.
– CarType: Family, Sports, Luxury
Binary split: Divides values into two subsets. Need to find optimal partitioning.
– CarType: {Family, Luxury} vs. {Sports}
OR
– CarType: {Sports, Luxury} vs. {Family}
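
The search for an optimal binary partitioning of a nominal attribute can be sketched by listing every candidate grouping. The snippet below is an illustrative sketch, not part of the original slides; the function name `binary_partitions` is hypothetical. For k distinct values it yields 2^(k-1) - 1 candidate splits, each of which would then be scored by whatever split criterion the tree uses.

```python
from itertools import combinations

def binary_partitions(values):
    """Yield every split of a set of nominal values into two non-empty subsets.

    Hypothetical helper for illustration only.
    """
    values = sorted(values)
    first, rest = values[0], values[1:]
    # Pin the first value to the left side so each partition appears only once.
    for r in range(len(rest) + 1):
        for combo in combinations(rest, r):
            left = {first, *combo}
            right = set(values) - left
            if right:  # skip the trivial split where everything is on the left
                yield left, right

splits = list(binary_partitions({"Family", "Sports", "Luxury"}))
for left, right in splits:
    print(sorted(left), "vs.", sorted(right))
```

With three car types this produces three candidate groupings, including the two shown on the slide.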

Splitting Based on Ordinal Attributes
Multi-way split: Use as many partitions as distinct values.
– Size: Small, Medium, Large
Binary split: Divides values into two subsets. Need to find optimal partitioning.
– Size: {Small, Medium} vs. {Large}
OR
– Size: {Medium, Large} vs. {Small}
What about this split?
– Size: {Small, Large} vs. {Medium}
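
The slide leaves the last split as a question. The usual answer is that grouping Small with Large breaks the ordering of the attribute, so only splits that keep neighbouring values together are considered for ordinal attributes. The sketch below illustrates that idea under this assumption; the function names `ordered_binary_splits` and `preserves_order` are hypothetical.

```python
def ordered_binary_splits(ordered_values):
    """Return the binary splits that cut the ordering at a single point."""
    return [
        (ordered_values[:i], ordered_values[i:])
        for i in range(1, len(ordered_values))
    ]

def preserves_order(subset, ordered_values):
    """Return True if the subset is a contiguous run of the ordering."""
    positions = sorted(ordered_values.index(v) for v in subset)
    return positions == list(range(positions[0], positions[-1] + 1))

sizes = ["Small", "Medium", "Large"]  # values listed from smallest to largest
splits = ordered_binary_splits(sizes)
for left, right in splits:
    print(left, "vs.", right)

# The grouping questioned on the slide skips over Medium.
print(preserves_order({"Small", "Large"}, sizes))
```

An ordinal attribute with k values has only k - 1 order-preserving binary splits, far fewer than the 2^(k-1) - 1 groupings available for a nominal attribute.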

Splitting Based on Continuous Attributes
Different ways of handling
– Discretization to form an ordinal categorical attribute
  ◆ Static – discretize once at the beginning
  ◆ Dynamic – ranges can be found by equal interval bucketing, equal frequency bucketing (percentiles), or clustering
– Binary Decision: (A < v) or (A ≥ v)
  ◆ considers all possible splits and finds the best cut
  ◆ can be more compute intensive
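
Both approaches can be sketched in a few lines. The example below is illustrative only: the function names and the toy data are hypothetical, and weighted Gini impurity is an assumed scoring measure, since the slide does not say how the best cut is judged. The first two functions compute bucket boundaries for equal interval and equal frequency bucketing; the third tries a threshold v between each pair of consecutive distinct values for the test (A < v) or (A ≥ v).

```python
from collections import Counter

def equal_width_edges(values, k):
    """Interior boundaries of k buckets that each span the same range."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    return [lo + i * width for i in range(1, k)]

def equal_frequency_edges(values, k):
    """Interior boundaries of k buckets holding roughly equal record counts."""
    s = sorted(values)
    return [s[i * len(s) // k] for i in range(1, k)]

def gini(labels):
    """Gini impurity of a list of class labels (assumed scoring measure)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_binary_cut(values, labels):
    """Return (v, score) minimising weighted Gini for A < v versus A >= v."""
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best_v, best_score = None, float("inf")
    for i in range(1, n):
        lo, hi = pairs[i - 1][0], pairs[i][0]
        if lo == hi:
            continue  # no threshold can separate equal values
        v = (lo + hi) / 2  # candidate cut halfway between neighbours
        left = [y for _, y in pairs[:i]]   # records with A < v
        right = [y for _, y in pairs[i:]]  # records with A >= v
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best_score:
            best_v, best_score = v, score
    return best_v, best_score

# Toy data, invented for this sketch.
incomes = [20, 35, 50, 85, 90, 120]
classes = ["No", "No", "No", "Yes", "Yes", "Yes"]
print(equal_width_edges(incomes, 4))
print(equal_frequency_edges(incomes, 3))
print(best_binary_cut(incomes, classes))
```

This version rebuilds the left and right label lists for every candidate, which is the source of the extra compute cost the slide mentions; it is written for clarity rather than speed.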

Splitting Based on Continuous Attributes
(i) Binary split
– Taxable Income > 80K? Yes / No
(ii) Multi-way split
– Taxable Income? < 10K, [10K,25K), [25K,50K), [50K,80K), > 80K
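
The two test conditions on this slide can be written as small routing functions. This is a sketch with hypothetical names (`binary_split`, `multiway_bucket`). One assumption is labeled in the code: the slide's intervals do not say where an income of exactly 80K belongs in the multi-way split, so the sketch places it in the top bucket and labels that bucket ">= 80K".

```python
from bisect import bisect_right

# Interval boundaries from the multi-way split on the slide.
EDGES = [10_000, 25_000, 50_000, 80_000]
# Assumption: exactly 80K goes to the top bucket, hence ">= 80K".
BUCKETS = ["< 10K", "[10K,25K)", "[25K,50K)", "[50K,80K)", ">= 80K"]

def binary_split(income):
    """Binary test from the slide: Taxable Income > 80K?"""
    return "Yes" if income > 80_000 else "No"

def multiway_bucket(income):
    """Return the branch a record follows in the multi-way split."""
    # bisect_right counts the boundaries that are <= income, which is
    # the index of the half-open interval [lower, upper) containing it.
    return BUCKETS[bisect_right(EDGES, income)]

for income in (9_999, 10_000, 60_000, 95_000):
    print(income, binary_split(income), multiway_bucket(income))
```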