© Tan, Steinbach, Kumar, Introduction to Data Mining, 4/18/2004

Association Rule Mining Task

Given a set of transactions T, the goal of association rule mining is to find all rules having
– support ≥ minsup threshold
– confidence ≥ minconf threshold

Brute-force approach:
– List all possible association rules
– Compute the support and confidence for each rule
– Prune rules that fail the minsup and minconf thresholds

Computationally prohibitive!
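
To give a feel for why listing every rule is prohibitive, here is a minimal sketch that counts the rules X → Y with X and Y non-empty and disjoint. The six-item universe is an assumption taken from the grocery example used in this deck, and `count_rules` is a hypothetical helper name. The closed form 3^d − 2^(d+1) + 1 is printed alongside as a cross-check.

```python
from itertools import combinations

def count_rules(items):
    """Count rules X -> Y where X and Y are non-empty, disjoint item sets."""
    items = list(items)
    total = 0
    for k in range(1, len(items)):            # size of the antecedent X
        for antecedent in combinations(items, k):
            rest = [i for i in items if i not in antecedent]
            # every non-empty subset of the remaining items can be the consequent Y
            total += 2 ** len(rest) - 1
    return total

# Assumed item universe (the items of the deck's example transactions).
items = ["Bread", "Milk", "Diaper", "Beer", "Eggs", "Coke"]
d = len(items)
print(count_rules(items))          # 602
print(3 ** d - 2 ** (d + 1) + 1)   # 602, closed-form count
```

Even six items give 602 candidate rules, and the count grows roughly as 3^d.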

Mining Association Rules

TID  Items
1    Bread, Milk
2    Bread, Diaper, Beer, Eggs
3    Milk, Diaper, Beer, Coke
4    Bread, Milk, Diaper, Beer
5    Bread, Milk, Diaper, Coke

Example of Rules:
{Milk, Diaper} → {Beer}   (s=0.4, c=0.67)
{Milk, Beer} → {Diaper}   (s=0.4, c=1.0)
{Diaper, Beer} → {Milk}   (s=0.4, c=0.67)
{Beer} → {Milk, Diaper}   (s=0.4, c=0.67)
{Diaper} → {Milk, Beer}   (s=0.4, c=0.5)
{Milk} → {Diaper, Beer}   (s=0.4, c=0.5)

Observations:
• All the above rules are binary partitions of the same itemset: {Milk, Diaper, Beer}
• Rules originating from the same itemset have identical support but can have different confidence
• Thus, we may decouple the support and confidence requirements
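
The figures in the rule list can be checked with a short sketch. It enumerates every binary partition of {Milk, Diaper, Beer} over the five transactions and prints support and confidence for each. The helper names `count` and `rule_stats` are illustrative, not part of the original slides.

```python
from itertools import combinations

transactions = [
    {"Bread", "Milk"},
    {"Bread", "Diaper", "Beer", "Eggs"},
    {"Milk", "Diaper", "Beer", "Coke"},
    {"Bread", "Milk", "Diaper", "Beer"},
    {"Bread", "Milk", "Diaper", "Coke"},
]

def count(itemset):
    """Number of transactions that contain every item of `itemset`."""
    return sum(itemset <= t for t in transactions)

def rule_stats(lhs, rhs):
    """Return (support, confidence) of the rule lhs -> rhs."""
    both = count(lhs | rhs)
    return both / len(transactions), both / count(lhs)

itemset = {"Milk", "Diaper", "Beer"}
stats = {}
for k in range(1, len(itemset)):              # antecedent sizes 1 and 2
    for lhs in combinations(sorted(itemset), k):
        lhs = frozenset(lhs)
        rhs = frozenset(itemset - lhs)
        s, c = stats[(lhs, rhs)] = rule_stats(lhs, rhs)
        print(sorted(lhs), "->", sorted(rhs), f"s={s:.1f}, c={c:.2f}")
```

All six rules share the numerator count({Milk, Diaper, Beer}) = 2, which is why support is constant and only the antecedent count changes the confidence.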

Mining Association Rules

Two-step approach:
1. Frequent Itemset Generation
– Generate all itemsets whose support ≥ minsup
2. Rule Generation
– Generate high-confidence rules from each frequent itemset, where each rule is a binary partitioning of a frequent itemset

Frequent itemset generation is still computationally expensive
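
The two steps can be sketched as two separate functions. This is a naive illustration, not an efficient miner: step 1 simply tests every itemset, and the thresholds `minsup=0.4` and `minconf=0.6` as well as the function names are assumptions chosen for the example.

```python
from itertools import combinations

transactions = [
    {"Bread", "Milk"},
    {"Bread", "Diaper", "Beer", "Eggs"},
    {"Milk", "Diaper", "Beer", "Coke"},
    {"Bread", "Milk", "Diaper", "Beer"},
    {"Bread", "Milk", "Diaper", "Coke"},
]

def frequent_itemsets(transactions, minsup):
    """Step 1 (naive): keep every itemset whose support is >= minsup."""
    items = sorted(set().union(*transactions))
    frequent = {}
    for k in range(1, len(items) + 1):
        for cand in combinations(items, k):
            cand = frozenset(cand)
            s = sum(cand <= t for t in transactions) / len(transactions)
            if s >= minsup:
                frequent[cand] = s
    return frequent

def generate_rules(frequent, minconf):
    """Step 2: split each frequent itemset into lhs -> rhs, keep confident rules."""
    rules = []
    for itemset, s in frequent.items():
        for k in range(1, len(itemset)):
            for lhs in combinations(sorted(itemset), k):
                lhs = frozenset(lhs)
                # lhs is a subset of a frequent itemset, so its support is at
                # least s and it is already present in `frequent`.
                conf = s / frequent[lhs]
                if conf >= minconf:
                    rules.append((lhs, itemset - lhs, s, conf))
    return rules

frequent = frequent_itemsets(transactions, minsup=0.4)   # assumed threshold
rules = generate_rules(frequent, minconf=0.6)            # assumed threshold
for lhs, rhs, s, conf in rules:
    print(sorted(lhs), "->", sorted(rhs), f"s={s:.1f}, c={conf:.2f}")
```

Note that step 2 never rescans the transactions: the confidence of every rule follows from supports already computed in step 1.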

Frequent Itemset Generation

[Figure: itemset lattice over the items A, B, C, D, E]
null
A  B  C  D  E
AB  AC  AD  AE  BC  BD  BE  CD  CE  DE
ABC  ABD  ABE  ACD  ACE  ADE  BCD  BCE  BDE  CDE
ABCD  ABCE  ABDE  ACDE  BCDE
ABCDE

Given d items, there are 2^d possible candidate itemsets
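
The lattice can be generated level by level, which also confirms the 2^d count for d = 5. This is a small sketch; the variable names are illustrative.

```python
from itertools import combinations

items = "ABCDE"

# Level k of the lattice holds all itemsets of size k (level 0 is the empty set).
lattice = [
    ["".join(c) or "null" for c in combinations(items, k)]
    for k in range(len(items) + 1)
]

for k, level in enumerate(lattice):
    print(k, " ".join(level))

total = sum(len(level) for level in lattice)
print(total)   # 32, i.e. 2 ** 5
```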

Frequent Itemset Generation

Brute-force approach:
– Each itemset in the lattice is a candidate frequent itemset
– Count the support of each candidate by scanning the database
– Match each transaction against every candidate
– Complexity ~ O(NMw) => Expensive since M = 2^d !!!

TID  Items
1    Bread, Milk
2    Bread, Diaper, Beer, Eggs
3    Milk, Diaper, Beer, Coke
4    Bread, Milk, Diaper, Beer
5    Bread, Milk, Diaper, Coke

[Figure: the N transactions on the left are matched against the list of M candidates on the right; w is the maximum transaction width]
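
The brute-force scan can be sketched as a double loop over transactions and candidates. The `matches` counter is an illustrative addition that makes the N × M factor visible. The empty itemset is left out here, so M = 2^d − 1 rather than 2^d.

```python
from itertools import combinations

transactions = [
    {"Bread", "Milk"},
    {"Bread", "Diaper", "Beer", "Eggs"},
    {"Milk", "Diaper", "Beer", "Coke"},
    {"Bread", "Milk", "Diaper", "Beer"},
    {"Bread", "Milk", "Diaper", "Coke"},
]

items = sorted(set().union(*transactions))            # d = 6 distinct items
candidates = [
    frozenset(c)
    for k in range(1, len(items) + 1)
    for c in combinations(items, k)
]                                                     # M = 2**6 - 1 candidates

counts = dict.fromkeys(candidates, 0)
matches = 0
for t in transactions:                                # N transactions
    for c in candidates:                              # M candidates each
        matches += 1
        if c <= t:                                    # subset test, cost bounded by itemset size
            counts[c] += 1

print(len(candidates))                                # 63
print(matches)                                        # 315 = 5 * 63
print(counts[frozenset({"Milk", "Diaper", "Beer"})])  # 2
```

With only six items the scan already performs 315 subset tests, and each additional item doubles M.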