The apriori algorithm RSBCh Candidate itemset of size k seudo-code Lk frequent itemset of size k L,=frequent items], to Ck+z= candidates generated from LK (=1; Lk!=0;k++)do begin for each transaction tin database do increment the count of all candidates in Ck+l hat are contained in t ck+1 candidates in Ck+, with min_support end return∪hL February 4, 2021 Data Mining: Concepts and Techniques 13
February 4, 2021 Data Mining: Concepts and Techniques 13 The Apriori Algorithm ◼ Pseudo-code: Ck : Candidate itemset of size k Lk : frequent itemset of size k L1 = {frequent items}; for (k = 1; Lk !=; k++) do begin Ck+1 = candidates generated from Lk ; for each transaction t in database do increment the count of all candidates in Ck+1 that are contained in t Lk+1 = candidates in Ck+1 with min_support end return k Lk ;
Important Details of Apriori a How to generate candidates? Step 1: self-joining Lk Step 2: pruning How to count supports of candidates? Example of Candidate-generation L3abc abd, acd, ace, bcdy Self-joining: L3 L abcd from abc and abd adde from add and ace Pruning: acde is removed because ade is not in L3 Abcd February 4, 2021 Data Mining: Concepts and Techniques 14
February 4, 2021 Data Mining: Concepts and Techniques 14 Important Details of Apriori ◼ How to generate candidates? ◼ Step 1: self-joining Lk ◼ Step 2: pruning ◼ How to count supports of candidates? ◼ Example of Candidate-generation ◼ L3={abc, abd, acd, ace, bcd} ◼ Self-joining: L3*L3 ◼ abcd from abc and abd ◼ acde from acd and ace ◼ Pruning: ◼ acde is removed because ade is not in L3 ◼ C4={abcd}
How to generate Candidates Suppose the items in Lk-, are listed in an order Step 1: self-joining Lk-1 insert into CK select p, item p item ay p.itemk-y g itemk-1 from Lk- p, lk-i9 where p item,=g. itemy my pitem -=g. itemk-ypitem- quitemk-1 Step 2: pruning forall itemsets c in Ci do forall (k-1)-subsets s of cdo if (s is not in Lk- then delete cfrom CK February 4, 2021 Data Mining: Concepts and Techniques 15
February 4, 2021 Data Mining: Concepts and Techniques 15 How to Generate Candidates? ◼ Suppose the items in Lk-1 are listed in an order ◼ Step 1: self-joining Lk-1 insert into Ck select p.item1 , p.item2 , …, p.itemk-1 , q.itemk-1 from Lk-1 p, Lk-1 q where p.item1=q.item1 , …, p.itemk-2=q.itemk-2 , p.itemk-1 < q.itemk-1 ◼ Step 2: pruning forall itemsets c in Ck do forall (k-1)-subsets s of c do if (s is not in Lk-1 ) then delete c from Ck
How to Count Supports of Candidates? Why counting supports of candidates a problem? The total number of candidates can be very huge One transaction may contain many candidates Method Candidate itemsets are stored in a hash-tree Leaf node of hash-tree contains a list of itemsets and counts Interior node contains a hash table Subset function: finds all the candidates contained in a transaction February 4, 2021 Data Mining: Concepts and Techniques 16
February 4, 2021 Data Mining: Concepts and Techniques 16 How to Count Supports of Candidates? ◼ Why counting supports of candidates a problem? ◼ The total number of candidates can be very huge ◼ One transaction may contain many candidates ◼ Method: ◼ Candidate itemsets are stored in a hash-tree ◼ Leaf node of hash-tree contains a list of itemsets and counts ◼ Interior node contains a hash table ◼ Subset function: finds all the candidates contained in a transaction
Example: Counting supports of Candidates Subset function Transaction: 12356 3,6,9 2.5.8 1+2356 13+56- 234 567 145 136 345356367 12+356 357368 124 457 25159 458 February 4, 2021 Data Mining: Concepts and Techniques 17
February 4, 2021 Data Mining: Concepts and Techniques 17 Example: Counting Supports of Candidates 1,4,7 2,5,8 3,6,9 Subset function 2 3 4 5 6 7 1 4 5 1 3 6 1 2 4 4 5 7 1 2 5 4 5 8 1 5 9 3 4 5 3 5 6 3 5 7 6 8 9 3 6 7 3 6 8 Transaction: 1 2 3 5 6 1 + 2 3 5 6 1 2 + 3 5 6 1 3 + 5 6