The Apriori Algorithm Probably the best known algorithm Two steps Find all itemsets that have minimum support frequent itemsets, also called large itemsets) e Use frequent itemsets to generate rules a E.g., a frequent itemset cHicken, Clothes, Milk [sup= 3/71 and one rule from the frequent itemset Clothes>Milk, Chicken [sup= 3/7, conf = 3/3
16 ◼ Probably the best known algorithm ◼ Two steps: ◆ Find all itemsets that have minimum support (frequent itemsets, also called large itemsets). ◆ Use frequent itemsets to generate rules. ◼ E.g., a frequent itemset {Chicken, Clothes, Milk} [sup = 3/7] and one rule from the frequent itemset Clothes → Milk, Chicken [sup = 3/7, conf = 3/3] The Apriori Algorithm
Step 1: Mining all frequent itemsets A frequent itemset is an itemset whose support s≥ minsup. Key idea: The apriori property(downward closure property: any subsets of a frequent itemset are also frequent itemsets ABC ABD ACD BCD AB AC AD BC BD CD A B C D
17 Step 1: Mining all frequent itemsets ◼ A frequent itemset is an itemset whose support is ≥ minsup. ◼ Key idea: The apriori property (downward closure property): any subsets of a frequent itemset are also frequent itemsets AB AC AD BC BD CD A B C D ABC ABD ACD BCD
The Algorithm a Iterative algo. also called level-wise search) Find all 1-item frequent itemsets then all 2-item frequent itemsets, and so on o In each iteration k, only consider itemsets that contain some k-1 frequent itemset Find frequent itemsets of size 1: F1 ■ From k=2 + Ck= candidates of size k: those itemsets of size k that could be frequent, given Fk-1 o Fk= those itemsets that are actually frequent, Fk C Ck(need to scan the database once
18 The Algorithm ◼ Iterative algo. (also called level-wise search): Find all 1-item frequent itemsets; then all 2-item frequent itemsets, and so on. ◆ In each iteration k, only consider itemsets that contain some k-1 frequent itemset. ◼ Find frequent itemsets of size 1: F1 ◼ From k = 2 ◆ Ck = candidates of size k: those itemsets of size k that could be frequent, given Fk-1 ◆ Fk = those itemsets that are actually frequent, Fk Ck (need to scan the database once)