Frequent Itemsets and Association rules Association Rules Scale of the problem WalMart sells 100,000 items and can store billions of baskets The Web has billions of words and many billions of pages. 11
Frequent Itemsets and Association Rules 11 Scale of the Problem ▪ WalMart sells 100,000 items and can store billions of baskets. ▪ The Web has billions of words and many billions of pages. Association Rules
Frequent Itemsets and Association rules Association Rules AssociatⅰonRu|es If-then rules about the contents of baskets 1,2…,}→ means:" if a basket contains all of 1., iK then it is likely to contain j Confidence of this association rule is the probability of j given i1灬…ik
Frequent Itemsets and Association Rules 12 Association Rules ▪ If-then rules about the contents of baskets. ▪ {i1 , i2 ,…,ik } → j means: “if a basket contains all of i1 ,…,ik then it is likely to contain j.” ▪ Confidence of this association rule is the probability of j given i1 ,…,ik . Association Rules
Frequent Itemsets and Association rules Association Rules Example: Confidence +B1={m,cb} B,=m, p, jl B3=m, by B4=c, jl B5={m,p,b} +b6=m, c,b, j] B,=ic, b,jl B8=ib, cN An association rule: m, b-c Confidence =2/4=50%
Frequent Itemsets and Association Rules 13 Example: Confidence B1 = {m, c, b} B2 = {m, p, j} B3 = {m, b} B4 = {c, j} B5 = {m, p, b} B6 = {m, c, b, j} B7 = {c, b, j} B8 = {b, c} ▪ An association rule: {m, b} → c. ▪ Confidence = 2/4 = 50%. + _ _ + Association Rules
Frequent Itemsets and Association rules Association Rules Finding Association Rules Question: " find all association rules with support 2 s and confidence≥c. Note: " support"of an association rule is the support of the set of items on the left Hard part: finding the frequent itemsets Note: if ( i, i,,i-j has high support and confidence, then both{i1,i2…,ik}and{i,i2… ik,) will be frequent
Frequent Itemsets and Association Rules 14 Finding Association Rules ▪ Question: “find all association rules with support ≥ s and confidence ≥ c .” ▪ Note: “support” of an association rule is the support of the set of items on the left. ▪ Hard part: finding the frequent itemsets. ▪ Note: if {i1 , i2 ,…,ik } → j has high support and confidence, then both {i1 , i2 ,…,ik } and {i1 , i2 ,…,ik ,j } will be “frequent.” Association Rules
Frequent Itemsets and Association rules Association Rules Computation Model Typically, data is kept in flat files rather than in a database system Stored on disk Stored basket-by-basket Expand baskets into pairs triples, etc. as you read baskets Use k nested loops to generate all sets of size k 15
Frequent Itemsets and Association Rules 15 Computation Model ▪ Typically, data is kept in flat files rather than in a database system. ▪ Stored on disk. ▪ Stored basket-by-basket. ▪ Expand baskets into pairs, triples, etc. as you read baskets. ▪ Use k nested loops to generate all sets of size k. Association Rules