Frequent Itemsets and Association rules Association Rules File Organization Item Item Basket 1 Example: items are Item positive integers, and boundaries Item Basket 2 between baskets It are-1 Item Basket 3 Item Etc 16
Frequent Itemsets and Association Rules 16 File Organization Item Item Item Item Item Item Item Item Item Item Item Item Basket 1 Basket 2 Basket 3 Etc. Example: items are positive integers, and boundaries between baskets are –1. Association Rules
Frequent Itemsets and Association rules Association Rules Computation Model -(2) The true cost of mining disk-resident data is usually the number of disk l/os In practice, association-rule algorithms read the data in passes - all baskets read in turn Thus, we measure the cost by the number of passes an algorithm takes
Frequent Itemsets and Association Rules 17 Computation Model – (2) ▪ The true cost of mining disk-resident data is usually the number of disk I/O’ s. ▪ In practice, association-rule algorithms read the data in passes – all baskets read in turn. ▪ Thus, we measure the cost by the number of passes an algorithm takes. Association Rules
Frequent Itemsets and Association rules Association Rules Main-Memory Bottleneck or many frequent-itemset algorithms, main memory is the critical resource As we read baskets, we need to count something e g occurrences of pairs The number of different things we can count is limited by main memory. Swapping counts in/out is a disaster(why? 18
Frequent Itemsets and Association Rules 18 Main-Memory Bottleneck ▪ For many frequent-itemset algorithms, main memory is the critical resource. ▪ As we read baskets, we need to count something, e.g., occurrences of pairs. ▪ The number of different things we can count is limited by main memory. ▪ Swapping counts in/out is a disaster (why?). Association Rules
Frequent Itemsets and Association rules Association Rules Finding Frequent Pairs The hardest problem often turns out to be finding the frequent pairs Why? Often frequent pairs are common frequent triples are rare Why Probability of being frequent drops exponentially with size; number of sets grows more slowly with size We'll concentrate on pairs then extend to larger sets 19
Frequent Itemsets and Association Rules 19 Finding Frequent Pairs ▪ The hardest problem often turns out to be finding the frequent pairs. ▪ Why? Often frequent pairs are common, frequent triples are rare. ▪ Why? Probability of being frequent drops exponentially with size; number of sets grows more slowly with size. ▪ We’ll concentrate on pairs, then extend to larger sets. Association Rules
Frequent Itemsets and Association rules Association Rules Naive algorithm Read file once, counting in main memory the occurrences of each pair. From each basket of n items generate its n(n-1)2 pairs by two nested loops Fails if(#items 2 exceeds main memory Remember: #items can be 100K(Wal-Mart)or 10B (Web pages) 20
Frequent Itemsets and Association Rules 20 Naïve Algorithm ▪ Read file once, counting in main memory the occurrences of each pair. ▪ From each basket of n items, generate its n (n -1)/2 pairs by two nested loops. ▪ Fails if (#items)2 exceeds main memory. ▪ Remember: #items can be 100K (Wal-Mart) or 10B (Web pages). Association Rules