Chapter 5: Mining Frequent Patterns Association and correlations Basic concepts and a road map Efficient and scalable frequent itemset mining methods Mining various kinds of association rules From association mining to correlation analysIs Constraint-based association mining Summary February 4, 2021 Data Mining: Concepts and Techniques
February 4, 2021 Data Mining: Concepts and Techniques 3 Chapter 5: Mining Frequent Patterns, Association and Correlations ◼ Basic concepts and a road map ◼ Efficient and scalable frequent itemset mining methods ◼ Mining various kinds of association rules ◼ From association mining to correlation analysis ◼ Constraint-based association mining ◼ Summary
What Is frequent pattern analysis? Frequent pattern: a pattern(a set of items, subsequences, substructures etc. that occurs frequently in a data set First proposed by agrawal, imielinski, and Swami [ais3] in the context of frequent itemsets and association rule mining Motivation Finding inherent regularities in data What products were often purchased together?-Beer and diapers? What are the subsequent purchases after buying a pc? What kinds of DNa are sensitive to this new drug Can we automatically classify web documents? pplications Basket data analysis cross-marketing catalog design, sale campaign analysis, Web log(click stream)analysis and dNa sequence analysis February 4, 2021 Data Mining: Concepts and Techniques
February 4, 2021 Data Mining: Concepts and Techniques 4 What Is Frequent Pattern Analysis? ◼ Frequent pattern: a pattern (a set of items, subsequences, substructures, etc.) that occurs frequently in a data set ◼ First proposed by Agrawal, Imielinski, and Swami [AIS93] in the context of frequent itemsets and association rule mining ◼ Motivation: Finding inherent regularities in data ◼ What products were often purchased together?— Beer and diapers?! ◼ What are the subsequent purchases after buying a PC? ◼ What kinds of DNA are sensitive to this new drug? ◼ Can we automatically classify web documents? ◼ Applications ◼ Basket data analysis, cross-marketing, catalog design, sale campaign analysis, Web log (click stream) analysis, and DNA sequence analysis
Why Is Freq. Pattern Mining Important? Discloses an intrinsic and important property of data sets Forms the foundation for many essential data mining tasks Association, correlation, and causality analysis Sequential, structural(e.g,, sub-graph) patterns Pattern analysis in spatiotemporal, multimedia time series and stream data Classification associative classification Cluster analysis: frequent pattern-based clustering Data warehousing iceberg cube and cube -gradient Semantic data compression fascicles Broad applications February 4, 2021 Data Mining: Concepts and Techniques 5
February 4, 2021 Data Mining: Concepts and Techniques 5 Why Is Freq. Pattern Mining Important? ◼ Discloses an intrinsic and important property of data sets ◼ Forms the foundation for many essential data mining tasks ◼ Association, correlation, and causality analysis ◼ Sequential, structural (e.g., sub-graph) patterns ◼ Pattern analysis in spatiotemporal, multimedia, timeseries, and stream data ◼ Classification: associative classification ◼ Cluster analysis: frequent pattern-based clustering ◼ Data warehousing: iceberg cube and cube-gradient ◼ Semantic data compression: fascicles ◼ Broad applications
Basic Concepts: Frequent Patterns and Association rules Transaction-id Items boug Itemset X={X1…,} A.B. D Find all the rules x>with minimum 20 A.C. D support and confidence 30 A,DE pport s, probability that a 40 B,EF transaction contains xu y 50 B, C,,EF confidence, c conditional Customer Customer probability that a transaction DUVS DO buys diaper having X also contains r Let supmin=50%, COl 50% Freg. Pat :A: 3, B: 3, D 4 E: 3, AD: 3y Association rules. Customer A→D(60%,100% buys beer D→A(60%,75%) February 4, 2021 Data Mining: Concepts and Techniques
February 4, 2021 Data Mining: Concepts and Techniques 6 Basic Concepts: Frequent Patterns and Association Rules ◼ Itemset X = {x1 , …, xk} ◼ Find all the rules X → Y with minimum support and confidence ◼ support, s, probability that a transaction contains X Y ◼ confidence, c, conditional probability that a transaction having X also contains Y Let supmin = 50%, confmin = 50% Freq. Pat.: {A:3, B:3, D:4, E:3, AD:3} Association rules: A → D (60%, 100%) D → A (60%, 75%) Customer buys diaper Customer buys both Customer buys beer Transaction-id Items bought 10 A, B, D 20 A, C, D 30 A, D, E 40 B, E, F 50 B, C, D, E, F
Closed patterns and max-Patterns A long pattern contains a combinatorial number of sub- patterns e.g,{a…,a1o0 contains(10)+(10)+…+ (100)=20-1=1.27*1030 sub-patterns Solution: Mine closed patterns and max-patterns instead An itemset X is closed if X is freguent and there exists no super-patternY x, with the same support as X (proposed by pasquier, et al. ICDT99) An itemset X is a max-pattern if X is frequent and there exists no frequent super-pattern y X(proposed by Bayardo SigMOd98) Closed pattern is a lossless compression of freq. patterns Reducing the of patterns and rules February 4, 2021 Data Mining: Concepts and Techniques 7
February 4, 2021 Data Mining: Concepts and Techniques 7 Closed Patterns and Max-Patterns ◼ A long pattern contains a combinatorial number of subpatterns, e.g., {a1 , …, a100} contains (100 1 ) + (100 2 ) + … + (1 1 0 0 0 0 ) = 2 100 – 1 = 1.27*1030 sub-patterns! ◼ Solution: Mine closed patterns and max-patterns instead ◼ An itemset X is closed if X is frequent and there exists no super-pattern Y כ X, with the same support as X (proposed by Pasquier, et al. @ ICDT’99) ◼ An itemset X is a max-pattern if X is frequent and there exists no frequent super-pattern Y כ X (proposed by Bayardo @ SIGMOD’98) ◼ Closed pattern is a lossless compression of freq. patterns ◼ Reducing the # of patterns and rules