Chapter 5: Mining Frequent Patterns, Association and Correlations: Basic Concepts and Methods
◼ Basic Concepts
◼ Frequent Itemset Mining Methods
◼ Which Patterns Are Interesting? Pattern Evaluation Methods
◼ Summary

What Is Frequent Pattern Analysis?
◼ Frequent pattern: a pattern (a set of items, subsequences, substructures, etc.) that occurs frequently in a data set
◼ First proposed by Agrawal, Imielinski, and Swami [AIS93] in the context of frequent itemsets and association rule mining
◼ Motivation: Finding inherent regularities in data
  ◼ What products were often purchased together? Beer and diapers?!
  ◼ What are the subsequent purchases after buying a PC?
  ◼ What kinds of DNA are sensitive to this new drug?
  ◼ Can we automatically classify web documents?
◼ Applications
  ◼ Basket data analysis, cross-marketing, catalog design, sale campaign analysis, Web log (click stream) analysis, and DNA sequence analysis

Association Rule Mining
◼ Given a set of transactions, find rules that will predict the occurrence of an item based on the occurrences of other items in the transaction

Market-basket transactions

TID  Items
1    Bread, Milk
2    Bread, Diaper, Beer, Eggs
3    Milk, Diaper, Beer, Coke
4    Bread, Milk, Diaper, Beer
5    Bread, Milk, Diaper, Coke

Example of association rules
  {Diaper} → {Beer}
  {Milk, Bread} → {Eggs, Coke}
  {Beer, Bread} → {Milk}

Implication means co-occurrence, not causality!
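
To make "co-occurrence" concrete, the following minimal sketch counts, for each example rule, how many transactions contain the left-hand side and how many of those also contain the right-hand side. The function name `cooccurrence` and the dictionary layout are illustrative assumptions, not part of the slides.

```python
# Market-basket transactions from the table above (TID -> set of items).
transactions = {
    1: {"Bread", "Milk"},
    2: {"Bread", "Diaper", "Beer", "Eggs"},
    3: {"Milk", "Diaper", "Beer", "Coke"},
    4: {"Bread", "Milk", "Diaper", "Beer"},
    5: {"Bread", "Milk", "Diaper", "Coke"},
}


def cooccurrence(antecedent, consequent, transactions):
    """Hypothetical helper: return (n_antecedent, n_both).

    n_antecedent: transactions containing every item of the left-hand side.
    n_both: transactions containing every item of both sides.
    """
    antecedent, consequent = set(antecedent), set(consequent)
    both = antecedent | consequent
    n_antecedent = sum(1 for items in transactions.values() if antecedent <= items)
    n_both = sum(1 for items in transactions.values() if both <= items)
    return n_antecedent, n_both


# The three example rules from the slide.
rules = [
    ({"Diaper"}, {"Beer"}),
    ({"Milk", "Bread"}, {"Eggs", "Coke"}),
    ({"Beer", "Bread"}, {"Milk"}),
]

for lhs, rhs in rules:
    n_lhs, n_both = cooccurrence(lhs, rhs, transactions)
    print(f"{sorted(lhs)} -> {sorted(rhs)}: "
          f"{n_both} of {n_lhs} transactions with the left-hand side "
          f"also contain the right-hand side")
```

On this data, {Diaper} → {Beer} holds in 3 of the 4 transactions that contain Diaper, while no transaction contains both Eggs and Coke, so {Milk, Bread} → {Eggs, Coke} has no supporting transaction at all. The counts describe co-occurrence only; they say nothing about causality.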

Why Is Freq. Pattern Mining Important?
◼ Freq. pattern: An intrinsic and important property of datasets
◼ Foundation for many essential data mining tasks
  ◼ Association, correlation, and causality analysis
  ◼ Sequential, structural (e.g., sub-graph) patterns
  ◼ Pattern analysis in spatiotemporal, multimedia, time-series, and stream data
  ◼ Classification: discriminative, frequent pattern analysis
  ◼ Cluster analysis: frequent pattern-based clustering
  ◼ Data warehousing: iceberg cube and cube-gradient
  ◼ Semantic data compression: fascicles
◼ Broad applications

Basic Concepts: Frequent Patterns
◼ itemset: a set of one or more items
◼ k-itemset: X = {x1, …, xk}
◼ (absolute) support, or support count, of X: the frequency (number of occurrences) of itemset X
◼ (relative) support, s: the fraction of transactions that contain X (i.e., the probability that a transaction contains X)
◼ An itemset X is frequent if X's support is no less than a minsup threshold

Tid  Items bought
10   Beer, Nuts, Diaper
20   Beer, Coffee, Diaper
30   Beer, Diaper, Eggs
40   Nuts, Eggs, Milk
50   Nuts, Coffee, Diaper, Eggs, Milk

[Figure: Venn diagram of customers who buy beer, customers who buy diapers, and customers who buy both]
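
The definitions above can be checked against the five-transaction table with a short sketch. The minsup value of 0.5 is an assumption chosen for illustration (the slide does not fix one), the function names are hypothetical, and the brute-force enumeration is only a direct reading of the definition, not one of the mining methods the chapter covers.

```python
from itertools import combinations

# Transactions from the table above (Tid -> set of items bought).
transactions = {
    10: {"Beer", "Nuts", "Diaper"},
    20: {"Beer", "Coffee", "Diaper"},
    30: {"Beer", "Diaper", "Eggs"},
    40: {"Nuts", "Eggs", "Milk"},
    50: {"Nuts", "Coffee", "Diaper", "Eggs", "Milk"},
}


def support_count(itemset, transactions):
    """Absolute support: number of transactions containing every item of itemset."""
    itemset = set(itemset)
    return sum(1 for items in transactions.values() if itemset <= items)


def relative_support(itemset, transactions):
    """Relative support s: fraction of transactions containing itemset."""
    return support_count(itemset, transactions) / len(transactions)


def frequent_itemsets(transactions, minsup, k):
    """Naive enumeration of all k-itemsets whose relative support >= minsup."""
    all_items = sorted(set().union(*transactions.values()))
    result = {}
    for candidate in combinations(all_items, k):
        s = relative_support(candidate, transactions)
        if s >= minsup:
            result[candidate] = s
    return result


MINSUP = 0.5  # assumed threshold, not given on the slide

print(support_count({"Beer", "Diaper"}, transactions))  # absolute support
print(frequent_itemsets(transactions, MINSUP, 1))       # frequent 1-itemsets
print(frequent_itemsets(transactions, MINSUP, 2))       # frequent 2-itemsets
```

Under the assumed threshold, an itemset must appear in at least 3 of the 5 transactions. The frequent 1-itemsets are then {Beer}, {Nuts}, {Diaper}, and {Eggs}, and the only frequent 2-itemset is {Beer, Diaper}, with support count 3.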