Figure 7.4 Multilevel mining with reduced support. Level 1 (min_sup = 5%): computer [support = 10%]. Level 2 (min_sup = 3%): laptop computer [support = 6%], desktop computer [support = 4%].
Mining Multiple-Level Association Rules
◼ Items often form hierarchies
◼ Flexible support settings
  ◼ Items at the lower level are expected to have lower support
◼ Exploration of shared multi-level mining (Agrawal & Srikant@VLDB'95, Han & Fu@VLDB'95)
Example hierarchy: Milk [support = 10%] at Level 1; 2% Milk [support = 6%] and Skim Milk [support = 4%] at Level 2
◼ Uniform support: Level 1 min_sup = 5%, Level 2 min_sup = 5%
◼ Reduced support: Level 1 min_sup = 5%, Level 2 min_sup = 3%
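The difference between the two settings can be checked directly on the milk example. The following is a minimal sketch, not a full multilevel miner: it only tests each item's support against the threshold of its hierarchy level. The names `SUPPORT`, `LEVEL`, and `frequent_items` are illustrative and do not come from the slides.

```python
# Supports (in percent) taken from the slide's milk example.
SUPPORT = {"milk": 10, "2% milk": 6, "skim milk": 4}
# Level of each item in the concept hierarchy (1 = top level).
LEVEL = {"milk": 1, "2% milk": 2, "skim milk": 2}


def frequent_items(min_sup_by_level):
    """Return the items whose support meets the threshold of their own level."""
    return [
        item
        for item, sup in SUPPORT.items()
        if sup >= min_sup_by_level[LEVEL[item]]
    ]


# Uniform support: the same 5% threshold at both levels.
uniform = frequent_items({1: 5, 2: 5})
# Reduced support: the threshold drops to 3% at level 2.
reduced = frequent_items({1: 5, 2: 3})

print("uniform:", uniform)
print("reduced:", reduced)
```

Under uniform support, Skim Milk (4%) falls below the 5% threshold and is dropped; under reduced support it passes the 3% threshold at Level 2.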
Multi-level Association: Flexible Support and Redundancy Filtering
◼ Flexible min-support thresholds: some items are more valuable but less frequent
  ◼ Use non-uniform, group-based min-support
  ◼ E.g., {diamond, watch, camera}: 0.05%; {bread, milk}: 5%; …
◼ Redundancy filtering: some rules may be redundant due to "ancestor" relationships between items
  ◼ milk ⇒ wheat bread [support = 8%, confidence = 70%]
  ◼ 2% milk ⇒ wheat bread [support = 2%, confidence = 72%]
  The first rule is an ancestor of the second rule
◼ A rule is redundant if its support is close to the "expected" value, based on the rule's ancestor
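The redundancy test can be sketched as follows. The slide does not say how the "expected" value is computed or how close counts as "close", so this sketch makes two assumptions: the expected support is the ancestor rule's support scaled by the descendant item's share of the ancestor item's transactions, and "close" means within a relative tolerance. The share of 1/4 for 2% milk and the 25% tolerance are hypothetical values chosen for illustration.

```python
def expected_support(ancestor_support, descendant_share):
    """Expected support of a descendant rule, given its ancestor rule.

    descendant_share is the fraction of the ancestor item's transactions
    that involve the descendant item (ASSUMED, not given on the slide).
    """
    return ancestor_support * descendant_share


def is_redundant(actual, expected, tolerance=0.25):
    """True if actual support is within a relative tolerance of expected.

    The tolerance value is a hypothetical choice for this sketch.
    """
    return abs(actual - expected) <= tolerance * expected


ancestor_support = 8.0  # milk => wheat bread, in percent (from the slide)
actual_support = 2.0    # 2% milk => wheat bread, in percent (from the slide)
share = 0.25            # ASSUMPTION: 2% milk is a quarter of all milk sales

expected = expected_support(ancestor_support, share)
redundant = is_redundant(actual_support, expected)
print(expected, redundant)
```

With these assumed numbers the expected support equals the observed 2%, so the second rule adds no information beyond its ancestor and would be filtered out.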
Mining Multi-Dimensional Association
◼ Single-dimensional rules: buys(X, "milk") ⇒ buys(X, "bread")
◼ Multi-dimensional rules: ≥ 2 dimensions or predicates
  ◼ Inter-dimension assoc. rules (no repeated predicates): age(X, "19-25") ∧ occupation(X, "student") ⇒ buys(X, "coke")
  ◼ Hybrid-dimension assoc. rules (repeated predicates): age(X, "19-25") ∧ buys(X, "popcorn") ⇒ buys(X, "coke")
◼ Categorical attributes: finite number of possible values, no ordering among values (data cube approach)
◼ Quantitative attributes: numeric, implicit ordering among values (discretization, clustering, and gradient approaches)
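To make the two rule types concrete, the sketch below evaluates support and confidence for the inter-dimension and hybrid-dimension rules above on a small table. The five rows are invented for illustration, and `rule_stats` is a hypothetical helper rather than part of any library.

```python
# Toy customer table (invented data): one row per customer X.
rows = [
    {"age": "19-25", "occupation": "student", "buys": {"coke", "popcorn"}},
    {"age": "19-25", "occupation": "student", "buys": {"coke"}},
    {"age": "19-25", "occupation": "clerk", "buys": {"popcorn"}},
    {"age": "26-35", "occupation": "student", "buys": {"coke"}},
    {"age": "19-25", "occupation": "student", "buys": {"milk"}},
]


def rule_stats(rows, antecedent, consequent):
    """Return (support, confidence) of the rule antecedent => consequent."""
    matches_a = [r for r in rows if antecedent(r)]
    matches_both = [r for r in matches_a if consequent(r)]
    support = len(matches_both) / len(rows)
    confidence = len(matches_both) / len(matches_a) if matches_a else 0.0
    return support, confidence


# Inter-dimension rule: each predicate (age, occupation, buys) appears once.
inter = rule_stats(
    rows,
    lambda r: r["age"] == "19-25" and r["occupation"] == "student",
    lambda r: "coke" in r["buys"],
)

# Hybrid-dimension rule: the buys predicate appears on both sides.
hybrid = rule_stats(
    rows,
    lambda r: r["age"] == "19-25" and "popcorn" in r["buys"],
    lambda r: "coke" in r["buys"],
)

print("inter-dimension:", inter)
print("hybrid-dimension:", hybrid)
```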
Mining Quantitative Associations
Techniques can be categorized by how numerical attributes, such as age or salary, are treated:
1. Static discretization based on predefined concept hierarchies (data cube methods)
2. Dynamic discretization based on data distribution (quantitative rules, e.g., Agrawal & Srikant@SIGMOD96)
3. Clustering: distance-based association (e.g., Yang & Miller@SIGMOD97)
   ◼ One-dimensional clustering, then association
4. Deviation (e.g., Aumann & Lindell@KDD99): Sex = female ⇒ Wage: mean = $7/hr (overall mean = $9)
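Two of these treatments are easy to sketch: static discretization (item 1) and a deviation rule (item 4). The code below is an illustration under stated assumptions, not the method of any cited paper. The age ranges in `AGE_BINS` and the wage records are invented, with the wages chosen so that the means match the slide's example. A real deviation-based method would also test whether the difference is statistically significant, which is omitted here.

```python
from statistics import mean

# Static discretization: predefined age ranges (ASSUMED for illustration).
AGE_BINS = [(19, 25, "19-25"), (26, 35, "26-35")]


def age_bin(age):
    """Map a numeric age to its predefined range label, or 'other'."""
    for low, high, label in AGE_BINS:
        if low <= age <= high:
            return label
    return "other"


# Invented (sex, hourly wage) records whose means match the slide's example.
records = [
    ("female", 6.0), ("female", 7.0), ("female", 8.0),
    ("male", 10.0), ("male", 11.0), ("male", 12.0),
]


def subgroup_deviation(records, group):
    """Return (subgroup mean, overall mean, difference) for the wage attribute."""
    overall = mean([wage for _, wage in records])
    sub = mean([wage for sex, wage in records if sex == group])
    return sub, overall, sub - overall


print(age_bin(22), age_bin(40))
print(subgroup_deviation(records, "female"))
```

The deviation rule reports a subgroup whose mean differs from the overall mean, here a female mean wage of $7/hr against an overall mean of $9/hr.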