Constraint-based(Query-Directed) Mining Finding all the patterns in a database autonomously?-unrealistic The patterns could be too many but not focused Data mining should be an interactive process User directs what to be mined using a data mining query language(or a graphical user interface) Constraint-based mining User flexibility: provides constraints on what to be mined Optimization explores such constraints for efficient mining constraint-based mining: constraint-pushing similar to push selection first in dB query processing Note: still find all the answers satisfying constraints not finding some answers in heuristic search 16
16 Constraint-based (Query-Directed) Mining ◼ Finding all the patterns in a database autonomously? — unrealistic! ◼ The patterns could be too many but not focused! ◼ Data mining should be an interactive process ◼ User directs what to be mined using a data mining query language (or a graphical user interface) ◼ Constraint-based mining ◼ User flexibility: provides constraints on what to be mined ◼ Optimization: explores such constraints for efficient mining — constraint-based mining: constraint-pushing, similar to push selection first in DB query processing ◼ Note: still find all the answers satisfying constraints, not finding some answers in “heuristic search
Constraints in Data Mining Knowledge type constraint: classification association etc Data constraint- using sQL-like queries find product pairs sold together in stores in Chicago this year Dimension/level constraint in relevance to region, price, brand, customer category Rule(or pattern) constraint small sales(price $10) triggers big sales(sum $200) Interestingness constraint strong rules: min_support≥3%, min confidence≥ 60% 17
17 Constraints in Data Mining ◼ Knowledge type constraint: ◼ classification, association, etc. ◼ Data constraint — using SQL-like queries ◼ find product pairs sold together in stores in Chicago this year ◼ Dimension/level constraint ◼ in relevance to region, price, brand, customer category ◼ Rule (or pattern) constraint ◼ small sales (price < $10) triggers big sales (sum > $200) ◼ Interestingness constraint ◼ strong rules: min_support 3%, min_confidence 60%
Meta-Rule Guided Mining Meta-rule can be in the rule form with partially instantiated predicates and constants PIX, X, W)=>buys(X,iPad") The resulting rule derived can be age(, 15-25) profession(X, student")=> buysX ipad") In general, it can be in the form of Q1^Q2 八八 Method to find meta-rules Find frequent(+r predicates(based on min-support threshold) Push constants deeply when possible into the mining process( see the remaining discussions on constraint-push techniques) Use confidence correlation and other measures when possible
Meta-Rule Guided Mining ◼ Meta-rule can be in the rule form with partially instantiated predicates and constants P1 (X, Y) ^ P2 (X, W) => buys(X, “iPad”) ◼ The resulting rule derived can be age(X, “15-25”) ^ profession(X, “student”) => buys(X, “iPad”) ◼ In general, it can be in the form of P1 ^ P2 ^ … ^ Pl => Q1 ^ Q2 ^ … ^ Qr ◼ Method to find meta-rules ◼ Find frequent (l+r) predicates (based on min-support threshold) ◼ Push constants deeply when possible into the mining process (see the remaining discussions on constraint-push techniques) ◼ Use confidence, correlation, and other measures when possible 18
Constraint-Based Frequent Pattern Mining Pattern space pruning constraints Anti-monotonic: If constraint c is violated its further mining can be terminated Monotonic: If c is satisfied no need to check c again Succinct c must be satisfied so one can start with the data sets satisfying c Convertible: c is not monotonic nor anti-monotonic, but it can be converted into it if items in the transaction can be properly ordered Data space pruning constraint Data succinct: Data space can be pruned at the initial pattern mIning process Data anti-monotonic: If a transaction t does not satisfy c, t can be pruned from its further mining
19 Constraint-Based Frequent Pattern Mining ◼ Pattern space pruning constraints ◼ Anti-monotonic: If constraint c is violated, its further mining can be terminated ◼ Monotonic: If c is satisfied, no need to check c again ◼ Succinct: c must be satisfied, so one can start with the data sets satisfying c ◼ Convertible: c is not monotonic nor anti-monotonic, but it can be converted into it if items in the transaction can be properly ordered ◼ Data space pruning constraint ◼ Data succinct: Data space can be pruned at the initial pattern mining process ◼ Data anti-monotonic: If a transaction t does not satisfy c, t can be pruned from its further mining
Pattern Space Pruning with Anti-Monotonicity Constraints TDB (min sup=2) A constraint C is anti-monotone if the super TID Transaction pattern satisfies C all of its sub-patterns do so 10 a, b, C, d, f too 20 b, c, d, f, g, h In other words, anti-monotonicity, If an itemset 30 a c,d e, f S violates the constraint, so does any of its 40 C, e f g superset I EX. 1. sum price) v is anti-monotone Item Profit EX. 2. range(s profit)s 15 is anti-monotone a 40 Itemset ab violates c C 20 So does every superset of ab d 10 Ex. 3. sum s Price)> v is not anti-monotone e -30 EX. 4. support count is anti-monotone: core 30 property used in Apriori 20 10120
20 Pattern Space Pruning with Anti-Monotonicity Constraints ◼ A constraint C is anti-monotone if the super pattern satisfies C, all of its sub-patterns do so too ◼ In other words, anti-monotonicity: If an itemset S violates the constraint, so does any of its superset ◼ Ex. 1. sum(S.price) v is anti-monotone ◼ Ex. 2. range(S.profit) 15 is anti-monotone ◼ Itemset ab violates C ◼ So does every superset of ab ◼ Ex. 3. sum(S.Price) v is not anti-monotone ◼ Ex. 4. support count is anti-monotone: core property used in Apriori TID Transaction 10 a, b, c, d, f 20 b, c, d, f, g, h 30 a, c, d, e, f 40 c, e, f, g TDB (min_sup=2) Item Profit a 40 b 0 c -20 d 10 e -30 f 30 g 20 h -10