2. Association rule discovery, or market basket analysis
▪ Illustration with a real industrial example at the Peugeot-Citroen car manufacturing company
▪ (Ph.D. of Marie Plasse)
Beijing, 2008
ASSOCIATION RULES MINING
⚫ Marketing target: basket data analysis

Basket   Purchases
1        {bread, butter, milk}
2        {bread, meat}
…        …
n        {fruit juice, fish, strawberries, bread}

"90% of transactions that purchase bread and butter also purchase milk" (Agrawal et al., 1993)

{bread, butter} → {milk}
antecedent (Itemset A) → consequent (Itemset C), where A ∩ C = Ø

3rd IASC world conference on Computational Statistics & Data Analysis, Limassol, Cyprus, 28-31 October, 2005 (PSA Peugeot Citroen, CEDRIC)
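To make the basket notation concrete, the minimal Python sketch below stores each basket as a `frozenset` of items and a rule as a pair of itemsets. The `Rule` class and the `count_containing` helper are hypothetical names, and the three baskets only echo the rows shown above; the sketch enforces the condition A ∩ C = Ø and counts the baskets that contain a given itemset.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """Hypothetical representation of an association rule A -> C."""
    antecedent: frozenset
    consequent: frozenset

    def __post_init__(self):
        # An association rule requires disjoint itemsets: A ∩ C = Ø
        if self.antecedent & self.consequent:
            raise ValueError("antecedent and consequent must be disjoint")


def count_containing(baskets, itemset):
    """Number of baskets that contain every item of `itemset`."""
    return sum(1 for basket in baskets if itemset <= basket)


# Toy baskets echoing the rows of the table (illustrative only)
baskets = [
    frozenset({"bread", "butter", "milk"}),
    frozenset({"bread", "meat"}),
    frozenset({"fruit juice", "fish", "strawberries", "bread"}),
]

rule = Rule(frozenset({"bread", "butter"}), frozenset({"milk"}))
print(count_containing(baskets, rule.antecedent))       # 1
print(count_containing(baskets, frozenset({"bread"})))  # 3
```

Using immutable `frozenset`s lets itemsets serve as dictionary keys later, for instance when caching the count of each itemset.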
{bread, butter} → {milk}
antecedent (Itemset A) → consequent (Itemset C)

⚫ Reliability, measured by the support: % of transactions that contain all items of A and C
  $\mathrm{sup}(A \to C) = P(A \cap C) = P(C \mid A)\,P(A)$
⚫ Strength, measured by the confidence: % of transactions that contain C among the ones that contain A
  $\mathrm{conf}(A \to C) = P(C \mid A) = \dfrac{P(A \cap C)}{P(A)} = \dfrac{\mathrm{sup}(A \to C)}{\mathrm{sup}(A)}$
⚫ Supp = 30% ➔ 30% of transactions contain bread + butter + milk
⚫ Conf = 90% ➔ 90% of transactions that contain bread + butter also contain milk
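A minimal Python sketch of these two measures, assuming baskets are stored as Python sets. The 30 baskets are made up so that the figures reproduce the example above (support 30%, confidence 90%); `count`, `support` and `confidence` are illustrative helpers, not code from the original work.

```python
def count(baskets, itemset):
    """Number of baskets that contain every item of `itemset`."""
    items = set(itemset)
    return sum(1 for basket in baskets if items <= basket)


def support(baskets, antecedent, consequent):
    """sup(A -> C) = P(A ∩ C): share of baskets containing all items of A and C."""
    return count(baskets, set(antecedent) | set(consequent)) / len(baskets)


def confidence(baskets, antecedent, consequent):
    """conf(A -> C) = P(C | A) = sup(A -> C) / sup(A).

    Both supports divide by len(baskets), so the ratio of counts is used directly.
    """
    n_a = count(baskets, antecedent)
    if n_a == 0:
        raise ValueError("A never occurs, so conf(A -> C) is undefined")
    return count(baskets, set(antecedent) | set(consequent)) / n_a


# Hypothetical 30 baskets chosen so that the figures match the example
baskets = (
    [{"bread", "butter", "milk"}] * 9   # contain A and C
    + [{"bread", "butter"}] * 1         # contain A but not C
    + [{"bread", "meat"}] * 20          # contain neither all of A nor C
)
A, C = {"bread", "butter"}, {"milk"}
print(support(baskets, A, C))     # 0.3
print(confidence(baskets, A, C))  # 0.9
```

Computing confidence as a ratio of counts is equivalent to the ratio of supports and avoids an extra floating-point division.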
▪ Support: $P(A \cap C)$
▪ Confidence: $P(C \mid A)$
▪ Thresholds $s_0$ and $c_0$: a rule is retained only if its support is at least $s_0$ and its confidence at least $c_0$
▪ A result is interesting only if $P(C \mid A)$ is much larger than $P(C)$, or if $P(C \mid \text{not } A)$ is low.
▪ Lift: $\mathrm{lift}(A \to C) = \dfrac{P(C \mid A)}{P(C)} = \dfrac{P(A \cap C)}{P(A)\,P(C)}$ (lift = 1 when A and C are independent)
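The sketch below applies the thresholds $s_0$ and $c_0$ and reports the lift of each retained rule. It uses naive brute-force enumeration over all disjoint pairs (A, C), which is only practical for a handful of items; it is not the Apriori algorithm, nor the method used in the thesis. The baskets and the helper names `prob` and `mine_rules` are hypothetical.

```python
from itertools import combinations


def prob(baskets, itemset):
    """Empirical probability that a basket contains every item of `itemset`."""
    return sum(1 for basket in baskets if itemset <= basket) / len(baskets)


def mine_rules(baskets, s0, c0):
    """Return (A, C, support, confidence, lift) for every rule A -> C
    with support >= s0 and confidence >= c0.

    Naive brute force over all itemsets: fine for a toy example only.
    """
    items = sorted(set().union(*baskets))
    rules = []
    for k in range(2, len(items) + 1):
        for itemset in combinations(items, k):
            full = frozenset(itemset)
            sup = prob(baskets, full)                  # P(A ∩ C)
            if sup == 0 or sup < s0:
                continue
            for r in range(1, k):
                for ante in combinations(itemset, r):
                    a = frozenset(ante)
                    c = full - a                       # disjoint by construction
                    conf = sup / prob(baskets, a)      # P(C | A)
                    if conf >= c0:
                        lift = conf / prob(baskets, c)  # P(C | A) / P(C)
                        rules.append((a, c, sup, conf, lift))
    return rules


# Hypothetical baskets: bread is in every basket
baskets = (
    [{"bread", "butter", "milk"}] * 3
    + [{"bread", "meat"}] * 2
    + [{"bread", "fish"}]
)

for a, c, sup, conf, lift in mine_rules(baskets, s0=0.3, c0=0.8):
    print(sorted(a), "->", sorted(c), f"sup={sup:.2f} conf={conf:.2f} lift={lift:.2f}")
```

With these baskets, every rule whose consequent is {bread} (for example {butter} → {bread}) reaches confidence 1 but has lift 1, because every basket contains bread: knowing A says nothing new about C. In contrast, {bread, butter} → {milk} also has confidence 1 but lift 2.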
MOTIVATION
⚫ Motivation: decision-making aid
  ⚫ Always seeking a higher level of quality, the car manufacturer can take advantage of knowledge of the associations between attributes.
⚫ Industrial data:
  ⚫ A set of vehicles described by a large set of binary flags
  [Figure: binary data table, one row per vehicle and one column per attribute A1, A2, A3, …, Ap]
⚫ Our work:
  ⚫ We are looking for patterns in the data: association discovery
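Binary flags map directly onto the basket format: each vehicle becomes the set of attributes whose flag is 1. The sketch below assumes a small hypothetical 0/1 table (four vehicles, five attributes A1…A5; the industrial data is not reproduced here) and computes the support of an attribute set directly from the rows. The names `to_transactions` and `support` are illustrative.

```python
# Hypothetical binary data: rows = vehicles, columns = attributes A1..A5
attributes = ["A1", "A2", "A3", "A4", "A5"]
flags = [
    [1, 0, 0, 1, 0],
    [0, 0, 0, 0, 1],
    [1, 0, 0, 1, 1],
    [0, 0, 0, 0, 0],
]


def to_transactions(flags, attributes):
    """Turn each 0/1 row into the set of attributes flagged for that vehicle."""
    return [{a for a, f in zip(attributes, row) if f} for row in flags]


def support(flags, attributes, itemset):
    """Share of vehicles whose flags are 1 for every attribute in `itemset`."""
    cols = [attributes.index(a) for a in itemset]
    hits = sum(1 for row in flags if all(row[j] for j in cols))
    return hits / len(flags)


transactions = to_transactions(flags, attributes)
print([sorted(t) for t in transactions])  # [['A1', 'A4'], ['A5'], ['A1', 'A4', 'A5'], []]
print(support(flags, attributes, {"A1", "A4"}))  # 0.5
```

A vehicle with no flag set becomes an empty transaction, but it still counts in the denominator of the support.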