Big Data Analysis and Mining: Association Rule
Qinpei Zhao (赵钦佩), qinpeizhao@tongji.edu.cn
2015 Fall
2021/1/27
Frequent Pattern Analysis
◼ Frequent pattern: a pattern (a set of items, subsequences, substructures, etc.) that occurs frequently in a data set
◼ First proposed by Agrawal, Imielinski, and Swami in the context of frequent itemsets and association rule mining
◼ Motivation: Finding inherent regularities in data
◆ What products were often purchased together? Beer and diapers?
◆ What are the subsequent purchases after buying a PC?
Association Rule Discovery
◼ Supermarket shelf management: the market-basket model
◼ Goal:
◆ Identify items that are bought together by sufficiently many customers
◼ Approach:
◆ Process the sales data collected with barcode scanners to find dependencies among items
◼ A classic rule:
◆ If someone buys diapers and milk, then they are likely to buy beer
◆ Don’t be surprised if you find six-packs next to diapers!
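The goal and the classic rule above can be sketched in a few lines of Python. This is a minimal illustration, not a method from the slides: the toy baskets, the function names, and the threshold `min_count` are all assumptions made for the example. "Sufficiently many customers" is read here as a minimum basket count, and "likely to buy" as the fraction of baskets containing diapers and milk that also contain beer.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy data: each basket is the set of items from one store trip.
baskets = [
    {"diaper", "milk", "beer"},
    {"diaper", "milk", "beer", "bread"},
    {"diaper", "milk"},
    {"milk", "bread"},
    {"diaper", "beer"},
]


def support_count(itemset, baskets):
    """Number of baskets that contain every item in `itemset`."""
    return sum(1 for basket in baskets if itemset <= basket)


def rule_confidence(antecedent, consequent, baskets):
    """Fraction of baskets containing `antecedent` that also contain `consequent`."""
    base = support_count(antecedent, baskets)
    if base == 0:
        return 0.0
    return support_count(antecedent | consequent, baskets) / base


def frequent_pairs(baskets, min_count):
    """Item pairs that appear together in at least `min_count` baskets."""
    counts = Counter()
    for basket in baskets:
        # Sort so each pair is always counted under the same key.
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    return {pair: n for pair, n in counts.items() if n >= min_count}


if __name__ == "__main__":
    print(frequent_pairs(baskets, min_count=3))
    print(rule_confidence({"diaper", "milk"}, {"beer"}, baskets))
```

Counting every pair by brute force is only workable for small data; it is meant to make the definitions concrete rather than to scale to real sales logs.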
Applications – (1)
◼ Items = products; Baskets = sets of products someone bought in one trip to the store
◼ Real market baskets: Chain stores keep terabytes of data about what customers buy together
◆ Tells how typical customers navigate stores, which lets the stores position tempting items
◆ Suggests tie-in “tricks”, e.g., run a sale on diapers and raise the price of beer
◆ The rule needs to occur frequently
◼ Amazon’s “people who bought X also bought Y”
Applications – (2)
◼ Baskets = sentences; Items = documents containing those sentences
◆ Items that appear together too often could represent plagiarism
◆ Notice items do not have to be “in” baskets
◼ Baskets = patients; Items = drugs & side-effects
◆ Has been used to detect combinations of drugs that result in particular side-effects
◆ But requires extension: Absence of an item needs to be observed as well as presence
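The plagiarism setup inverts the usual containment: documents contain sentences, yet the sentence is the basket and the documents are its items. A rough sketch of that inversion follows, assuming a hypothetical toy corpus and exact string matching of sentences; the names `sentence_baskets`, `suspicious_pairs`, and the threshold `min_shared` are illustrative only.

```python
from collections import Counter, defaultdict
from itertools import combinations

# Hypothetical toy corpus: document id -> list of sentences.
documents = {
    "doc_a": ["the cat sat", "data is big", "rules are mined"],
    "doc_b": ["the cat sat", "data is big", "something else"],
    "doc_c": ["unrelated text", "rules are mined"],
}


def sentence_baskets(documents):
    """Invert the corpus: each sentence becomes a basket of document ids."""
    baskets = defaultdict(set)
    for doc_id, sentences in documents.items():
        for sentence in sentences:
            baskets[sentence].add(doc_id)
    return dict(baskets)


def suspicious_pairs(documents, min_shared):
    """Document pairs that co-occur in at least `min_shared` sentence baskets."""
    counts = Counter()
    for docs in sentence_baskets(documents).values():
        for pair in combinations(sorted(docs), 2):
            counts[pair] += 1
    return {pair: n for pair, n in counts.items() if n >= min_shared}


if __name__ == "__main__":
    print(suspicious_pairs(documents, min_shared=2))
```

A frequent pair of items here is a pair of documents sharing many sentences, which is the signal the slide describes as possible plagiarism.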