Chongqing University: "Data Warehouse and Data Mining" course PPT slides (English version), Chapter 6: Advanced Frequent Pattern Mining

◼ Pattern Mining: A Road Map
◼ Pattern Mining in Multi-Level, Multi-Dimensional Space
◼ Constraint-Based Frequent Pattern Mining
◼ Mining High-Dimensional Data and Colossal Patterns
◼ Mining Compressed or Approximate Patterns
◼ Pattern Exploration and Application
◼ Summary

Static Discretization of Quantitative Attributes
◼ Discretized prior to mining using concept hierarchy
◼ Numeric values are replaced by ranges
◼ In relational database, finding all frequent k-predicate sets will require k or k+1 table scans
◼ Data cube is well suited for mining
  ◼ The cells of an n-dimensional cuboid correspond to the predicate sets
  ◼ Mining from data cubes can be much faster
◼ Lattice of cuboids over (age, income, buys):
  ()
  (age) (income) (buys)
  (age, income) (age, buys) (income, buys)
  (age, income, buys)
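
The correspondence between cuboid cells and predicate sets can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the range boundaries, the toy rows, and the helper names (`discretize_age`, `discretize_income`, `cuboid_counts`, `frequent_predicate_sets`) are hypothetical and do not come from the slides. Numeric values are first replaced by ranges, then every cell of every cuboid in the lattice is counted in a single pass.

```python
from collections import Counter
from itertools import combinations

# Hypothetical concept hierarchy: numeric values are replaced by ranges.
def discretize_age(age):
    if age < 30:
        return "age:<30"
    if age < 50:
        return "age:30-49"
    return "age:50+"

def discretize_income(income):
    if income < 40000:
        return "income:<40K"
    if income < 80000:
        return "income:40K-79K"
    return "income:80K+"

# Toy relation (age, income, buys); the values are invented for illustration.
rows = [
    (25, 30000, "laptop"),
    (27, 35000, "laptop"),
    (45, 90000, "tv"),
    (52, 85000, "tv"),
    (28, 32000, "laptop"),
    (41, 60000, "laptop"),
]

def cuboid_counts(rows):
    """Count every cell of every cuboid, from the apex () to the base cuboid."""
    counts = Counter()
    for age, income, buys in rows:
        preds = (discretize_age(age), discretize_income(income), "buys:" + buys)
        for k in range(len(preds) + 1):
            for cell in combinations(preds, k):
                counts[cell] += 1
    return counts

def frequent_predicate_sets(rows, min_count):
    """Keep the non-empty cells whose count reaches the support threshold."""
    return {cell: n for cell, n in cuboid_counts(rows).items()
            if cell and n >= min_count}

counts = cuboid_counts(rows)
frequent = frequent_predicate_sets(rows, min_count=3)
for cell, n in sorted(frequent.items(), key=lambda item: (len(item[0]), item[0])):
    print(cell, n)
```

Each key of `counts` is one cell: the empty tuple is the apex cuboid `()`, 1-tuples are cells of `(age)`, `(income)`, or `(buys)`, and so on up to `(age, income, buys)`. Once the cube is materialized, frequent predicate sets are read off by lookup rather than by rescanning the table.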

Quantitative Association Rules Based on Statistical Inference Theory [Aumann and Lindell@DMKD'03]
◼ Finding extraordinary and therefore interesting phenomena, e.g., (Sex = female) => Wage: mean = $7/hr (overall mean = $9)
  ◼ LHS: a subset of the population
  ◼ RHS: an extraordinary behavior of this subset
◼ The rule is accepted only if a statistical test (e.g., Z-test) confirms the inference with high confidence
◼ Subrule: highlights the extraordinary behavior of a subset of the population of the super rule
  ◼ E.g., (Sex = female) ^ (South = yes) => mean wage = $6.3/hr
◼ Two forms of rules
  ◼ Categorical => quantitative rules, or Quantitative => quantitative rules
  ◼ E.g., Education in [14-18] (yrs) => mean wage = $11.64/hr
◼ Open problem: Efficient methods for LHS containing two or more quantitative attributes
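
As a rough sketch of the acceptance step, the following Python computes a one-sample Z statistic for the subset mean against the overall mean. The two means ($7 and $9) are taken from the slide; the standard deviation, the subset size, and the function names are assumptions made for illustration, and the test shown is a textbook Z-test rather than necessarily the exact procedure of the cited paper.

```python
import math

def z_statistic(subset_mean, overall_mean, overall_std, subset_size):
    """One-sample Z statistic: how far the subset mean lies from the overall mean."""
    standard_error = overall_std / math.sqrt(subset_size)
    return (subset_mean - overall_mean) / standard_error

def accept_rule(subset_mean, overall_mean, overall_std, subset_size, z_critical=1.96):
    """Accept the rule only if the deviation is significant (two-sided, 5% level)."""
    z = z_statistic(subset_mean, overall_mean, overall_std, subset_size)
    return abs(z) >= z_critical

# Means from the slide; std = 4.0 and n = 100 are assumed values.
z = z_statistic(subset_mean=7.0, overall_mean=9.0, overall_std=4.0, subset_size=100)
accepted = accept_rule(7.0, 9.0, 4.0, 100)
print(round(z, 2), accepted)
```

With these assumed values the standard error is 0.4, so the subset mean lies five standard errors below the overall mean and the rule would be accepted. A much smaller subset or a larger spread would weaken the evidence.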

Negative and Rare Patterns
◼ Rare patterns: Very low support but interesting
  ◼ E.g., buying Rolex watches
  ◼ Mining: Setting individual-based or special group-based support threshold for valuable items
◼ Negative patterns
  ◼ Since it is unlikely that one buys Ford Expedition (an SUV car) and Toyota Prius (a hybrid car) together, Ford Expedition and Toyota Prius are likely negatively correlated patterns
◼ Negatively correlated patterns that are infrequent tend to be more interesting than those that are frequent
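
One possible reading of an individual-based support threshold is sketched below. The transactions, the threshold values, and the rule that an itemset is judged against the lowest threshold among its items are all assumptions for illustration; the slides do not prescribe a specific scheme.

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item of the itemset."""
    items = set(itemset)
    return sum(1 for t in transactions if items <= t) / len(transactions)

def is_frequent(itemset, transactions, default_min_sup, item_min_sup):
    """Judge an itemset against the lowest threshold among its items (an assumption)."""
    threshold = min(item_min_sup.get(item, default_min_sup) for item in itemset)
    return support(itemset, transactions) >= threshold

# Invented data: 1000 transactions, only 2 of which contain a Rolex watch.
transactions = (
    [{"rolex", "wallet"}] * 2
    + [{"bread", "milk"}] * 300
    + [{"bread"}] * 698
)

uniform = is_frequent({"rolex"}, transactions, 0.05, {})
per_item = is_frequent({"rolex"}, transactions, 0.05, {"rolex": 0.001})
print(uniform, per_item)
```

Under a uniform 5% threshold the Rolex itemset is discarded, while the lower threshold assigned to the valuable item keeps it.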

Defining Negatively Correlated Patterns (I)
◼ Definition 1 (support-based)
  ◼ If itemsets X and Y are both frequent but rarely occur together, i.e., sup(X ∪ Y) < sup(X) * sup(Y)
  ◼ Then X and Y are negatively correlated
◼ Problem: A store sold two needle packages A and B 100 times each, with only one transaction containing both A and B
  ◼ When there are 200 transactions in total, we have s(A ∪ B) = 1/200 = 0.005 and s(A) * s(B) = 0.5 * 0.5 = 0.25, so s(A ∪ B) < s(A) * s(B)
  ◼ When there are 10^5 transactions, we have s(A ∪ B) = 1/10^5 and s(A) * s(B) = 1/10^3 * 1/10^3 = 1/10^6, so s(A ∪ B) > s(A) * s(B)
◼ Where is the problem? Null transactions: the support-based definition is not null-invariant!
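
The arithmetic of the needle package example can be checked with exact fractions. The function name `support_based_check` is hypothetical; the counts (100 sales each, one joint transaction) follow the example, with the larger database taken to hold 10^5 transactions.

```python
from fractions import Fraction

def support_based_check(n_total, n_a=100, n_b=100, n_ab=1):
    """Return s(A ∪ B), s(A) * s(B), and whether Definition 1 flags A and B."""
    s_a = Fraction(n_a, n_total)
    s_b = Fraction(n_b, n_total)
    s_ab = Fraction(n_ab, n_total)
    product = s_a * s_b
    return s_ab, product, s_ab < product

small = support_based_check(200)
large = support_based_check(10**5)
print(small)
print(large)
```

The same 199 non-null transactions yield opposite verdicts: adding null transactions (those containing neither A nor B) shrinks s(A) * s(B) quadratically but s(A ∪ B) only linearly, which is why the definition is not null-invariant.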

Defining Negatively Correlated Patterns (II)
◼ Definition 2 (negative itemset-based)
  ◼ X is a negative itemset if (1) X = Ā ∪ B, where B is a set of positive items and Ā is a set of negative items, |Ā| ≥ 1, and (2) s(X) ≥ μ
  ◼ Itemset X is negatively correlated if s(X) < ∏ s(x_i), where x_i ∈ X and s(x_i) is the support of x_i
  ◼ This definition suffers from a similar null-invariance problem
◼ Definition 3 (Kulczynski measure-based)
  ◼ If itemsets X and Y are frequent but (P(X|Y) + P(Y|X))/2 < є, where є is a negative pattern threshold, then X and Y are negatively correlated
  ◼ Ex. For the same needle package problem, whether there are 200 or 10^5 transactions, (P(A|B) + P(B|A))/2 = (0.01 + 0.01)/2 = 0.01, so A and B are negatively correlated for any threshold є above 0.01 (e.g., є = 0.02)
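
A short sketch shows why the Kulczynski-based definition does not depend on the number of null transactions. The function name `kulczynski` and the threshold of 0.02 are assumptions: the measure evaluates to exactly 0.01 on the needle package data, so the strict inequality needs a threshold above that value.

```python
from fractions import Fraction

def kulczynski(n_total, n_a, n_b, n_ab):
    """Average of P(A|B) and P(B|A), computed from supports."""
    s_a = Fraction(n_a, n_total)
    s_b = Fraction(n_b, n_total)
    s_ab = Fraction(n_ab, n_total)
    # P(B|A) = s(A ∪ B) / s(A) and P(A|B) = s(A ∪ B) / s(B): n_total cancels out.
    return (s_ab / s_a + s_ab / s_b) / 2

epsilon = Fraction(2, 100)  # assumed negative pattern threshold
kulc_small = kulczynski(200, 100, 100, 1)
kulc_large = kulczynski(10**5, 100, 100, 1)
print(kulc_small, kulc_large, kulc_small < epsilon)
```

Both database sizes give the same value, because the total transaction count cancels in each conditional probability.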
