
Beihang University: Data Mining: Concepts and Techniques, course teaching resources (PPT lecture slides), Chapter 05 Mining Frequent Patterns, Association and Correlations

◼ Basic concepts and a road map
◼ Efficient and scalable frequent itemset mining methods
◼ Mining various kinds of association rules
◼ From association mining to correlation analysis
◼ Constraint-based association mining
◼ Summary


February 4, 2021 Data Mining: Concepts and Techniques 18
Efficient Implementation of Apriori in SQL
◼ Pure SQL (SQL-92) approaches alone struggle to achieve good performance
◼ Make use of object-relational extensions such as UDFs, BLOBs, and table functions
◼ These extensions can yield orders-of-magnitude improvement
◼ S. Sarawagi, S. Thomas, and R. Agrawal. Integrating association rule mining with relational database systems: Alternatives and implications. In SIGMOD’98
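
As a rough illustration of the pure-SQL style the slide refers to, the sketch below counts 2-itemset support with a self-join in SQLite through Python's standard sqlite3 module. The one-row-per-item table trans(tid, item) and the function name count_pairs_sql are hypothetical. It counts all co-occurring pairs rather than joining against a separate candidate table, and it uses none of the object-relational extensions (UDFs, BLOBs, table functions) that the cited work relies on for good performance.

```python
import sqlite3


def count_pairs_sql(transactions, min_count):
    """Support counting for 2-itemsets done entirely in SQL (illustrative only).

    Hypothetical layout: table trans(tid, item), one row per item occurrence.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE trans (tid INTEGER, item TEXT)")
    conn.executemany(
        "INSERT INTO trans VALUES (?, ?)",
        [(tid, item) for tid, items in enumerate(transactions) for item in set(items)],
    )
    # The self-join pairs items of the same transaction; t1.item < t2.item
    # avoids counting each unordered pair twice. GROUP BY/HAVING does the
    # support counting and the threshold test.
    rows = conn.execute(
        """
        SELECT t1.item, t2.item, COUNT(*)
        FROM trans AS t1
        JOIN trans AS t2 ON t1.tid = t2.tid AND t1.item < t2.item
        GROUP BY t1.item, t2.item
        HAVING COUNT(*) >= ?
        ORDER BY t1.item, t2.item
        """,
        (min_count,),
    ).fetchall()
    conn.close()
    return {(a, b): n for a, b, n in rows}


print(count_pairs_sql([{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "d"}], 2))
# {('a', 'b'): 2, ('a', 'c'): 2}
```

Every level k of Apriori would need a k-way join of this kind, which is one reason plain SQL-92 scales poorly.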


Challenges of Frequent Pattern Mining
◼ Challenges
  ◼ Multiple scans of the transaction database
  ◼ Huge number of candidates
  ◼ Tedious workload of support counting for candidates
◼ Improving Apriori: general ideas
  ◼ Reduce the number of passes over the transaction database
  ◼ Shrink the number of candidates
  ◼ Facilitate support counting of candidates
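
To make the three challenges concrete, here is a minimal baseline Apriori in Python (a sketch, not an optimized implementation) that also reports how many database passes and candidates it needs. The function name and the (frequent, scans, candidates_total) return shape are illustrative choices.

```python
from itertools import combinations


def apriori(transactions, min_count):
    """Plain level-wise Apriori that also reports the cost drivers on the slide.

    Returns (frequent itemsets with counts, database scans, candidates counted).
    """
    db = [frozenset(t) for t in transactions]
    # Level-1 candidates: the item alphabet (in practice gathered in the first pass).
    candidates = {frozenset([item]) for t in db for item in t}
    frequent, scans, candidates_total, k = {}, 0, 0, 1
    while candidates:
        candidates_total += len(candidates)
        scans += 1  # one full pass over the database per level
        counts = dict.fromkeys(candidates, 0)
        for t in db:
            for c in candidates:  # support counting: subset test per candidate
                if c <= t:
                    counts[c] += 1
        level = {c: n for c, n in counts.items() if n >= min_count}
        frequent.update(level)
        k += 1
        # Join: union pairs of frequent (k-1)-itemsets that form a k-itemset.
        # Prune: drop any candidate that has an infrequent (k-1)-subset.
        candidates = set()
        for a, b in combinations(level, 2):
            union = a | b
            if len(union) == k and all(
                frozenset(sub) in level for sub in combinations(union, k - 1)
            ):
                candidates.add(union)
    return frequent, scans, candidates_total


print(apriori([{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "d"}], 2)[1:])
# (2, 7)
```

Each level costs a full scan plus a subset test for every (candidate, transaction) pair; those are exactly the costs the three improvement ideas target.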


Partition: Scan Database Only Twice
◼ Any itemset that is potentially frequent in DB must be frequent in at least one of the partitions of DB (with a relative support threshold: if it were infrequent in every partition, its total count would fall below the threshold for DB as a whole)
◼ Scan 1: partition the database and find the local frequent patterns of each partition
◼ Scan 2: count the union of local frequent patterns over DB to consolidate the global frequent patterns
◼ A. Savasere, E. Omiecinski, and S. Navathe. An efficient algorithm for mining association rules in large databases. In VLDB’95
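
A minimal sketch of the two-scan Partition idea, assuming a relative support threshold min_support applied both to each partition and to DB as a whole. The helper names are hypothetical, and the local miner is brute force (it enumerates every subset of each transaction), so it only suits toy data; a practical version would run an efficient miner such as Apriori inside each partition.

```python
from itertools import combinations


def local_frequent(partition, min_support):
    """Brute-force local miner (toy data only): itemsets meeting the relative threshold."""
    threshold = min_support * len(partition)
    counts = {}
    for t in partition:
        items = sorted(t)
        for k in range(1, len(items) + 1):
            for combo in combinations(items, k):
                key = frozenset(combo)
                counts[key] = counts.get(key, 0) + 1
    return {s for s, n in counts.items() if n >= threshold}


def partition_mine(transactions, min_support, n_parts):
    db = [frozenset(t) for t in transactions]
    size = -(-len(db) // n_parts)  # ceiling division
    parts = [db[i:i + size] for i in range(0, len(db), size)]
    # Scan 1: the union of locally frequent itemsets is the global candidate set.
    candidates = set()
    for p in parts:
        candidates |= local_frequent(p, min_support)
    # Scan 2: count every candidate once over the full database.
    counts = dict.fromkeys(candidates, 0)
    for t in db:
        for c in candidates:
            if c <= t:
                counts[c] += 1
    threshold = min_support * len(db)
    return {c: n for c, n in counts.items() if n >= threshold}
```

Because each partition only has to fit in memory during scan 1, the partition count can be chosen to match available memory.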


DHP: Reduce the Number of Candidates
◼ A k-itemset whose corresponding hash bucket count is below the support threshold cannot be frequent
◼ Candidates: a, b, c, d, e
◼ Hash entries: {ab, ad, ae} {bd, be, de} …
◼ Frequent 1-itemsets: a, b, d, e
◼ ab is not a candidate 2-itemset if the total count of its bucket {ab, ad, ae} is below the support threshold
◼ J. Park, M. Chen, and P. Yu. An effective hash-based algorithm for mining association rules. In SIGMOD’95
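
The bucket test can be sketched in Python for the 2-itemset case. This is a simplified illustration of the idea on the slide, not the full DHP algorithm; the hash function (order(x) * 10 + order(y)) mod n_buckets, the default of 7 buckets, and the function name are assumptions chosen for the example.

```python
from itertools import combinations


def dhp_candidate_pairs(transactions, min_count, n_buckets=7):
    """Hash-based pruning of candidate 2-itemsets (simplified DHP-style sketch).

    While counting 1-itemsets, every 2-itemset of each transaction is hashed
    into a bucket. A pair can only be frequent if its bucket total reaches
    min_count, so low-count buckets rule pairs out before they are counted.
    """
    db = [sorted(set(t)) for t in transactions]
    # Item order used by the hypothetical hash function below.
    order = {item: i for i, item in enumerate(sorted({x for t in db for x in t}))}

    def bucket_of(x, y):
        return (order[x] * 10 + order[y]) % n_buckets

    item_counts = {}
    bucket_counts = [0] * n_buckets
    for t in db:  # one pass: 1-itemset counts and bucket counts together
        for x in t:
            item_counts[x] = item_counts.get(x, 0) + 1
        for x, y in combinations(t, 2):  # t is sorted, so x < y
            bucket_counts[bucket_of(x, y)] += 1

    frequent_items = sorted(x for x, n in item_counts.items() if n >= min_count)
    apriori_pairs = list(combinations(frequent_items, 2))
    dhp_pairs = [p for p in apriori_pairs if bucket_counts[bucket_of(*p)] >= min_count]
    return apriori_pairs, dhp_pairs


tx = [{"a", "b"}, {"a", "b"}, {"c", "d"}, {"a", "c"}, {"b", "d"}]
print(dhp_candidate_pairs(tx, 2)[1])
# [('a', 'b'), ('a', 'c'), ('c', 'd')]
```

Here Apriori would keep all 6 pairs of frequent items, while the bucket test keeps 3. Only ab is actually frequent: ac and cd survive because they share a bucket, so bucket pruning is safe (no frequent pair is lost) but not exact.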


Sampling for Frequent Patterns
◼ Select a sample of the original database and mine frequent patterns within the sample using Apriori
◼ Scan the database once to verify the frequent itemsets found in the sample; only the borders of the closure of frequent patterns are checked
  ◼ Example: check abcd instead of ab, ac, …
◼ Scan the database again to find missed frequent patterns
◼ H. Toivonen. Sampling large databases for association rules. In VLDB’96
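
A compact sketch of the sample-then-verify scheme: mine the sample with a lowered threshold, then check the sample's frequent itemsets together with their negative border (the minimal itemsets just outside that set) in one full scan. Everything here is illustrative: the helper names, the lowered=0.8 factor, the brute-force miner (toy data only), and the assumption that the item universe is known up front. The second scan for missed patterns is only flagged, not implemented.

```python
import random
from itertools import combinations


def count_support(db, itemsets):
    counts = dict.fromkeys(itemsets, 0)
    for t in db:
        for s in itemsets:
            if s <= t:
                counts[s] += 1
    return counts


def mine_all(db, min_count):
    """Brute-force miner over all subsets of each transaction (toy data only)."""
    counts = {}
    for t in db:
        items = sorted(t)
        for k in range(1, len(items) + 1):
            for combo in combinations(items, k):
                key = frozenset(combo)
                counts[key] = counts.get(key, 0) + 1
    return {s for s, n in counts.items() if n >= min_count}


def negative_border(frequent, items):
    """Minimal itemsets not in `frequent` whose (k-1)-subsets are all in `frequent`."""
    border = {frozenset([i]) for i in items if frozenset([i]) not in frequent}
    for s in frequent:
        for i in items:
            if i not in s:
                cand = s | {i}
                if cand not in frequent and all(cand - {x} in frequent for x in cand):
                    border.add(cand)
    return border


def sample_then_verify(transactions, min_support, sample_size, lowered=0.8, seed=0):
    db = [frozenset(t) for t in transactions]
    sample = random.Random(seed).sample(db, sample_size)
    # Mine the sample with a lowered relative threshold to reduce misses.
    local = mine_all(sample, lowered * min_support * len(sample))
    items = {i for t in db for i in t}  # item universe, assumed known up front
    border = negative_border(local, items)
    # One full scan: count sample-frequent itemsets plus their negative border.
    counts = count_support(db, local | border)
    threshold = min_support * len(db)
    frequent = {s: n for s, n in counts.items() if n >= threshold}
    # A frequent border itemset means supersets may have been missed,
    # so a second scan (not shown) would be needed.
    needs_second_scan = any(s in frequent for s in border)
    return frequent, needs_second_scan
```

With sample_size equal to the database size, the sample is the whole database, so no border itemset can be frequent and the flag stays False; on a genuinely small sample, the flag is what signals that the second scan is required.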
