
Data Mining Association Analysis: Basic Concepts and Algorithms (Chapter 6, Introduction to Data Mining)


Document contents (about 65 pages)

The Apriori Algorithm: Example (minimum support count = 2)

Database D:
TID 100: 1 3 4
TID 200: 2 3 5
TID 300: 1 2 3 5
TID 400: 2 5

Scan D to count C1 (itemset: support): {1}: 2, {2}: 3, {3}: 3, {4}: 1, {5}: 3
L1: {1}: 2, {2}: 3, {3}: 3, {5}: 3
C2 (generated from L1): {1 2}, {1 3}, {1 5}, {2 3}, {2 5}, {3 5}
Scan D to count C2: {1 2}: 1, {1 3}: 2, {1 5}: 1, {2 3}: 2, {2 5}: 3, {3 5}: 2
L2: {1 3}: 2, {2 3}: 2, {2 5}: 3, {3 5}: 2
C3 (generated from L2): {2 3 5}
Scan D to count C3, giving L3: {2 3 5}: 2

© Tan, Steinbach, Kumar, Introduction to Data Mining, 4/18/2004
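The level-wise procedure in this example can be sketched in Python. This is a minimal sketch and not code from the slides. Two choices are assumptions. First, size-k candidates are formed as unions of pairs of frequent (k-1)-itemsets and pruned when any (k-1)-subset is infrequent. Second, support is counted by checking every candidate against every transaction, which is the brute-force counting step. The name `apriori` and its `min_sup` parameter (an absolute support count) are illustrative.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Set

Itemset = FrozenSet[int]


def apriori(transactions: List[Set[int]], min_sup: int) -> Dict[Itemset, int]:
    """Return every frequent itemset with its support count, level by level."""
    # C1 -> L1: count single items in one scan of D.
    counts: Dict[Itemset, int] = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    frequent = {s: n for s, n in counts.items() if n >= min_sup}
    result = dict(frequent)

    k = 2
    while frequent:
        # Ck: unions of two frequent (k-1)-itemsets that have exactly k items,
        # kept only if every (k-1)-subset is frequent (Apriori pruning).
        candidates = set()
        for a, b in combinations(frequent, 2):
            union = a | b
            if len(union) == k and all(
                frozenset(sub) in frequent for sub in combinations(union, k - 1)
            ):
                candidates.add(union)
        # Scan D: brute-force support counting against every candidate.
        counts = {c: 0 for c in candidates}
        for t in transactions:
            for c in candidates:
                if c <= t:
                    counts[c] += 1
        # Lk: candidates that meet the minimum support count.
        frequent = {s: n for s, n in counts.items() if n >= min_sup}
        result.update(frequent)
        k += 1
    return result


if __name__ == "__main__":
    D = [{1, 3, 4}, {2, 3, 5}, {1, 2, 3, 5}, {2, 5}]  # TIDs 100, 200, 300, 400
    found = apriori(D, min_sup=2)
    # Prints L1, L2, L3 as on the slide:
    # [1] 2, [2] 3, [3] 3, [5] 3, [1, 3] 2, [2, 3] 2, [2, 5] 3, [3, 5] 2, [2, 3, 5] 2
    for itemset, sup in sorted(found.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
        print(sorted(itemset), sup)
```

On the example database, pruning leaves only {2 3 5} in C3. {1 2 3} and {1 3 5} are dropped because {1 2} and {1 5} are not in L2.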


Reducing Number of Comparisons

Candidate counting:
– Scan the database of transactions to determine the support of each candidate itemset
– To reduce the number of comparisons, store the candidates in a hash structure
  ◆ Instead of matching each transaction against every candidate, match it against candidates contained in the hashed buckets

Transactions:
TID 1: Bread, Milk
TID 2: Bread, Diaper, Beer, Eggs
TID 3: Milk, Diaper, Beer, Coke
TID 4: Bread, Milk, Diaper, Beer
TID 5: Bread, Milk, Diaper, Coke

[Figure: N transactions matched against a hash structure with k buckets]
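The bucketed counting described above can be sketched as follows, using the transactions from the slide. This is a sketch under stated assumptions, not a prescribed implementation. Each candidate is placed in one of `num_buckets` buckets by Python's built-in `hash` of the sorted itemset, which stands in for whatever hash function a real implementation would use. Each `length`-item subset of a transaction is then compared only against the candidates in its own bucket. `count_support_hashed` is a hypothetical name. `num_buckets` plays the role of the slide's k buckets, and `length` is the candidate itemset size.

```python
from itertools import combinations
from typing import Dict, Iterable, List, Tuple

Itemset = Tuple[str, ...]


def count_support_hashed(
    transactions: Iterable[Iterable[str]],
    candidates: Iterable[Iterable[str]],
    length: int,
    num_buckets: int = 4,
) -> Dict[Itemset, int]:
    """Count candidate support, comparing each transaction subset only within its bucket."""
    buckets: List[List[Itemset]] = [[] for _ in range(num_buckets)]
    support: Dict[Itemset, int] = {}
    for candidate in candidates:
        key = tuple(sorted(candidate))  # canonical form so equal itemsets hash equally
        if key in support:
            continue  # ignore duplicate candidates
        buckets[hash(key) % num_buckets].append(key)
        support[key] = 0
    for transaction in transactions:
        for subset in combinations(sorted(set(transaction)), length):
            # Only the candidates that share this subset's bucket are compared.
            for candidate in buckets[hash(subset) % num_buckets]:
                if candidate == subset:
                    support[candidate] += 1
    return support


if __name__ == "__main__":
    TRANSACTIONS = [
        ["Bread", "Milk"],
        ["Bread", "Diaper", "Beer", "Eggs"],
        ["Milk", "Diaper", "Beer", "Coke"],
        ["Bread", "Milk", "Diaper", "Beer"],
        ["Bread", "Milk", "Diaper", "Coke"],
    ]
    pairs = [("Bread", "Milk"), ("Beer", "Diaper"), ("Coke", "Milk")]
    print(count_support_hashed(TRANSACTIONS, pairs, length=2))
    # {('Bread', 'Milk'): 3, ('Beer', 'Diaper'): 3, ('Coke', 'Milk'): 2}
```

If the candidates spread evenly across buckets, each subset is compared against only a few candidates rather than all of them. The sketch does nothing about the cost of enumerating the subsets of long transactions.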


Generate Hash Tree

Suppose you have 15 candidate itemsets of length 3:
{1 4 5}, {1 2 4}, {4 5 7}, {1 2 5}, {4 5 8}, {1 5 9}, {1 3 6}, {2 3 4}, {5 6 7}, {3 4 5}, {3 5 6}, {3 5 7}, {6 8 9}, {3 6 7}, {3 6 8}

You need:
• Hash function
• Max leaf size: max number of itemsets stored in a leaf node (if number of candidate itemsets exceeds max leaf size, split the node)

[Figure: hash function sending items 1, 4, 7 / 2, 5, 8 / 3, 6, 9 to three branches, and the hash tree built from the 15 candidates]
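A hash tree for the 15 candidates above can be built with the following sketch. Two settings are assumptions not stated on this slide. The hash function is taken to be `item % 3`, which sends 1, 4, 7 to one branch, 2, 5, 8 to another and 3, 6, 9 to the third, matching the branches in the figure. The maximum leaf size is set to 3. A node at depth d hashes on item d of the itemset (counting from 0). An overfull leaf is split unless all k items have already been hashed. `branch`, `Node` and `HashTree` are hypothetical names.

```python
from typing import Dict, List, Optional, Tuple

Itemset = Tuple[int, ...]


def branch(item: int) -> int:
    """Assumed hash function: 1, 4, 7 -> 1; 2, 5, 8 -> 2; 3, 6, 9 -> 0."""
    return item % 3


class Node:
    def __init__(self, depth: int) -> None:
        self.depth = depth  # an interior node hashes on itemset[depth]
        self.children: Optional[Dict[int, "Node"]] = None  # None means "leaf"
        self.itemsets: List[Itemset] = []  # only used while the node is a leaf


class HashTree:
    def __init__(self, length: int, max_leaf_size: int) -> None:
        self.length = length  # every candidate has exactly this many items
        self.max_leaf_size = max_leaf_size
        self.root = Node(0)

    def _child(self, node: Node, itemset: Itemset) -> Node:
        # Follow (creating if needed) the branch chosen by hashing itemset[node.depth].
        key = branch(itemset[node.depth])
        if key not in node.children:
            node.children[key] = Node(node.depth + 1)
        return node.children[key]

    def _split_if_needed(self, node: Node) -> None:
        # A leaf splits when it overflows, unless every item has already been hashed.
        if len(node.itemsets) <= self.max_leaf_size or node.depth >= self.length:
            return
        pending, node.itemsets, node.children = node.itemsets, [], {}
        for itemset in pending:
            self._child(node, itemset).itemsets.append(itemset)
        for child in list(node.children.values()):
            self._split_if_needed(child)

    def insert(self, itemset: Itemset) -> None:
        node = self.root
        while node.children is not None:
            node = self._child(node, itemset)
        node.itemsets.append(itemset)
        self._split_if_needed(node)

    def leaves(self) -> List[List[Itemset]]:
        found: List[List[Itemset]] = []
        stack = [self.root]
        while stack:
            node = stack.pop()
            if node.children is None:
                found.append(node.itemsets)
            else:
                stack.extend(node.children.values())
        return found


CANDIDATES = [(1, 4, 5), (1, 2, 4), (4, 5, 7), (1, 2, 5), (4, 5, 8), (1, 5, 9), (1, 3, 6),
              (2, 3, 4), (5, 6, 7), (3, 4, 5), (3, 5, 6), (3, 5, 7), (6, 8, 9), (3, 6, 7), (3, 6, 8)]

if __name__ == "__main__":
    tree = HashTree(length=3, max_leaf_size=3)
    for candidate in CANDIDATES:
        tree.insert(candidate)
    for leaf in tree.leaves():
        print(leaf)
```

With these settings the 15 candidates end up in nine leaves. For example, {3 5 6}, {3 5 7} and {6 8 9} share a leaf because each starts with an item from 3, 6, 9 and continues with one from 2, 5, 8.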


Association Rule Discovery: Hash Tree

[Figure: candidate hash tree built with the hash function 1, 4, 7 | 2, 5, 8 | 3, 6, 9; its leaves hold {1 4 5}; {1 2 4} {4 5 7}; {1 2 5} {4 5 8}; {1 5 9}; {1 3 6}; {2 3 4} {5 6 7}; {3 4 5}; {3 5 6} {3 5 7} {6 8 9}; {3 6 7} {3 6 8}. Highlighted: hash on 1, 4 or 7]

Association Rule Discovery: Hash Tree

[Figure: the same candidate hash tree and hash function (1, 4, 7 | 2, 5, 8 | 3, 6, 9). Highlighted: hash on 2, 5 or 8]
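The "hash on 1, 4 or 7" and "hash on 2, 5 or 8" labels describe how a candidate is routed through the tree. The first item is hashed at the root, the second item one level down, and so on. The sketch below groups candidates by the branch they take at a given depth. It uses an assumed `item % 3` hash function that reproduces the three branches in the figure. `route` and `BRANCH_LABELS` are illustrative names, not part of the slides.

```python
from typing import Dict, List, Tuple

Itemset = Tuple[int, ...]

# Assumed hash function (item % 3) mapped to the branch labels used on the slides.
BRANCH_LABELS = {1: "1, 4 or 7", 2: "2, 5 or 8", 0: "3, 6 or 9"}


def route(candidates: List[Itemset], depth: int = 0) -> Dict[str, List[Itemset]]:
    """Group candidates by the branch chosen when hashing the item at `depth`."""
    groups: Dict[str, List[Itemset]] = {label: [] for label in BRANCH_LABELS.values()}
    for itemset in candidates:
        groups[BRANCH_LABELS[itemset[depth] % 3]].append(itemset)
    return groups


CANDIDATES = [(1, 4, 5), (1, 2, 4), (4, 5, 7), (1, 2, 5), (4, 5, 8), (1, 5, 9), (1, 3, 6),
              (2, 3, 4), (5, 6, 7), (3, 4, 5), (3, 5, 6), (3, 5, 7), (6, 8, 9), (3, 6, 7), (3, 6, 8)]

if __name__ == "__main__":
    root = route(CANDIDATES)
    for label, group in root.items():
        print(f"hash on {label}: {group}")
    # One level down the "1, 4 or 7" subtree, the second item decides the branch.
    print(route(root["1, 4 or 7"], depth=1))
```

At the root, the seven candidates starting with 1 or 4 take the "1, 4 or 7" branch, {2 3 4} and {5 6 7} take "2, 5 or 8", and the six starting with 3 or 6 take "3, 6 or 9".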


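Documents you may be interested in

Information Security and Management course materials (PPT lecture slides), Chapter 6: Public Key Infrastructure (PKI)
Fundamentals of Computer Applications course materials (PPT lecture slides), Chapter 1: Basic Computer Knowledge
Computer Networking: A Top Down Approach, English textbook materials (PPT lecture slides, 3rd edition), Chapter 5: Link Layer
Xidian University: Microcomputer Principles and Interface Technology course materials (PPT lecture slides), Chapter 6: Memory Design
Compiler Principles course materials (PPT lecture slides), Chapter 5: Type Checking
Key Technologies of Web Search and Mining (Web Search and Mining) course materials (PPT slides), Lecture 10: Query Expansion
Beijing Normal University Modern Distance Education: Fundamentals of Computer Applications course materials (PPT lecture slides), Chapter 1: General Computer Knowledge
University of Science and Technology of China: Network Information Security (NETWORK SECURITY) course materials (PPT lecture slides), UNIX/LINUX Operating Systems
Harbin Institute of Technology: Language Information Processing course materials (PPT lecture slides), Machine Translation I (lecturer: Zhang Yu)
Operating System course materials (PPT lecture slides), Overview
Computer Networks course syllabus (Computer Science and Technology, Network Engineering majors)
Computer Assembly and Maintenance course PPT slides (practical training tutorial), Chapter 3: Motherboards
Principles of Computer Organization course materials (PPT lecture slides), Chapter 5: Memory Hierarchy
University of Electronic Science and Technology of China: Unix Operating System Fundamentals course materials (PPT slides), Chapter 1: Overview of the UNIX Operating System; Chapter 2: Getting Started with UNIX
China Water & Power Press: Principles and Applications of Single-Chip Microcomputers course PPT slides (C language edition), Chapter 2: Basic Structure of the MCS-51 Microcontroller
Data Structures course materials (PPT lecture slides), Chapter 3: Stacks and Queues
Network Security teaching materials (PPT slides), Topic 3: User Authentication
C++ Language Basics Tutorial course lesson plans (PPT slides), Lecture 2: C++ Language Basics
Changchun University: Fundamentals of Computer Applications course materials (PPT lecture slides), Chapter 2: Operating Systems
Nanjing University: Data Structures course materials (PPT lecture slides), Chapter 2: Linear Lists
Inspur: Introduction to Parallel Programs, Compilers and Function Libraries; Tuning Application Software
C Programming course lesson plans (PPT lecture slides), Chapter 2: Basic Data Types and Operations
Anhui University of Science and Technology: Assembly Language course materials (PPT lecture slides), Chapter 4: Assembly Language Program Format
Tsinghua University: Network Security course materials (PPT lecture slides), Lecture 01: Introduction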
