Frequent Itemset Generation Strategies

Reduce the number of candidates (M)
– Complete search: M = 2^d
– Use pruning techniques to reduce M

Reduce the number of transactions (N)
– Reduce the size of N as the size of the itemset increases

Reduce the number of comparisons (NM)
– Use efficient data structures to store the candidates or transactions
– No need to match every candidate against every transaction
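To make the baseline cost concrete, here is a minimal brute-force sketch (the three transactions are a made-up toy example, not from the slides): all M = 2^d - 1 non-empty candidates are enumerated and each is matched against every one of the N transactions, which is exactly the O(NM) work the three strategies above try to reduce.

```python
from itertools import combinations

# Hypothetical toy database of N = 3 transactions.
transactions = [
    {'Bread', 'Milk'},
    {'Bread', 'Diaper', 'Beer', 'Eggs'},
    {'Milk', 'Diaper', 'Beer', 'Coke'},
]
items = sorted(set().union(*transactions))  # the d distinct items

support = {}
for k in range(1, len(items) + 1):
    for candidate in combinations(items, k):
        c = set(candidate)
        # every candidate is matched against every transaction: O(NM)
        support[candidate] = sum(c <= t for t in transactions)

print(len(support))  # 2**len(items) - 1 = 63 candidates for d = 6
```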
Reducing Number of Candidates

Apriori principle:
– If an itemset is frequent, then all of its subsets must also be frequent

Apriori principle holds due to the following property of the support measure:

∀X, Y : (X ⊆ Y) ⇒ s(X) ≥ s(Y)

– Support of an itemset never exceeds the support of its subsets
– This is known as the anti-monotone property of support
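The property is easy to check empirically. A minimal sketch, assuming a small hypothetical dataset: for every itemset Y and each of its immediate subsets X, the support count of X is at least that of Y, because every transaction containing Y necessarily contains X.

```python
from itertools import combinations

# Hypothetical transactions for the check.
transactions = [
    {'Bread', 'Milk'},
    {'Bread', 'Diaper', 'Beer'},
    {'Milk', 'Diaper', 'Beer'},
    {'Bread', 'Milk', 'Diaper'},
]

def support(itemset):
    """Support count: number of transactions containing the itemset."""
    return sum(itemset <= t for t in transactions)

items = sorted(set().union(*transactions))
# For every itemset Y, every immediate subset X satisfies s(X) >= s(Y).
for k in range(2, len(items) + 1):
    for y in combinations(items, k):
        for x in combinations(y, k - 1):
            assert support(set(x)) >= support(set(y))
```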
Illustrating Apriori Principle

[Figure: itemset lattice over items A, B, C, D, E. The 2-itemset AB is found to be infrequent, so all of its supersets (ABC, ABD, ABE, ABCD, ABCE, ABDE, ABCDE) are pruned from the search space without being counted.]
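A quick way to quantify the figure: with d = 5 items, pruning the proper supersets of a single infrequent 2-itemset removes 7 of the 31 non-empty itemsets without counting them. A sketch (the items and the choice of AB follow the figure; the code itself is illustrative):

```python
from itertools import combinations

items = ['A', 'B', 'C', 'D', 'E']
infrequent = {'A', 'B'}  # AB was found to be infrequent

# Every proper superset of AB can be pruned without counting it.
pruned = [c for k in range(1, len(items) + 1)
          for c in combinations(items, k)
          if infrequent < set(c)]

print(len(pruned))  # 7 of the 31 non-empty itemsets are pruned
print(pruned)       # ABC, ABD, ABE, ABCD, ABCE, ABDE, ABCDE
```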
Illustrating Apriori Principle

Minimum Support = 3

Items (1-itemsets):

  Item     Count
  Bread    4
  Coke     2
  Milk     4
  Beer     3
  Diaper   4
  Eggs     1

Pairs (2-itemsets), with no need to generate candidates involving Coke or Eggs:

  Itemset           Count
  {Bread, Milk}     3
  {Bread, Beer}     2
  {Bread, Diaper}   3
  {Milk, Beer}      2
  {Milk, Diaper}    3
  {Beer, Diaper}    3

Triplets (3-itemsets):

  Itemset                  Count
  {Bread, Milk, Diaper}    3

If every subset is considered: C(6,1) + C(6,2) + C(6,3) = 6 + 15 + 20 = 41 candidates.
With support-based pruning: 6 + 6 + 1 = 13 candidates.
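The first two levels of this example can be reproduced in a few lines. A sketch, assuming the five-transaction market-basket dataset used earlier in the chapter (the transactions themselves are not on this slide, so treat them as an assumption; only the counting logic is the point):

```python
from collections import Counter
from itertools import combinations

transactions = [                       # assumed, not shown on the slide
    {'Bread', 'Milk'},
    {'Bread', 'Diaper', 'Beer', 'Eggs'},
    {'Milk', 'Diaper', 'Beer', 'Coke'},
    {'Bread', 'Milk', 'Diaper', 'Beer'},
    {'Bread', 'Milk', 'Diaper', 'Coke'},
]
minsup = 3

# 1-itemsets: count every item, keep those with support >= minsup
c1 = Counter(item for t in transactions for item in t)
f1 = sorted(item for item, n in c1.items() if n >= minsup)
print(f1)  # Coke (2) and Eggs (1) are eliminated

# 2-itemsets: candidates built only from the 4 frequent items, so just
# C(4,2) = 6 pairs are counted instead of C(6,2) = 15
c2 = {pair: sum(set(pair) <= t for t in transactions)
      for pair in combinations(f1, 2)}
print(c2)
```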
Apriori Algorithm

Method:
– Let k = 1
– Generate frequent itemsets of length 1
– Repeat until no new frequent itemsets are identified
  ◆ Generate length-(k+1) candidate itemsets from length-k frequent itemsets if their first k-1 items are identical
  ◆ Prune candidate itemsets containing subsets of length k that are infrequent
  ◆ Count the support of each candidate by scanning the DB
  ◆ Eliminate candidates that are infrequent, leaving only those that are frequent
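A compact sketch of this method (illustrative, not the book's reference implementation): itemsets are kept as sorted tuples, so the "first k-1 items identical" merge and the subset-pruning test are direct to express.

```python
from itertools import combinations

def apriori(transactions, minsup):
    """Apriori as outlined above: merge frequent k-itemsets that share
    their first k-1 items, prune candidates with an infrequent subset,
    then count support by scanning the DB."""
    # Frequent 1-itemsets
    counts = {}
    for t in transactions:
        for item in t:
            counts[(item,)] = counts.get((item,), 0) + 1
    frequent = {c for c, n in counts.items() if n >= minsup}
    all_frequent = set(frequent)

    k = 1
    while frequent:
        prev = sorted(frequent)
        candidates = set()
        # Generate length-(k+1) candidates from length-k frequent
        # itemsets whose first k-1 items are identical
        for i in range(len(prev)):
            for j in range(i + 1, len(prev)):
                if prev[i][:-1] == prev[j][:-1]:
                    cand = prev[i] + (prev[j][-1],)
                    # Prune: every length-k subset must be frequent
                    if all(sub in frequent
                           for sub in combinations(cand, k)):
                        candidates.add(cand)
        # Count support by scanning the DB, eliminate the infrequent
        frequent = {c for c in candidates
                    if sum(set(c) <= t for t in transactions) >= minsup}
        all_frequent |= frequent
        k += 1
    return all_frequent
```

Calling apriori(transactions, 3) on a list of transaction sets returns every frequent itemset as a sorted tuple; keeping the tuples sorted is what makes the prefix-based merge generate each candidate exactly once.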