Quality: What Is Good Clustering? ◼ A good clustering method will produce high-quality clusters ◼ high intra-class similarity: cohesive within clusters ◼ low inter-class similarity: distinctive between clusters ◼ The quality of a clustering method depends on ◼ the similarity measure used by the method, ◼ its implementation, and ◼ its ability to discover some or all of the hidden patterns
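The two criteria above can be made concrete with simple numeric measures. Below is a minimal sketch (the toy points, labels, and the centroid-based definitions are illustrative choices, not prescribed by the slides): cohesion as the mean squared distance of points to their own cluster centroid, and separation as the mean squared distance between centroids.

```python
import numpy as np

def cohesion(points, labels):
    """Intra-class: mean squared distance of each point to its own cluster centroid."""
    total = 0.0
    for k in np.unique(labels):
        members = points[labels == k]
        centroid = members.mean(axis=0)
        total += ((members - centroid) ** 2).sum()
    return total / len(points)

def separation(points, labels):
    """Inter-class: mean squared distance between all pairs of cluster centroids."""
    centroids = [points[labels == k].mean(axis=0) for k in np.unique(labels)]
    dists = [((a - b) ** 2).sum()
             for i, a in enumerate(centroids)
             for b in centroids[i + 1:]]
    return float(np.mean(dists))

# Two tight, well-separated toy clusters:
pts = np.array([[0., 0.], [0., 1.], [5., 5.], [5., 6.]])
lbl = np.array([0, 0, 1, 1])
print(cohesion(pts, lbl))    # small value: clusters are cohesive
print(separation(pts, lbl))  # large value: clusters are distinctive
```

A good clustering drives the first number down and the second up; ratio-style indices combine both into one score.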
Measure the Quality of Clustering ◼ Dissimilarity/Similarity metric ◼ Similarity is expressed in terms of a distance function, typically metric: d(i, j) ◼ The definitions of distance functions are usually rather different for interval-scaled, boolean, categorical, ordinal, ratio, and vector variables ◼ Weights should be associated with different variables based on applications and data semantics ◼ Quality of clustering: ◼ There is usually a separate “quality” function that measures the “goodness” of a cluster ◼ It is hard to define “similar enough” or “good enough” ◼ The answer is typically highly subjective
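For interval-scaled variables, one common way to attach per-variable weights to d(i, j) is a weighted Euclidean distance. The sketch below is illustrative: the feature names and weight values are assumptions chosen to show how data semantics (income varying over a much larger range than age) can be reflected in the weights.

```python
import math

def d(i, j, weights):
    """Weighted Euclidean distance between two numeric vectors."""
    return math.sqrt(sum(w * (a - b) ** 2
                         for w, a, b in zip(weights, i, j)))

# Hypothetical records (age, income); down-weighting income so its large
# raw scale does not dominate the distance.
print(d((25, 50000), (30, 52000), weights=(1.0, 1e-6)))  # sqrt(29) ≈ 5.385
```

The same idea extends to other variable types by plugging in a per-variable dissimilarity (e.g., 0/1 mismatch for categorical attributes) before weighting.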
Considerations for Cluster Analysis ◼ Partitioning criteria ◼ Single level vs. hierarchical partitioning (often, multi-level hierarchical partitioning is desirable) ◼ Separation of clusters ◼ Exclusive (e.g., one customer belongs to only one region) vs. nonexclusive (e.g., one document may belong to more than one class) ◼ Similarity measure ◼ Distance-based (e.g., Euclidean, road network, vector) vs. connectivity-based (e.g., density or contiguity) ◼ Clustering space ◼ Full space (often when low dimensional) vs. subspaces (often in high-dimensional clustering)
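The exclusive vs. nonexclusive distinction can be sketched in code: a hard assignment gives each object exactly one cluster, while a soft assignment gives a membership weight per cluster. The fixed centroids and the inverse-distance weighting below are illustrative assumptions (the latter in the spirit of fuzzy clustering), not a method the slides prescribe.

```python
import math

centroids = [(0.0, 0.0), (4.0, 0.0)]  # assumed, not computed

def hard_assign(p):
    """Exclusive: index of the single nearest centroid."""
    return min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))

def soft_assign(p):
    """Nonexclusive: normalized inverse-distance membership per cluster."""
    inv = [1.0 / (math.dist(p, c) + 1e-9) for c in centroids]
    s = sum(inv)
    return [w / s for w in inv]

print(hard_assign((1.0, 0.0)))  # 0: wholly in the first cluster
print(soft_assign((2.0, 0.0)))  # equidistant point: membership split 50/50
```

A document classifier, for instance, would use the soft form so one document can belong to several topical classes at once.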
Requirements and Challenges ◼ Scalability ◼ Clustering all the data instead of only samples ◼ Ability to deal with different types of attributes ◼ Numerical, binary, categorical, ordinal, linked, and mixtures of these ◼ Constraint-based clustering ◼ User may give inputs on constraints ◼ Use domain knowledge to determine input parameters ◼ Interpretability and usability ◼ Others ◼ Discovery of clusters with arbitrary shape ◼ Ability to deal with noisy data ◼ Incremental clustering and insensitivity to input order ◼ High dimensionality
Major Clustering Approaches (I) ◼ Partitioning approach: ◼ Construct various partitions and then evaluate them by some criterion, e.g., minimizing the sum of square errors ◼ Typical methods: k-means, k-medoids, CLARANS ◼ Hierarchical approach: ◼ Create a hierarchical decomposition of the set of data (or objects) using some criterion ◼ Typical methods: DIANA, AGNES, BIRCH, CHAMELEON ◼ Density-based approach: ◼ Based on connectivity and density functions ◼ Typical methods: DBSCAN, OPTICS, DenClue ◼ Grid-based approach: ◼ Based on a multiple-level granularity structure ◼ Typical methods: STING, WaveCluster, CLIQUE
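The partitioning approach above can be sketched with a minimal k-means loop: alternate between assigning points to their nearest centroid and recomputing centroids as cluster means, which locally minimizes the sum of squared errors (SSE). Initialization and convergence handling are simplified for illustration; the toy data is assumed.

```python
import numpy as np

def kmeans(points, k, iters=20, seed=0):
    """Minimal k-means: returns (labels, centroids, SSE)."""
    rng = np.random.default_rng(seed)
    centroids = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(iters):
        # Assignment step: nearest centroid by squared Euclidean distance.
        d2 = ((points[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # Update step: each centroid becomes the mean of its assigned points.
        centroids = np.array([points[labels == j].mean(axis=0)
                              for j in range(k)])
    sse = ((points - centroids[labels]) ** 2).sum()
    return labels, centroids, sse

pts = np.array([[0., 0.], [0., 1.], [9., 9.], [9., 10.]])
labels, centroids, sse = kmeans(pts, k=2)
print(sorted(centroids.tolist()))  # near [[0, 0.5], [9, 9.5]]
```

Note the contrast with the other families: DBSCAN would instead grow clusters from density-connected neighborhoods, with no centroids or fixed k at all.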