
Knowledge Discovery and Data Mining: Course Lecture Slides (PPT), Chapter 10. Cluster Analysis: Basic Concepts and Methods

◼ Cluster Analysis: Basic Concepts ◼ Partitioning Methods ◼ Hierarchical Methods ◼ Density-Based Methods ◼ Grid-Based Methods ◼ Evaluation of Clustering ◼ Summary



Major Clustering Approaches (II)
◼ Model-based:
  ◼ A model is hypothesized for each cluster, and the method tries to find the best fit of the data to that model
  ◼ Typical methods: EM, SOM, COBWEB
◼ Frequent pattern-based:
  ◼ Based on the analysis of frequent patterns
  ◼ Typical methods: p-Cluster
◼ User-guided or constraint-based:
  ◼ Clustering by considering user-specified or application-specific constraints
  ◼ Typical methods: COD (obstacles), constrained clustering
◼ Link-based clustering:
  ◼ Objects are often linked together in various ways
  ◼ Massive links can be used to cluster objects: SimRank, LinkClus



Partitioning Algorithms: Basic Concept
◼ Partitioning method: Partition a database D of n objects into a set of k clusters such that the sum of squared distances is minimized (where $c_i$ is the centroid or medoid of cluster $C_i$):

$$E = \sum_{i=1}^{k} \sum_{p \in C_i} \bigl(d(p, c_i)\bigr)^2$$

◼ Given k, find a partition of k clusters that optimizes the chosen partitioning criterion
  ◼ Global optimal: exhaustively enumerate all partitions
  ◼ Heuristic methods: k-means and k-medoids algorithms
  ◼ k-means (MacQueen’67, Lloyd’57/’82): Each cluster is represented by the center of the cluster
  ◼ k-medoids or PAM (Partition around medoids) (Kaufman & Rousseeuw’87): Each cluster is represented by one of the objects in the cluster
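The criterion E and the "exhaustively enumerate all partitions" option can be sketched in a few lines of Python. This is a minimal illustration, not part of the slides: it assumes Euclidean distance for d, uses the centroid (mean point) as the cluster representative, and the names `centroid`, `sse`, and `best_partition_exhaustive` are hypothetical.

```python
import math
from itertools import product


def centroid(points):
    """Mean point of a non-empty list of equal-length tuples."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def sse(clusters):
    """E: sum over clusters of squared Euclidean distances to the cluster centroid."""
    total = 0.0
    for members in clusters:
        c = centroid(members)
        total += sum(math.dist(p, c) ** 2 for p in members)
    return total


def best_partition_exhaustive(points, k):
    """Try every labeling of the points with k labels and keep the lowest-E one.

    Labelings that leave a cluster empty are skipped. The loop visits k**n
    labelings, so this is only usable for very small n.
    """
    best_e, best_clusters = math.inf, None
    for labels in product(range(k), repeat=len(points)):
        if len(set(labels)) < k:
            continue  # at least one cluster would be empty
        clusters = [
            [p for p, lab in zip(points, labels) if lab == c] for c in range(k)
        ]
        e = sse(clusters)
        if e < best_e:
            best_e, best_clusters = e, clusters
    return best_e, best_clusters
```

Because the loop checks k**n labelings, it becomes impractical very quickly as n grows, which is the motivation for heuristic methods such as k-means and k-medoids.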


The K-Means Clustering Method
◼ Given k, the k-means algorithm is implemented in four steps:
  1. Partition objects into k nonempty subsets
  2. Compute seed points as the centroids of the clusters of the current partitioning (the centroid is the center, i.e., mean point, of the cluster)
  3. Assign each object to the cluster with the nearest seed point
  4. Go back to Step 2; stop when the assignment no longer changes
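The four steps can be sketched in plain Python as follows. This is one possible reading of the slide, under stated assumptions: Euclidean distance, a shuffled round-robin deal for the initial partition in Step 1, and a cluster that becomes empty keeps its previous centroid. The function names and the `seed` and `max_iter` parameters are hypothetical and do not come from the slides.

```python
import math
import random


def centroid(points):
    """Mean point of a non-empty list of equal-length tuples."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def nearest(point, centers):
    """Index of the center closest to `point` (Euclidean distance)."""
    return min(range(len(centers)), key=lambda i: math.dist(point, centers[i]))


def k_means(points, k, seed=0, max_iter=100):
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    rng = random.Random(seed)

    # Step 1: partition objects into k nonempty subsets
    # (shuffle the indices, then deal them out round-robin).
    order = list(range(len(points)))
    rng.shuffle(order)
    labels = [0] * len(points)
    for pos, idx in enumerate(order):
        labels[idx] = pos % k

    centers = None
    for _ in range(max_iter):
        # Step 2: compute the centroid of each current cluster.
        new_centers = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            # Assumption: an emptied cluster keeps its previous centroid.
            new_centers.append(centroid(members) if members else centers[c])
        centers = new_centers

        # Step 3: assign each object to the cluster with the nearest centroid.
        new_labels = [nearest(p, centers) for p in points]

        # Step 4: stop when the assignment does not change, else repeat.
        if new_labels == labels:
            break
        labels = new_labels
    return labels, centers


if __name__ == "__main__":
    data = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5), (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
    print(k_means(data, k=2))
```

The result depends on the initial partition, so a different `seed` may lead to a different local optimum on less well-separated data.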


An Example of K-Means Clustering (K = 2)
[Figure: the initial data set → arbitrarily partition objects into k groups → update the cluster centroids → reassign objects → update the cluster centroids; loop if needed]
◼ Partition objects into k nonempty subsets
◼ Repeat
  ◼ Compute centroid (i.e., mean point) for each partition
  ◼ Assign each object to the cluster of its nearest centroid
◼ Until no change
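The loop can also be traced by hand. The one-dimensional data set and the initial partition below are hypothetical values chosen for illustration; they are not the points plotted in the slide's figure.

```latex
\begin{align*}
&\text{Data: } \{1, 2, 3, 8, 9, 10\}, \quad K = 2 \\
&\text{Arbitrary initial partition: } C_1 = \{1, 2, 8\},\; C_2 = \{3, 9, 10\} \\
&\text{Update centroids: } c_1 = \tfrac{1 + 2 + 8}{3} \approx 3.67,\quad
  c_2 = \tfrac{3 + 9 + 10}{3} \approx 7.33 \\
&\text{Reassign objects: } C_1 = \{1, 2, 3\},\; C_2 = \{8, 9, 10\}
  \quad (3 \text{ moves to } C_1,\; 8 \text{ moves to } C_2) \\
&\text{Update centroids: } c_1 = 2,\quad c_2 = 9 \\
&\text{Reassign objects: no object changes cluster, so the loop stops.}
\end{align*}
```

For this final partition, the sum of squared distances is $E = (1 + 0 + 1) + (1 + 0 + 1) = 4$.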
