
Tongji University: Big Data Analysis and Mining, course teaching resources (PPT lecture slides): Clustering Basics (Lecturer: Zhao Qinpei 赵钦佩)

Outline
◼ Cluster Basics
◼ Clustering algorithms
  - Hierarchical clustering
  - k-means
  - Expectation-Maximization (EM)
◼ Cluster Validity
  - Determining the number of clusters
  - Clustering evaluation



Clustering Analysis
◼ Definition:
  - 物以类聚，人以群居 (roughly, "birds of a feather flock together")
  - Grouping data with similar features
◼ It is a method of data exploration: a way of looking for patterns or structure of interest in the data.
◼ Properties: unsupervised; parameters needed
◼ Application fields: machine learning, pattern recognition, image analysis, data mining, information retrieval, bioinformatics, etc.
[Animation: k-means]
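The two properties listed above (unsupervised, parameter needed) can be illustrated with a small k-means sketch. This is not code from the slides: the function name `kmeans`, the toy `points`, and the choice of the first `k` points as initial centroids are all assumptions made for illustration. Note that no labels are supplied, while the number of clusters `k` has to be chosen by the user.

```python
import math

def kmeans(points, k, iterations=20):
    """Illustrative Lloyd-style k-means on a list of equal-length tuples."""
    # Assumption: the first k points are the initial centroids. Practical
    # implementations usually initialise randomly or with k-means++.
    centroids = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels, centroids

# Toy data: two visually separated groups, no labels given (unsupervised).
points = [(1.0, 1.0), (8.0, 8.0), (1.2, 0.8), (7.9, 8.3), (0.9, 1.1), (8.2, 7.7)]
labels, centroids = kmeans(points, k=2)  # k is the parameter that is needed
print(labels, centroids)
```

With a different `k` the same data would be partitioned differently, which is why choosing the number of clusters appears in the outline as a topic of its own.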


Factors of Clustering
◼ What data could be used in clustering?
  - Large or small, Gaussian or non-Gaussian, etc.
◼ Which clustering algorithm? (cost function)
  - Partition-based (e.g. k-means)
  - Model-based (e.g. EM algorithm)
  - Density-based (e.g. DBSCAN)
  - Genetic, spectral, …
◼ Choosing (dis)similarity measures: a critical step in clustering
  - Euclidean distance, …
  - Pearson linear correlation, …
◼ How to evaluate the clustering result? (cluster validity)
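To see why the choice of (dis)similarity measure matters, the sketch below compares the two measures named above on one pair of vectors. The function names and the sample vectors `a` and `b` are illustrative assumptions, and turning the Pearson correlation r into a dissimilarity as 1 - r is one common convention, not necessarily the one used in the course.

```python
import math

def euclidean(x, y):
    """Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def pearson_dissimilarity(x, y):
    """1 - r, where r is the Pearson linear correlation coefficient.

    Assumes neither input is constant (otherwise the denominator is zero).
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return 1.0 - cov / (norm_x * norm_y)

# b has the same shape as a but ten times the scale.
a = [1.0, 2.0, 3.0, 4.0]
b = [10.0, 20.0, 30.0, 40.0]
d_euclid = euclidean(a, b)
d_pearson = pearson_dissimilarity(a, b)
print(d_euclid, d_pearson)
```

Here the Euclidean distance is large because the magnitudes differ, while the correlation-based dissimilarity is essentially zero because the two profiles are perfectly linearly related. An algorithm using one measure would separate these two objects, and one using the other would group them.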


Quality: What Is Good Clustering?
◼ A good clustering method will produce high-quality clusters with
  - high intra-class similarity
  - low inter-class similarity
◼ The quality of a clustering result depends on both the similarity measure used by the method and its implementation.
◼ The quality of a clustering method is also measured by its ability to discover some or all of the hidden patterns.
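One simple way to make "high intra-class similarity, low inter-class similarity" concrete is to compare average pairwise distances within and between clusters. The sketch below is an illustrative assumption, not a validity index from the slides: it uses Euclidean distance as the dissimilarity, so high similarity corresponds to a small distance, and the helper name `mean_intra_inter` and the toy data are hypothetical.

```python
import math
from itertools import combinations

def mean_intra_inter(points, labels):
    """Return (mean within-cluster distance, mean between-cluster distance).

    Assumes at least one same-cluster pair and one cross-cluster pair exist.
    """
    intra, inter = [], []
    for (p, label_p), (q, label_q) in combinations(zip(points, labels), 2):
        # Pairs sharing a label count as intra-class, all others as inter-class.
        (intra if label_p == label_q else inter).append(math.dist(p, q))
    return sum(intra) / len(intra), sum(inter) / len(inter)

# Toy data: two tight groups that lie far apart.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
labels = [0, 0, 1, 1]
intra, inter = mean_intra_inter(points, labels)
print(intra, inter)
```

For a good partition the first value should be clearly smaller than the second. Relabelling the same points badly, for example `[0, 1, 0, 1]`, reverses that relationship.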


Requirements of clustering in data mining (1)
◼ Scalability
◼ Ability to deal with different types of attributes
◼ Discovery of clusters with arbitrary shape
◼ Minimal requirements for domain knowledge to determine input parameters
