Considerations for Cluster Analysis
◼ Partitioning criteria
    ◼ Single level vs. hierarchical partitioning (often, multi-level hierarchical partitioning is desirable)
◼ Separation of clusters
    ◼ Exclusive (e.g., one customer belongs to only one region) vs. non-exclusive (e.g., one document may belong to more than one class)
◼ Similarity measure
    ◼ Distance-based (e.g., Euclidean, road network, vector) vs. connectivity-based (e.g., density or contiguity); see the sketch below
◼ Clustering space
    ◼ Full space (often when low-dimensional) vs. subspaces (often in high-dimensional clustering)
11
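The distance-based vs. connectivity-based distinction can be made concrete with a small sketch. The Python snippet below is not from the slides; the names euclidean and eps_connected are illustrative only. It contrasts straight-line Euclidean distance with a contiguity-style notion in which two points count as similar when a chain of nearby points links them, which is roughly the intuition behind density- or contiguity-based methods.

# Minimal sketch (illustration only): two notions of similarity.
import numpy as np

def euclidean(a, b):
    """Distance-based similarity: straight-line (L2) distance between two points."""
    return np.sqrt(np.sum((np.asarray(a) - np.asarray(b)) ** 2))

def eps_connected(points, i, j, eps=1.0):
    """Connectivity-based view: i and j are 'similar' if a chain of points
    links them, where consecutive points are within eps of each other."""
    points = np.asarray(points, dtype=float)
    visited, frontier = {i}, [i]
    while frontier:
        p = frontier.pop()
        if p == j:
            return True
        for q in range(len(points)):
            if q not in visited and euclidean(points[p], points[q]) <= eps:
                visited.add(q)
                frontier.append(q)
    return False

pts = [[0, 0], [0.8, 0], [1.6, 0], [5, 5]]
print(euclidean(pts[0], pts[2]))        # 1.6: fairly far in raw distance
print(eps_connected(pts, 0, 2, eps=1))  # True: linked via the middle point
print(eps_connected(pts, 0, 3, eps=1))  # False: no chain reaches [5, 5]

With eps = 1, the first and third points are 1.6 apart yet still connectivity-linked through the middle point, while the isolated point at (5, 5) connects to nothing.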
Requirements and Challenges
◼ Scalability
    ◼ Clustering all the data instead of only on samples
◼ Ability to deal with different types of attributes
    ◼ Numerical, binary, categorical, ordinal, linked, and mixtures of these (see the mixed-attribute sketch below)
◼ Constraint-based clustering
    ◼ User may give inputs on constraints
    ◼ Use domain knowledge to determine input parameters
◼ Interpretability and usability
◼ Others
    ◼ Discovery of clusters with arbitrary shape
    ◼ Ability to deal with noisy data
    ◼ Incremental clustering and insensitivity to input order
    ◼ High dimensionality
12
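One way to handle mixed attribute types is a Gower-style dissimilarity, which averages a scaled difference for numeric attributes with a simple mismatch indicator for categorical ones. The slide does not prescribe this (or any) method; the sketch below, with the hypothetical gower_distance helper, only shows that numerical, binary, and categorical attributes can be combined into a single measure.

# Minimal sketch (assumption: Gower-style dissimilarity; not prescribed by the slides).
def gower_distance(x, y, kinds, ranges):
    """Average per-attribute dissimilarity in [0, 1].

    kinds[i]  -- 'num' or 'cat' for attribute i
    ranges[i] -- value range of numeric attribute i (ignored for 'cat')
    """
    total = 0.0
    for xi, yi, kind, rng in zip(x, y, kinds, ranges):
        if kind == 'num':
            total += abs(xi - yi) / rng          # scaled numeric difference
        else:
            total += 0.0 if xi == yi else 1.0    # simple categorical/binary mismatch
    return total / len(x)

# Customers described by (age, income, region, owns_car)
a = (35, 52000, 'north', True)
b = (41, 48000, 'south', True)
kinds  = ('num', 'num', 'cat', 'cat')
ranges = (60, 100000, None, None)
print(gower_distance(a, b, kinds, ranges))  # ~0.29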
Major Clustering Approaches (I)
◼ Partitioning approach:
    ◼ Construct various partitions and then evaluate them by some criterion, e.g., minimizing the sum of squared errors
    ◼ Typical methods: k-means, k-medoids, CLARANS (a minimal k-means sketch follows below)
◼ Hierarchical approach:
    ◼ Create a hierarchical decomposition of the set of data (or objects) using some criterion
    ◼ Typical methods: DIANA, AGNES, BIRCH, CHAMELEON
◼ Density-based approach:
    ◼ Based on connectivity and density functions
    ◼ Typical methods: DBSCAN, OPTICS, DenClue
◼ Grid-based approach:
    ◼ Based on a multiple-level granularity structure
    ◼ Typical methods: STING, WaveCluster, CLIQUE
13
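As a rough illustration of the partitioning idea, the sketch below implements the classic two-step k-means loop in NumPy: assign each point to its nearest center, then move each center to the mean of its assigned points, repeating until the centers stop changing. It is a minimal teaching version (random initialization, fixed k, no restarts), not a substitute for a production implementation.

# Minimal k-means sketch (illustration only; initialization and stopping
# rule are simplistic compared to production implementations).
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]  # random initial centers
    for _ in range(n_iter):
        # Assignment step: each point goes to its nearest center.
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: move each center to the mean of its assigned points.
        new_centers = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                                else centers[j] for j in range(k)])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return centers, labels

X = [[1, 1], [1.2, 0.8], [0.9, 1.1], [8, 8], [8.2, 7.9], [7.8, 8.1]]
centers, labels = kmeans(X, k=2)
print(labels)   # two well-separated groups, e.g. [0 0 0 1 1 1] (up to label order)

Each iteration can only lower the sum of squared errors, which is exactly the criterion named in the partitioning bullet above.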
Major Clustering Approaches (II)
◼ Model-based:
    ◼ A model is hypothesized for each cluster, and the goal is to find the best fit of the data to the hypothesized models (a minimal EM sketch follows below)
    ◼ Typical methods: EM, SOM, COBWEB
◼ Frequent pattern-based:
    ◼ Based on the analysis of frequent patterns
    ◼ Typical methods: p-Cluster
◼ User-guided or constraint-based:
    ◼ Clustering by considering user-specified or application-specific constraints
    ◼ Typical methods: COD (obstacles), constrained clustering
◼ Link-based clustering:
    ◼ Objects are often linked together in various ways
    ◼ Massive links can be used to cluster objects: SimRank, LinkClus
14
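To make the model-based idea concrete, the sketch below fits a two-component one-dimensional Gaussian mixture with EM: the E-step computes each component's responsibility for every point under the current model, and the M-step re-estimates weights, means, and variances from those responsibilities. This is a bare-bones illustration (crude initialization, fixed iteration count, no safeguards), assuming NumPy; the em_gmm_1d name exists only for this example.

# Minimal EM sketch for a two-component 1-D Gaussian mixture
# (illustration of the model-based idea; real EM code adds convergence tests,
# protection against degenerate variances, multiple restarts, etc.).
import numpy as np

def em_gmm_1d(x, n_iter=50):
    x = np.asarray(x, dtype=float)
    # Crude initialization: one mean at each extreme, shared variance, equal weights.
    mu = np.array([x.min(), x.max()])
    var = np.array([x.var(), x.var()]) + 1e-6
    pi = np.array([0.5, 0.5])
    for _ in range(n_iter):
        # E-step: responsibility of each component for each point.
        dens = pi * np.exp(-0.5 * (x[:, None] - mu) ** 2 / var) / np.sqrt(2 * np.pi * var)
        resp = dens / dens.sum(axis=1, keepdims=True)
        # M-step: re-estimate weights, means, and variances from responsibilities.
        nk = resp.sum(axis=0)
        pi = nk / len(x)
        mu = (resp * x[:, None]).sum(axis=0) / nk
        var = (resp * (x[:, None] - mu) ** 2).sum(axis=0) / nk + 1e-6
    return pi, mu, var

x = np.concatenate([np.random.normal(0, 1, 200), np.random.normal(6, 1, 200)])
pi, mu, var = em_gmm_1d(x)
print(mu)   # means close to 0 and 6 (component order may vary)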
Notion of a Cluster Can Be Ambiguous
◼ How many clusters? [Figure: the same set of points can plausibly be grouped into two, four, or six clusters]
15