Applications of Cluster Analysis
◼ Understanding
  ◼ Group related documents for browsing, group genes and proteins that have similar functionality, or group stocks with similar price fluctuations
◼ Summarization
  ◼ Reduce the size of large data sets
[Figure: Clustering precipitation in Australia]
Clustering: Rich Applications and Multidisciplinary Efforts
◼ Pattern Recognition
◼ Spatial Data Analysis
  ◼ Create thematic maps in GIS by clustering feature spaces
  ◼ Detect spatial clusters, or support other spatial mining tasks
◼ Image Processing
◼ Economic Science (especially market research)
◼ WWW
  ◼ Document classification
  ◼ Cluster Weblog data to discover groups of similar access patterns
Quality: What Is Good Clustering?
◼ A good clustering method will produce high-quality clusters with
  ◼ high intra-class similarity: cohesive within clusters
  ◼ low inter-class similarity: distinctive between clusters
◼ The quality of a clustering method depends on
  ◼ the similarity measure used by the method,
  ◼ its implementation, and
  ◼ its ability to discover some or all of the hidden patterns
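The two criteria above (cohesive within clusters, distinctive between clusters) can be made concrete with a small sketch. It is an illustration only, not a measure taken from these slides: it assumes numeric points, Euclidean distance, and hypothetical helper names `mean_intra_distance` and `mean_inter_distance`, and it treats a small distance as high similarity.

```python
from itertools import combinations
from math import dist  # Euclidean distance, Python 3.8+


def mean_intra_distance(clusters):
    """Average distance between pairs of points that share a cluster."""
    pairs = [dist(p, q) for c in clusters for p, q in combinations(c, 2)]
    return sum(pairs) / len(pairs)


def mean_inter_distance(clusters):
    """Average distance between pairs of points from different clusters."""
    pairs = [dist(p, q)
             for a, b in combinations(clusters, 2)
             for p in a for q in b]
    return sum(pairs) / len(pairs)


# Toy data (made up for illustration): two well-separated groups of points.
good = [[(0.0, 0.0), (0.0, 1.0), (1.0, 0.0)],
        [(10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]]
# The same six points, but with one point from each group swapped.
bad = [[(0.0, 0.0), (10.0, 10.0), (1.0, 0.0)],
       [(0.0, 1.0), (10.0, 11.0), (11.0, 10.0)]]

for name, clusters in (("good", good), ("bad", bad)):
    print(name, mean_intra_distance(clusters), mean_inter_distance(clusters))
```

For the well-separated assignment the average within-cluster distance is far smaller than the average between-cluster distance; for the mixed assignment the two averages move toward each other. The sketch assumes at least one cluster with two or more points and at least two clusters, otherwise the averages are undefined.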
What Is Not Cluster Analysis?
◼ Supervised classification
  ◼ Uses class label information
◼ Simple segmentation
  ◼ Dividing students into different registration groups alphabetically, by last name
◼ Results of a query
  ◼ Groupings are the result of an external specification
◼ Graph partitioning
  ◼ Some mutual relevance and synergy, but the areas are not identical
Measure the Quality of Clustering
◼ Dissimilarity/Similarity metric
  ◼ Similarity is expressed in terms of a distance function, typically a metric: d(i, j)
  ◼ The definitions of distance functions are usually rather different for interval-scaled, boolean, categorical, ordinal, ratio, and vector variables
  ◼ Weights should be associated with different variables based on applications and data semantics
◼ Quality of clustering
  ◼ There is usually a separate “quality” function that measures the “goodness” of a cluster
  ◼ It is hard to define “similar enough” or “good enough”
  ◼ The answer is typically highly subjective
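To illustrate how per-variable weights change a distance function d(i, j), here is a minimal sketch of a weighted Euclidean distance. The function name `weighted_euclidean`, the two variables (age in years, income in thousands), and the weight values are all assumptions made up for this example; the slides do not prescribe a particular formula.

```python
from math import sqrt


def weighted_euclidean(x, y, weights):
    """d(x, y) = sqrt(sum_k w_k * (x_k - y_k)^2), with non-negative weights."""
    if not (len(x) == len(y) == len(weights)):
        raise ValueError("x, y and weights must have the same length")
    return sqrt(sum(w * (a - b) ** 2 for a, b, w in zip(x, y, weights)))


# Hypothetical records: (age in years, income in thousands).
x = (30.0, 50.0)
y = (40.0, 50.0)   # differs from x only in age
z = (30.0, 62.0)   # differs from x only in income

equal = (1.0, 1.0)        # both variables count equally
age_heavy = (1.0, 0.25)   # income differences are down-weighted

print(weighted_euclidean(x, y, equal), weighted_euclidean(x, z, equal))
print(weighted_euclidean(x, y, age_heavy), weighted_euclidean(x, z, age_heavy))
```

With equal weights, y is the closer neighbour of x; once income is down-weighted, z becomes closer. The choice of weights therefore changes which objects count as similar, which is why it has to follow from the application and the meaning of the data.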