Algorithms for Clustering Data

Anil K. Jain
Richard C. Dubes
Michigan State University

Prentice Hall
Englewood Cliffs, New Jersey 07632
Contents

PREFACE
1 INTRODUCTION
2 DATA REPRESENTATION
  2.1 Data Types and Data Scales
  2.2 Proximity Indices
  2.3 Normalization
  2.4 Linear Projections
  2.5 Nonlinear Projections
  2.6 Intrinsic Dimensionality
  2.7 Multidimensional Scaling
  2.8 Summary
3 CLUSTERING METHODS AND ALGORITHMS
  3.1 General Introduction
  3.2 Hierarchical Clustering
  3.3 Partitional Clustering
  3.4 Clustering Software
  3.5 Clustering Methodology
  3.6 Summary
4 CLUSTER VALIDITY
  4.1 Background
  4.2 Indices of Cluster Validity
  4.3 Validity of Hierarchical Structures
  4.4 Validity of Partitional Structures
  4.5 Validity of Individual Clusters
  4.6 Clustering Tendency
  4.7 Summary
5 APPLICATIONS
  5.1 Image Processing
  5.2 Image Segmentation by Clustering
  5.3 Segmentation of Textured Images
  5.4 Segmentation of Range Images
  5.5 Segmentation of Multispectral Images
  5.6 Image Registration
  5.7 Summary
A PATTERN RECOGNITION
B DISTRIBUTIONS
  B.1 The Gaussian Distribution
  B.2 The Hypergeometric Distribution
C LINEAR ALGEBRA
D SCATTER MATRICES
E FACTOR ANALYSIS
F MULTIVARIATE ANALYSIS OF VARIANCE
G GRAPH THEORY
  G.1 Definitions
  G.2 Trees
  G.3 Random Graphs
H ALGORITHM FOR GENERATING CLUSTERED DATA
BIBLIOGRAPHY
AUTHOR INDEX
GENERAL INDEX
Preface

Cluster analysis is an important technique in the rapidly growing field known as exploratory data analysis and is being applied in a variety of engineering and scientific disciplines such as biology, psychology, medicine, marketing, computer vision, and remote sensing. Cluster analysis organizes data by abstracting underlying structure either as a grouping of individuals or as a hierarchy of groups. The representation can then be investigated to see whether the data group according to preconceived ideas or to suggest new experiments. Cluster analysis is a tool for exploring the structure of the data that does not require the assumptions common to most statistical methods. It is called "unsupervised learning" in the literature of pattern recognition and artificial intelligence.

This book will be useful for those in the scientific community who gather data and seek tools for analyzing and interpreting data. It will be a valuable reference for scientists in a variety of disciplines and can serve as a textbook for a graduate course in exploratory data analysis as well as a supplemental text in courses on research methodology, pattern recognition, image processing, and remote sensing. The book emphasizes informal algorithms for clustering data and for interpreting the results. Graphical procedures and other tools for visually representing data are introduced both to evaluate the results of clustering and to explore data. Mathematical and statistical theory is introduced only when necessary.

Most existing books on cluster analysis are written by mathematicians, numerical taxonomists, social scientists, and psychologists who emphasize either the methods that lend themselves to mathematical treatment or the applications in their particular area. Our book strives for a sense of completeness and for a balanced