Copyright © 2019 by Xiaoyu Li.

2 Dissimilarity Degree Calculation

Many clustering algorithms operate on a dissimilarity matrix. If the data is given as a data matrix, it is usually transformed into a dissimilarity matrix first. How the dissimilarity $d(i, j)$ between objects $i$ and $j$ is calculated depends on the data type. Common data types include:
● Interval-scaled variables
● Binary variables
● Nominal, ordinal, and ratio-scaled variables
● Mixed-type variables
3 Interval-Scaled Variables (1)

Interval-scaled variables are continuous measurements on a roughly linear scale, such as weight and height. The chosen unit of measurement directly affects the result of the clustering analysis, so the measurements need to be standardized, that is, converted from their original values to unit-free values. Given the $n$ measurements $x_{1f}, x_{2f}, \ldots, x_{nf}$ of a variable $f$, with mean $m_f$, the transformation starts from the mean absolute deviation:

$$ s_f = \frac{1}{n}\left(|x_{1f} - m_f| + |x_{2f} - m_f| + \cdots + |x_{nf} - m_f|\right) $$
3 Interval-Scaled Variables (2)

Calculate the mean absolute deviation $s_f$, where $m_f$ is the mean of variable $f$:

$$ m_f = \frac{1}{n}\left(x_{1f} + x_{2f} + \cdots + x_{nf}\right) $$

Calculate the standardized measurement (z-score):

$$ z_{if} = \frac{x_{if} - m_f}{s_f} $$

The mean absolute deviation is more robust to outliers than the standard deviation, because the deviations from the mean are not squared.
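The standardization of an interval-scaled variable can be sketched in a few lines of Python. This is a minimal illustration, not code from the slides: the function name `standardize` and the sample heights are assumptions made for the example.

```python
def standardize(values):
    """Return the z-scores of one variable f, using the mean absolute deviation."""
    n = len(values)
    m_f = sum(values) / n                          # mean m_f
    s_f = sum(abs(x - m_f) for x in values) / n    # mean absolute deviation s_f
    if s_f == 0:
        raise ValueError("all values are identical, so s_f is zero")
    return [(x - m_f) / s_f for x in values]       # z_if = (x_if - m_f) / s_f


# Hypothetical heights in centimetres (illustrative data only).
heights_cm = [150.0, 160.0, 170.0, 180.0]
print(standardize(heights_cm))  # [-1.5, -0.5, 0.5, 1.5]
```

Expressing the same heights in metres gives the same z-scores up to floating-point rounding, which is the sense in which the standardized values are unit-free.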
4 Similarity and Dissimilarity Between Objects (1)

The similarity or dissimilarity between two objects is calculated from the distance between them. For two $p$-dimensional objects $i$ and $j$:

● Euclidean distance ($q = 2$):
$$ d(i, j) = \sqrt{\sum_{k=1}^{p} |x_{ik} - x_{jk}|^2} $$
● Manhattan distance ($q = 1$):
$$ d(i, j) = \sum_{k=1}^{p} |x_{ik} - x_{jk}| $$
● Minkowski distance (general $q \ge 1$):
$$ d(i, j) = \left(\sum_{k=1}^{p} |x_{ik} - x_{jk}|^q\right)^{1/q} $$
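The Minkowski distance has the Manhattan distance (q = 1) and the Euclidean distance (q = 2) as special cases, which a short Python sketch makes concrete. The function name `minkowski` and the two sample points are assumptions chosen for illustration.

```python
def minkowski(x, y, q):
    """Minkowski distance of order q between two equal-length sequences."""
    if len(x) != len(y):
        raise ValueError("objects must have the same number of dimensions")
    if q < 1:
        raise ValueError("q must be at least 1")
    return sum(abs(a - b) ** q for a, b in zip(x, y)) ** (1.0 / q)


# Two hypothetical 2-dimensional objects (illustrative data only).
i = [1.0, 2.0]
j = [4.0, 6.0]
print(minkowski(i, j, 1))  # Manhattan: 3 + 4 = 7.0
print(minkowski(i, j, 2))  # Euclidean: sqrt(9 + 16) = 5.0
```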
4 Similarity and Dissimilarity Between Objects (2)

Let $i = (x_{i1}, x_{i2}, \ldots, x_{ip})$ and $j = (x_{j1}, x_{j2}, \ldots, x_{jp})$ be two $p$-dimensional data objects.

● Euclidean distance:
$$ d(i, j) = \sqrt{|x_{i1} - x_{j1}|^2 + |x_{i2} - x_{j2}|^2 + \cdots + |x_{ip} - x_{jp}|^2} $$
● Manhattan distance:
$$ d(i, j) = |x_{i1} - x_{j1}| + |x_{i2} - x_{j2}| + \cdots + |x_{ip} - x_{jp}| $$
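To connect these distances back to the dissimilarity matrix, the sketch below turns a small data matrix into a symmetric matrix of pairwise distances. It assumes plain Python lists as the data representation. The helper names (`euclidean`, `manhattan`, `dissimilarity_matrix`) and the three sample objects are hypothetical.

```python
import math


def euclidean(x, y):
    """Euclidean distance between two p-dimensional objects."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def manhattan(x, y):
    """Manhattan distance between two p-dimensional objects."""
    return sum(abs(a - b) for a, b in zip(x, y))


def dissimilarity_matrix(data, dist):
    """Build the n x n matrix of pairwise distances d(i, j)."""
    n = len(data)
    d = [[0.0] * n for _ in range(n)]      # d(i, i) = 0 on the diagonal
    for i in range(n):
        for j in range(i):                 # fill the lower triangle and mirror it
            d[i][j] = d[j][i] = dist(data[i], data[j])
    return d


# Hypothetical data matrix: 3 objects, 2 variables each.
data = [[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]]
print(dissimilarity_matrix(data, euclidean))
# [[0.0, 5.0, 10.0], [5.0, 0.0, 5.0], [10.0, 5.0, 0.0]]
print(dissimilarity_matrix(data, manhattan))
# [[0.0, 7.0, 14.0], [7.0, 0.0, 7.0], [14.0, 7.0, 0.0]]
```

Because d(i, j) = d(j, i) and d(i, i) = 0, only the lower triangle has to be computed. The sketch stores the full matrix only for readability.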