Hong Kong University of Science and Technology: Clustering (PPT lecture slides)

File format: PPT; file size: 1.45 MB

Document contents (about 61 pages)

Clustering
Instructor: Qiang Yang, Hong Kong University of Science and Technology (Qyang@cs.ust.hk)
Thanks: J.W. Han, I. Witten, E. Frank

Essentials
◼ Terminology:
  ◼ Objects = rows = records
  ◼ Variables = attributes = features
◼ A good clustering method has:
  ◼ high intra-class similarity and low inter-class similarity
◼ What is similarity?
  ◼ It is based on the computation of a distance:
    ◼ between objects described by numerical attributes
    ◼ between objects described by nominal attributes
    ◼ between objects described by mixed attributes
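To make "high intra-class similarity, low inter-class similarity" concrete, here is a minimal Python sketch. It assumes Euclidean distance as the dissimilarity and compares the mean distance between objects in the same cluster with the mean distance between objects in different clusters. The toy points, the two labelings, and the helper name `intra_inter_distances` are hypothetical. Mean pairwise distance is just one simple way to quantify the idea and is not a measure taken from the slides.

```python
import math
from itertools import combinations


def intra_inter_distances(points, labels):
    """Return (mean intra-cluster distance, mean inter-cluster distance).

    A small intra-cluster distance means high intra-class similarity;
    a large inter-cluster distance means low inter-class similarity.
    Assumes the labeling yields at least one pair of each kind.
    """
    intra, inter = [], []
    for (p, label_p), (q, label_q) in combinations(zip(points, labels), 2):
        d = math.dist(p, q)  # Euclidean distance (Python 3.8+)
        if label_p == label_q:
            intra.append(d)
        else:
            inter.append(d)
    return sum(intra) / len(intra), sum(inter) / len(inter)


# Hypothetical toy data: two tight groups far apart on the first attribute.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
good = [0, 0, 1, 1]  # puts nearby points in the same cluster
bad = [0, 1, 0, 1]   # splits nearby points across clusters

print(intra_inter_distances(points, good))
print(intra_inter_distances(points, bad))
```

With the `good` labeling, the mean intra-cluster distance is 1.0 and the mean inter-cluster distance is about 10.02. The `bad` labeling reverses the picture, with 10.0 intra-cluster and about 5.52 inter-cluster.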

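The Database
The database is an n × p data matrix of n objects and p attributes. Object i is row i:
$$
\begin{bmatrix}
x_{11} & \cdots & x_{1p} \\
\vdots & \ddots & \vdots \\
x_{i1} & \cdots & x_{ip} \\
\vdots & \ddots & \vdots \\
x_{n1} & \cdots & x_{np}
\end{bmatrix}
$$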
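In code, the same n × p layout can be held as a list of rows. The following minimal Python sketch assumes a small made-up numeric table. The helpers `record` and `column` are hypothetical names, and both use 1-based indices to match the slide's subscripts.

```python
# Hypothetical toy database: n = 3 objects (rows), p = 2 attributes (columns).
data = [
    [1.0, 2.0],  # object 1: (x_11, x_12)
    [3.0, 5.0],  # object 2: (x_21, x_22)
    [4.0, 0.5],  # object 3: (x_31, x_32)
]

n = len(data)     # number of objects (records)
p = len(data[0])  # number of variables (attributes)
assert all(len(row) == p for row in data), "every record must have p attributes"


def record(data, i):
    """Return object i (1-based) as a p-dimensional record."""
    return tuple(data[i - 1])


def column(data, f):
    """Return the values of attribute f (1-based) across all n objects."""
    return [row[f - 1] for row in data]


print(n, p)             # 3 2
print(record(data, 2))  # (3.0, 5.0)
print(column(data, 1))  # [1.0, 3.0, 4.0]
```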

Numerical Attributes
◼ Distances are normally used to measure the similarity or dissimilarity between two data objects.
◼ Euclidean distance:
$$d(i,j) = \sqrt{|x_{i1}-x_{j1}|^2 + |x_{i2}-x_{j2}|^2 + \cdots + |x_{ip}-x_{jp}|^2}$$
where $i = (x_{i1}, x_{i2}, \ldots, x_{ip})$ and $j = (x_{j1}, x_{j2}, \ldots, x_{jp})$ are two p-dimensional records.
◼ Manhattan distance:
$$d(i,j) = |x_{i1}-x_{j1}| + |x_{i2}-x_{j2}| + \cdots + |x_{ip}-x_{jp}|$$
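Both formulas translate directly into code. This is a minimal Python sketch that assumes records are equal-length sequences of numbers. The function names are illustrative and do not come from the slides.

```python
import math


def euclidean_distance(i, j):
    """d(i, j) = sqrt(|x_i1 - x_j1|^2 + ... + |x_ip - x_jp|^2)."""
    if len(i) != len(j):
        raise ValueError("records must have the same dimension p")
    return math.sqrt(sum((xi - xj) ** 2 for xi, xj in zip(i, j)))


def manhattan_distance(i, j):
    """d(i, j) = |x_i1 - x_j1| + ... + |x_ip - x_jp|."""
    if len(i) != len(j):
        raise ValueError("records must have the same dimension p")
    return sum(abs(xi - xj) for xi, xj in zip(i, j))


i = (1.0, 2.0)
j = (4.0, 6.0)
print(euclidean_distance(i, j))  # 5.0
print(manhattan_distance(i, j))  # 7.0
```

The attribute differences here are 3 and 4. The Euclidean distance is therefore the hypotenuse sqrt(9 + 16) = 5, and the Manhattan distance is the sum 3 + 4 = 7.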

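Binary Variables ({0, 1}, or {true, false})
◼ A contingency table for binary data, comparing records i and j over their p binary attributes:

                Row j
                1       0       sum
    Row i  1    a       b       a + b
           0    c       d       c + d
           sum  a + c   b + d   p

  Here a counts the attributes that are 1 in both records and d counts those that are 0 in both. b and c count the two kinds of mismatch.
◼ Simple matching coefficient:
$$d(i,j) = \frac{b + c}{a + b + c + d}$$
◼ Invariant to the coding of the binary variable: if you assign 1 to "pass" and 0 to "fail", or the other way around, you'll get the same distance value.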
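The contingency counts, the simple matching coefficient, and the invariance to coding can all be checked in a few lines. This is a minimal Python sketch that assumes binary records are stored as equal-length lists of 0/1 values. The five-test "pass"/"fail" records and the helper names are hypothetical.

```python
def contingency(i, j):
    """Count (a, b, c, d) for two binary records of equal length p.

    a: both 1; b: i is 1 and j is 0; c: i is 0 and j is 1; d: both 0.
    """
    if len(i) != len(j):
        raise ValueError("records must have the same length p")
    pairs = list(zip(i, j))
    a = pairs.count((1, 1))
    b = pairs.count((1, 0))
    c = pairs.count((0, 1))
    d = pairs.count((0, 0))
    return a, b, c, d


def simple_matching(i, j):
    """d(i, j) = (b + c) / (a + b + c + d): the fraction of mismatched attributes."""
    a, b, c, d = contingency(i, j)
    return (b + c) / (a + b + c + d)


def flip(record):
    """Swap the coding: 1 becomes 0 and 0 becomes 1."""
    return [1 - x for x in record]


# Hypothetical records: 1 = "pass", 0 = "fail" on five tests.
i = [1, 0, 1, 1, 0]
j = [1, 1, 0, 1, 0]
print(contingency(i, j))      # (2, 1, 1, 1)
print(simple_matching(i, j))  # 0.4

# Recoding (1 = "fail", 0 = "pass") swaps a with d and b with c,
# so b + c and the total are unchanged, and so is the distance.
print(simple_matching(flip(i), flip(j)))  # 0.4
```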

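Documents you may be interested in

University of Electronic Science and Technology of China: "Computer Operating Systems" course materials (PPT lecture slides), Chapter 3: Processor Scheduling and Deadlock
"Image Processing and Computer Vision" course materials (PPT lecture slides), Chapter 11: Bundle adjustment, Structure reconstruction, SFM from N-frames
Tongji University: "Big Data Analysis and Mining" course materials (PPT lecture slides), Association Rule
"Fundamentals of Programming" course materials: lab syllabus
Baicheng Normal University: "An Introduction to Database System" course materials (PPT lecture slides), Chapter 2: Relational Databases (2.4 Relational Algebra, 2.5 Relational Calculus, 2.6 Summary)
Anhui Industry and Trade Vocational Technical College: "Computer Assembly and Maintenance" course materials (PPT lecture slides), Project 5: Microcomputer Maintenance
Sugon: Introduction to Parallel Programming (PPT lecture)
"Principles and Applications of Microcontrollers" course materials (PPT lecture slides), Chapter 7: Display, Switch/Keyboard Input, and Micro-Printer Interface Design
Data Structures and Algorithms (PPT lecture slides)
Sichuan University: "Operating System Principles" course materials (PPT lecture slides), Chapter 5: Deadlock
Sichuan University: "Object-Oriented Programming - Java" course PPT slides, Unit 1.1 Java Applications, 1.1.1 Applications in Java (Xiong Yunyu)
Xiamen University: "Big Data Technology: Principles and Applications" course materials (PPT lecture slides, 2016), Chapter 8: Stream Computing
Shanghai Jiao Tong University: TLS/SSL Security (PPT lecture slides)
Shandong University, School of Computer Science: "Human-Computer Interaction Technology" course materials (PPT lecture slides), Chapter 7: Web Interface Design
Shandong University: "Microcomputer Principles and Microcontroller Interface Technology" course materials (PPT lecture slides), Chapter 3: Hardware Architecture of the IAP15W4K58S4 Microcontroller
Nanjing University: "Object-Oriented Technology (OOT)" course materials (PPT lecture slides), Aspect Oriented Programming
Wuchang Shouyi University: Basic Operations and Tips for Word (PPT lecture slides, lecturer: Zhang Xuanzi)
"VB Programming" course materials (PPT lecture slides), Chapter 8: Procedures
Hunan Biological and Electromechanical Polytechnic: "Introduction to E-Commerce" course materials (PPT slides), Chapter 5: Online Information Search
"E-Commerce" course materials (PPT lecture slides), Chapter 10: Online Marketing
Guangxi University of Foreign Languages: "Computer Networks" course materials (PPT lecture slides), Chapter 7: Transport Layer Protocols: TCP and UDP
Kyushu University (a Japanese national comprehensive university): Analysis and Improvement of the Explosion Factor in the Fireworks Algorithm (Graduate School of Design: Yu Jun)
Theory and Methods of Image and Video Coding and Representation (PPT lecture slides): The JPEG Image Compression Standard
University of Science and Technology of China: "Computer Vision" course materials (PPT lecture slides), Chapter 9: Single-Image Depth Reconstruction, Depthmap Reconstruction Based on Monocular cues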
