Termination conditions ◼ Several possibilities, e.g., ◆ A fixed number of iterations. ◆ Data partition unchanged. ◆ Centroid positions don’t change. Does this mean that the data in a cluster are unchanged?
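These three conditions translate directly into code. Below is a minimal sketch of a termination check in Python with NumPy; the function name `converged` and the parameters `max_iter` and `tol` are illustrative choices, not part of the slides.

```python
import numpy as np

def converged(old_centroids, centroids, old_labels, labels,
              iteration, max_iter=100, tol=1e-4):
    """Return True when any of the three termination conditions holds.
    (Names and thresholds here are illustrative, not from the slides.)"""
    if iteration >= max_iter:                     # a fixed number of iterations
        return True
    if np.array_equal(old_labels, labels):        # data partition unchanged
        return True
    if np.linalg.norm(centroids - old_centroids) < tol:  # centroids don't move
        return True
    return False
```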
Convergence ◼ Why should the K-means algorithm ever reach a fixed point? ◆ A state in which clusters don’t change. ◼ K-means is a special case of a general procedure known as the Expectation Maximization (EM) algorithm. ◆ EM is known to converge. ◆ Number of iterations could be large. ➢ But in practice it usually isn’t.
Time Complexity ◼ Computing distance between two data points is O(D) where D is the dimensionality of the vectors. ◼ Reassigning clusters: O(KN) distance computations, or O(KND). ◼ Computing centroids: Each point gets added once to some centroid: O(ND). ◼ Assume these two steps are each done once for I iterations: O(IKND).
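As a concrete illustration of where these costs arise, here is a minimal K-means sketch in Python with NumPy, annotating each step with its cost. The function name `kmeans` and the random initialization are illustrative assumptions; the symbols K, N, D, and I match the slide.

```python
import numpy as np

def kmeans(X, K, I=100):
    """Plain K-means on an (N, D) data matrix X."""
    N, D = X.shape
    rng = np.random.default_rng()
    centroids = X[rng.choice(N, size=K, replace=False)]   # K initial seeds
    labels = np.zeros(N, dtype=int)
    for _ in range(I):                                    # I iterations
        # Reassignment: K*N distances, each O(D) -> O(KND) per iteration.
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Centroid update: each point is added to exactly one centroid -> O(ND).
        new_centroids = centroids.copy()
        for k in range(K):
            members = X[labels == k]
            if len(members) > 0:          # keep the old centroid if a cluster empties
                new_centroids[k] = members.mean(axis=0)
        if np.allclose(new_centroids, centroids):         # centroid positions fixed
            break
        centroids = new_centroids
    return centroids, labels
```

Running both steps over I iterations gives the O(IKND) total stated above.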
Strengths of K-means clustering ◼ Relatively scalable in processing large data sets. ◼ Relatively efficient: O(tkn), where n is # objects, k is # clusters, and t is # iterations. Normally, k, t << n. ◼ Often terminates at a local optimum; the global optimum may be found using techniques such as genetic algorithms.
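Genetic algorithms are one option; a simpler, widely used mitigation for local optima is to restart K-means from several random initializations and keep the solution with the lowest within-cluster sum of squares. The sketch below assumes the `kmeans` function from the previous example; `kmeans_restarts` and its parameters are illustrative names, not from the slides.

```python
def kmeans_restarts(X, K, restarts=10, I=100):
    """Run K-means several times and keep the lowest-SSE result.
    A practical hedge against local optima, not a global-optimum guarantee."""
    best_sse, best = float("inf"), None
    for _ in range(restarts):
        centroids, labels = kmeans(X, K, I)
        sse = ((X - centroids[labels]) ** 2).sum()  # within-cluster sum of squares
        if sse < best_sse:
            best_sse, best = sse, (centroids, labels)
    return best
```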
Weaknesses of K-means clustering ◼ Applicable only when the mean of objects is defined. ◼ Need to specify k, the number of clusters, in advance. ◼ Unable to handle noisy data and outliers. ◼ Not suitable for discovering clusters with non-convex shapes, or clusters of very different sizes.