Figure 2: Current districts 8, 28, 26, and 21, from left to right and top to bottom, with compactness quotients of 0.097, 0.101, 0.406, and 0.498, respectively.

• Unfair districting stems from either human biases or poorly designed algorithms. Our computer simulations use no such extraneous data: the only data we use are the population density and the boundary of the state. Therefore, the determination of districts is completely unbiased. While a district may be unfair on a local scale, in that it divides up a community with a common interest (for instance, a community of apple-growers may be split between two districts), on the national scale such imbalances will even out. Because of this, there will be no pathological examples of disproportionate representation.

2 Applying the Theory of Data Clustering

The theory of data clustering is the theory of classifying n observations (or objects) into m groups; think, for instance, of placing two carts full of groceries into paper sacks. There are two main benefits of applying a cluster-theoretic algorithm to a given data set:
• Data clustering often reveals an internal structure that may not have been initially apparent.

• It is often much easier to work with a small number of clusters than with a large volume of raw data.

The philosophy of data clustering is that we should be able to divide our data into a (not necessarily fixed) number of clusters, and that the elements of a given cluster should be somehow similar. In general, data clustering is applied to problems that deal with a large number of variables. For instance, when data clustering is used to create an animal taxonomy, there are a myriad of variables: mode of reproduction, mode of transportation, presence and type of spine, ideal diet, preferred habitat, and so forth [And73]! Because of this, it is usually very difficult to determine the "proper" way to cluster data [AC84].

In the case of attempting to draw up simple and fair congressional districts, we can apply data clustering in the following way:

• Split the state into small, discrete units. Our units correspond to the geographic locations of census population measurements [fIESIN].

• Determine some partition of these units such that the subsets of the partition can be viewed as clusters. Note that the only variables present are the location and population of each unit.

After defining a method for ranking cluster partitions by preference, we may suppose we are done with the problem: all that is left is to look at all possible cluster partitions and choose the best one! However, this turns out not to be feasible. In [AS68], Abramowitz and Stegun give a proof that the number of ways of sorting n observations into m groups is a Stirling number of the second kind:

$$S_m^{(n)} = \frac{1}{m!} \sum_{k=0}^{m} (-1)^{m-k} \binom{m}{k} k^n.$$

For instance, there are more than $10^{15}$ ways to sort 25 objects into 5 groups. It is clear that we need some sort of algorithmic process in order to determine an appropriate partition into clusters.

3 The K-means Algorithm

3.1 Standard Algorithm

The K-means algorithm is an iterative method for data clustering. Let $D = \{x_j\}_{j=1}^{N} \subset \mathbb{R}^n$ be the data to be clustered, and let $S = \{s_j\}_{j=1}^{K}$ be a set of seeds. Suppose we desire D to be partitioned into K clusters; let the i-th cluster be
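To make the iteration concrete, here is a minimal Python sketch of the standard K-means loop over data $\{x_j\}_{j=1}^{N} \subset \mathbb{R}^n$ and seeds $\{s_j\}_{j=1}^{K}$. The Euclidean metric, the empty-cluster convention, and the stopping rule below are our assumptions for illustration, not details fixed by the text:

```python
import numpy as np

def k_means(data, seeds, max_iters=100):
    """Minimal sketch: `data` is an (N, n) array of points x_j in R^n,
    `seeds` is a (K, n) array of initial seeds s_j.  Returns the final
    cluster centers and the cluster label of each point."""
    data = np.asarray(data, dtype=float)
    centers = np.asarray(seeds, dtype=float).copy()
    for _ in range(max_iters):
        # Assignment step: attach each point to its nearest center (Euclidean).
        dists = np.linalg.norm(data[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: move each center to the mean of its assigned points;
        # an empty cluster keeps its old center (an assumed convention).
        new_centers = np.array([data[labels == k].mean(axis=0)
                                if np.any(labels == k) else centers[k]
                                for k in range(len(centers))])
        # Assumed stopping rule: quit once the centers stop moving.
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return centers, labels
```

For example, `k_means(np.random.rand(200, 2), np.random.rand(5, 2))` partitions 200 random points in the unit square into 5 clusters by alternating the assignment and update steps until the centers stabilize.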