Figure 2: Current districts 8, 28, 26, and 21, from left to right and top to bottom, with compactness quotients of 0.097, 0.101, 0.406, and 0.498, respectively.

• Unfair districting stems from either human biases or poorly designed algorithms. Our computer simulations use no such extraneous data: the only data we use are the population density and the boundary of the state. Therefore, the determination of districts is completely unbiased. While a district may be unfair on a local scale, in that it divides up a community with a common interest (for instance, a community of apple-growers may be split between two districts), on the national scale such imbalances will even out. Because of this, there will be no pathological examples of disproportionate representation.

2 Applying the Theory of Data Clustering

The theory of data clustering is the theory of classifying n observations (or objects) into m groups; think, for instance, of placing two carts full of groceries into paper sacks. There are two main benefits of applying a cluster-theoretic algorithm to a given data set:
• Data clustering often reveals an internal structure that may not have been initially apparent.

• It is often much easier to work with a small number of clusters than with a large volume of raw data.

The philosophy of data clustering is that we should be able to divide our data into a (not necessarily fixed) number of clusters, and that the elements of a given cluster should be somehow similar. In general, data clustering is applied to problems that deal with a large number of variables. For instance, when data clustering is used to create an animal taxonomy, there are a myriad of variables: mode of reproduction, mode of transportation, presence and type of spine, ideal diet, preferred habitat, and so forth [And73]! Because of this, it is usually very difficult to determine the "proper" way to cluster data [AC84].

In the case of attempting to draw up simple and fair congressional districts, we can apply data clustering in the following way:

• Split the state into small, discrete units. Our units correspond to the geographic locations of census population measurements [fIESIN].

• Determine some partition of these units such that the subsets of the partition can be viewed as clusters. Note that the only variables present are the location and population of each unit.

After defining a method for ranking cluster partitions by preference, we may suppose we are done with the problem: all that is left is to look at all possible cluster partitions and choose the best one! However, this turns out not to be feasible. In [AS68], Abramowitz and Stegun give a proof that the number of ways of sorting n observations into m groups is a Stirling number of the second kind:

$$S_m^{(n)} = \frac{1}{m!} \sum_{k=0}^{m} (-1)^{m-k} \binom{m}{k} k^n.$$

For instance, there are more than $10^{15}$ ways to sort 25 objects into 5 groups. It is clear that we need some sort of algorithmic process in order to determine an appropriate partition into clusters.

3 The K-means Algorithm

3.1 Standard Algorithm

The K-means algorithm is an iterative method for data clustering. Let $D = \{x_j\}_{j=1}^{N} \subset \mathbb{R}^n$ be the data to be clustered, and let $S = \{s_j\}_{j=1}^{K}$ be a set of seeds. Suppose we desire D to be partitioned into K clusters; let the i-th cluster be
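To make the iteration concrete, here is a minimal Python sketch of the standard K-means loop over data $\{x_j\}_{j=1}^{N} \subset \mathbb{R}^n$ and seeds $\{s_j\}_{j=1}^{K}$. The Euclidean metric, the empty-cluster convention, and the stopping rule below are our assumptions for illustration, not details fixed by the text:

```python
import numpy as np

def k_means(data, seeds, max_iters=100):
    """Minimal sketch: `data` is an (N, n) array of points x_j in R^n,
    `seeds` is a (K, n) array of initial seeds s_j.  Returns the final
    cluster centers and the cluster label of each point."""
    data = np.asarray(data, dtype=float)
    centers = np.asarray(seeds, dtype=float).copy()
    for _ in range(max_iters):
        # Assignment step: attach each point to its nearest center (Euclidean).
        dists = np.linalg.norm(data[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: move each center to the mean of its assigned points;
        # an empty cluster keeps its old center (an assumed convention).
        new_centers = np.array([data[labels == k].mean(axis=0)
                                if np.any(labels == k) else centers[k]
                                for k in range(len(centers))])
        # Assumed stopping rule: quit once the centers stop moving.
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return centers, labels
```

For example, `k_means(np.random.rand(200, 2), np.random.rand(5, 2))` partitions 200 random points in the unit square into 5 clusters by alternating the assignment and update steps until the centers stabilize.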