Control 1421 Page 9 out of 35 area and Long Island evinces the fallibility of the equidistant seed methow We have presented our scheme for growing emerging distric we should also discuss the heuristic by which we rank candidate moves. It has two components: a population score and a county score. 3.2 Population score Even thought the MSGM is only an rough start for our optimization we would like to minimize egregious disparities in population between districts. We would much prefer if the MSGM produces a result where the largest district has twice the population of the smallest rather than 100 times the population Clearly, the population component of our heuristic should give the highest score to a district when pi= p. Additionally we want to penalize large deviations from the optimal population level so our function should be concave down Admittedly, choosing a heuristic is somewhat arbitrary but this does not bother us sinc the results from MSGM are only a baseline. Let f(pi) be the population heuristic score for a district with population Di. We use a piecewise definition for f: P≤p f(P2)= M-2(Pi-P)2, ifp: >p Notice that f is steeper for values p i >p because we do not want growing districts to engulf too much population nalize deviations above p worse than deviations below p. (We also consider some "nicer"functions, like a Beta distributions, but we opted for a computationally simpler implementation. Figure 3 shows the function f Figure 3: MSGM heuristic for population 10000 65000 District populati
Control # 1421 Page 9 out of 35 area and Long Island evinces the fallibility of the equidistant initial seed method. We have presented our scheme for growing emerging districts, but we should also discuss the heuristic by which we rank candidate moves. It has two components: a population score and a county score. 3.2 Population score Even thought the MSGM is only an rough start for our optimization we would like to minimize egregious disparities in population between districts. We would much prefer if the MSGM produces a result where the largest district has twice the population of the smallest rather than 100 times the population. Clearly, the population component of our heuristic should give the highest score to a district when pi = ¯p. Additionally we want to penalize large deviations from the optimal population level so our function should be concave down. Admittedly, choosing a heuristic is somewhat arbitrary but this does not bother us since the results from MSGM are only a baseline. Let f(pi) be the population heuristic score for a district with population Di . We use a piecewise definition for f: f(pi) = ( M qpi p¯ , ifpi ≤ p¯ M − 4M p 2 i (pi − p¯) 2 , ifpi > p¯ (1) Notice that f is steeper for values pi > p¯ because we do not want growing districts to engulf too much population; we penalize deviations above ¯p worse than deviations below p¯. (We also consider some “nicer” functions, like a Beta distributions, but we opted for a computationally simpler implementation.) Figure 3 shows the function f. Figure 3: MSGM heuristic for population
Control 1421 Page 10 out of 35 3.3 County preservation score For a given district Di, we measure its county preservation score in terms of the percent of counties that it completes on a population basis. To encourage growing districts to add remaining tracts in nearly complete counties the marginal value adding these should increase with the fraction of the population already contained in that district. To accomplish this we use the square of the proportion contained in a county. The county score, g, for a district -TEDinci P(Ti) g(D2)= p(Ci) For instance, if a district completely contains one county and contains 30% of each of two other counties' populations then its score would be(12+32+32)=1.18. Figure 4 shows a plot of the county score a district receives based on what percent of a counties population said district contains Figure 4: MSGM heuristic for county completeness 800 0.8 Percent of county poulation contained 4 The Partition Optimization Model Now that we have constructed a crude, approximate solution to the districting problem by using MSGM, we refine the solution through a process of local search. We define our loca search by our objective function, and our neighborhood function and search space 4.1 The objective function For our optimization function, the only characteristics of each district and each county we will use are the populations p(P)=ipilsisn, the compactness measures c(p)=clsisn nd the fractions p(P)={n,1≤i≤m,1≤r≤e} of the population of county r which is
Control # 1421 Page 10 out of 35 3.3 County preservation score For a given district Di , we measure its county preservation score in terms of the percent of counties that it completes on a population basis. To encourage growing districts to add remaining tracts in nearly complete counties the marginal value adding these should increase with the fraction of the population already contained in that district. To accomplish this we use the square of the proportion contained in a county. The county score, g, for a district Di is: g(Di) = X Cj∈Λ P Tl∈Di∩Cj p(Tl) p(Cj ) !2 (2) For instance, if a district completely contains one county and contains 30% of each of two other counties’ populations then its score would be (12 + .3 2 + .3 2 ) = 1.18. Figure 4 shows a plot of the county score a district receives based on what percent of a counties population said district contains. Figure 4: MSGM heuristic for county completeness 4 The Partition Optimization Model Now that we have constructed a crude, approximate solution to the districting problem by using MSGM, we refine the solution through a process of local search. We define our local search by our objective function, and our neighborhood function and search space. 4.1 The objective function For our optimization function, the only characteristics of each district and each county we will use are the populations p(P) = {pi}1≤i≤n, the compactness measures c(P) = {ci}1≤i≤n, and the fractions ρ(P) = {ρi,r|1 ≤ i ≤ n, 1 ≤ r ≤ c} of the population of county r which is
Control 1421 Page 11 out of 35 our score function s(P)=s(p(P), c(P),p(P)to have the following properties. 32 contained in district i. Based on our analysis of desired properties of districts, we would like 1. the score function should be unimodal as a function of pi, with mode at p i=p 2. The score should increase more by adding tracts which lie in x(Di), so that we prefer having as few districts as possible in a given county The score should increase by more by adding tracts which increase the sum of all compactness measures by the greatest amoun When considering these three constraints, they suggest that we should consider the three vectors P(P),c(P), p(P)independently of each other in the score function, and then compare the scores of each when deciding on how to make tradeoffs between population equality compactness, and county unity. In other words, we would like our score function to be a separable function of these three vectors, i.e. s has the form s(P)=f(p(P)+g(c(P))+h(p(P) where f, g, h are functions 4.1.1 One(wo)man, one vote Based upon the first criterion, we only require a globally concave down function whose maximum is attained at Pi= p for all P:: api lp=p=0, 0p,2<0. The simplest functional form which satisfies this constraint is f(p(P)=-an∑n- where ap is some constant. That is, the score attributable to population differences is actually a constant multiple of the population variance across districts(once all tracts are assigned to a district) The MSGM creates districts with approximate population equality by penalizing extreme variation away from p but equality is generally pretty weak. In one, more or less typical run of MSGM the districts created vary from 600,000 to 700,000, an unacceptable difference for a final districting plan By far, the most important constraint in determining district lines is that the popula tions within each district are very similar. Note that, this criterion is based on the general population within districts not the voting-age population or the population of likely voters Recall that our State has total population P and an average population of p= P/n per district. Letting p; be the population in district i we consider three potential metrics for the population variance between districts 1. Variance: Var(p 2. Maximum deviation: maxllpi-pll
Control # 1421 Page 11 out of 35 contained in district i. Based on our analysis of desired properties of districts, we would like our score function s(P) = s(p(P), c(P), ρ(P)) to have the following properties: 1. the score function should be unimodal as a function of pi , with mode at pi = ¯p; 2. The score should increase more by adding tracts which lie in χ(Di), so that we prefer having as few districts as possible in a given county. 3. The score should increase by more by adding tracts which increase the sum of all compactness measures by the greatest amount. When considering these three constraints, they suggest that we should consider the three vectors p(P), c(P), ρ(P) independently of each other in the score function, and then compare the scores of each when deciding on how to make tradeoffs between population equality, compactness, and county unity. In other words, we would like our score function to be a separable function of these three vectors, i.e. s has the form s(P) = f(p(P)) + g(c(P)) + h(ρ(P)) where f, g, h are functions. 4.1.1 One (wo)man, one vote Based upon the first criterion, we only require a globally concave down function whose maximum is attained at pi = ¯p for all pi : ∂s ∂pi |pi=¯p = 0, ∂2s ∂pi2 < 0. The simplest functional form which satisfies this constraint is: f(p(P)) = −αp Xn i=1 (pi − p¯) 2 where αp is some constant. That is, the score attributable to population differences is actually a constant multiple of the population variance across districts (once all tracts are assigned to a district). The MSGM creates districts with approximate population equality by penalizing extreme variation away from ¯p but equality is generally pretty weak. In one, more or less typical run of MSGM the districts created vary from 600,000 to 700,000, an unacceptable difference for a final districting plan. By far, the most important constraint in determining district lines is that the populations within each district are very similar. Note that, this criterion is based on the general population within districts not the voting-age population or the population of likely voters. Recall that our State has total population P and an average population of ¯p = P/n per district. Letting pi be the population in district i we consider three potential metrics for the population variance between districts. 1. Variance: V ar(p1, p2, . . . , pn) 2. Maximum deviation: max{|pi − p¯|}
Control 1421 Page 12 out of 35 Maximum difference: maxipil-minpil For all of these measures lower values are preferable and the minimum value is 0. We submit that choice number 1, variance, is the superior alternative. To see why, consider two possible population distributions between districts Situation A- one district has a population of 1.05p, one is 95p, and all of the others are Situation B-half of the districts have population 1.05p and half have 95p(any left over odd district has p) In Situation A only two districts are different from the ideal population level, p, but in Situation B very few districts have population p so a good metric should rank B worse than A. Clearly, the variance of populations is higher in B than in A, so variance passes this test The maximum difference test gives 05p for both A and B and the maximum difference gives 1p for both We see that variance is the best measure of similarity since it factors in the pair wise dif- ference in all district populations. We use variance as our measure of populational inequality between districts 4.1 Compactness To measure the compactness of a district we would ideally use our compactness measure Area(Di) Ci Perimeter(Di Such that 9(c(P)=B∑q where B is some constant Unfortunately, try as we might, we were unable to calculate the perimeter of tracts on the aggregate-the C++ library we used to interact with our census data shapefiles exhibited a variety of disturbing characteristics for different methods we used for calculating perimeters including massive memory leaks for large-scale union operations, questionable accuracy for pairwise unions, and seemingly arbitrary calculations of intersection length Yet it is a poor craftsman that blames his tools and so undaunted, we adopted a different measure of compactness. Called the clustering coefficient, it provides a rough approximation for compactness. We define it as e)nen,Tk∈Dk~TH such that 9(c(P)=B∑c(D)
Control # 1421 Page 12 out of 35 3. Maximum difference: max{pi} − min{pi} For all of these measures lower values are preferable and the minimum value is 0. We submit that choice number 1, variance, is the superior alternative. To see why, consider two possible population distributions between districts: • Situation A - one district has a population of 1.05¯p, one is .95¯p, and all of the others are ¯p • Situation B - half of the districts have population 1.05¯p and half have .95¯p (any left over odd district has ¯p) In Situation A only two districts are different from the ideal population level, ¯p, but in Situation B very few districts have population ¯p so a good metric should rank B worse than A. Clearly, the variance of populations is higher in B than in A, so variance passes this test. The maximum difference test gives .05¯p for both A and B and the maximum difference gives .1¯p for both. We see that variance is the best measure of similarity since it factors in the pair wise difference in all district populations. We use variance as our measure of populational inequality between districts. 4.1.2 Compactness To measure the compactness of a district we would ideally use our compactness measure: ci = Area(Di) P erimeter(Di) 2 Such that: g(c(P)) = β Xn i=1 ci where β is some constant. Unfortunately, try as we might, we were unable to calculate the perimeter of tracts on the aggregate - the C++ library we used to interact with our census data shapefiles exhibited a variety of disturbing characteristics for different methods we used for calculating perimeters, including massive memory leaks for large-scale union operations, questionable accuracy for pairwise unions, and seemingly arbitrary calculations of intersection length. Yet it is a poor craftsman that blames his tools and so undaunted, we adopted a different measure of compactness. Called the clustering coefficient, it provides a rough approximation for compactness. We define it as: cc(Di) = P Tl∈Di |{Tk ∈ Di |Tk ∼ Tj}| mi 2 such that: g(c(P)) = β Xn i=1 cc(Di)
Control 1421 Page 13 out of 35 where B is some constant. Our clustering coefficient thus provides a ratio of the total number of inter-district boundaries to the maximum possible number of inter-district boundaries Note that if all tracts were uniformly shaped, this measure would prize square- and circle- shaped districts, while winding, single tract-width districts would be penalized. However given the asymmetry of tract shapes, this measure does little to reflect negatively upon district shapes such as the dumbbell, two circular clusters of tracts connected by a narrow band of tracts. In general however, the clustering coefficient will value adding to districts tracts that are "close"and removing from districts those tracts that are auxiliary 4.1.3 County preservation We adopt the same county preservation measure used in the MSGM, defined in equation 2 with the option of adding a scaling factor to the entire function to refine empirical perfor mance 4.2 Search method and neighborhood function In order to refine our solution from MSGM. we must move tracts between districts. Yet the space of all possible contiguous moves is too large to run effectively. We solve this problem considering a range of possible moves with respect to only one district, its boundary and frontier, and performing the best move on this dramatically reduced state space By selecting our target district at random at each iteration, our strategy is best described as stochastic domain hill climbing. It is a method that combines the best aspects of both random and deterministic local search methods-we perform optimal moves while avoiding getting stuck trying to only increase the score of a single district. After determining that imple first-order moves on the district level, that is, adding or removing individual tracts were incapable of reducing our variance metric to the extremely low standard that was our charge, we expanded our search to include second-order moves, that is, "swaps", a combined move that includes both an add and remove within a single operation If we assume that the maximum connectedness of any tract on the graph is k, checking for all adds and removes separately for district D; involves considering O(kaDi+FiD) O(kmi) possible moves, while looking at all swaps involves considering O(kaDIlFD O(km2) possible moves. One might contend, then, that the operation of checking every district for first-order moves might be a better algorithm, as it would take O(is, kmi) O(nkmi) heuristic evaluations. One could even supplement such an algorithm with a degree of randomness, to avoid being caught in a possible loop of futility, by employing simulated annealing, stochastic hill climbing, or tabu search on the resulting list of possible future states. In practice, however, we found that checking for second-order moves provided far better empirical results with acceptable time performance, while an algorithm enumerating all the possible second-order states, requiring Oi kmi)=O(nkmi) heuristic evaluations was too slow to be effective The true heart of POM is the following algorithm. For simplicity and readability, we let Mada(Di) be the set of all moves in which we add a frontier tract to Di, and Remove (di)to be the set of all moves in which we remove a border tract from Di, and M- the move that is the inverse of M, such that applying both M and M- in turn has no effect. Recall also that our heuristic scores partition P as s(P)
Control # 1421 Page 13 out of 35 where β is some constant. Our clustering coefficient thus provides a ratio of the total number of inter-district boundaries to the maximum possible number of inter-district boundaries. Note that if all tracts were uniformly shaped, this measure would prize square- and circleshaped districts, while winding, single tract-width districts would be penalized. However, given the asymmetry of tract shapes, this measure does little to reflect negatively upon district shapes such as the dumbbell, two circular clusters of tracts connected by a narrow band of tracts. In general however, the clustering coefficient will value adding to districts tracts that are “close” and removing from districts those tracts that are auxiliary. 4.1.3 County preservation We adopt the same county preservation measure used in the MSGM, defined in equation 2 with the option of adding a scaling factor to the entire function to refine empirical performance. 4.2 Search method and neighborhood function In order to refine our solution from MSGM, we must move tracts between districts. Yet the space of all possible contiguous moves is too large to run effectively. We solve this problem considering a range of possible moves with respect to only one district, its boundary and frontier, and performing the best move on this dramatically reduced state space. By selecting our target district at random at each iteration, our strategy is best described as stochastic domain hill climbing. It is a method that combines the best aspects of both random and deterministic local search methods - we perform optimal moves while avoiding getting stuck trying to only increase the score of a single district. After determining that simple first-order moves on the district level, that is, adding or removing individual tracts, were incapable of reducing our variance metric to the extremely low standard that was our charge, we expanded our search to include second-order moves, that is, “swaps”, a combined move that includes both an add and remove within a single operation. If we assume that the maximum connectedness of any tract on the graph is k, checking for all adds and removes separately for district Di involves considering O(k|∂Di | + |Fi |) = O(kmi) possible moves, while looking at all swaps involves considering O(k|∂Di ||Fi |) = O(km2 i ) possible moves. One might contend, then, that the operation of checking every district for first-order moves might be a better algorithm, as it would take O( Pn i=1 kmi) = O(nkmi) heuristic evaluations. One could even supplement such an algorithm with a degree of randomness, to avoid being caught in a possible loop of futility, by employing simulated annealing, stochastic hill climbing, or tabu search on the resulting list of possible future states. In practice, however, we found that checking for second-order moves provided far better empirical results with acceptable time performance, while an algorithm enumerating all the possible second-order states, requiring O( Pn i=1 km2 i ) = O(nkm2 i ) heuristic evaluations, was too slow to be effective. The true heart of POM is the following algorithm. For simplicity and readability, we let Madd(Di) be the set of all moves in which we add a frontier tract to Di , and Mremove(Di) to be the set of all moves in which we remove a border tract from Di , and M−1 the move that is the inverse of M, such that applying both M and M−1 in turn has no effect. Recall also that our heuristic scores partition P as s(P)