《数学建模》美赛优秀论文：2007 A O What to Feed a Gerrymander.pdf_P11-P15

Control 1421 Page 11 out of 35 our score function s(P)=s(p(P), c(P),p(P)to have the following properties. 32 contained in district i. Based on our analysis of desired properties of districts, we would like 1. the score function should be unimodal as a function of pi, with mode at p i=p 2. The score should increase more by adding tracts which lie in x(Di), so that we prefer having as few districts as possible in a given county The score should increase by more by adding tracts which increase the sum of all compactness measures by the greatest amoun When considering these three constraints, they suggest that we should consider the three vectors P(P),c(P), p(P)independently of each other in the score function, and then compare the scores of each when deciding on how to make tradeoffs between population equality compactness, and county unity. In other words, we would like our score function to be a separable function of these three vectors, i.e. s has the form s(P)=f(p(P)+g(c(P))+h(p(P) where f, g, h are functions 4.1.1 One(wo)man, one vote Based upon the first criterion, we only require a globally concave down function whose maximum is attained at Pi= p for all P:: api lp=p=0, 0p,2<0. The simplest functional form which satisfies this constraint is f(p(P)=-an∑n- where ap is some constant. That is, the score attributable to population differences is actually a constant multiple of the population variance across districts(once all tracts are assigned to a district) The MSGM creates districts with approximate population equality by penalizing extreme variation away from p but equality is generally pretty weak. In one, more or less typical run of MSGM the districts created vary from 600,000 to 700,000, an unacceptable difference for a final districting plan By far, the most important constraint in determining district lines is that the popula tions within each district are very similar. Note that, this criterion is based on the general population within districts not the voting-age population or the population of likely voters Recall that our State has total population P and an average population of p= P/n per district. Letting p; be the population in district i we consider three potential metrics for the population variance between districts 1. Variance: Var(p 2. Maximum deviation: maxllpi-pll

Control # 1421 Page 11 out of 35 contained in district i. Based on our analysis of desired properties of districts, we would like our score function s(P) = s(p(P), c(P), ρ(P)) to have the following properties: 1. the score function should be unimodal as a function of pi , with mode at pi = ¯p; 2. The score should increase more by adding tracts which lie in χ(Di), so that we prefer having as few districts as possible in a given county. 3. The score should increase by more by adding tracts which increase the sum of all compactness measures by the greatest amount. When considering these three constraints, they suggest that we should consider the three vectors p(P), c(P), ρ(P) independently of each other in the score function, and then compare the scores of each when deciding on how to make tradeoffs between population equality, compactness, and county unity. In other words, we would like our score function to be a separable function of these three vectors, i.e. s has the form s(P) = f(p(P)) + g(c(P)) + h(ρ(P)) where f, g, h are functions. 4.1.1 One (wo)man, one vote Based upon the first criterion, we only require a globally concave down function whose maximum is attained at pi = ¯p for all pi : ∂s ∂pi |pi=¯p = 0, ∂2s ∂pi2 < 0. The simplest functional form which satisfies this constraint is: f(p(P)) = −αp Xn i=1 (pi − p¯) 2 where αp is some constant. That is, the score attributable to population differences is actually a constant multiple of the population variance across districts (once all tracts are assigned to a district). The MSGM creates districts with approximate population equality by penalizing extreme variation away from ¯p but equality is generally pretty weak. In one, more or less typical run of MSGM the districts created vary from 600,000 to 700,000, an unacceptable difference for a final districting plan. By far, the most important constraint in determining district lines is that the populations within each district are very similar. Note that, this criterion is based on the general population within districts not the voting-age population or the population of likely voters. Recall that our State has total population P and an average population of ¯p = P/n per district. Letting pi be the population in district i we consider three potential metrics for the population variance between districts. 1. Variance: V ar(p1, p2, . . . , pn) 2. Maximum deviation: max{|pi − p¯|}

Control # 1421 Page 12 out of 35 3. Maximum difference: max{pi} − min{pi} For all of these measures lower values are preferable and the minimum value is 0. We submit that choice number 1, variance, is the superior alternative. To see why, consider two possible population distributions between districts: • Situation A - one district has a population of 1.05¯p, one is .95¯p, and all of the others are ¯p • Situation B - half of the districts have population 1.05¯p and half have .95¯p (any left over odd district has ¯p) In Situation A only two districts are different from the ideal population level, ¯p, but in Situation B very few districts have population ¯p so a good metric should rank B worse than A. Clearly, the variance of populations is higher in B than in A, so variance passes this test. The maximum difference test gives .05¯p for both A and B and the maximum difference gives .1¯p for both. We see that variance is the best measure of similarity since it factors in the pair wise difference in all district populations. We use variance as our measure of populational inequality between districts. 4.1.2 Compactness To measure the compactness of a district we would ideally use our compactness measure: ci = Area(Di) P erimeter(Di) 2 Such that: g(c(P)) = β Xn i=1 ci where β is some constant. Unfortunately, try as we might, we were unable to calculate the perimeter of tracts on the aggregate - the C++ library we used to interact with our census data shapefiles exhibited a variety of disturbing characteristics for different methods we used for calculating perimeters, including massive memory leaks for large-scale union operations, questionable accuracy for pairwise unions, and seemingly arbitrary calculations of intersection length. Yet it is a poor craftsman that blames his tools and so undaunted, we adopted a different measure of compactness. Called the clustering coefficient, it provides a rough approximation for compactness. We define it as: cc(Di) = P Tl∈Di |{Tk ∈ Di |Tk ∼ Tj}| mi 2 such that: g(c(P)) = β Xn i=1 cc(Di)

Control 1421 Page 13 out of 35 where B is some constant. Our clustering coefficient thus provides a ratio of the total number of inter-district boundaries to the maximum possible number of inter-district boundaries Note that if all tracts were uniformly shaped, this measure would prize square- and circle- shaped districts, while winding, single tract-width districts would be penalized. However given the asymmetry of tract shapes, this measure does little to reflect negatively upon district shapes such as the dumbbell, two circular clusters of tracts connected by a narrow band of tracts. In general however, the clustering coefficient will value adding to districts tracts that are "close"and removing from districts those tracts that are auxiliary 4.1.3 County preservation We adopt the same county preservation measure used in the MSGM, defined in equation 2 with the option of adding a scaling factor to the entire function to refine empirical perfor mance 4.2 Search method and neighborhood function In order to refine our solution from MSGM. we must move tracts between districts. Yet the space of all possible contiguous moves is too large to run effectively. We solve this problem considering a range of possible moves with respect to only one district, its boundary and frontier, and performing the best move on this dramatically reduced state space By selecting our target district at random at each iteration, our strategy is best described as stochastic domain hill climbing. It is a method that combines the best aspects of both random and deterministic local search methods-we perform optimal moves while avoiding getting stuck trying to only increase the score of a single district. After determining that imple first-order moves on the district level, that is, adding or removing individual tracts were incapable of reducing our variance metric to the extremely low standard that was our charge, we expanded our search to include second-order moves, that is, "swaps", a combined move that includes both an add and remove within a single operation If we assume that the maximum connectedness of any tract on the graph is k, checking for all adds and removes separately for district D; involves considering O(kaDi+FiD) O(kmi) possible moves, while looking at all swaps involves considering O(kaDIlFD O(km2) possible moves. One might contend, then, that the operation of checking every district for first-order moves might be a better algorithm, as it would take O(is, kmi) O(nkmi) heuristic evaluations. One could even supplement such an algorithm with a degree of randomness, to avoid being caught in a possible loop of futility, by employing simulated annealing, stochastic hill climbing, or tabu search on the resulting list of possible future states. In practice, however, we found that checking for second-order moves provided far better empirical results with acceptable time performance, while an algorithm enumerating all the possible second-order states, requiring Oi kmi)=O(nkmi) heuristic evaluations was too slow to be effective The true heart of POM is the following algorithm. For simplicity and readability, we let Mada(Di) be the set of all moves in which we add a frontier tract to Di, and Remove (di)to be the set of all moves in which we remove a border tract from Di, and M- the move that is the inverse of M, such that applying both M and M- in turn has no effect. Recall also that our heuristic scores partition P as s(P)

Control # 1421 Page 13 out of 35 where β is some constant. Our clustering coefficient thus provides a ratio of the total number of inter-district boundaries to the maximum possible number of inter-district boundaries. Note that if all tracts were uniformly shaped, this measure would prize square- and circleshaped districts, while winding, single tract-width districts would be penalized. However, given the asymmetry of tract shapes, this measure does little to reflect negatively upon district shapes such as the dumbbell, two circular clusters of tracts connected by a narrow band of tracts. In general however, the clustering coefficient will value adding to districts tracts that are “close” and removing from districts those tracts that are auxiliary. 4.1.3 County preservation We adopt the same county preservation measure used in the MSGM, defined in equation 2 with the option of adding a scaling factor to the entire function to refine empirical performance. 4.2 Search method and neighborhood function In order to refine our solution from MSGM, we must move tracts between districts. Yet the space of all possible contiguous moves is too large to run effectively. We solve this problem considering a range of possible moves with respect to only one district, its boundary and frontier, and performing the best move on this dramatically reduced state space. By selecting our target district at random at each iteration, our strategy is best described as stochastic domain hill climbing. It is a method that combines the best aspects of both random and deterministic local search methods - we perform optimal moves while avoiding getting stuck trying to only increase the score of a single district. After determining that simple first-order moves on the district level, that is, adding or removing individual tracts, were incapable of reducing our variance metric to the extremely low standard that was our charge, we expanded our search to include second-order moves, that is, “swaps”, a combined move that includes both an add and remove within a single operation. If we assume that the maximum connectedness of any tract on the graph is k, checking for all adds and removes separately for district Di involves considering O(k|∂Di | + |Fi |) = O(kmi) possible moves, while looking at all swaps involves considering O(k|∂Di ||Fi |) = O(km2 i ) possible moves. One might contend, then, that the operation of checking every district for first-order moves might be a better algorithm, as it would take O( Pn i=1 kmi) = O(nkmi) heuristic evaluations. One could even supplement such an algorithm with a degree of randomness, to avoid being caught in a possible loop of futility, by employing simulated annealing, stochastic hill climbing, or tabu search on the resulting list of possible future states. In practice, however, we found that checking for second-order moves provided far better empirical results with acceptable time performance, while an algorithm enumerating all the possible second-order states, requiring O( Pn i=1 km2 i ) = O(nkm2 i ) heuristic evaluations, was too slow to be effective. The true heart of POM is the following algorithm. For simplicity and readability, we let Madd(Di) be the set of all moves in which we add a frontier tract to Di , and Mremove(Di) to be the set of all moves in which we remove a border tract from Di , and M−1 the move that is the inverse of M, such that applying both M and M−1 in turn has no effect. Recall also that our heuristic scores partition P as s(P)