Control 1421 Page 4 out of 35 2.2 Geography and similar characteristics The US Census Bureau provides a great deal of data on legal, natural, and man-made boundaries as well as socio-economic similarity of regions. In each census, the United States is broken up into several degrees of accuracy, the smallest of which are: blocks(40 people on average), block groups(1500 people), and tracts(4500 people) We follow the practice in Young(1988) by districting based on a maximum level of resolution which in our Case Study( Section 5) is census tracts. Notational note: we refer to the smallest unit of division generally as a tract A reference from the Caliper Corporation describes tracts in the following quotation Census tract boundaries normally follow visible features, but may follow go ernmental unit boundaries and other non-visible features, and they always nest within counties. Census tracts are designed to be relatively homogenous units at the time the users established thenr? s, economic status, and living conditions For these reasons we believe that units at the tracts size(or less)are acceptably small and nous to use as a base unit. Further, tracts are completely contained within counties so we can easily check whether or not a district breaks county integrity 2. 3 Notation Define m to be the number of census tracts. and n the number of districts We denote our districts by D,1≤j≤m, and our tracts by T,1≤l≤m. Denote the set of all tracts by I=TihI<I<m; we call this a State. Denote the set of all districts at a particular time by A=(DiJ1sisn. We call this a partition for the State 2.3.1 Adjacency Define the symmetric relation Tp N Tg for tract pairs(Tp, Tq) which are adjacent. Define the function d(Ti) to be the district to which the tract Ti belongs. We also naturally extend the definition of d to sets of tracts Define the neighbor set of tract Ti by ar(T1)=TpErIT Tp) to be the set of all census acts neighboring Ti, and define ap(ti)=d(ar(i) to be the set of all districts containing ghbors of Ti. Every tract borders at least one other tracts, so ar(Ti) and ap(li) have 2.3.2 Borders Define the border of district D; as aDi=iT E Dilap(Ti)+(Di which is the set tracts in Di that are adjacent to at least one district other than D. The interior of district D; is Ii= DiaDi, the set of census tracts in D; whose neighbors are all in D;. Denote the total number of tracts in district D; as mi= Dil the number of border tracts as b;=laDl The frontier of D; is denoted Fi=UTED, ar(TD)\Di, i.e. the set of all tracts outside of D i that border the boundry tracts of D
Control # 1421 Page 4 out of 35 2.2 Geography and similar characteristics The US Census Bureau provides a great deal of data on legal, natural, and man-made boundaries as well as socio-economic similarity of regions. In each census, the United States is broken up into several degrees of accuracy, the smallest of which are: blocks (40 people on average), block groups (1500 people), and tracts (4500 people). We follow the practice in Young (1988) by districting based on a maximum level of resolution which in our Case Study (Section 5) is census tracts. Notational note: we refer to the smallest unit of division generally as a tract. A reference from the Caliper Corporation describes tracts in the following quotation: Census tract boundaries normally follow visible features, but may follow governmental unit boundaries and other non-visible features, and they always nest within counties. Census tracts are designed to be relatively homogenous units with respect to population characteristics, economic status, and living conditions at the time the users established them. For these reasons we believe that units at the tracts size (or less) are acceptably small and homogenous to use as a base unit. Further, tracts are completely contained within counties so we can easily check whether or not a district breaks county integrity. 2.3 Notation Define m to be the number of census tracts, and n the number of districts. We denote our districts by Di , 1 ≤ j ≤ n, and our tracts by Tl , 1 ≤ l ≤ m. Denote the set of all tracts by Γ = {Tl}1≤l≤m; we call this a State. Denote the set of all districts at a particular time by ∆ = {Di}1≤j≤n. We call this a partition for the State. 2.3.1 Adjacency Define the symmetric relation Tp ∼ Tq for tract pairs (Tp, Tq) which are adjacent. Define the function d(Tl) to be the district to which the tract Tl belongs. We also naturally extend the definition of d to sets of tracts. Define the neighbor set of tract Tl by aT (Tl) = {Tp ∈ Γ|Tl ∼ Tp} to be the set of all census tracts neighboring Tl , and define aD(Tl) = d(aT (Tl)) to be the set of all districts containing neighbors of Tl . Every tract borders at least one other tracts, so aT (Tl) and aD(Tl) have cardinality at least one for all Tl . 2.3.2 Borders Define the border of district Di as ∂Di = {Tl ∈ Di |aD(Tl) 6= {Di}} which is the set tracts in Di that are adjacent to at least one district other than Di . The interior of district Di is Ii = Di\∂Di , the set of census tracts in Di whose neighbors are all in Di . Denote the total number of tracts in district Di as mi = |Di | the number of border tracts as bi = |∂Di |. The frontier of Di is denoted Fi = (∪Tl∈DiaT (Tl))\Di , i.e. the set of all tracts outside of Di that border the boundry tracts of Di
Control 1421 Page 5 out of 35 2.3.3 Counties We denote a county as C; and the set of all counties as A. Districts can(and often do) break county boundaries but tracts are contained entirely within counties so we can think of a county as a set of districts. Districts are also sets of tracts so we interpret the set intersection Di n C as the set of tracts in both district Di and county C;. From this, we define c(Di)=C lDinCit0 to be the set of counties which overlap with D 2.3.4 Population Let the population of our State be P and we denote the optimal district size, 5, as p. We use the function p( to generally denote the population of an object, for instance P(Ti) and P(Ci) are the populations of tract Ti and county C;, respectively. Due to frequent use, we use the shorthand p;=P(Di)for the population of districts Table 1 is a useful reference of these numerous definitions Table 1: Variables and their meanings Variable Definition Number of congressional districts D The in district(1≤i≤m) Set of all districts in a State, a partition Number of census tracts T The ltn tractin(1≤l≤m) Set of all tracts in a State d(Ti) District to which tract Ti belongs Tracts Tp and Tq are adjacent (T) Set of districts containing tracts neighboring Ti OD Border of d tracts that neighbor another district Ii Interior of Di, tracts with do not neighbor another district Number of tracts in D b Number of tracts in aD F Set of all tracts outside of D, that border aD C he jth county C(Ti) The county to which tract Ti belongs C(Di) The set of counties containing district D; Total population of the State P Average population of a district p() Population of an arbitrary object Shorthand for P(Di), population of district D
Control # 1421 Page 5 out of 35 2.3.3 Counties We denote a county as Cj and the set of all counties as Λ. Districts can (and often do) break county boundaries but tracts are contained entirely within counties so we can think of a county as a set of districts. Districts are also sets of tracts so we interpret the set intersection Di ∩ Cj as the set of tracts in both district Di and county Cj . From this, we define c(Di) = {Cj |Di ∩ Cj 6= ∅} to be the set of counties which overlap with Di . 2.3.4 Population Let the population of our State be P and we denote the optimal district size, P n , as ¯p. We use the function p(·) to generally denote the population of an object, for instance p(Tl) and p(Cj ) are the populations of tract Tl and county Cj , respectively. Due to frequent use, we use the shorthand pi = p(Di) for the population of districts. Table 1 is a useful reference of these numerous definitions. Table 1: Variables and their meanings Variable Definition n Number of congressional districts Di The ith district (1 ≤ i ≤ n) ∆ Set of all districts in a State, a partition m Number of census tracts Tl The lth tractfin (1 ≤ l ≤ m) Γ Set of all tracts in a State d(Tl) District to which tract Tl belongs Tp ∼ Tq Tracts Tp and Tq are adjacent aT (Tl) Set of tracts adjacent to tract Tl aD(Tl) Set of districts containing tracts neighboring Tl ∂Di Border of Di , tracts that neighbor another district Ii Interior of Di , tracts with do not neighbor another district mi Number of tracts in Di bi Number of tracts in ∂Di Fi Set of all tracts outside of Di that border ∂Di Cj The jth county c(Tl) The county to which tract Tl belongs c(Di) The set of counties containing district Di P Total population of the State p¯ Average population of a district p(·) Population of an arbitrary object pi Shorthand for p(Di), population of district Di
Control 1421 Page 6 out of 35 2. 4 Past models Prior to explaining our modeling approach we would like discuss some previous work in the literature on congressional districting and gerrymandering. We used these papers as guides as we thought about and further refined our algorithm and implementation Cirincione et al.(2000)judge the quality of a districting plan based on equal population preservation of county integrity, and district area compactness. They require that district populations differ by no more than 1% from exact equality in the number of constituents and require point contiguity of the districts. The algorithm constructs districts by picking a random block group(their unit size), then adding additional block groups to the new district until the district population reaches p. At this point they repeat the process starting with a new random block group. Compactness of districts is based on their minimum bounding rectangles and county integrety is encouraged by "randomly"selecting new block groups with a preference for block groups in already inhabited counties Mehrotra et al.(1998) and Garfinkel and Nemhauser(1970) implement a"branch-and- price"method in the optimization step. They first obtain a districting, and optimize over their constraints such that population values are allowed to vary in the final solution of the optimization step. In a final step they split up population units to ensure population equality. They define compactness in a graph-theoretical manner where connected nodes are adjacent tracts. They define the "center"of a district to be the node(tract)with the lowest maximum distance to another other tract. They consider a graph(district )more compact when sum of distances from each node to the center is small We do not use this measure, as it does not uniquely define the center of a graph, and contrary to their claims, does allow for oddly-shaped districts, such as a district whose graph is a star-shaped tree with one tract in the center and many non-contiguous paths emanating from it. In our case study simulations, prior to the incorporation of a compactness factor in the objective function, we often obtain such a tree structure, which is one of the salient features of gerrymandering We also do not use a"branch-and-price"method of optimization. Following suggestions of Nagel(1965) and Kaiser(1966), we employ a local search algorithm in which tracts are swapped between existing districts to maximize some objective function. We describe this process in detail in Section 4 2.5 Measuring compactness The notion of compactness of a planar region has no uniformly accepted definition and research done by Young(1988) suggests that any reasonable measure of compactness fails to work well for certain geographic configurations. He further suggests that any good measure of compactness in such problems should consider the population units(census tracts in our case study) as indivisible units, and therefore that the measure of compactness should be made independently of the predetermined shapes of the population units. We follow this directive in our definition of compactness In fact, the compactness measures given in Young(1998)are not reasonable in the first instance, and do not include any notion of the area of a district, or comparing it to the perimeter. The measures include the maximum total perimeter of a district in a districting, determining the relative height and width of the district, and finding the moment of inertia
Control # 1421 Page 6 out of 35 2.4 Past Models Prior to explaining our modeling approach we would like discuss some previous work in the literature on congressional districting and gerrymandering. We used these papers as guides as we thought about and further refined our algorithm and implementation. Cirincione et al. (2000) judge the quality of a districting plan based on equal population, preservation of county integrity, and district area compactness. They require that district populations differ by no more than 1% from exact equality in the number of constituents, and require point contiguity of the districts. The algorithm constructs districts by picking a random block group (their unit size), then adding additional block groups to the new district until the district population reaches ¯p. At this point they repeat the process starting with a new random block group. Compactness of districts is based on their minimum bounding rectangles and county integrety is encouraged by “randomly” selecting new block groups with a preference for block groups in already inhabited counties. Mehrotra et al. (1998) and Garfinkel and Nemhauser (1970) implement a “branch-andprice” method in the optimization step. They first obtain a districting, and optimize over their constraints such that population values are allowed to vary in the final solution of the optimization step. In a final step they split up population units to ensure population equality. They define compactness in a graph-theoretical manner where connected nodes are adjacent tracts. They define the “center” of a district to be the node (tract) with the lowest maximum distance to another other tract. They consider a graph (district) more compact when sum of distances from each node to the center is small. We do not use this measure, as it does not uniquely define the center of a graph, and, contrary to their claims, does allow for oddly-shaped districts, such as a district whose graph is a star-shaped tree with one tract in the center and many non-contiguous paths emanating from it. In our case study simulations, prior to the incorporation of a compactness factor in the objective function, we often obtain such a tree structure, which is one of the salient features of gerrymandering. We also do not use a “branch-and-price” method of optimization. Following suggestions of Nagel (1965) and Kaiser (1966), we employ a local search algorithm in which tracts are swapped between existing districts to maximize some objective function. We describe this process in detail in Section 4. 2.5 Measuring compactness The notion of compactness of a planar region has no uniformly accepted definition and research done by Young (1988) suggests that any reasonable measure of compactness fails to work well for certain geographic configurations. He further suggests that any good measure of compactness in such problems should consider the population units (census tracts in our case study) as indivisible units, and therefore that the measure of compactness should be made independently of the predetermined shapes of the population units. We follow this directive in our definition of compactness. In fact, the compactness measures given in Young (1998) are not reasonable in the first instance, and do not include any notion of the area of a district, or comparing it to the perimeter. The measures include the maximum total perimeter of a district in a districting, determining the relative height and width of the district, and finding the moment of inertia
Control 1421 Page 7 out of 35 of the district. All of these measures fail to consider both perimeter and area simultaneously, which seems to be a reasonable requirement of a good compactness measure The Isoperimetric Theorem, first proved (non-rigorously) by J. Steiner in 1838, states that the quantity A/P2, given by the ratio of the area A of a planar region(not necessarily continuous) to the square of its perimeter is maximized when the region is circular. The maximum achievable compactness, that of a circle with radius r, is given by 472r2 =4 so we define compactness of a region as the ratio(4T A)/P2. This ratio is bounded within(0, 1] where higher values indicate greater compactness We believe this serves as a good measure of the broadly defined "regularity "of a region which is so important to the study of Congressional districting and gerrymandering. Specifi cally, any shear of factor s applied to a circle decreases the compactness by a factor of s, and any concave region has lower compactness than does its convex hull. It is easy to see that we can make an even stronger statement: the convex hull of a concave region has greater area and smaller perimeter Observe that a square gets close to the optimum, with a compactness of tr N 0.785. This implies that the set of possible compactness values for rectangles is(0, 0.785) since a square is the most compact rectangle 3 The Multi-Seeded growth Model We take a two-stage approach to finding the best districts for a given State. In the multi- Seeded growth Model. referred to as MSgm hereafter. we find an initial allocation of n districts so that the partition has modest levels of population equality and county preser- vation. Our more precise Partition Optimization Model, or POM, edits and improves the rough sketch from MSGM into until it becomes, hopefully, a work of art The reason that our model runs in two phases is simple: speed. Our knee-jerk reaction to the problem was to randomly allocate tracts to the n districts and then optimize by swapping tracts trying to improve some objective function. However, a random initial configuration is so far from the global maximum that the search might take millions of years The MSGM generates a very crude coloring of a State that ensures district contiguity and tries, but does not guarantee, to achieve population equality and county preservation The districts created by MSGM are completely unacceptable for an actual plan but save enormous amounts of computing time for our solution 3. 1 How it works At first, our task seems daunting. How do we allocate n districts equally, even to a rough approximation? Our solution is to grow the n districts simultaneously until they cover the State We start by allocating the entire State to a blank, dummy district Do, and then al- locating n tracts that serve as the initial "seeds" for the final districts. such that each Di, iEl,.,n begins as only a single tract. Now while Dol>0, we take the set S of all
Control # 1421 Page 7 out of 35 of the district. All of these measures fail to consider both perimeter and area simultaneously, which seems to be a reasonable requirement of a good compactness measure. The Isoperimetric Theorem, first proved (non-rigorously) by J. Steiner in 1838, states that the quantity A/P2 , given by the ratio of the area A of a planar region (not necessarily continuous) to the square of its perimeter is maximized when the region is circular. The maximum achievable compactness, that of a circle with radius r, is given by πr2 4π2r 2 = 1 4π so we define compactness of a region as the ratio (4πA)/P2 . This ratio is bounded within (0, 1] where higher values indicate greater compactness. We believe this serves as a good measure of the broadly defined “regularity” of a region which is so important to the study of Congressional districting and gerrymandering. Specifi- cally, any shear of factor s applied to a circle decreases the compactness by a factor of s, and any concave region has lower compactness than does its convex hull. It is easy to see that we can make an even stronger statement: the convex hull of a concave region has greater area and smaller perimeter. Observe that a square gets close to the optimum, with a compactness of 4π 16 ≈ 0.785. This implies that the set of possible compactness values for rectangles is (0, 0.785) since a square is the most compact rectangle. 3 The Multi-Seeded Growth Model We take a two-stage approach to finding the best districts for a given State. In the MultiSeeded Growth Model, referred to as MSGM hereafter, we find an initial allocation of n districts so that the partition has modest levels of population equality and county preservation. Our more precise Partition Optimization Model, or POM, edits and improves the rough sketch from MSGM into until it becomes, hopefully, a work of art. The reason that our model runs in two phases is simple: speed. Our knee-jerk reaction to the problem was to randomly allocate tracts to the n districts and then optimize by swapping tracts trying to improve some objective function. However, a random initial configuration is so far from the global maximum that the search might take millions of years. The MSGM generates a very crude coloring of a State that ensures district contiguity and tries, but does not guarantee, to achieve population equality and county preservation. The districts created by MSGM are completely unacceptable for an actual plan but save enormous amounts of computing time for our solution. 3.1 How it works At first, our task seems daunting. How do we allocate n districts equally, even to a rough approximation? Our solution is to grow the n districts simultaneously until they cover the State. We start by allocating the entire State to a blank, dummy district D0, and then allocating n tracts that serve as the initial “seeds” for the final districts, such that each Di , i ∈ {1, . . . , n} begins as only a single tract. Now while |D0| > 0, we take the set S of all
Control 1421 Page 8 out of 35 possible moves which involve taking a district from Do while preserving contiguity. That is Where M(T, Di, Di)represents a move of tract Ti from D; to Di, corresponding to the exit of Ti from Di and the entrance of Ti into D;. We then sort the moves in S by our heuristic function y (D1,., D,)-R, a function increasing in the desirability of our prospectiv partition. Each move is scored by the heuristic value that would result if we were to accept only that move. We then conclude by performing the moves corresponding to the top 3% of the scored moves in S. Note that this method preserves contiguity, because by definition any Ti E Fi must be contiguous with Di, and thus the Di are contiguous at each step Had we but world enough, and time, we would only perform the best possible move found in S before recalculating the frontier. Even though in the msgm we do not consider moves between two "true"districts (rather, we consider only moves between a true district and the dummy district), the value of a move does not exist in isolation. Consider two distinct districts D; and D;, and two tracts Ti E FinFi and Tk E FinF. The acceptance of M(Tk, Do, Di)alters the heuristic value of every move associated with Fi, which could poten- tially affect the optimality of further moves with Di, such as the acceptance of M(Ti, Do, Di rather than M(Ti, Do, Di). Furthermore, the acceptance of M(Ti, Do, Di) likely expands the size of Fi. Perhaps there is an optimal move opened up in this new frontier that we do not even consider, because we have not even calculated its value It would be better to only perform the best move, but such a strategy was found to be too computationally intensive. We compromise by taking only a small, elite fraction of the moves in each step before recalculating S and the values of its associated moves. In this respect, our approach is analogous to the strategy of modified policy iteration for solving a Markov decision problem. And just as modified policy iteration excels in practice, we found that the tradeoff of possible inefficiency is more than compensated for by the speed gains of the algorithm, especially considering that the solution obtained by MSGM will be further refined by POM In true modified policy iteration, k rounds of value iteration are made in-between policy iterations such that k is fixed. Our MSGM scheme uses a variable number of moves in- between recalculating the value of the frontier. We selected our scheme because it causes us to be delicate in our selections of tract allocations, making moves virtually one-at-a- time, at the beginning and end of the MsGm. By focusing on the beginning and end of the problem, we attempt to avoid having a single district grow too large through possible inefficient allocation Unlike Cirincione(2000)we use random initial seeds weighted by population rather than seeds that are equally spaced around the State. The process works as follows: while there are still random seeds to be selected, we find a candidate initial seed tract Ti in Do. Letting the largest tract in our State have population p, we accept Ti as an initial seed with probability P(Ti/p. We thus select tracts in linear proportion to their population. We found that the MSGm algorithm produces the best initial results when all the districts have the same amount of population, rather than the same number of tracts around which to grow. The geographically optimal placement of five, or fewer, starting seeds in the NYC Metropolitan
Control # 1421 Page 8 out of 35 possible moves which involve taking a district from D0 while preserving contiguity. That is: S = [n i=1 [ Tl∈Fi M(Tl , D0, Di) Where M(Tl , Di , Dj ) represents a move of tract Tl from Di to Dj , corresponding to the exit of Tl from Di and the entrance of Tl into Dj . We then sort the moves in S by our heuristic function Ψ(D1, . . . , Dn) → R, a function increasing in the desirability of our prospective partition. Each move is scored by the heuristic value that would result if we were to accept only that move. We then conclude by performing the moves corresponding to the top 3% of the scored moves in S. Note that this method preserves contiguity, because by definition any Tl ∈ Fi must be contiguous with Di , and thus the Di are contiguous at each step. Had we but world enough, and time, we would only perform the best possible move found in S before recalculating the frontier. Even though in the MSGM we do not consider moves between two “true” districts (rather, we consider only moves between a true district and the dummy district), the value of a move does not exist in isolation. Consider two distinct districts Di and Dj , and two tracts Tl ∈ Fi ∩Fj and Tk ∈ Fi ∩F c j . The acceptance of M(Tk, D0, Di) alters the heuristic value of every move associated with Fi , which could potentially affect the optimality of further moves with Di , such as the acceptance of M(Tl , D0, Di) rather than M(Tl , D0, Dj ). Furthermore, the acceptance of M(Tl , D0, Di) likely expands the size of Fi . Perhaps there is an optimal move opened up in this new frontier that we do not even consider, because we have not even calculated its value. It would be better to only perform the best move, but such a strategy was found to be too computationally intensive. We compromise by taking only a small, elite fraction of the moves in each step before recalculating S and the values of its associated moves. In this respect, our approach is analogous to the strategy of modified policy iteration for solving a Markov decision problem. And just as modified policy iteration excels in practice, we found that the tradeoff of possible inefficiency is more than compensated for by the speed gains of the algorithm, especially considering that the solution obtained by MSGM will be further refined by POM. In true modified policy iteration, k rounds of value iteration are made in-between policy iterations, such that k is fixed. Our MSGM scheme uses a variable number of moves inbetween recalculating the value of the frontier. We selected our scheme because it causes us to be delicate in our selections of tract allocations, making moves virtually one-at-atime, at the beginning and end of the MSGM. By focusing on the beginning and end of the problem, we attempt to avoid having a single district grow too large through possible inefficient allocation. Unlike Cirincione (2000) we use random initial seeds weighted by population rather than seeds that are equally spaced around the State. The process works as follows: while there are still random seeds to be selected, we find a candidate initial seed tract Tl in D0. Letting the largest tract in our State have population ˆp, we accept Tl as an initial seed with probability p(Tl)/pˆ. We thus select tracts in linear proportion to their population. We found that the MSGM algorithm produces the best initial results when all the districts have the same amount of population, rather than the same number of tracts around which to grow. The geographically optimal placement of five, or fewer, starting seeds in the NYC Metropolitan