GETTING THE VOTES TO COUNT: A GENETIC ALGORITHM FOR NON-PARTISAN LEGISLATIVE DISTRICTING abStract or decades, district gerrymandering has given incumbents an unfair advan- tage over their opponents, resulting in artificially high reelection rates. This gerrymandering has become so severe that lowa [1 has implemented a nor partisan district-drawing method, and other states are sure to follow. Such efforts may have many different ob jectives in creating districts. In this paper we describe a redistricting algorithm which is flexible enough to accommodate a wide variety of district criteria. We use a genetic algorithm, wherein an arbi- trary user-selected number of districts compete for population on a map of the state region. By adjusting the probability formulae for district reassignment events, we can tailor our algorithm to create districts satisfying a wide range of criteria. This algorithm is then tested on the state of New York, by requiring it to produce district shapes that maximize the interior regions of the districts in proportion to their boundaries. The resulting district divisions produced by After presenting the results of our tests, we will discuss the merits and defects of our chosen algorithm, along with its possibility for extension. Date: February 11, 2007
GETTING THE VOTES TO COUNT: A GENETIC ALGORITHM FOR NON-PARTISAN LEGISLATIVE DISTRICTING Abstract. For decades, district gerrymandering has given incumbents an unfair advantage over their opponents, resulting in artificially high reelection rates. This gerrymandering has become so severe that Iowa [1] has implemented a nonpartisan district-drawing method, and other states are sure to follow. Such efforts may have many different objectives in creating districts. In this paper we describe a redistricting algorithm which is flexible enough to accommodate a wide variety of district criteria. We use a genetic algorithm, wherein an arbitrary user-selected number of districts compete for population on a map of the state region. By adjusting the probability formulae for district reassignment events, we can tailor our algorithm to create districts satisfying a wide range of criteria. This algorithm is then tested on the state of New York, by requiring it to produce district shapes that maximize the interior regions of the districts in proportion to their boundaries. The resulting district divisions produced by the algorithm clearly exhibit this property. After presenting the results of our tests, we will discuss the merits and defects of our chosen algorithm, along with its possibility for extension. Date: February 11, 2007. 1
CoNTENTS 1. Introduction 2.1. Population Specifications 2.2. Geographic Simplicit 3. Our Method 3.1. Source for data 3.2. Other possibilities for district-drawing 23334467 4. Data 5. Evaluation of our Method 12 5.1. Defects 5.2. Merits 13 6. Appendix 1-Probability factors 7. Appendix 2-Other data R pretences 1. INTRODUCTION Gerrymandering is one of the oldest and most successful methods of disenfran chisement available to Congress. This much-maligned practice is essential to main- taining the 98% re-election rate currently enjoyed by incumbents 5. Obviously many people arent happy with the status quo, as it can render an individuals vote meaningless as his representation is determined not by popular sentiment, but by the incumbents drawing up the districts. What is needed is a new method for determining legislative districts, one out of the hands of incumbents What should an ideal district-drawing algorithm accomplish? This is certainly a vexing political question. Should it take the truly unbiased approach and ignore ny and all political considerations? We can just randomly create districts, blind to both political ambitions and political fairness. Others may feel that steps should be taken to ensure that elected representatives fairly represent their constituents Should we set up the districts so that a 60% Republican state will likely have 60% of its districts be majority Republican? Perhaps this just returns us to a different type of gerrymandering, done for the cause of"fair representation, but gerrymandering nonetheless This dilemma can only be solved by political scientists, not mathematicians. How- ever, a mathematician should be able to create a district-drawing algorithm which can accomodate certain 'preferences'in drawing districts, whatever they may be It is then up to social scientists to ascertain what these preferences should be. Our goal is to produce just such an algorithm. We will then test the algorithm by having it select districts for New York that are 'geographically simple'in a specific sense to be defined later I Average re-election rate of House incumbents over election cycles 2000, 2002, and 2004
Team 1035 2 Contents 1. Introduction 2 2. Specifications 3 2.1. Population Specifications 3 2.2. Geographic Simplicity 3 3. Our Method 4 3.1. Source for data 4 3.2. Other possibilities for district-drawing 6 4. Data 7 5. Evaluation of our Method 12 5.1. Defects 12 5.2. Merits 13 6. Appendix 1 - Probability factors 13 7. Appendix 2 - Other data 14 References 15 1. Introduction Gerrymandering is one of the oldest and most successful methods of disenfranchisement available to Congress. This much-maligned practice is essential to maintaining the 98% re-election rate currently enjoyed by incumbents1 [5]. Obviously, many people aren’t happy with the status quo, as it can render an individual’s vote meaningless as his representation is determined not by popular sentiment, but by the incumbents drawing up the districts. What is needed is a new method for determining legislative districts, one out of the hands of incumbents. What should an ideal district-drawing algorithm accomplish? This is certainly a vexing political question. Should it take the truly unbiased approach and ignore any and all political considerations? We can just randomly create districts, blind to both political ambitions and political fairness. Others may feel that steps should be taken to ensure that elected representatives fairly represent their constituents. Should we set up the districts so that a 60% Republican state will likely have 60% of its districts be majority Republican? Perhaps this just returns us to a different type of gerrymandering, done for the cause of ‘fair representation’, but gerrymandering nonetheless. This dilemma can only be solved by political scientists, not mathematicians. However, a mathematician should be able to create a district-drawing algorithm which can accomodate certain ‘preferences’ in drawing districts, whatever they may be. It is then up to social scientists to ascertain what these preferences should be. Our goal is to produce just such an algorithm. We will then test the algorithm by having it select districts for New York that are ‘geographically simple’ in a specific sense to be defined later. 1Average re-election rate of House incumbents over election cycles 2000, 2002, and 2004. 2
2. SPECIFICATIONS We require our district-drawing algorithm to satisfy the following criteria: 2.1. Population Specifications In the 1962 decision of Baker v. Carr, the Us Supreme Court asserted the abil ity to decide on the constitutionality of state district divisions 2. Although th Supreme Court has not specified the exact terms of what is acceptable, past rulings have suggested the following: " congressional redistricting plans with overall range percentage variances of up to. 73 percent have been approved based upon identi- fiable state objectives"[1. Here, the overall range percentage variance is defined by the maximum difference in population between any pair of districts, divided by the " ideal district population(which is just the population of the state divided by the number of districts). We would like to note, however, that a lower deviation percentage of 0.69% has been rejected in the past [ 1]. Pending further legal clarifi cation, we set a range variance of less than or equal to 0.7 percent as our goal for opulation range variance 2.2. Geographic Simplicity. We will require our districts to satisfy two criteria. The first requirement is ab- solute each district must be connected. we will later describe our method for enforcing this. Our second requirement will be to have each district contain as many interior points as possible. Here, an interior point means a grid point which does not neighbor a grid point of a different region on either the east, west, north or south. As with any notion of geometric simplicity, this one is motivated by esthetic reasons: intuitively, discretized versions of squares and circles contain a arge number of interior grid points The recognition that others may not agree with our particular definition of simplic ty is an important reason why we wish our algorithm to accommodate different district division criteria easily Let us be more specific about what a district-drawing algorithm should do. We will be given a region(which represents the state map), along with a population density function defined on the state. In our algorithm the region and density function will be discretized, but a priori one thinks of them as approximately continuous. We will discuss the validity of the discretization later In addition, the aforementioned social scientists may find other variables to be of nterest in drawing up districts. There are many ways in which such data can be naturally presented. For example, we may wish to have our districts follow county lines when possible. In this case, county lines should be included in our naturally presen ap. On the other hand. information about political leanings is state boundary d as a density function Pp on the region (.e, for each point on the state, pd(a)dA is the percentage of people living in differential area dA sur rounding a who are Democrats, with analogous functions PR, pI for Republicans nd Independents). We will later show how these possibilities and others can be
Team 1035 3 2. Specifications We require our district-drawing algorithm to satisfy the following criteria: 2.1. Population Specifications. In the 1962 decision of Baker v. Carr, the US Supreme Court asserted the ability to decide on the constitutionality of state district divisions [2]. Although the Supreme Court has not specified the exact terms of what is acceptable, past rulings have suggested the following: “congressional redistricting plans with overall range percentage variances of up to .73 percent have been approved based upon identi- fiable state objectives” [1]. Here, the overall range percentage variance is defined by the maximum difference in population between any pair of districts, divided by the ‘ideal’ district population (which is just the population of the state divided by the number of districts). We would like to note, however, that a lower deviation percentage of 0.69% has been rejected in the past [1]. Pending further legal clarifi- cation, we set a range variance of less than or equal to 0.7 percent as our goal for population range variance. 2.2. Geographic Simplicity. We will require our districts to satisfy two criteria. The first requirement is absolute: each district must be connected. We will later describe our method for enforcing this. Our second requirement will be to have each district contain as many interior points as possible. Here, an interior point means a grid point which does not neighbor a grid point of a different region on either the east, west, north, or south. As with any notion of geometric simplicity, this one is motivated by aesthetic reasons: intuitively, discretized versions of squares and circles contain a large number of interior grid points. The recognition that others may not agree with our particular definition of simplicity is an important reason why we wish our algorithm to accommodate different district division criteria easily. Let us be more specific about what a district-drawing algorithm should do. We will be given a region (which represents the state map), along with a population density function defined on the state. In our algorithm the region and density function will be discretized, but a priori one thinks of them as approximately continuous. We will discuss the validity of the discretization later. In addition, the aforementioned social scientists may find other variables to be of interest in drawing up districts. There are many ways in which such data can be naturally presented. For example, we may wish to have our districts follow county lines when possible. In this case, county lines should be included in our state boundary map. On the other hand, information about political leanings is naturally presented as a density function ρD on the region (i.e., for each point x on the state, ρD(x)dA is the percentage of people living in differential area dA surrounding x who are Democrats, with analogous functions ρR, ρI for Republicans and Independents). We will later show how these possibilities and others can be 3
Team 1035 incorporated into our general method. 3. OUR METHOD In broad overview, our algorithm for dividing up the district is a genetic algorithm wherein the districts compete for population on the state map. The algorithm takes as input an integer k(the number of districts in which to divide the state) and a representation of the population density as a function of location within the state This information is given as a rectangular matrix(or grid), with each grid point corresponding to a small square on the map. Each grid point is then assigned the population contained within the corresponding square region of the state. The grid as a whole is rectangular, so it does not match up with the state's borders exactly Any spot on the grid outside of the state's borders is assigned zero population density, and will be declared out of bounds in the simulation to come. A visual representation of the resulting population density function is seen in the figure on page 8. In our specific example for New York state, this grid is 67 squares wide and 46 squares tall 1. Source for data. Our data for population density was obtained from [4 in particular the map seen below. We gridded the map by hand, and then ap- proximate the density of each square based on the scale given. This approach to acquiring numerical density data certainly isn't ideal, but there does not seem to be a readily available digital source for gridded population data for the state of New York PERSONS PER S。 UARE MILE REGION IV FIGURE 1. Source for population density figures, New York State Once the data has been fe e program, the simulatio on can begin. We first initialize the regions by randomly picking k grid points within the state boundaries, and assigning each of these k points to a district, so that at the beginning, each
Team 1035 4 incorporated into our general method. 3. Our Method In broad overview, our algorithm for dividing up the district is a genetic algorithm, wherein the districts compete for population on the state map. The algorithm takes as input an integer k (the number of districts in which to divide the state) and a representation of the population density as a function of location within the state. This information is given as a rectangular matrix (or grid), with each grid point corresponding to a small square on the map. Each grid point is then assigned the population contained within the corresponding square region of the state. The grid as a whole is rectangular, so it does not match up with the state’s borders exactly. Any spot on the grid outside of the state’s borders is assigned zero population density, and will be declared ‘out of bounds’ in the simulation to come. A visual representation of the resulting population density function is seen in the figure on page 8. In our specific example for New York state, this grid is 67 squares wide and 46 squares tall. 3.1. Source for data. Our data for population density was obtained from [4], in particular the map seen below. We gridded the map by hand, and then approximated the density of each square based on the scale given. This approach to acquiring numerical density data certainly isn’t ideal, but there does not seem to be a readily available digital source for gridded population data for the state of New York. Figure 1. Source for population density figures, New York State Once the data has been fed into the program, the simulation can begin. We first initialize the regions by randomly picking k grid points within the state boundaries, and assigning each of these k points to a district, so that at the beginning, each 4
Team 1035 FIGURE 2. Source for population density figures, New York City district consists of 1 grid point. The rest of the map is not initially assigned a district. Now we loop through all grid points on the map, performing the following tasks for each point If the grid point does not belong to a district, go on to the next grid point If it does belong to a district, say district A, consider each of its neighboring grid points to the west, east, north, and south. If a neighboring grid point does not belong to a district, assign it to district A. This way unused grid oints on the map are quickly taken up If a neighboring grid point does belong to a district, say district B, calculate the total population of districts A and B, as well as the potential change in total number of interior grid points if the neighboring grid point were to be re-assigned to district B. Calculate a probability for re-assignment based on these two numbers, and "Hip a coin'based on this probability to determine whether or not the neighboring grid point gets re-assigned to district A See Appendix 1 for the actual probability formula used After the above steps have been performed for each grid point, repeat the whole loop again and again, until(hopefully) the population numbers stabilize. In reali our discretization prevents the populations of the regions from becoming exactly equal, so we must introduce a cut-off. In our algorithm, this is done as follows: when calculating a potential re-assignment of a grid point(r, y) from district A to district B, look at the relative population difference A PaB between regions A and B, given by APAB-(PA+PB, where PA, PB are the current populations of districts A and B. If this number is below 0.01, our probability formula gives zero probability for re-assignment. This way, theoretically, our simulation should stop hen the relative population differences of neighboring districts is smaller than 2% Recall that our goal for range variance was 0.7%, so we are essentially giving up this goal. This was a result of recognizing that our discretization was simply not fine
Team 1035 5 Figure 2. Source for population density figures, New York City district consists of 1 grid point. The rest of the map is not initially assigned a district. Now we loop through all grid points on the map, performing the following tasks for each point: • If the grid point does not belong to a district, go on to the next grid point. • If it does belong to a district, say district A, consider each of its neighboring grid points to the west, east, north, and south. If a neighboring grid point does not belong to a district, assign it to district A. This way unused grid points on the map are quickly taken up. • If a neighboring grid point does belong to a district, say district B, calculate the total population of districts A and B, as well as the potential change in total number of interior grid points if the neighboring grid point were to be re-assigned to district B. Calculate a probability for re-assignment based on these two numbers, and ‘flip a coin’ based on this probability to determine whether or not the neighboring grid point gets re-assigned to district A. See Appendix 1 for the actual probability formula used. After the above steps have been performed for each grid point, repeat the whole loop again and again, until (hopefully) the population numbers stabilize. In reality, our discretization prevents the populations of the regions from becoming exactly equal, so we must introduce a cut-off. In our algorithm, this is done as follows: when calculating a potential re-assignment of a grid point (x, y) from district A to district B, look at the relative population difference ∆PAB between regions A and B, given by ∆PAB = PA−PB (PA+PB) , where PA, PB are the current populations of districts A and B. If this number is below 0.01, our probability formula gives zero probability for re-assignment. This way, theoretically, our simulation should stop when the relative population differences of neighboring districts is smaller than 2%. Recall that our goal for range variance was 0.7%, so we are essentially giving up on this goal. This was a result of recognizing that our discretization was simply not fine 5