# **Gerrymandering HW**

**Instructions**


Implement a dynamic programming solution to the Gerrymandering Problem as defined in class and in the accompanying presentation. Test your code on the synthetic and real data sets indicated in the exercises below. Include the names and UVA IDs of all persons in your group.

---

A special thanks to Robbie Hott, Alexander DeLuca, Kelly Farrell, Samy Kebaish, Grant Redfield, Matthew Sachs, and Anita Taucher.

### Storage
SQLite is used for data storage and retrieval. Here, we establish a connection to the database and define a cursor to be used throughout.

```python
import sqlite3  # https://docs.python.org/3/library/sqlite3.html
import pandas as pd
import matplotlib.pyplot as plt
import scipy.stats as stats
import math
import numpy as np

## Establish a connection to our database
conn = sqlite3.connect('gerrymander.db')

## Create a cursor to execute commands through the connection
cursor = conn.cursor()
```

```python
## When recreate is True, drop all database tables (and the for_algo view)
## so they can be rebuilt for an updated, clean deployment.
recreate = True

if recreate:
    cursor.execute("DROP TABLE IF EXISTS precinct")
    cursor.execute("DROP TABLE IF EXISTS party")
    cursor.execute("DROP VIEW IF EXISTS for_algo")
    conn.commit()

    # Quick verification that everything was dropped: precinct and party
    # should no longer appear in this list
    cursor.execute("SELECT name FROM sqlite_master WHERE type='table'")
    print(cursor.fetchall())
```


### Data and scripts on GitHub
The scripts for building the database, including the data and schema, are in a GitHub repository. The urllib3 library is used to communicate over HTTPS.

```python
## SQL scripts are in GitHub
## Prepare to read from GitHub
import urllib3
urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)
gitread = urllib3.PoolManager()
```


# 1) Provide an Introduction (10 pts)


Provide a Formal Problem Statement, define all variables needed, and *state all assumptions.* 

*Problem Statement: Gerrymandering*

The purpose of this problem is to study Dynamic Programming by considering a Gerrymandering case study. Gerrymandering is the manipulation of electoral district boundaries (redistricting) to benefit one or more political parties over the others. A "Gerrymander" was depicted in an 1812 political cartoon after Governor Elbridge Gerry signed a bill that redistricted Massachusetts to benefit the Democratic-Republican Party. Virginia's fifth district looks like the original Gerrymander.

Gerrymandering in the United States may be unconstitutional. According to [Politico](https://www.politico.com/story/2017/10/03/supreme-court-gerrymandering-wisconsin-arguments-243401), in 2017 Supreme Court Associate Justice Anthony Kennedy believed that extreme partisan Gerrymandering might violate the Constitution. On 09/03/2019, in Common Cause v. Lewis, the Wake County Superior Court ruled that a state legislative map violated the North Carolina Constitution. In hearings after Bethune-Hill v. Virginia State Board of Elections was remanded on 03/01/2017, the United States District Court for the Eastern District of Virginia held that the 2011 redistricting involved unconstitutional racial Gerrymandering. On 06/27/2019, in Rucho v. Common Cause, the Supreme Court of the United States ruled that federal courts cannot review allegations of partisan Gerrymandering.

Gerrymandering works by a political party maximizing the number of districts in a state in which a majority of voters favor that party. Districts in a state have roughly the same number of voters. States and districts are composed of precincts, and we assume all precincts have the same number of voters.

We can conduct Gerrymandering as follows. Consider a set of precincts $P = \{p_1, p_2, ..., p_n\}$ representing all voters in a state. Let each precinct contain $m$ voters, so the state has $s = mn$ voters. Define a district $D$ as a proper subset of $P$. Determine $d$ districts $D_1, D_2, ..., D_d$ that together contain every precinct exactly once (so they represent all voters in the state) and meet the following criteria:

* $d = 2$.
* The number of precincts in $D_1$ is equal to the number of precincts in $D_2$, so $n$ is even. By extension, the number of voters in $D_1$ is equal to the number of voters in $D_2$, namely $\frac{s}{2}$.
* Let $R(D_i)$ be the number of voters in $D_i$ who favor party $R$. Party $R$ holds a strict majority in every district: $R(D_1) > \frac{s}{2d}$ and $R(D_2) > \frac{s}{2d}$, where $\frac{s}{2d} = \frac{s}{4}$ is half the voters of one district. It follows that $R(D_1) + R(D_2) > \frac{s}{2}$, although an overall majority alone is not enough.

If such districts cannot be determined, report that Gerrymandering is not possible.
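
To make these criteria concrete, here is a minimal sketch that checks one proposed assignment. It assumes each precinct is described only by its count of party-$R$ voters (out of $m$) and that $D_1$ is given as a set of 0-based precinct indices; the function name `is_valid_split` and the toy vote counts are hypothetical.

```python
def is_valid_split(rep_votes, m, d1):
    """Return True if putting the precincts indexed by d1 in D_1 (and all the
    others in D_2) satisfies the criteria above for party R.

    rep_votes -- R-voter count for each precinct (each precinct has m voters)
    m         -- voters per precinct
    d1        -- iterable of 0-based precinct indices assigned to D_1
    """
    n = len(rep_votes)
    d1 = set(d1)
    if not d1 <= set(range(n)):
        raise ValueError("d1 contains an index that is not a precinct")
    d2 = set(range(n)) - d1
    if len(d1) != len(d2):          # both districts need the same number of precincts
        return False
    s = m * n                       # voters in the state
    threshold = s / (2 * 2)         # s / (2d) with d = 2: half of one district
    r1 = sum(rep_votes[i] for i in d1)
    r2 = sum(rep_votes[i] for i in d2)
    return r1 > threshold and r2 > threshold   # strict majority in both districts


# Toy example: four precincts of 100 voters each, so each district has 200 voters.
print(is_valid_split([70, 40, 55, 52], 100, {0, 1}))   # True: 110 and 107 both exceed 100
print(is_valid_split([70, 40, 55, 52], 100, {0, 2}))   # False: 125 and 92
```

Because the comparison is strict, a district with exactly 100 of its 200 voters favoring $R$ does not count as a win for $R$.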

# 2) Dynamic Programming Solution. (20 pts)

Formally define the solution and state the recurrence used. Identify how it employs dynamic programming and clearly explain and justify. 

Your solution must 1) determine whether Gerrymandering is possible and, if it is, 2) provide the associated precinct reassignment. Be clear and explain how.

Dynamic Programming is a way of solving complex problems by dividing them into overlapping sub-problems and then combining the solutions of those sub-problems into an overall "optimal" solution. The results of sub-problems are memoized, i.e., stored, so that the same sub-problem is never worked on twice. In this way, Dynamic Programming solves each sub-problem only once.

Dynamic Programming requires "optimal substructure": a solution to a larger problem must be buildable from the solutions to its smaller sub-problems.
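
For this problem, one way to expose that substructure (a sketch under the assumptions of Section 1, not necessarily the formulation used in class) is to process the precincts one at a time. Let $r_j$ be the number of party-$R$ voters in precinct $p_j$, and let $A(j, k, x)$ be true when exactly $k$ of the first $j$ precincts can be assigned to $D_1$ so that $R(D_1) = x$. Any such assignment either leaves $p_j$ out of $D_1$ or puts it in, and in both cases what remains is a valid assignment of the first $j-1$ precincts:

$$
A(j, k, x) = A(j-1, k, x) \;\lor\; \big(k \ge 1 \,\land\, x \ge r_j \,\land\, A(j-1, k-1, x - r_j)\big),
\qquad
A(0, k, x) = \begin{cases} \text{true} & k = 0 \text{ and } x = 0 \\ \text{false} & \text{otherwise.} \end{cases}
$$

Because $D_2$ receives every precinct not in $D_1$, $R(D_2) = \sum_{i=1}^{n} r_i - x$ needs no index of its own. Gerrymandering is then possible exactly when some $x$ has $A(n, \tfrac{n}{2}, x)$ true with $x > \tfrac{s}{4}$ and $\sum_{i=1}^{n} r_i - x > \tfrac{s}{4}$.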

To apply Dynamic Programming, we identify a recursive structure in our problem and select a good order for solving the subproblems. We may solve them "top down", i.e., recursively, memoizing results as we go, or "bottom up", i.e., iteratively from the smallest subproblem to the largest. Either way, we save the solution to each subproblem in memory.
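
The top-down option can be sketched for this problem with Python's `functools.lru_cache` as the memo. In this hypothetical sketch, `A(j, k, x)` asks whether exactly `k` of the first `j` precincts can go to $D_1$ with `x` party-$R$ voters in it; each precinct either joins $D_1$ or is left for $D_2$. It assumes nonnegative integer vote counts and `m` voters per precinct.

```python
from functools import lru_cache


def can_gerrymander_top_down(rep_votes, m):
    """Top-down (memoized) sketch: True if some equal split of the precincts
    gives party R a strict majority in both districts.

    rep_votes -- R-voter count per precinct (each precinct has m voters)
    m         -- voters per precinct
    """
    n = len(rep_votes)
    if n == 0 or n % 2:                  # two equal districts need an even n
        return False
    total_r = sum(rep_votes)
    threshold = m * n / 4                # s / (2d) with d = 2

    @lru_cache(maxsize=None)             # memo: each (j, k, x) is solved once
    def A(j, k, x):
        """Can exactly k of the first j precincts form D_1 with x R voters?"""
        if j == 0:
            return k == 0 and x == 0     # base case: no precincts considered yet
        r = rep_votes[j - 1]
        # Precinct j is left for D_2, or it joins D_1 (if k and x allow it).
        return A(j - 1, k, x) or (k > 0 and x >= r and A(j - 1, k - 1, x - r))

    return any(A(n, n // 2, x)
               for x in range(total_r + 1)
               if x > threshold and total_r - x > threshold)


print(can_gerrymander_top_down([70, 40, 55, 52], 100))  # True
print(can_gerrymander_top_down([90, 90, 10, 10], 100))  # False
```

The recursion depth grows with the number of precincts, so for large inputs the bottom-up (iterative) order avoids Python's recursion limit.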

# 3) Implement your Gerrymandering Algorithm (code) (40 pts)

Provide ample comments and justify each line of code. You may wish to use or implement a sparse matrix (or something similar) to store the "memos". 

```python
def isGerrymanderPossible(df):
    '''
    Determines if gerrymandering is possible given a dataframe that contains REP votes
    and total votes for precincts in two neighboring districts.
    It returns True or False, and if True, prints out the precinct split and voter split.

    Provide more details about your implementation here ...
    '''

    # <insert code here>
```
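
As a possible starting point for the stub, here is a minimal bottom-up sketch with reconstruction; it is not the required implementation, and the name `gerrymander_split` is hypothetical. It works on a plain list of party-$R$ vote counts (one per precinct, each precinct having `m` voters) and stores, for each number of precincts considered `j` and number placed in $D_1$ `k`, the set of reachable $R(D_1)$ totals. Keeping sets instead of a dense 3-D boolean array is one way to follow the sparse-memo suggestion above. If a valid total exists, walking back through the table recovers which precincts go to $D_1$.

```python
def gerrymander_split(rep_votes, m):
    """Bottom-up DP sketch. Returns (d1, d2), lists of 0-based precinct indices,
    if some equal split gives party R a strict majority in both districts;
    otherwise returns None.

    rep_votes -- R-voter count per precinct (each precinct has m voters)
    m         -- voters per precinct
    """
    n = len(rep_votes)
    if n == 0 or n % 2:                       # two equal districts need an even n
        return None
    half = n // 2
    threshold = m * n / 4                     # s / (2d) with d = 2
    total_r = sum(rep_votes)

    # reach[j][k]: the set of R(D_1) totals reachable when exactly k of the first
    # j precincts are in D_1 (a sparse stand-in for a 3-D boolean table).
    reach = [[set() for _ in range(half + 1)] for _ in range(n + 1)]
    reach[0][0].add(0)                        # base case: empty D_1 has 0 R voters

    for j in range(1, n + 1):
        r = rep_votes[j - 1]
        for k in range(min(j, half) + 1):
            reach[j][k] |= reach[j - 1][k]    # precinct j is left for D_2
            if k > 0:                         # ... or precinct j joins D_1
                reach[j][k] |= {x + r for x in reach[j - 1][k - 1]}

    # Pick the smallest R(D_1) total that leaves both districts with an R majority.
    target = next((x for x in sorted(reach[n][half])
                   if x > threshold and total_r - x > threshold), None)
    if target is None:
        return None

    # Walk back through the table to recover the assignment.
    d1, k, x = [], half, target
    for j in range(n, 0, -1):
        if x in reach[j - 1][k]:
            continue                          # reachable without precinct j: D_2
        d1.append(j - 1)                      # otherwise precinct j must be in D_1
        k, x = k - 1, x - rep_votes[j - 1]
    d1.reverse()
    in_d1 = set(d1)
    d2 = [i for i in range(n) if i not in in_d1]
    return d1, d2


print(gerrymander_split([70, 40, 55, 52], 100))   # ([2, 3], [0, 1])
```

To connect it to the template's dataframes, one option (an assumption about how the stub might be wired up) is `rows = df[df["Total_Votes"] > 0]`, which drops the all-zero dummy row, followed by `gerrymander_split(rows["REP_VOTES"].tolist(), 100)`; the returned positions then index into `rows["PRECINCT"]`.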



# 4) Algorithmic Analysis (10 pts)
Provide a time complexity analysis of your algorithms in terms of the size and/or parameters of the input. Be clear and precise. Provide comprehensive justification and state all assumptions.
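
As an illustration of the kind of counting argument expected, assume a table indexed by the number of precincts considered $j$, the number placed in $D_1$ $k$, and the party-$R$ voters in $D_1$ $x$ (one possible design, not the required one). Then $j$ ranges over $0..n$, $k$ over $0..\tfrac{n}{2}$, and $x$ over $0..mn$, and each entry is an OR of at most two entries from the previous $j$-layer, which is $O(1)$ work:

$$
T(n, m) = \underbrace{(n+1)}_{j}\;\underbrace{\left(\tfrac{n}{2}+1\right)}_{k}\;\underbrace{(mn+1)}_{x}\cdot O(1) = O(n^3 m)
$$

Scanning the final layer for a valid $x$ adds only $O(mn)$. The bound is pseudo-polynomial: it is polynomial in the value $m$, not in the number of bits needed to write $m$ down. Storing the full table for reconstruction also takes $O(n^3 m)$ space; if only the yes/no answer is needed, keeping two consecutive $j$-layers reduces this to $O(n^2 m)$.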



# 5) Test your algorithm (5 pts)

Run your algorithm on the example data set below. Is gerrymandering possible?
Create two other synthetic data sets (dataframes like the one below): one where gerrymandering is possible and one where it is not. Confirm your hypothesis using your implementation.
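
One way to produce the two synthetic cases is to draw small random instances and classify each by exhaustive search over all equal splits, which is cheap for a handful of precincts and also gives an independent check on a DP implementation. This sketch assumes 4 precincts of `m = 100` voters and a fixed seed; `brute_force_possible` and `find_synthetic_cases` are hypothetical helpers. The vote lists it prints can be turned into dataframes shaped like `precinct_data` below (with `Total_Votes` set to 100).

```python
import itertools
import random


def brute_force_possible(rep_votes, m):
    """Try every equal split; True if party R wins a strict majority in both."""
    n = len(rep_votes)
    if n == 0 or n % 2:
        return False
    threshold = m * n / 4                          # s / (2d) with d = 2
    total_r = sum(rep_votes)
    for d1 in itertools.combinations(range(n), n // 2):
        r1 = sum(rep_votes[i] for i in d1)
        if r1 > threshold and total_r - r1 > threshold:
            return True
    return False


def find_synthetic_cases(n=4, m=100, seed=0, max_tries=10_000):
    """Draw random per-precinct R-voter counts until one gerrymanderable and one
    non-gerrymanderable instance have been found (None if not found in time)."""
    rng = random.Random(seed)                      # fixed seed: reproducible draws
    possible = impossible = None
    for _ in range(max_tries):
        votes = [rng.randint(0, m) for _ in range(n)]
        if brute_force_possible(votes, m):
            if possible is None:
                possible = votes
        elif impossible is None:
            impossible = votes
        if possible is not None and impossible is not None:
            break
    return possible, impossible


possible, impossible = find_synthetic_cases()
print("gerrymanderable:    ", possible)
print("not gerrymanderable:", impossible)
```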

```python
## Example data set: four precincts of 100 voters each in two districts.
## Row 0 is a placeholder ("DUMMY ROW") with all-zero counts.
precinct_data = pd.DataFrame([
    {"PRECINCT": "DUMMY ROW", "District": 0, "REP_VOTES": 0,  "DEM_VOTES": 0,  "Total_Votes": 0},
    {"PRECINCT": "92",        "District": 1, "REP_VOTES": 65, "DEM_VOTES": 35, "Total_Votes": 100},
    {"PRECINCT": "93",        "District": 1, "REP_VOTES": 60, "DEM_VOTES": 40, "Total_Votes": 100},
    {"PRECINCT": "94",        "District": 2, "REP_VOTES": 45, "DEM_VOTES": 55, "Total_Votes": 100},
    {"PRECINCT": "95",        "District": 2, "REP_VOTES": 47, "DEM_VOTES": 53, "Total_Votes": 100},
])  # built in one call, so the index is already 0..4

LetsRun = isGerrymanderPossible(precinct_data)

if LetsRun:
    print("GerryMandering is possible")
else:
    print("GerryMandering is not possible")

# <insert code here; your output should confirm the result>
```


GerryMandering is not possible


# 6) Real-world Data Trials (15 pts) 





Voter data from 5 states are available here: Alaska, Arizona, Kentucky, North Carolina, and Rhode Island. For this question, you are asked to analyze the Arizona and Kentucky data.

Note: In the example below, the data is "preprocessed" to match our assumptions and downsized for reasonable experimental runtimes.

### Notes about the tables

The CREATE statements are stored in scripts on GitHub, including tables.sql.

There are two tables in the schema:

* Precinct: Holds all data for precincts, districts, and number of voter registrations by party. There is a row for every party in each precinct, so precinct is not a unique key. Additionally, precinct is not unique within a state; it must be used together with district.

* Party: An id and a party name, used to keep party data consistent within our database. Party names and abbreviations change between states, but here we want them to be consistent. Party can be joined with precinct on precinct.party = party.id.


```python
## Build the table structure
## We have two tables: party and precinct

## The GitHub URL for the tables script
create_tables = 'https://raw.githubusercontent.com/boltonvandy/gerrymander/main/State_Data/tables.sql'

## GET contents of the tables.sql script from GitHub
dat = gitread.request("GET", create_tables)

## Execute the table creation commands
cursor.executescript(dat.data.decode("utf-8"))

## Preprocess for the algorithm: pair each precinct's REP voter count with its
## two-party (REP + DEM) total
view_def = '''
CREATE VIEW for_algo AS
SELECT * FROM
((SELECT STATE, PRECINCT, DISTRICT, VOTERS as REP_VOTES
FROM precinct WHERE PARTY = 'REP') NATURAL JOIN (
SELECT STATE, PRECINCT, DISTRICT, SUM(VOTERS) as Total_Votes
FROM precinct
WHERE (PARTY = 'REP' OR PARTY = 'DEM')
GROUP BY STATE, PRECINCT, DISTRICT))
'''

cursor.execute(view_def)

## Commit schema changes
conn.commit()

## Confirm the names of the tables we built (views such as for_algo are not listed)
ourtables = cursor.execute("SELECT name FROM sqlite_master WHERE type='table'")

if ourtables:
    print('\nTables in the Gerrymander Database\n')
    for atable in ourtables:
        print("\t" + atable[0])
```



Tables in the Gerrymander Database

	precinct
	party


## Example usage: Arizona

Here, the data from Arizona is loaded into the database.

[Original Arizona Data on Kaggle](https://www.kaggle.com/arizonaSecofState/arizona-voter-registration-by-precinct)

```python
## Arizona
cursor.execute("DELETE FROM precinct WHERE STATE = 'AZ'")
conn.commit()

az_url = 'https://raw.githubusercontent.com/boltonvandy/gerrymander/main/State_Data/az/az.insert.sql'

## GET contents of the script from a GitHub URL
dat = gitread.request("GET", az_url)

## INSERT data using statements from the GitHub insert script
cursor.executescript(dat.data.decode("utf-8"))
conn.commit()

## Quick verification that data was loaded: total rows in precinct (all states
## loaded so far), then Arizona voter totals by party
cursor.execute("SELECT count(*) from precinct")
verify = cursor.fetchone()[0]

cursor.execute("SELECT sum(voters), party from precinct where state = 'AZ' group by party order by 1 DESC")
print(verify, cursor.fetchall())
```

7270 [(1308384, 'REP'), (1251984, 'OTH'), (1169259, 'DEM'), (32096, 'LBT'), (6535, 'GRN')]


## 6a) Arizona Districts 1, 2, & 3 (5 out of 15 pts)

In this example, assume Districts 1/2 and 2/3 are neighboring and that precincts can be reassigned between them. Confirm (both using your code and manually) that Gerrymandering is possible between districts 2 & 3, but not between districts 1 & 2 (given the preprocessing steps, assumptions, and downsampling done below). For districts 2 & 3, what is the precinct breakdown? Your answer should be shown as code output.


```python
# To inspect the raw data see here: https://github.com/boltonvandy/gerrymander/tree/main/State_Data

# Using the first 4 precincts returned for each district
# Districts 1 and 2 are not gerrymanderable
# Districts 2 and 3 are gerrymanderable
# Feel free to use the following preprocessing steps
#   and downsampling scheme for all experimental trials.
# Here we assume only 2 parties (Rep and Dem), all voters vote along party lines, and data is
#   rescaled to 100 total voters per precinct.

# First query the database by district and state, take the first 4
#   precincts, and append both districts into one dataframe

sql = '''
SELECT * from for_algo where state = 'AZ' AND ( DISTRICT = 2)
'''
Arizona_di = pd.read_sql_query(sql, conn)
Arizona_di = Arizona_di.head(4)

sql = '''
SELECT * from for_algo where state = 'AZ' AND ( DISTRICT = 3)
'''
Arizona_dj = pd.read_sql_query(sql, conn)
Arizona_dj = Arizona_dj.head(4)

# Stack the two districts into one dataframe with a fresh 0..7 index
Arizona = pd.concat([Arizona_di, Arizona_dj], ignore_index=True)

# Rescale data to match our assumptions (for these trials):
# REP share of the two-party vote, times 100, rounded up
Arizona["REP_VOTES"] = np.ceil(Arizona["REP_VOTES"] / Arizona["Total_Votes"] * 100).astype(int)
Arizona["Total_Votes"] = 100

#Arizona.sort_values(by=['REP_VOTES'], ascending=False ,inplace=True)

print(Arizona)

if isGerrymanderPossible(Arizona):
    print("GerryMandering Possible In Arizona District")
else:
    print("GerryMandering Not Possible In Arizona District")
```

```
  STATE PRECINCT DISTRICT  REP_VOTES  Total_Votes
0    AZ   CH0001        2         65          100
1    AZ   CH0002        2         75          100
2    AZ   CH0003        2         63          100
3    AZ   CH0004        2         18          100
4    AZ   MC0016        3         36          100
5    AZ   MC0029        3         76          100
6    AZ   MC0037        3         26          100
7    AZ   MC0062        3         53          100
GerryMandering Not Possible In Arizona District
```
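
For the manual confirmation, the eight rescaled `REP_VOTES` values printed above are few enough to check every 4-and-4 split ($\binom{8}{4} = 70$ labeled splits) exhaustively. This cross-check, not a dynamic-programming solution, hard-codes the printed values and assumes 100 voters per precinct, as in the rescaling step. With these numbers at least one split works: for example, CH0002, CH0003, CH0004, and MC0062 hold 75 + 63 + 18 + 53 = 209 Republican voters and the other four hold 65 + 36 + 76 + 26 = 203, both above the 200 needed for a majority in a 400-voter district.

```python
import itertools

# Rescaled values copied from the printed Arizona dataframe (districts 2 and 3).
precincts = ["CH0001", "CH0002", "CH0003", "CH0004",
             "MC0016", "MC0029", "MC0037", "MC0062"]
rep_votes = [65, 75, 63, 18, 36, 76, 26, 53]
m = 100                                   # voters per precinct after rescaling
n = len(rep_votes)
threshold = m * n / 4                     # 200.0: half of a 400-voter district
total_r = sum(rep_votes)

valid_splits = []
for d1 in itertools.combinations(range(n), n // 2):
    r1 = sum(rep_votes[i] for i in d1)
    r2 = total_r - r1                     # D_2 gets every other precinct
    if r1 > threshold and r2 > threshold:
        d2 = [i for i in range(n) if i not in d1]
        valid_splits.append(([precincts[i] for i in d1], r1,
                             [precincts[i] for i in d2], r2))

# Each partition appears twice (once with each district labeled D_1).
print(f"{len(valid_splits)} of 70 labeled splits are valid")
for names1, r1, names2, r2 in valid_splits:
    print(names1, r1, "|", names2, r2)
```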


## 6b) Kentucky Districts (10 out of 15 pts)

In this example, find a pair of districts that is gerrymanderable and a pair that is not. Perform preprocessing steps similar to those used for the Arizona data set, e.g., select 4 precincts per district, downsample, and rescale. Confirm both district pairs using your code and manually. For the gerrymanderable pair, what is the precinct breakdown? Your answer should be shown as code output.
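
Since the Kentucky districts are stored as strings, one way to reuse the Arizona preprocessing is to bind the district as a query parameter instead of pasting it into the SQL text. The helper below is a hypothetical sketch: it assumes the `for_algo` view defined earlier (columns `STATE`, `PRECINCT`, `DISTRICT`, `REP_VOTES`, `Total_Votes`), keeps the first `k` rows per district as `head(4)` does (SQLite guarantees no particular row order without `ORDER BY`), and applies the same round-up rescale to `m` voters per precinct.

```python
import math
import sqlite3


def load_district_pair(conn: sqlite3.Connection, state, district_a, district_b, k=4, m=100):
    """Fetch the first k precincts of each district from the for_algo view and
    rescale them to m voters per precinct, as in the Arizona trial.

    Returns a list of (PRECINCT, DISTRICT, REP_VOTES, Total_Votes) tuples.
    """
    rows = []
    for district in (district_a, district_b):
        # Bound parameters avoid quoting mistakes when DISTRICT holds text values.
        cur = conn.execute(
            "SELECT PRECINCT, DISTRICT, REP_VOTES, Total_Votes FROM for_algo "
            "WHERE STATE = ? AND DISTRICT = ? LIMIT ?",
            (state, district, k),
        )
        for precinct, dist, rep, total in cur.fetchall():
            if total == 0:
                continue  # no two-party voters, so the REP share is undefined
            # REP share of the two-party vote, scaled to m voters and rounded up
            rows.append((precinct, dist, math.ceil(rep / total * m), m))
    return rows
```

Wrapping the result as `pd.DataFrame(rows, columns=["PRECINCT", "DISTRICT", "REP_VOTES", "Total_Votes"])` gives a dataframe shaped like the Arizona one; which Kentucky district strings to pass is left to the exercise.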


```python
## Kentucky!
# NOTE: the Kentucky districts are stored as strings. Be sure to build your query correctly :)
# See here: https://github.com/boltonvandy/gerrymander/tree/main/State_Data

cursor.execute("DELETE FROM precinct WHERE STATE = 'KY'")
conn.commit()

ky_url = 'https://raw.githubusercontent.com/boltonvandy/gerrymander/main/State_Data/ky/ky.insert.sql'

## GET contents of the script from a GitHub URL
dat = gitread.request("GET", ky_url)

## INSERT data using statements from the GitHub insert script
cursor.executescript(dat.data.decode("utf-8"))
conn.commit()

## Quick verification that data was loaded: total rows in precinct (all states
## loaded so far), then Kentucky voter totals by party
cursor.execute("SELECT count(*) from precinct")
verify = cursor.fetchone()[0]

cursor.execute("SELECT sum(voters), party from precinct where state = 'KY' group by party order by 1 DESC")
print(verify, cursor.fetchall())
```

40498 [(1649790, 'DEM'), (1576259, 'REP'), (184839, 'OTH'), (131242, 'IND'), (14326, 'LBT'), (2014, 'GRN'), (1012, 'CONST'), (322, 'SOCWK'), (157, 'REFORM')]


```python
# Kentucky

# <Insert code answer here>
```