# 118th Congress Data

This notebook loads, cleans, and examines data from the 2022 redistricting process, so that the new maps can be compared with those of previous years and assessed for fairness.

It feeds into a larger project about fairness in redistricting. For instance, if a Democrat cannot realistically win an R+15 district, then that district can be classified as "safe" and bucketed with R+30 districts. When data from previous years is projected onto these maps, the result should give a picture of how fair they are relative to their predecessors.
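
The bucketing idea can be sketched as a small helper. The 15-point cutoff is taken from the R+15 example above; the function name `bucket` and the labels it returns are assumptions for illustration, not part of the project's code.

```python
def bucket(pvi_label: str, safe_cutoff: int = 15) -> str:
    """Collapse a PVI label such as 'R+30' into a coarse category.

    safe_cutoff is an assumed threshold based on the R+15 example;
    the returned label names are hypothetical.
    """
    party, margin = pvi_label.split("+")
    if int(margin) >= safe_cutoff:
        return f"safe {party}"
    return "not safe"


labels = ["R+15", "R+30", "D+29", "R+4"]
buckets = [bucket(label) for label in labels]
print(buckets)  # ['safe R', 'safe R', 'safe D', 'not safe']
```

With this grouping, R+15 and R+30 land in the same bucket, which is the comparison the larger project needs.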

## Upload data from 538

538 has generously provided a tracker for redistricting, along with a dataset containing a list of ALL proposed maps. The current version of this file was sourced on 2.11.2022, after the Alabama Congressional map was permitted by the Supreme Court. At that time, Florida, Louisiana, Minnesota, Missouri, New Hampshire, North Carolina, Ohio, Pennsylvania, Rhode Island, and Wisconsin did not have approved maps. 

Source Link: https://projects.fivethirtyeight.com/redistricting-2022-maps/

Updated 2.24.2022, after Rhode Island, Minnesota, North Carolina, and Pennsylvania all passed their new maps. Florida, Louisiana, Missouri, New Hampshire, and Ohio still do not have approved maps.

North Carolina's data was delayed in the pull.


In [1]:
import requests
import pandas as pd
pd.set_option("display.max_rows", None, "display.max_columns", None)

In [2]:
#read_538 = pd.read_csv("/Users/xavier/Desktop/DSPP/solo_projects/redistricting_project/raw_data/redistricting_data_2021.csv")
#read_538 = pd.read_csv("/Users/xavier/Desktop/DSPP/solo_projects/redistricting_project/raw_data/redistricting_data_2021_2.csv")
#read_538 = pd.read_csv("/Users/xavier/Desktop/DSPP/solo_projects/redistricting_project/raw_data/redistricting_data_2021_3.csv")
read_538 = pd.read_csv("/Users/xavier/Desktop/DSPP/solo_projects/redistricting_project/raw_data/redistricting_data_2021_4.csv")

## Clean and Organize the District Data

In [3]:
import numpy as np
import plotnine as p9
from plotnine import ggplot, aes, facet_grid, labs, geom_point, geom_smooth
from sklearn.linear_model import LinearRegression as lm
import warnings
warnings.filterwarnings('ignore')

In [4]:
#check the data
pvi_118 = read_538
print(pvi_118.head(n=8))
print(pvi_118.shape)

  state       map   district          metric      value  map_approved
0    AK       117         01     competitive   0.000000         False
1    AK       117         AL             pvi -14.620280         False
2    AK       117  statewide  efficiency_gap -39.476448         False
3    AK       117  statewide          median   0.000000         False
4    AK  approved         AL             pvi -14.620280         False
5    AL       117         01     competitive   0.000000         False
6    AL       117         01             pvi -31.938510         False
7    AL       117         02     competitive   0.000000         False
(25973, 6)


In [5]:
#eliminate unapproved and dated maps
pvi_118 = pvi_118[pvi_118["map_approved"] == True]
#bring back previously discarded maps, relabeled so they stay distinct from the current ones
XOH = read_538[read_538["map"] == "senate_gop_proposal_2"].assign(state="XOH")
XNC = read_538[read_538["map"] == "cst_13"].assign(state="XNC")
pvi_118 = pd.concat([pvi_118,XNC,XOH])
#import New OH and WI Maps not yet uploaded
WI = read_538[read_538["map"] == "governor_least_change"]
OH = read_538[read_538["map"] == "revised_republican_proposal"]
pvi_118 = pd.concat([pvi_118,WI,OH])
#limit to only the pvi rows (it includes several other types of data per proposed district)
pvi_118 = pvi_118[pvi_118["metric"] == "pvi"]
print(pvi_118.shape)

(414, 6)
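
The filter, relabel, and concatenate pattern in the cell above can be tried on a toy frame. This is a minimal sketch: the rows and the map names `plan_a` and `plan_b` are made up, and only the column layout mirrors the 538 file. Using `.assign` on the filtered rows returns a new frame, so the source data is left untouched.

```python
import pandas as pd

# Toy stand-in for the long-format 538 file (all values are invented)
toy = pd.DataFrame({
    "state": ["OH", "OH", "NC", "NC"],
    "map": ["plan_a", "plan_b", "plan_a", "plan_b"],
    "district": ["01", "01", "01", "01"],
    "metric": ["pvi", "pvi", "pvi", "competitive"],
    "value": [-3.3, 5.0, -10.0, 0.0],
    "map_approved": [False, True, False, True],
})

# Keep approved maps only
approved = toy[toy["map_approved"]]

# Re-add one discarded plan under a distinct state code
old_oh = toy[(toy["state"] == "OH") & (toy["map"] == "plan_a")].assign(state="XOH")

# Combine, then keep only the pvi rows
combined = pd.concat([approved, old_oh])
pvi_only = combined[combined["metric"] == "pvi"]
print(pvi_only[["state", "map", "value"]])
```

The result holds one approved OH row and one relabeled XOH row, while `toy` still lists only OH and NC.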


In [6]:
#remove zeroes from the district number to match formats
pvi_118["district"] = pvi_118['district'].str.lstrip("0")
pvi_118.head(n=8)

Unnamed: 0,state,map,district,metric,value,map_approved
22,AL,committee_proposal_1,1,pvi,-31.7838,True
29,AL,committee_proposal_1,2,pvi,-33.584348,True
36,AL,committee_proposal_1,3,pvi,-39.114921,True
43,AL,committee_proposal_1,4,pvi,-64.918896,True
50,AL,committee_proposal_1,5,pvi,-32.14665,True
57,AL,committee_proposal_1,6,pvi,-35.985448,True
64,AL,committee_proposal_1,7,pvi,28.640375,True
557,AR,sb743,1,pvi,-43.697592,True


In [7]:
#create an ST column
pvi_118 = pvi_118.rename(columns={"state": "ST"})
#create the district code variable
pvi_118["ST#"] = pvi_118["ST"] + pvi_118["district"]

In [8]:
#possibly unnecessary
#pull out district lean: positive values lean D, zero or negative values lean R
pvi_118["lean"] = np.where(pvi_118["value"] > 0, "D", "R")
pvi_118["lean"].unique()

array(['R', 'D'], dtype=object)

In [9]:
#create a standard PVI column and a rounded PVI value column
#cast to int instead of using rstrip(".0"), which would also strip the zero from values like 30
pvi_118["pvi_value"] = pvi_118["value"].abs().round().astype(int).astype(str)
pvi_118["PVI"] = pvi_118.lean + "+" + pvi_118.pvi_value
pvi_118.head(7)

Unnamed: 0,ST,map,district,metric,value,map_approved,ST#,lean,pvi_value,PVI
22,AL,committee_proposal_1,1,pvi,-31.7838,True,AL1,R,32,R+32
29,AL,committee_proposal_1,2,pvi,-33.584348,True,AL2,R,34,R+34
36,AL,committee_proposal_1,3,pvi,-39.114921,True,AL3,R,39,R+39
43,AL,committee_proposal_1,4,pvi,-64.918896,True,AL4,R,65,R+65
50,AL,committee_proposal_1,5,pvi,-32.14665,True,AL5,R,32,R+32
57,AL,committee_proposal_1,6,pvi,-35.985448,True,AL6,R,36,R+36
64,AL,committee_proposal_1,7,pvi,28.640375,True,AL7,D,29,D+29
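
A note on building labels from rounded floats: `str.rstrip` removes any trailing characters that appear in its argument, not a literal suffix, so stripping `".0"` from `"30.0"` also removes the zero of 30. The sketch below uses plain Python and made-up values to compare that approach with an integer cast.

```python
# Invented margins chosen so that some of them round to a multiple of ten
values = [31.7838, 30.2, 9.6, 0.3]

# rstrip(".0") strips every trailing "." or "0", not just the ".0" suffix
stripped = [str(round(abs(v), 0)).rstrip(".0") for v in values]

# Rounding to an int first avoids the problem
as_int = [str(round(abs(v))) for v in values]

print(stripped)  # ['32', '3', '1', '']
print(as_int)    # ['32', '30', '10', '0']
```

Any district whose rounded PVI ends in zero would be mislabeled by the first approach, for example R+30 showing up as R+3.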


In [10]:
#overwrite the metric column to match the other datasets
#the metric rescales PVI to a decimal from 0 to 1: 0.50 is even, higher values lean more Republican
pvi_118["metric"] = ((-1*(round(pvi_118['value']/2))) + 50) / 100
pvi_118.metric.unique()

array([0.66, 0.67, 0.7 , 0.82, 0.68, 0.36, 0.72, 0.59, 0.64, 0.53, 0.58,
       0.28, 0.49, 0.62, 0.54, 0.37, 0.61, 0.34, 0.43, 0.33, 0.25, 0.46,
       0.14, 0.12, 0.29, 0.23, 0.27, 0.32, 0.65, 0.42, 0.45, 0.38, 0.44,
       0.35, 0.31, 0.39, 0.18, 0.3 , 0.52, 0.19, 0.26, 0.48, 0.47, 0.57,
       0.63, 0.51, 0.4 , 0.6 , 0.69, 0.2 , 0.73, 0.24, 0.15, 0.71, 0.8 ,
       0.13, 0.55, 0.5 , 0.22, 0.78, 0.21, 0.17, 0.77, 0.11, 0.75, 0.56,
       0.41, 0.16, 0.74])
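
The conversion can be checked by hand on a few of the values shown earlier. The helper name `pvi_to_metric` is introduced here for illustration only; it restates the formula from the cell for a single number.

```python
def pvi_to_metric(value: float) -> float:
    """Rescale a signed PVI value (negative = R, positive = D) to a 0-1 decimal."""
    # Halve the value, round to a whole point, flip the sign, and center on 50
    return (50 - round(value / 2)) / 100


print(pvi_to_metric(-31.7838))    # 0.66 (AL1, R+32)
print(pvi_to_metric(28.640375))   # 0.36 (AL7, D+29)
print(pvi_to_metric(-64.918896))  # 0.82 (AL4, R+65)
```

For AL1, -31.7838 / 2 is about -15.9, which rounds to -16; flipping the sign and adding 50 gives 66, or 0.66. A perfectly even district comes out as 0.50.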

In [11]:
#add the single district states into the dataframe
state_118 = pd.read_csv("/Users/xavier/Desktop/DSPP/solo_projects/redistricting_project/clean_data/state_pvi/state_118.csv")
#extract the Single District States
sds_rows = state_118[state_118["ST"].isin(["VT", "DE", "WY", "ND", "SD", "AK"])]
sds_rows = sds_rows.drop(columns="year")
sds_rows["ST#"] =  sds_rows["ST"] + "AL"
pvi_118 = pd.concat([pvi_118,sds_rows]).sort_values("ST")
pvi_118.tail()

Unnamed: 0,ST,map,district,metric,value,map_approved,ST#,lean,pvi_value,PVI
19644,XOH,senate_gop_proposal_2,11,0.22,55.053566,False,XOH11,D,55,D+55
19651,XOH,senate_gop_proposal_2,12,0.68,-35.316645,False,XOH12,R,35,R+35
19658,XOH,senate_gop_proposal_2,13,0.52,-4.45936,False,XOH13,R,4,R+4
19574,XOH,senate_gop_proposal_2,1,0.52,-3.323891,False,XOH1,R,3,R+3
19665,XOH,senate_gop_proposal_2,14,0.58,-15.607607,False,XOH14,R,16,R+16


In [12]:
#add static points
pvi_118["year"] = 2022
pvi_118["congress"] = 118

In [13]:
pvi_118

Unnamed: 0,ST,map,district,metric,value,map_approved,ST#,lean,pvi_value,PVI,year,congress
1,AK,,,0.59,,,AKAL,,,R+9,2022,118
22,AL,committee_proposal_1,1.0,0.66,-31.7838,True,AL1,R,32.0,R+32,2022,118
29,AL,committee_proposal_1,2.0,0.67,-33.584348,True,AL2,R,34.0,R+34,2022,118
36,AL,committee_proposal_1,3.0,0.7,-39.114921,True,AL3,R,39.0,R+39,2022,118
43,AL,committee_proposal_1,4.0,0.82,-64.918896,True,AL4,R,65.0,R+65,2022,118
50,AL,committee_proposal_1,5.0,0.66,-32.14665,True,AL5,R,32.0,R+32,2022,118
57,AL,committee_proposal_1,6.0,0.68,-35.985448,True,AL6,R,36.0,R+36,2022,118
64,AL,committee_proposal_1,7.0,0.36,28.640375,True,AL7,D,29.0,D+29,2022,118
557,AR,sb743,1.0,0.72,-43.697592,True,AR1,R,44.0,R+44,2022,118
564,AR,sb743,2.0,0.59,-17.248995,True,AR2,R,17.0,R+17,2022,118
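
The blank fields in the AK row follow from how `pd.concat` aligns frames with different columns: it keeps the union of the columns and fills the gaps with `NaN`. The sketch below uses two tiny made-up frames; the column sets are assumptions and do not reproduce `state_118.csv`.

```python
import pandas as pd

# District-level rows carry more columns than the statewide rows
districts = pd.DataFrame(
    {"ST": ["AL"], "district": ["1"], "value": [-31.8], "PVI": ["R+32"]}
)
statewide = pd.DataFrame({"ST": ["AK"], "PVI": ["R+9"]})

# concat keeps every column and fills missing cells with NaN
merged = pd.concat([districts, statewide]).sort_values("ST")
missing = merged[merged["ST"] == "AK"].iloc[0]
print(merged)
```

Columns such as `map`, `district`, and `value` are therefore expected to be empty for the single-district states added from the statewide file.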


## Export clean versions of the data

In [14]:
#create a dataset solely to correlate pvi with the holder of the seat
pure_118 = pvi_118[["year","metric"]]
pure_118.to_csv("/Users/xavier/Desktop/DSPP/solo_projects/redistricting_project/clean_data/pure_datasets/pure_118.csv",index=False)
#create a more detailed dataset for greater uses
data_118 = pvi_118[["year","congress","ST","ST#","PVI","metric"]]
data_118.to_csv("/Users/xavier/Desktop/DSPP/solo_projects/redistricting_project/clean_data/full_districts/data_118.csv",index=False)

## Create a Dataset for Incomplete States

In [15]:
#create a dataset exclusively for unfinished state maps
prev_states = ["FL","LA","NH","MO"]
unfinished = read_538[read_538["state"].isin(prev_states)]
NH = unfinished[unfinished["map"] == "house_gop_proposal"]
MO = unfinished[unfinished["map"] == "sb5_amended"]
LA = unfinished[unfinished["map"] == "senate_amendment_3"]
FL = unfinished[unfinished["map"] == "H000C8019"]
previews = pd.concat([NH,MO,LA,FL])
previews = previews[previews['metric'] == 'pvi']
#clean the data as we did for the whole set
previews["district"] = previews['district'].str.lstrip("0")
#create an ST column
previews = previews.rename(columns={"state": "ST"})
#create the district code variable
previews["ST#"] = previews["ST"] + previews["district"]
previews.loc[previews['value'] <= 0, 'lean'] = 'R' 
previews.loc[previews['value'] > 0, 'lean'] = 'D' 
#create a standard PVI column and a rounded PVI value column
#cast to int instead of using rstrip(".0"), which would also strip the zero from values like 30
previews["pvi_value"] = previews["value"].abs().round().astype(int).astype(str)
previews["PVI"] = previews.lean + "+" + previews.pvi_value
#overwrite the metric column to match the other datasets
#the metric rescales PVI to a decimal from 0 to 1: 0.50 is even, higher values lean more Republican
previews["metric"] = ((-1*(round(previews['value']/2))) + 50) / 100
#add static points
previews["year"] = 2022
previews["congress"] = 118
pre_maps = previews[["year","congress","ST","ST#","PVI","metric"]].copy()
pre_maps["ST"] = pre_maps["ST"] + " (Anticipated)"
#now that the dataset mirrors the structure of the main one, we can export it
pre_maps.to_csv("/Users/xavier/Desktop/DSPP/solo_projects/redistricting_project/clean_data/full_districts/previews_118.csv",index=False)