# 117th Congress Data

This notebook is meant to call, clean, and examine data from the 2022 redistricting process, to compare to previous years and examine fairness across maps.

It feeds into a larger project about fairness in redistricting; for instance, if an R+15 district is unattainable for a Democrat to win, then the district can be wholly classified as "safe," and should be bucketed with R+30 district. When data from previous years is projected onto this map, hopefully it will generate a picture of the relative fairness of these maps in context with their previous counterparts.

## Upload data from 538

This project originally pulled all data from 538; however, after noticing inconsistencies with PVI (likely caused by too much rounding by 538 and then by me), this has become a secondary data source for gaps in the primary dataset collected through Dave's Redistricting

Source Link: https://projects.fivethirtyeight.com/redistricting-2022-maps/

The most recent version of this dataset was pulled in March 2022, prior to the release of maps in FL, LA, MO, and NH

In [1]:
import requests
import pandas as pd
pd.set_option("display.max_rows", None, "display.max_columns", None)

In [2]:

read_538 = pd.read_csv("/Users/xavier/Desktop/DSPP/solo_projects/redistricting_project/raw_data/redistricting_data_2021_4.csv")

## Clean and Organize the District Data

In [3]:
import numpy as np
import plotnine as p9
import warnings
warnings.filterwarnings('ignore')

In [4]:
#check the data
pvi_118 = read_538
print(pvi_118.head(n=8))
print(pvi_118.shape)

  state       map   district          metric      value  map_approved
0    AK       117         01     competitive   0.000000         False
1    AK       117         AL             pvi -14.620280         False
2    AK       117  statewide  efficiency_gap -39.476448         False
3    AK       117  statewide          median   0.000000         False
4    AK  approved         AL             pvi -14.620280         False
5    AL       117         01     competitive   0.000000         False
6    AL       117         01             pvi -31.938510         False
7    AL       117         02     competitive   0.000000         False
(25973, 6)


In [5]:
#eliminate unapproved and dated maps
pvi_118 = pvi_118[pvi_118["map_approved"] == True]
#import previously disposed of maps
XOH = read_538[read_538["map"] == "senate_gop_proposal_2"]
XOH.state = "XOH"
XNC = read_538[read_538["map"] == "cst_13"]
XNC.state = "XNC"
pvi_118 = pd.concat([pvi_118,XNC,XOH])
#import New OH and WI Maps not categorized like the others
WI = read_538[read_538["map"] == "governor_least_change"]
OH = read_538[read_538["map"] == "revised_republican_proposal"]
pvi_118 = pd.concat([pvi_118,WI,OH])
#limit to only the pvi rows (it includes several other types of data per proposed district)
pvi_118 = pvi_118[pvi_118["metric"] == "pvi"]
pvi_118.shape

(414, 6)

In [6]:
#remove zeroes from the district number to match formats
pvi_118["district"] = pvi_118['district'].str.lstrip("0")
#create an ST column
pvi_118 = pvi_118.rename(columns={"state": "ST"})
#create the district code variable
pvi_118["ST#"] = pvi_118["ST"] + pvi_118["district"]

In [7]:
#pull out district lean
pvi_118["lean"] = np.where(pvi_118["value"] <= 0, "R", "D")
pvi_118["lean"].unique()
#create a standard PVI column and a rounded PVI Value column
#pvi_118["pvi_value"] = round((abs(pvi_118["value"])),0).map(str).str.rstrip(".0")
#pvi_118["PVI"] = pvi_118.lean + "+" + pvi_118.pvi_value
#the metric is a decimal representation  of PVI from 0 to 1
#pvi_118["metric"] = ((-1*(round(pvi_118['value']/2))) + 50) / 100

array(['R', 'D'], dtype=object)

In [8]:
#add static datapoints
pvi_118["year"] = 2022
pvi_118["congress"] = 118

In [9]:
#create a standard PVI column and a rounded PVI Value column
pvi_118["pvi_value"] = round((abs(pvi_118["value"]/2)),0)
pvi_118["PVI"] = pvi_118.lean + "+" + pvi_118.pvi_value.map(str)
pvi_118["PVI"] = pvi_118["PVI"].str.split(".").str[0]
pvi_118["PVI"] = np.where(pvi_118["PVI"].str[-1] == "+0", 'EVEN', pvi_118["PVI"])
#the metric is a decimal representation  of PVI from 0 to 1
pvi_118["pvi_value"] = np.where(pvi_118["value"] < 0, -1*pvi_118["pvi_value"],pvi_118["pvi_value"])
pvi_118["metric"] = ((-1*(round(pvi_118['pvi_value']))) + 50) / 100

In [10]:
#add the single district states into the dataframe
state_118 = pd.read_csv("/Users/xavier/Desktop/DSPP/solo_projects/redistricting_project/clean_data/state_pvi/state_118.csv")
#extract the Single District States
sds = ["VT", "DE", "WY", "ND", "SD", "AK"]
sds_rows = state_118[state_118["ST"].isin(sds)]
sds_rows = sds_rows.drop(columns="year")
sds_rows["ST#"] =  sds_rows["ST"] + "AL"
sds_rows["lean"] = np.where(sds_rows["ST"].isin(["DE","VT"]), "D","R")
pvi_118 = pd.concat([pvi_118,sds_rows]).sort_values("ST")

## Export clean versions of the data

In [11]:
#create exclusion for OV maps
ov_maps = ["XOH","XNC"]
#export the 538 Whole dataset
data_118 = pvi_118[-pvi_118["ST"].isin(ov_maps)]
data_118 = data_118[["year","congress","ST","ST#","PVI","metric"]]
data_118.to_csv("/Users/xavier/Desktop/DSPP/solo_projects/redistricting_project/clean_data/supplimental/data_118_538.csv",index=False)

In [12]:
#export OV maps seperatley
overturned_maps = pvi_118[pvi_118["ST"].isin(ov_maps)]
overturned_maps = overturned_maps[["year","congress","ST","ST#","PVI","metric"]]
overturned_maps.to_csv("/Users/xavier/Desktop/DSPP/solo_projects/redistricting_project/clean_data/supplimental/ovs_118.csv",index=False)

# Create a Dataset for Incomplete States

In [13]:
#create a dataset exclusively for unfinished state maps
prev_states = ["FL","LA","NH","MO"]
unfinished = read_538[read_538["state"].isin(prev_states)]
NH = unfinished[unfinished["map"] == "house_gop_proposal"]
MO = unfinished[unfinished["map"] == "sb5_amended"]
LA = unfinished[unfinished["map"] == "senate_amendment_3"]
FL = unfinished[unfinished["map"] == "H000C8019"]
previews = pd.concat([NH,MO,LA,FL])
previews = previews[previews['metric'] == 'pvi']
#clean the data as we did for the whole set
previews["district"] = previews['district'].str.lstrip("0")
#create an ST column
previews = previews.rename(columns={"state": "ST"})
#create the district code variable
previews["ST#"] = previews["ST"] + previews["district"]
previews["lean"] = np.where(previews["value"] <= 0, "R", "D")

In [14]:
#create a standard PVI column and a rounded PVI Value column
previews["pvi_value"] = round((abs(previews["value"]/2)),0)
previews["PVI"] = previews.lean + "+" + previews.pvi_value.map(str)
previews["PVI"] = previews["PVI"].str.split(".").str[0]
previews["PVI"] = np.where(previews["PVI"].str[-1] == "+0", 'EVEN', previews["PVI"])
#the metric is a decimal representation  of PVI from 0 to 1
previews["pvi_value"] = np.where(previews["value"] < 0, -1*previews["pvi_value"],previews["pvi_value"])
previews["metric"] = ((-1*(round(previews['pvi_value']))) + 50) / 100

In [15]:
#pull out district lean
#previews["lean"] = np.where(previews["value"] <= 0, "R", "D")
#previews["lean"].unique()
#create a standard PVI column and a rounded PVI Value column
#previews["pvi_value"] = round((abs(previews["value"])),0).map(str).str.rstrip(".0")
#previews["PVI"] = previews.lean + "+" + previews.pvi_value
#the metric is a decimal representation  of PVI from 0 to 1
#previews["metric"] = ((-1*(round(previews['value']/2))) + 50) / 100

In [16]:

#rename the metric column to match the other datasets
#the metric is a decimal representation  of PVI from 0 to 1
previews["metric"] = ((-1*(round(previews['pvi_value']))) + 50) / 100
#add static points
previews["year"] = 2022
previews["congress"] = 118
#simplify
pre_maps = previews[["year","congress","ST","ST#","PVI","metric"]]
#pre_maps.ST = pre_maps.ST + " (Anticipated)"
#now that the dataset reflects the original, we can exportit very
pre_maps.to_csv("/Users/xavier/Desktop/DSPP/solo_projects/redistricting_project/clean_data/supplimental/previews_118.csv",index=False)

In [17]:
previews.PVI.unique()

array(['R+4', 'D+5', 'R+20', 'D+28', 'R+21', 'R+17', 'R+11', 'D+26',
       'R+10', 'R+15', 'R+18', 'D+9', 'R+24', 'R+31', 'R+19', 'R+8',
       'D+3', 'R+13', 'R+5', 'R+12', 'D+8', 'D+11', 'R+1', 'D+2', 'R+2',
       'D+25', 'R+7', 'D+7', 'D+4', 'D+0'], dtype=object)

In [18]:
previews


Unnamed: 0,ST,map,district,metric,value,map_approved,ST#,lean,pvi_value,PVI,year,congress
17058,NH,house_gop_proposal,1,0.54,-8.852442,False,NH1,R,-4.0,R+4,2022,118
17065,NH,house_gop_proposal,2,0.45,9.759909,False,NH2,D,5.0,D+5,2022,118
11132,LA,sb5_amended,1,0.7,-40.379179,False,LA1,R,-20.0,R+20,2022,118
11139,LA,sb5_amended,2,0.22,55.521668,False,LA2,D,28.0,D+28,2022,118
11146,LA,sb5_amended,3,0.71,-42.953529,False,LA3,R,-21.0,R+21,2022,118
11153,LA,sb5_amended,4,0.67,-33.992578,False,LA4,R,-17.0,R+17,2022,118
11160,LA,sb5_amended,5,0.67,-34.297269,False,LA5,R,-17.0,R+17,2022,118
11167,LA,sb5_amended,6,0.61,-22.927649,False,LA6,R,-11.0,R+11,2022,118
15100,MO,senate_amendment_3,1,0.24,51.265477,False,MO1,D,26.0,D+26,2022,118
15107,MO,senate_amendment_3,2,0.6,-19.786188,False,MO2,R,-10.0,R+10,2022,118
