# 118th Congress Data

This notebook loads, cleans, and examines data from the 2022 redistricting process so that it can be compared with previous years and used to assess fairness across maps.

It feeds into a larger project about fairness in redistricting. For instance, if an R+15 district is unattainable for a Democrat, then the district can be classified as wholly "safe" and bucketed with R+30 districts. When data from previous years is projected onto these maps, it will hopefully give a picture of how fair they are relative to their predecessors.
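
The bucketing idea can be sketched as follows. This is only an illustration, not the project's actual classifier: the function name `bucket_pvi`, the bucket labels, and the 15-point cutoff (borrowed from the R+15 example) are all assumptions.

```python
def bucket_pvi(pvi_label, safe_cutoff=15):
    """Collapse a PVI label such as 'R+30' into a coarse bucket.

    safe_cutoff is a hypothetical threshold taken from the R+15 example.
    """
    if pvi_label == "EVEN":
        return "not safe"
    party, points = pvi_label.split("+")
    if int(points) >= safe_cutoff:
        return "safe " + party
    return "not safe"


for label in ["R+15", "R+30", "D+4", "EVEN"]:
    print(label, "->", bucket_pvi(label))
```

Under this cutoff, R+15 and R+30 land in the same bucket, which is the point of the classification.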

## Upload data from 538

This project originally pulled all of its data from 538. However, after I noticed inconsistencies in PVI (likely caused by too much rounding, first by 538 and then by me), 538 became a secondary source, used to fill gaps in the primary dataset collected through Dave's Redistricting.

Source Link: https://projects.fivethirtyeight.com/redistricting-2022-maps/

The most recent version of this dataset was pulled in March 2022, prior to the release of the MO and NH maps.
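
To illustrate how repeated rounding could produce the kind of PVI inconsistency mentioned above, here is a small sketch. The input is the Alaska at-large value shown in the data preview further down. Whether 538 actually rounds at an intermediate step is an assumption made for the sake of the example.

```python
raw_value = 14.62028  # absolute value of the AK at-large "pvi" row

# Round once, after halving the value
rounded_once = round(raw_value / 2)

# Round the value first (hypothetical upstream rounding), then halve and round again
rounded_twice = round(round(raw_value) / 2)

print(rounded_once, rounded_twice)
```

The two results differ by one point, which is enough to shift a district's PVI label.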

In [1]:
import requests
import pandas as pd
import numpy as np
pd.set_option("display.max_rows", None, "display.max_columns", None)

In [2]:
read_538 = pd.read_csv("/Users/xavier/Desktop/DSPP/solo_projects/redistricting_project/raw_data/redistricting_data_2021_6.csv")

In [3]:
#correct florida
read_538["map_approved"] = np.where(read_538["map"]=="P000C0109", True, read_538["map_approved"])
#maryland new map is "SB1012"

## Clean and Organize the District Data

In [4]:
import plotnine as p9
import warnings
warnings.filterwarnings('ignore')

In [5]:
#check the data
pvi_118 = read_538
print(pvi_118.head(n=8))
print(pvi_118.shape)

  state  map   district          metric      value  map_approved
0    AK  117         AL             pvi -14.620280          True
1    AK  117         AL     competitive   0.000000          True
2    AK  117  statewide  efficiency_gap -39.476448          True
3    AK  117  statewide          median   0.000000          True
4    AL  117         01     competitive   0.000000         False
5    AL  117         01             pvi -31.938510         False
6    AL  117         02     competitive   0.000000         False
7    AL  117         02             pvi -33.012810         False
(26611, 6)


In [6]:
#eliminate unapproved and dated maps
pvi_118 = pvi_118[pvi_118["map_approved"] == True]
#import previously disposed of maps
XOH = read_538[read_538["map"] == "senate_gop_proposal_2"].assign(state="XOH")
XNC = read_538[read_538["map"] == "cst_13"].assign(state="XNC")
XMD = read_538[(read_538["map"] == "final_plan") & (read_538["state"] == "MD")].assign(state="XMD")
#pvi_118 = pd.concat([pvi_118,XNC,XOH,XMD])
#limit to only the pvi rows (the data includes several other metrics per proposed district)
#copy so that the columns added below are not written to a slice of read_538
pvi_118 = pvi_118[pvi_118["metric"] == "pvi"].copy()
pvi_118.shape

(425, 6)

In [7]:
pvi_118.state.unique().shape
#Missing NH and MO, which have yet to update their maps

(48,)
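
As a quick sanity check on the two counts above: the House has 435 seats, and the two missing states account for 10 of them (NH has 2 districts and MO has 8), so 425 approved PVI rows across 48 states is what one would expect. A minimal sketch, with the counts hard-coded from the outputs above rather than read from the data:

```python
TOTAL_HOUSE_SEATS = 435
missing_seats = {"NH": 2, "MO": 8}  # states without an approved map in this pull

approved_rows = 425   # pvi_118.shape[0] from the output above
states_present = 48   # pvi_118.state.unique().shape[0] from the output above

expected_rows = TOTAL_HOUSE_SEATS - sum(missing_seats.values())
assert approved_rows == expected_rows
assert states_present + len(missing_seats) == 50
print(expected_rows)
```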

In [8]:
#remove zeroes from the district number to match formats
pvi_118["district"] = pvi_118['district'].str.lstrip("0")
#create an ST column
pvi_118 = pvi_118.rename(columns={"state": "ST"})
#create the district code variable
pvi_118["ST#"] = pvi_118["ST"] + pvi_118["district"]
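
A toy illustration of the district-code step, using plain Python strings (the pandas `.str.lstrip` call mirrors the built-in `str.lstrip`). The rows are made up. Note that only leading zeros are removed, so `"10"` stays intact and the at-large code `"AL"` is unchanged.

```python
# (state, district) pairs in the same format as the 538 file; values are made up
rows = [("AL", "01"), ("CA", "10"), ("AK", "AL")]

codes = [state + district.lstrip("0") for state, district in rows]
print(codes)  # ['AL1', 'CA10', 'AKAL']
```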

In [9]:
#pull out district lean
pvi_118["lean"] = np.where(pvi_118["value"] <= 0, "R", "D")
pvi_118["lean"].unique()
#create a standard PVI column and a rounded PVI Value column
#pvi_118["pvi_value"] = round((abs(pvi_118["value"])),0).map(str).str.rstrip(".0")
#pvi_118["PVI"] = pvi_118.lean + "+" + pvi_118.pvi_value
#the metric is a decimal representation  of PVI from 0 to 1
#pvi_118["metric"] = ((-1*(round(pvi_118['value']/2))) + 50) / 100

array(['R', 'D'], dtype=object)

In [10]:
#add static datapoints
pvi_118["year"] = 2022
pvi_118["congress"] = 118

In [11]:
#create a standard PVI column and a rounded PVI Value column
pvi_118["pvi_value"] = round((abs(pvi_118["value"]/2)),0)
pvi_118["PVI"] = pvi_118.lean + "+" + pvi_118.pvi_value.map(str)
pvi_118["PVI"] = pvi_118["PVI"].str.split(".").str[0]
#districts whose PVI rounds to zero are labeled EVEN
pvi_118["PVI"] = np.where(pvi_118["PVI"].str.endswith("+0"), 'EVEN', pvi_118["PVI"])
#the metric is a decimal representation of PVI from 0 to 1 (0.50 is EVEN; higher values lean R)
pvi_118["pvi_value"] = np.where(pvi_118["value"] < 0, -1*pvi_118["pvi_value"],pvi_118["pvi_value"])
pvi_118["metric"] = ((-1*(round(pvi_118['pvi_value']))) + 50) / 100
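
A worked example of the arithmetic in this cell for a single district may help. It uses the AL-01 value from the preview above and follows the same steps with plain Python scalars. The variable names are illustrative only.

```python
value = -31.93851  # AL-01 "pvi" value from the preview above

lean = "R" if value <= 0 else "D"
points = round(abs(value) / 2)  # value halved, then rounded, as in the cell above
label = "EVEN" if points == 0 else f"{lean}+{points}"
signed = -points if value < 0 else points  # negative for R-leaning districts
metric = (50 - signed) / 100

print(label, metric)  # R+16 0.66
```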

## Export clean versions of the data

In [12]:
#create exclusion for OV maps
ov_maps = ["XOH","XNC","XMD","XNY","XKS"]
#export the 538 Whole dataset
data_118 = pvi_118[~pvi_118["ST"].isin(ov_maps)]
data_118 = data_118[["year","congress","ST","ST#","PVI","metric"]]
data_118.to_csv("/Users/xavier/Desktop/DSPP/solo_projects/redistricting_project/clean_data/supplimental/data_118_538.csv",index=False)

In [13]:
#export OV maps separately
overturned_maps = pvi_118[pvi_118["ST"].isin(ov_maps)]
overturned_maps = overturned_maps[["year","congress","ST","ST#","PVI","metric"]]
overturned_maps.to_csv("/Users/xavier/Desktop/DSPP/solo_projects/redistricting_project/clean_data/supplimental/ovs_118.csv",index=False)

## Create a Dataset for Incomplete States

In [14]:
#create a dataset exclusively for unfinished state maps
prev_states = ["NH","MO"]
unfinished = read_538[read_538["state"].isin(prev_states)]
NH = unfinished[unfinished["map"] == "governors_proposal"] #alt "house_gop_proposal"
MO = unfinished[unfinished["map"] == "senate_amendment_6"]
previews = pd.concat([NH,MO])
previews = previews[previews['metric'] == 'pvi'].copy()
#clean the data as we did for the whole set
previews["district"] = previews['district'].str.lstrip("0")
#create an ST column
previews = previews.rename(columns={"state": "ST"})
#create the district code variable
previews["ST#"] = previews["ST"] + previews["district"]
previews["lean"] = np.where(previews["value"] <= 0, "R", "D")

In [15]:
#create a standard PVI column and a rounded PVI Value column
previews["pvi_value"] = round((abs(previews["value"]/2)),0)
previews["PVI"] = previews.lean + "+" + previews.pvi_value.map(str)
previews["PVI"] = previews["PVI"].str.split(".").str[0]
#districts whose PVI rounds to zero are labeled EVEN
previews["PVI"] = np.where(previews["PVI"].str.endswith("+0"), 'EVEN', previews["PVI"])
#the metric is a decimal representation of PVI from 0 to 1 (0.50 is EVEN; higher values lean R)
previews["pvi_value"] = np.where(previews["value"] < 0, -1*previews["pvi_value"],previews["pvi_value"])
previews["metric"] = ((-1*(round(previews['pvi_value']))) + 50) / 100
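
Because the same cleaning steps are applied to both `pvi_118` and `previews`, they could be wrapped in one function. The sketch below is a possible refactor, not code from this notebook: `add_pvi_columns` is a hypothetical name, and the toy values are made up apart from the first one (AL-01).

```python
import numpy as np
import pandas as pd


def add_pvi_columns(df):
    """Hypothetical helper: derive lean, PVI label, and metric from 'value'."""
    out = df.copy()
    out["lean"] = np.where(out["value"] <= 0, "R", "D")
    magnitude = (out["value"].abs() / 2).round(0).astype(int)
    out["PVI"] = out["lean"] + "+" + magnitude.astype(str)
    out["PVI"] = np.where(magnitude == 0, "EVEN", out["PVI"])
    # signed PVI: negative for R-leaning districts, as in the cells above
    out["pvi_value"] = np.where(out["value"] < 0, -magnitude, magnitude)
    out["metric"] = (50 - out["pvi_value"]) / 100
    return out


toy = pd.DataFrame({"value": [-31.93851, 0.4, 12.2]})
result = add_pvi_columns(toy)
print(result[["PVI", "metric"]])
```

Calling the same function on both frames would keep the two datasets from drifting apart if the rounding rules change.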

In [16]:
#pull out district lean
#previews["lean"] = np.where(previews["value"] <= 0, "R", "D")
#previews["lean"].unique()
#create a standard PVI column and a rounded PVI Value column
#previews["pvi_value"] = round((abs(previews["value"])),0).map(str).str.rstrip(".0")
#previews["PVI"] = previews.lean + "+" + previews.pvi_value
#the metric is a decimal representation  of PVI from 0 to 1
#previews["metric"] = ((-1*(round(previews['value']/2))) + 50) / 100

In [17]:

#recompute the metric column so it matches the other datasets
#the metric is a decimal representation  of PVI from 0 to 1
previews["metric"] = ((-1*(round(previews['pvi_value']))) + 50) / 100
#add static points
previews["year"] = 2022
previews["congress"] = 118
#simplify
pre_maps = previews[["year","congress","ST","ST#","PVI","metric"]]
#pre_maps.ST = pre_maps.ST + " (Anticipated)"
#now that the dataset matches the format of the full set, we can export it
pre_maps.to_csv("/Users/xavier/Desktop/DSPP/solo_projects/redistricting_project/clean_data/supplimental/previews_118.csv",index=False)

In [18]:
previews.ST.unique()

array(['NH', 'MO'], dtype=object)