# 118th Congress Data

This notebook loads, cleans, and examines data from the 2022 redistricting process so that the new maps can be compared with previous years and assessed for fairness.

It feeds into a larger project about fairness in redistricting. For instance, if a Democrat cannot realistically win an R+15 district, that district can be classified as "safe" and bucketed with R+30 districts. Projecting data from previous years onto this map should give a picture of how fair these maps are relative to their predecessors.
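The bucketing idea can be sketched as follows. This is a minimal illustration, not part of the pipeline below: the function name `bucket_lean`, the 15-point cutoff, and the bucket labels are all assumptions chosen to mirror the R+15 example. The sign convention (negative leans Republican) follows the 538 `value` column used later in this notebook.

```python
def bucket_lean(value, safe_cutoff=15.0):
    """Classify a signed partisan lean into a coarse bucket.

    Negative values lean Republican and positive values lean Democratic.
    safe_cutoff is a hypothetical threshold, not an established one.
    """
    if value <= -safe_cutoff:
        return "safe R"
    if value >= safe_cutoff:
        return "safe D"
    return "competitive"


# R+15 and R+30 land in the same bucket under this assumed cutoff
for lean in (-30.0, -15.0, -4.0, 6.5, 28.6):
    print(lean, bucket_lean(lean))
```

Whether 15 points is the right cutoff is exactly the kind of question the larger project would need to answer from past election results.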

## Load data from 538

538 has generously provided a redistricting tracker, along with a dataset listing ALL proposed maps. The current version of this file was pulled on February 11, 2022, after the Supreme Court allowed the Alabama congressional map to stand. At that time, Florida, Louisiana, Minnesota, Missouri, New Hampshire, North Carolina, Ohio, Pennsylvania, Rhode Island, and Wisconsin did not have approved maps.

Source Link: https://projects.fivethirtyeight.com/redistricting-2022-maps/

In [1]:
import requests
import pandas as pd
pd.set_option("display.max_rows", None, "display.max_columns", None)

In [2]:
read_538 = pd.read_csv("/Users/xavier/Desktop/DSPP/solo_projects/redistricting_project/raw_data/redistricting_data_2021.csv")

## Clean and Organize the District Data

In [3]:
import numpy as np
import plotnine as p9
from plotnine import ggplot, aes, facet_grid, labs, geom_point, geom_smooth
from sklearn.linear_model import LinearRegression as lm
import warnings
warnings.filterwarnings('ignore')

In [4]:
#check the data
pvi_118 = read_538
print(pvi_118.head(n=8))
print(pvi_118.shape)

  state       map   district          metric      value  map_approved
0    AK       117         01     competitive   0.000000         False
1    AK       117         AL             pvi -14.620280         False
2    AK       117  statewide  efficiency_gap -39.476448         False
3    AK       117  statewide          median   0.000000         False
4    AK  approved         AL             pvi -14.620280         False
5    AL       117         01     competitive   0.000000         False
6    AL       117         01             pvi -31.938510         False
7    AL       117         02     competitive   0.000000         False
(24160, 6)


In [5]:
#drop unapproved and outdated maps
pvi_118 = pvi_118[pvi_118["map_approved"] == True]
#limit to only the pvi rows (the file includes several other metrics per proposed district)
#copy so that later column assignments act on an independent frame, not a slice of read_538
pvi_118 = pvi_118[pvi_118["metric"] == "pvi"].copy()
print(pvi_118.shape)

(302, 6)
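
Because several states had no approved map when the file was pulled, the 302 rows fall short of the 435 House districts. A per-state count is one way to see where the gaps are. The sketch below runs on a toy frame with the same column names as the 538 file; the rows and values are invented for illustration.

```python
import pandas as pd

# toy rows with the same columns as the 538 file; the values are invented
toy = pd.DataFrame(
    {
        "state": ["AL", "AL", "AL", "AR", "FL"],
        "district": ["01", "01", "02", "01", "01"],
        "metric": ["pvi", "competitive", "pvi", "pvi", "pvi"],
        "value": [-31.8, 0.0, -33.6, -43.7, -20.0],
        "map_approved": [True, True, True, True, False],
    }
)

# same two filters as the cell above, combined into one boolean mask
approved_pvi = toy[toy["map_approved"] & (toy["metric"] == "pvi")]

# number of districts per state that survive the filters
districts_per_state = approved_pvi.groupby("state")["district"].nunique()
print(districts_per_state)

# states present in the file but absent after filtering
missing = sorted(set(toy["state"]) - set(approved_pvi["state"]))
print(missing)
```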


In [6]:
#strip leading zeroes from the district number to match formats (at-large "AL" codes are unchanged)
pvi_118["district"] = pvi_118['district'].str.lstrip("0")
pvi_118.head(n=8)

     state                   map  district  metric       value  map_approved
22      AL  committee_proposal_1         1     pvi    -31.7838          True
29      AL  committee_proposal_1         2     pvi  -33.584348          True
36      AL  committee_proposal_1         3     pvi  -39.114921          True
43      AL  committee_proposal_1         4     pvi  -64.918896          True
50      AL  committee_proposal_1         5     pvi   -32.14665          True
57      AL  committee_proposal_1         6     pvi  -35.985448          True
64      AL  committee_proposal_1         7     pvi   28.640375          True
557     AR                 sb743         1     pvi  -43.697592          True


In [7]:
#rename the state column to ST
pvi_118 = pvi_118.rename(columns={"state": "ST"})
#build the district code (e.g. AL1) from the state abbreviation and district number
pvi_118["ST#"] = pvi_118["ST"] + pvi_118["district"]

In [8]:
#possibly unnecessary
#pull out district lean: positive values lean Democratic, zero or negative lean Republican
pvi_118["lean"] = np.where(pvi_118["value"] > 0, "D", "R")
pvi_118["lean"].unique()

array(['R', 'D'], dtype=object)

In [9]:
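#create a standard PVI column and a rounded PVI value column
#convert through int instead of str.rstrip(".0"), which strips characters and would turn "30.0" into "3"
pvi_118["pvi_value"] = pvi_118["value"].abs().round().astype(int).astype(str)
pvi_118["PVI"] = pvi_118["lean"] + "+" + pvi_118["pvi_value"]
pvi_118.head(7)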

     ST                   map  district  metric       value  map_approved  ST#  lean  pvi_value   PVI
22   AL  committee_proposal_1         1     pvi    -31.7838          True  AL1     R         32  R+32
29   AL  committee_proposal_1         2     pvi  -33.584348          True  AL2     R         34  R+34
36   AL  committee_proposal_1         3     pvi  -39.114921          True  AL3     R         39  R+39
43   AL  committee_proposal_1         4     pvi  -64.918896          True  AL4     R         65  R+65
50   AL  committee_proposal_1         5     pvi   -32.14665          True  AL5     R         32  R+32
57   AL  committee_proposal_1         6     pvi  -35.985448          True  AL6     R         36  R+36
64   AL  committee_proposal_1         7     pvi   28.640375          True  AL7     D         29  D+29
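
One pitfall worth noting when building labels like these: `str.rstrip(".0")` removes any trailing run of the characters `.` and `0`, not the literal suffix `.0`, so a rounded value of `30.0` would come out as `3`. The sketch below, on invented values, builds the label through an integer conversion instead. `pvi_label` is a hypothetical helper name, and the code assumes there are no missing values.

```python
import pandas as pd


def pvi_label(values):
    """Return labels such as 'R+32' from signed lean values (negative = R)."""
    party = values.gt(0).map({True: "D", False: "R"})
    magnitude = values.abs().round().astype(int).astype(str)
    return party + "+" + magnitude


# invented values, including ones that round to a multiple of ten
leans = pd.Series([-31.7838, 28.640375, -30.2, 9.6])
print(pvi_label(leans).tolist())

# the character-set behaviour of rstrip, shown on plain strings
print("30.0".rstrip(".0"), "10.0".rstrip(".0"))
```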


In [10]:
#overwrite the metric column to match the other datasets
#the metric is a decimal form of PVI centered on 0.5 (even), with higher values more Republican
#it leaves the 0 to 1 range when the lean exceeds 50 points
pvi_118["metric"] = (50 - pvi_118["value"].round()) / 100
pvi_118["year"] = 2022
pvi_118["congress"] = 118
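
To make the conversion concrete, the sketch below applies the same formula to three Alabama values from the table above plus one invented even district. `lean_to_metric` is a hypothetical helper name used only for this illustration.

```python
import pandas as pd


def lean_to_metric(value):
    """Apply (50 - round(value)) / 100 to a Series of signed leans."""
    return (50 - value.round()) / 100


# AL7, an even district (invented), AL1, and AL4
sample = pd.Series([28.640375, 0.0, -31.7838, -64.918896])
print(lean_to_metric(sample).tolist())
```

Under this formula a D+29 district maps to 0.21 and an R+32 district to 0.82, while AL4's lean of about 65 points maps to 1.15.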

## Export clean versions of the data

In [11]:
#create a dataset solely to correlate pvi with the holder of the seat
pure_118 = pvi_118[["metric"]]
pure_118.to_csv("/Users/xavier/Desktop/DSPP/solo_projects/redistricting_project/clean_data/pure_datasets/pure_118.csv",index=False)
#create a more detailed dataset for broader use
data_118 = pvi_118[["year","congress","ST","ST#","PVI","metric"]]
data_118.to_csv("/Users/xavier/Desktop/DSPP/solo_projects/redistricting_project/clean_data/full_districts/data_118.csv",index=False)
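
The export paths above are absolute and tied to one machine. One possible alternative is to build them from a single project root with `pathlib` and read the file back as a check. In the sketch below the temporary directory standing in for the project root and the two-row frame are both assumptions made so the example runs anywhere.

```python
from pathlib import Path
import tempfile

import pandas as pd

# hypothetical project root; a temporary directory stands in for it here
project_root = Path(tempfile.mkdtemp())
clean_dir = project_root / "clean_data" / "full_districts"
clean_dir.mkdir(parents=True, exist_ok=True)

# invented two-row frame with the same columns as data_118
frame = pd.DataFrame(
    {
        "year": [2022, 2022],
        "congress": [118, 118],
        "ST": ["AL", "AL"],
        "ST#": ["AL1", "AL2"],
        "PVI": ["R+32", "R+34"],
        "metric": [0.82, 0.84],
    }
)
out_path = clean_dir / "data_118.csv"
frame.to_csv(out_path, index=False)

# read the file back to confirm the columns round-trip
check = pd.read_csv(out_path)
print(check.columns.tolist())
```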