# Upload Data from Dave's Redistricting

Because of discrepancies in 538's data and its untraceable sourcing, I pulled public data from Dave's Redistricting App on March 20, 2022.

At the time of the initial pull, Missouri and New Hampshire still had incomplete maps, New York and Kansas were in legal flux, and Kentucky and West Virginia had bad data.

Specifically, Kentucky and West Virginia did not yet have data for the 2020 presidential election, and Ohio's new map was not included. For these states, we still have to use 538's less traceable data.

In [1]:
import requests
import pandas as pd
import os
import glob
import numpy as np
import math
pd.set_option("display.max_rows", None, "display.max_columns", None)

In [2]:
#load the files into a single dataframe
path = "/Users/xavier/Desktop/DSPP/solo_projects/redistricting_project/raw_data/daves_pvis"
all_files = sorted(glob.glob(path + "/*.csv"))

li = []

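for filename in all_files:
    #reset_index() keeps the original index as an "index" column, used below as the district number
    df = pd.read_csv(filename, index_col=None, header=0).reset_index()
    #the state abbreviation is the last two characters before ".csv"
    df["ST"] = filename[-6:-4]
    li.append(df)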

frame = pd.concat(li, axis=0, ignore_index=False)
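
The loop above reads the state code from the last two characters before `.csv`, so it relies on every file name ending in the state abbreviation. Below is a minimal sketch of the same extraction with a guard against unexpected file names. The helper name `state_from_filename` and the example file names are hypothetical, not part of the project.

```python
from pathlib import Path

def state_from_filename(filename):
    """Return the two-letter state code that ends the file's stem.

    Gives the same result as filename[-6:-4] for names ending in
    '<ST>.csv', but raises if the result is not two uppercase letters.
    """
    code = Path(filename).stem[-2:]
    if not (len(code) == 2 and code.isalpha() and code.isupper()):
        raise ValueError(f"cannot read a state code from {filename!r}")
    return code

# Hypothetical file names, for illustration only
for name in ["raw_data/daves_pvis/pvi_AL.csv", "raw_data/daves_pvis/pvi_TX.csv"]:
    print(state_from_filename(name))
```

A file such as `summary.csv` would raise a `ValueError` here instead of silently being tagged with the state "ry".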

In [3]:
#cut the frame and fix the columns
df = frame[["ST","index","Devation","Dem"]]
df.columns = ["ST","district","dem_pct","gop_pct"]
#remove non-numeric rows
possible_numbers = [str(i) for i in range(53)]
#.copy() so that later column assignments act on an independent frame
df = df[df["district"].isin(possible_numbers)].copy()
df.shape[0]

411

Eleven states should be missing from this frame:
- 3 states with incomplete maps (FL, NH, MO), with 38 districts between them
- 2 states with bad data (KY and WV), with 8 districts between them
- 6 single-district states

In [4]:
#check the tally: the result should be 0 if every district is accounted for
435-(df.shape[0]+(38+8+6))

-28
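
A nonzero result means the tally does not balance; a negative value means the frame plus the assumed-missing districts add up to more than 435. One way to track this down is to check whether any state assumed to be missing actually has rows in the frame. The sketch below does this on made-up data. `unexpected_states` is a hypothetical helper, and the toy list stands in for `df["ST"]`.

```python
from collections import Counter

def unexpected_states(state_column, expected_missing):
    """Count rows for states that were assumed to be absent."""
    counts = Counter(state_column)
    return {st: counts[st] for st in expected_missing if counts[st] > 0}

# Toy data for illustration; in the notebook this would be df["ST"]
toy_states = ["AL", "AL", "GA", "XX", "XX", "XX"]
assumed_missing = ["XX", "YY"]
print(unexpected_states(toy_states, assumed_missing))  # {'XX': 3}
```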

# Clean Dave's data into our standardized format

In [5]:
#Input standard text values
df["year"] = "2022"
df["congress"] = "118"
df["ST#"] = df["ST"] + df["district"]
#compute metric
df["raw_metric"] = (.50 + df["gop_pct"] - .4831) * 100
df["round_metric"] = df["raw_metric"].apply(np.around)
df["metric"] = (df["raw_metric"]/100).round(2)
#compute PVI
df["lean"] = np.where(df["metric"] > .5, "R", "D")
df['PVI'] = df['lean'] + "+" + (df['round_metric'] - 50).abs().fillna(1000).astype(int).astype(str)
df['PVI'] = df['PVI'].replace("D+0","EVEN")
#df['PVI'] = np.where(df['PVI'].str[-1]=="+","EVEN",df['PVI'])
#df["metric"] = df['PVI'].str.split("+").str[1]
#df['metric'] = np.where(df['PVI']=="EVEN","50",df['metric'])
#df["metric"] = df["metric"].astype(int)
#df["metric"] = np.where(df["lean"]=="D", (df["metric"]/100))
#checker = df[["raw_metric","round_metric","metric","PVI"]]
#checker
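
The metric and label computed above can be traced by hand for a single district. The sketch below restates the arithmetic as a plain function. `pvi_label` is a hypothetical name, and `BASELINE` is the `.4831` constant from the cell above, which appears to be the national Republican share that districts are measured against (an assumption; the notebook does not say).

```python
BASELINE = 0.4831  # constant from the notebook; its meaning is assumed

def pvi_label(gop_pct, baseline=BASELINE):
    """Turn a district's GOP share into a label such as 'R+7' or 'EVEN'."""
    raw_metric = (0.50 + gop_pct - baseline) * 100
    # round() sends exact halves to the nearest even integer, like np.around
    margin = round(raw_metric) - 50
    if margin == 0:
        return "EVEN"
    lean = "R" if margin > 0 else "D"
    return f"{lean}+{abs(margin)}"

for share in (0.55, 0.40, 0.4831):
    print(share, pvi_label(share))
```

This prints `R+7`, `D+8`, and `EVEN` for the three shares. Unlike the cell above, the lean here is taken from the rounded margin itself rather than from `metric`, so a margin of zero always yields `EVEN`.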

In [6]:
#df[abs(df["round_metric"] - df["raw_metric"]) > .4]
#df

In [7]:
#create a clean version of the data to match other outputs
pvi_118 = df[["year","congress","ST","ST#","PVI","metric"]]

# Load in 538 Data to fill in gaps

In [8]:
#load in filepaths for 538's data
path_538 = "/Users/xavier/Desktop/DSPP/solo_projects/redistricting_project/clean_data/supplimental/data_118_538.csv"
path_pre = "/Users/xavier/Desktop/DSPP/solo_projects/redistricting_project/clean_data/supplimental/previews_118.csv"
path_st =  "/Users/xavier/Desktop/DSPP/solo_projects/redistricting_project/clean_data/state_pvi/state_118.csv"

In [9]:
#grab states with maps not available in Dave's
no_data_states = ["KY","WV"]
load_538 = pd.read_csv(path_538)
no_data_rows = load_538[load_538["ST"].isin(no_data_states)]
#pull in maps not yet passed
pre_rows = pd.read_csv(path_pre)
#pull in single states
state_118 = pd.read_csv(path_st)
#extract the Single District States
sds = ["VT", "DE", "WY", "ND", "SD", "AK"]
sds_rows = state_118[state_118["ST"].isin(sds)]
sds_rows = sds_rows.drop(columns="year")
sds_rows["ST#"] =  sds_rows["ST"] + "AL"
sds_rows["lean"] = np.where(sds_rows["ST"].isin(["DE","VT"]), "D","R")
sds_rows["year"] = 2022
sds_rows["congress"] = 118

In [10]:
pvi_118 = pd.concat([pvi_118,no_data_rows,pre_rows,sds_rows]).sort_values("ST")
print(pvi_118.shape)
pvi_118.ST.unique()
#MD and OH excluded

(435, 7)


array(['AK', 'AL', 'AR', 'AZ', 'CA', 'CO', 'CT', 'DE', 'FL', 'GA', 'HI',
       'IA', 'ID', 'IL', 'IN', 'KS', 'KY', 'LA', 'MA', 'MD', 'ME', 'MI',
       'MN', 'MO', 'MS', 'MT', 'NC', 'ND', 'NE', 'NH', 'NJ', 'NM', 'NV',
       'NY', 'OH', 'OK', 'OR', 'PA', 'RI', 'SC', 'SD', 'TN', 'TX', 'UT',
       'VA', 'VT', 'WA', 'WI', 'WV', 'WY'], dtype=object)

In [11]:
print(pvi_118.shape)
print(pvi_118.ST.unique().shape)

(435, 7)
(50,)
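
Row and state counts can both match even if one district appears twice and another is absent, so it may be worth also checking that the `ST#` ids are unique. Below is a minimal sketch on toy data. `check_districts` is a hypothetical helper and the toy frame is made up; in this notebook the call would be `check_districts(pvi_118, 435, 50)`.

```python
import pandas as pd

def check_districts(frame, expected_rows, expected_states):
    """Return a list of problems found; an empty list means all checks pass."""
    problems = []
    if len(frame) != expected_rows:
        problems.append(f"expected {expected_rows} rows, found {len(frame)}")
    n_states = frame["ST"].nunique()
    if n_states != expected_states:
        problems.append(f"expected {expected_states} states, found {n_states}")
    # keep=False flags every copy of a repeated id, not just the later ones
    dupes = frame.loc[frame["ST#"].duplicated(keep=False), "ST#"].unique()
    if len(dupes) > 0:
        problems.append(f"duplicate district ids: {sorted(dupes)}")
    return problems

# Toy frame: counts look right, but one district id is repeated
toy = pd.DataFrame({"ST": ["AL", "AL", "AK"], "ST#": ["AL1", "AL1", "AKAL"]})
print(check_districts(toy, expected_rows=3, expected_states=2))
```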


Now that we have a row for each of the 435 districts, we can export a dataset with the best available estimates of what the 2022 map will look like.

In [12]:
pvi_118.to_csv("/Users/xavier/Desktop/DSPP/solo_projects/redistricting_project/clean_data/full_districts/data_118.csv",index=False)

In [13]:
#confirm that no district is missing a metric (an empty result is expected)
pvi_118[pvi_118["metric"].isna()]

Unnamed: 0,year,congress,ST,ST#,PVI,metric,lean
