# Upload Data from Dave's Redistricting

Because of discrepancies and untraceable sourcing in 538's data, I pulled together public data from Dave's Redistricting App on March 20, 2022.

At the time of the initial pull, Missouri, Louisiana, New Hampshire, and Florida still had incomplete maps.

Kentucky and West Virginia also did not yet have data for the 2020 presidential election, and Ohio's new map was not included. For these states, we still have to rely on 538's less traceable data.

In [1]:
import requests
import pandas as pd
import os
import glob
import numpy as np
pd.set_option("display.max_rows", None, "display.max_columns", None)

In [2]:
#load the files into a single dataframe
path = "/Users/xavier/Desktop/DSPP/solo_projects/redistricting_project/raw_data/daves_pvis"
all_files = sorted(glob.glob(path + "/*.csv"))

li = []

for filename in all_files:
    df = pd.read_csv(filename, index_col=None, header=0).reset_index()
    #the state abbreviation is the two characters before ".csv" in the file name
    df["ST"] = filename[-6:-4]
    li.append(df)

frame = pd.concat(li, axis=0, ignore_index=False)
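
The slice `filename[-6:-4]` works only when every path ends in a two-letter state code followed by `.csv`. A minimal sketch of the same idea using `pathlib`, where `state_from_path` is a hypothetical helper and the example paths are made up for illustration:

```python
from pathlib import Path


def state_from_path(filename: str) -> str:
    """Hypothetical helper: return the last two characters of the file stem."""
    # Path.stem drops the directory and the final extension, so the result
    # does not depend on the extension being exactly four characters long.
    return Path(filename).stem[-2:]


# Illustrative paths only; the real files live in the raw_data folder above.
print(state_from_path("raw_data/daves_pvis/AL.csv"))
print(state_from_path("raw_data/daves_pvis/pvi_TX.csv"))
```

For files named like `AL.csv` this returns the same value as the slice in the loop above.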

In [3]:
#cut the frame and fix the columns
df = frame[["ST","index","Devation","Dem"]]
df.columns = ["ST","district","dem_pct","gop_pct"]
#remove non-numeric rows
possible_numbers = [str(i) for i in range(53)]
#copy so that later column assignments do not act on a view of the original frame
df = df[df["district"].isin(possible_numbers)].copy()
df.shape

(362, 4)

Florida, Ohio, Missouri, Kentucky, Louisiana, New Hampshire, and West Virginia (67 districts in total) do not have valid data, and the 6 single-district states are not in the set, so 435 - 67 - 6 = 362 is exactly the number of rows we should have.
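
The expected row count can be verified with a quick tally. This is a minimal sketch, assuming the post-2020 apportionment seat counts listed below and counting West Virginia among the states without usable data, as the introduction notes. The names `missing_seats`, `single_district_states`, and `expected_rows` are illustrative and not part of the original notebook.

```python
# Seats per state under the post-2020 apportionment, for the states that
# had no usable data in the Dave's Redistricting pull (assumed values).
missing_seats = {"FL": 28, "OH": 15, "MO": 8, "KY": 6, "LA": 6, "NH": 2, "WV": 2}

# States with a single at-large district, which are not in the pull either.
single_district_states = ["AK", "DE", "ND", "SD", "VT", "WY"]

TOTAL_HOUSE_SEATS = 435
expected_rows = (
    TOTAL_HOUSE_SEATS
    - sum(missing_seats.values())
    - len(single_district_states)
)
print(expected_rows)
```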

# Clean the Dave's data into our standardized format

In [4]:
#Input standard text values
df["year"] = "2022"
df["congress"] = "118"
df["ST#"] = df["ST"] + df["district"]
#compute metric
df["metric"] = round((.5 + df["gop_pct"] - .4831),2)
#compute PVI
#take the lean from the baseline-adjusted metric so it agrees with the PVI magnitude
df["lean"] = np.where(df["metric"] > .5, "R", "D")
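#round before the integer cast so floating-point values just under a whole number are not truncated; missing values get the 1000 sentinel
df['PVI'] = df['lean'] + "+" + ((df['metric']-.5).abs()*100).round().fillna(1000).astype(int).astype(str)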
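
The label construction can be tried on a few hand-picked values. This is a minimal sketch rather than the notebook's exact method: `pvi_label` is a hypothetical helper, the baseline of 0.4831 is copied from the cell above, the lean is taken from the sign of the adjusted metric (an assumption of this sketch), and missing values are not handled.

```python
import numpy as np
import pandas as pd

NATIONAL_GOP_BASELINE = 0.4831  # copied from the metric computation above


def pvi_label(gop_pct: pd.Series, baseline: float = NATIONAL_GOP_BASELINE) -> pd.Series:
    """Hypothetical helper: turn GOP vote shares into labels such as 'R+7'."""
    # Shift the share so that 0.5 means "same as the national baseline".
    metric = (0.5 + gop_pct - baseline).round(2)
    # Distance from 0.5 in whole points; round before casting to int.
    points = ((metric - 0.5).abs() * 100).round().astype(int)
    # Assumption: lean follows the adjusted metric, not the raw shares.
    lean = pd.Series(np.where(metric > 0.5, "R", "D"), index=gop_pct.index)
    return lean + "+" + points.astype(str)


demo = pd.Series([0.5831, 0.3831, 0.5531])
print(pvi_label(demo).tolist())
```

Rounding before the integer cast matters because `(0.57 - 0.5) * 100` evaluates to slightly less than 7 in floating point, so a plain truncation would report 6.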

In [5]:
#create a clean version of the data to match other outputs
pvi_118 = df[["year","congress","ST","ST#","PVI","metric"]]

# Load in 538 Data to fill in gaps

In [6]:
#load in filepaths for 538's data
path_538 = "/Users/xavier/Desktop/DSPP/solo_projects/redistricting_project/clean_data/supplimental/data_118_538.csv"
path_pre = "/Users/xavier/Desktop/DSPP/solo_projects/redistricting_project/clean_data/supplimental/previews_118.csv"

In [7]:
#grab states with maps not available in Dave's
no_data_states = ["KY","WV","OH","VT", "DE", "WY", "ND", "SD", "AK"]
load_538 = pd.read_csv(path_538)
no_data_rows = load_538[load_538["ST"].isin(no_data_states)]
#pull in maps not yet passed
pre_rows = pd.read_csv(path_pre)

In [8]:
pvi_118 = pd.concat([pvi_118,no_data_rows,pre_rows]).sort_values("ST")
pvi_118.shape

(435, 6)

Now that we have something to represent all 435 districts, we can export a dataset with the best possible estimates of what the 2022 map will look like.
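
A row count of 435 does not by itself rule out a district appearing twice while another is missing. A minimal sketch of a stricter check, assuming the 538 and preview files use the same `ST#` identifier format as the Dave's rows; `check_district_coverage` and the toy table are hypothetical:

```python
import pandas as pd


def check_district_coverage(table: pd.DataFrame, expected: int = 435) -> bool:
    """Hypothetical check: exactly `expected` rows and no repeated district IDs."""
    has_expected_rows = len(table) == expected
    has_unique_ids = not table["ST#"].duplicated().any()
    return has_expected_rows and has_unique_ids


# Toy table for illustration only, not real district data.
toy = pd.DataFrame({"ST": ["AL", "AL", "AK"], "ST#": ["AL1", "AL2", "AK1"]})
print(check_district_coverage(toy, expected=3))
```

On the real data the call would be `check_district_coverage(pvi_118)`.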

In [None]:
pvi_118.to_csv("/Users/xavier/Desktop/DSPP/solo_projects/redistricting_project/clean_data/full_districts/data_118.csv",index=False)

In [9]:
#pull in overturned maps
path_ovs = "/Users/xavier/Desktop/DSPP/solo_projects/redistricting_project/clean_data/supplimental/ovs_118.csv"
ov_rows = pd.read_csv(path_ovs)