# Motivation
COVID has turned the world upside down, and in this country the response varied across locales. I am interested in seeing which factors affected different communities' case rates.
Given the poor showing by this and other developed nations in responding to the pandemic, it will be useful from a scientific, practical and human-centered perspective to glean whatever lessons we can about which factors led communities to perform well or poorly on critical metrics such as infection rate and infection fatality rate.

# Data
* [County level vote counts for 2016 president](https://electionlab.mit.edu/data)

* [County level covid infections and fatalities by date](https://github.com/nytimes/covid-19-data)

* [County population estimates](https://www.census.gov/data/datasets/time-series/demo/popest/2010s-counties-total.html)

* [County poverty and median income estimates](https://www.census.gov/data/datasets/2016/demo/saipe/2016-state-and-county.html)

* [County land area to establish density](https://www2.census.gov/library/publications/2011/compendia/usa-counties/excel/LND01.xls)

# Unknowns
It is not yet known whether there are holes or other abnormalities in the datasets.
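
One way to look for such holes is sketched below: count missing values per column and list the county keys that appear in only one of two tables. The toy tables and their values are made up for illustration, and the helper names `summarize_holes` and `unmatched_keys` are hypothetical, not part of the analysis code.

```python
import pandas as pd

# Toy stand-ins for two county tables keyed on a FIPS string (values are made up).
covid = pd.DataFrame({
    "fips": ["01001", "01003", None, "01007"],
    "deaths": [10, 25, 3, None],
})
population = pd.DataFrame({
    "fips": ["01001", "01003", "01005"],
    "POPESTIMATE2019": [55869, 223234, 24686],
})

def summarize_holes(df):
    """Count missing values in each column."""
    return df.isna().sum()

def unmatched_keys(left, right, key="fips"):
    """Return key values that are present in only one of the two tables."""
    merged = left[[key]].dropna().merge(
        right[[key]].dropna(), on=key, how="outer", indicator=True
    )
    return merged[merged["_merge"] != "both"]

missing = summarize_holes(covid)
orphans = unmatched_keys(covid, population)
print(missing)
print(orphans)
```

An inner join silently drops the rows reported by `unmatched_keys`, so it may be worth running a check like this before each merge.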

# Research questions:
The main hypothesis that will be tested is:
After controlling for likely confounders such as county poverty rate, county median household income, county population density and county population total, does the ratio of Trump to Clinton votes in a county in the 2016 election have an effect on fatalities attributed to Covid within that county? (Ideally the 2020 Biden/Trump ratio would be used as well, and if it becomes available I will use it.) This effect will be explored in a couple of ways, discussed under the methodology.
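
As a rough sketch of how the quantities in this hypothesis could be derived once the sources are joined into one county table, the snippet below computes the vote ratio, a vote share, per capita deaths and population density. Every column name and value here is hypothetical; the real datasets use their own schemas.

```python
import pandas as pd

# Hypothetical county table; column names and values are illustrative only.
counties = pd.DataFrame({
    "fips": ["00001", "00002", "00003"],
    "trump_votes": [6000, 2000, 5000],
    "clinton_votes": [3000, 8000, 5000],
    "deaths": [12, 40, 0],
    "population": [20000, 100000, 50000],
    "land_area_sq_mi": [400.0, 50.0, 250.0],
})

# Ratio of Trump to Clinton votes (undefined where Clinton votes are zero).
counties["trump_clinton_ratio"] = counties["trump_votes"] / counties["clinton_votes"]

# Two-party vote share, bounded between 0 and 1.
two_party_total = counties["trump_votes"] + counties["clinton_votes"]
counties["trump_share"] = counties["trump_votes"] / two_party_total

# Response variable: deaths per 100,000 residents.
counties["deaths_per_100k"] = counties["deaths"] / counties["population"] * 100_000

# Confounder: residents per square mile.
counties["density_per_sq_mi"] = counties["population"] / counties["land_area_sq_mi"]

print(counties[["fips", "trump_clinton_ratio", "trump_share",
                "deaths_per_100k", "density_per_sq_mi"]])
```

The ratio is unbounded and grows quickly in lopsided counties, so a share or a log ratio may behave better as a regressor; which form to use is a modelling choice the proposal leaves open.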


# Background/Existing work:
I haven't found this analysis performed anywhere yet, but [this study](https://www.medrxiv.org/content/10.1101/2020.10.16.20213892v2) is similar to the one I will perform, except that it works at the state level rather than the county level. [Here is another similar study](https://www.nature.com/articles/s41562-020-00977-7#data-availability), which used smartphone data to test for an association between voting for Trump and reduced social distancing. The authors established this association and found that it was also associated with increases in infection and fatality growth rates.

In June the media started writing a lot about the increase in Covid cases in red states. For instance, [this article](https://apnews.com/article/7aa2fcf7955333834e01a7f9217c77d2) shows red states matching blue states in new daily cases per million by mid-June. Also, [here is a Washington Post article](https://www.washingtonpost.com/politics/2020/06/24/shift-coronavirus-primarily-red-states-is-complete-its-not-that-simple/) from late June discussing differences in new cases in red and blue states and counties.

[Here is an article](https://thewell.unc.edu/2020/06/22/debunking-the-partisan-narrative/) from UNC discussing potential non-partisan drivers that may have contributed to state level differences in Covid responses.



# Methodology:
The most straightforward approach will be to construct a multiple linear regression model with county per capita deaths as the response and vote proportion, together with the variables identified as potential confounders, as regressors. This will allow the effect of a county's voting proportion on per capita deaths due to Covid to be tested while controlling for the potentially confounding variables.
Evidence of an effect of voting proportion on Covid deaths is likely to have changed significantly over time. For instance, the US epidemic was seeded first in major cities, which tend to lean heavily Democratic, and took time to spread through the rest of the country. Also, it took time for the epidemic response to become politicized and for partisan narratives to become established. Because of this, it will also be interesting to perform the analysis at different periods over the last 9 months to see how the effect has changed over time.
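
A minimal sketch of the regression step follows, using ordinary least squares through `numpy.linalg.lstsq` on synthetic data. The column names, the coefficients used to generate the data and the helper `fit_ols` are all assumptions made for illustration; nothing here reflects real results.

```python
import numpy as np
import pandas as pd

def fit_ols(df, response, regressors):
    """Fit ordinary least squares with an intercept; return named coefficients."""
    columns = [np.ones(len(df))] + [df[c].to_numpy(dtype=float) for c in regressors]
    X = np.column_stack(columns)
    y = df[response].to_numpy(dtype=float)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return pd.Series(beta, index=["intercept"] + list(regressors))

# Synthetic counties (not real data) generated from known coefficients,
# so the fitted values can be compared against the truth.
rng = np.random.default_rng(0)
n = 500
toy = pd.DataFrame({
    "trump_share": rng.uniform(0.1, 0.9, n),
    "poverty_rate": rng.uniform(5, 30, n),
    "log_density": rng.normal(4, 1.5, n),
})
toy["deaths_per_100k"] = (
    10
    + 20 * toy["trump_share"]
    + 0.5 * toy["poverty_rate"]
    + 3 * toy["log_density"]
    + rng.normal(0, 1, n)
)

coefs = fit_ols(toy, "deaths_per_100k", ["trump_share", "poverty_rate", "log_density"])
print(coefs)
```

The same function could be applied to snapshots of the data taken at different dates to compare the vote coefficient across periods. This sketch returns point estimates only; testing the effect would also need standard errors, which a statistics package such as statsmodels reports.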

In [None]:
#global config
ALWAYS_REFRESH = False

#imports
import pandas as pd
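import urllib.request  # importing the submodule explicitly; a bare "import urllib" does not guarantee urllib.request is loaded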
from os import path

COUNTY_COVID_URL = 'https://raw.githubusercontent.com/nytimes/covid-19-data/master/us-counties.csv'
COUNTY_COVID_CSV = 'county_covid.csv'
COUNTY_POPULATION_ESTIMATES_URL = 'https://www2.census.gov/programs-surveys/popest/datasets/2010-2019/counties/totals/co-est2019-alldata.csv'
COUNTY_POPULATION_CSV = 'county_population.csv'
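# NOTE: this URL appears to be the Dataverse dataset landing page (HTML), not a direct CSV link;
# it likely needs to be replaced with the file's direct download URL before the CSV can be parsed.
COUNTY_VOTES_URL = 'https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/VOQCHQ#'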
COUNTY_VOTES_CSV = 'county_votes_2016.csv'

def download_file(url, fname, overwrite=False):
    # Download url to fname unless a local copy already exists (or overwrite is set).
    if overwrite or not path.exists(fname):
        urllib.request.urlretrieve(url, fname)

download_file(COUNTY_COVID_URL, COUNTY_COVID_CSV, ALWAYS_REFRESH)
download_file(COUNTY_POPULATION_ESTIMATES_URL, COUNTY_POPULATION_CSV)
download_file(COUNTY_VOTES_URL, COUNTY_VOTES_CSV)

county_covid_df = pd.read_csv(COUNTY_COVID_CSV, sep = ',', dtype= {'fips': str})

county_population_df = pd.read_csv(COUNTY_POPULATION_CSV, sep = ',', encoding = "ISO-8859-1", dtype= {'STATE': str, 'COUNTY': str})
county_population_df['fips'] = county_population_df['STATE']+county_population_df['COUNTY']
county_population_df = county_population_df[['fips', 'POPESTIMATE2019']]

county_votes_df = pd.read_csv(COUNTY_VOTES_CSV, sep = ',', dtype= {'STATE': str, 'COUNTY': str})

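# Keep only the rows for the latest reporting date (ISO date strings sort chronologically).
most_recent_county_covid_data = county_covid_df[county_covid_df['date'] == county_covid_df['date'].max()]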

def merge_tables(covid, population):
    # Inner join on the county FIPS code; counties missing from either table are dropped.
    return covid.merge(population, on='fips', how='inner')
t_step_1_merge = merge_tables(most_recent_county_covid_data, county_population_df)
print(t_step_1_merge)