Spending data processing

In this section I process FEC data to get the total amount spent by each presidential candidate in each state for the 2008, 2012, 2016, 2020, and 2024 elections.

Raw data is from here: https://www.fec.gov/data/candidates/president/presidential-map/ 
To get the data yourself, use the dropdown menu to select the election year, open the spending tab on the side of the map, and click "Export spending data".
The exported files do not have descriptive names, so you need to rename them yourself. We have followed the pattern "spending_data_{year}.csv".

In [125]:
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import matplotlib.ticker as ticker
import scipy
from scipy.stats import wilcoxon
import datetime

Import spending data of presidential candidates for the 2008, 2012, 2016, 2020, and 2024 elections.

In [126]:
years = ['2008', '2012', '2016', '2020', '2024']
spending_dfs = dict()
state_sums = dict()
for year in years:
    spending_dfs[year] = pd.read_csv(f"spending_data_{year}.csv", index_col=False, low_memory=False)
    # state_sums[year] = spending_dfs[year].groupby('recipient_st').sum('disb_amt').sort_values('disb_amt')

# spending_24_df = pd.read_csv("spending_data_2024.csv", index_col=False, low_memory=False)
# spending_24_df.groupby('recipient_st').sum('disb_amt').sort_values('disb_amt')
# spending_24_df.head()
# spending_dfs
# state_sums['2024']

Upon inspection, some states have been labeled incorrectly. For example, some rows are labeled 'C'; since those transactions occur in San Francisco, we can assume the label should be 'CA'.

For each year I have added comments listing which abbreviations need to be replaced. For example, in 2008 'AA' needs to be replaced by 'MA'.

Most were fixed with a simple dictionary replace, but in 2024 some bad labels covered more than one state. For example, rows with zip codes in Iowa and in Indiana were both labeled 'I', so these rows need extra attention and are corrected by zip code.

Once all errors are corrected, we remove all rows where the recipient state is not one of the 50 US states or DC. For example, some transactions are labeled "UK", which isn't useful for our analysis.
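
The inspection step itself is not shown in this notebook. Below is a minimal sketch of one way to surface suspicious labels and then apply the zip-code-first correction described above. It runs on a toy frame that reuses the `recipient_st` and `recipient_zip` column names; the helper `unexpected_labels`, the shortened `valid` set, and all row values are made up for illustration.

```python
import pandas as pd

# Toy data standing in for one year's spending file (values are made up).
toy = pd.DataFrame({
    "recipient_st": ["CA", "C", "I", "I", "UK", "IA"],
    "recipient_zip": ["94105", "94105", "46038", "50309", "00000", "50309"],
    "disb_amt": [100.0, 50.0, 20.0, 30.0, 10.0, 5.0],
})

# Shortened set of valid codes, just for this example.
valid = {"CA", "IA", "IN"}


def unexpected_labels(df, valid_codes):
    """Count rows per state label that is not in valid_codes."""
    bad = df.loc[~df["recipient_st"].isin(valid_codes), "recipient_st"]
    return bad.value_counts()


# Labels to investigate before cleaning: 'I', 'C' and 'UK'.
print(unexpected_labels(toy, valid))

# Ambiguous labels first: fix rows by zip code, then map the remainder.
toy.loc[toy["recipient_zip"] == "46038", "recipient_st"] = "IN"
toy["recipient_st"] = toy["recipient_st"].replace({"C": "CA", "I": "IA"})

# Drop anything that is still not a valid code (here, "UK").
cleaned = toy[toy["recipient_st"].isin(valid)]
print(cleaned["recipient_st"].tolist())
```

The order matters: the zip-code assignment has to run before the blanket replace of 'I', otherwise the Indiana rows would already have been relabeled 'IA'.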

In [166]:
# correct 2008 states 
spending_dfs['2008']['recipient_st']= spending_dfs['2008']['recipient_st'].replace({'AA': 'MA', 'C': 'CA', 'I': 'IA', 'II': 'IL', 'K': 'KS', 'KA': 'KS' , 'N': 'NC', 'VW': 'WV', '46': 'IN', 'MY': 'MT', 'OA': 'PA', 'T': 'TX', 'WW': 'WA'})

# AA = MA
# C = CA
# I = IA
# II = IL
# K = KS
# KA = KS 
# N = NC
# VW = WV
# 46 = IN
# MY = MT
# OA = PA
# T = TX
# WW = WA


# correct 2012 states 
spending_dfs['2012']['recipient_st']= spending_dfs['2012']['recipient_st'].replace({'D.': 'DC', 'MY': 'NY', 'HA': 'HI', 'MH': 'NH', 'HN': 'NH'})

# "D." = "DC"
# MY = NY
# HA = HI
# MH = NH
# HN = NH

# correct 2016 states 
spending_dfs['2016']['recipient_st']= spending_dfs['2016']['recipient_st'].replace({'NB': 'NE', 'D.': 'DC', 'MY': 'NY', 'HA': 'HI', 'MH': 'NH', 'HN': 'NH'})

# NB = NE
# D. = DC


# correct 2020 states 
# none


# correct 2024 states 
spending_dfs['2024']['recipient_st']= spending_dfs['2024']['recipient_st'].replace({'C': 'CA', 'AA': 'CA', 'F': 'FL', 'G': 'GA', 'T': 'TX'})
spending_dfs['2024'].loc[spending_dfs['2024']['recipient_zip'] == '46038', 'recipient_st'] = 'IN'
spending_dfs['2024']['recipient_st'] = spending_dfs['2024']['recipient_st'].replace('I', 'IA')
spending_dfs['2024'].loc[spending_dfs['2024']['recipient_zip'] == '03276', 'recipient_st'] = 'NH'
spending_dfs['2024'].loc[spending_dfs['2024']['recipient_zip'] == '03063', 'recipient_st'] = 'NH'
spending_dfs['2024']['recipient_st'] = spending_dfs['2024']['recipient_st'].replace('N', 'NY')

# C = CA,
# AA = CA
# F = FL
# G = GA
# T = TX
# I = IN or IA
# N = NH or NY


states = [
        'AL', 'AR', 'AZ', 'CA', 'CO', 'CT', 'DC', 'FL', 'GA', 'IA', 
        'IL', 'IN', 'LA', 'MA', 'MD', 'ME', 'MN', 'MO', 'NC', 'NE', 
        'NH', 'NJ', 'NY', 'OK', 'PA', 'SC', 'TN', 'TX', 'VA', 'SD',
        'WA', 'WI', 'WY', 'OH', 'WV', 'AK', 'DE', 'HI', 'ID', 'KS', 
        'KY', 'MI', 'MS', 'MT', 'ND', 'NM', 'NV', 'OR', 'RI', 'UT', 
        'VT'
        ]

# remove transactions not in the 50 states or DC
for year in spending_dfs:
    spending_dfs[year] = spending_dfs[year][spending_dfs[year]['recipient_st'].isin(states)]


Next we keep only the Democratic and Republican nominees. Other candidates exist, but they don't receive a significant share of the vote. This leaves us with one dataframe per nominee per election year, so 10 in total.

In [168]:
party_spending_dfs = {}

# 2008
# Republican: 'McCain, John S'
# Dem : 'Obama, Barack'
# spending_dfs['2008'] = spending_dfs['2024'][spending_dfs['2024']['cand_nm'].isin(['McCain, John S', 'Obama, Barack'])]
party_spending_dfs['2008_R'] = spending_dfs['2008'][spending_dfs['2008']['cand_nm'] == 'McCain, John S']
party_spending_dfs['2008_D'] = spending_dfs['2008'][spending_dfs['2008']['cand_nm'] == 'Obama, Barack']

# 2012
# Rep: 'Romney, Mitt'
# Dem: 'Obama, Barack'
# spending_dfs['2012'] = spending_dfs['2024'][spending_dfs['2024']['cand_nm'].isin(['Romney, Mitt', 'Obama, Barack'])]
party_spending_dfs['2012_R'] = spending_dfs['2012'][spending_dfs['2012']['cand_nm'] == 'Romney, Mitt']
party_spending_dfs['2012_D'] = spending_dfs['2012'][spending_dfs['2012']['cand_nm'] == 'Obama, Barack']

# 2016
# Rep: 'Trump, Donald J.'
# Dem: 'Clinton, Hillary Rodham'
# spending_dfs['2016'] = spending_dfs['2024'][spending_dfs['2024']['cand_nm'].isin(['Trump, Donald J.', 'Clinton, Hillary Rodham'])]
party_spending_dfs['2016_R'] = spending_dfs['2016'][spending_dfs['2016']['cand_nm'] == 'Trump, Donald J.']
party_spending_dfs['2016_D'] = spending_dfs['2016'][spending_dfs['2016']['cand_nm'] == 'Clinton, Hillary Rodham']

# 2020
# Rep: 'Trump, Donald J.'
# Dem: 'Biden, Joseph R Jr'
# spending_dfs['2020'] = spending_dfs['2024'][spending_dfs['2024']['cand_nm'].isin(['Trump, Donald J.', 'Biden, Joseph R Jr'])]
party_spending_dfs['2020_R'] = spending_dfs['2020'][spending_dfs['2020']['cand_nm'] == 'Trump, Donald J.']
party_spending_dfs['2020_D'] = spending_dfs['2020'][spending_dfs['2020']['cand_nm'] == 'Biden, Joseph R Jr']

# 2024
# Rep: 'Trump, Donald J.'
# Dem: 'Harris, Kamala' 
# spending_dfs['2024'] = spending_dfs['2024'][spending_dfs['2024']['cand_nm'].isin(['Trump, Donald J.', 'Harris, Kamala'])]
party_spending_dfs['2024_R'] = spending_dfs['2024'][spending_dfs['2024']['cand_nm'] == 'Trump, Donald J.']
party_spending_dfs['2024_D'] = spending_dfs['2024'][spending_dfs['2024']['cand_nm'] == 'Harris, Kamala']
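
The per-year filtering above repeats the same pattern ten times. As a sketch of an alternative, the same split could be driven by a lookup table of nominee names. The names in `nominees` are copied from the `cand_nm` values used above; the helper `split_by_nominee` and the toy rows (including the 'Someone, Else' candidate) are hypothetical.

```python
import pandas as pd

# Nominee names as they appear in the `cand_nm` column above.
nominees = {
    "2008": {"R": "McCain, John S", "D": "Obama, Barack"},
    "2012": {"R": "Romney, Mitt", "D": "Obama, Barack"},
    "2016": {"R": "Trump, Donald J.", "D": "Clinton, Hillary Rodham"},
    "2020": {"R": "Trump, Donald J.", "D": "Biden, Joseph R Jr"},
    "2024": {"R": "Trump, Donald J.", "D": "Harris, Kamala"},
}


def split_by_nominee(dfs_by_year, nominees_by_year):
    """Return {'<year>_<party>': rows for that nominee} for each year present."""
    out = {}
    for year, parties in nominees_by_year.items():
        if year not in dfs_by_year:
            continue  # skip years that have no loaded dataframe
        df = dfs_by_year[year]
        for party, name in parties.items():
            out[f"{year}_{party}"] = df[df["cand_nm"] == name]
    return out


# Toy usage with made-up rows for a single year.
toy_dfs = {
    "2008": pd.DataFrame({
        "cand_nm": ["McCain, John S", "Obama, Barack", "Obama, Barack", "Someone, Else"],
        "recipient_st": ["AZ", "IL", "IA", "TX"],
        "disb_amt": [10.0, 20.0, 30.0, 40.0],
    })
}
toy_split = split_by_nominee(toy_dfs, nominees)
print({key: len(rows) for key, rows in toy_split.items()})
```

Applied to the real data, `split_by_nominee(spending_dfs, nominees)` should produce the same ten keys in the same order as `party_spending_dfs`, assuming the candidate names match exactly.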


Finally, we check whether any states or DC are missing data, and see that both 2024 candidates are missing South Dakota.

In [None]:
for year in party_spending_dfs:
    print(year, len(party_spending_dfs[year]['recipient_st'].unique()))

In [139]:
# print(spending_dfs['2024']['recipient_st'].unique())

for state in states:
    if state not in party_spending_dfs['2024_R']['recipient_st'].unique():
        print(state)

SD
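
The check above only looks at one candidate. A sketch of the same idea applied to every dataframe at once, using a set difference, might look like this. The helper `missing_states` and the toy inputs are hypothetical and only stand in for `party_spending_dfs` and `states`.

```python
import pandas as pd


def missing_states(df, expected):
    """Return the sorted expected codes that never appear in df['recipient_st']."""
    return sorted(set(expected) - set(df["recipient_st"].unique()))


# Toy stand-ins (made-up values) for the real list and dictionary.
toy_expected = ["IA", "NH", "SD"]
toy_party_dfs = {
    "2024_R": pd.DataFrame({"recipient_st": ["IA", "NH", "IA"]}),
    "2024_D": pd.DataFrame({"recipient_st": ["NH", "SD"]}),
}

# One entry per dataframe, listing the codes it is missing.
report = {key: missing_states(df, toy_expected) for key, df in toy_party_dfs.items()}
print(report)
```

An empty list for a key means that dataframe covers every expected code.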


In [131]:
# print(spending_dfs.keys())
# print(party_spending_dfs.keys())
party_years = list(party_spending_dfs.keys())

In [165]:
# spending_dfs["2024"]['new_col'] = spending_dfs["2024"]['recipient_st'] + "_2024"
# spending_dfs["2024"]

for year in party_years:
    # print(len(spending_dfs[year]['recipient_st'].unique()))
    # sum only the disbursement amount per state, smallest total first
    state_sums[year] = party_spending_dfs[year].groupby('recipient_st')[['disb_amt']].sum().sort_values('disb_amt')


for year, st_sum in state_sums.items():
    st_sum['state_year'] = st_sum.index + f"_{year}"


For 2024, neither Harris nor Trump recorded spending in South Dakota, so we add rows with a total of 0 for this missing data to keep it consistent with previous years.

In [163]:
state_sums['2024_D'].loc['SD']=[0, "SD_2024_D"]
state_sums['2024_R'].loc['SD']=[0, "SD_2024_R"]
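
Adding the rows by hand works when only one state is missing. As a more general sketch, a table of per-state totals can be reindexed on the full state list so that any absent state gets a total of 0. The toy frame below is made up and mimics one entry of `state_sums`; the suffix `_2024_D` is just an example label.

```python
import pandas as pd

# Shortened list of expected codes, just for this example.
toy_expected = ["IA", "NH", "SD"]

# Toy per-state totals indexed by state code, like one entry of `state_sums`.
toy_sums = pd.DataFrame(
    {"disb_amt": [250.0, 100.0]},
    index=pd.Index(["IA", "NH"], name="recipient_st"),
)

# Reindex on the full list; states with no rows get a total of 0.
filled = toy_sums.reindex(toy_expected, fill_value=0.0)
filled.index.name = "recipient_st"

# Rebuild the label column afterwards so the added rows are labeled too.
filled["state_year"] = filled.index + "_2024_D"
print(filled)
```

Note that reindexing orders the rows by the list passed in, not by amount, so a `sort_values('disb_amt')` would be needed afterwards if the sorted order matters.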


In [164]:
all_state_sums_df = pd.concat(state_sums.values())
all_state_sums_df

recipient_st,disb_amt,state_year
ND,2.982530e+03,ND_2008_R
WY,4.573600e+03,WY_2008_R
ID,6.016700e+03,ID_2008_R
HI,6.673290e+03,HI_2008_R
DE,6.685440e+03,DE_2008_R
...,...,...
CA,1.724403e+07,CA_2024_D
PA,1.908446e+07,PA_2024_D
GA,1.399371e+08,GA_2024_D
DC,2.110289e+08,DC_2024_D
