# Swing States Correlations Analysis

> In God We Trust. All others must bring data.
> -W. Edwards Deming

US presidential elections are decided by the [Electoral College](https://en.wikipedia.org/wiki/United_States_Electoral_College). Because the US awards electoral votes on a ([mostly](https://www.270towin.com/content/split-electoral-votes-maine-and-nebraska/)) winner-take-all basis, swing states have an influence over elections that is disproportionate to their populations.

In the 2024 election, seven states are widely regarded as swing states:
- Arizona
- Georgia
- Michigan
- Wisconsin
- North Carolina
- Pennsylvania
- Nevada

Polling averages as of October 18, 2024 indicate that the two candidates are **statistically tied** in these seven states. Additionally, Nebraska's second congressional district is closely divided and could swing the election [in some scenarios](https://www.ft.com/content/714f8c07-3b2f-4862-bdf2-6878ac8c42ca).

Media outlets are rife with speculation about the different possible electoral maps. For example, *The Financial Times* writes:


> But in mathematical terms, there are scores of other pathways to winning the necessary votes in the electoral college, with and without Pennsylvania.
> For example, either candidate could shore up support in the southern swing states, and Harris would score a big win if she could flip North Carolina — a long-standing Democratic target that Trump won by a razor-thin margin in 2016 — and its 16 electoral college votes back to the Democrats’ column.
> Here is one way to enumerate the routes: **there are 128 combinations of possible outcomes in the seven swing states (two candidates to the seventh power)** where polls suggest the races are in effect tied.

While the FT's basic math is correct, the $2^7$ possible outcomes **are not equally likely.** 

If they were equally likely, we'd have to believe that:
1. the probability of Donald Trump or Kamala Harris winning each state is 50%
2. the outcomes are independent of each other

Polling supports assumption 1 but says nothing about assumption 2.
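The counting argument can be made concrete with a short sketch. The state abbreviations and variable names are my own labels, and the "all states move together" case is a deliberately extreme illustration, not a claim about the real election.

```python
from itertools import product

# The seven swing states listed above (postal abbreviations are my labels).
STATES = ["AZ", "GA", "MI", "WI", "NC", "PA", "NV"]
CANDIDATES = ("Harris", "Trump")

# Every way to assign a winner to each state: 2**7 maps.
outcomes = list(product(CANDIDATES, repeat=len(STATES)))
print(len(outcomes))  # 128

# If each state is a 50/50 race AND the states are independent,
# every map has the same probability: 0.5**7 = 1/128.
p_independent = 0.5 ** len(STATES)
print(p_independent)  # 0.0078125

# Opposite extreme, for illustration only: if all seven states always
# moved together, only the two "sweep" maps could occur, each with
# probability 0.5, and the other 126 maps would have probability 0.
sweeps = [o for o in outcomes if len(set(o)) == 1]
print(len(sweeps))  # 2
```

Real elections presumably fall somewhere between these two extremes, which is why the degree of dependence between states matters.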

Anecdotally, recent history suggests swing state outcomes may not be independent of each other. Prior to 2016, Wisconsin, Michigan, and Pennsylvania formed part of the so-called Blue Wall of reliably Democratic Midwestern states. In 2016, these states moved *together* into Donald Trump's column; in 2020, they again moved together into Joe Biden's. 

To check whether this anecdotal evidence holds up under more rigorous scrutiny, I gathered data on US presidential elections since 1976 and investigated quantitatively **how strongly swing state outcomes move together**, which statisticians and data scientists call covariance.
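For reference, here are the standard sample definitions. The notation is my own: $x_t$ and $y_t$ stand for some outcome measure (for example, a party's vote share) in two states across elections $t = 1, \dots, n$, and the choice of outcome measure is an assumption at this point.

$$
\operatorname{cov}(X, Y) = \frac{1}{n-1} \sum_{t=1}^{n} (x_t - \bar{x})(y_t - \bar{y}),
\qquad
r_{XY} = \frac{\operatorname{cov}(X, Y)}{s_X \, s_Y}
$$

Here $\bar{x}$ and $\bar{y}$ are sample means and $s_X$ and $s_Y$ are sample standard deviations. Covariance depends on the scale of the data, while the correlation coefficient $r_{XY}$ is normalized to lie in $[-1, 1]$, which makes it easier to compare across pairs of states.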

## Hypothesis Pre-registration

Recently, statisticians and other quantitative researchers have moved towards a norm of pre-registering hypotheses, which is intended to prevent cherry-picking data. To contribute to this norm, I am writing my hypotheses in this notebook BEFORE any analysis. Readers can verify this by referencing the GitHub commit history and examining the "pre-registering hypotheses" commit.

Hypotheses:
1. The presidential election outcome in each swing state is moderately correlated ($R^2 > 0.3$) with that of at least one other swing state.
2. Swing state correlations have increased over time, as evidenced by an increase in the moving average correlation coefficient.
3. Wisconsin, Michigan, and Pennsylvania are more highly correlated with each other than with other swing states.
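As a rough sketch of how hypotheses 1 and 2 could be checked with pandas, the block below uses SYNTHETIC data only. The series, the six-election window, and all variable names are assumptions for illustration, not the analysis itself and not real election results.

```python
import numpy as np
import pandas as pd

# SYNTHETIC data for illustration only: not real election results.
rng = np.random.default_rng(0)
years = np.arange(1976, 2024, 4)  # 12 presidential election years
national = rng.normal(0.0, 3.0, len(years))  # hypothetical shared swing

# Hypothetical state series = shared swing + state-specific noise.
shares = pd.DataFrame(
    {s: 50 + national + rng.normal(0.0, 2.0, len(years)) for s in ["WI", "MI", "PA"]},
    index=years,
)

# Hypothesis 1: pairwise Pearson r, squared to obtain R^2.
r = shares.corr()
r_squared = r ** 2

# Mask the diagonal (each state with itself) and take the best partner.
off_diagonal = r_squared.where(~np.eye(len(r_squared), dtype=bool))
best_partner_r2 = off_diagonal.max()
h1_holds = best_partner_r2 > 0.3  # one boolean per state

# Hypothesis 2: correlation over a moving window of 6 elections.
# The first 5 entries are NaN because the window is not yet full.
rolling_r = shares["WI"].rolling(window=6).corr(shares["MI"])

print(r_squared.round(2))
print(rolling_r.round(2))
```

With only a dozen elections per state, any window length trades noise against resolution, so the window of 6 is a placeholder rather than a recommendation.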

## Data Import and Preparation

In [3]:
import pandas as pd

In [6]:
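# Path to the 1976-2020 presidential returns, relative to this notebook
data_path = "../data/us_elections_1976-2020.csv"

raw_df = pd.read_csv(data_path)

# Preview the first five rows
raw_df.head()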

| | year | state | state_po | state_fips | state_cen | state_ic | office | candidate | party_detailed | writein | candidatevotes | totalvotes | version | notes | party_simplified |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1976 | ALABAMA | AL | 1 | 63 | 41 | US PRESIDENT | CARTER, JIMMY | DEMOCRAT | False | 659170 | 1182850 | 20210113 | | DEMOCRAT |
| 1 | 1976 | ALABAMA | AL | 1 | 63 | 41 | US PRESIDENT | FORD, GERALD | REPUBLICAN | False | 504070 | 1182850 | 20210113 | | REPUBLICAN |
| 2 | 1976 | ALABAMA | AL | 1 | 63 | 41 | US PRESIDENT | MADDOX, LESTER | AMERICAN INDEPENDENT PARTY | False | 9198 | 1182850 | 20210113 | | OTHER |
| 3 | 1976 | ALABAMA | AL | 1 | 63 | 41 | US PRESIDENT | BUBAR, BENJAMIN ""BEN"" | PROHIBITION | False | 6669 | 1182850 | 20210113 | | OTHER |
| 4 | 1976 | ALABAMA | AL | 1 | 63 | 41 | US PRESIDENT | HALL, GUS | COMMUNIST PARTY USE | False | 1954 | 1182850 | 20210113 | | OTHER |


In [8]:
# List the column names, one per line
print("\n".join(raw_df.columns))

year
state
state_po
state_fips
state_cen
state_ic
office
candidate
party_detailed
writein
candidatevotes
totalvotes
version
notes
party_simplified


In [9]:
# Confirm which offices appear in the file
set(raw_df["office"])

{'US PRESIDENT'}
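Since the file is in long format (one row per candidate per state per year), a correlation analysis would likely need one column per state. Below is a minimal sketch of one possible reshaping, assuming Democratic vote share is the outcome measure. The helper `democratic_share` is hypothetical, and `sample` is a small stand-in built from three of the 1976 Alabama rows shown above.

```python
import pandas as pd

# Stand-in for raw_df: three of the 1976 Alabama rows from the preview above.
sample = pd.DataFrame(
    {
        "year": [1976, 1976, 1976],
        "state_po": ["AL", "AL", "AL"],
        "party_simplified": ["DEMOCRAT", "REPUBLICAN", "OTHER"],
        "candidatevotes": [659170, 504070, 9198],
        "totalvotes": [1182850, 1182850, 1182850],
    }
)


def democratic_share(df):
    """Hypothetical helper: Democratic share of the total vote, years x states."""
    dem = df[df["party_simplified"] == "DEMOCRAT"]
    # Sum votes in case a party has more than one row in a state-year.
    grouped = dem.groupby(["year", "state_po"]).agg(
        votes=("candidatevotes", "sum"),
        total=("totalvotes", "first"),
    )
    share = grouped["votes"] / grouped["total"]
    # Move states into columns: one row per year, one column per state.
    return share.unstack("state_po")


wide = democratic_share(sample)
print(wide)
```

Applied to the full `raw_df`, the same helper would return a table with one row per election year, from which the seven swing-state columns could be selected by their `state_po` codes.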