# How close was the 2020 US Presidential Election?



## Introduction

Before diving into this question, let us define some key concepts first.

### The Electoral College

It is often said that the US Presidential Election system, and especially its [Electoral College](https://en.wikipedia.org/wiki/United_States_Electoral_College) mechanism, is a bad system. In summary, the Electoral College is the part of the process where the votes of each state are translated into a number of "electors", roughly proportional to the population of the state. These electors then take part in a second round of voting, whose result is the actual result of the election.

Common issues with the Electoral College include:

- Most states use a "winner-take-all" system, which can discourage voters in states where the outcome is not expected to change
- The number of electors of each state is not proportional to the population of each state, giving an artificial advantage to some states
- The popular vote winner may not be the actual winner, even with a significant margin. This is not a trivial quirk of the system, as it has already happened twice this century (as of 2021).
- More than 3 million people in the [US Territories](https://en.wikipedia.org/wiki/Territories_of_the_United_States) cannot vote to elect their leader

The following assumes that the reader has a basic grasp of how the Electoral College mechanism works.

### Election result stability

Here we introduce a new concept (or a not-new concept, I'm not a social choice theory expert), called **election result stability**. We define it as *the minimal number of voters that would have to change their vote to overturn the result of an election*. It is expressed as a percentage of the total number of voters (excluding spoiled ballots and "None of the above"-type votes).

For example, in the second round of the [2017 French election](https://en.wikipedia.org/wiki/2017_French_presidential_election):
- Emmanuel Macron won with 66.1% ($V_{M}$) of the vote
- Marine Le Pen lost with 33.9% ($V_{LP}$) of the vote

So the election result stability $S$ would trivially be:

$S = \frac{V_{M} - V_{LP}}{2} = 16.1\%$

and indeed, we see that $V_{LP} + S = 50\%$, the score needed to overturn the election.
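
As a quick check, the same computation can be sketched in Python. The helper name `two_candidate_stability` is ours, and it only covers the simple case of a single round with two candidates.

```python
def two_candidate_stability(winner_share, loser_share):
    """Share of voters (in percentage points) that must switch from the
    winner to the loser for both candidates to end up level."""
    return (winner_share - loser_share) / 2


# 2017 French second round: Macron 66.1 %, Le Pen 33.9 %
S = two_candidate_stability(66.1, 33.9)
print("Stability:", round(S, 1), "%")
print("Le Pen's share after the switch:", round(33.9 + S, 1), "%")
```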

If this example seems extremely simple, it's because the French presidential election, while not perfect, uses a system that is actually sane and produces a result that is clear for everyone to understand and trust.

With that said, let's return to the US Presidential Election.

## Problem definition

Here we concentrate on the [2020 US Presidential Election](https://en.wikipedia.org/wiki/2020_United_States_presidential_election), won by Joe Biden over Donald Trump by an Electoral College victory of 306-232 (margin: 74) and, incidentally, by a popular vote victory of 51.3%-46.8% (margin: 4.5%). We will answer the following question:

> What is the election result stability of the election?

or, in other words

> How many voters changing their vote would be sufficient to reverse the election?

This is a surprisingly hard question to answer, as we can imagine a number of different scenarios where the results would be overturned (e.g. California voting for Trump for some reason), but each of them would require a different number of voters changing their vote.

To simplify our analysis, we make a few hypotheses:
- Ignore 3rd party voters (Sorry to Jo Jorgensen, the only woman on the ballot in all 50 states 😬)
- Ignore "None of the Above" voters (But kudos for putting principles before reality)
- Ignore the special rules surrounding Maine and Nebraska's congressional districts (assume all electoral votes go to the statewide popular vote winner)
- No faithless electors, i.e. assume the result of the Electoral College will be faithfully carried over to the second round

Technical note: the first two hypotheses make our model simpler, but won't affect the absolute number of vote changes needed. Indeed, the quickest path to overturn the election is converting Biden voters into Trump voters, since each conversion closes the gap by two votes.
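
To illustrate this technical note, here is a minimal sketch comparing two ways the leading candidate can lose a state. The function names and the vote counts are made up for the example.

```python
def conversions_needed(leader_votes, runner_up_votes):
    """Leader -> runner-up conversions needed for the runner-up to strictly lead.
    Each conversion closes the gap by 2 votes."""
    return (leader_votes - runner_up_votes) // 2 + 1


def defections_needed(leader_votes, runner_up_votes):
    """Leader voters who would have to abstain or vote third party instead.
    Each defection closes the gap by only 1 vote."""
    return leader_votes - runner_up_votes + 1


# Hypothetical state: 1000 votes for the leader, 900 for the runner-up
print("Conversions needed:", conversions_needed(1000, 900))
print("Defections needed:", defections_needed(1000, 900))
```

Third-party and "None of the Above" ballots appear in neither formula, which is why leaving them out does not change the absolute count.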

With that being said, let's get our hands dirty 🙌

## Data

We will use two datasets for our analysis:
- [Results of US Election](https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/42MVDX) by the MIT ElectionLab, retrieved on 2021-03-02
- A [custom-made dataset of the number of electors in each state](electoral-college-electors.csv) (based on [this article](https://www.britannica.com/topic/United-States-Electoral-College-Votes-by-State-1787124))

Also notable, even if not directly used:
- [Official results](https://www.fec.gov/resources/cms-content/documents/2020presgeresults.pdf) by the FEC

## Code

Let's load the data and combine our datasets in a single data structure.

In [13]:
import csv

# Data structure:
# state_data[STATE_NAME] = {
#    'DEMOCRAT': nb of votes for BIDEN
#    'REPUBLICAN': nb of votes for TRUMP
#    'ec_count': nb of electoral college electors
#    'total_relevant_votes': total number of voters (excl. 3rd-party and NOTA voters)
# }
state_data = {}

# Parse election results
with open("1976-2020-president.csv") as f:
    reader = csv.DictReader(f)
    for row in reader:
        if row["year"] != "2020":
            continue

        # Ignore 3rd party and NOTA
        if row["party_simplified"] not in ["DEMOCRAT", "REPUBLICAN"]:
            continue

        # Sanity checks
        assert row["office"] == "US PRESIDENT"
        if "BIDEN" not in row["candidate"] and "TRUMP" not in row["candidate"]:
            raise Exception("Unexpected candidate " + str(row))

        # Initialize row in state_data
        state = row["state"]
        if state not in state_data:
            state_data[state] = {}

        # Store result data
        party = row["party_simplified"]
        if party in state_data[state]:
            raise Exception("Duplicate result for " + state + " " + party)
        state_data[state][party] = int(row["candidatevotes"])

# Compute total of relevant votes
for state, data in state_data.items():
    state_data[state]["total_relevant_votes"] = state_data[state]["DEMOCRAT"] + state_data[state]["REPUBLICAN"]

# Parse electoral college data
with open("electoral-college-electors.csv") as f:
    reader = csv.DictReader(f)
    for row in reader:
        state = row["state"].upper()
        if state not in state_data:
            raise Exception("Unrecognized state", state)
        state_data[state]["ec_count"] = int(row["nbElectors"])

print("Got data for", len(state_data), "states")

Got data for 51 states


As a quick check, let's make sure that we got the same results as the official ones (at least for the electoral college).

In [14]:
# Check if we reproduce the results

electorsForBiden = 0
electorsForTrump = 0
for state, results in state_data.items():
    if results["REPUBLICAN"] > results["DEMOCRAT"]:
        electorsForTrump += results["ec_count"]
    else:
        electorsForBiden += results["ec_count"]

assert electorsForBiden == 306
assert electorsForTrump == 232

print("Reproducing the election results: OK")

Reproducing the election results: OK


Then, let's generate all scenarios that would result in a Trump win, and count the number of voters that would have to vote differently in each scenario.

Technical note: We don't use a particularly clever approach here.
We iterate over all possible paths, with a few optimizations:
- Once the current path requires more voters than the best result known so far, we skip the rest of the branch
- We check for duplicate paths (e.g. ARIZONA-NEVADA and NEVADA-ARIZONA), and skip them when encountered
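
For reference, the same search can also be phrased as a 0/1 knapsack problem and solved with dynamic programming. The sketch below is not the approach used in this notebook: the function name `cheapest_flip`, the toy states and their costs are hypothetical, and `electors_needed` stands for the number of electors that must change hands.

```python
def cheapest_flip(states, electors_needed):
    """states: list of (name, electors, cost) tuples, where cost is the number
    of voters to convert in order to flip that state.
    Return (total_cost, names) for the cheapest set of states gathering at
    least `electors_needed` electors, or None if no such set exists."""
    INF = float("inf")
    # best[e] = cheapest known way to gather at least e electors
    best = [(0, [])] + [(INF, [])] * electors_needed
    for name, electors, cost in states:
        # Iterate downwards so that each state is used at most once
        for e in range(electors_needed, 0, -1):
            prev_cost, prev_names = best[max(0, e - electors)]
            if prev_cost + cost < best[e][0]:
                best[e] = (prev_cost + cost, prev_names + [name])
    result = best[electors_needed]
    return result if result[0] < INF else None


# Toy data (hypothetical): (name, electors, voters to convert)
toy_states = [("A", 3, 10), ("B", 4, 25), ("C", 5, 40), ("D", 6, 30)]
print(cheapest_flip(toy_states, 8))
```

The running time grows with the number of states times `electors_needed`, instead of with the number of possible paths.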

In [15]:
import math

electorsDiff = electorsForBiden - electorsForTrump
statesForBiden = [
    state for state, data in state_data.items() if data["DEMOCRAT"] > data["REPUBLICAN"]
]

minKnownVotes = 250000000  # Initial value: an upper bound larger than the total number of votes cast


# Helper function
def sumOfVoteChangesToOverturn(states):
    """ Return the number of voters that need to change their vote for all the states in `states` to change result """
    acc = 0
    for state in states:
        nbDiffVotes = state_data[state]["DEMOCRAT"] - state_data[state]["REPUBLICAN"]
        # Each converted voter closes the gap by 2, and Trump needs a strict lead,
        # hence floor(diff / 2) + 1 whether the difference is even or odd
        acc += nbDiffVotes // 2 + 1
    return acc


# We will iterate over all scenarios in a very naive way that produces a lot of duplicates
# As a quick optimization, we record the scenarios already encountered to avoid repeating ourselves
already_tried = set()


def getCanonicalForm(path, remainingElectors):
    return "-".join(sorted(path)) + "-" + str(remainingElectors)


# generator that will yield all possible scenarios where the election is overturned
def allPossiblePaths(prevStates, states, electorsDiff):
    # print('all', states, electorsDiff)

    canonicalForm = getCanonicalForm(prevStates, electorsDiff)
    if canonicalForm in already_tried:
        return []
    already_tried.add(canonicalForm)

    if sumOfVoteChangesToOverturn(prevStates) > minKnownVotes:
        return []

    for state in states:
        ecCount = state_data[state]["ec_count"]

        if ecCount <= electorsDiff:
            # Adding this state does not overturn the election
            # Add to path, then continue exploring 
            for nextPath in allPossiblePaths(
                prevStates + [state],
                [s for s in states if s != state],
                electorsDiff - ecCount,
            ):
                yield [state] + nextPath

        if ecCount > electorsDiff:
            # Adding this state overturn the election
            # => Add to path, but stop there
            yield [state]
            continue

bestPath = None

# Flipping a state with N electors moves the electoral margin by 2 * N
# (N fewer for Biden, N more for Trump), so Trump only needs to flip
# strictly more than half of the current difference in electors
for path in allPossiblePaths([], statesForBiden, electorsDiff // 2):
    pathScore = sumOfVoteChangesToOverturn(path)
    if pathScore < minKnownVotes:
        minKnownVotes = pathScore
        bestPath = path

print("Found best path:", " ".join(bestPath), "with a score of", minKnownVotes, "votes")

print("Details:")
for state in bestPath:
    diff = sumOfVoteChangesToOverturn([state])
    print(
        "\t",
        state,
        ":",
        diff,
        "votes =",
        round(diff * 100 / state_data[state]["total_relevant_votes"], 2),
        "% of the state voters",
    )

# Compare to the total number of voters
totalRelevantVoters = sum([ data["total_relevant_votes"] for data in state_data.values() ] )
electionResultsStability = (minKnownVotes * 100) / totalRelevantVoters

print("\n", "Election result stability:", minKnownVotes, " / ", totalRelevantVoters, "=", round(electionResultsStability, 2), "%")


Found best path: ARIZONA GEORGIA NEVADA WISCONSIN with a score of 38260 votes
Details:
	 ARIZONA : 5229 votes = 0.16 % of the state voters
	 GEORGIA : 5890 votes = 0.12 % of the state voters
	 NEVADA : 16799 votes = 1.22 % of the state voters
	 WISCONSIN : 10342 votes = 0.32 % of the state voters

 Election result stability: 38260  /  155485054 = 0.02 %


## Result

In summary, the answer to our question is:
- 38260 strategically chosen voters could change the result of the election by modifying their vote (from Biden to Trump)
- This represents 0.02% (about one fortieth of a percent) of the total voters

## Conclusion

The election result stability that was found in this analysis gives us a different sense of how close the election was, and how stable its result is. It can be seen as the "critical path" margin.

In our opinion, it is more informative than the more immediate values that are often discussed, such as the electoral college margin (74 of 538 electors, or about 13.8%, aka "a landslide") or the popular vote margin (4.5%). Here, we see both that the election was really close and that its result was very unstable. However, we also realize that this is a very specific definition of the margin, so it must be discussed with the full context (e.g. only one specific set of changes of 0.02% of votes would overturn the election, not any random changes of that size).

An interesting piece of further work would be to compute the flipside value, the "maximal resilience", i.e. the maximal number of Biden->Trump changes that can happen without overturning the election. Because of the Electoral College's quirkiness, this number will probably be laughably huge. Note that in a more straightforward system (e.g. the French election discussed above), the stability and the maximal resilience would be essentially equal (they differ by a single vote).
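
As a starting point, here is a rough brute-force sketch of the maximal resilience on toy data. It is not part of the analysis above: the function name, the tuple layout and the vote counts are hypothetical. It assumes that a tie within a state stays with the overall winner, and that the election is overturned only when the loser ends up with strictly more electors. It tries every subset of states, so it is only usable on small inputs.

```python
from itertools import combinations


def maximal_resilience(states):
    """states: list of (winner_votes, loser_votes, electors) tuples, where
    'winner' is the overall election winner.
    Return the largest number of winner -> loser vote changes that still
    leaves the loser without a majority of electors."""
    won = [s for s in states if s[0] >= s[1]]
    lost = [s for s in states if s[0] < s[1]]
    total_electors = sum(e for _, _, e in states)
    loser_base = sum(e for _, _, e in lost)
    # In states the loser already holds, every winner voter can switch freely
    free_switches = sum(w for w, _, _ in lost)

    best = 0
    for r in range(len(won) + 1):
        for conceded in combinations(range(len(won)), r):
            loser_electors = loser_base + sum(won[i][2] for i in conceded)
            if 2 * loser_electors > total_electors:
                continue  # the loser would have more electors: overturned
            switches = free_switches
            for i, (w, l, _) in enumerate(won):
                # Conceded state: all winner voters switch.
                # Kept state: switch as many as possible without losing it.
                switches += w if i in conceded else (w - l) // 2
            best = max(best, switches)
    return best


# Toy data (hypothetical): (winner votes, loser votes, electors)
toy = [(600, 400, 3), (260, 240, 3), (30, 70, 2), (90, 10, 5)]
print(maximal_resilience(toy))
```

On this toy input, 41 well-placed changes in the last state are enough to overturn the result, while the maximal resilience is several hundred votes.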