# How close was the 2020 US Presidential Election?

## The Electoral College

It is often said that the US presidential election system, and especially its [Electoral College](https://en.wikipedia.org/wiki/United_States_Electoral_College) mechanism, is a bad system. In summary, the Electoral College is the part of the process where the vote of each state is translated into a number of "electors", roughly proportional to the state's population, who take part in a second round of voting that actually decides the outcome of the election.

Common issues with the Electoral College include:

- Most states use a "winner-take-all" system, which can discourage voters of states where the outcome is known in advance
- The number of electors of each state is not exactly proportional to the population of each state, giving an artificial advantage to some states
- The popular vote winner may not be the actual winner, even with a significant majority. This is not a trivial quirk of the system, as it already happened twice this century (as of 2021).
- More than 3 million people in the [US Territories](https://en.wikipedia.org/wiki/Territories_of_the_United_States) cannot vote to elect their leader

The following assumes that the reader has a basic grasp of how the Electoral College works.

## Election result stability

Here we introduce a new concept (or a not-new concept, I'm not a social choice theory expert), called **election result stability** and defined as *the minimal number of voters that would have to change their vote to change the result of an election*. It is expressed as a percentage of the total number of voters (excluding spoiled ballots and "None of the above"-type votes).

For example, in the second round of the [2017 French election](https://en.wikipedia.org/wiki/2017_French_presidential_election):
- Emmanuel Macron won with 66.1% ($V_{M}$) of the vote
- Marine Le Pen lost with 33.9% ($V_{LP}$) of the vote

So the election result stability $S$ would trivially be:

$S = \frac{V_{M} - V_{LP}}{2} = 16.1\%$

and indeed, we see that $V_{LP} + S = 50\%$, the score needed to overturn the election.
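
As a minimal sketch of this two-candidate case, the formula can be written as a small helper. The function name `two_candidate_stability` is hypothetical, and it assumes a simple majority vote with shares expressed in percent of valid ballots:

```python
def two_candidate_stability(winner_share: float, loser_share: float) -> float:
    """Share of voters (in %) who must switch from the winner to the loser to reach a tie."""
    return (winner_share - loser_share) / 2


# Second round of the 2017 French election, using the shares quoted above
s = two_candidate_stability(66.1, 33.9)
print(round(s, 1))         # 16.1
print(round(33.9 + s, 1))  # 50.0
```

Each voter who switches removes one vote from the winner and adds one to the loser, which is why the gap is divided by two.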

If this example seems extremely simple, it's because the French presidential election, while not perfect, uses a system that is actually sane and produces a result that is clear for everyone to understand and trust.

With that said, let's return to the US Presidential Election.

## Problem definition

Here we concentrate on the [2020 US Presidential Election](https://en.wikipedia.org/wiki/2020_United_States_presidential_election), won by Joe Biden over Donald Trump by an Electoral College victory of 306-232 (margin: 74) and, anecdotally, by a popular vote victory of 51.3%-46.8% (margin: 4.5%). We will answer the following question:

> What is the election result stability of the election?

or, in other words

> How many voters changing their vote would be sufficient to reverse the election?

This is a surprisingly hard question to answer, as we can imagine a number of different scenarios where the results would be overturned (e.g. California voting for Trump for some reason), but each of them would require a different number of voters changing their vote.

To simplify our analysis, we make a few hypotheses:
- Consider 3rd party voters as spoiled ballots (sorry to Jo Jorgensen, the only female presidential candidate 😬)
- Consider "None of the Above" voters as spoiled ballots
- No faithless electors (the electors vote in the second round exactly as their state's result dictates)

(Note that the first two hypotheses make our model simpler, but won't affect the result as the quickest path to change the election is converting Biden voters to Trump voters anyway)

With that being said, let's get our hands dirty 🙌

## Data

We will use two datasets for our analysis:
- [Results of US Election](https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/42MVDX) by the MIT ElectionLab
- A custom-made dataset of the number of electors in each state (based on [this article](https://www.britannica.com/topic/United-States-Electoral-College-Votes-by-State-1787124))

## Code

Let's load the data and combine our datasets in a single data structure.

In [None]:
import csv

# Data structure:
# state_data[STATE_NAME] = {
#    'DEMOCRAT': nb of votes for BIDEN
#    'REPUBLICAN': nb of votes for TRUMP
#    'ec_count': nb of electoral college electors
#    'total_votes': total number of voters (inc. 3rd-party and NOTA voters)
# }
state_data = {}

# Parse election results
with open("1976-2020-president.csv") as f:
    reader = csv.DictReader(f)
    for row in reader:
        if row["year"] != "2020":
            continue

        # Ignore 3rd party and NOTA
        if row["party_simplified"] not in ["DEMOCRAT", "REPUBLICAN"]:
            continue

        assert row["office"] == "US PRESIDENT"

        if "BIDEN" not in row["candidate"] and "TRUMP" not in row["candidate"]:
            raise Exception("Unexpected candidate " + str(row))

        state = row["state"]
        if state not in state_data:
            state_data[state] = {}

        party = row["party_simplified"]
        if party in state_data[state]:
            raise Exception("Duplicate result for " + state + " " + party)
        state_data[state][party] = int(row["candidatevotes"])

        state_data[state]["total_votes"] = int(row["totalvotes"])

# Parse electoral college data
with open("electoral-college-electors.csv") as f:
    reader = csv.DictReader(f)
    for row in reader:
        state = row["state"].upper()
        if state not in state_data:
            raise Exception("Unrecognized state", state)
        state_data[state]["ec_count"] = int(row["nbElectors"])

print("Got data for", len(state_data), "states")
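
To make the shape of `state_data` concrete, here is a small self-contained sketch of the same merge on in-memory CSV text. The two states and all the numbers are made up for illustration. Only the column names (`state`, `party_simplified`, `candidatevotes`, `totalvotes`, `nbElectors`) mirror the ones used above, and `toy_state_data` is a hypothetical name:

```python
import csv
import io

# Made-up results for two hypothetical states (NOT real data)
results_csv = io.StringIO(
    "state,party_simplified,candidatevotes,totalvotes\n"
    "STATE A,DEMOCRAT,600,1000\n"
    "STATE A,REPUBLICAN,380,1000\n"
    "STATE B,DEMOCRAT,200,500\n"
    "STATE B,REPUBLICAN,290,500\n"
)
# The electors file uses mixed-case names, hence the .upper() when joining
electors_csv = io.StringIO("state,nbElectors\nState A,5\nState B,3\n")

toy_state_data = {}
for row in csv.DictReader(results_csv):
    entry = toy_state_data.setdefault(row["state"], {})
    entry[row["party_simplified"]] = int(row["candidatevotes"])
    entry["total_votes"] = int(row["totalvotes"])

for row in csv.DictReader(electors_csv):
    toy_state_data[row["state"].upper()]["ec_count"] = int(row["nbElectors"])

print(toy_state_data["STATE A"])
```

The join key is the upper-cased state name, so a state spelled differently in the two files would raise a `KeyError` here, much like the explicit check above.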

As a quick check, let's verify that we reproduce the official results (at least for the Electoral College).

In [None]:
# Check if we reproduce the results

electorsForBiden = 0
electorsForTrump = 0
votesForBiden = 0
votesForTrump = 0
for state, results in state_data.items():
    votesForBiden += results["DEMOCRAT"]
    votesForTrump += results["REPUBLICAN"]
    if results["REPUBLICAN"] > results["DEMOCRAT"]:
        electorsForTrump += results["ec_count"]
    else:
        electorsForBiden += results["ec_count"]

assert electorsForBiden == 306
assert electorsForTrump == 232

print("Reproducing the election results: OK")

Then, let's generate all scenarios that would result in a Trump win, and count the number of voters that would have to vote differently in that scenario.

This code is a bit messy, but hopefully you can follow along (future employers -- please skip this)

In [None]:
electorsDiff = electorsForBiden - electorsForTrump
statesForBiden = [
    state for state, data in state_data.items() if data["DEMOCRAT"] > data["REPUBLICAN"]
]

minKnownVotes = 250000000  # Initial upper bound, chosen to exceed the total number of votes cast


def sumOfDiffVotes(states):
    """Return the total Biden-over-Trump vote margin that must be overcome to flip every state in `states`."""
    acc = 0
    for state in states:
        nbDiffVotes = state_data[state]["DEMOCRAT"] - state_data[state]["REPUBLICAN"]
        acc += nbDiffVotes
    return acc


# We will iterate over all scenarios in a very naive way that produces a lot of duplicates.
# As a quick optimization, we mark the scenarios already encountered to avoid repeating ourselves.
already_tried = set()


def getCanonicalForm(path, remainingElectors):
    return "-".join(sorted(path)) + "-" + str(remainingElectors)


# generator that will yield all possible scenarios where the election is overturned
def allPossiblePaths(prevStates, states, electorsDiff):
    # print('all', states, electorsDiff)

    canonicalForm = getCanonicalForm(prevStates, electorsDiff)
    if canonicalForm in already_tried:
        return
    already_tried.add(canonicalForm)

    # Prune: this partial path already costs more than the best complete path found so far
    if sumOfDiffVotes(prevStates) > minKnownVotes:
        return

    for state in states:
        ecCount = state_data[state]["ec_count"]

        if ecCount <= electorsDiff:
            # Adding this state does not overturn the election
            # Add to path, then continue exploring 
            for nextPath in allPossiblePaths(
                prevStates + [state],
                [s for s in states if s != state],
                electorsDiff - ecCount,
            ):
                yield [state] + nextPath

        if ecCount > electorsDiff:
            # Adding this state overturns the election
            # => Add to path, but stop there
            yield [state]
            continue

bestPath = None

for path in allPossiblePaths([], statesForBiden, electorsDiff):
    pathScore = sumOfDiffVotes(path)
    if pathScore < minKnownVotes:
        minKnownVotes = pathScore
        bestPath = path

print("Found best path:", ", ".join(bestPath), "with a score of", minKnownVotes, "votes")

print("Details:")
for state in bestPath:
    diff = state_data[state]["DEMOCRAT"] - state_data[state]["REPUBLICAN"]
    print(
        "\t",
        state,
        ":",
        diff,
        "votes =",
        round(diff * 100 / state_data[state]["total_votes"], 2),
        "% of the state voters",
    )
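
The recursive search above enumerates paths one state at a time. As an alternative sketch (not the method used in this notebook), the same minimisation can be phrased as a covering knapsack solved by dynamic programming. It follows the same convention as the code above: a path is complete once the flipped electors exceed the elector gap, and its cost is the sum of the state margins. All names (`toy_states`, `toy_electors_diff`, `cheapest_flip`, `brute_force`) are hypothetical and the numbers are invented, not real results:

```python
from itertools import combinations

# Hypothetical toy data (NOT real results): name -> (electors, winner's vote margin)
toy_states = {
    "A": (10, 500),
    "B": (6, 120),
    "C": (4, 90),
    "D": (3, 40),
    "E": (9, 300),
}
toy_electors_diff = 12  # hypothetical elector gap to overcome


def cheapest_flip(states, electors_diff):
    """Return (min total margin, states) among sets whose electors exceed electors_diff."""
    target = electors_diff + 1  # flipped electors must strictly exceed the gap
    inf = float("inf")
    # best[e] = cheapest (cost, states) reaching e electors, with e capped at target
    best = [(inf, ())] * (target + 1)
    best[0] = (0, ())
    for name, (electors, margin) in states.items():
        # Iterate downwards so that each state is used at most once
        for e in range(target, -1, -1):
            cost, chosen = best[e]
            if cost == inf:
                continue
            reached = min(target, e + electors)
            if cost + margin < best[reached][0]:
                best[reached] = (cost + margin, chosen + (name,))
    return best[target]


def brute_force(states, electors_diff):
    """Cross-check: try every subset of states and keep the cheapest valid one."""
    names = list(states)
    costs = []
    for r in range(1, len(names) + 1):
        for combo in combinations(names, r):
            if sum(states[s][0] for s in combo) > electors_diff:
                costs.append(sum(states[s][1] for s in combo))
    return min(costs)


print(cheapest_flip(toy_states, toy_electors_diff))  # (250, ('B', 'C', 'D'))
print(brute_force(toy_states, toy_electors_diff))    # 250
```

The dynamic programming table has one entry per elector count up to the target, so the work grows with the number of states times the elector gap rather than with the number of subsets. This sketch assumes every margin is strictly positive.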


We can compare this result with the total number of voters (taken from [the official results by the FEC](https://www.fec.gov/resources/cms-content/documents/2020presgeresults.pdf)) to get our answer as a percentage.

In [None]:
totalNbOfVotes = 158383403

electionResultsStability = (minKnownVotes * 100) / totalNbOfVotes

print("Stability:", round(electionResultsStability, 2), "%")

## Result

In summary, the answer to our question is:
- 311,257 strategically chosen voters could change the result of the election by changing their vote (from Biden to Trump)
- This represents 0.2% (two tenths of a percent) of all voters
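
As a worked check of that percentage, using the best-path total and the FEC vote count quoted above:

$S = \frac{311257}{158383403} \approx 0.001965 \approx 0.2\%$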

## Conclusion

The election result stability found in this analysis gives us a different sense of how close the election was, and how stable its result is. It can be seen as the "critical path" margin.

In our opinion, it is more accurate than the more intuitive values that are often discussed, such as the Electoral College margin (74 electors out of 538, about 14%) or the popular vote margin (4.5%). Here, we see both that the election was really close and that its result was very unstable.

An interesting piece of further work would be to compute the opposite value, the "inverse stability", i.e. the maximal number of voters that can change their vote without overturning the election. Because of the Electoral College's quirkiness, this number will probably be laughably huge. Note that in a sane system (e.g. the French election discussed above), the stability and the inverse stability would be equal.



