## B. Presidential results (D%, R%, two-party %)

Difficulty: Easy

Sources: MIT Election Data and Science Lab (county and state returns), state Secretary of State (SOS) offices, or Dave Leip’s Atlas (paid). The MIT Election Lab data is recommended for reproducibility.

Quality issues: late-certified adjustments are rare, but check state certification dates. Third-party votes need explicit handling: compute the two-party share, D / (D + R), alongside each party's share of the total vote.
Estimated time: a few hours to 1 day using the MIT Election Lab CSVs.

https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/42MVDX
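
The two-party share mentioned under quality issues drops third-party and write-in votes from the denominator. A minimal sketch, assuming the usual definition D / (D + R); the helper name `two_party_share` is hypothetical and is not part of the MIT dataset or of pandas:

```python
def two_party_share(dem_votes: int, rep_votes: int) -> tuple[float, float]:
    """Return (Democratic %, Republican %) of the two-party vote.

    Hypothetical helper for illustration only. Third-party and write-in
    votes are ignored, so the two values always sum to 100.
    """
    two_party_total = dem_votes + rep_votes
    if two_party_total == 0:
        raise ValueError("no major-party votes recorded")
    dem_pct = 100 * dem_votes / two_party_total
    return dem_pct, 100 - dem_pct


# Alaska 1976 counts (dem=44058, rep=71555), taken from this section's output table
dem_pct, rep_pct = two_party_share(44058, 71555)
print(f"D {dem_pct:.2f}%  R {rep_pct:.2f}%")
```

Because the denominator excludes "other" votes, these figures will generally be higher than the corresponding shares of the total vote.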

In [None]:
#!pip install pandas

In [None]:
#!pip install duckdb

In [None]:
import pandas as pd
df = pd.read_csv("data/POTUS/1976-2020-president.csv")

In [None]:
print(df)

In [1]:
import pandas as pd

# === 1. Load the data ===
file_path = "data/POTUS/1976-2020-president.csv"  # update path as needed
df = pd.read_csv(file_path)

# Normalize column names first, in case the file's header case differs
df.columns = df.columns.str.lower()

# Zero-pad FIPS codes to two characters (e.g. 1 -> "01")
df["state_fips"] = df["state_fips"].astype(str).str.zfill(2)

# === 2. Keep only relevant columns ===
cols = ["year", "state", "state_po", "state_fips", "party_detailed", "candidatevotes", "totalvotes"]
df = df[cols].copy()  # copy so the column added in step 3 does not trigger SettingWithCopyWarning

# === 3. Group parties into Dem, Rep, or Other ===
df["party_grouped"] = df["party_detailed"].apply(
    lambda x: (
        "dem" if "democrat" in str(x).lower()
        else "rep" if "republican" in str(x).lower()
        else "other"
    )
)

# === 4. Aggregate votes by year, state, and party group ===
pivot_df = (
    df.groupby(["year", "state", "state_po", "state_fips", "party_grouped"])["candidatevotes"]
    .sum()
    .unstack(fill_value=0)
    .reset_index()
)

# === 5. Merge in total votes (max should be same per state-year) ===
total_votes = df.groupby(["year", "state", "state_po", "state_fips"])["totalvotes"].max().reset_index()

merged = pivot_df.merge(total_votes, on=["year", "state", "state_po", "state_fips"], how="left")

# === 6. Compute each group's share of the total vote, in percent ===
for party in ["dem", "rep", "other"]:
    merged[f"{party}_pct"] = merged[party] * 100 / merged["totalvotes"]

# === 7. Compute Dem–Rep difference ===
merged["d_r_diff"] = merged["dem_pct"] - merged["rep_pct"]

# === 8. Optional: sort and inspect ===
merged = merged.sort_values(["year", "state_po"]).reset_index(drop=True)

# print(merged.head())
display(merged)  # display() is available in Jupyter/IPython sessions

# === 9. Optional: save output ===
# merged.to_csv("data/presidential_votes_pivot.csv", index=False)


| | year | state | state_po | state_fips | dem | other | rep | totalvotes | dem_pct | rep_pct | other_pct | d_r_diff |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1976 | ALASKA | AK | 02 | 44058 | 7961 | 71555 | 123574 | 35.653131 | 57.904575 | 6.442294 | -22.251444 |
| 1 | 1976 | ALABAMA | AL | 01 | 659170 | 19610 | 504070 | 1182850 | 55.727269 | 42.614871 | 1.657860 | 13.112398 |
| 2 | 1976 | ARKANSAS | AR | 05 | 498604 | 1028 | 267903 | 767535 | 64.961728 | 34.904337 | 0.133935 | 30.057392 |
| 3 | 1976 | ARIZONA | AZ | 04 | 295602 | 28475 | 418642 | 742719 | 39.799978 | 56.366136 | 3.833886 | -16.566158 |
| 4 | 1976 | CALIFORNIA | CA | 06 | 3742284 | 179242 | 3882244 | 7803770 | 47.954822 | 49.748314 | 2.296864 | -1.793492 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 607 | 2020 | VERMONT | VT | 50 | 242820 | 15444 | 112704 | 370968 | 65.455781 | 30.381057 | 4.163162 | 35.074723 |
| 608 | 2020 | WASHINGTON | WA | 53 | 2369612 | 133368 | 1584651 | 4087631 | 57.970301 | 38.766978 | 3.262721 | 19.203323 |
| 609 | 2020 | WISCONSIN | WI | 55 | 1630866 | 56991 | 1610184 | 3298041 | 49.449537 | 48.822437 | 1.728026 | 0.627100 |
| 610 | 2020 | WEST VIRGINIA | WV | 54 | 235984 | 13286 | 545382 | 794652 | 29.696521 | 68.631552 | 1.671927 | -38.935031 |
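
The output above reports shares of the total vote only, while the section title also lists the two-party percentage. A minimal sketch of adding it, assuming a frame shaped like `merged` (rebuilt here from three rows of the output above so the block runs on its own); the column names `two_party_dem_pct`, `two_party_rep_pct`, and `votes_match` are hypothetical additions, not columns of the MIT file:

```python
import pandas as pd

# Three rows copied from the output above, standing in for `merged`
sample = pd.DataFrame(
    {
        "year": [1976, 1976, 2020],
        "state_po": ["AK", "CA", "WI"],
        "dem": [44058, 3742284, 1630866],
        "other": [7961, 179242, 56991],
        "rep": [71555, 3882244, 1610184],
        "totalvotes": [123574, 7803770, 3298041],
    }
)

# Two-party share: leave "other" out of the denominator
two_party_total = sample["dem"] + sample["rep"]
sample["two_party_dem_pct"] = sample["dem"] * 100 / two_party_total
sample["two_party_rep_pct"] = sample["rep"] * 100 / two_party_total

# Sanity check: the three groups should add up to the reported total
sample["votes_match"] = (
    sample["dem"] + sample["rep"] + sample["other"] == sample["totalvotes"]
)

print(sample[["year", "state_po", "two_party_dem_pct", "two_party_rep_pct", "votes_match"]])
```

Any row where `votes_match` is False is worth checking against the state's certified results before its percentages are used.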
