# Louisiana 2008 Presidential Elections: Data Cleaning & Preprocessing

**Goal:** Build a clean, analysis-ready county-level table for Louisiana, 2008 by merging the presidential primary and presidential general election results, and then derive summary stats (party totals).

**Output**: A single CSV where each row is a county and columns include:

- Primary per-candidate vote counts (prefixed with `pri_`)
- General per-candidate vote counts (prefixed with `gen_`)
- Party totals: `rep_primary_total`, `dem_primary_total`, `rep_general_total`, `dem_general_total`, `grn_general_total`, `oth_general_total`

**Last Updated**: 2025/10/02

## 0. Library Import

In [1]:
import re
import pandas as pd
import numpy as np
from pathlib import Path


## 1. Inputs & Parameters

Define raw file paths once here so the entire notebook is easy to rerun on another machine. If a path changes, we only update it here. We keep a single `OUTPUT_PATH` so all exports land in one known place.

In [4]:
# LA 2008 dataset path
PRIMARY_PATH1 = r"../../data/raw/2008/LA/20080209__la__primary.csv"
PRIMARY_PATH2 = r"../../data/raw/2008/LA/20081004__la__primary.csv"
GENERAL_PATH  = r"../../data/raw/2008/LA/20081104__la__general.csv"

# Output directory
OUTPUT_PATH  = r"../../data/processed/2008/LA/"

# Analysis parameters
DISPLAY_ROWS = 10   # Number of rows to display in dataframes

## 2. Load & Filter

We load primary and general datasets separately and immediately subset to the rows we truly need:

- Restrict `office` to 'President' to avoid mixing down-ballot contests

- Remove columns that are fully missing or irrelevant post-filter (e.g., a district column that’s empty for county-level rows)
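The two filtering steps above can be sketched as a single helper. This is only an illustration: `load_presidential` is a hypothetical name that the notebook does not define, and the U.S. House row in the sample is made up to show the filter at work.

```python
import io
import pandas as pd

def load_presidential(source) -> pd.DataFrame:
    """Hypothetical helper: read a results CSV and keep only presidential rows."""
    df = pd.read_csv(source)
    # Keep only the presidential contest
    df = df[df["office"] == "President"]
    # Drop columns that are entirely missing after the filter (e.g. district)
    df = df.dropna(axis=1, how="all")
    # 'office' is constant after filtering, so it carries no information
    return df.drop(columns=["office"]).reset_index(drop=True)

# In-memory sample with the same layout as the raw files;
# the second row is illustrative, not taken from the data
sample_csv = io.StringIO(
    "county,office,district,party,candidate,votes\n"
    "Acadia,President,,D,Barack Obama,1958\n"
    "Acadia,U.S. House,7,R,Some Candidate,10\n"
)
sample_df = load_presidential(sample_csv)
print(sample_df)
```

Dropping all-missing columns with `dropna(axis=1, how="all")` avoids hard-coding `district`, at the cost of also removing any other column that happens to be empty.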

### a. Primary Election Dataset

There are two files for the primary election. We inspect and preprocess each one, then merge them if necessary.

In [5]:
# Load primary data
primary1_df = pd.read_csv(PRIMARY_PATH1)
primary1_df.head(DISPLAY_ROWS)

Unnamed: 0,county,office,district,party,candidate,votes
0,Acadia,President,,D,"""Joe"" Biden",100
1,Acadia,President,,D,Hillary Clinton,2030
2,Acadia,President,,D,Christopher J. Dodd,32
3,Acadia,President,,D,John Edwards,213
4,Acadia,President,,D,Dennis J. Kucinich,15
5,Acadia,President,,D,Barack Obama,1958
6,Acadia,President,,D,"William ""Bill"" Richardson",53
7,Allen,President,,D,"""Joe"" Biden",45
8,Allen,President,,D,Hillary Clinton,978
9,Allen,President,,D,Christopher J. Dodd,13


In [6]:
# Different values in 'office' column
primary1_df["office"].value_counts()

office
President    1152
Name: count, dtype: int64

In [7]:
# Since there is only presidential data, drop "office" column
# Also, drop the district column
primary1_df = primary1_df.drop(columns=["office", "district"]).reset_index(drop=True)
primary1_df.head(DISPLAY_ROWS)

Unnamed: 0,county,party,candidate,votes
0,Acadia,D,"""Joe"" Biden",100
1,Acadia,D,Hillary Clinton,2030
2,Acadia,D,Christopher J. Dodd,32
3,Acadia,D,John Edwards,213
4,Acadia,D,Dennis J. Kucinich,15
5,Acadia,D,Barack Obama,1958
6,Acadia,D,"William ""Bill"" Richardson",53
7,Allen,D,"""Joe"" Biden",45
8,Allen,D,Hillary Clinton,978
9,Allen,D,Christopher J. Dodd,13


In [8]:
# Unique parties in primary1_df
primary1_df["party"].value_counts()

party
R    704
D    448
Name: count, dtype: int64

In [9]:
# Candidates in primary1_df
primary1_df["candidate"].value_counts()

candidate
"Joe" Biden                  64
Hillary Clinton              64
"Tom" Tancredo               64
Mitt Romney                  64
Ron Paul                     64
John McCain                  64
Alan Keyes                   64
Duncan Hunter                64
"Mike" Huckabee              64
Rudolph W. Giuliani          64
Daniel Gilbert               64
Jerry Curry                  64
William "Bill" Richardson    64
Barack Obama                 64
Dennis J. Kucinich           64
John Edwards                 64
Christopher J. Dodd          64
Fred Thompson                64
Name: count, dtype: int64

In [10]:
# Final look at the cleaned primary1_df
primary1_df.head(DISPLAY_ROWS)

Unnamed: 0,county,party,candidate,votes
0,Acadia,D,"""Joe"" Biden",100
1,Acadia,D,Hillary Clinton,2030
2,Acadia,D,Christopher J. Dodd,32
3,Acadia,D,John Edwards,213
4,Acadia,D,Dennis J. Kucinich,15
5,Acadia,D,Barack Obama,1958
6,Acadia,D,"William ""Bill"" Richardson",53
7,Allen,D,"""Joe"" Biden",45
8,Allen,D,Hillary Clinton,978
9,Allen,D,Christopher J. Dodd,13


In [11]:
# Shape after preprocessing
primary1_df.shape

(1152, 4)

With the first file done, we now look at the second primary election file.

In [12]:
# Load primary data
primary2_df = pd.read_csv(PRIMARY_PATH2)
primary2_df.head(DISPLAY_ROWS)

Unnamed: 0,county,office,district,party,candidate,votes
0,Jefferson,U.S. House,1,D,"""Jim"" Harlan",10740
1,Jefferson,U.S. House,1,D,"M.V. ""Vinny"" Mendoza",4882
2,Orleans,U.S. House,1,D,"""Jim"" Harlan",3005
3,Orleans,U.S. House,1,D,"M.V. ""Vinny"" Mendoza",927
4,St. Charles,U.S. House,1,D,"""Jim"" Harlan",629
5,St. Charles,U.S. House,1,D,"M.V. ""Vinny"" Mendoza",337
6,St. Tammany,U.S. House,1,D,"""Jim"" Harlan",9795
7,St. Tammany,U.S. House,1,D,"M.V. ""Vinny"" Mendoza",3125
8,Tangipahoa,U.S. House,1,D,"""Jim"" Harlan",9892
9,Tangipahoa,U.S. House,1,D,"M.V. ""Vinny"" Mendoza",4044


In [13]:
# Different values in 'office' column
primary2_df["office"].value_counts()

office
U.S. House    161
Name: count, dtype: int64

This file contains no presidential rows (only U.S. House contests), so we can safely ignore it and proceed to the general election dataset.

### b. General Election Dataset

In [14]:
# Load general data
general_df = pd.read_csv(GENERAL_PATH)
general_df.head(DISPLAY_ROWS)

Unnamed: 0,county,office,district,party,candidate,votes
0,Acadia,President,,D,"Barack Obama, Joe Biden",7028
1,Acadia,President,,G,"Cynthia McKinney,Rosa Clemente",182
2,Acadia,President,,R,"John McCain, Sarah Palin",19229
3,Acadia,President,,O,"Chuck Baldwin, Darrell Castle",35
4,Acadia,President,,O,"Ralph Nader, Matt Gonzalez",117
5,Acadia,President,,O,"""Ron"" Paul,Barry Goldwater,Jr.",101
6,Acadia,President,,O,"Gene Amondson, Leroy Pletten",6
7,Acadia,President,,O,"Gloria La Riva, Eugene Puryear",4
8,Acadia,President,,O,"James Harris, Alyson Kennedy",9
9,Allen,President,,D,"Barack Obama, Joe Biden",2891


In [15]:
# Different values in 'office' column
general_df["office"].value_counts()

office
President      576
U.S. Senate    320
U.S. House     119
Name: count, dtype: int64

In [16]:
# Only keep rows where 'office' is 'President'
general_df = general_df[general_df["office"] == "President"]
general_df.shape

(576, 6)

In [17]:
# Now, drop the "office" column as it's no longer needed
# Also, drop the district column
general_df = general_df.drop(columns=["office", "district"]).reset_index(drop=True)
general_df.head(DISPLAY_ROWS)

Unnamed: 0,county,party,candidate,votes
0,Acadia,D,"Barack Obama, Joe Biden",7028
1,Acadia,G,"Cynthia McKinney,Rosa Clemente",182
2,Acadia,R,"John McCain, Sarah Palin",19229
3,Acadia,O,"Chuck Baldwin, Darrell Castle",35
4,Acadia,O,"Ralph Nader, Matt Gonzalez",117
5,Acadia,O,"""Ron"" Paul,Barry Goldwater,Jr.",101
6,Acadia,O,"Gene Amondson, Leroy Pletten",6
7,Acadia,O,"Gloria La Riva, Eugene Puryear",4
8,Acadia,O,"James Harris, Alyson Kennedy",9
9,Allen,D,"Barack Obama, Joe Biden",2891


In [18]:
# List out all the parties in the general election data
general_df["party"].value_counts()

party
O    384
D     64
G     64
R     64
Name: count, dtype: int64

In [20]:
# Candidates in general_df
general_df["candidate"].value_counts()

candidate
Barack Obama, Joe Biden           64
Cynthia McKinney,Rosa Clemente    64
John McCain, Sarah Palin          64
Chuck Baldwin, Darrell Castle     64
Ralph Nader, Matt Gonzalez        64
"Ron" Paul,Barry Goldwater,Jr.    64
Gene Amondson, Leroy Pletten      64
Gloria La Riva, Eugene Puryear    64
James Harris, Alyson Kennedy      64
Name: count, dtype: int64

In the general election data, the `candidate` field holds the whole ticket in the form "President, Vice President". Since we only want the presidential candidate, we keep the text before the first comma:

In [21]:
# Keep only the presidential candidate name
general_df["candidate"] = (
    general_df["candidate"].apply(
        lambda x: x.split(",")[0].strip() if isinstance(x, str) else x
    )
)

# Updated candidates list
general_df["candidate"].value_counts() 

candidate
Barack Obama        64
Cynthia McKinney    64
John McCain         64
Chuck Baldwin       64
Ralph Nader         64
"Ron" Paul          64
Gene Amondson       64
Gloria La Riva      64
James Harris        64
Name: count, dtype: int64
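As a sketch, the same extraction can be written with pandas' vectorized string methods instead of `apply`. The sample tickets below are copied from the output above. Splitting at the first comma only (`n=1`) is enough, including for the ticket whose running mate's name contains a comma of its own.

```python
import pandas as pd

# Ticket strings as they appear in the general election file
tickets = pd.Series([
    "Barack Obama, Joe Biden",
    "Cynthia McKinney,Rosa Clemente",
    '"Ron" Paul,Barry Goldwater,Jr.',
])

# Split once at the first comma, keep the left part, trim whitespace
presidential = tickets.str.split(",", n=1).str[0].str.strip()
print(presidential.tolist())
# ['Barack Obama', 'Cynthia McKinney', '"Ron" Paul']
```

Missing values stay missing with the `.str` accessor, so no explicit `isinstance` check is needed in this form.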

In [22]:
# Missing values count
general_df.isnull().sum()

county       0
party        0
candidate    0
votes        0
dtype: int64

In [23]:
# Final look at cleaned general_df
general_df.head(DISPLAY_ROWS)

Unnamed: 0,county,party,candidate,votes
0,Acadia,D,Barack Obama,7028
1,Acadia,G,Cynthia McKinney,182
2,Acadia,R,John McCain,19229
3,Acadia,O,Chuck Baldwin,35
4,Acadia,O,Ralph Nader,117
5,Acadia,O,"""Ron"" Paul",101
6,Acadia,O,Gene Amondson,6
7,Acadia,O,Gloria La Riva,4
8,Acadia,O,James Harris,9
9,Allen,D,Barack Obama,2891


In [24]:
# Shape after preprocessing
general_df.shape

(576, 4)

## 3. Table Pivoting

We convert the tall table (one row per county/party/candidate) into a wide table (one row per county, one column per candidate). This keeps the schema consistent with the previously cleaned datasets from the group.

Helper functions:

- `normalize_party(s)`: maps the party codes in the data (`D`, `R`, `G`, `O`) to the keys `dem`, `rep`, `grn`, `oth` so column names are stable
- `candidate_token(name)`: builds a short, readable token for column names from the candidate's last name, e.g. “Barack Obama” -> `OBAMA`, “John McCain” -> `MCCAIN`
- `pivot_wide(df, prefix, key_col="county")`: Main pivot function
        
    * groups by `county` x `party` × `candidate`, sums `votes`,
    * pivots to columns named like:
        * Primary: `pri_dem_OBAMA`, `pri_rep_MCCAIN`,...
        * General: `gen_dem_OBAMA`, `gen_rep_MCCAIN`,...

    * flattens the MultiIndex into plain column strings,
    * returns one wide row per county
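To make the tall-to-wide step concrete, here is a toy sketch of the pivot and the MultiIndex flattening on four rows taken from the general election data (Obama and McCain in Acadia and Allen). It assumes the `party_key` and `candidate_token` columns have already been computed and is not a replacement for the full function.

```python
import pandas as pd

# Tall input: one row per county/party/candidate
tall = pd.DataFrame({
    "county": ["Acadia", "Acadia", "Allen", "Allen"],
    "party_key": ["dem", "rep", "dem", "rep"],
    "candidate_token": ["OBAMA", "MCCAIN", "OBAMA", "MCCAIN"],
    "votes": [7028, 19229, 2891, 6333],
})

# Pivot to one row per county with (party_key, candidate_token) column pairs
wide = tall.pivot_table(
    index="county",
    columns=["party_key", "candidate_token"],
    values="votes",
    aggfunc="sum",
    fill_value=0,
)

# Flatten the two-level column index into plain strings
wide.columns = [f"gen_{p}_{c}" for p, c in wide.columns]
wide = wide.reset_index()
print(wide)
```

The result has the columns `county`, `gen_dem_OBAMA`, and `gen_rep_MCCAIN`, with one row each for Acadia and Allen.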

In [25]:
def normalize_party(s: pd.Series) -> pd.Series:
    """
    Normalize party codes to column-name keys: D -> dem, R -> rep, G -> grn, O -> oth
    """
    cleaned = s.str.strip()
    return(cleaned.str.upper()
           .map({
                "D" : "dem", 
                "R" : "rep",
                "G" : "grn",
                "O" : "oth",
               })
           .fillna(cleaned.str.lower()))      # Defensive fallback: unmapped codes keep their lowercased raw value

In [26]:
SUFFIXES = {
    "JR","SR","JNR","SNR",
    "II","III","IV","V","VI","VII","VIII","IX","X","XI","XII"
}

def candidate_token(name: str) -> str:
    """
    Turn John McCain -> MCCAIN, Barack Obama -> OBAMA
    Drop trailing suffixes (JR, III, ...), strip punctuation, and return the uppercased last token
    """
    if pd.isna(name):
        return "UNKNOWN"                # Defensive purposes only, would not expect missing values
    
    # Trim surrounding whitespace
    raw = str(name).strip()

    # If a comma exists, treat as 'LAST, FIRST ...'
    if "," in raw:
        last_part = raw.split(",", 1)[0]
        last_part = re.sub(r"[^A-Za-z0-9\s]+", "", last_part).strip().upper()
        tokens = last_part.split()
        return tokens[-1] if tokens else "UNKNOWN"

    # Otherwise: remove punctuation, split, then drop trailing suffixes
    tokens = re.sub(r"[^A-Za-z0-9\s]+", "", raw).strip().upper().split()
    while tokens and tokens[-1] in SUFFIXES:
        tokens.pop()
    return tokens[-1] if tokens else "UNKNOWN"

In [27]:
def pivot_wide(df: pd.DataFrame, prefix: str, key_col: str="county") -> pd.DataFrame:
    """
    Pivot a tall results table to one row per `key_col`, with one column per
    party/candidate named `{prefix}_{party_key}_{candidate_token}`
    """
    # Work on a copy so the caller's dataframe is not mutated
    df = df.copy()

    # Normalize party names
    df['party_key'] = normalize_party(df['party'])
    
    # Create candidate tokens
    df['candidate_token'] = df['candidate'].apply(candidate_token)
    
    # Pivot the dataframe, summing votes per county/party/candidate
    pivot_df = df.pivot_table(index=key_col, 
                              columns=["party_key", "candidate_token"], 
                              values="votes", 
                              aggfunc='sum', 
                              fill_value=0)
    
    # Flatten multi-level columns
    pivot_df.columns = [f"{prefix}_{p}_{c}" for p, c in pivot_df.columns]
    
    # Reset index to turn key_col back into a column
    pivot_df = pivot_df.reset_index()
    
    return pivot_df

In [28]:
# Primary dataframe pivot
primary_pivot = pivot_wide(primary1_df, prefix="pri")
primary_pivot.head(DISPLAY_ROWS)

Unnamed: 0,county,pri_dem_BIDEN,pri_dem_CLINTON,pri_dem_DODD,pri_dem_EDWARDS,pri_dem_KUCINICH,pri_dem_OBAMA,pri_dem_RICHARDSON,pri_rep_CURRY,pri_rep_GILBERT,pri_rep_GIULIANI,pri_rep_HUCKABEE,pri_rep_HUNTER,pri_rep_KEYES,pri_rep_MCCAIN,pri_rep_PAUL,pri_rep_ROMNEY,pri_rep_TANCREDO,pri_rep_THOMPSON
0,Acadia,100,2030,32,213,15,1958,53,5,4,8,659,5,8,550,51,79,1,19
1,Allen,45,978,13,92,9,757,30,0,0,2,262,0,1,145,14,14,0,2
2,Ascension,122,2794,41,248,20,3726,84,6,2,32,1689,7,14,1182,223,220,5,38
3,Assumption,53,899,22,77,4,1178,21,1,0,3,112,1,1,156,18,12,0,4
4,Avoyelles,70,1499,22,114,15,1471,28,0,2,16,235,2,1,308,16,26,0,12
5,Beauregard,40,1202,8,133,10,732,40,2,2,9,755,2,7,433,34,54,0,6
6,Bienville,46,642,10,56,12,963,22,2,0,3,326,0,2,135,34,13,1,4
7,Bossier,77,2568,18,228,7,2615,92,12,1,31,2684,13,15,1459,152,314,3,41
8,Caddo,279,7521,78,496,63,14604,220,27,6,81,5517,20,33,3666,431,741,7,83
9,Calcasieu,221,7283,68,436,47,8179,173,13,7,39,3000,11,37,2234,282,344,5,59


In [29]:
# Primary dataframe shape after pivot
primary_pivot.shape

(64, 19)

In [30]:
# General dataframe pivot
general_pivot = pivot_wide(general_df, prefix="gen")
general_pivot.head(DISPLAY_ROWS)

Unnamed: 0,county,gen_dem_OBAMA,gen_grn_MCKINNEY,gen_oth_AMONDSON,gen_oth_BALDWIN,gen_oth_HARRIS,gen_oth_NADER,gen_oth_PAUL,gen_oth_RIVA,gen_rep_MCCAIN
0,Acadia,7028,182,6,35,9,117,101,4,19229
1,Allen,2891,92,3,23,5,64,54,2,6333
2,Ascension,14625,191,6,63,12,214,213,8,31239
3,Assumption,4756,92,1,15,6,53,52,4,5981
4,Avoyelles,6327,146,3,29,14,109,72,2,10236
5,Beauregard,3071,72,0,27,4,99,78,5,10718
6,Bienville,3589,15,0,10,6,17,17,0,3776
7,Bossier,12703,115,3,42,5,104,149,1,32713
8,Caddo,55536,195,6,91,26,213,353,12,52228
9,Calcasieu,30244,331,10,114,37,510,425,11,50449


In [31]:
# General dataframe shape after pivot
general_pivot.shape

(64, 10)

## 4. Merge Dataframes

Before merging, we verify that county names match across primary and general:

In [32]:
# Check if county names match between primary1_df and general_df
primary_counties = set(primary1_df["county"].unique())
general_counties = set(general_df["county"].unique())
common_counties = primary_counties.intersection(general_counties)
print(f"Number of common counties: {len(common_counties)} out of {len(primary_counties)}")

Number of common counties: 64 out of 64
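The count above is measured against the primary set only. A two-sided check that lists the names missing on each side could look like the sketch below. `county_mismatches` is a hypothetical helper, and the county lists in the usage example are chosen purely for illustration.

```python
def county_mismatches(left_names, right_names):
    """Hypothetical helper: return names found on only one side, sorted."""
    left, right = set(left_names), set(right_names)
    return sorted(left - right), sorted(right - left)

# Illustrative input: one name differs on each side
only_primary, only_general = county_mismatches(
    ["Acadia", "Allen", "Caddo"],
    ["Acadia", "Allen", "Bossier"],
)
print(only_primary, only_general)
# ['Caddo'] ['Bossier']
```

Two empty lists would mean the county sets are identical, which is the condition an inner merge needs in order to keep every row.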


All 64 primary counties appear in the general data, and the general pivot also has 64 rows, so the two county sets are identical. No further name matching is needed, and we can merge them:

In [33]:
# Merge primary and general dataframes on 'county'
merged_df = primary_pivot.merge(general_pivot, on="county", how="inner").fillna(0)    # There should be no missing values to fill with 0
merged_df.head(DISPLAY_ROWS)

Unnamed: 0,county,pri_dem_BIDEN,pri_dem_CLINTON,pri_dem_DODD,pri_dem_EDWARDS,pri_dem_KUCINICH,pri_dem_OBAMA,pri_dem_RICHARDSON,pri_rep_CURRY,pri_rep_GILBERT,...,pri_rep_THOMPSON,gen_dem_OBAMA,gen_grn_MCKINNEY,gen_oth_AMONDSON,gen_oth_BALDWIN,gen_oth_HARRIS,gen_oth_NADER,gen_oth_PAUL,gen_oth_RIVA,gen_rep_MCCAIN
0,Acadia,100,2030,32,213,15,1958,53,5,4,...,19,7028,182,6,35,9,117,101,4,19229
1,Allen,45,978,13,92,9,757,30,0,0,...,2,2891,92,3,23,5,64,54,2,6333
2,Ascension,122,2794,41,248,20,3726,84,6,2,...,38,14625,191,6,63,12,214,213,8,31239
3,Assumption,53,899,22,77,4,1178,21,1,0,...,4,4756,92,1,15,6,53,52,4,5981
4,Avoyelles,70,1499,22,114,15,1471,28,0,2,...,12,6327,146,3,29,14,109,72,2,10236
5,Beauregard,40,1202,8,133,10,732,40,2,2,...,6,3071,72,0,27,4,99,78,5,10718
6,Bienville,46,642,10,56,12,963,22,2,0,...,4,3589,15,0,10,6,17,17,0,3776
7,Bossier,77,2568,18,228,7,2615,92,12,1,...,41,12703,115,3,42,5,104,149,1,32713
8,Caddo,279,7521,78,496,63,14604,220,27,6,...,83,55536,195,6,91,26,213,353,12,52228
9,Calcasieu,221,7283,68,436,47,8179,173,13,7,...,59,30244,331,10,114,37,510,425,11,50449


In [34]:
# Statistics check on merged dataframe 
merged_df.describe()

Unnamed: 0,pri_dem_BIDEN,pri_dem_CLINTON,pri_dem_DODD,pri_dem_EDWARDS,pri_dem_KUCINICH,pri_dem_OBAMA,pri_dem_RICHARDSON,pri_rep_CURRY,pri_rep_GILBERT,pri_rep_GIULIANI,...,pri_rep_THOMPSON,gen_dem_OBAMA,gen_grn_MCKINNEY,gen_oth_AMONDSON,gen_oth_BALDWIN,gen_oth_HARRIS,gen_oth_NADER,gen_oth_PAUL,gen_oth_RIVA,gen_rep_MCCAIN
count,64.0,64.0,64.0,64.0,64.0,64.0,64.0,64.0,64.0,64.0,...,64.0,64.0,64.0,64.0,64.0,64.0,64.0,64.0,64.0,64.0
mean,96.53125,2139.453125,30.0625,203.53125,21.9375,3447.375,66.515625,8.140625,2.859375,24.890625,...,25.046875,12234.203125,143.546875,4.296875,40.328125,11.484375,109.328125,146.375,5.53125,17941.796875
std,145.804388,2949.020354,39.378858,310.265476,37.321799,6569.907676,119.035141,12.660233,4.813563,60.292967,...,51.942713,20971.774954,187.201269,4.981489,48.366673,11.327719,140.456179,211.721103,7.455623,22515.156499
min,6.0,222.0,3.0,13.0,1.0,118.0,7.0,0.0,0.0,1.0,...,0.0,613.0,9.0,0.0,2.0,1.0,4.0,8.0,0.0,1254.0
25%,28.75,588.25,10.75,82.5,7.0,754.25,22.75,1.0,0.0,3.0,...,4.0,3059.5,38.5,1.0,10.75,4.75,24.75,29.75,1.0,5559.5
50%,55.5,1156.0,20.0,117.5,12.0,1472.0,36.5,4.0,1.0,7.0,...,7.0,5654.0,81.0,3.0,25.0,8.0,54.5,64.5,4.0,8983.0
75%,103.0,2320.5,32.5,216.75,22.0,2863.0,60.0,9.5,3.0,19.5,...,23.25,10141.75,186.5,6.0,43.25,14.0,113.25,136.75,6.25,20507.75
max,1073.0,16191.0,287.0,2385.0,283.0,37179.0,928.0,76.0,29.0,438.0,...,367.0,117102.0,998.0,27.0,208.0,57.0,686.0,1025.0,40.0,113191.0


Next, we add the party total columns:

- Primary totals:
    * `rep_primary_total` = sum of all `pri_rep_*` columns
    * `dem_primary_total` = sum of all `pri_dem_*` columns

- General totals:
    * `rep_general_total` = sum of all `gen_rep_*` columns
    * `dem_general_total` = sum of all `gen_dem_*` columns
    * `grn_general_total` = sum of all `gen_grn_*` columns
    * `oth_general_total` = sum of all `gen_oth_*` columns
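As a worked check of the prefix-based sums, the sketch below rebuilds two of the totals for Acadia from the per-candidate values shown in the pivoted tables above. Only the Democratic primary and "other" general columns are included to keep it short, and the `totals` mapping is an illustrative structure rather than part of the notebook.

```python
import pandas as pd

# Acadia's per-candidate votes, copied from the pivoted tables
acadia = pd.DataFrame([{
    "county": "Acadia",
    "pri_dem_BIDEN": 100, "pri_dem_CLINTON": 2030, "pri_dem_DODD": 32,
    "pri_dem_EDWARDS": 213, "pri_dem_KUCINICH": 15, "pri_dem_OBAMA": 1958,
    "pri_dem_RICHARDSON": 53,
    "gen_oth_AMONDSON": 6, "gen_oth_BALDWIN": 35, "gen_oth_HARRIS": 9,
    "gen_oth_NADER": 117, "gen_oth_PAUL": 101, "gen_oth_RIVA": 4,
}])

# Map each total column to the prefix of the columns it sums
totals = {"dem_primary_total": "pri_dem_", "oth_general_total": "gen_oth_"}
for total_name, prefix in totals.items():
    cols = [c for c in acadia.columns if c.startswith(prefix)]
    acadia[total_name] = acadia[cols].sum(axis=1)

print(acadia[["county", "dem_primary_total", "oth_general_total"]])
```

This gives 4401 for `dem_primary_total` and 272 for `oth_general_total`, which agrees with the Acadia row in the final preview of the merged table.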

In [35]:
# Add party totals for primary election
rep_primary_cols   = [c for c in merged_df.columns if c.startswith("pri_rep_")]
dem_primary_cols   = [c for c in merged_df.columns if c.startswith("pri_dem_")]

merged_df["rep_primary_total"] = merged_df[rep_primary_cols].sum(axis=1) if rep_primary_cols else 0
merged_df["dem_primary_total"] = merged_df[dem_primary_cols].sum(axis=1) if dem_primary_cols else 0

In [36]:
# Add party totals for general election
rep_general_cols   = [c for c in merged_df.columns if c.startswith("gen_rep_")]
dem_general_cols   = [c for c in merged_df.columns if c.startswith("gen_dem_")]
grn_general_cols   = [c for c in merged_df.columns if c.startswith("gen_grn_")]
oth_general_cols   = [c for c in merged_df.columns if c.startswith("gen_oth_")]

merged_df["rep_general_total"] = merged_df[rep_general_cols].sum(axis=1) if rep_general_cols else 0
merged_df["dem_general_total"] = merged_df[dem_general_cols].sum(axis=1) if dem_general_cols else 0
merged_df["grn_general_total"] = merged_df[grn_general_cols].sum(axis=1) if grn_general_cols else 0
merged_df["oth_general_total"] = merged_df[oth_general_cols].sum(axis=1) if oth_general_cols else 0

In [37]:
# Print out all the column names in the final dataframe
print("Final columns in the cleaned dataframe:")
merged_df.columns

Final columns in the cleaned dataframe:


Index(['county', 'pri_dem_BIDEN', 'pri_dem_CLINTON', 'pri_dem_DODD',
       'pri_dem_EDWARDS', 'pri_dem_KUCINICH', 'pri_dem_OBAMA',
       'pri_dem_RICHARDSON', 'pri_rep_CURRY', 'pri_rep_GILBERT',
       'pri_rep_GIULIANI', 'pri_rep_HUCKABEE', 'pri_rep_HUNTER',
       'pri_rep_KEYES', 'pri_rep_MCCAIN', 'pri_rep_PAUL', 'pri_rep_ROMNEY',
       'pri_rep_TANCREDO', 'pri_rep_THOMPSON', 'gen_dem_OBAMA',
       'gen_grn_MCKINNEY', 'gen_oth_AMONDSON', 'gen_oth_BALDWIN',
       'gen_oth_HARRIS', 'gen_oth_NADER', 'gen_oth_PAUL', 'gen_oth_RIVA',
       'gen_rep_MCCAIN', 'rep_primary_total', 'dem_primary_total',
       'rep_general_total', 'dem_general_total', 'grn_general_total',
       'oth_general_total'],
      dtype='object')

In [38]:
# Preview merged dataframe with totals
merged_df.head(DISPLAY_ROWS)

Unnamed: 0,county,pri_dem_BIDEN,pri_dem_CLINTON,pri_dem_DODD,pri_dem_EDWARDS,pri_dem_KUCINICH,pri_dem_OBAMA,pri_dem_RICHARDSON,pri_rep_CURRY,pri_rep_GILBERT,...,gen_oth_NADER,gen_oth_PAUL,gen_oth_RIVA,gen_rep_MCCAIN,rep_primary_total,dem_primary_total,rep_general_total,dem_general_total,grn_general_total,oth_general_total
0,Acadia,100,2030,32,213,15,1958,53,5,4,...,117,101,4,19229,1389,4401,19229,7028,182,272
1,Allen,45,978,13,92,9,757,30,0,0,...,64,54,2,6333,440,1924,6333,2891,92,151
2,Ascension,122,2794,41,248,20,3726,84,6,2,...,214,213,8,31239,3418,7035,31239,14625,191,516
3,Assumption,53,899,22,77,4,1178,21,1,0,...,53,52,4,5981,308,2254,5981,4756,92,131
4,Avoyelles,70,1499,22,114,15,1471,28,0,2,...,109,72,2,10236,618,3219,10236,6327,146,229
5,Beauregard,40,1202,8,133,10,732,40,2,2,...,99,78,5,10718,1304,2165,10718,3071,72,213
6,Bienville,46,642,10,56,12,963,22,2,0,...,17,17,0,3776,520,1751,3776,3589,15,50
7,Bossier,77,2568,18,228,7,2615,92,12,1,...,104,149,1,32713,4725,5605,32713,12703,115,304
8,Caddo,279,7521,78,496,63,14604,220,27,6,...,213,353,12,52228,10612,23261,52228,55536,195,701
9,Calcasieu,221,7283,68,436,47,8179,173,13,7,...,510,425,11,50449,6031,16407,50449,30244,331,1107


Now, we save the cleaned dataframe into the processed directory.

In [39]:
# Save the cleaned and merged dataframe to CSV
out_dir = Path(OUTPUT_PATH)
out_dir.mkdir(parents=True, exist_ok=True)
merged_df.to_csv(out_dir / "LA.csv", index=False)
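
A quick way to confirm the export is to read the file back and compare it with the dataframe in memory. The sketch below does this on a two-row stand-in written to a temporary directory, so it does not touch the real output path. The stand-in frame is an assumption for illustration.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Two-row stand-in for the merged table (values from the general pivot)
df = pd.DataFrame({"county": ["Acadia", "Allen"], "gen_dem_OBAMA": [7028, 2891]})

with tempfile.TemporaryDirectory() as tmp:
    out_file = Path(tmp) / "LA.csv"
    df.to_csv(out_file, index=False)
    # Read the file back while the temporary directory still exists
    round_trip = pd.read_csv(out_file)

# Raises an AssertionError if values, columns, or dtypes differ
pd.testing.assert_frame_equal(round_trip, df)
print(round_trip.shape)
```

Writing with `index=False` matters here: otherwise the index is saved as an extra unnamed column and the comparison fails.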