# Electoral College Data Mapping

This project analyzes how electoral college votes were distributed after the 2010 census, how estimated population shifts since then have changed that picture, and what this means for the number of voters per electoral college vote in each state. I will also analyze how we expect the electoral college to be redistributed after the 2020 census, given Census Bureau predictions.

I will also estimate, based purely on historical data, how likely each state is to give its electoral college votes to a particular party and its nominee. The aim is to show which states have the greatest power per vote, given both their current electoral vote allotment and the likelihood of the state assigning its votes to either candidate.

### Things to get

- likelihood of state vote going to party
- states signing pact for popular vote
- predictions for state populations in 2020
- DC gets electoral votes, but no house votes
- Webscrape status of NPVIC website: https://www.nationalpopularvote.com/state-status


### Visualizations
- viz of state pops over time (5 lowest, 5 highest)
- animation of voter potency (voters / electoral vote)
- electoral votes w/ new census
- voter potency w/ new census
- guaranteed (statistically) electoral votes for each party
- swing states and leanings and votes
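
As a rough sketch of the "voter potency" metric listed above: a state's electoral votes equal its House seats plus its two Senate seats, so the number of people per electoral vote is the population divided by that sum. The helper names below are illustrative and not part of this notebook. As an assumption, the 2010 apportionment populations (taken from outputs later in the notebook) stand in for actual voter counts.

```python
def electoral_votes(house_seats):
    """Electoral votes for a state: House seats plus two Senate seats."""
    return house_seats + 2


def people_per_electoral_vote(population, house_seats):
    """People represented by each electoral vote (lower means more weight per person)."""
    return population / electoral_votes(house_seats)


# 2010 apportionment populations and House seats, as shown in this notebook's outputs
examples = {
    'California': (37341989, 53),
    'Alaska': (721523, 1),
}

potency = {state: round(people_per_electoral_vote(pop, seats))
           for state, (pop, seats) in examples.items()}
print(potency)
```

Under these assumptions, each Alaskan electoral vote represents far fewer people than each Californian one, which is the imbalance the animation is meant to show.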


## Import

In [301]:
#import necessary libraries

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

import requests
import json
import copy

from bs4 import BeautifulSoup
import re

import sqlite3
%matplotlib inline

## Support Functions

In [2]:
#define how to retrieve api keys

def get_keys(path):
    """
    Pulls necessary api keys from designated path
    """
    with open(path) as f:
        return json.load(f)

In [183]:
def reciprocal_geometric_mean(next_house_seat):
    """
    Calculates the reciprocal geometric mean for the next house seat a state could potentially receive
    """
    return 1 / np.sqrt(next_house_seat*(next_house_seat-1))

In [184]:
def priority_value(state_pop, next_house_seat):
    """
    Calculates the priority value a state has for receiving another house seat.
    The highest priority value state gets the next house seat available.
    """
    return int(round(state_pop * reciprocal_geometric_mean(next_house_seat), 0))

## Data Collection

### API Request

In [3]:
#get key for census bureau api
keys = get_keys("/Users/flatironschool/.secret/census_api.json")

api_key = keys['api_key']
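
For reference, `get_keys` expects a JSON file containing an object with an `api_key` field, since that is the key read above. The sketch below writes a throwaway file with a placeholder value to show the expected shape. The temporary path and the placeholder string are assumptions for illustration only.

```python
import json
import os
import tempfile


def get_keys(path):
    """Load API keys from a JSON file at the given path."""
    with open(path) as f:
        return json.load(f)


# Write a throwaway key file; 'YOUR_KEY_HERE' is a placeholder, not a real key
with tempfile.TemporaryDirectory() as tmp_dir:
    key_path = os.path.join(tmp_dir, 'census_api.json')
    with open(key_path, 'w') as f:
        json.dump({'api_key': 'YOUR_KEY_HERE'}, f)
    keys = get_keys(key_path)

api_key = keys['api_key']
print(api_key)
```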

In [5]:
#make and print request for census count and estimates from 2010-2019 for all states
year = '2019'

url = 'https://api.census.gov/data/{}/pep/population'.format(year)

variables = ['DATE_CODE',
             'DATE_DESC',
             'POP',
             'NAME']

granularity = 'state:*'

params = {'get': ','.join(variables), 'for': granularity, 'key': api_key}

r = requests.get(url, params=params)
print(r.url)
print(r)
print(type(r.text))
print(r.text[:1000])

https://api.census.gov/data/2019/pep/population?get=DATE_CODE%2CDATE_DESC%2CPOP%2CNAME&for=state%3A%2A&key=b7961d22ec04ff1777be8a0450921d3f28af8315
<Response [200]>
<class 'str'>
[["DATE_CODE","DATE_DESC","POP","NAME","state"],
["1","4/1/2010 Census population","5303925","Minnesota","27"],
["2","4/1/2010 population estimates base","5303927","Minnesota","27"],
["3","7/1/2010 population estimate","5310828","Minnesota","27"],
["4","7/1/2011 population estimate","5346143","Minnesota","27"],
["5","7/1/2012 population estimate","5376643","Minnesota","27"],
["6","7/1/2013 population estimate","5413479","Minnesota","27"],
["7","7/1/2014 population estimate","5451079","Minnesota","27"],
["8","7/1/2015 population estimate","5482032","Minnesota","27"],
["9","7/1/2016 population estimate","5522744","Minnesota","27"],
["10","7/1/2017 population estimate","5566230","Minnesota","27"],
["11","7/1/2018 population estimate","5606249","Minnesota","27"],
["12","7/1/2019 population estimate","5639632","Min
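
Printing `r.url` also prints the API key, because the key is sent as a query parameter. A minimal sketch of one way to log the request without it: rebuild the query string from the non-secret parameters only, using the standard library. The names `public_params` and `safe_url` are illustrative and not part of the notebook.

```python
from urllib.parse import urlencode

variables = ['DATE_CODE', 'DATE_DESC', 'POP', 'NAME']

# Same parameters as the request above, minus the secret 'key' entry
public_params = {'get': ','.join(variables), 'for': 'state:*'}

query = urlencode(public_params)
safe_url = 'https://api.census.gov/data/2019/pep/population?' + query
print(safe_url)
```

The encoded query matches the one in the printed URL above (commas as `%2C`, the colon as `%3A`, the asterisk as `%2A`), just without the trailing `key` parameter.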

In [115]:
#clean data (only need population, state, and datetime) with pandas before putting into SQL database
data = r.json()
main_df = pd.DataFrame(data[1:], columns=data[0])
main_df['YEAR'] = main_df.DATE_DESC.apply(lambda x: x[4:8])
main_df.drop(main_df[(main_df.DATE_CODE == '2') | (main_df.DATE_CODE == '3')].index, inplace=True)
main_df.drop(['state','DATE_CODE', 'DATE_DESC'], axis=1, inplace=True)
main_df.reset_index(drop=True, inplace=True)
main_df['POP'] = main_df.POP.astype('int64')
main_df = main_df[['YEAR', 'NAME', 'POP']]
main_df.columns = ['Year', 'State', 'Population']

display(main_df.head(15))
display(main_df.info())

Unnamed: 0,Year,State,Population
0,2010,Minnesota,5303925
1,2011,Minnesota,5346143
2,2012,Minnesota,5376643
3,2013,Minnesota,5413479
4,2014,Minnesota,5451079
5,2015,Minnesota,5482032
6,2016,Minnesota,5522744
7,2017,Minnesota,5566230
8,2018,Minnesota,5606249
9,2019,Minnesota,5639632


<class 'pandas.core.frame.DataFrame'>
RangeIndex: 520 entries, 0 to 519
Data columns (total 3 columns):
Year          520 non-null object
State         520 non-null object
Population    520 non-null int64
dtypes: int64(1), object(2)
memory usage: 12.3+ KB


None
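
The `x[4:8]` slice used above works because every `DATE_DESC` in this response starts with a single-digit month and day (`4/1/...` or `7/1/...`). A slightly more defensive sketch, assuming the description always contains exactly one four-digit year, is to pull the year out with a regular expression. The `extract_year` helper and the `12/31/2010` description are hypothetical and included only to show the difference.

```python
import re


def extract_year(date_desc):
    """Return the first standalone four-digit number in the description, or None."""
    match = re.search(r'\b(\d{4})\b', date_desc)
    return match.group(1) if match else None


descriptions = [
    '4/1/2010 Census population',       # from the API output above
    '7/1/2011 population estimate',     # from the API output above
    '12/31/2010 population estimate',   # hypothetical two-digit month and day
]

for desc in descriptions:
    # Compare the fixed slice with the regex-based extraction
    print(repr(desc[4:8]), extract_year(desc))
```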

### Overseas Population Import

In [116]:
overseas_df = pd.read_excel('Overseas Population 2010.xls', skiprows=7)
overseas_df.dropna(inplace=True)
overseas_df.columns = ['State', 'Overseas_pop']
overseas_df['Year'] = '2010'
overseas_df['Overseas_pop'] = overseas_df['Overseas_pop'].astype('int64')
overseas_df.reset_index(drop=True, inplace=True)

display(overseas_df.head())
display(overseas_df.info())

Unnamed: 0,State,Overseas_pop,Year
0,Alabama,23246,2010
1,Alaska,11292,2010
2,Arizona,20683,2010
3,Arkansas,10311,2010
4,California,88033,2010


<class 'pandas.core.frame.DataFrame'>
RangeIndex: 52 entries, 0 to 51
Data columns (total 3 columns):
State           52 non-null object
Overseas_pop    52 non-null int64
Year            52 non-null object
dtypes: int64(1), object(2)
memory usage: 1.3+ KB


None

### Scraping National Popular Vote Interstate Compact State Status

In [295]:
url = "https://www.nationalpopularvote.com/state-status"
req = requests.get(url)
soup = BeautifulSoup(req.content, 'html.parser')

<!DOCTYPE html>
<!--[if IEMobile 7]><html class="iem7"  lang="en" dir="ltr"><![endif]-->
<!--[if lte IE 6]><html class="lt-ie9 lt-ie8 lt-ie7"  lang="en" dir="ltr"><![endif]-->
<!--[if (IE 7)&(!IEMobile)]><html class="lt-ie9 lt-ie8"  lang="en" dir="ltr"><![endif]-->
<!--[if IE 8]><html class="lt-ie9"  lang="en" dir="ltr"><![endif]-->
<!--[if (gte IE 9)|(gt IEMobile 7)]><!-->
<html dir="ltr" lang="en" prefix="og: http://ogp.me/ns# article: http://ogp.me/ns/article# book: http://ogp.me/ns/book# profile: http://ogp.me/ns/profile# video: http://ogp.me/ns/video# product: http://ogp.me/ns/product# content: http://purl.org/rss/1.0/modules/content/ dc: http://purl.org/dc/terms/ foaf: http://xmlns.com/foaf/0.1/ rdfs: http://www.w3.org/2000/01/rdf-schema# sioc: http://rdfs.org/sioc/ns# sioct: http://rdfs.org/sioc/types# skos: http://www.w3.org/2004/02/skos/core# xsd: http://www.w3.org/2001/XMLSchema#">
 <!--<![endif]-->
 <head>
  <meta charset="utf-8"/>
  <link href="https://www.nationalpopularvo

In [349]:
#skips over the initial list (and links) of states at the top of the page
init_state_skip = soup.find("p", string=re.compile("On the map below"))

In [363]:
#actually brings us to the states and their current statuses
state_start = init_state_skip.find_next_sibling().find_next_sibling()

In [364]:
state_start

<p> </p>

In [365]:
state_start.find_next_sibling().find_next_sibling().find_next_sibling().get_text()

'Arkansas\xa0- Passed House in 2007 and 2009'

In [361]:
'Arizona\xa0- Passed\xa0House in 2016'.replace('\xa0', ' ').split(" ")

['Arizona', '-', 'Passed', 'House', 'in', '2016']

In [374]:
state_start.find_next_sibling().contents

[<a class="menu__link" href="/state/ak">Alaska</a>]

In [381]:
state_npvic_status = {}
for p in state_start.find_next_siblings():
    if p.contents == []:
        break
    
    soup_state_status = p.get_text().replace('\xa0', ' ')
    if '-' not in soup_state_status:
        state = soup_state_status
        status = ''
    else:
        dash_index = soup_state_status.index('-')
        state = soup_state_status[:dash_index-1]
        status = soup_state_status[dash_index+2:]
        
    state_npvic_status.update({state: status})

In [382]:
state_npvic_status

{'Alaska': '',
 'Alabama': '',
 'Arkansas': 'Passed House in 2007 and 2009',
 'Arizona': 'Passed House in 2016',
 'California': 'Enacted into law',
 'Colorado': 'Enacted into law, but subject to statewide vote in November 2020',
 'Connecticut': 'Enacted into law',
 'District of Columbia': 'Enacted into law',
 'Delaware': 'Enacted into law',
 'Florida': '',
 'Georgia': 'Unanimously approved by House committee in 2016',
 'Hawaii': 'Enacted into law',
 'Iowa': '',
 'Idaho': '',
 'Illinois': 'Enacted into law',
 'Indiana': '',
 'Kansas': '',
 'Kentucky': '',
 'Louisiana': '',
 'Massachusetts': 'Enacted into law',
 'Maryland': 'Enacted into law',
 'Maine': 'Passed Maine in 2008 and 2019',
 'Michigan': 'Passed House in 2008',
 'Minnesota': 'Passed House in 2019',
 'Missouri': 'Unanimously approved by House committee in 2016',
 'Mississippi': '',
 'Montana': '',
 'North Carolina': 'Passed Senate in 2007',
 'North Dakota': '',
 'Nebraska': '',
 'New Hampshire': '',
 'New Jersey': 'Enacted into
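
The state/status split in the loop above can also be written with `str.partition`, which avoids the manual index arithmetic around the dash. This is a sketch of an alternative, not the code that produced the output above. The `split_status` helper is an illustrative name, and the sample strings are the ones seen in earlier cells.

```python
def split_status(raw_text):
    """Split 'State - status' text into (state, status); status is '' when absent."""
    cleaned = raw_text.replace('\xa0', ' ').strip()
    state, _, status = cleaned.partition(' - ')
    return state.strip(), status.strip()


samples = [
    'Alaska',
    'Arkansas\xa0- Passed House in 2007 and 2009',
    'Arizona\xa0- Passed\xa0House in 2016',
]

parsed = dict(split_status(text) for text in samples)
print(parsed)
```

Because `partition` returns empty strings when the separator is missing, entries without a status need no separate branch.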

In [333]:
for tr in start.find_next_siblings("p"):
    # exit if reached C
    if tr.find("td", text="C"):
        break

    # get all tds with a desired class
    tds = tr.find_all("td", class_="y")
    for td in tds:
        print(td.get_text())

[<img alt="Home" class="header__logo-image img-responsive" src="https://www.nationalpopularvote.com/sites/all/themes/npv/logo.png"/>,
 <img alt="" height="936" src="/sites/default/files/npv-map-v9-2019-7-11-png.png" width="1200"/>]

In [330]:
soup.find_all(href=re.compile("/state*"))

[<link href="https://www.nationalpopularvote.com/state-status" rel="canonical"/>,
 <a class="menu__link is-active-trail active-trail active" href="/state-status">Status in States</a>,
 <a class="menu__link is-active-trail active-trail active" href="/state-status">Status in States</a>,
 <a class="menu__link" href="/state/ak">Alaska</a>,
 <a class="menu__link" href="/state/al">Alabama</a>,
 <a class="menu__link" href="/state/ar">Arkansas</a>,
 <a class="menu__link" href="/state/az">Arizona</a>,
 <a class="menu__link" href="/state/ca">California</a>,
 <a class="menu__link" href="/state/co">Colorado</a>,
 <a class="menu__link" href="/state/ct">Connecticut</a>,
 <a class="menu__link" href="/state/dc">DC</a>,
 <a class="menu__link" href="/state/de">Delaware</a>,
 <a class="menu__link" href="/state/fl">Florida</a>,
 <a class="menu__link" href="/state/ga">Georgia</a>,
 <a class="menu__link" href="/state/hi">Hawaii</a>,
 <a class="menu__link" href="/state/ia">Iowa</a>,
 <a class="menu__link" hr

### Merging all data into SQL Database

In [160]:
#merge the overseas data with the main population estimates table and extrapolate overseas estimates based on 
#percentage change in the population year over year

#merge the population estimates with overseas data on State and Year
merged_df = main_df.merge(overseas_df, how='left', on=['State','Year'])

#create our Percent change in population column
merged_df['Percent_change'] = (merged_df.Population - merged_df.Population.shift(1)) / merged_df.Population.shift(1)

#iterate through the missing data in overseas population, calculating the next missing value from the previous known
#(from the 2010 census) or from the last calculated value based on percentage change in that state
for i in range(len(merged_df)):
    if pd.isna(merged_df.iloc[i]['Overseas_pop']):
        merged_df.at[i,'Overseas_pop'] = round(merged_df.iloc[i]['Percent_change'] * \
                                               merged_df.iloc[i-1]['Overseas_pop'] + \
                                               merged_df.iloc[i-1]['Overseas_pop'], 0)
        
merged_df.drop('Percent_change', axis=1, inplace=True)        
merged_df['Overseas_pop'] = merged_df.Overseas_pop.astype('int64')

display(merged_df.head(30))
display(merged_df.info())

Unnamed: 0,Year,State,Population,Overseas_pop
0,2010,Minnesota,5303925,10954
1,2011,Minnesota,5346143,11041
2,2012,Minnesota,5376643,11104
3,2013,Minnesota,5413479,11180
4,2014,Minnesota,5451079,11258
5,2015,Minnesota,5482032,11322
6,2016,Minnesota,5522744,11406
7,2017,Minnesota,5566230,11496
8,2018,Minnesota,5606249,11579
9,2019,Minnesota,5639632,11648


<class 'pandas.core.frame.DataFrame'>
Int64Index: 520 entries, 0 to 519
Data columns (total 4 columns):
Year            520 non-null object
State           520 non-null object
Population      520 non-null int64
Overseas_pop    520 non-null int64
dtypes: int64(2), object(2)
memory usage: 40.3+ KB


None
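
The loop above scales each state's overseas population by the same year-over-year percentage change as its resident population. Chaining those changes is, up to rounding, the same as scaling the 2010 overseas count by the ratio of the current population to the 2010 population. A minimal sketch of that closed form follows, with an illustrative helper name. Because the loop rounds every year and this version rounds once, later years can differ by a person or so.

```python
def extrapolate_overseas(base_overseas, base_pop, current_pop):
    """Scale the 2010 overseas count by the state's population growth since 2010."""
    return round(base_overseas * current_pop / base_pop)


# 2010 and 2011 figures taken from the merged table above
minnesota_2011 = extrapolate_overseas(10954, 5303925, 5346143)
mississippi_2011 = extrapolate_overseas(10943, 2967297, 2978731)

print(minnesota_2011, mississippi_2011)
```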

In [161]:
#put pop data into SQL database for easy retrieval
conn = sqlite3.connect('census_pop_data.db')
c = conn.cursor()

In [163]:
try:
    c.execute('CREATE TABLE POPULATION (Year text, State text, Population integer, Overseas_pop integer)')
    conn.commit()
    print('Population Table created (Year, State, Population, Overseas_pop)')
except sqlite3.OperationalError:
    c.execute('DROP TABLE POPULATION')
    print('Population table dropped')
    c.execute('CREATE TABLE POPULATION (Year text, State text, Population integer, Overseas_pop integer)')
    conn.commit()
    print('Population table created (Year, State, Population, Overseas_pop)')

Population table dropped
Population table created (Year, State, Population, Overseas_pop)


In [166]:
merged_df.to_sql('POPULATION', conn, if_exists='replace')

In [167]:
c.execute('SELECT * FROM population')
for row in c.fetchall():
    print(row)

(0, '2010', 'Minnesota', 5303925, 10954)
(1, '2011', 'Minnesota', 5346143, 11041)
(2, '2012', 'Minnesota', 5376643, 11104)
(3, '2013', 'Minnesota', 5413479, 11180)
(4, '2014', 'Minnesota', 5451079, 11258)
(5, '2015', 'Minnesota', 5482032, 11322)
(6, '2016', 'Minnesota', 5522744, 11406)
(7, '2017', 'Minnesota', 5566230, 11496)
(8, '2018', 'Minnesota', 5606249, 11579)
(9, '2019', 'Minnesota', 5639632, 11648)
(10, '2010', 'Mississippi', 2967297, 10943)
(11, '2011', 'Mississippi', 2978731, 10985)
(12, '2012', 'Mississippi', 2983816, 11004)
(13, '2013', 'Mississippi', 2988711, 11022)
(14, '2014', 'Mississippi', 2990468, 11028)
(15, '2015', 'Mississippi', 2988471, 11021)
(16, '2016', 'Mississippi', 2987938, 11019)
(17, '2017', 'Mississippi', 2988510, 11021)
(18, '2018', 'Mississippi', 2981020, 10993)
(19, '2019', 'Mississippi', 2976149, 10975)
(20, '2010', 'Missouri', 5988927, 22551)
(21, '2011', 'Missouri', 6010275, 22631)
(22, '2012', 'Missouri', 6024367, 22684)
(23, '2013', 'Missouri', 60

## Pull data for manipulation

In [214]:
# init dictionary for states and their populations, house seats, and priority value
state_dict = {}

# query database and put data to manipulate in dictionary
c.execute("""SELECT state, (population + overseas_pop) AS total_pop
             FROM population
             WHERE year = '2010' AND state NOT IN ('District of Columbia', 'Puerto Rico')
             ORDER BY state""")
for state in c.fetchall():
    state_dict.update({state[0]: {'Population': state[1],
                                  'House_seats': 1,
                                  'Priority_value': priority_value(state[1], 2)}})
    
display(state_dict)
display(len(state_dict.keys()))

{'Alabama': {'Population': 4802982,
  'House_seats': 1,
  'Priority_value': 3396221},
 'Alaska': {'Population': 721523, 'House_seats': 1, 'Priority_value': 510194},
 'Arizona': {'Population': 6412700,
  'House_seats': 1,
  'Priority_value': 4534464},
 'Arkansas': {'Population': 2926229,
  'House_seats': 1,
  'Priority_value': 2069156},
 'California': {'Population': 37341989,
  'House_seats': 1,
  'Priority_value': 26404774},
 'Colorado': {'Population': 5044930,
  'House_seats': 1,
  'Priority_value': 3567304},
 'Connecticut': {'Population': 3581628,
  'House_seats': 1,
  'Priority_value': 2532593},
 'Delaware': {'Population': 900877,
  'House_seats': 1,
  'Priority_value': 637016},
 'Florida': {'Population': 18900773,
  'House_seats': 1,
  'Priority_value': 13364865},
 'Georgia': {'Population': 9727566,
  'House_seats': 1,
  'Priority_value': 6878428},
 'Hawaii': {'Population': 1366862, 'House_seats': 1, 'Priority_value': 966517},
 'Idaho': {'Population': 1573499, 'House_seats': 1, 'Pr

50
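
The same query can be run with bound parameters instead of values written into the SQL string, which makes it easy to change the year or the exclusion list. The sketch below uses an in-memory database loaded with a few rows from the table printed earlier, so it does not touch `census_pop_data.db`. The variable names are illustrative.

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE population '
             '(Year TEXT, State TEXT, Population INTEGER, Overseas_pop INTEGER)')

# A handful of rows copied from the POPULATION table output above
rows = [
    ('2010', 'Minnesota', 5303925, 10954),
    ('2011', 'Minnesota', 5346143, 11041),
    ('2010', 'Mississippi', 2967297, 10943),
    ('2011', 'Mississippi', 2978731, 10985),
]
conn.executemany('INSERT INTO population VALUES (?, ?, ?, ?)', rows)

excluded = ('District of Columbia', 'Puerto Rico')
placeholders = ', '.join('?' for _ in excluded)  # one '?' per excluded name

query = f"""
    SELECT State, Population + Overseas_pop AS total_pop
    FROM population
    WHERE Year = ? AND State NOT IN ({placeholders})
    ORDER BY State
"""
totals = conn.execute(query, ('2010', *excluded)).fetchall()
conn.close()

print(totals)
```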

In [215]:
def distribute_house_seats(state_dict):
    """
    Iterates through all available house seats after initial 50 are distributed (one to each state)
    to determine who should get each next house seat according to priority value
    """
        
    distributed_dict = copy.deepcopy(state_dict)
    #iterates over all available house seats, range chosen for actual house seat being evaluated
    #whichever state is highest, add one house seat and recalculate the priority value
    for _ in range(51, 436):
        next_state = assign_next_seat(distributed_dict)
        distributed_dict[next_state]['House_seats'] += 1
        distributed_dict[next_state]['Priority_value'] = priority_value(distributed_dict[next_state]['Population'], 
                                                                        distributed_dict[next_state]['House_seats']+1)
    
    #return the dictionary with properly distributed house seats    
    return distributed_dict

In [216]:
def assign_next_seat(distributed_dict):
    """
    Finds the state that should receive the next available house seat according to current priority values
    Returns the name of the state with the highest priority value
    """
    top_value = ["",0]
    
    #iterate through the dictionary keys and keep the state and value for whoever has the 
    #highest priority value
    for state in distributed_dict.keys():
        if distributed_dict[state]['Priority_value'] > top_value[1]:
            top_value = [state, distributed_dict[state]['Priority_value']]  
    
    #return just the state with the top priority value
    return top_value[0]
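
The two functions above rescan every state for each seat. The same priority-value procedure can be sketched with a heap, so the highest-priority state is popped directly each round. This is an alternative sketch, not the notebook's implementation: it keeps unrounded float priorities (the notebook rounds them to integers), and the two-state populations are made-up numbers chosen so the result is easy to check by hand.

```python
import heapq
import math


def apportion(populations, total_seats):
    """Give every state one seat, then award the rest by highest priority value."""
    seats = {state: 1 for state in populations}
    # heapq is a min-heap, so priorities are stored negated
    heap = [(-pop / math.sqrt(1 * 2), state) for state, pop in populations.items()]
    heapq.heapify(heap)

    for _ in range(total_seats - len(populations)):
        _, state = heapq.heappop(heap)
        seats[state] += 1
        n = seats[state]
        # Priority for this state's next seat (n + 1): pop / sqrt(n * (n + 1))
        heapq.heappush(heap, (-populations[state] / math.sqrt(n * (n + 1)), state))

    return seats


# Hypothetical populations, for illustration only
toy_seats = apportion({'Large': 300, 'Small': 100}, total_seats=4)
print(toy_seats)
```

With these toy numbers, `Large` has priority 300/√2 ≈ 212 and then 300/√6 ≈ 122, both above `Small` at 100/√2 ≈ 71, so it takes both remaining seats.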

In [287]:
# creates a dictionary with house seats properly distributed
apportioned_dict = distribute_house_seats(state_dict)

In [288]:
# reads census apportionment of house seats into pandas dataframe and cleans
census_apportionment = pd.read_excel('ApportionmentPopulation2010.xls', skiprows=10, usecols=[0,3])[:50]
census_apportionment.columns = ['State','House_seats']
census_apportionment['House_seats'] = census_apportionment.House_seats.astype('int64')
census_apportionment.set_index('State', inplace=True)

display(census_apportionment.head())
display(census_apportionment.info())

Unnamed: 0_level_0,House_seats
State,Unnamed: 1_level_1
Alabama,7
Alaska,1
Arizona,9
Arkansas,4
California,53


<class 'pandas.core.frame.DataFrame'>
Index: 50 entries, Alabama to Wyoming
Data columns (total 1 columns):
House_seats    50 non-null int64
dtypes: int64(1)
memory usage: 800.0+ bytes


None

In [289]:
# prints out the state, the calculated house seats through my code, the actual assigned seats according
# to the census bureau, and if they match (to check that my math matches theirs)

print('State', '\t\t', 'Calc', '\t', 'Actual', 'Match')
print()
for state in apportioned_dict.keys():
    if len(state) <= 6:
        print(state, '\t\t', apportioned_dict[state]['House_seats'], '\t', census_apportionment.loc[state][0], 
              '\t', apportioned_dict[state]['House_seats'] == census_apportionment.loc[state][0])
    else:
        print(state, '\t', apportioned_dict[state]['House_seats'], '\t', census_apportionment.loc[state][0], 
              '\t', apportioned_dict[state]['House_seats'] == census_apportionment.loc[state][0]) 

State 		 Calc 	 Actual Match

Alabama 	 7 	 7 	 True
Alaska 		 1 	 1 	 True
Arizona 	 9 	 9 	 True
Arkansas 	 4 	 4 	 True
California 	 53 	 53 	 True
Colorado 	 7 	 7 	 True
Connecticut 	 5 	 5 	 True
Delaware 	 1 	 1 	 True
Florida 	 27 	 27 	 True
Georgia 	 14 	 14 	 True
Hawaii 		 2 	 2 	 True
Idaho 		 2 	 2 	 True
Illinois 	 18 	 18 	 True
Indiana 	 9 	 9 	 True
Iowa 		 4 	 4 	 True
Kansas 		 4 	 4 	 True
Kentucky 	 6 	 6 	 True
Louisiana 	 6 	 6 	 True
Maine 		 2 	 2 	 True
Maryland 	 8 	 8 	 True
Massachusetts 	 9 	 9 	 True
Michigan 	 14 	 14 	 True
Minnesota 	 8 	 8 	 True
Mississippi 	 4 	 4 	 True
Missouri 	 8 	 8 	 True
Montana 	 1 	 1 	 True
Nebraska 	 3 	 3 	 True
Nevada 		 4 	 4 	 True
New Hampshire 	 2 	 2 	 True
New Jersey 	 12 	 12 	 True
New Mexico 	 3 	 3 	 True
New York 	 27 	 27 	 True
North Carolina 	 13 	 13 	 True
North Dakota 	 1 	 1 	 True
Ohio 		 16 	 16 	 True
Oklahoma 	 5 	 5 	 True
Oregon 		 5 	 5 	 True
Pennsylvania 	 18 	 18 	 True
Rhode Island 	 2 	 2 
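
The tab-based printing above needs a special case for short state names. A sketch of an alternative is to pad each column to a fixed width with f-string format specifiers and to collect any mismatches in a list instead of reading them off the table. The column widths and variable names are arbitrary choices, and the three states are a small sample of the results above.

```python
# Sample of calculated and official seat counts from the comparison above
calculated = {'Alabama': 7, 'Alaska': 1, 'New Hampshire': 2}
actual = {'Alabama': 7, 'Alaska': 1, 'New Hampshire': 2}

lines = [f"{'State':<16}{'Calc':>5}{'Actual':>8}  Match"]
for state, seats in calculated.items():
    # Left-align the name, right-align the counts so columns line up
    lines.append(f"{state:<16}{seats:>5}{actual[state]:>8}  {seats == actual[state]}")

mismatches = [state for state in calculated if calculated[state] != actual[state]]

print('\n'.join(lines))
print('Mismatches:', mismatches)
```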

In [290]:
c.close()
conn.close()

# Working Zone