# Choropleth Map
In this notebook, I practice creating choropleth maps with Plotly.
I will plot two datasets provided by the Udemy course "Python for Data Science and Machine Learning Bootcamp".

To begin, we import the libraries needed to build Plotly graphs and display them inside a Jupyter notebook.

In [11]:
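# plotly.plotly (online mode) is not used in this notebook; in plotly 4+ it lives in the separate chart_studio package
import plotly.graph_objs as go
from plotly.offline import init_notebook_mode, iplot

# connected=True loads plotly.js from a CDN so figures render inline
init_notebook_mode(connected=True)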

Next, we import pandas and read our first dataset, 2014 World Power Consumption.

In [2]:
import pandas as pd

In [3]:
dfwpc = pd.read_csv('2014_World_Power_Consumption')

In [4]:
dfwpc.head()

Unnamed: 0,Country,Power Consumption KWH,Text
0,China,5523000000000.0,"China 5,523,000,000,000"
1,United States,3832000000000.0,"United 3,832,000,000,000"
2,European,2771000000000.0,"European 2,771,000,000,000"
3,Russia,1065000000000.0,"Russia 1,065,000,000,000"
4,Japan,921000000000.0,"Japan 921,000,000,000"


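Each row holds a country's power consumption in kilowatt-hours plus an accompanying descriptive text. Next, we'll create the two dictionaries that Plotly reads: data describes the trace (locations, values, and hover text), and layout describes the figure (title and map projection).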

In [18]:
data = dict(type='choropleth',
            locations = dfwpc['Country'],
            locationmode = 'country names',
            z = dfwpc['Power Consumption KWH'],
            text = dfwpc['Text'],
            colorbar = {'title':'Power Consumption (KWH)'})

In [24]:
layout = dict(title = '2014 Global Power Consumption',
              geo = dict(showframe = False,
                        projection = {'type':'natural earth'}))

Now we can pass the data and layout to go.Figure, which builds a figure object that Plotly can render, and display it with iplot.

In [25]:
choromap1 = go.Figure(data = [data], layout = layout)
iplot(choromap1)

The next dataset we'll use comes from the 2012 US election.

In [93]:
dfe = pd.read_csv('2012_Election_Data')

In [27]:
dfe.head()

Unnamed: 0,Year,ICPSR State Code,Alphanumeric State Code,State,VEP Total Ballots Counted,VEP Highest Office,VAP Highest Office,Total Ballots Counted,Highest Office,Voting-Eligible Population (VEP),Voting-Age Population (VAP),% Non-citizen,Prison,Probation,Parole,Total Ineligible Felon,State Abv
0,2012,41,1,Alabama,,58.6%,56.0%,,2074338,3539217,3707440.0,2.6%,32232,57993,8616,71584,AL
1,2012,81,2,Alaska,58.9%,58.7%,55.3%,301694.0,300495,511792,543763.0,3.8%,5633,7173,1882,11317,AK
2,2012,61,3,Arizona,53.0%,52.6%,46.5%,2323579.0,2306559,4387900,4959270.0,9.9%,35188,72452,7460,81048,AZ
3,2012,42,4,Arkansas,51.1%,50.7%,47.7%,1078548.0,1069468,2109847,2242740.0,3.5%,14471,30122,23372,53808,AR
4,2012,71,5,California,55.7%,55.1%,45.1%,13202158.0,13038547,23681837,28913129.0,17.4%,119455,0,89287,208742,CA


Unlike the previous dataset, this one has many more columns to work with.

To begin, let's create a choropleth map of the voting-age population (VAP) across the US by state.

As before, we need to create the data and layout for this choropleth map. They will serve as our template when we map other variables later on.

In [36]:
data = dict(type='choropleth',
            colorscale = 'YlOrRd',
            reversescale = True,
            locations = dfe['State Abv'],
            locationmode = 'USA-states',
            z = dfe['Voting-Age Population (VAP)'],
            colorbar = {'title':'Voting Age Population'}
            )

In [29]:
layout = dict(title = 'Voting Age Population by State',
             geo = dict(scope = 'usa',
                       showlakes = True,
                       lakecolor = 'rgb(85,173,240)')
             )

Now let's make our choropleth map.

In [37]:
choromap2 = go.Figure(data = [data], layout = layout)
iplot(choromap2)
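
Since the data and layout above are meant as a template for other variables, one way to reuse them is to wrap them in a small function. The sketch below is only an illustration: the function name `usa_choropleth_spec` and its parameters are hypothetical, and it simply returns the same two dictionaries built by hand above.

```python
def usa_choropleth_spec(locations, z, title, colorbar_title, colorscale='YlOrRd'):
    """Return (data, layout) dicts for a state-level choropleth of the USA.

    Hypothetical helper: it mirrors the dictionaries written out above and
    only swaps in the values that change from one map to the next.
    """
    data = dict(type='choropleth',
                colorscale=colorscale,
                reversescale=True,
                locations=locations,          # two-letter state abbreviations
                locationmode='USA-states',
                z=z,                          # one value per location
                colorbar={'title': colorbar_title})
    layout = dict(title=title,
                  geo=dict(scope='usa',
                           showlakes=True,
                           lakecolor='rgb(85,173,240)'))
    return data, layout


# Example with three voting-age population values copied from dfe.head()
vap_data, vap_layout = usa_choropleth_spec(
    locations=['AL', 'AK', 'AZ'],
    z=[3707440.0, 543763.0, 4959270.0],
    title='Voting Age Population by State',
    colorbar_title='Voting Age Population')
```

In the notebook, the returned pair could be drawn the same way as before, with `iplot(go.Figure(data=[vap_data], layout=vap_layout))`, and whole columns such as `dfe['State Abv']` could be passed in place of the short lists.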

Next, to continue practicing, I am going to make a choropleth map of prison population by state. 

In [94]:
# First, convert the Prison column from strings with thousands separators to integers
dfe['Prison'] = dfe['Prison'].str.replace(',', '', regex=False).astype(int)

In [65]:
dfe.head()

Unnamed: 0,Year,ICPSR State Code,Alphanumeric State Code,State,VEP Total Ballots Counted,VEP Highest Office,VAP Highest Office,Total Ballots Counted,Highest Office,Voting-Eligible Population (VEP),Voting-Age Population (VAP),% Non-citizen,Prison,Probation,Parole,Total Ineligible Felon,State Abv
0,2012,41,1,Alabama,,58.6%,56.0%,,2074338,3539217,3707440.0,2.6%,32232,57993,8616,71584,AL
1,2012,81,2,Alaska,58.9%,58.7%,55.3%,301694.0,300495,511792,543763.0,3.8%,5633,7173,1882,11317,AK
2,2012,61,3,Arizona,53.0%,52.6%,46.5%,2323579.0,2306559,4387900,4959270.0,9.9%,35188,72452,7460,81048,AZ
3,2012,42,4,Arkansas,51.1%,50.7%,47.7%,1078548.0,1069468,2109847,2242740.0,3.5%,14471,30122,23372,53808,AR
4,2012,71,5,California,55.7%,55.1%,45.1%,13202158.0,13038547,23681837,28913129.0,17.4%,119455,0,89287,208742,CA


In [63]:
data = dict(type='choropleth',
            colorscale = 'Greens',
            reversescale = True,
            locations = dfe['State Abv'],
            locationmode = 'USA-states',
            z = dfe['Prison'],
            colorbar = {'title':'Number of People in Prison'}
            )

In [52]:
layout = dict(title = 'Number of People in Prison by State',
             geo = dict(scope = 'usa',
                       showlakes = True,
                       lakecolor = 'rgb(85,173,240)')
             )

In [64]:
choromap3 = go.Figure(data = [data], layout = layout)
iplot(choromap3)

Since this map looks similar to the voting-age population map, I'm interested in seeing how non-citizens are distributed across the US.

In [95]:
# As before, the column needs converting: drop the trailing '%' and divide by 100 to get a fraction
dfe['% Non-citizen'] = dfe['% Non-citizen'].apply(lambda pct: float(pct[:-1]) / 100)
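
The two conversions used so far (removing thousands separators, and turning a percent string into a fraction) can also be written as plain functions, which makes them easy to check on a single value before applying them to a whole column. This is a minimal sketch: the names `parse_count` and `parse_percent` are hypothetical, and the sample strings are assumed to look like the raw column values.

```python
def parse_count(text):
    """Turn a count written with thousands separators into an int."""
    return int(text.replace(',', '').strip())


def parse_percent(text):
    """Turn a percentage string such as '2.6%' into a fraction (about 0.026)."""
    return float(text.strip().rstrip('%')) / 100


# Try each helper on one sample value
print(parse_count('119,455'))
print(parse_percent('17.4%'))
```

Either function could then be handed to `Series.apply`, for example `dfe['Prison'].apply(parse_count)`, assuming every entry in the column is a string in that format.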

In [96]:
dfe.head()

Unnamed: 0,Year,ICPSR State Code,Alphanumeric State Code,State,VEP Total Ballots Counted,VEP Highest Office,VAP Highest Office,Total Ballots Counted,Highest Office,Voting-Eligible Population (VEP),Voting-Age Population (VAP),% Non-citizen,Prison,Probation,Parole,Total Ineligible Felon,State Abv
0,2012,41,1,Alabama,,58.6%,56.0%,,2074338,3539217,3707440.0,0.026,32232,57993,8616,71584,AL
1,2012,81,2,Alaska,58.9%,58.7%,55.3%,301694.0,300495,511792,543763.0,0.038,5633,7173,1882,11317,AK
2,2012,61,3,Arizona,53.0%,52.6%,46.5%,2323579.0,2306559,4387900,4959270.0,0.099,35188,72452,7460,81048,AZ
3,2012,42,4,Arkansas,51.1%,50.7%,47.7%,1078548.0,1069468,2109847,2242740.0,0.035,14471,30122,23372,53808,AR
4,2012,71,5,California,55.7%,55.1%,45.1%,13202158.0,13038547,23681837,28913129.0,0.174,119455,0,89287,208742,CA


Now that our data is in the right format, let's make a choropleth map.

In [97]:
data = dict(type='choropleth',
            colorscale = 'Blues',
            reversescale = True,
            locations = dfe['State Abv'],
            locationmode = 'USA-states',
            z = dfe['% Non-citizen'],
            colorbar = {'title':'% population not citizens'}
            )

In [98]:
layout = dict(title = 'Percent of Non-citizen Population by State',
             geo = dict(scope = 'usa',
                       showlakes = True,
                       lakecolor = 'rgb(85,173,240)')
             )

In [99]:
choromap4 = go.Figure(data = [data], layout = layout)
iplot(choromap4)

Comparing the previous three maps, we see some similarities in how they look, but no obvious correlation between the three variables. The similarities that do appear can most likely be attributed to the total population of each state. Validating that would require adding total population to this dataset so that the data can be normalized.
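
To illustrate what normalizing would look like, the sketch below divides the prison count by a population figure for the five states shown in `dfe.head()`. Total population is not in this dataset, so the voting-age population (VAP) is used here purely as a stand-in; that substitution is an assumption of this example, not a claim that VAP is the right denominator. The helper name `per_100k` is also hypothetical.

```python
# (state, prison population, voting-age population) copied from dfe.head()
rows = [
    ('Alabama', 32232, 3707440),
    ('Alaska', 5633, 543763),
    ('Arizona', 35188, 4959270),
    ('Arkansas', 14471, 2242740),
    ('California', 119455, 28913129),
]


def per_100k(count, population):
    """Express a raw count as a rate per 100,000 people."""
    return count / population * 100_000


# Rate per state, using VAP as a stand-in for total population (assumption)
rates = {state: per_100k(prison, vap) for state, prison, vap in rows}

# States ordered from highest to lowest rate
ranked = sorted(rates, key=rates.get, reverse=True)
for state in ranked:
    print(f'{state}: {rates[state]:.0f} per 100,000 voting-age residents')
```

Among these five rows, California has by far the largest prison count but the lowest rate, while Alaska has the smallest count and the highest rate, which is the kind of difference a map of raw counts hides.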