# Dynamic Maps with Folium

## Introduction

In this tutorial, we will create dynamic maps and visualize geospatial data with Folium. Folium is a popular Python library for dynamic maps, built on top of the JavaScript mapping library Leaflet.js. Its main benefit is that you can manipulate data in Python, primarily through pandas and GeoPandas, and still draw on the power of Leaflet.js to render the result.

We will start with the basics and work through a few simple Folium examples. We will then turn to a more in-depth example: visualizing voter turnout by county in the 2012 US Presidential Election.

## Installing Folium, GeoPandas, and GeoPy

In this tutorial, we will primarily use Folium, but we will also need GeoPandas and GeoPy.

    $ pip install folium
    
    $ pip install geopy
    
    $ pip install geopandas
    
If pip gives you an error and you have conda installed, try installing GeoPandas with the following command instead.

    $ conda install -c conda-forge geopandas

In [3]:
import folium
import geopandas as gpd
import pandas as pd
import numpy as np
from geopy.geocoders import Nominatim

## Folium Basics

Folium lets you create a Leaflet map with the Map class. Map takes a variety of parameters that customize the map. The most important are location, which sets the initial center of the map as a (latitude, longitude) pair, and zoom_start, which sets the initial zoom level. There are also several built-in tile sets, selected with the tiles parameter. The default tile set is OpenStreetMap, and we will stick with it for now.

In [4]:
location = (40.4435, -79.9435) # coordinates of CMU
zoom_start = 16
CMU_map = folium.Map(location = location, zoom_start = zoom_start)

To display the map in a Jupyter notebook, evaluate the map object returned by folium.Map as the last expression in a cell. You can also save the map to an HTML file and display that file within a Jupyter notebook or elsewhere. For the rest of the tutorial, we will display maps from HTML files to make space management easier, since these maps take up a lot of space.

In [5]:
# Display by calling the map object
CMU_map

To save the map to an HTML file, simply call save on the map object and provide a directory path and file name to indicate where you want the file to be saved.

In [6]:
CMU_map.save("extras/maps/CMU_map.html")

To display an HTML file within a Jupyter notebook, we can take advantage of cell magics. Putting %%HTML at the top of a code cell makes Jupyter render the rest of the cell as HTML.

In [7]:
%%HTML
<iframe width="100%" height ="500" src="extras/maps/CMU_map.html"/>

## Creating Choropleth Maps

We have covered the basics of creating maps. Now we will look at choropleth maps, a type of map visualization in which areas are shaded in proportion to the value of the statistical variable being displayed. This is a very useful technique for geospatial data such as unemployment rates by state.

We will be working with GeoJSON data in this tutorial. If you only have access to shapefiles (.shp), there are many tools and tutorials online for converting them to GeoJSON. Since this tutorial focuses on using Folium, the conversion is not covered here, but these resources are a good starting point:

https://ogre.adc4gis.com/ (Converter for SHP to GeoJSON)

http://www.mapshaper.org (Online Converter for SHP to GeoJSON)

https://gist.github.com/frankrowe/6071443 (Python code to convert SHP to GeoJSON using PyShp)
    
In this section, we will work with choropleth, a method of the Folium map object which, as the name implies, creates a choropleth map. We will start with a very simple example that displays an overlay of the US states.

First, we will set up a map as we did in the basics section.

In [8]:
US_center = (39.8282, -98.5795) 
zoom_start = 4
US_state_map = folium.Map(location = US_center, zoom_start= zoom_start)

We will import a GeoJSON file of the states of the US. We will then use the choropleth method on the Folium map object.

In [9]:
us_states = 'extras/data/usa_states.json'
US_state_map.choropleth(geo_path=us_states)
US_state_map.save("extras/maps/US_state_map.html")

In [10]:
%%HTML
<iframe width="100%" height ="500" src="extras/maps/US_state_map.html"/>

Now to demonstrate visualizing actual data, we will look at a very simple data set of GDP per capita by US state for 2015. We will import the data, clean it up a bit, and then visualize it on the map.

First, let's import the data and see what we are working with.

In [11]:
gdp = pd.read_csv("extras/data/GDP2015.csv")
gdp.head()

|   | State | GDP |
|---|---|---|
| 0 | AL | 42663 |
| 1 | AK | 81801 |
| 2 | AZ | 43269 |
| 3 | AR | 41129 |
| 4 | CA | 61924 |


In [12]:
gdp.dtypes

State    object
GDP      object
dtype: object

We have two columns, State and GDP (in dollars). Note that GDP is stored as text (dtype object) rather than as integers. We need to remove the commas from the dollar amounts and then convert the column to integers so we can visualize it with Folium.

In [13]:
# Remove the commas from the dollar amounts, then convert the column to integers
gdp["GDP"] = gdp["GDP"].str.replace(",", "").astype(int)

In [14]:
gdp.head()

|   | State | GDP |
|---|---|---|
| 0 | AL | 42663 |
| 1 | AK | 81801 |
| 2 | AZ | 43269 |
| 3 | AR | 41129 |
| 4 | CA | 61924 |


In [15]:
gdp.dtypes

State    object
GDP       int32
dtype: object
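
As an aside, pandas can also strip thousands separators while the file is being read. The sketch below uses a small inline CSV as a stand-in for GDP2015.csv (the quoting and layout of the real file are assumptions) together with the `thousands` parameter of `read_csv`.

```python
import io
import pandas as pd

# Inline stand-in for GDP2015.csv; the real file's exact layout is assumed.
csv_text = 'State,GDP\nAL,"42,663"\nAK,"81,801"\n'

# thousands="," tells the parser to treat commas inside numbers as separators
gdp_demo = pd.read_csv(io.StringIO(csv_text), thousands=",")
print(gdp_demo.dtypes)
```

With this approach the GDP column arrives as an integer column, so no separate cleaning step is needed.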

Before visualizing the data, we should take a look at our GeoJSON file, usa_states.json, to get a sense of how it's organized, which will be important for visualizing our GDP per capita data.

In [16]:
us_states_df = gpd.read_file("extras/data/usa_states.json")
us_states_df.head()

|   | AFFGEOID | ALAND | AWATER | GEOID | LSAD | NAME | STATEFP | STATENS | STUSPS | geometry |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0400000US31 | 198972429772 | 1356294775 | 31 | 0 | Nebraska | 31 | 1779792 | NE | POLYGON ((-104.053028 43.000587, -103.132955 4... |
| 1 | 0400000US53 | 172121026344 | 12540093192 | 53 | 0 | Washington | 53 | 1779804 | WA | POLYGON ((-117.032351 48.999188, -117.04247 47... |
| 2 | 0400000US35 | 314161426332 | 755674004 | 35 | 0 | New Mexico | 35 | 897535 | NM | POLYGON ((-109.045223 36.999084, -108.249358 3... |
| 3 | 0400000US46 | 196349394266 | 3379810671 | 46 | 0 | South Dakota | 46 | 1785534 | SD | POLYGON ((-104.057698 44.997431, -104.039725 4... |
| 4 | 0400000US21 | 102266253024 | 2389483091 | 21 | 0 | Kentucky | 21 | 1779786 | KY | POLYGON ((-89.132685 36.9822, -89.182509 37.03... |


Looking at the head of our US states GeoJSON file, we can see that the STUSPS column holds the state abbreviations, which is what our GDP data is keyed by. This will be important for visualizing the GDP data. It is also important to note that, because of the way GeoJSON is formatted, the columns displayed above are stored in usa_states.json inside each feature's properties object. Therefore, to access STUSPS, we use the path properties.STUSPS.
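
To make that path concrete, here is a minimal sketch of how a dotted key can be resolved against a single GeoJSON feature. The `resolve_key` helper and the trimmed-down feature are illustrative assumptions, not Folium's actual implementation or the full contents of usa_states.json; only the NAME and STUSPS values are taken from the Nebraska row above.

```python
# A trimmed-down GeoJSON feature; the geometry is shortened to a placeholder.
feature = {
    "type": "Feature",
    "properties": {"NAME": "Nebraska", "STUSPS": "NE"},
    "geometry": {"type": "Polygon", "coordinates": []},
}

def resolve_key(feature, key_on):
    """Follow a dotted path such as 'properties.STUSPS' into a feature.

    Hypothetical helper, for illustration only.
    """
    # An optional leading 'feature.' refers to the feature itself.
    if key_on.startswith("feature."):
        key_on = key_on[len("feature."):]
    value = feature
    for part in key_on.split("."):
        value = value[part]
    return value

state_code = resolve_key(feature, "properties.STUSPS")
print(state_code)  # NE
```

The value found this way is what gets matched against the key column of the data frame, which is why the abbreviations in the GeoJSON and in the GDP data must agree.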

We will now visualize the data using the choropleth function.

There are many parameters for choropleth, but the parameters of interest are:

- geo_path - File path to the GeoJSON file that holds the geography
- data - DataFrame containing the data to visualize
- columns - Two columns from the DataFrame: first the key column that is matched against key_on, then the column of values to visualize
- key_on - Attribute of each GeoJSON feature to match against the key column
- fill_color - ColorBrewer code for the color scheme
- fill_opacity - Opacity of the fill
- line_opacity - Opacity of the lines

In [17]:
GDP_map = folium.Map(location=US_center, zoom_start=3)
GDP_map.choropleth(geo_path=us_states, 
                   data=gdp,
                   columns=['State', 'GDP'],
                   key_on='properties.STUSPS',
                   fill_color='YlGn', 
                   fill_opacity=0.7, 
                   line_opacity=0.2)
GDP_map.save("extras/maps/GDP_map.html")



In [18]:
%%HTML
<iframe width="100%" height ="500" src="extras/maps/GDP_map.html"/>

This visualization of GDP per capita by state gives us a better sense of the data. For example, we can see that the South has a lower GDP per capita than the Northeast.

## Adding Markers to Maps

Next, we will look at adding markers to maps with Folium. We will go back to our map of CMU and add markers for a few campus buildings. We will use GeoPy to get the latitude and longitude of addresses on campus in order to place the markers.

Let's first get the latitude and longitude of a few addresses using the geocode method of a GeoPy geolocator.

In [19]:
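# Recent geopy releases require a user_agent; "folium-tutorial" is a placeholder, so use your own application name
geolocator = Nominatim(user_agent="folium-tutorial")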

huntLibrary = geolocator.geocode("4909 Frew St, Pittsburgh, PA")
huntLibraryLoc = (huntLibrary.latitude, huntLibrary.longitude)

morewoodGardens = geolocator.geocode("1060 Morewood Ave, Pittsburgh, PA")
morewoodGardensLoc = (morewoodGardens.latitude, morewoodGardens.longitude)

flagstaff = geolocator.geocode("Flagstaff Hill, Pittsburgh, PA")
flagstaffLoc = (flagstaff.latitude, flagstaff.longitude)

Now that we have the latitude and longitude for Hunt Library, Morewood Gardens, and Flagstaff Hill, we will add markers for these locations to our CMU map.
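
One caveat about the geocoding step: geopy's `geocode` returns `None` when it finds no match, in which case reading `.latitude` raises an `AttributeError`. A small wrapper makes that failure explicit. The `lat_lon` helper below is a hypothetical name, and the stub geocoder only stands in for `geolocator.geocode` so the sketch runs without a network connection; its coordinates are the approximate CMU location used earlier in this tutorial, not a real geocoding result.

```python
from types import SimpleNamespace

def lat_lon(geocode, address):
    """Return (latitude, longitude) for an address, or raise if nothing matched.

    `geocode` is any callable shaped like geolocator.geocode.
    """
    result = geocode(address)
    if result is None:
        raise LookupError("No geocoding result for: " + address)
    return (result.latitude, result.longitude)

# Offline stand-in for geolocator.geocode (an assumption made for this sketch).
def stub_geocode(address):
    known = {
        "Carnegie Mellon University": SimpleNamespace(latitude=40.4435, longitude=-79.9435),
    }
    return known.get(address)

print(lat_lon(stub_geocode, "Carnegie Mellon University"))
```

In the notebook, the same helper could be called as `lat_lon(geolocator.geocode, "4909 Frew St, Pittsburgh, PA")`.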

Marker takes a location and popup text and adds a simple Google Maps-style marker to the map.

In [20]:
folium.Marker(location=huntLibraryLoc, 
              popup='Hunt Library').add_to(CMU_map)

RegularPolygonMarker can be more customized than Marker, taking in a location, popup text, fill_color, the number of sides for the marker to have, and the radius of the marker.

In [None]:
folium.RegularPolygonMarker(location=morewoodGardensLoc, 
                            popup='Morewood Gardens',
                            fill_color='red', 
                            number_of_sides=5, 
                            radius=10).add_to(CMU_map)

CircleMarker takes a location, popup text, color, fill_color, and radius. It creates a translucent circular marker, which is good for indicating larger locations such as parks.

In [None]:
folium.CircleMarker(location=flagstaffLoc, 
                    radius=75,
                    popup='Flagstaff Hill', 
                    color='#ADD8E6',
                    fill_color='#ADD8E6').add_to(CMU_map)

In [None]:
CMU_map.save("extras/maps/CMU_map_markers.html")

In [21]:
%%HTML
<iframe width="100%" height ="500" src="extras/maps/CMU_map_markers.html"/>

## Example: Creating a Political Map With Folium

Now that we have covered the fundamentals of Folium, you should have a good sense of its basic functionality. To tie together everything we have learned, we will use real datasets to build a map of voter turnout by county for the 2012 US Presidential Election. To create this statistic, we will use 2012 US Census data to get the number of eligible voters in each county and 2012 US election data to get the number of people who voted in each county.

We will therefore be using a few datasets. Some of them received basic cleaning beforehand so as not to detract from the primary purpose of this tutorial, but for the most part they remain in their original state to highlight the data cleaning needed to use Folium properly.

The sources for the data can be found in the conclusion.

Let's begin by importing the datasets.

In [22]:
county_shape = gpd.read_file("extras/data/2012CountyShapes.json")
presidential_results = pd.read_csv("extras/data/2012PresidentialResults.csv")
voting_pop = pd.read_csv("extras/data/2012VotingAgePopulation.csv")

Let's take a look at the head of each of our datasets.

Let's start with our GeoJSON file of the counties. The primary columns to take note of here are NAME, GEOID, and geometry. NAME is simply the name of the county, GEOID is a standardized code to uniquely identify the state and county, and geometry is the polygon shape of the county. GEOID is the same as a FIPS code, which will be important when we look at the other datasets.

In [23]:
county_shape.head()

|   | ALAND | AWATER | CBSAFP | CLASSFP | COUNTYFP | COUNTYNS | CSAFP | FUNCSTAT | GEOID | INTPTLAT | INTPTLON | LSAD | METDIVFP | MTFCC | NAME | NAMELSAD | STATEFP | geometry |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 10433603617 | 2739477738 | 25900.0 | H1 | 1 | 365280 |  | A | 15001 | 19.5977643 | -155.5024434 | 6 |  | G4020 | Hawaii | Hawaii County | 15 | POLYGON ((-155.963465 19.082072, -155.968892 1... |
| 1 | 1984070931 | 1809509 | 31620.0 | H1 | 27 | 63761 |  | A | 5027 | 33.2230377 | -93.2328433 | 6 |  | G4020 | Columbia | Columbia County | 5 | POLYGON ((-93.365585 33.444459, -93.326751 33.... |
| 2 | 3009063806 | 13587978 |  | H1 | 11 | 1035619 |  | A | 38011 | 46.1101438 | -103.5059433 | 6 |  | G4020 | Bowman | Bowman County | 38 | POLYGON ((-104.045569 46.280188, -104.000486 4... |
| 3 | 4647119053 | 315962259 |  | H1 | 31 | 974114 |  | A | 36031 | 44.1089711 | -73.7775726 | 6 |  | G4020 | Essex | Essex County | 36 | POLYGON ((-73.90968699999999 44.429699, -73.80... |
| 4 | 9733550626 | 26310378 |  | H1 | 7 | 929117 |  | A | 35007 | 36.6129625 | -104.6401256 | 6 |  | G4020 | Colfax | Colfax County | 35 | POLYGON ((-105.340907 36.262245, -105.327521 3... |


Now let's take a look at the presidential data. The primary columns to note here are FIPS Code and TOTAL VOTES CAST. We can use the FIPS code to match rows to the geometries in the county_shape file through its GEOID column. TOTAL VOTES CAST is the total number of votes cast in that county in the 2012 US Presidential Election.
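
One practical detail when matching on FIPS: a full county FIPS code has five digits (two for the state, three for the county), so codes for states such as Alabama start with a zero. When a CSV reader parses the column as an integer, that leading zero is lost, giving for example 1001 instead of 01001 for Autauga County. A minimal sketch of restoring the five-character form with Python's built-in `str.zfill`, using a few sample codes from this dataset:

```python
# FIPS codes as integers, i.e. without their leading zeros
raw_fips = [1001, 1003, 2000]

# Convert to text and left-pad with zeros to five characters
padded_fips = [str(code).zfill(5) for code in raw_fips]
print(padded_fips)  # ['01001', '01003', '02000']
```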

In [24]:
presidential_results.head()

|   | State Postal | County Number | FIPS Code | County Name | Precincts Reporting | Total Precincts | TOTAL VOTES CAST |
|---|---|---|---|---|---|---|---|
| 0 | AK | 1 | 0 | Alaska | 437 | 438 | 220596 |
| 1 | AK | 2001 | 2000 | Alaska | 437 | 438 | 220596 |
| 2 | AL | 1 | 0 | Alabama | 2711 | 2711 | 2064699 |
| 3 | AL | 1001 | 1001 | Autauga | 28 | 28 | 23909 |
| 4 | AL | 1002 | 1003 | Baldwin | 55 | 55 | 84988 |


Finally, let's take a look at the voting population data. Notice that the data is broken down by ethnicity. We are primarily concerned with the total eligible voting population of each county, so we will need to keep just those rows. Also notice that we don't have a FIPS column, but we do have a GEOID column. This GEOID column is formatted differently from the GEOID column in the county shape file: here, the last 5 characters of the GEOID are the FIPS code, so with some data transformation we can create a FIPS column for this data frame and join it with the other two datasets.
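
The GEOID transformation described here is a plain string slice. A minimal sketch, using the GEOID that this dataset assigns to Autauga County, Alabama:

```python
# GEOID as it appears in the voting-age population data
geoid = "05000US01001"

# The county FIPS code is the last five characters
fips = geoid[-5:]
print(fips)  # 01001
```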

In [28]:
voting_pop.head()

|   | GEONAME | LNTITLE | GEOID | LNNUMBER | TOT_EST | TOT_MOE | ADU_EST | ADU_MOE | CIT_EST | CIT_MOE | CVAP_EST | CVAP_MOE |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Autauga County, Alabama | Total | 05000US01001 | 1 | 55135 |  | 40920 |  | 54675 | 198 | 40545 | 160 |
| 1 | Autauga County, Alabama | Not Hispanic or Latino | 05000US01001 | 2 | 53745 |  | 40055 |  | 53510 | 152 | 39885 | 115 |
| 2 | Autauga County, Alabama | American Indian or Alaska Native Alone | 05000US01001 | 3 | 235 | 64.0 | 190 | 55.0 | 235 | 64 | 190 | 55 |
| 3 | Autauga County, Alabama | Asian Alone | 05000US01001 | 4 | 545 | 93.0 | 420 | 59.0 | 475 | 127 | 355 | 94 |
| 4 | Autauga County, Alabama | Black or African American Alone | 05000US01001 | 5 | 10155 | 144.0 | 7285 | 41.0 | 10115 | 152 | 7285 | 41 |


Let's first draw the counties on a map to check that our county GeoJSON looks good.

In [29]:
US_center = (39.8282, -98.5795) 
zoom_start = 4
US_counties_map = folium.Map(location = US_center, zoom_start= zoom_start)
us_counties = 'extras/data/2012CountyShapes.json'
US_counties_map.choropleth(geo_path=us_counties)
US_counties_map.save("extras/maps/US_counties_map.html")

In [30]:
%%HTML
<iframe width="100%" height ="500" src="extras/maps/US_counties_map.html"/>

Our GeoJSON looks great! Now let's start transforming our data to work toward visualizing turnout in the 2012 US Presidential Election.

First, we will clean the county shape data. We will simply rename the GEOID column to FIPS so we can join the datasets together later on with the FIPS columns. We will also ensure the FIPS column is a string.

In [38]:
def cleanCountyShapeData(data):
    data = data.rename(columns = {'GEOID' : 'FIPS'}) # rename column
    data["FIPS"] = data["FIPS"].astype(str) # cast column as string type
    return data

Next, we will clean our presidential data. We will rename the FIPS column and filter down to just the county data. The presidential data has an issue in that it is more granular than the FIPS code, meaning some rows share the same FIPS code. We need to sum the votes of all rows with the same FIPS code so that the result matches the other two datasets.

In [39]:
def cleanPresidentialElectionData(data):
    data = data.rename(columns = {'FIPS Code' : 'FIPS'}) # rename column
    data["FIPS"] = data["FIPS"].astype(str) # cast column as string type
    data = data[data["FIPS"] != "0"].copy() # remove state level data as we only care about county

    # restore the leading zero that was lost when FIPS was read as an integer
    data["FIPS"] = data["FIPS"].str.zfill(5)

    # sum the votes of any rows that share a FIPS code
    transformed = data.groupby("FIPS")["TOTAL VOTES CAST"].sum().reset_index()
    transformed = transformed.rename(columns = {"TOTAL VOTES CAST" : "Totals"})

    return transformed

Next, we will clean our voting population data. We need to filter out the ethnicity level data and keep only the county level totals. We also need to reformat the GEOID column so that we have just the FIPS code, which is the last 5 characters of the GEOID.

In [40]:
def cleanVotingPopulationData(data):
    data = data[data["LNTITLE"] == "Total"].copy() # We only want total, not ethnicity level data

    # The FIPS code is the last 5 characters of the GEOID, kept as a string
    data["FIPS"] = data["GEOID"].astype(str).str[-5:]
    return data

Now that our data has been cleaned, we will merge the datasets on the FIPS column of each. We will left join the datasets, beginning with countyData. We use a left join because Folium requires there to be data for every entry in the GeoJSON file, and a left join guarantees we will have a value (or NaN) for every row in the county data.
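
To illustrate why a left join keeps every county, here is a small sketch with two toy frames. The frame names and the FIPS code 99999 are made up for illustration; the two vote totals are the Autauga and Baldwin figures from the presidential data. The county with no matching vote total is retained with NaN, whereas an inner join would drop it.

```python
import pandas as pd

# Toy stand-ins for the cleaned county and election frames
counties = pd.DataFrame({"FIPS": ["01001", "01003", "99999"]})
votes = pd.DataFrame({"FIPS": ["01001", "01003"], "Totals": [23909, 84988]})

left = counties.merge(votes, on="FIPS", how="left")
inner = counties.merge(votes, on="FIPS", how="inner")

print(len(left), len(inner))            # 3 2
print(left["Totals"].isna().tolist())   # [False, False, True]
```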

In [41]:
def cleanAndCombine(countyData, presidentData, votingPopData):
    countyData = cleanCountyShapeData(countyData)
    presidentData = cleanPresidentialElectionData(presidentData)
    votingPopData = cleanVotingPopulationData(votingPopData)
    
    # Left merge the data together starting with county data to ensure we have a row for every FIPS code in the county data
    data = countyData.merge(presidentData, on = "FIPS", how = "left")
    data = data.merge(votingPopData, on = "FIPS", how = "left")
    return data

Finally, we will compute voter turnout from our merged dataset. With the joined dataset, this is as simple as dividing the total number of votes in each county by the estimated number of eligible voters in that county.
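
As a worked example of the formula, take the Autauga County, Alabama figures that appear in the tables above: 23,909 votes cast, a total population estimate (TOT_EST) of 55,135, and a citizen voting-age population estimate (CVAP_EST) of 40,545. The sketch below computes the percentage against both denominators, since the choice changes the result considerably. Treating CVAP_EST as the eligible-voter count is an interpretation of the Census column names rather than something stated in the data itself.

```python
votes_cast = 23909   # TOTAL VOTES CAST for Autauga County
tot_est = 55135      # TOT_EST: estimated total population
cvap_est = 40545     # CVAP_EST: estimated citizen voting-age population

turnout_vs_total = votes_cast / tot_est * 100
turnout_vs_cvap = votes_cast / cvap_est * 100

print(round(turnout_vs_total, 2))  # 43.36
print(round(turnout_vs_cvap, 2))   # 58.97
```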

In [42]:
def computeVoterTurnout(data):
    # CVAP_EST is the estimated citizen voting-age population, i.e. the eligible voters
    data["VoterTurnout"] = (data["Totals"] / data["CVAP_EST"]) * 100
    data["VoterTurnout"] = data["VoterTurnout"].fillna(0) # Need to replace NaNs from left-join
    return data

In [43]:
data = cleanAndCombine(county_shape, presidential_results, voting_pop)
data = computeVoterTurnout(data)



Now we can finally visualize voter turnout. We will key on the GEOID property in the county GeoJSON file and visualize the VoterTurnout column, matching on the FIPS column.

In [None]:
political_map = folium.Map(location = US_center, zoom_start= zoom_start)
political_map.choropleth(geo_path=us_counties, 
                   data=data,
                   columns=['FIPS', 'VoterTurnout'],
                   key_on='feature.properties.GEOID',
                   fill_color='BuPu', 
                   fill_opacity=0.7, 
                   line_opacity=0.2)
political_map.save("extras/maps/political_map.html")

In [34]:
%%HTML
<iframe width="100%" height ="500" src="extras/maps/political_map.html"/>

This map provides a clear visualization of voter turnout. It would be especially useful for presidential and congressional campaigns, helping them see turnout trends across the country and decide which counties to target. The data would be much harder to understand if it were not visualized in this way.

## Conclusion and Further Resources 

Thanks for reading through my tutorial! I hope you feel you now have a better grasp of Folium and learned a bit about geospatial data. Here are some further resources if you are interested in learning more:

Folium: https://folium.readthedocs.io/en/latest/

GeoPandas: http://geopandas.org

GeoPy: https://geopy.readthedocs.io

Voting Age Population by County: https://www.census.gov/rdo/data/voting_age_population_by_citizenship_and_race_cvap.html

US Presidential Election Results by County: https://www.theguardian.com/news/datablog/2012/nov/07/us-2012-election-county-results-download

Shapefile of US Counties: https://catalog.data.gov/dataset/tiger-line-shapefile-2012-nation-u-s-current-county-and-equivlaent-national-shapefile
