
# Interactive Labor Force Participation Rate Map

## Goals:
### 1. Demonstrate how to use Python to map US Census Bureau data

*   Provide informative links, data, and template code you can refer to and reuse.

### 2. Identify a meaningful metric you can model and visualize

*   The labor force participation rate comes from a simple model: the percentage of the population aged 16 and over (the table's total) that is in the civilian labor force, that is, employed or unemployed and looking for work.
*   It is an often-overlooked metric, yet it is critical to efforts to bring discouraged workers back into the workforce, and it is central to political policy debates. Persistent labor shortages since the start of the global COVID-19 pandemic have made it especially prominent.
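As a worked formula, using the column names this notebook assigns later and the ACS table B28007 variables it downloads, the rate is:

$$
\text{pct\_in\_laborforce} = 100 \times \frac{\text{total\_in\_laborforce}}{\text{total}} = 100 \times \frac{\text{B28007\_002E}}{\text{B28007\_001E}}
$$

For example, a county whose table total is 20 people, 10 of whom are in the labor force, has a participation rate of $100 \times 10 / 20 = 50$ percent.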

### 3. Visualize the data and export webmaps
*   This Python notebook uses US Census Bureau data to make interactive maps. It can serve as a resource for exploratory work with Census data, either as downloaded or after your own modifications. This is useful whether you are planning advanced modelling or just want to display a table visually.



This notebook was written by Ben Liebersohn for the Cyber GIS Center RIF meeting on 10/25/2021.

The function to set up Bokeh is by Dr. Ziqi Li. It allows us to use Bokeh for mapping purposes.

### Some useful and practical python skills/topics

- [Using `censusdata` library to pull ACS data](#census)
- [Geocoding using `geopandas`](#geocoding)
- [Spatial join using `geopandas` (e.g. point-in-polygon)](#sj)
- [Interactive map with `bokeh`](#bokeh)



Further resources:

`censusdata`: https://jtleider.github.io/censusdata/

`geopandas` geocoding: https://automating-gis-processes.github.io/CSC18/lessons/L3/geocoding.html

`geopandas` spatial join: https://geopandas.org/mergingdata.html

`bokeh` example: https://docs.bokeh.org/en/latest/docs/gallery/texas.html

Download a US county shapefile from the Census Bureau's TIGER/Line or cartographic boundary shapefile pages. The file used here (cb_2018_us_county_20m.zip) can be downloaded from: https://drive.google.com/file/d/1Mv6f4SeVEk75-K1fRRheFXEAh94U_y0f/view?usp=sharing
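If you would rather fetch the file in code than upload it by hand, here is a minimal sketch that uses only the standard library. The download URL is an assumption (the Census Bureau's 2018 cartographic boundary file location, not the Google Drive link above), and `download_shapefile` and `extract_zip_bytes` are hypothetical helper names.

```python
import io
import urllib.request
import zipfile
from pathlib import Path

# ASSUMPTION: location of the 2018 1:20,000,000 county cartographic boundary file on the Census Bureau site
CB_COUNTY_URL = "https://www2.census.gov/geo/tiger/GENZ2018/shp/cb_2018_us_county_20m.zip"


def extract_zip_bytes(data: bytes, dest: str) -> list:
    """Extract an in-memory zip archive into dest and return the archive's member names."""
    Path(dest).mkdir(parents=True, exist_ok=True)  # create the target folder if needed
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        zf.extractall(dest)
        return zf.namelist()


def download_shapefile(url: str = CB_COUNTY_URL, dest: str = "cb_2018_us_county_20m") -> list:
    """Hypothetical helper: download the zip at url and extract it into dest (needs network access)."""
    with urllib.request.urlopen(url) as resp:
        return extract_zip_bytes(resp.read(), dest)
```

In Colab you would call `download_shapefile()`; assuming the archive has no internal folder, the shapefile then lands at `cb_2018_us_county_20m/cb_2018_us_county_20m.shp`, the path the notebook reads later.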

In [1]:
!unzip cb_2018_us_county_20m.zip
# This extracts the compressed shapefile. If unzip reports that it "cannot find or open" the file, upload the zip file to the Colab session first
# Read the output: it lists every extracted file, which helps you understand what is happening

unzip:  cannot find or open cb_2018_us_county_20m.zip, cb_2018_us_county_20m.zip.zip or cb_2018_us_county_20m.zip.ZIP.


In [2]:
!pip install -q rtree  # spatial-index library (optional with recent geopandas); the apt package "python-rtree" cannot be found on current Colab images


In [3]:
!pip install pygeos  # We install some libraries with the tool called "pip"
!pip install -q geopandas  # -q (quiet) shortens pip's log; a trailing semicolon does not suppress the output of ! shell commands
# If you want to learn more about these libraries, search for "PyGEOS" or "geopandas"

Collecting pygeos
  Downloading pygeos-0.14-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (2.2 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 2.2/2.2 MB 6.6 MB/s eta 0:00:00
Installing collected packages: pygeos
Successfully installed pygeos-0.14


In [4]:
import os
os.environ['USE_PYGEOS'] = '0'  # must be set BEFORE importing geopandas so it uses Shapely instead of PyGEOS
import pandas as pd  # by using the shorter names we save some time later on
import numpy as np  # instead of calling it numpy we call it np
import geopandas as gpd  # even though we are calling it gpd, it is still widely known as "geopandas"


<a id='census'></a>
### Census API


In [5]:
!pip install censusdata # We install and import the censusdata library
import censusdata # This library loads census data for us, so we don't need to go to the US Census Bureau website

Collecting censusdata
  Downloading CensusData-1.15.post1.tar.gz (26.6 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 26.6/26.6 MB 3.4 MB/s eta 0:00:00
  Preparing metadata (setup.py) ... done
Building wheels for collected packages: censusdata
  Building wheel for censusdata (setup.py) ... done
  Created wheel for censusdata: filename=CensusData-1.15.post1-py3-none-any.whl size=28205746 sha256=3c53161ca20f2e0e1151f4fe30a2fef851b4c7834f67e604a066e036e8c0fc53
  Stored in directory: /root/.cache/pip/wheels/40/0a/09/c996fa9cc686a1efb90426ce5fbaac1e2e0d7e0efbb3939a85
Successfully built censusdata
Installing collected packages: censusdata
Successfully installed censusdata-1.15.post1


In [6]:
censusdata.printtable(censusdata.censustable('acs5', 2018, 'B28007')) # 2018 ACS 5 YEAR: LABOR FORCE STATUS BY PRESENCE OF A COMPUTER AND TYPES OF INTERNET SUBSCRIPTION IN HOUSEHOLD

Variable     | Table                          | Label                                                    | Type 
-------------------------------------------------------------------------------------------------------------------
B28007_001E  | LABOR FORCE STATUS BY PRESENCE | !! Estimate Total                                        | int  
B28007_002E  | LABOR FORCE STATUS BY PRESENCE | !! !! Estimate Total In the civilian labor force         | int  
B28007_003E  | LABOR FORCE STATUS BY PRESENCE | !! !! !! Estimate Total In the civilian labor force Empl | int  
B28007_004E  | LABOR FORCE STATUS BY PRESENCE | !! !! !! !! Estimate Total In the civilian labor force E | int  
B28007_005E  | LABOR FORCE STATUS BY PRESENCE | !! !! !! !! !! Estimate Total In the civilian labor forc | int  
B28007_006E  | LABOR FORCE STATUS BY PRESENCE | !! !! !! !! !! Estimate Total In the civilian labor forc | int  
B28007_007E  | LABOR FORCE STATUS BY PRESENCE | !! !! !! !! !! Estimate Total In the civilian

In [None]:
labor = censusdata.download('acs5', 2018,
           censusdata.censusgeo([('state', '17'),  # STATE is Illinois
                                 ('county', '*')]), # All Counties in Illinois
                                 ['B28007_001E','B28007_002E','B28007_006E','B28007_012E','B28007_018E'])

labor.columns = ['total','total_in_laborforce','employed_w_broadband','unemployed_w_broadband','not_laborforce_w_broadband']

# B28007_006E == EMPLOYED WITH BROADBAND
# B28007_012E == UNEMPLOYED WITH BROADBAND
# B28007_018E == NOT IN LABOR FORCE WITH BROADBAND
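Assigning `labor.columns` by position silently mislabels data if the variable list and the name list ever fall out of step. A sketch of an alternative renames by variable code instead; the dictionary and `rename_b28007` are hypothetical names, and the labels come from the comments above.

```python
import pandas as pd

# Hypothetical mapping from ACS table B28007 variable codes to readable column names
B28007_COLUMNS = {
    "B28007_001E": "total",
    "B28007_002E": "total_in_laborforce",
    "B28007_006E": "employed_w_broadband",
    "B28007_012E": "unemployed_w_broadband",
    "B28007_018E": "not_laborforce_w_broadband",
}


def rename_b28007(df: pd.DataFrame) -> pd.DataFrame:
    """Rename B28007 columns by variable code rather than position; fail loudly if any are missing."""
    missing = set(B28007_COLUMNS) - set(df.columns)
    if missing:
        raise KeyError(f"missing ACS variables: {sorted(missing)}")
    return df.rename(columns=B28007_COLUMNS)
```

With this, `list(B28007_COLUMNS)` can serve as the variable list passed to `censusdata.download`, and `labor = rename_b28007(labor)` replaces the positional assignment.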

In [None]:
labor

In [None]:
# Add a new column: pct_in_laborforce = 100 * (total_in_laborforce / total)
# Example: if total = 20 and total_in_laborforce = 10, then pct_in_laborforce = 50
labor['pct_in_laborforce'] = 100 * labor['total_in_laborforce'].divide(labor['total'])
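A county whose table total is zero would make the division above produce `inf` or `NaN`. A small sketch (the `participation_rate` helper is a hypothetical name) makes that case explicit by turning a zero denominator into `NaN` before dividing:

```python
import numpy as np
import pandas as pd


def participation_rate(in_labor_force: pd.Series, total: pd.Series) -> pd.Series:
    """Labor force participation rate in percent; NaN where the total is zero."""
    total = total.replace(0, np.nan)  # a zero total becomes NaN, so the result is NaN instead of inf
    return 100 * in_labor_force / total
```

Usage: `labor['pct_in_laborforce'] = participation_rate(labor['total_in_laborforce'], labor['total'])`.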

In [None]:
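# Round-trip through CSV: exportcsv writes the geographic index out as separate columns
# (including the state and county columns used below), and read_csv loads them back as a plain DataFrame
censusdata.export.exportcsv("labor.csv", labor)
labor = pd.read_csv("labor.csv")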

In [None]:
labor.head()

In [None]:
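# Build a 5-digit county FIPS code (2-digit state + 3-digit county), stored as int to match counties["GEOID_int"]
labor["GEOID"] = (labor['state'].astype(str).str.zfill(2) + labor['county'].astype(str).str.zfill(3)).astype(int)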
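Zero padding matters because FIPS codes are fixed-width strings: Autauga County, Alabama is state 1, county 1, which must become "01001" to match the shapefile's `GEOID`. Converting to int afterwards drops the leading zero, which is why the shapefile's `GEOID` is converted to int too, so both sides match. A minimal sketch of the padding step (`make_geoid` is a hypothetical name):

```python
import pandas as pd


def make_geoid(state: pd.Series, county: pd.Series) -> pd.Series:
    """Build 5-character county FIPS codes: 2-digit state + 3-digit county, zero-padded."""
    return state.astype(str).str.zfill(2) + county.astype(str).str.zfill(3)
```

For example, `make_geoid(pd.Series([17, 1]), pd.Series([31, 1]))` gives `"17031"` (Cook County, Illinois) and `"01001"`.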

In [None]:
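# Read in the county shapefile and convert its GEOID to type int so it matches labor["GEOID"]
counties = gpd.read_file("/content/cb_2018_us_county_20m/cb_2018_us_county_20m.shp")
counties["GEOID_int"] = counties.GEOID.astype(int)
# Optional: keep only Illinois counties (state FIPS "17"); the left join below already limits the result to Illinois
# counties = counties[counties["STATEFP"] == "17"]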
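If you do want an Illinois-only `counties` table (for example, to plot the shapefile on its own), boolean indexing on `STATEFP` is the idiomatic filter; `DataFrame.drop` removes rows by label, not by condition. A sketch with a hypothetical `keep_state` helper, which works the same on a GeoDataFrame:

```python
import pandas as pd


def keep_state(df: pd.DataFrame, statefp: str = "17") -> pd.DataFrame:
    """Hypothetical helper: keep rows whose STATEFP equals statefp ("17" is Illinois).
    STATEFP is read from the shapefile as a zero-padded string, so compare against a string."""
    return df[df["STATEFP"] == statefp].copy()
```

Usage: `il_counties = keep_state(counties)`.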

In [None]:
counties.head()

In [None]:
# Join US Counties shapefile with the workforce data
labor_county_merged = gpd.GeoDataFrame(pd.merge(labor,counties,how="left",left_on="GEOID",right_on='GEOID_int'))

#labor_county_map = gpd.GeoDataFrame(pd.merge(labor,counties,how="left",left_on="county",right_on="COUNTYFP"))
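A left join quietly leaves empty geometries for any labor row without a matching county, and duplicate keys would silently multiply rows. A hedged sketch of a checked version of the join above (`checked_merge` is a hypothetical name) uses pandas' `validate` and `indicator` options to surface both problems:

```python
import pandas as pd


def checked_merge(labor: pd.DataFrame, counties: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: left-join labor rows to county rows on the integer GEOID,
    refusing duplicate keys and reporting labor rows that found no county."""
    merged = pd.merge(
        labor,
        counties,
        how="left",
        left_on="GEOID",
        right_on="GEOID_int",
        validate="one_to_one",  # raises pandas.errors.MergeError if a key repeats on either side
        indicator=True,  # adds a "_merge" column: "both" or "left_only"
    )
    n_unmatched = int((merged["_merge"] == "left_only").sum())
    if n_unmatched:
        print(f"warning: {n_unmatched} labor row(s) have no matching county polygon")
    return merged.drop(columns="_merge")
```

Usage: `labor_county_merged = gpd.GeoDataFrame(checked_merge(labor, counties))`.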

In [None]:
labor_county_merged

In [None]:
labor_county_merged.plot(column="pct_in_laborforce",legend=True)

In [None]:
# Export to a shapefile for further analysis in ArcGIS Pro
labor_county_merged.to_file("labor_county_map.shp")
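Shapefiles store attributes in a dBASE table whose field names are limited to 10 characters, so columns such as `pct_in_laborforce` get truncated on export. A hedged sketch of a hypothetical helper that picks short, unique names up front, so you control what they become:

```python
def shapefile_safe_names(columns, limit=10):
    """Hypothetical helper: map each column name to a unique name of at most `limit` characters."""
    mapping, used = {}, set()
    for col in columns:
        base = str(col)[:limit]  # plain truncation first
        name, k = base, 1
        while name in used:  # on a collision, swap the tail for a numeric suffix
            suffix = str(k)
            name = base[: limit - len(suffix)] + suffix
            k += 1
        used.add(name)
        mapping[col] = name
    return mapping
```

Usage: `labor_county_merged.rename(columns=shapefile_safe_names(labor_county_merged.columns)).to_file("labor_county_map.shp")`. The `geometry` column is only 8 characters, so it is left unchanged. Alternatively, `labor_county_merged.to_file("labor_county_map.gpkg", driver="GPKG")` writes a GeoPackage, which does not have the 10-character limit.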

<a id='bokeh'></a>
### Interactive mapping


In [None]:
#We use an awesome package called bokeh (I love this package!)

from bokeh.io import output_file, show,output_notebook
from bokeh.models import ColumnDataSource,ColorBar,HoverTool
from bokeh.transform import linear_cmap
from bokeh.plotting import figure
from bokeh.palettes import Spectral6 #https://docs.bokeh.org/en/latest/docs/reference/palettes.html

In [None]:
# Display Bokeh output inline in the notebook
output_notebook()

In [None]:
#Don't change this!!!
#Just copy this whole cell.
#This is a helper function for converting a GeoDataFrame to the format that bokeh can recognize.

def gpd_bokeh(df):
    """Convert geometries from geopandas to bokeh format"""
    nan = float('nan')
    lons = []
    lats = []
    for i,shape in enumerate(df.geometry.values):
        if shape.geom_type == 'MultiPolygon':
            gx = []
            gy = []
            ng = len(shape.geoms) - 1
            for j,member in enumerate(shape.geoms):
                xy = np.array(list(member.exterior.coords))
                xs = xy[:,0].tolist()
                ys = xy[:,1].tolist()
                gx.extend(xs)
                gy.extend(ys)
                if j < ng:
                    gx.append(nan)
                    gy.append(nan)
            lons.append(gx)
            lats.append(gy)

        else:
            xy = np.array(list(shape.exterior.coords))
            xs = xy[:,0].tolist()
            ys = xy[:,1].tolist()
            lons.append(xs)
            lats.append(ys)

    return lons,lats
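The MultiPolygon branch of `gpd_bokeh` puts a NaN between the parts of a county (for example, a county with islands) so Bokeh's `patches` glyph draws them as one shape with several disjoint pieces. Stripped of geopandas, the idea looks like this; `join_rings` is a hypothetical name and, like `gpd_bokeh`, it ignores interior holes.

```python
import math


def join_rings(rings):
    """Concatenate coordinate rings into one x list and one y list, inserting NaN
    between rings, the layout gpd_bokeh builds for MultiPolygons."""
    xs, ys = [], []
    for i, ring in enumerate(rings):
        if i > 0:  # the separator goes between parts, not before the first one
            xs.append(math.nan)
            ys.append(math.nan)
        for x, y in ring:
            xs.append(x)
            ys.append(y)
    return xs, ys
```

Two closed triangles of four points each become one list of nine x values with a NaN in the middle.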

In [None]:
#Feed in the data for bokeh

lons, lats = gpd_bokeh(labor_county_merged)

source = ColumnDataSource(data=dict( #specify the x, y coordinates, and the data we want to put in to the map
        x=lons,
        y=lats,
        name = labor_county_merged['NAME_x'], # Add any columns you want to bokeh. NAME_x is the county name
        population = labor_county_merged['total'], # 'total' is the B28007 table total, not the county's full population
        pct_in_laborforce = labor_county_merged['pct_in_laborforce'])) # labor force participation rate (%)

In [None]:
#Create a color map
color_mapper = linear_cmap(field_name='pct_in_laborforce', #the field to map
                           palette=Spectral6, #the color to use
                           low=min(labor_county_merged['pct_in_laborforce']) , # The low and high bounds for your color map
                           high=max(labor_county_merged['pct_in_laborforce']))
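With the six-color `Spectral6` palette, `linear_cmap` splits the range between `low` and `high` into six equal-width bins. The sketch below approximates that binning for illustration only: `palette_index` is a hypothetical name, Bokeh's own mapping runs in the browser, and its color mappers can also use `low_color`/`high_color` for out-of-range values instead of clamping.

```python
def palette_index(value, low, high, n_colors):
    """Approximate which palette entry a linear color mapper assigns to value:
    [low, high] is split into n_colors equal-width bins; out-of-range values are clamped."""
    if high == low:
        return 0
    frac = (value - low) / (high - low)
    idx = int(frac * n_colors)  # int() truncates, which equals floor for non-negative frac
    return min(max(idx, 0), n_colors - 1)
```

For example, with `low=50` and `high=74`, each of the six bins is 4 percentage points wide, so a county at 60 percent falls in bin 2 (the third color).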


In [None]:
#Add tools you want
TOOLS = "pan,wheel_zoom,reset,hover,save"

In [None]:
#Create a plot frame with size and title
# width/height replace plot_width/plot_height, which Bokeh 3 removed
map = figure(width=500, height=660, title="Illinois Labor Force Participation Rate by County, 2018 ACS 5-Year Estimate", tools=TOOLS)

#Add the polygon patches
map.patches('x', 'y', source=source, fill_color=color_mapper, line_color="white", line_width=0.1)

#Add the hover tool and the hover field to display
map.select_one(HoverTool).tooltips = [
    ('County Name', '@name'), #each tuple needs to follow this format.
    ('Population', '@population'),
    ('% Laborforce Participation', '@pct_in_laborforce')
]

#Add your colorbar
color_bar = ColorBar(color_mapper=color_mapper['transform'], width=16, location=(0,0))
map.add_layout(color_bar, 'right')

In [None]:
#Show the map
show(map)

In [None]:
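# Export your map to a standalone HTML file: output_file sets the target, save writes it
from bokeh.io import save
output_file("laborforce_participation.html", title="Illinois Labor Force Participation")
save(map)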