## 7. Geospatial Analysis

### This script contains the following:
1. Importing libraries and data
2. Data wrangling for geospatial analysis
3. Data cleaning for geospatial analysis 
4. Choropleth of dataset

### 1. Importing libraries and data

In [2]:
import pandas as pd
import numpy as np
import os
import matplotlib
import seaborn as sns
import folium
import json

In [3]:
path = r'C:\Users\Neena Tilton\Dropbox\Projects\MinWage_Crime'

In [4]:
# Import merged dataframe from last script 

df = pd.read_pickle(os.path.join(path, '02_Data', 'PreparedData', 'df_newvar.pkl'))

In [5]:
# Prompt matplotlib visuals to appear in the notebook

%matplotlib inline

In [6]:
# Path to the GeoJSON file of US state boundaries (loaded further below)

country_geo = r"C:\Users\Neena Tilton\Dropbox\CareerFoundry\Dataset\originalUSstates.json"

In [8]:
df.head()

| | Year | State | state_mw | state_mw_2020 | fed_mw | fed_mw_2020 | effective_mw | effective_mw_2020 | prisoner_count | state_population | violent_crime | murder | robbery | burglary | incarceration_rate | rate_rank | avg_rate_of_year |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2001 | Alabama | 0.0 | 0.0 | 5.15 | 7.52 | 5.15 | 7.52 | 24741.0 | 4468912.0 | 19582.0 | 379.0 | 5584.0 | 40642.0 | 0.005536 | High | 0.003832 |
| 1 | 2001 | Alaska | 5.65 | 8.25 | 5.15 | 7.52 | 5.65 | 8.25 | 4570.0 | 633630.0 | 3735.0 | 39.0 | 514.0 | 3847.0 | 0.007212 | Unusually High | 0.003832 |
| 2 | 2001 | Arizona | 0.0 | 0.0 | 5.15 | 7.52 | 5.15 | 7.52 | 27710.0 | 5306966.0 | 28675.0 | 400.0 | 8868.0 | 54821.0 | 0.005221 | High | 0.003832 |
| 3 | 2001 | Arkansas | 5.15 | 7.52 | 5.15 | 7.52 | 5.15 | 7.52 | 11489.0 | 2694698.0 | 12190.0 | 148.0 | 2181.0 | 22196.0 | 0.004264 | Medium | 0.003832 |
| 4 | 2001 | California | 6.25 | 9.13 | 5.15 | 7.52 | 6.25 | 9.13 | 157142.0 | 34600464.0 | 212867.0 | 2206.0 | 64614.0 | 232273.0 | 0.004542 | Medium | 0.003832 |


In [9]:
# See json file

f = open(country_geo)

In [10]:
data = json.load(f)

In [11]:
for i in data['features']:
    print(i)

{'type': 'Feature', 'id': 'AL', 'properties': {'name': 'Alabama'}, 'geometry': {'type': 'Polygon', 'coordinates': [[[-87.359296, 35.00118], [-85.606675, 34.984749], [-85.431413, 34.124869], [-85.184951, 32.859696], [-85.069935, 32.580372], [-84.960397, 32.421541], [-85.004212, 32.322956], [-84.889196, 32.262709], [-85.058981, 32.13674], [-85.053504, 32.01077], [-85.141136, 31.840985], [-85.042551, 31.539753], [-85.113751, 31.27686], [-85.004212, 31.003013], [-85.497137, 30.997536], [-87.600282, 30.997536], [-87.633143, 30.86609], [-87.408589, 30.674397], [-87.446927, 30.510088], [-87.37025, 30.427934], [-87.518128, 30.280057], [-87.655051, 30.247195], [-87.90699, 30.411504], [-87.934375, 30.657966], [-88.011052, 30.685351], [-88.10416, 30.499135], [-88.137022, 30.318396], [-88.394438, 30.367688], [-88.471115, 31.895754], [-88.241084, 33.796253], [-88.098683, 34.891641], [-88.202745, 34.995703], [-87.359296, 35.00118]]]}}
{'type': 'Feature', 'id': 'AK', 'properties': {'name': 'Alaska'},
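Each feature stores the full state name under `properties.name`, which is the field the choropleth joins on later. A quick check that every name in the `State` column has a counterpart in the GeoJSON can catch spelling mismatches before mapping. The sketch below is a minimal illustration on toy inputs: the helper `unmatched_states` is a hypothetical name, and the `Washington D.C.` spelling is only an assumed example of a label that might not match.

```python
import pandas as pd

def unmatched_states(df, geojson, column="State"):
    """Return names found in df[column] but missing from the GeoJSON features."""
    geo_names = {feat["properties"]["name"] for feat in geojson["features"]}
    return sorted(set(df[column].unique()) - geo_names)

# Toy inputs mirroring the structures shown above (geometry omitted for brevity)
sample_geo = {"features": [
    {"type": "Feature", "id": "AL", "properties": {"name": "Alabama"}},
    {"type": "Feature", "id": "AK", "properties": {"name": "Alaska"}},
]}
# Hypothetical dataframe labels; the third one has no match in sample_geo
sample_df = pd.DataFrame({"State": ["Alabama", "Alaska", "Washington D.C."]})

missing = unmatched_states(sample_df, sample_geo)
print(missing)
```

In the notebook itself, the same helper could be called as `unmatched_states(df, data)`; an empty list would suggest that every state name lines up with a feature.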

### 2. Data wrangling for geospatial analysis

Create a subset containing only the columns needed for the geospatial visualization, dropping the unnecessary ones:

In [12]:
df.columns

Index(['Year', 'State', 'state_mw', 'state_mw_2020', 'fed_mw', 'fed_mw_2020',
       'effective_mw', 'effective_mw_2020', 'prisoner_count',
       'state_population', 'violent_crime', 'murder', 'robbery', 'burglary',
       'incarceration_rate', 'rate_rank', 'avg_rate_of_year'],
      dtype='object')

In [15]:
df_sub = df[['State','incarceration_rate']]
df_sub.head()

| | State | incarceration_rate |
|---|---|---|
| 0 | Alabama | 0.005536 |
| 1 | Alaska | 0.007212 |
| 2 | Arizona | 0.005221 |
| 3 | Arkansas | 0.004264 |
| 4 | California | 0.004542 |
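The dataframe has a `Year` column, so this subset likely contains several rows per state (one per year), whereas a choropleth shades each state with a single value. One way to make the mapped value explicit is to reduce the data to one row per state first. The sketch below shows two options on toy data: the 2001 rates are taken from the output above, while the 2002 rates are invented purely for illustration.

```python
import pandas as pd

# Toy rows: 2001 values come from df.head(); 2002 values are made up
sample = pd.DataFrame({
    "Year": [2001, 2001, 2002, 2002],
    "State": ["Alabama", "Alaska", "Alabama", "Alaska"],
    "incarceration_rate": [0.005536, 0.007212, 0.005600, 0.007000],
})

# Option A: average each state's rate over all available years
state_mean = sample.groupby("State", as_index=False)["incarceration_rate"].mean()

# Option B: keep only the most recent year in the data
latest = sample.loc[sample["Year"] == sample["Year"].max(),
                    ["State", "incarceration_rate"]]

print(state_mean)
print(latest)
```

Either result has one row per state and could be passed as the `data` argument of the choropleth; which option is appropriate depends on whether the map is meant to show a long-run average or a single year.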


In [16]:
# Set up a folium map at a high-level zoom, centered roughly on the United States
# (latitude must lie between -90 and 90)
map = folium.Map(location = [40, -95], zoom_start = 1.5)

folium.Choropleth(
    geo_data = country_geo, 
    data = df_sub,
    columns = ['State', 'incarceration_rate'],
    key_on = 'feature.properties.name', 
    fill_color = 'YlOrBr', fill_opacity=0.6, line_opacity=0.1,
    legend_name = 'Rate of Incarceration').add_to(map)
folium.LayerControl().add_to(map)

map

In [17]:
map.save(os.path.join(path, '04_Analysis','incarceration_map.html')) 

The states with the highest incarceration rates (prison population as a share of total population) are Arizona, Alaska, and Oklahoma, along with Washington, D.C. Higher incarceration rates are also concentrated in the South, including Missouri, Arkansas, Texas, and Georgia. The states with the lowest rates are Utah, North Dakota, Minnesota, Maine, Massachusetts, and New Hampshire.
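Reading the ranking off the map colors can be cross-checked numerically. The following is a minimal sketch, assuming the mean rate per state across years is the quantity of interest; `rank_states` is a hypothetical helper, and the toy rows reuse the five 2001 values shown in `df.head()` rather than the full dataset.

```python
import pandas as pd

def rank_states(df, n=3):
    """Return the n highest and n lowest states by mean incarceration rate."""
    mean_rate = (df.groupby("State")["incarceration_rate"]
                   .mean()
                   .sort_values(ascending=False))
    return mean_rate.head(n), mean_rate.tail(n)

# Toy rows taken from the df.head() output earlier in this script
sample = pd.DataFrame({
    "State": ["Alabama", "Alaska", "Arizona", "Arkansas", "California"],
    "incarceration_rate": [0.005536, 0.007212, 0.005221, 0.004264, 0.004542],
})

highest, lowest = rank_states(sample, n=2)
print(highest)
print(lowest)
```

Applied to the full dataframe, `rank_states(df, n=5)` would list the states at both ends of the scale, which can then be compared against the shading on the map.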