PROJECT: VISUALIZATION OF ACCIDENTS IN NY.

SUBPROJECT: Creating a Choropleth Map of the Number of Accidents in NY at the ZIP Code Level

In this subproject I'll create a choropleth map in Folium showing the number of accidents and injuries at the ZIP code level, using data from https://data.cityofnewyork.us/NYC-BigApps/NYPD-Motor-Vehicle-Collisions-Summary/m666-sf2m 

First, we import the necessary libraries into the Jupyter notebook.

In [103]:
import json
import urllib.request
import pandas as pd
import folium

We will use NYPD statistics on motor vehicle collisions. The link to the dataset is given in the description above.

In [108]:
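# Adjust the path to wherever the CSV was downloaded.
# low_memory=False reads the file in one pass, so pandas does not guess mixed column types chunk by chunk.
df = pd.read_csv(r'C:\Users\Mi Notebook\Downloads\NYPD_Motor_Vehicle_Collisions.csv', parse_dates=['DATE', 'TIME'], low_memory=False)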



The data contain statistics on all collisions starting from 2012 and have 1,385,920 rows, so for this project we use only 2018 data, from January 1 to October 31.

In [112]:
df18 = df[(df['DATE']>='2018-01-01') & (df['DATE']<'2018-11-01')]
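The filter above selects a half-open date range: rows on or after 2018-01-01 and strictly before 2018-11-01, so October 31 is kept and November is not. A minimal sketch on a few made-up dates (illustrative values, not rows from the NYPD file) shows which rows survive; the `.copy()` is an optional extra that makes the result an independent dataframe.

```python
import pandas as pd

# Made-up collision dates, for illustration only
toy = pd.DataFrame({'DATE': pd.to_datetime(['2017-12-31', '2018-01-01',
                                            '2018-10-31', '2018-11-01'])})

# Same half-open filter as above: 2018-01-01 <= DATE < 2018-11-01
mask = (toy['DATE'] >= '2018-01-01') & (toy['DATE'] < '2018-11-01')
toy18 = toy[mask].copy()  # independent copy, so later in-place edits do not warn

print(toy18['DATE'].dt.strftime('%Y-%m-%d').tolist())  # → ['2018-01-01', '2018-10-31']
```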

Next, we clean the data by removing rows with missing values: rows that are completely empty and rows without a ZIP code.

In [113]:
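# Reassign instead of using inplace=True: df18 is a filtered slice of df,
# and modifying it in place triggers pandas' SettingWithCopyWarning.
df18 = df18.dropna(how='all')              # drop rows where every field is empty
df18 = df18.dropna(subset=['ZIP CODE'])    # drop collisions without a ZIP code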
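Any row that `dropna(how='all')` removes has no ZIP code either, so the `subset=['ZIP CODE']` call on its own gives the same final result. A small sketch on made-up rows (column names copied from the NYPD file, values invented) shows the row count after each step:

```python
import numpy as np
import pandas as pd

# Made-up rows: row 1 is completely empty, row 2 has injuries but no ZIP code
toy = pd.DataFrame({
    'ZIP CODE': [10001, np.nan, np.nan, 11372],
    'NUMBER OF PERSONS INJURED': [1, np.nan, 2, 0],
})

step1 = toy.dropna(how='all')               # removes only the fully empty row
step2 = step1.dropna(subset=['ZIP CODE'])   # then removes the row without a ZIP code

print(len(toy), len(step1), len(step2))  # → 4 3 2
```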



After cleaning the data, we build a new dataframe with the number of accidents and the total number of injuries per ZIP code, which is what the choropleth map will display.

We begin by counting the accidents: each row is one collision, so counting the rows per ZIP code with value_counts() gives the "ACCIDENTS" column.

To get the total number of injuries, we group the dataframe by ZIP code and sum the "NUMBER OF PERSONS INJURED" column.

In [114]:
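# Count collisions per ZIP code (each row is one collision)
df_zip = pd.DataFrame(df18['ZIP CODE'].value_counts().reset_index().values, columns=["ZIP", "ACCIDENTS"])
# Store ZIP codes as integers so they match the other tables in the merges
df_zip['ZIP'] = df_zip['ZIP'].astype(int)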

In [115]:
# Sum the injured persons over all collisions in each ZIP code
df_injuries = pd.DataFrame(df18.groupby(['ZIP CODE'])['NUMBER OF PERSONS INJURED'].sum().reset_index().values, columns=["ZIP", "INJURIES"])
df_injuries['ZIP'] = df_injuries['ZIP'].astype(int)

To get the resulting dataframe, we merge the accidents and injuries dataframes on the ZIP code.

In [116]:
df_result = pd.merge(df_injuries, df_zip, on='ZIP')

In [117]:
df_result.head()

     ZIP  INJURIES  ACCIDENTS
0  10000      14.0       47.0
1  10001     168.0     1408.0
2  10002     269.0     1314.0
3  10003     165.0      754.0
4  10004      30.0      217.0
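As an alternative to the value_counts / groupby / merge sequence, a single groupby with named aggregation (available since pandas 0.25) can produce both columns at once and keeps the counts as integers. A minimal sketch on made-up collisions, reusing the column names of the NYPD file; `'size'` counts all rows in each group, like value_counts() does:

```python
import pandas as pd

# Made-up collisions, one row per collision
toy = pd.DataFrame({
    'ZIP CODE': [10001, 10001, 10002, 10001, 10002],
    'NUMBER OF PERSONS INJURED': [0, 2, 1, 1, 0],
})

summary = (toy.groupby('ZIP CODE')
              .agg(INJURIES=('NUMBER OF PERSONS INJURED', 'sum'),    # total injured per ZIP
                   ACCIDENTS=('NUMBER OF PERSONS INJURED', 'size'))  # number of rows per ZIP
              .rename_axis('ZIP')
              .reset_index())

# → ZIP 10001: 3 injuries, 3 accidents; ZIP 10002: 1 injury, 2 accidents
print(summary)
```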


Now we can start working with the Folium library. 

To map the data by ZIP code in Folium, we need a GeoJSON file describing the boundaries of each ZIP code area. Such files are available from open data sources such as http://data.beta.nyc 

In [118]:
with urllib.request.urlopen("http://data.beta.nyc//dataset/3bf5fb73-edb5-4b05-bb29-7c95f4a727fc/resource/6df127b1-6d04-4bb7-b983-07402a2c3f90/download/f4129d9aa6dd4281bc98d0f701629b76nyczipcodetabulationareas.geojson") as url:
    data = json.loads(url.read().decode())

Then we store the coordinates (latitude, longitude) of the map center, a point in Lower Manhattan.

In [119]:
man_coordinates = (40.7218, -73.9998)

If we examine the GeoJSON data, we can see that each ZIP code area is identified by the 'OBJECTID' field under the 'properties' key, while the ZIP code itself is stored as a string in 'postalCode'. We therefore create a mapping table that gives the ZIP code for each 'OBJECTID' number.

In [121]:
data['features'][0]

{'type': 'Feature',
 'properties': {'OBJECTID': 1,
  'postalCode': '11372',
  'PO_NAME': 'Jackson Heights',
  'STATE': 'NY',
  'borough': 'Queens',
  'ST_FIPS': '36',
  'CTY_FIPS': '081',
  'BLDGpostal': 0,
  '@id': 'http://nyc.pediacities.com/Resource/PostalCode/11372',
  'longitude': -73.883573184,
  'latitude': 40.751662187},
 'geometry': {'type': 'Polygon',
  'coordinates': [[[-73.86942457284177, 40.74915687096788],
    [-73.89143129977276, 40.74684466041932],
    [-73.89507143240859, 40.746465470812154],
    [-73.8961873786782, 40.74850942518088],
    [-73.8958395418514, 40.74854687570604],
    [-73.89525242774397, 40.748306609450246],
    [-73.89654041085562, 40.75054199814359],
    [-73.89579868613829, 40.75061972133262],
    [-73.89652230661434, 40.75438879610903],
    [-73.88164812188481, 40.75595161704187],
    [-73.87221855882478, 40.75694324806748],
    [-73.87167992356792, 40.75398717439604],
    [-73.8720704651389, 40.753862007052064],
    [-73.86942457284177, 40.74915687

In [123]:
# Pair each feature's OBJECTID with its ZIP code (postalCode is a string)
zip_codes = [(f['properties']['OBJECTID'], f['properties']['postalCode']) for f in data['features']]

In [124]:
zip_codes[:5]

[(1, '11372'), (2, '11004'), (3, '11040'), (4, '11426'), (5, '11365')]

In [125]:
zip_data = pd.DataFrame(zip_codes, columns=['CODE', 'ZIP'])

In [126]:
zip_data['ZIP'] = zip_data['ZIP'].astype(int)
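The same OBJECTID to ZIP code mapping can also be kept as a plain dictionary built straight from the parsed GeoJSON. A minimal sketch on an inline two-feature FeatureCollection (geometry omitted; properties copied from the first two pairs printed above) shows the navigation through 'features' and 'properties':

```python
import json

# Minimal FeatureCollection; geometry is null to keep the example short
geojson_text = '''
{"type": "FeatureCollection",
 "features": [
   {"type": "Feature", "properties": {"OBJECTID": 1, "postalCode": "11372"}, "geometry": null},
   {"type": "Feature", "properties": {"OBJECTID": 2, "postalCode": "11004"}, "geometry": null}
 ]}
'''
data_demo = json.loads(geojson_text)

# OBJECTID -> ZIP code as int: the same information zip_data holds in two columns
objectid_to_zip = {f['properties']['OBJECTID']: int(f['properties']['postalCode'])
                   for f in data_demo['features']}
print(objectid_to_zip)  # → {1: 11372, 2: 11004}
```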

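Finally, we add the OBJECTID numbers (column 'CODE') to create the dataframe used for the choropleth map. Since pd.merge performs an inner join by default, ZIP codes without a polygon in the GeoJSON are dropped; 10000, for example, no longer appears in the merged result.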

In [127]:
df_result = pd.merge(df_result, zip_data, on='ZIP')

In [154]:
df_result.head(5)

     ZIP  INJURIES  ACCIDENTS  CODE
0  10001     168.0     1408.0   114
1  10002     269.0     1314.0   124
2  10003     165.0      754.0   122
3  10004      30.0      217.0   139
4  10004      30.0      217.0   142
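ZIP code 10004 appears twice above because it matches two GeoJSON features (OBJECTID 139 and 142), so this merge is one-to-many and each polygon gets its own row and color. A minimal sketch with values taken from the output above; pandas' `validate='one_to_many'` option makes that expectation explicit and raises a MergeError if a ZIP code ever repeats on the statistics side:

```python
import pandas as pd

# Values copied from the df_result output above
stats = pd.DataFrame({'ZIP': [10003, 10004], 'ACCIDENTS': [754.0, 217.0]})
shapes = pd.DataFrame({'CODE': [122, 139, 142], 'ZIP': [10003, 10004, 10004]})

# One row per polygon; fails if 'ZIP' is not unique in stats
merged = pd.merge(stats, shapes, on='ZIP', validate='one_to_many')
print(len(merged))  # → 3
```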


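The choropleth map will have two layers: the first shows the number of accidents per ZIP code area, the second the number of injuries. Both layers are created with show=False, so they stay hidden until switched on in the layer control.  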

In [254]:
map = folium.Map(man_coordinates, zoom_start=12)

In [255]:
folium.Choropleth(
    geo_data=data,
    name='Accidents1',
    data=df_result,
    columns=['CODE', 'ACCIDENTS'],
    key_on='feature.properties.OBJECTID',
    fill_color='YlOrRd',
    legend_name='# Accidents',
    highlight=True,
    nan_fill_color='yellow',
    nan_fill_opacity=0.4,
    show=False
).add_to(map)

<folium.features.Choropleth at 0x1e255796fd0>
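The key_on argument tells Folium where to find, inside each GeoJSON feature, the key that is matched against the first entry of columns (here 'CODE'). The plain-Python sketch below illustrates that matching idea only; it is not Folium's internal code, and the feature with OBJECTID 999 is hypothetical, added to show an area with no data (which the map would paint with nan_fill_color). It also shows that keys only match when their types agree:

```python
# Conceptual sketch of key_on matching, NOT Folium's implementation.
features = [
    {'properties': {'OBJECTID': 122, 'postalCode': '10003'}},
    {'properties': {'OBJECTID': 139, 'postalCode': '10004'}},
    {'properties': {'OBJECTID': 999, 'postalCode': '00000'}},  # hypothetical feature with no data
]

# CODE -> ACCIDENTS, i.e. the pair given in columns=['CODE', 'ACCIDENTS']
code_to_accidents = {122: 754.0, 139: 217.0}


def lookup(feature, key_path, table):
    """Follow a dotted path like 'feature.properties.OBJECTID' and return the matched value."""
    node = feature
    for part in key_path.split('.')[1:]:  # skip the leading 'feature'
        node = node[part]
    return table.get(node)  # None stands for "no data"


values = [lookup(f, 'feature.properties.OBJECTID', code_to_accidents) for f in features]
print(values)  # → [754.0, 217.0, None]

# String keys do not match the integer OBJECTIDs
print(lookup(features[0], 'feature.properties.OBJECTID', {'122': 754.0}))  # → None
```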

In [256]:
folium.Choropleth(
    geo_data=data,
    name='Injuries2',
    data=df_result,
    columns=['CODE', 'INJURIES'],
    key_on='feature.properties.OBJECTID',
    fill_color='YlGn',
    legend_name='# Injuries',
    highlight=True,
    nan_fill_color='yellow',
    nan_fill_opacity=0.4,
    show=False
).add_to(map)
folium.LayerControl(collapsed=False).add_to(map)

<folium.map.LayerControl at 0x1e24ffd6fd0>

In [257]:
map

In [258]:
map.save('GeoJSON_NewYork.html')