# crash_01_data_wrangling_with_shst

This notebook maps the NYC crash dataset onto SharedStreets geometry. It contains three chapters:

- **0. Pre-processing & making small batches**: The processed vehicle collision dataset is too large to run through the SharedStreets API on a local machine in one pass, so we divide it into small batches. Each batch contains at most 60,000 crash records.
<br>

- **1. Data wrangling with Sharedstreets API**: This chapter is carried out outside this notebook. See the PDF 'how to use Sharedstreet API with Docker' for instructions.
<br>

- **2. Processing Sharedstreet results**: This chapter processes the results of Chapter 1 and merges the result files into one GeoJSON file.

## 0. Pre-processing & making small batches

In [2]:
# import libraries
import pandas as pd
import geopandas as gpd
import numpy as np
from shapely.geometry import Point

In [3]:
# run the 'crash_00_data_wrangling' notebook first; it produces '511_mv_collisions.csv'
# import the crash dataset
gdf_crash = pd.read_csv('../data/cleaned_data/511_mv_collisions.csv')

In [4]:
# make a list of crash locations. we will use this as a 'geometry' column in GeoDataFrame
points = [Point(x,y) for x,y in zip(gdf_crash['longitude'],gdf_crash['latitude'])]

In [5]:
# convert pd.DataFrame into gpd.GeoDataFrame (coordinates are longitude/latitude, i.e. WGS84)
gdf_crash = gpd.GeoDataFrame(gdf_crash, geometry=points, crs='EPSG:4326')
gdf_crash['crash_date'] = pd.to_datetime(gdf_crash['crash_date']) 

In [6]:
# extract 'year'. the small batches will be made based on this column
gdf_crash['year'] = gdf_crash['crash_date'].dt.year

In [7]:
gdf_crash.head(3)

Unnamed: 0,crash_date,crash_time,borough,zip_code,latitude,longitude,location,on_street_name,cross_street_name,off_street_name,...,vehicle_type_code_3,vehicle_type_code_4,vehicle_type_code_5,geometry,index_right,boro_code,boro_name,shape_area,shape_leng,year
0,2016-10-01,20:20,manhattan,10038.0,40.711567,-74.00774,POINT (-74.00774 40.711567),,,20 park row,...,,,,POINT (-74.00774 40.71157),4,1.0,Manhattan,944294600.0,203803.483188,2016
1,2016-10-01,1:40,,,40.654984,-74.00711,POINT (-74.00711 40.654984),gowanus expy (bqe),,,...,,,,POINT (-74.00711 40.65498),2,3.0,Brooklyn,2684410000.0,234924.030131,2016
2,2016-10-01,22:30,manhattan,10032.0,40.837803,-73.94215,POINT (-73.94215 40.837803),west 163 street,broadway,,...,,,,POINT (-73.94215 40.83780),4,1.0,Manhattan,944294600.0,203803.483188,2016


In [8]:
gdf_crash['year'].unique()

array([2016, 2017, 2018, 2019, 2020], dtype=int64)

In [9]:
# simplify the dataset
gdf_crash = gdf_crash.loc[:,['collision_id','year','geometry','crash_date']]
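The per-year sections below filter `gdf_crash` once per year with `gdf_crash.loc[gdf_crash['year']==...]`. A minimal alternative sketch, assuming the same `year` and `crash_date` columns, splits the frame once with `groupby`; the toy rows and the name `frames_by_year` are hypothetical.

```python
import pandas as pd

# Hypothetical toy stand-in for the simplified crash table.
df = pd.DataFrame({
    "collision_id": [1, 2, 3, 4, 5],
    "crash_date": pd.to_datetime(
        ["2016-10-01", "2017-03-02", "2017-07-15", "2018-01-09", "2019-11-30"]
    ),
})
df["year"] = df["crash_date"].dt.year

# groupby splits the frame in one pass instead of re-filtering it per year;
# .copy() gives each year an independent frame that can be modified safely.
frames_by_year = {int(year): group.copy() for year, group in df.groupby("year")}

for year, frame in frames_by_year.items():
    print(year, len(frame))
```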

### 2016

In [9]:
# filter dataset 
gdf_crash_2016 = gdf_crash.loc[gdf_crash['year']==2016]

In [10]:
gdf_crash_2016.shape

(55097, 4)

In [11]:
# export the 2016 crash dataset; it has fewer than 60,000 records, so no batching is needed
gdf_crash_2016.to_file('../data/sharedstreets_results/crash/2016/before_applying/gdf_crash_2016.geojson',
                       driver='GeoJSON')

### 2017

In [11]:
# filter dataset
gdf_crash_2017 = gdf_crash.loc[gdf_crash['year']==2017]

In [12]:
gdf_crash_2017.shape

(216863, 4)

In [13]:
# divide the 2017 crash dataframe into batches of at most 60,000 rows
gdf_crash_2017_0_60000 = gdf_crash_2017.iloc[:60000,:]
gdf_crash_2017_60000_120000 = gdf_crash_2017.iloc[60000:120000,:]
gdf_crash_2017_120000_180000 = gdf_crash_2017.iloc[120000:180000,:]
gdf_crash_2017_180000_ = gdf_crash_2017.iloc[180000:,:]
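The four slices above are written out by hand. A minimal sketch of the same slicing as a loop, assuming only a pandas DataFrame and the 60,000-row batch size used here; `split_into_batches` is a hypothetical helper, not part of any library.

```python
import pandas as pd

def split_into_batches(df, batch_size=60000):
    """Split df into consecutive row slices of at most batch_size rows.

    Returns (start, end, chunk) tuples; end is None for the final slice,
    mirroring the open-ended iloc[180000:] slice used above.
    """
    batches = []
    for start in range(0, len(df), batch_size):
        end = start + batch_size
        if end >= len(df):
            end = None  # last (possibly shorter) batch runs to the end
        batches.append((start, end, df.iloc[start:end]))
    return batches

# Usage on a tiny frame with a batch size of 3 rows.
demo = pd.DataFrame({"collision_id": range(7)})
demo_batches = split_into_batches(demo, batch_size=3)
for start, end, chunk in demo_batches:
    print(start, end, len(chunk))
```

For the 216,863-row 2017 frame, this yields the same four slices as the cell above.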

In [14]:
# export the full 2017 crash dataset. we will re-use it in Chapter 2
gdf_crash_2017.to_file('../data/sharedstreets_results/crash/2017/before_applying/gdf_crash_2017.geojson',
                       driver='GeoJSON')

In [15]:
# export the small batches. we will use them in Chapter 1
gdf_crash_2017_0_60000.to_file('../data/sharedstreets_results/crash/2017/before_applying/gdf_crash_2017_0_60000.geojson',
                              driver='GeoJSON')
gdf_crash_2017_60000_120000.to_file('../data/sharedstreets_results/crash/2017/before_applying/gdf_crash_2017_60000_120000.geojson',
                              driver='GeoJSON')
gdf_crash_2017_120000_180000.to_file('../data/sharedstreets_results/crash/2017/before_applying/gdf_crash_2017_120000_180000.geojson',
                              driver='GeoJSON')
gdf_crash_2017_180000_.to_file('../data/sharedstreets_results/crash/2017/before_applying/gdf_crash_2017_180000_.geojson',
                              driver='GeoJSON')
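The export cells hard-code one path per batch. A small sketch, assuming the folder layout and the `gdf_crash_<year>_<start>_<end>` naming seen in this notebook, generates the same file names from the row count; `batch_file_names` is a hypothetical helper.

```python
from pathlib import PurePosixPath

def batch_file_names(year, n_rows, batch_size=60000,
                     root="../data/sharedstreets_results/crash"):
    """Build the 'before_applying' GeoJSON paths used in this notebook."""
    folder = PurePosixPath(root) / str(year) / "before_applying"
    if n_rows <= batch_size:
        # small years (e.g. 2016) are exported as one file without a row-range suffix
        return [str(folder / f"gdf_crash_{year}.geojson")]
    names = []
    for start in range(0, n_rows, batch_size):
        end = start + batch_size
        # the last batch gets an open-ended suffix such as '180000_'
        suffix = f"{start}_{end}" if end < n_rows else f"{start}_"
        names.append(str(folder / f"gdf_crash_{year}_{suffix}.geojson"))
    return names

for name in batch_file_names(2017, 216863):
    print(name)
```

Each name could then be passed to `to_file(name, driver='GeoJSON')` together with the matching slice.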

### 2018

The process is the same as for 2017.

In [120]:
gdf_crash_2018 = gdf_crash.loc[gdf_crash['year']==2018]

In [121]:
gdf_crash_2018.shape

(216046, 4)

In [114]:
gdf_crash_2018.to_file('../data/sharedstreets_results/crash/2018/before_applying/gdf_crash_2018.geojson',
                       driver='GeoJSON')

In [22]:
gdf_crash_2018_0_60000 = gdf_crash_2018.iloc[:60000,:]
gdf_crash_2018_60000_120000 = gdf_crash_2018.iloc[60000:120000,:]
gdf_crash_2018_120000_180000 = gdf_crash_2018.iloc[120000:180000,:]
gdf_crash_2018_180000_ = gdf_crash_2018.iloc[180000:,:]

In [23]:
gdf_crash_2018_0_60000.to_file('../data/sharedstreets_results/crash/2018/before_applying/gdf_crash_2018_0_60000.geojson',
                              driver='GeoJSON')
gdf_crash_2018_60000_120000.to_file('../data/sharedstreets_results/crash/2018/before_applying/gdf_crash_2018_60000_120000.geojson',
                              driver='GeoJSON')
gdf_crash_2018_120000_180000.to_file('../data/sharedstreets_results/crash/2018/before_applying/gdf_crash_2018_120000_180000.geojson',
                              driver='GeoJSON')
gdf_crash_2018_180000_.to_file('../data/sharedstreets_results/crash/2018/before_applying/gdf_crash_2018_180000_.geojson',
                              driver='GeoJSON')

### 2019

The process is the same as for 2017 and 2018, with two differences. We remove one crash record that causes an error in Chapter 1, and we keep only January through October so that the temporal range matches the 511 event dataset.
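A minimal sketch of the same 2019 filtering in a single boolean mask, assuming the `year`, `crash_date`, and `collision_id` columns built earlier; the toy rows are hypothetical apart from collision 4122407, the record this notebook drops.

```python
import pandas as pd

# Hypothetical toy rows; 4122407 is the record dropped in this notebook.
df = pd.DataFrame({
    "collision_id": [4122407, 100, 101, 102],
    "crash_date": pd.to_datetime(["2019-05-01", "2019-06-10", "2019-11-02", "2018-03-03"]),
})
df["year"] = df["crash_date"].dt.year

mask = (
    (df["year"] == 2019)
    & (df["crash_date"].dt.month <= 10)   # keep January through October
    & (df["collision_id"] != 4122407)     # record that fails in Chapter 1
)
# .copy() makes an independent frame, so adding a column below
# does not trigger pandas' SettingWithCopyWarning.
df_2019 = df.loc[mask].copy()
df_2019["month"] = df_2019["crash_date"].dt.month
print(df_2019["collision_id"].tolist())
```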

In [159]:
gdf_crash_2019 = gdf_crash.loc[gdf_crash['year']==2019].copy()

In [160]:
# .copy() above avoids pandas' SettingWithCopyWarning on this assignment
gdf_crash_2019['month'] = gdf_crash_2019['crash_date'].dt.month


In [161]:
gdf_crash_2019 = gdf_crash_2019.loc[gdf_crash_2019['month']<=10]

In [163]:
# this collision causes an error in the SharedStreets matching step (Chapter 1), so drop it
gdf_crash_2019 = gdf_crash_2019.loc[gdf_crash_2019['collision_id']!= 4122407]

In [166]:
gdf_crash_2019_0_60000 = gdf_crash_2019.iloc[:60000,:]
gdf_crash_2019_60000_120000 = gdf_crash_2019.iloc[60000:120000,:]
gdf_crash_2019_120000_ = gdf_crash_2019.iloc[120000:,:]

In [168]:
gdf_crash_2019.to_file('../data/sharedstreets_results/crash/2019/before_applying/gdf_crash_2019.geojson',
                        driver='GeoJSON')

In [167]:
gdf_crash_2019_0_60000.to_file('../data/sharedstreets_results/crash/2019/before_applying/gdf_crash_2019_0_60000.geojson',
                              driver='GeoJSON')
gdf_crash_2019_60000_120000.to_file('../data/sharedstreets_results/crash/2019/before_applying/gdf_crash_2019_60000_120000.geojson',
                              driver='GeoJSON')
gdf_crash_2019_120000_.to_file('../data/sharedstreets_results/crash/2019/before_applying/gdf_crash_2019_120000_.geojson',
                              driver='GeoJSON')

## 1. Processing with SharedStreets API

Please check the 'how_to_use_sharedstreets_api' document. We use a 40 m search radius (search-radius=40m) to map the crash records onto the SharedStreets geometry.
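Before moving to Chapter 2, it can help to confirm that every exported batch has both result files. A minimal sketch, assuming the `<batch>.matched.geojson` / `<batch>.unmatched.geojson` naming that Chapter 2 reads from the `radius40` folders; `missing_results` is a hypothetical helper and does not call the SharedStreets tool itself.

```python
import tempfile
from pathlib import Path

def missing_results(batch_paths, result_dir):
    """Return expected SharedStreets output files that are not in result_dir."""
    missing = []
    for batch in batch_paths:
        stem = Path(batch).stem  # e.g. 'gdf_crash_2017_0_60000'
        for kind in ("matched", "unmatched"):
            expected = Path(result_dir) / f"{stem}.{kind}.geojson"
            if not expected.exists():
                missing.append(str(expected))
    return missing

# Usage sketch with a throwaway directory instead of the real radius40 folder.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "gdf_crash_2016.matched.geojson").touch()
    demo_missing = missing_results(["gdf_crash_2016.geojson"], tmp)
print(demo_missing)
```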

## 2. Processing SharedStreets results

Because the 'matched' GeoJSON files don't contain 'collision_id' or the other crash attributes, we need to post-process the results to recover the attributes of the matched crashes. We drop the unmatched collisions from the full crash dataset and attach the geometries of the matched crashes to the remaining rows. The geometries are assigned by row position, so this step assumes the matched output keeps the input order.
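The per-year steps below can be sketched as one function with a length check that catches misaligned rows. This is a minimal sketch under the same positional-order assumption; `attach_matched_geometry` is hypothetical, and plain strings stand in for the shapely geometries in the demo.

```python
import pandas as pd

def attach_matched_geometry(crashes, matched_result, unmatched_result):
    """Copy snapped geometry and geometryId onto the matched crash records.

    Assumes matched features appear in input order with only the unmatched
    crashes skipped, so rows line up by position after resetting indexes.
    """
    unmatched_ids = set(unmatched_result["collision_id"])
    matched = crashes.loc[~crashes["collision_id"].isin(unmatched_ids)].reset_index(drop=True)
    result = matched_result.reset_index(drop=True)
    if len(matched) != len(result):
        raise ValueError(
            f"{len(matched)} matched crashes but {len(result)} matched features; "
            "positional alignment would be wrong"
        )
    matched["geometry"] = result["geometry"]
    matched["geometry_id"] = result["geometryId"]
    return matched

# Hypothetical toy inputs: crash 11 is unmatched.
crashes = pd.DataFrame({"collision_id": [10, 11, 12], "geometry": ["p10", "p11", "p12"]})
matched_result = pd.DataFrame({"geometry": ["s10", "s12"], "geometryId": ["g-a", "g-b"]})
unmatched_result = pd.DataFrame({"collision_id": [11]})
out = attach_matched_geometry(crashes, matched_result, unmatched_result)
print(out)
```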

### 2016

In [3]:
# make sure the result files imported below were generated by the SharedStreets API (Chapter 1)
# import the GeoJSON files generated by the SharedStreets API
gdf_crash_2016 = gpd.read_file('../data/sharedstreets_results/crash/2016/before_applying/gdf_crash_2016.geojson')
gdf_crash_2016_matched_result = gpd.read_file('../data/sharedstreets_results/crash/2016/radius40/gdf_crash_2016.matched.geojson')
gdf_crash_2016_unmatched_result = gpd.read_file('../data/sharedstreets_results/crash/2016/radius40/gdf_crash_2016.unmatched.geojson')

In [4]:
# extract the list of unmatched crashes
list_crash_2016_unmatched = gdf_crash_2016_unmatched_result['collision_id'].tolist()

In [5]:
# drop unmatched crashes from entire 2016 crash dataset
gdf_crash_2016_matched = gdf_crash_2016.loc[~gdf_crash_2016['collision_id'].isin(list_crash_2016_unmatched)]

In [6]:
# reset index of matched crashes to concatenate dataset
gdf_crash_2016_matched = gdf_crash_2016_matched.reset_index().drop('index', axis=1)
gdf_crash_2016_matched_result = gdf_crash_2016_matched_result.reset_index().drop('index', axis=1)

In [7]:
# get geometries from the matched crash geojson file
gdf_crash_2016_matched['geometry'] = gdf_crash_2016_matched_result['geometry']

In [10]:
# copy the SharedStreets 'geometryId' into a 'geometry_id' column
gdf_crash_2016_matched['geometry_id'] = gdf_crash_2016_matched_result['geometryId']

### 2017

The process is the same as for 2016, except that the four batch results are concatenated first.

In [11]:
# make sure the result files imported below were generated by the SharedStreets API (Chapter 1)
# import the GeoJSON files generated by the SharedStreets API
gdf_crash_2017 = gpd.read_file('../data/sharedstreets_results/crash/2017/before_applying/gdf_crash_2017.geojson')
gdf_crash_2017_matched_result0 = gpd.read_file('../data/sharedstreets_results/crash/2017/radius40/gdf_crash_2017_0_60000.matched.geojson')
gdf_crash_2017_matched_result1 = gpd.read_file('../data/sharedstreets_results/crash/2017/radius40/gdf_crash_2017_60000_120000.matched.geojson')
gdf_crash_2017_matched_result2 = gpd.read_file('../data/sharedstreets_results/crash/2017/radius40/gdf_crash_2017_120000_180000.matched.geojson')
gdf_crash_2017_matched_result3 = gpd.read_file('../data/sharedstreets_results/crash/2017/radius40/gdf_crash_2017_180000_.matched.geojson')

In [12]:
# import the unmatched result files generated by the SharedStreets API
gdf_crash_2017_unmatched_result0 = gpd.read_file('../data/sharedstreets_results/crash/2017/radius40/gdf_crash_2017_0_60000.unmatched.geojson')
gdf_crash_2017_unmatched_result1 = gpd.read_file('../data/sharedstreets_results/crash/2017/radius40/gdf_crash_2017_60000_120000.unmatched.geojson')
gdf_crash_2017_unmatched_result2 = gpd.read_file('../data/sharedstreets_results/crash/2017/radius40/gdf_crash_2017_120000_180000.unmatched.geojson')
gdf_crash_2017_unmatched_result3 = gpd.read_file('../data/sharedstreets_results/crash/2017/radius40/gdf_crash_2017_180000_.unmatched.geojson')

In [13]:
# reset index of matched crashes to concatenate dataset
gdf_crash_2017_matched_result0 = gdf_crash_2017_matched_result0.reset_index().drop('index', axis=1)
gdf_crash_2017_matched_result1 = gdf_crash_2017_matched_result1.reset_index().drop('index', axis=1)
gdf_crash_2017_matched_result2 = gdf_crash_2017_matched_result2.reset_index().drop('index', axis=1)
gdf_crash_2017_matched_result3 = gdf_crash_2017_matched_result3.reset_index().drop('index', axis=1)

In [14]:
# merge matched crash datasets
gdf_crash_2017_matched_result = pd.concat([gdf_crash_2017_matched_result0,
                                           gdf_crash_2017_matched_result1,
                                           gdf_crash_2017_matched_result2,
                                           gdf_crash_2017_matched_result3])

In [15]:
# reset index of unmatched crashes to concatenate dataset
gdf_crash_2017_unmatched_result0 = gdf_crash_2017_unmatched_result0.reset_index().drop('index', axis=1)
gdf_crash_2017_unmatched_result1 = gdf_crash_2017_unmatched_result1.reset_index().drop('index', axis=1)
gdf_crash_2017_unmatched_result2 = gdf_crash_2017_unmatched_result2.reset_index().drop('index', axis=1)
gdf_crash_2017_unmatched_result3 = gdf_crash_2017_unmatched_result3.reset_index().drop('index', axis=1)

In [16]:
# merge unmatched crash datasets
gdf_crash_2017_unmatched_result = pd.concat([gdf_crash_2017_unmatched_result0,
                                             gdf_crash_2017_unmatched_result1,
                                             gdf_crash_2017_unmatched_result2,
                                             gdf_crash_2017_unmatched_result3])
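The reset-then-concatenate pattern above can be collapsed into a single call. A minimal sketch, assuming each batch result is a pandas (or GeoPandas) frame; `concat_batches` is a hypothetical helper, and `ignore_index=True` renumbers the rows just like the `reset_index().drop('index', axis=1)` calls.

```python
import pandas as pd

def concat_batches(frames):
    """Stack per-batch result frames and renumber rows 0..n-1."""
    return pd.concat(frames, ignore_index=True)

# Two hypothetical batches whose indexes both start at 0.
batch_a = pd.DataFrame({"collision_id": [1, 2]}, index=[0, 1])
batch_b = pd.DataFrame({"collision_id": [3]}, index=[0])
combined = concat_batches([batch_a, batch_b])
print(combined.index.tolist())
```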

In [17]:
# extract the list of unmatched crashes
list_crash_2017_unmatched = gdf_crash_2017_unmatched_result['collision_id'].tolist()

In [18]:
# drop unmatched crashes from entire 2017 crash dataset
gdf_crash_2017_matched = gdf_crash_2017.loc[~gdf_crash_2017['collision_id'].isin(list_crash_2017_unmatched)]

In [19]:
# reset index of matched crashes to concatenate dataset
gdf_crash_2017_matched = gdf_crash_2017_matched.reset_index().drop('index', axis=1)
gdf_crash_2017_matched_result = gdf_crash_2017_matched_result.reset_index().drop('index', axis=1)

In [20]:
# get geometries from the matched crash geojson file
gdf_crash_2017_matched['geometry'] = gdf_crash_2017_matched_result['geometry']

In [21]:
# copy the SharedStreets 'geometryId' into a 'geometry_id' column
gdf_crash_2017_matched['geometry_id'] = gdf_crash_2017_matched_result['geometryId']

### 2018

The process is the same as for 2017.

In [22]:
# make sure the result files imported below were generated by the SharedStreets API (Chapter 1)
# import the GeoJSON files generated by the SharedStreets API
gdf_crash_2018 = gpd.read_file('../data/sharedstreets_results/crash/2018/before_applying/gdf_crash_2018.geojson')
gdf_crash_2018_matched_result0 = gpd.read_file('../data/sharedstreets_results/crash/2018/radius40/gdf_crash_2018_0_60000.matched.geojson')
gdf_crash_2018_matched_result1 = gpd.read_file('../data/sharedstreets_results/crash/2018/radius40/gdf_crash_2018_60000_120000.matched.geojson')
gdf_crash_2018_matched_result2 = gpd.read_file('../data/sharedstreets_results/crash/2018/radius40/gdf_crash_2018_120000_180000.matched.geojson')
gdf_crash_2018_matched_result3 = gpd.read_file('../data/sharedstreets_results/crash/2018/radius40/gdf_crash_2018_180000_.matched.geojson')

In [23]:
# import the unmatched result files generated by the SharedStreets API
gdf_crash_2018_unmatched_result0 = gpd.read_file('../data/sharedstreets_results/crash/2018/radius40/gdf_crash_2018_0_60000.unmatched.geojson')
gdf_crash_2018_unmatched_result1 = gpd.read_file('../data/sharedstreets_results/crash/2018/radius40/gdf_crash_2018_60000_120000.unmatched.geojson')
gdf_crash_2018_unmatched_result2 = gpd.read_file('../data/sharedstreets_results/crash/2018/radius40/gdf_crash_2018_120000_180000.unmatched.geojson')
gdf_crash_2018_unmatched_result3 = gpd.read_file('../data/sharedstreets_results/crash/2018/radius40/gdf_crash_2018_180000_.unmatched.geojson')

In [24]:
# reset index of matched crashes to concatenate dataset
gdf_crash_2018_matched_result0 = gdf_crash_2018_matched_result0.reset_index().drop('index', axis=1)
gdf_crash_2018_matched_result1 = gdf_crash_2018_matched_result1.reset_index().drop('index', axis=1)
gdf_crash_2018_matched_result2 = gdf_crash_2018_matched_result2.reset_index().drop('index', axis=1)
gdf_crash_2018_matched_result3 = gdf_crash_2018_matched_result3.reset_index().drop('index', axis=1)

In [25]:
# merge matched crash datasets
gdf_crash_2018_matched_result = pd.concat([gdf_crash_2018_matched_result0,
                                           gdf_crash_2018_matched_result1,
                                           gdf_crash_2018_matched_result2,
                                           gdf_crash_2018_matched_result3])

In [26]:
# reset index of unmatched crashes to concatenate dataset
gdf_crash_2018_unmatched_result0 = gdf_crash_2018_unmatched_result0.reset_index().drop('index', axis=1)
gdf_crash_2018_unmatched_result1 = gdf_crash_2018_unmatched_result1.reset_index().drop('index', axis=1)
gdf_crash_2018_unmatched_result2 = gdf_crash_2018_unmatched_result2.reset_index().drop('index', axis=1)
gdf_crash_2018_unmatched_result3 = gdf_crash_2018_unmatched_result3.reset_index().drop('index', axis=1)

In [27]:
# merge unmatched crash datasets
gdf_crash_2018_unmatched_result = pd.concat([gdf_crash_2018_unmatched_result0,
                                             gdf_crash_2018_unmatched_result1,
                                             gdf_crash_2018_unmatched_result2,
                                             gdf_crash_2018_unmatched_result3])

In [28]:
# extract the list of unmatched crashes
list_crash_2018_unmatched = gdf_crash_2018_unmatched_result['collision_id'].tolist()

In [29]:
# drop unmatched crashes from entire 2018 crash dataset
gdf_crash_2018_matched = gdf_crash_2018.loc[~gdf_crash_2018['collision_id'].isin(list_crash_2018_unmatched)]

In [30]:
# reset index of matched crashes to concatenate dataset
gdf_crash_2018_matched = gdf_crash_2018_matched.reset_index().drop('index', axis=1)
gdf_crash_2018_matched_result = gdf_crash_2018_matched_result.reset_index().drop('index', axis=1)

In [31]:
# get geometries from the matched crash geojson file
gdf_crash_2018_matched['geometry'] = gdf_crash_2018_matched_result['geometry']

In [32]:
# copy the SharedStreets 'geometryId' into a 'geometry_id' column
gdf_crash_2018_matched['geometry_id'] = gdf_crash_2018_matched_result['geometryId']

### 2019

The process is the same as for 2018, except that 2019 has three batches.

In [33]:
# make sure the result files imported below were generated by the SharedStreets API (Chapter 1)
# import the GeoJSON files generated by the SharedStreets API
gdf_crash_2019 = gpd.read_file('../data/sharedstreets_results/crash/2019/before_applying/gdf_crash_2019.geojson')
gdf_crash_2019_matched_result0 = gpd.read_file('../data/sharedstreets_results/crash/2019/radius40/gdf_crash_2019_0_60000.matched.geojson')
gdf_crash_2019_matched_result1 = gpd.read_file('../data/sharedstreets_results/crash/2019/radius40/gdf_crash_2019_60000_120000.matched.geojson')
gdf_crash_2019_matched_result2 = gpd.read_file('../data/sharedstreets_results/crash/2019/radius40/gdf_crash_2019_120000_.matched.geojson')

In [34]:
# import the unmatched result files generated by the SharedStreets API
gdf_crash_2019_unmatched_result0 = gpd.read_file('../data/sharedstreets_results/crash/2019/radius40/gdf_crash_2019_0_60000.unmatched.geojson')
gdf_crash_2019_unmatched_result1 = gpd.read_file('../data/sharedstreets_results/crash/2019/radius40/gdf_crash_2019_60000_120000.unmatched.geojson')
gdf_crash_2019_unmatched_result2 = gpd.read_file('../data/sharedstreets_results/crash/2019/radius40/gdf_crash_2019_120000_.unmatched.geojson')

In [35]:
# reset index of matched crashes to concatenate dataset
gdf_crash_2019_matched_result0 = gdf_crash_2019_matched_result0.reset_index().drop('index', axis=1)
gdf_crash_2019_matched_result1 = gdf_crash_2019_matched_result1.reset_index().drop('index', axis=1)
gdf_crash_2019_matched_result2 = gdf_crash_2019_matched_result2.reset_index().drop('index', axis=1)

In [36]:
# merge matched crash datasets
gdf_crash_2019_matched_result = pd.concat([gdf_crash_2019_matched_result0,
                                           gdf_crash_2019_matched_result1,
                                           gdf_crash_2019_matched_result2])

In [37]:
# reset index of unmatched crashes to concatenate dataset
gdf_crash_2019_unmatched_result0 = gdf_crash_2019_unmatched_result0.reset_index().drop('index', axis=1)
gdf_crash_2019_unmatched_result1 = gdf_crash_2019_unmatched_result1.reset_index().drop('index', axis=1)
gdf_crash_2019_unmatched_result2 = gdf_crash_2019_unmatched_result2.reset_index().drop('index', axis=1)

In [38]:
# merge unmatched crash datasets
gdf_crash_2019_unmatched_result = pd.concat([gdf_crash_2019_unmatched_result0,
                                             gdf_crash_2019_unmatched_result1,
                                             gdf_crash_2019_unmatched_result2])

In [39]:
# extract the list of unmatched crashes
list_crash_2019_unmatched = gdf_crash_2019_unmatched_result['collision_id'].tolist()

In [40]:
# drop unmatched crashes from entire 2019 crash dataset
gdf_crash_2019_matched = gdf_crash_2019.loc[~gdf_crash_2019['collision_id'].isin(list_crash_2019_unmatched)]

In [41]:
# reset index of matched crashes to concatenate dataset
gdf_crash_2019_matched = gdf_crash_2019_matched.reset_index().drop('index', axis=1)
gdf_crash_2019_matched_result = gdf_crash_2019_matched_result.reset_index().drop('index', axis=1)

In [42]:
# get geometries from the matched crash geojson file
gdf_crash_2019_matched['geometry'] = gdf_crash_2019_matched_result['geometry']

In [43]:
# copy the SharedStreets 'geometryId' into a 'geometry_id' column
gdf_crash_2019_matched['geometry_id'] = gdf_crash_2019_matched_result['geometryId']

## Merge crashes

In [44]:
# concatenate the matched crashes of 2016, 2017, 2018, and 2019
gdf_matched_crashes = pd.concat([gdf_crash_2016_matched[['collision_id','geometry','geometry_id']],
                                 gdf_crash_2017_matched[['collision_id','geometry','geometry_id']],
                                 gdf_crash_2018_matched[['collision_id','geometry','geometry_id']],
                                 gdf_crash_2019_matched[['collision_id','geometry','geometry_id']]])           

In [45]:
# re-import previous dataset
df_crash = pd.read_csv('../data/cleaned_data/511_mv_collisions.csv')

In [46]:
# drop the coordinate, geometry, and borough-boundary columns,
# so this dataframe keeps only the crash attributes and each crash's collision_id
df_crash_wo_geometry = df_crash.drop(['longitude',
                                      'latitude',
                                      'geometry',
                                      'location',
                                      'index_right',
                                      'boro_code',
                                      'boro_name',
                                      'shape_area',
                                      'shape_leng'], axis=1)

In [47]:
# merge the matched crashes with the crash attributes created in the cell above
gdf_matched_crashes = gdf_matched_crashes.merge(df_crash_wo_geometry, on='collision_id')
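If a `collision_id` were duplicated (for example, a crash exported in two batches), the merge would silently multiply rows. A minimal sketch of guarding against that with pandas' `validate` argument; the toy frames are hypothetical.

```python
import pandas as pd

matched = pd.DataFrame({"collision_id": [10, 12], "geometry_id": ["g-a", "g-b"]})
attributes = pd.DataFrame({
    "collision_id": [10, 11, 12],
    "borough": ["manhattan", "brooklyn", "queens"],
})

# validate="one_to_one" raises MergeError if collision_id repeats on either side.
merged = matched.merge(attributes, on="collision_id", how="inner", validate="one_to_one")
print(merged)

# A duplicated key on the right-hand side is rejected instead of fanning out rows.
duplicate_error = None
dup = pd.concat([attributes, attributes.iloc[[0]]], ignore_index=True)
try:
    matched.merge(dup, on="collision_id", validate="one_to_one")
except pd.errors.MergeError as exc:
    duplicate_error = str(exc)
print(duplicate_error)
```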

In [48]:
# convert the merged result back to a GeoDataFrame
gdf_matched_crashes = gpd.GeoDataFrame(gdf_matched_crashes, geometry='geometry')

In [49]:
gdf_matched_crashes.head()

Unnamed: 0,collision_id,geometry,geometry_id,crash_date,crash_time,borough,zip_code,on_street_name,cross_street_name,off_street_name,...,contributing_factor_vehicle_1,contributing_factor_vehicle_2,contributing_factor_vehicle_3,contributing_factor_vehicle_4,contributing_factor_vehicle_5,vehicle_type_code_1,vehicle_type_code_2,vehicle_type_code_3,vehicle_type_code_4,vehicle_type_code_5
0,3531327,POINT (-74.00772 40.71152),ba4520777941a56b87f97a1d35dc2e20,2016-10-01,20:20,manhattan,10038.0,,,20 park row,...,following too closely,unspecified,,,,PASSENGER VEHICLE,PASSENGER VEHICLE,,,
1,3530538,POINT (-74.00712 40.65499),da0bde3c3c147e230387851d1679e6bc,2016-10-01,1:40,,,gowanus expy (bqe),,,...,steering failure,,,,,PASSENGER VEHICLE,,,,
2,3534839,POINT (-73.94216 40.83779),aff3547e8a39ef2c07ad433655ca4d61,2016-10-01,22:30,manhattan,10032.0,west 163 street,broadway,,...,unspecified,,,,,PASSENGER VEHICLE,PASSENGER VEHICLE,,,
3,3530778,POINT (-73.82967 40.76188),bffd718d6de85e2d72cef0de38afde42,2016-10-01,19:00,queens,11354.0,37 avenue,138 street,,...,failure to yield right-of-way,failure to yield right-of-way,,,,SPORT UTILITY / STATION WAGON,SPORT UTILITY / STATION WAGON,,,
4,3534283,POINT (-73.92940 40.65194),f95810bff51c60f287ea3a5eaf6923cc,2016-10-01,11:10,brooklyn,11203.0,church avenue,east 51 street,,...,driver inattention/distraction,driver inattention/distraction,,,,PASSENGER VEHICLE,PASSENGER VEHICLE,,,


In [50]:
# export the dataset
gdf_matched_crashes.to_file('../data/cleaned_data/mv_collisions_shst_matched.geojson', driver='GeoJSON')