<a href="https://colab.research.google.com/github/vanepsm/cs5262-cyclist-crashes-nyc/blob/main/New_York_City_Cyclist_Accidents.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Investigating cyclist accidents in New York City

## Background
### Introduction
**The objective of this project is to identify factors contributing to cyclist injuries and fatalities.**

With the increasing popularity of bicycles and e-bikes, it's imperative for drivers, cyclists, pedestrians, and policymakers to recognize the risks associated with cyclists sharing the road and take appropriate measures.

## Project Description
### Topic
This project focuses on motor vehicle crashes that involve cyclist injuries or deaths, and analyzes these data to determine which factors are correlated with those outcomes.

### Data
This project seeks insights into vehicle accidents involving cyclists, using records extracted from the New York City motor vehicle collisions dataset:
>> https://catalog.data.gov/dataset/motor-vehicle-collisions-crashes

A subset of the data pertaining only to cyclist accidents has been extracted and is available here:
>> https://drive.google.com/file/d/1CFaRXe3Y6PWHpYOoGD7qih6-1oWzUi0e/view

#### Data sub-selection criteria
The original dataset contains over two million rows and covers a broad spectrum of vehicle collisions. To narrow it down, the following steps were taken:
- Import the original CSV data to a relational database for refinement.
- Identify a subset of rows:
  - Include only the rows with cyclist injuries or fatalities
  - Remove all rows with more than 3 vehicles involved in the crash
  - Remove all rows with a null or empty zip code
- Identify a subset of columns:
  - Exclude all columns with details about a 4th or 5th vehicle
  - Exclude duplicate latitude/longitude columns
- Export the subset data to a new CSV for use in this project.
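
The selection steps above were carried out in a relational database. As a rough illustration only, the same filtering could be sketched in pandas. The column names `cyclists-injured`, `location`, and the `-4`/`-5` suffixed vehicle columns are assumptions modeled on the naming seen in the exported subset, and `select_cyclist_subset` is a hypothetical helper, not the procedure actually used.

```python
import pandas as pd


def select_cyclist_subset(crashes: pd.DataFrame) -> pd.DataFrame:
    """Apply the row and column filters described above (pandas sketch).

    Assumed column names: `cyclists-injured`, `cyclists-killed`, `zip-code`,
    `location`, and per-vehicle columns ending in `-4` or `-5`.
    """
    # Keep only crashes with at least one cyclist injured or killed.
    cyclist_hit = (crashes["cyclists-injured"] > 0) | (crashes["cyclists-killed"] > 0)

    # Treat any non-null value in a 4th/5th vehicle column as "more than 3 vehicles".
    extra_cols = [c for c in crashes.columns if c.endswith(("-4", "-5"))]
    many_vehicles = crashes[extra_cols].notna().any(axis=1)

    # Drop rows whose zip code is null or blank.
    zip_text = crashes["zip-code"].astype(str).str.strip()
    has_zip = crashes["zip-code"].notna() & (zip_text != "")

    rows = crashes[cyclist_hit & ~many_vehicles & has_zip]

    # Remove 4th/5th vehicle columns and the assumed duplicate coordinate column.
    drop_cols = extra_cols + [c for c in ["location"] if c in crashes.columns]
    return rows.drop(columns=drop_cols)
```

Calling `select_cyclist_subset(full_df).to_csv("subset.csv", index=False)` on the full collisions table (with `full_df` and the file name as placeholders) would correspond to the final export step.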

## Performance Metrics

### Binary Classification
- **Injured vs. Killed.** I would like to classify each observation into one of two classes: "Cyclist Injured" or "Cyclist Killed".
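
One possible way to derive the binary label is sketched below, assuming the `cyclists-killed` count column is the basis for the "Cyclist Killed" class. The column name `cyclist-killed-flag` and the helper `add_fatality_label` are hypothetical, and the three rows are toy values for illustration, not real records.

```python
import pandas as pd


def add_fatality_label(crashes: pd.DataFrame) -> pd.DataFrame:
    """Return a copy with a 0/1 label: 1 = "Cyclist Killed", 0 = "Cyclist Injured"."""
    out = crashes.copy()
    # A crash with any cyclist fatality is labeled 1, even if other cyclists were injured.
    out["cyclist-killed-flag"] = (out["cyclists-killed"] > 0).astype(int)
    return out


# Toy rows (not real data) to show the label and the class balance check.
toy = pd.DataFrame({"cyclists-injured": [1, 1, 0], "cyclists-killed": [0, 0, 1]})
labeled = add_fatality_label(toy)
print(labeled["cyclist-killed-flag"].value_counts())
```

Checking `value_counts()` on the real data before modeling would show how balanced the two classes are, which affects the choice of evaluation metric.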

### Clustering
I am hoping to extract meaningful patterns from these data, some of which may include:
- **Physical location.** Intuitively, some locations are more dangerous for cyclists than others, such as intersections, curves, and congested spaces.
- **Time.** It seems likely that certain times will see more accidents than others.
- **Types of vehicles involved.** It's possible that certain vehicle types will result in more accidents overall, or perhaps result in death more often than injury.
- **Contributing factors for the accident.** Which contributing factors are more likely to result in injury or death: distracted driving, improper use of passing lanes, or alcohol intoxication?
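
As a starting point for the time-based pattern, the `crash-date` and `crash-time` columns could be combined into hour and weekday features. This is a minimal sketch that assumes dates look like `2021-12-14` and times like `12:54`. The new column names `hour` and `weekday` and the helper `add_time_features` are hypothetical.

```python
import pandas as pd


def add_time_features(crashes: pd.DataFrame) -> pd.DataFrame:
    """Return a copy with `hour` (0-23) and `weekday` (day name) columns added."""
    out = crashes.copy()
    # Combine the date and time strings into a single timestamp.
    timestamp = pd.to_datetime(
        out["crash-date"] + " " + out["crash-time"], format="%Y-%m-%d %H:%M"
    )
    out["hour"] = timestamp.dt.hour
    out["weekday"] = timestamp.dt.day_name()
    return out
```

With these columns in place, `add_time_features(spreadsheet).groupby("hour").size()` would give crash counts per hour of day.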

In [1]:
# Tables and visualizations
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt

# Load data
Here we load the data into Python with pandas and read it in as a DataFrame.

In [6]:
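# Google Drive share link for the cyclist-crash subset
url = 'https://drive.google.com/file/d/1CFaRXe3Y6PWHpYOoGD7qih6-1oWzUi0e/view?usp=sharing'
# Build a direct-download URL from the file ID (second-to-last path segment)
url = 'https://drive.google.com/uc?id=' + url.split('/')[-2]
spreadsheet = pd.read_csv(url)
display(spreadsheet.head())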


| Unnamed: 0 | crash-date | crash-time | zip-code | latitude | longitude | on-street-name | cross-street-name | off-street-name | persons-injured | persons-killed | ... | cyclists-killed | motorists-injured | motorists-killed | contributing-factor-vehicle-1 | contributing-factor-vehicle-2 | contributing-factor-vehicle-3 | collision-id | vehicle-type-code-1 | vehicle-type-code-2 | vehicle-type-code-3 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2021-12-14 | 12:54 | 11217 | 40.687534 | -73.9775 | FULTON STREET | SAINT FELIX STREET | | 1 | 0 | ... | 0 | 0 | 0 | Unspecified | Unspecified | | 4487052 | Sedan | Bike | |
| 1 | 2022-04-24 | 15:35 | 10019 | 40.767242 | -73.986206 | WEST 56 STREET | 9 AVENUE | | 1 | 0 | ... | 0 | 0 | 0 | View Obstructed/Limited | Unspecified | | 4521853 | Station Wagon/Sport Utility Vehicle | Bike | |
| 2 | 2021-12-09 | 23:15 | 11218 | 40.640835 | -73.98967 | 12 AVENUE | 41 STREET | | 1 | 0 | ... | 0 | 0 | 0 | Driver Inattention/Distraction | Driver Inattention/Distraction | | 4485355 | Sedan | Bike | |
| 3 | 2021-12-08 | 19:30 | 10022 | 40.76175 | -73.96899 | | | 127 EAST 58 STREET | 1 | 0 | ... | 0 | 0 | 0 | Following Too Closely | Reaction to Uninvolved Vehicle | | 4484852 | Station Wagon/Sport Utility Vehicle | Bike | |
| 4 | 2021-12-08 | 12:00 | 10011 | 40.736614 | -73.9951 | | | 44 WEST 14 STREET | 1 | 0 | ... | 0 | 0 | 0 | Passing or Lane Usage Improper | Unspecified | | 4485542 | Box Truck | Bike | |
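
The `Unnamed: 0` column in the preview is most likely a row index that was written out when the subset CSV was exported. If so, passing `index_col=0` to `pd.read_csv` would absorb it into the DataFrame index. The sketch below shows this on a small in-memory CSV (toy values taken from the preview) and wraps the share-link conversion in a hypothetical helper, `drive_download_url`.

```python
import io

import pandas as pd


def drive_download_url(share_url: str) -> str:
    """Turn a Google Drive share link into the direct-download form used above."""
    # The file ID is the second-to-last path segment of the share link.
    file_id = share_url.split("/")[-2]
    return "https://drive.google.com/uc?id=" + file_id


# In-memory stand-in for the real file: the first header cell is empty,
# which is what makes pandas name the column "Unnamed: 0" by default.
sample_csv = io.StringIO(
    ",crash-date,zip-code\n"
    "0,2021-12-14,11217\n"
    "1,2022-04-24,10019\n"
)
crashes = pd.read_csv(sample_csv, index_col=0)
print(crashes.columns.tolist())
```

For the real file this would read `pd.read_csv(drive_download_url(url), index_col=0)`, assuming the first column is indeed a saved index.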
