# Crashes Involving Cyclists in Raleigh

## Why this topic?

I chose this topic because I enjoy riding my bicycle. Chapel Hill is fairly hilly, and I usually visit Raleigh on weekends, so I wanted to find out whether Raleigh would be a good place to ride. When a crash occurs between a vehicle and a bike, the cyclist is the one most likely to be injured, so it is only natural to want a clearer picture of bicycle safety in Raleigh before riding there. I am also curious about a possible correlation between sun glare and the likelihood of crashes involving cyclists.

## About this dataset

All analysis is based on the 'Crashes Involving Cyclists' dataset from Raleigh Open Data.

Dataset link: https://data-ral.opendata.arcgis.com/datasets/crashes-involving-cyclists/explore?location=35.797487%2C-78.624284%2C10.72

Follow the link above to open ‘Crashes Involving Cyclists’ on the Raleigh Open Data website. Click the ‘download’ button on the left, then choose to download the data as a ‘csv’ file.

Columns used (column name/spreadsheet column letter):
DateOfCrash/E, LocationRelationToRoad/G, TrafficControlType/V, Crash_Date_Day/AA, Crash_Date_DOW/AB, Crash_Date_Hour/AD, Crash_Date_Month_Num/AF, Crash_Date_Year/AG, killed/AN, type_a_injury/AO, type_b_injury/AP, type_c_injury/AQ, no_injury/AR, injury_unknown/AS, LocationLatitude/AT, LocationLongitude/AU

## About the Data Analysis and Visualization

The dataset contains only partial data for 2015 and 2021, so both years are excluded from the analysis in this report.
All data analysis and visualization were done in Python.

## The expected outcomes

1. Trend of crash occurrence in Raleigh over the years, and what can be inferred from the trend;

2. Crash vs. Traffic Control Type - Is the crash rate related to the traffic control type?

3. Crash Severity & Injury Levels - How bad are the crashes involving cyclists in Raleigh? What are the fatality rate and the injury levels?

4. Crash Heat-map - Where do most of the crashes take place? What can be inferred from the geographical locations?

5. Crash vs. Time of the day - When in the day do more crashes happen, and what can be inferred?

## Analysis and Figures

A data analysis in Python typically proceeds through these steps: reading the source data, data cleaning, data normalization, data analysis, and data visualization.
In this project, the following packages support these phases:
Pandas, Seaborn, Datetime, Folium, Matplotlib.pyplot, numpy, Astral and Astral.sun

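What the data looks like:

    import pandas as pd

    # Load the CSV downloaded from Raleigh Open Data.
    Crash = pd.read_csv('Crashes_Involving_Cyclists.csv')
    # Restore the default limit on the number of displayed columns.
    pd.reset_option("display.max_columns")
    # Drop 2015 and 2021, which have only partial coverage.
    Crash_cleaned = Crash[~Crash['Crash_Date_Year'].isin([2015, 2021])]
    Crash_cleaned.head()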

| Unnamed: 0 | X | Y | OBJECTID | key_crash | DateOfCrash | LocalUse | LocationRelationToRoad | LocationInNearIndicator | LocationCity | LocationCounty | ... | other_person_type | unknown_person_type | killed | type_a_injury | type_b_injury | type_c_injury | no_injury | injury_unknown | LocationLatitude | LocationLongitude |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 70 | -78.673 | 35.7912 | 24441 | 146665 | 2016/01/07 04:36:00+00 | P16001025 | On Roadway (Surface) / Off Roadway | In | RALEIGH | Wake | ... | 0.0 | 0 | 0.0 | 0.0 | 0 | 1 | 1 | 0 | 35.7912 | -78.673 |
| 71 | -78.6717 | 35.7886 | 24443 | 146667 | 2016/01/07 06:20:00+00 | P16001054 | On Roadway (Surface) / Off Roadway | In | RALEIGH | Wake | ... | 0.0 | 0 | 0.0 | 0.0 | 0 | 0 | 2 | 0 | 35.7886 | -78.6717 |
| 72 | -78.6706 | 35.7883 | 24672 | 146903 | 2016/01/11 00:01:00+00 | P16001748 | On Roadway (Surface) / Off Roadway | In | RALEIGH | Wake | ... | 0.0 | 0 | 0.0 | 0.0 | 0 | 0 | 3 | 0 | 35.7883 | -78.6706 |
| 73 | -78.6417 | 35.7805 | 24893 | 147125 | 2016/01/14 03:18:00+00 | P16002347 | On Roadway (Surface) / Off Roadway | In | RALEIGH | Wake | ... | 0.0 | 0 | 0.0 | 0.0 | 0 | 1 | 1 | 0 | 35.7805 | -78.6417 |
| 74 | -78.6872 | 35.7843 | 25892 | 148133 | 2016/01/30 21:15:00+00 | P16005326 | Outside Trafficway | In | RALEIGH | Wake | ... | 0.0 | 0 | 0.0 | 0.0 | 0 | 0 | 2 | 0 | 35.7843 | -78.6872 |


### 1. Crash occurrence trend per month, compared by year

The line plot shows the monthly crash counts for each year from 2016 to 2020, so both the seasonal pattern and the year-over-year trend are visible.
To make the plot easier to read, the line colors run from light to dark by year.

Outcomes:
- The overall number of crashes declined from year to year.
- Autumn seems to be the most dangerous season for cycling, since the number of crashes peaked in September and October in most years.
- The darkest line, representing 2020, shows a significant drop in crashes, which could be explained by the outbreak of the pandemic.

![Trend](figures/trend.png)
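
A plot like this one could be produced roughly as follows. This is a minimal sketch, not the exact code behind the figure: the helper names `monthly_counts` and `plot_trend`, the `"Blues"` palette, and the sample rows are assumptions made for illustration. Only the column names come from the dataset.

```python
import pandas as pd


def monthly_counts(crashes: pd.DataFrame) -> pd.DataFrame:
    """Count crashes per (year, month) for the complete years 2016-2020."""
    complete = crashes[crashes["Crash_Date_Year"].between(2016, 2020)]
    return (
        complete.groupby(["Crash_Date_Year", "Crash_Date_Month_Num"])
        .size()
        .reset_index(name="crashes")
    )


def plot_trend(monthly: pd.DataFrame):
    """Draw one line per year; a sequential palette runs from light to dark."""
    # Imported here so the counting step works without the plotting stack.
    import matplotlib.pyplot as plt
    import seaborn as sns

    ax = sns.lineplot(
        data=monthly,
        x="Crash_Date_Month_Num",
        y="crashes",
        hue="Crash_Date_Year",
        palette="Blues",  # assumed palette; any sequential palette would do
    )
    ax.set(xlabel="Month", ylabel="Number of crashes")
    plt.show()
    return ax


# Made-up rows, only to show the shape of the result (not real crash data).
sample = pd.DataFrame(
    {
        "Crash_Date_Year": [2015, 2016, 2016, 2016, 2020],
        "Crash_Date_Month_Num": [12, 9, 9, 10, 4],
    }
)
monthly = monthly_counts(sample)
print(monthly)
```

With the real data, `plot_trend(monthly_counts(Crash))` would draw the chart; the 2015 and 2021 rows are dropped inside `monthly_counts`.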

### 2. Crashes vs. Traffic Control Type

#### 2.1 Crash occurrence by traffic control type

The purpose of analyzing traffic control type is to compare the likelihood of crashes at intersections with each type of traffic control.
The result is as one might expect: intersections with no control present seem to be more dangerous than the others.
However, one limitation of the dataset makes it difficult to draw a firm conclusion: it has no information on the total number of intersections of each control type in Raleigh. Ideally, the comparison would use a rate, i.e., (count of crashes at each traffic control type)/(number of intersections of each traffic control type).

![traffic_control_type](figures/traffic_control_type.png)
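
The rate described above could be computed along these lines if intersection totals were available. This is only a sketch under stated assumptions: the function names are hypothetical, and both the sample crashes and the intersection totals are invented, because the dataset does not contain the totals.

```python
import pandas as pd


def crashes_per_control_type(crashes: pd.DataFrame) -> pd.Series:
    """Raw crash count for each traffic control type, largest first."""
    return crashes["TrafficControlType"].value_counts()


def crashes_per_intersection(counts: pd.Series, totals: pd.Series) -> pd.Series:
    """Crash count divided by the number of intersections of that type."""
    # Types missing from `totals` become NaN instead of a misleading rate.
    return (counts / totals.reindex(counts.index)).sort_values(ascending=False)


# Made-up crashes; the category labels are illustrative.
sample = pd.DataFrame(
    {
        "TrafficControlType": ["No control present"] * 4
        + ["Stop sign"] * 2
        + ["Stop and go signal"] * 2
    }
)
# Hypothetical intersection totals; the dataset does NOT provide these.
totals = pd.Series(
    {"No control present": 400, "Stop sign": 100, "Stop and go signal": 50}
)

counts = crashes_per_control_type(sample)
rates = crashes_per_intersection(counts, totals)
print(rates)
```

In this invented example, ‘No control present’ has the most crashes but the lowest rate per intersection, which is why the raw counts alone cannot settle the question.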

#### 2.2 Distribution of accumulated contribution per traffic control type

Analysis:
- For 2016 to 2019, the biggest contributors match those in the previous graph.
- In 2020, the share of every other contributor increased, while the share of ‘No control present’ dropped.

Since the total number of crashes dropped sharply in 2020, one hypothesis is that in 2020 the city set up signals or signs at many intersections that used to be ‘No control present’ in order to improve traffic safety there.

![contribution_of_control_types](figures/contribution_of_control_types.png)
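
One way to get per-year contributions is a normalized cross-tabulation. The sketch below assumes that "contribution" means each control type's share of that year's crashes. The function name and the sample rows are made up for illustration.

```python
import pandas as pd


def control_type_share_by_year(crashes: pd.DataFrame) -> pd.DataFrame:
    """Share of each year's crashes that falls under each control type."""
    # normalize="index" makes every row (year) sum to 1.
    return pd.crosstab(
        crashes["Crash_Date_Year"],
        crashes["TrafficControlType"],
        normalize="index",
    )


# Made-up rows, not real crash data.
sample = pd.DataFrame(
    {
        "Crash_Date_Year": [2019, 2019, 2019, 2019, 2020, 2020],
        "TrafficControlType": [
            "No control present",
            "No control present",
            "No control present",
            "Stop sign",
            "No control present",
            "Stop sign",
        ],
    }
)
shares = control_type_share_by_year(sample)
print(shares)
```

Shares hide the absolute volume, so a falling share in a year with far fewer crashes overall is best read next to the raw counts.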

### 3. Crash Severity & Injury Levels

Crash severity is measured mainly by injury level.
Injuries fall into six categories: Killed; Type A Injury (severely injured); Type B Injury (injured); Type C Injury (mild injury); No Injury; Injury Unknown.
With only 3 fatalities from 2016 to 2020, the fatality rate is extremely low (0.7%). The share of severe injuries is also small (4.2%). No one was injured in almost half of the crashes. Based on these data, it is reasonable to believe that, to some extent, Raleigh is a friendly and safe city for cyclists.

![injury_level](figures/injury_level.png)
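
Percentages like these can be derived from the six injury columns. The sketch below assumes the rates are shares of all people counted across those columns; the report does not state which denominator was used. The function name and the sample values are invented.

```python
import pandas as pd

INJURY_COLUMNS = [
    "killed",
    "type_a_injury",
    "type_b_injury",
    "type_c_injury",
    "no_injury",
    "injury_unknown",
]


def injury_shares(crashes: pd.DataFrame) -> pd.Series:
    """Share of all counted people that falls into each injury level."""
    # Missing counts are treated as zero (an assumption of this sketch).
    totals = crashes[INJURY_COLUMNS].fillna(0).sum()
    return totals / totals.sum()


# Made-up per-crash counts, not real data: 10 people across 4 crashes.
sample = pd.DataFrame(
    {
        "killed": [0, 0, 0, 1],
        "type_a_injury": [0, 1, 0, 0],
        "type_b_injury": [1, 0, 0, 0],
        "type_c_injury": [0, 0, 1, 0],
        "no_injury": [1, 2, 2, 1],
        "injury_unknown": [0, 0, 0, 0],
    }
)
shares = injury_shares(sample)
print(shares)
```

Applied to `Crash_cleaned`, `injury_shares` would give one share per injury level that could be compared with the figures quoted above.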

### 4. Crashes vs. Location to Road

Most crashes fall into the combined ‘On Roadway (Surface) / Off Roadway’ category, which is to be expected.

#### Limitation of the dataset

On Roadway and Off Roadway are combined into a single category; separating the two would certainly give more insight.

![location_to_road](figures/location_to_road.png)

### 5. Crash vs. Time

Each day is divided into 4 time periods: daytime, dawn, dusk, and nighttime. Let's explore which period has more crashes and whether there is any pattern to it.

In general:
- From April to September, more crashes happen during daytime than at nighttime.
- From mid-autumn to early spring, it is the opposite, which makes sense since nights are longer in winter.

![crash_vs_time](figures/crash_vs_time.png)
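
The four periods could be assigned with a small helper like the one below. This is a sketch under assumptions: the function name `classify_period` and the period boundaries (dawn until sunrise, sunrise until sunset, sunset until dusk, nighttime otherwise) are my own choices. The `sun_times` dictionary mirrors the keys that, as far as I know, `astral.sun.sun` returns. The times in the example are illustrative, not computed values for Raleigh.

```python
from datetime import datetime, timezone


def classify_period(moment: datetime, sun_times: dict) -> str:
    """Label a timestamp as dawn, daytime, dusk, or nighttime."""
    if sun_times["dawn"] <= moment < sun_times["sunrise"]:
        return "dawn"
    if sun_times["sunrise"] <= moment < sun_times["sunset"]:
        return "daytime"
    if sun_times["sunset"] <= moment < sun_times["dusk"]:
        return "dusk"
    return "nighttime"


def utc(hour: int, minute: int = 0) -> datetime:
    """Helper for this example: a time on 2016-01-07 expressed in UTC."""
    return datetime(2016, 1, 7, hour, minute, tzinfo=timezone.utc)


# Illustrative sun times for a single day, in UTC (assumed values).
sun_times = {
    "dawn": utc(11, 55),
    "sunrise": utc(12, 25),
    "sunset": utc(22, 15),
    "dusk": utc(22, 45),
}

moments = [utc(4, 36), utc(12, 0), utc(18, 0), utc(22, 30)]
labels = [classify_period(m, sun_times) for m in moments]
print(labels)
```

The `DateOfCrash` values in the dataset carry a `+00` offset, so the crash time and the sun times need to be in the same time zone (or both timezone-aware) before they are compared.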