# IBM Applied Data Science Capstone Project - Car Accident Severity
### By Yazan Hassan


## Introduction

The global epidemic of road crash fatalities and disabilities is gradually being recognized as a major public health concern. The first step toward understanding global road safety and developing effective road safety interventions is access to facts. Approximately 1.35 million people die in road crashes each year, about 3,700 per day. An additional 20-50 million suffer non-fatal injuries every year, often resulting in long-term disabilities (https://www.asirt.org/safe-travel/road-safety-facts/).

Accidents at junctions can be slight, serious, or fatal. A useful mitigation strategy is therefore to estimate the likely severity before an accident occurs. To help reduce the frequency of car collisions in a community, an algorithm can be developed to predict the severity of an accident from the weather, road, and visibility conditions. When conditions are unfavourable, the model would alert drivers to be more careful or to take a safer route. The aim of this project is to build a model that predicts the severity of accidents at junctions for different address types and collision types, using attributes such as weather and light conditions. Seattle is used as the test case.

### Target Audience

The target audience of the project is the local Seattle government, police, rescue groups, and car insurance institutes. The model and its results can help these groups make informed decisions to reduce the number of accidents and injuries in the city. For instance, the project can help local authorities identify the address types where accidents tend to be serious because of poor road lighting, and then apply new road safety measures. The Seattle government can also help prevent avoidable accidents by alerting drivers, the health system, and the police so that they take extra care in critical situations.


## Data
The dataset contains information on road traffic accidents in the city of Seattle. The initial dataset consists of 38 columns (features/attributes) and 194,673 rows: 37 independent variables plus the dependent variable. The data was collected by the Seattle Police Department and Accident Traffic Records Department from 2004 to present. The dependent variable, “SEVERITYCODE”, contains codes that correspond to different levels of accident severity, ranging from 0 (unknown) to 3 (fatality).

Python packages are used to conduct this study. The dataset is cleaned according to the requirements of the project. Because some records contain null values, the data must be preprocessed before any analysis.
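Before deciding how to handle null values, it helps to see how many each column contains. The following is a minimal sketch, not part of the original notebook: `missing_summary` is a hypothetical helper name, and the `sample` rows are made up for illustration rather than taken from the collision data.

```python
import numpy as np
import pandas as pd


def missing_summary(frame: pd.DataFrame) -> pd.DataFrame:
    """Return the count and percentage of null values per column, worst first."""
    counts = frame.isna().sum()
    summary = pd.DataFrame({
        "missing": counts,
        "percent": (counts / len(frame) * 100).round(2),
    })
    return summary.sort_values("missing", ascending=False)


# Illustrative rows only (hypothetical, not taken from the collision dataset).
sample = pd.DataFrame({
    "WEATHER": ["Clear", np.nan, "Raining", "Overcast"],
    "LIGHTCOND": ["Daylight", "Daylight", np.nan, np.nan],
    "ADDRTYPE": ["Block", "Block", "Intersection", "Block"],
})

summary = missing_summary(sample)
print(summary)
```

Applied to the real dataframe, the same call would show which of the selected attributes need imputation or row removal.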

The features used will be:

* SEVERITYCODE
* ADDRTYPE
* JUNCTIONTYPE
* SDOT_COLDESC (Description of the collision)
* WEATHER
* LIGHTCOND

The data used for this study is provided by the Applied Data Science Capstone course on Coursera at the following link: https://s3.us.cloud-object-storage.appdomain.cloud/cf-courses-data/CognitiveClass/DP0701EN/version-2/Data-Collisions.csv
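As a sketch of one way to load this file, the helper below wraps `pd.read_csv` and accepts a URL, a local path, or a file-like object. The name `load_collisions` is hypothetical, and the choice of `low_memory=False` is an assumption: it asks pandas to infer each column's type from the whole file rather than chunk by chunk, which can avoid mixed-type warnings on large CSV files. The two-row demo CSV is made up so the example runs offline.

```python
import io

import pandas as pd

# URL quoted in the text above.
DATA_URL = (
    "https://s3.us.cloud-object-storage.appdomain.cloud/cf-courses-data/"
    "CognitiveClass/DP0701EN/version-2/Data-Collisions.csv"
)


def load_collisions(source=DATA_URL) -> pd.DataFrame:
    """Read the collision CSV from a URL, local path, or file-like object."""
    # low_memory=False: infer column dtypes from the full file in one pass.
    return pd.read_csv(source, low_memory=False)


# Offline demonstration with a made-up, two-row CSV (not real records).
demo_csv = io.StringIO(
    "SEVERITYCODE,WEATHER,LIGHTCOND\n"
    "1,Clear,Daylight\n"
    "2,Raining,Dusk\n"
)
demo = load_collisions(demo_csv)
print(demo.shape)
```

Calling `load_collisions()` with no argument would download the file from the URL; passing `"Data-Collisions.csv"` reads a local copy instead.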


In [1]:
# Import libraries
import pandas as pd
import numpy as np
import matplotlib as mpl
import matplotlib.pyplot as plt
%matplotlib inline

### Import and explore data

In [2]:
df = pd.read_csv("Data-Collisions.csv")
df


| | SEVERITYCODE | X | Y | OBJECTID | INCKEY | COLDETKEY | REPORTNO | STATUS | ADDRTYPE | INTKEY | ... | ROADCOND | LIGHTCOND | PEDROWNOTGRNT | SDOTCOLNUM | SPEEDING | ST_COLCODE | ST_COLDESC | SEGLANEKEY | CROSSWALKKEY | HITPARKEDCAR |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2 | -122.323148 | 47.703140 | 1 | 1307 | 1307 | 3502005 | Matched | Intersection | 37475.0 | ... | Wet | Daylight | | | | 10 | Entering at angle | 0 | 0 | N |
| 1 | 1 | -122.347294 | 47.647172 | 2 | 52200 | 52200 | 2607959 | Matched | Block | | ... | Wet | Dark - Street Lights On | | 6354039.0 | | 11 | From same direction - both going straight - bo... | 0 | 0 | N |
| 2 | 1 | -122.334540 | 47.607871 | 3 | 26700 | 26700 | 1482393 | Matched | Block | | ... | Dry | Daylight | | 4323031.0 | | 32 | One parked--one moving | 0 | 0 | N |
| 3 | 1 | -122.334803 | 47.604803 | 4 | 1144 | 1144 | 3503937 | Matched | Block | | ... | Dry | Daylight | | | | 23 | From same direction - all others | 0 | 0 | N |
| 4 | 2 | -122.306426 | 47.545739 | 5 | 17700 | 17700 | 1807429 | Matched | Intersection | 34387.0 | ... | Wet | Daylight | | 4028032.0 | | 10 | Entering at angle | 0 | 0 | N |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 194668 | 2 | -122.290826 | 47.565408 | 219543 | 309534 | 310814 | E871089 | Matched | Block | | ... | Dry | Daylight | | | | 24 | From opposite direction - both moving - head-on | 0 | 0 | N |
| 194669 | 1 | -122.344526 | 47.690924 | 219544 | 309085 | 310365 | E876731 | Matched | Block | | ... | Wet | Daylight | | | | 13 | From same direction - both going straight - bo... | 0 | 0 | N |
| 194670 | 2 | -122.306689 | 47.683047 | 219545 | 311280 | 312640 | 3809984 | Matched | Intersection | 24760.0 | ... | Dry | Daylight | | | | 28 | From opposite direction - one left turn - one ... | 0 | 0 | N |
| 194671 | 2 | -122.355317 | 47.678734 | 219546 | 309514 | 310794 | 3810083 | Matched | Intersection | 24349.0 | ... | Dry | Dusk | | | | 5 | Vehicle Strikes Pedalcyclist | 4308 | 0 | N |


In [3]:
# Shape of dataframe
df.shape

(194673, 38)

In [5]:
# Explore the columns of the dataframe
df.columns

Index(['SEVERITYCODE', 'X', 'Y', 'OBJECTID', 'INCKEY', 'COLDETKEY', 'REPORTNO',
       'STATUS', 'ADDRTYPE', 'INTKEY', 'LOCATION', 'EXCEPTRSNCODE',
       'EXCEPTRSNDESC', 'SEVERITYCODE.1', 'SEVERITYDESC', 'COLLISIONTYPE',
       'PERSONCOUNT', 'PEDCOUNT', 'PEDCYLCOUNT', 'VEHCOUNT', 'INCDATE',
       'INCDTTM', 'JUNCTIONTYPE', 'SDOT_COLCODE', 'SDOT_COLDESC',
       'INATTENTIONIND', 'UNDERINFL', 'WEATHER', 'ROADCOND', 'LIGHTCOND',
       'PEDROWNOTGRNT', 'SDOTCOLNUM', 'SPEEDING', 'ST_COLCODE', 'ST_COLDESC',
       'SEGLANEKEY', 'CROSSWALKKEY', 'HITPARKEDCAR'],
      dtype='object')

### Data Wrangling

Many of the attributes listed above are irrelevant to this study and will be dropped. We are only interested in the following attributes:

* SEVERITYCODE (retained through its text description, SEVERITYDESC), ADDRTYPE, JUNCTIONTYPE, SDOT_COLDESC, WEATHER, LIGHTCOND

More information on the attributes can be found here: https://s3.us.cloud-object-storage.appdomain.cloud/cf-courses-data/CognitiveClass/DP0701EN/version-2/Metadata.pdf

As a reminder, in SEVERITYCODE:
* 3 — fatality
* 2b — serious injury
* 2 — injury
* 1 — property damage only
* 0 — unknown
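The code list above can be written as a lookup table, which is handy when turning codes into readable labels. This is a minimal sketch: `SEVERITY_LABELS` and `label_severity` are hypothetical names, and the example codes are illustrative rather than drawn from the dataset.

```python
import pandas as pd

# Code-to-description mapping copied from the list above.
SEVERITY_LABELS = {
    "3": "fatality",
    "2b": "serious injury",
    "2": "injury",
    "1": "property damage only",
    "0": "unknown",
}


def label_severity(codes: pd.Series) -> pd.Series:
    """Map severity codes (integers or strings) to their descriptions."""
    # Cast to str first so that 2 and "2" hit the same dictionary key.
    return codes.astype(str).map(SEVERITY_LABELS)


# Illustrative codes only (hypothetical values).
example_codes = pd.Series([1, 2, "2b", 3, 0])
labels = label_severity(example_codes)
print(labels.tolist())
```

On the full dataframe, `label_severity(df["SEVERITYCODE"]).value_counts()` would give a quick view of how the severity classes are distributed. Codes stored as floats (for example `1.0`) would not match the keys and would map to `NaN`.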

In [6]:
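# Drop irrelevant attributes, keeping SEVERITYDESC (the text description of SEVERITYCODE) and the five predictors
keep_cols = ['SEVERITYDESC', 'ADDRTYPE', 'JUNCTIONTYPE', 'SDOT_COLDESC', 'WEATHER', 'LIGHTCOND']
df.drop(columns=df.columns.difference(keep_cols), inplace=True)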
df.head()

| | ADDRTYPE | SEVERITYDESC | JUNCTIONTYPE | SDOT_COLDESC | WEATHER | LIGHTCOND |
|---|---|---|---|---|---|---|
| 0 | Intersection | Injury Collision | At Intersection (intersection related) | MOTOR VEHICLE STRUCK MOTOR VEHICLE, FRONT END ... | Overcast | Daylight |
| 1 | Block | Property Damage Only Collision | Mid-Block (not related to intersection) | MOTOR VEHICLE STRUCK MOTOR VEHICLE, LEFT SIDE ... | Raining | Dark - Street Lights On |
| 2 | Block | Property Damage Only Collision | Mid-Block (not related to intersection) | MOTOR VEHICLE STRUCK MOTOR VEHICLE, REAR END | Overcast | Daylight |
| 3 | Block | Property Damage Only Collision | Mid-Block (not related to intersection) | MOTOR VEHICLE STRUCK MOTOR VEHICLE, FRONT END ... | Clear | Daylight |
| 4 | Intersection | Injury Collision | At Intersection (intersection related) | MOTOR VEHICLE STRUCK MOTOR VEHICLE, FRONT END ... | Raining | Daylight |
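Dropping columns in place modifies the original dataframe, so the cell cannot be re-run without reloading the data. A non-destructive alternative is sketched below, assuming a hypothetical helper `select_columns` that returns a copy and fails loudly if an expected column is absent. The rows in `raw` are made up for illustration.

```python
import pandas as pd

# The six attributes kept in the cell above, in the order of the output table.
KEEP_COLUMNS = [
    "ADDRTYPE", "SEVERITYDESC", "JUNCTIONTYPE",
    "SDOT_COLDESC", "WEATHER", "LIGHTCOND",
]


def select_columns(frame: pd.DataFrame, columns=KEEP_COLUMNS) -> pd.DataFrame:
    """Return a copy of `frame` restricted to `columns`, leaving `frame` intact."""
    missing = [name for name in columns if name not in frame.columns]
    if missing:
        raise KeyError(f"columns not found: {missing}")
    return frame.loc[:, list(columns)].copy()


# Illustrative rows only (hypothetical values, with two extra columns).
raw = pd.DataFrame({
    "OBJECTID": [1, 2],
    "ADDRTYPE": ["Intersection", "Block"],
    "SEVERITYDESC": ["Injury Collision", "Property Damage Only Collision"],
    "JUNCTIONTYPE": [
        "At Intersection (intersection related)",
        "Mid-Block (not related to intersection)",
    ],
    "SDOT_COLDESC": ["MOTOR VEHICLE STRUCK MOTOR VEHICLE, REAR END"] * 2,
    "WEATHER": ["Overcast", "Raining"],
    "LIGHTCOND": ["Daylight", "Dark - Street Lights On"],
    "SPEEDING": [None, None],
})

selected = select_columns(raw)
print(selected.columns.tolist())
```

Because `raw` is left unchanged, the selection can be repeated with a different column list without reloading the CSV.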
