<h2> Understanding Accidents - Natural vs. Human Factors </h2>
<h3> A Data Science Capstone Project </h3>
<h4> By Vihang Karekar </h4>

<h4> Business Understanding </h4>
    <p style="font-family: Calibri; font-style: normal;">The purpose of this capstone project is to understand and compare the impact of natural and human factors on car accidents, and to determine whether these factors contribute specifically to the severity of an accident. The analysis is intended to help policymakers make informed choices that prevent accidents in the future.</p>
    <p style="font-family: Calibri; font-style: normal;">Any accident has several contributing factors. Some are natural factors such as heavy rain, snow, or storms, which can significantly affect driving conditions and lead to accidents. Others are human errors: the driver may be under the influence of alcohol, may be speeding, and so on. Given historical records of accidents in which both natural and human factors are noted, we can develop a model to understand the impact of these factors on car accidents. Such a model can then be used to predict the risk and severity of accidents for a given set of factors.</p>
    <p style="font-family: Calibri; font-style: normal;">For this capstone project, we will use the collision dataset for the Seattle area shared by the City of Seattle, which covers 2004 to the present. It contains comprehensive historical data on collisions and the factors contributing to them. This makes it a good resource both for understanding the factors that contribute to collisions and for developing models that predict the risk of future collisions.</p>

<h4> Data Section </h4>
<p style="font-family: Calibri; font-style: normal;">Data Link: https://opendata.arcgis.com/datasets/5b5c745e0f1f48e7a53acec63a0022ab_0.csv </p>
<p style="font-family: Calibri; font-style: normal;">This dataset contains a sizeable historical record of accidents and the factors, both natural and human, associated with each accident. Several columns are of interest to us.</p>
    <p style="font-family: Calibri; font-style: normal;">The 'SEVERITYCODE' column is the most important: it labels each historical accident. We will use the existing dataset to train and test a model that predicts the severity of accidents, i.e., 'SEVERITYCODE' is our target variable 'Y'.</p>
    <p style="font-family: Calibri; font-style: normal;">We select 'WEATHER', 'ROADCOND', and 'LIGHTCOND' as the natural factors present during an accident. The model will try to analyze the impact of these natural factors on the severity of accidents.</p>
    <p style="font-family: Calibri; font-style: normal;">We select 'INATTENTIONIND', 'UNDERINFL', 'PEDROWNOTGRNT', and 'SPEEDING' as the human factors present during an accident. The model will also try to analyze the impact of these human factors on the severity of accidents.</p>
    <p style="font-family: Calibri; font-style: normal;">We may make use of some additional data points like 'ADDRTYPE', 'SEVERITYDESC', and 'COLLISIONTYPE' to get a better understanding of the data.</p>
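
<p style="font-family: Calibri; font-style: normal;">The column roles described above can be sketched in code. This is a minimal illustration, not the project's final pipeline: the helper name split_features_target and the three-row sample frame are hypothetical, and only the column names come from the dataset.</p>

```python
import pandas as pd

# Column groups named in the text above.
TARGET = "SEVERITYCODE"
NATURAL_FACTORS = ["WEATHER", "ROADCOND", "LIGHTCOND"]
HUMAN_FACTORS = ["INATTENTIONIND", "UNDERINFL", "PEDROWNOTGRNT", "SPEEDING"]


def split_features_target(df):
    """Return (X, y): factor columns as features, severity code as label."""
    feature_cols = NATURAL_FACTORS + HUMAN_FACTORS
    X = df[feature_cols].copy()
    y = df[TARGET].copy()
    return X, y


# Hypothetical three-row sample standing in for the real collision data.
sample = pd.DataFrame({
    "SEVERITYCODE": ["1", "2", "1"],
    "WEATHER": ["Clear", "Raining", "Clear"],
    "ROADCOND": ["Dry", "Wet", "Ice"],
    "LIGHTCOND": ["Daylight", "Dusk", "Daylight"],
    "INATTENTIONIND": [None, "Y", None],
    "UNDERINFL": ["N", "0", "N"],
    "PEDROWNOTGRNT": [None, None, None],
    "SPEEDING": [None, None, "Y"],
})

X, y = split_features_target(sample)

# Class balance of the label; worth checking before training any classifier.
severity_counts = y.value_counts()
print(X.shape)
print(severity_counts)
```

<p style="font-family: Calibri; font-style: normal;">Keeping the natural and human factor lists separate makes it straightforward to fit or compare models on each group later.</p>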

In [2]:
#Imports
import pandas as pd
import numpy as np

In [3]:
# Read the dataset with pandas
coll_pd = pd.read_csv("https://opendata.arcgis.com/datasets/5b5c745e0f1f48e7a53acec63a0022ab_0.csv")
coll_pd.shape

(221738, 40)

In [4]:
coll_pd.head()

Unnamed: 0,X,Y,OBJECTID,INCKEY,COLDETKEY,REPORTNO,STATUS,ADDRTYPE,INTKEY,LOCATION,...,ROADCOND,LIGHTCOND,PEDROWNOTGRNT,SDOTCOLNUM,SPEEDING,ST_COLCODE,ST_COLDESC,SEGLANEKEY,CROSSWALKKEY,HITPARKEDCAR
0,-122.356511,47.517361,1,327920,329420,3856094,Matched,Intersection,34911.0,17TH AVE SW AND SW ROXBURY ST,...,Dry,Daylight,,,,10,Entering at angle,0,0,N
1,-122.361405,47.702064,2,46200,46200,1791736,Matched,Block,,HOLMAN RD NW BETWEEN 4TH AVE NW AND 3RD AVE NW,...,Wet,Dusk,,5101020.0,,13,From same direction - both going straight - bo...,0,0,N
2,-122.317414,47.664028,3,1212,1212,3507861,Matched,Block,,ROOSEVELT WAY NE BETWEEN NE 47TH ST AND NE 50T...,...,Dry,Dark - Street Lights On,,,,30,From opposite direction - all others,0,0,N
3,-122.318234,47.619927,4,327909,329409,EA03026,Matched,Intersection,29054.0,11TH AVE E AND E JOHN ST,...,Wet,Dark - Street Lights On,,,,0,Vehicle going straight hits pedestrian,0,0,N
4,-122.351724,47.560306,5,104900,104900,2671936,Matched,Block,,WEST MARGINAL WAY SW BETWEEN SW ALASKA ST AND ...,...,Ice,Dark - Street Lights On,,9359012.0,Y,50,Fixed object,0,0,N
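
<p style="font-family: Calibri; font-style: normal;">Several cells in the preview above are blank, which suggests that some columns are sparsely populated. One way to quantify this is to count missing values per column. The sketch below uses a hypothetical helper, missing_summary, and a made-up four-row sample rather than the real data, so the numbers it prints say nothing about the actual dataset.</p>

```python
import numpy as np
import pandas as pd


def missing_summary(df):
    """Count and fraction of missing values per column, most missing first."""
    counts = df.isna().sum()
    summary = pd.DataFrame({
        "missing": counts,
        "fraction": counts / len(df),
    })
    return summary.sort_values("missing", ascending=False)


# Hypothetical sample; the values are illustrative, not taken from the dataset.
sample = pd.DataFrame({
    "WEATHER": ["Clear", "Raining", "Clear", "Raining"],
    "INATTENTIONIND": [np.nan, "Y", np.nan, "Y"],
    "SPEEDING": [np.nan, np.nan, np.nan, "Y"],
})

summary = missing_summary(sample)
print(summary)
```

<p style="font-family: Calibri; font-style: normal;">On the real data, the same call would be missing_summary(coll_pd).</p>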


In [5]:
coll_pd.dtypes

X                  float64
Y                  float64
OBJECTID             int64
INCKEY               int64
COLDETKEY            int64
REPORTNO            object
STATUS              object
ADDRTYPE            object
INTKEY             float64
LOCATION            object
EXCEPTRSNCODE       object
EXCEPTRSNDESC       object
SEVERITYCODE        object
SEVERITYDESC        object
COLLISIONTYPE       object
PERSONCOUNT          int64
PEDCOUNT             int64
PEDCYLCOUNT          int64
VEHCOUNT             int64
INJURIES             int64
SERIOUSINJURIES      int64
FATALITIES           int64
INCDATE             object
INCDTTM             object
JUNCTIONTYPE        object
SDOT_COLCODE       float64
SDOT_COLDESC        object
INATTENTIONIND      object
UNDERINFL           object
WEATHER             object
ROADCOND            object
LIGHTCOND           object
PEDROWNOTGRNT       object
SDOTCOLNUM         float64
SPEEDING            object
ST_COLCODE          object
ST_COLDESC          object
S

In [7]:
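# Select the attributes relevant to the model: the label, context columns, and the natural and human factors
selected_cols = ['SEVERITYCODE','ADDRTYPE','SEVERITYDESC','COLLISIONTYPE','INATTENTIONIND','UNDERINFL','WEATHER','ROADCOND','LIGHTCOND','PEDROWNOTGRNT','SPEEDING']
coll_select_pd = coll_pd[selected_cols].copy()  # copy so later edits do not affect coll_pd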
coll_select_pd.head()

Unnamed: 0,SEVERITYCODE,ADDRTYPE,SEVERITYDESC,COLLISIONTYPE,INATTENTIONIND,UNDERINFL,WEATHER,ROADCOND,LIGHTCOND,PEDROWNOTGRNT,SPEEDING
0,1,Intersection,Property Damage Only Collision,Angles,,N,Clear,Dry,Daylight,,
1,1,Block,Property Damage Only Collision,Rear Ended,Y,0,Raining,Wet,Dusk,,
2,2,Block,Injury Collision,Head On,,N,Clear,Dry,Dark - Street Lights On,,
3,2,Intersection,Injury Collision,Pedestrian,,N,Raining,Wet,Dark - Street Lights On,,
4,2,Block,Injury Collision,Other,,0,Clear,Ice,Dark - Street Lights On,,Y
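
<p style="font-family: Calibri; font-style: normal;">In the preview above, 'UNDERINFL' mixes 'N' and '0', and the other human-factor columns are blank except where 'Y' is recorded. Before modeling, these indicators could be mapped to a consistent 0/1 encoding. The sketch below is one possible approach under stated assumptions: that '1' also means yes, and that a blank cell means the factor was not recorded and can be treated as no. Neither assumption is confirmed by the dataset documentation here. The helper normalize_flag and the sample frame are hypothetical.</p>

```python
import pandas as pd

# Assumption: these tokens mean "yes"/"no"; a blank (NaN) is treated as "no".
YES_TOKENS = {"Y", "1"}
NO_TOKENS = {"N", "0"}
FLAG_COLUMNS = ["INATTENTIONIND", "UNDERINFL", "PEDROWNOTGRNT", "SPEEDING"]


def normalize_flag(series):
    """Map Y/1 to 1 and N/0/missing to 0; raise on any other token."""
    def to_int(value):
        if pd.isna(value):
            return 0
        token = str(value).strip().upper()
        if token in YES_TOKENS:
            return 1
        if token in NO_TOKENS:
            return 0
        raise ValueError(f"Unexpected flag value: {value!r}")

    return series.map(to_int).astype("int64")


# Hypothetical sample mirroring the mixed encodings seen in the preview.
sample = pd.DataFrame({
    "INATTENTIONIND": [None, "Y", None],
    "UNDERINFL": ["N", "0", "1"],
    "PEDROWNOTGRNT": [None, None, "Y"],
    "SPEEDING": [None, None, "Y"],
})

# Apply the mapping column by column.
flags = sample[FLAG_COLUMNS].apply(normalize_flag)
print(flags)
```

<p style="font-family: Calibri; font-style: normal;">Raising on unknown tokens, rather than silently coercing them, surfaces any encoding that was not anticipated.</p>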
