# Business Problem

In day-to-day practice, it can be difficult for law enforcement to determine how severe a traffic accident is. Without a model to estimate severity, officers may send medical staff to scenes where no assistance is needed, or call an ambulance too late after precious minutes are spent searching the area for injured people.

The results of this project are best suited to law enforcement officers and vigilant local bystanders: the people most likely to witness a vehicular accident and act on it.

# Data

The dataset in use comes from the recorded set of vehicular accidents in Seattle since 2004. It includes information such as the type and severity of the accident, what was involved, where it occurred, and other conditions.

## Dependency & Dataset Acquisition

In [3]:
import itertools

import matplotlib.pyplot as plt
import matplotlib.ticker as ticker
import numpy as np
import pandas as pd
from matplotlib.ticker import NullFormatter
from sklearn import preprocessing

%matplotlib inline

In [4]:
!wget -O Data-Collisions.csv https://s3.us.cloud-object-storage.appdomain.cloud/cf-courses-data/CognitiveClass/DP0701EN/version-2/Data-Collisions.csv

--2020-10-13 15:26:34--  https://s3.us.cloud-object-storage.appdomain.cloud/cf-courses-data/CognitiveClass/DP0701EN/version-2/Data-Collisions.csv
Resolving s3.us.cloud-object-storage.appdomain.cloud (s3.us.cloud-object-storage.appdomain.cloud)... 67.228.254.196
Connecting to s3.us.cloud-object-storage.appdomain.cloud (s3.us.cloud-object-storage.appdomain.cloud)|67.228.254.196|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 73917638 (70M) [text/csv]
Saving to: ‘Data-Collisions.csv’


2020-10-13 15:26:36 (41.8 MB/s) - ‘Data-Collisions.csv’ saved [73917638/73917638]



In [5]:
df = pd.read_csv('Data-Collisions.csv')
df.head()

| | SEVERITYCODE | X | Y | OBJECTID | INCKEY | COLDETKEY | REPORTNO | STATUS | ADDRTYPE | INTKEY | ... | ROADCOND | LIGHTCOND | PEDROWNOTGRNT | SDOTCOLNUM | SPEEDING | ST_COLCODE | ST_COLDESC | SEGLANEKEY | CROSSWALKKEY | HITPARKEDCAR |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2 | -122.323148 | 47.70314 | 1 | 1307 | 1307 | 3502005 | Matched | Intersection | 37475.0 | ... | Wet | Daylight | | | | 10 | Entering at angle | 0 | 0 | N |
| 1 | 1 | -122.347294 | 47.647172 | 2 | 52200 | 52200 | 2607959 | Matched | Block | | ... | Wet | Dark - Street Lights On | | 6354039.0 | | 11 | From same direction - both going straight - bo... | 0 | 0 | N |
| 2 | 1 | -122.33454 | 47.607871 | 3 | 26700 | 26700 | 1482393 | Matched | Block | | ... | Dry | Daylight | | 4323031.0 | | 32 | One parked--one moving | 0 | 0 | N |
| 3 | 1 | -122.334803 | 47.604803 | 4 | 1144 | 1144 | 3503937 | Matched | Block | | ... | Dry | Daylight | | | | 23 | From same direction - all others | 0 | 0 | N |
| 4 | 2 | -122.306426 | 47.545739 | 5 | 17700 | 17700 | 1807429 | Matched | Intersection | 34387.0 | ... | Wet | Daylight | | 4028032.0 | | 10 | Entering at angle | 0 | 0 | N |


In [6]:
df.describe()

| | SEVERITYCODE | X | Y | OBJECTID | INCKEY | COLDETKEY | INTKEY | SEVERITYCODE.1 | PERSONCOUNT | PEDCOUNT | PEDCYLCOUNT | VEHCOUNT | SDOT_COLCODE | SDOTCOLNUM | SEGLANEKEY | CROSSWALKKEY |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 194673.0 | 189339.0 | 189339.0 | 194673.0 | 194673.0 | 194673.0 | 65070.0 | 194673.0 | 194673.0 | 194673.0 | 194673.0 | 194673.0 | 194673.0 | 114936.0 | 194673.0 | 194673.0 |
| mean | 1.298901 | -122.330518 | 47.619543 | 108479.36493 | 141091.45635 | 141298.811381 | 37558.450576 | 1.298901 | 2.444427 | 0.037139 | 0.028391 | 1.92078 | 13.867768 | 7972521.0 | 269.401114 | 9782.452 |
| std | 0.457778 | 0.029976 | 0.056157 | 62649.722558 | 86634.402737 | 86986.54211 | 51745.990273 | 0.457778 | 1.345929 | 0.19815 | 0.167413 | 0.631047 | 6.868755 | 2553533.0 | 3315.776055 | 72269.26 |
| min | 1.0 | -122.419091 | 47.495573 | 1.0 | 1001.0 | 1001.0 | 23807.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1007024.0 | 0.0 | 0.0 |
| 25% | 1.0 | -122.348673 | 47.575956 | 54267.0 | 70383.0 | 70383.0 | 28667.0 | 1.0 | 2.0 | 0.0 | 0.0 | 2.0 | 11.0 | 6040015.0 | 0.0 | 0.0 |
| 50% | 1.0 | -122.330224 | 47.615369 | 106912.0 | 123363.0 | 123363.0 | 29973.0 | 1.0 | 2.0 | 0.0 | 0.0 | 2.0 | 13.0 | 8023022.0 | 0.0 | 0.0 |
| 75% | 2.0 | -122.311937 | 47.663664 | 162272.0 | 203319.0 | 203459.0 | 33973.0 | 2.0 | 3.0 | 0.0 | 0.0 | 2.0 | 14.0 | 10155010.0 | 0.0 | 0.0 |
| max | 2.0 | -122.238949 | 47.734142 | 219547.0 | 331454.0 | 332954.0 | 757580.0 | 2.0 | 81.0 | 6.0 | 2.0 | 12.0 | 69.0 | 13072020.0 | 525241.0 | 5239700.0 |
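
The `count` row of the `df.describe()` output shows that some columns are incomplete: X and Y have 189,339 non-null values out of 194,673 rows, INTKEY has 65,070, and SDOTCOLNUM has 114,936. A minimal sketch of how the share of missing values per column could be checked is given here. The `sample` frame is a small hypothetical stand-in for `df`, used only so the snippet runs on its own.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for df; the real table has 194,673 rows
sample = pd.DataFrame({
    "SEVERITYCODE": [2, 1, 1, 1],
    "X": [-122.32, np.nan, -122.33, -122.33],
    "INTKEY": [37475.0, np.nan, np.nan, np.nan],
})

# isna() marks missing cells; the column mean is the missing fraction
missing_share = sample.isna().mean().sort_values(ascending=False)
print(missing_share)
```

Running the same two calls on `df` should list the sparsest columns first, which helps decide whether a column is worth keeping.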


## Column Overview

SEVERITYCODE: Indicates collision severity. This dataset includes two categories: Property Damage (indicated by a 1) and Injury (indicated by a 2). This column will be used as the model's dependent variable.
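
Because the only codes are 1 and 2, the mean of 1.298901 reported by `df.describe()` implies that roughly 30% of rows are Injury collisions, so the two classes are not balanced. The sketch below shows one way to confirm the class shares directly. The `sample` frame and the `severity_labels` mapping are illustrative assumptions, not part of the original notebook.

```python
import pandas as pd

# Hypothetical stand-in for the SEVERITYCODE column of df
sample = pd.DataFrame({"SEVERITYCODE": [1, 1, 2, 1, 2, 1, 1, 1, 2, 1]})

# Labels follow the two categories described above
severity_labels = {1: "Property Damage", 2: "Injury"}

# Fraction of rows that fall in each severity class
class_share = (
    sample["SEVERITYCODE"]
    .map(severity_labels)
    .value_counts(normalize=True)
)
print(class_share)
```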

### Independent, Apparent Variables

These columns correspond to variables that will be considered for the model and that are immediately apparent, even to a bystander who arrives after the accident occurs.

ADDRTYPE: Indicates the type of location where the accident occurred. This includes Alleys, Blocks, and Intersections.

LIGHTCOND: Indicates the lighting conditions at the accident. Entries include Daylight, Dark (Street Lights On), Dark (Street Lights Off), Dark (No Street Lights), Dusk, Dawn, Dark (Unknown Lighting), and Other/Unknown.

ROADCOND: Indicates the road conditions at the accident. Entries include Wet, Dry, Snow/Slush, Ice, Sand/Mud/Dirt, Standing Water, Oil, and Other/Unknown.

WEATHER: Indicates the weather conditions at the accident. Entries include Overcast, Raining, Clear, Snowing, Fog/Smog/Smoke, Sleet/Hail/Freezing Rain, Blowing Sand/Dirt, Severe Crosswind, Partly Cloudy, and Other/Unknown.

PEDCOUNT: Indicates the number of pedestrians involved in the collision. Data for this ranges from 0 to 6.

PEDCYLCOUNT: Indicates the number of bicycles involved in the collision. Data for this ranges from 0 to 2.

VEHCOUNT: Indicates the number of vehicles involved in the collision. Data for this ranges from 0 to 12.
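
ADDRTYPE, LIGHTCOND, ROADCOND, and WEATHER hold text categories, while the three count columns are already numeric. Most models need the text columns converted to numbers first. One common option, sketched here as an assumption rather than as the method this project uses, is one-hot encoding with `pd.get_dummies`. The `sample` frame is a hypothetical stand-in for `df`.

```python
import pandas as pd

# Hypothetical stand-in for a few of the apparent variables in df
sample = pd.DataFrame({
    "ADDRTYPE": ["Intersection", "Block", "Block"],
    "ROADCOND": ["Wet", "Wet", "Dry"],
    "VEHCOUNT": [2, 2, 3],
})

# Only the text columns are expanded; numeric counts pass through unchanged
categorical_cols = ["ADDRTYPE", "ROADCOND"]
encoded = pd.get_dummies(sample, columns=categorical_cols, dtype=int)
print(encoded.columns.tolist())
```

Each category becomes its own 0/1 column (for example `ROADCOND_Wet`), which avoids implying an order among categories such as Wet, Dry, and Ice.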

### Independent, Semi-Apparent Variables

These columns correspond to variables that will be considered for the model and that are apparent only to a bystander present at the time of the accident.

COLLISIONTYPE: Indicates the type of collision that occurred. Entries include Angles, Sideswipe, Parked Car, Cycles, Rear-Ended, Head On, Left Turn, Right Turn, Pedestrian, and Other.

HITPARKEDCAR: Indicates whether a parked car was involved in the collision, via a Yes/No entry.

PEDROWNOTGRNT: Indicates whether a pedestrian's right of way was not granted, via a Yes/No entry.

SPEEDING: Indicates whether a speeding vehicle was a key factor in the collision, via a Yes/No entry.
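
In the first rows shown earlier, PEDROWNOTGRNT and SPEEDING are blank while HITPARKEDCAR reads N, so these Yes/No columns are not stored uniformly. The sketch below converts them to 0/1 flags under the assumption that a blank entry means No. That assumption should be checked against the dataset's documentation before relying on it. The `sample` frame is a hypothetical stand-in for `df`.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the Yes/No columns of df
sample = pd.DataFrame({
    "HITPARKEDCAR": ["N", "N", "Y"],
    "SPEEDING": [np.nan, "Y", np.nan],
    "PEDROWNOTGRNT": [np.nan, np.nan, "Y"],
})

flag_cols = ["HITPARKEDCAR", "SPEEDING", "PEDROWNOTGRNT"]

# Assumption: "Y" means Yes; "N" and blank (NaN) are both treated as No
flags = sample[flag_cols].eq("Y").astype(int)
print(flags)
```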

### Independent, Non-Apparent Variables

These columns correspond to variables that will be considered for the model which typically only become apparent after further investigation.

INATTENTIONIND: Indicates if inattention was a factor in the collision, via a Yes/No entry.

UNDERINFL: Indicates if a driver was under the influence of drugs or alcohol, via a Yes/No entry.

PERSONCOUNT: Indicates the total number of people involved in the collision, such as drivers and passengers. Data ranges from 0 to 81 people involved.

### Unused Variables

The remaining columns are not used as model inputs. They include:

Exception Codes: Used to justify blank values in the table. These will help clean out the table.

Date/Time/Location: Used to describe the exact area and time where the collision occurred. The goal of this model is to provide generalized recommendations. Thus, these variables aren't relevant for the study.

Collision Keys/Codes/Identifiers: Used to distinguish the collision event from others. Row numbers will suffice for the model.

Lane/Segment/Intersection/Crosswalk Key: Used to identify specific traffic landmarks involved. Only the presence/involvement of such factors will suffice.

Descriptions: Free-text elaborations of coded columns (for example, ST_COLDESC spells out ST_COLCODE).
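
The groups above can be removed in one step. The sketch below drops a list of columns taken from the `df.head()` and `df.describe()` output. Which columns belong in `unused_cols` is an assumption based on the groups just described, and SEVERITYCODE.1 is included because its summary statistics match SEVERITYCODE exactly, which suggests it is a duplicate. The `sample` frame is a hypothetical stand-in for `df`.

```python
import pandas as pd

# Hypothetical stand-in for df with a mix of used and unused columns
sample = pd.DataFrame({
    "SEVERITYCODE": [2, 1],
    "SEVERITYCODE.1": [2, 1],
    "OBJECTID": [1, 3],
    "INCKEY": [1307, 26700],
    "ADDRTYPE": ["Intersection", "Block"],
    "ST_COLDESC": ["Entering at angle", "One parked--one moving"],
})

# Assumed list of identifier, location, key, and description columns
unused_cols = [
    "X", "Y", "OBJECTID", "INCKEY", "COLDETKEY", "REPORTNO", "INTKEY",
    "SEGLANEKEY", "CROSSWALKKEY", "ST_COLDESC", "SEVERITYCODE.1",
]

# errors="ignore" skips names that are absent instead of raising KeyError
trimmed = sample.drop(columns=unused_cols, errors="ignore")
print(trimmed.columns.tolist())
```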

### Example Row

The dataset's first row describes a vehicular accident in which at least one injury occurred (indicated by the severity code of 2).

From the data, one can tell that the incident was an Angles-type collision at an intersection. Two people and two vehicles were involved. No bicycles or pedestrians were involved, and neither car was parked. Neither driver was inattentive or intoxicated. At the time of the collision, the weather was overcast with daylight conditions, but the road was wet.

From this, one plausible reading is that the intersection was wet from a light drizzle, from rain shortly before or after the collision, or from an artificial cause (say, a nearby car wash or a spill). One car may have entered at an angle, lost control by hydroplaning, and struck the other, with no other vehicles affected.