# Seattle Paramedic Deployment Optimization

## Data Section

### Import libraries to start

In [3]:
import pandas as pd

### To identify the key differentiators of severe accidents, a robust dataset of incidents is required.

In [4]:
df = pd.read_csv('https://s3.us.cloud-object-storage.appdomain.cloud/cf-courses-data/CognitiveClass/DP0701EN/version-2/Data-Collisions.csv')

In [5]:
df.head()

Unnamed: 0,SEVERITYCODE,X,Y,OBJECTID,INCKEY,COLDETKEY,REPORTNO,STATUS,ADDRTYPE,INTKEY,...,ROADCOND,LIGHTCOND,PEDROWNOTGRNT,SDOTCOLNUM,SPEEDING,ST_COLCODE,ST_COLDESC,SEGLANEKEY,CROSSWALKKEY,HITPARKEDCAR
0,2,-122.323148,47.70314,1,1307,1307,3502005,Matched,Intersection,37475.0,...,Wet,Daylight,,,,10,Entering at angle,0,0,N
1,1,-122.347294,47.647172,2,52200,52200,2607959,Matched,Block,,...,Wet,Dark - Street Lights On,,6354039.0,,11,From same direction - both going straight - bo...,0,0,N
2,1,-122.33454,47.607871,3,26700,26700,1482393,Matched,Block,,...,Dry,Daylight,,4323031.0,,32,One parked--one moving,0,0,N
3,1,-122.334803,47.604803,4,1144,1144,3503937,Matched,Block,,...,Dry,Daylight,,,,23,From same direction - all others,0,0,N
4,2,-122.306426,47.545739,5,17700,17700,1807429,Matched,Intersection,34387.0,...,Wet,Daylight,,4028032.0,,10,Entering at angle,0,0,N


In [9]:
df.dtypes

SEVERITYCODE        int64
X                 float64
Y                 float64
OBJECTID            int64
INCKEY              int64
COLDETKEY           int64
REPORTNO           object
STATUS             object
ADDRTYPE           object
INTKEY            float64
LOCATION           object
EXCEPTRSNCODE      object
EXCEPTRSNDESC      object
SEVERITYCODE.1      int64
SEVERITYDESC       object
COLLISIONTYPE      object
PERSONCOUNT         int64
PEDCOUNT            int64
PEDCYLCOUNT         int64
VEHCOUNT            int64
INCDATE            object
INCDTTM            object
JUNCTIONTYPE       object
SDOT_COLCODE        int64
SDOT_COLDESC       object
INATTENTIONIND     object
UNDERINFL          object
WEATHER            object
ROADCOND           object
LIGHTCOND          object
PEDROWNOTGRNT      object
SDOTCOLNUM        float64
SPEEDING           object
ST_COLCODE         object
ST_COLDESC         object
SEGLANEKEY          int64
CROSSWALKKEY        int64
HITPARKEDCAR       object
dtype: object

### First and foremost, the dataset should identify whether injuries were involved or otherwise indicate severity.

In [10]:
df['SEVERITYDESC'].value_counts()

Property Damage Only Collision    136485
Injury Collision                   58188
Name: SEVERITYDESC, dtype: int64
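
The counts above put injury collisions at roughly 30% of all records (58188 of 194673), so the two classes are imbalanced. A minimal sketch of turning the severity label into a binary target and checking the class share is shown below. The toy rows and the `INJURY` column name are assumptions for illustration only; they are not part of the original notebook.

```python
import pandas as pd

# Toy stand-in for df['SEVERITYDESC']; the two labels match the output above,
# but the rows themselves are made up for illustration.
toy = pd.DataFrame({
    "SEVERITYDESC": [
        "Property Damage Only Collision",
        "Injury Collision",
        "Property Damage Only Collision",
        "Property Damage Only Collision",
    ]
})

# Hypothetical column name: 1 = injury collision, 0 = property damage only.
toy["INJURY"] = (toy["SEVERITYDESC"] == "Injury Collision").astype(int)

# Proportion of each class; normalize=True returns shares instead of counts.
class_share = toy["INJURY"].value_counts(normalize=True)
print(class_share)
```

On the full data, the same two lines applied to `df` would give the yes/no target the model is meant to predict, assuming an injury record is an acceptable proxy for needing paramedics.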

### The dataset should span many years, but it does not necessarily need to support time-series evaluation. It should include categorical features such as time of day, road conditions, weather conditions, street location and types of vehicles involved.

In [13]:
feats_df = df[['ADDRTYPE', 'WEATHER', 'ROADCOND', 'LIGHTCOND', 
                'COLLISIONTYPE', 'JUNCTIONTYPE', 'UNDERINFL']]
feats_df.head()

Unnamed: 0,ADDRTYPE,WEATHER,ROADCOND,LIGHTCOND,COLLISIONTYPE,JUNCTIONTYPE,UNDERINFL
0,Intersection,Overcast,Wet,Daylight,Angles,At Intersection (intersection related),N
1,Block,Raining,Wet,Dark - Street Lights On,Sideswipe,Mid-Block (not related to intersection),0
2,Block,Overcast,Dry,Daylight,Parked Car,Mid-Block (not related to intersection),0
3,Block,Clear,Dry,Daylight,Other,Mid-Block (not related to intersection),N
4,Intersection,Raining,Wet,Daylight,Angles,At Intersection (intersection related),0
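
Note that `UNDERINFL` mixes two encodings in the rows above (`N` and `0`). A sketch of one way to collapse them into a single flag follows. The mapping assumes that `Y` and `1` are the affirmative counterparts of `N` and `0`, which the output above does not show, and `UNDERINFL_FLAG` is a hypothetical column name.

```python
import pandas as pd

# Toy stand-in for df['UNDERINFL']; values are illustrative, including one missing entry.
toy = pd.DataFrame({"UNDERINFL": ["N", "0", "0", "N", "Y", "1", None]})

# Assumed mapping: 'Y'/'1' = under the influence, 'N'/'0' = not under the influence.
flag_map = {"N": 0, "0": 0, "Y": 1, "1": 1}

# Values not in the mapping (including missing ones) become NaN rather than being guessed.
toy["UNDERINFL_FLAG"] = toy["UNDERINFL"].map(flag_map)
print(toy)
```

Running `df['UNDERINFL'].value_counts(dropna=False)` on the real data first would confirm which codes actually occur before applying any mapping.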


### Ordinal features should also be considered, such as the number of people, vehicles, pedestrians and bicyclists involved.

In [15]:
ords_df = df[['PERSONCOUNT', 'PEDCOUNT', 'PEDCYLCOUNT', 'VEHCOUNT']]
ords_df.head()

Unnamed: 0,PERSONCOUNT,PEDCOUNT,PEDCYLCOUNT,VEHCOUNT
0,2,0,0,2
1,2,0,0,2
2,4,0,0,3
3,3,0,0,3
4,2,0,0,2


### There shouldn’t be a need for hospital outcomes data or insurance adjuster reports to further assess severity, unless severity is not provided in the core data. There’s no need for specific geographic location, since the location is already confined to the city of Seattle and finer-grained location would otherwise lead to a sparse model. The analysis also does not need to deep-dive into the accounts reported at the town hall or the personal information of the people involved.

### The data collection and feature engineering should allow for a sufficiently accurate and explainable model such that reasonable confidence can be achieved in its outcomes. The outcome of the model is a yes/no answer on whether to send paramedics. The model will not predict further levels of severity, number of injured people, estimated medical bills or predicted damage costs. Once the desired model is achieved, the key differentiators can be identified for reasonability checks and inclusion in the updated training guides.
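
As a rough illustration of the feature engineering this implies, the categorical columns could be one-hot encoded so that each category becomes its own, individually interpretable, input. The sketch below uses `pd.get_dummies` on a small made-up frame; the rows are illustrative, and the choice of one-hot encoding is an assumption rather than something the notebook prescribes.

```python
import pandas as pd

# Toy frame combining two categorical features and two count features.
# Column names follow the dataset; the rows are made up for illustration.
toy = pd.DataFrame({
    "WEATHER": ["Overcast", "Raining", "Clear"],
    "ROADCOND": ["Wet", "Wet", "Dry"],
    "PERSONCOUNT": [2, 2, 3],
    "VEHCOUNT": [2, 2, 3],
})

# One-hot encode the listed categorical columns; the count columns pass through unchanged.
X = pd.get_dummies(toy, columns=["WEATHER", "ROADCOND"])
print(X.columns.tolist())
```

With one column per category, the weights or importances of a fitted model can be read back as statements such as "wet roads" or "raining", which suits the goal of feeding key differentiators into training guides.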