Table: Crashes
- CrashID (Primary Key, Auto Increment) - System-generated unique identifying number for a crash
- UnitNumber - Unit number entered on crash report for a unit involved in the crash
- PersonNumber - Person number captured on the crash report
- PersonType - Type of person involved in the crash
- Location - The physical location of an occupant in, on, or outside of the motor vehicle prior to the First Harmful Event or loss of control
- InjurySeverity - Severity of injury to the occupant
- Age - Age of person involved in the crash
- Ethnicity - Ethnicity of person involved in the crash
- Gender - Gender of person involved in the crash
- BodyExpulsion - The extent to which the person's body was expelled from the vehicle during any part of the crash
- RestraintType - The type of restraint used by each occupant
- AirbagDeployment - Indicates whether a person's airbag deployed during the crash and in what manner
- HelmetWorn - Indicates if a helmet was worn at the time of the crash
- Solicitation - Solicitation information
- AlcoholSpecimenType - Type of alcohol specimen taken for analysis from the primary persons involved in the crash
- AlcoholResult - Numeric blood alcohol content test result for a primary person involved in the crash, using standardized alcohol breath results (e.g., .08 or .129)
- DrugSpecimenType - Type of drug specimen taken for analysis from the primary persons involved in the crash
- DrugTestResult - Primary person drug test result
- TimeOfDeath - Time of death

Table: InjuryCounts
- CrashID (Foreign Key) - CrashID referencing the CrashID in the Crashes table
- SuspectedSeriousInjuryCount - Count of suspected serious injuries
- NonIncapacitatingInjuryCount - Count of non-incapacitating injuries
- PossibleInjuryCount - Count of possible injuries
- NotInjuredCount - Count of individuals not injured
- UnknownInjuryCount - Count of individuals with unknown injuries
- TotalInjuryCount - Total count of injuries
- DeathCount - Count of deaths
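
The two tables above are linked through CrashID. A minimal sketch of that join in pandas follows. It assumes, as the person-level CSV loaded later suggests, that several person rows can share one CrashID while InjuryCounts holds one row per crash. The rows are invented for illustration and do not come from the data set.

```python
import pandas as pd

# Toy rows, invented for illustration only.
crashes = pd.DataFrame({
    "CrashID": [1, 1, 2],
    "PersonNumber": [1, 2, 1],
    "InjurySeverity": [
        "N - NOT INJURED",
        "B - SUSPECTED MINOR INJURY",
        "N - NOT INJURED",
    ],
})
injury_counts = pd.DataFrame({
    "CrashID": [1, 2],
    "TotalInjuryCount": [1, 0],
    "DeathCount": [0, 0],
})

# Left join keeps every person row; validate="many_to_one" raises
# if CrashID turns out not to be unique in injury_counts.
merged = crashes.merge(
    injury_counts, on="CrashID", how="left", validate="many_to_one"
)
print(merged)
```

The `validate` argument is a cheap guard: if the per-crash table ever contains duplicate keys, the merge fails loudly instead of silently multiplying rows.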


In [2]:
import pandas as pd
import numpy as np
pd.set_option('display.max_columns', None)
pd.set_option('display.expand_frame_repr', False)

In [3]:
df = pd.read_csv('2018_all_person.csv')

In [4]:
df.columns = df.columns.str.lower().str.replace(' ', '_').str.replace('-', '_')

In [5]:
len(df.columns)

26

In [6]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 16058 entries, 0 to 16057
Data columns (total 26 columns):
 #   Column                                     Non-Null Count  Dtype 
---  ------                                     --------------  ----- 
 0   crash_id                                   16058 non-null  int64 
 1   charge                                     16052 non-null  object
 2   citation                                   16049 non-null  object
 3   person_age                                 16058 non-null  object
 4   person_airbag_deployed                     16058 non-null  object
 5   person_alcohol_result                      16058 non-null  object
 6   person_alcohol_specimen_type_taken         16058 non-null  object
 7   person_blood_alcohol_content_test_result   16058 non-null  object
 8   person_death_count                         16058 non-null  int64 
 9   person_drug_specimen_type                  16058 non-null  object
 10  person_drug_test_result           

In [7]:
df.crash_id.value_counts()

crash_id
16392971    14
16368940    12
16390704    11
16682272    10
16211951    10
            ..
16454055     1
16313213     1
16788102     1
16542587     1
16539799     1
Name: count, Length: 7988, dtype: int64
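
The output above shows 7988 distinct crashes across 16058 person rows, roughly two people per crash. To see how crash sizes are distributed, one option is to count the counts. The sketch below uses a made-up `df_toy` frame rather than the real file.

```python
import pandas as pd

# Toy crash ids, invented for illustration only.
df_toy = pd.DataFrame({"crash_id": [10, 10, 10, 11, 12, 12]})

# Number of person rows recorded for each crash.
persons_per_crash = df_toy["crash_id"].value_counts()

# How many crashes involve 1 person, 2 people, 3 people, ...
size_distribution = persons_per_crash.value_counts().sort_index()

# Number of distinct crashes (same as len(df_toy.crash_id.unique())).
n_crashes = df_toy["crash_id"].nunique()

print(size_distribution)
print(n_crashes)
```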

In [8]:
# np.NAN was removed in NumPy 2.0; assign the result instead of using inplace on an attribute
df['charge'] = df['charge'].replace('NO CHARGES', np.nan)

In [9]:
# percent of people who were not charged
df.charge.isna().sum() / len(df) * 100

76.64092664092664

    About 77% of the people involved have no charge recorded (either 'NO CHARGES' or a missing value). This figure describes charges, not injuries.
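
Because 'NO CHARGES' is recoded to NaN, the percentage above also includes the few rows where charge was already missing in the raw file (df.info() reports 16052 non-null out of 16058). A small sketch on invented values shows the same calculation, using the mean of the boolean mask as a shortcut:

```python
import numpy as np
import pandas as pd

# Toy values, invented for illustration only.
charge = pd.Series(["NO CHARGES", "UNSAFE SPEED", np.nan, "NO CHARGES"])

# Rows that were missing before the recode.
originally_missing = int(charge.isna().sum())

# Recode the sentinel string to a real missing value.
charge = charge.replace("NO CHARGES", np.nan)

# Mean of a boolean mask is the fraction of True values.
pct_not_charged = charge.isna().mean() * 100

print(originally_missing, pct_not_charged)
```

Keeping `originally_missing` around makes it possible to report "explicitly not charged" and "charge unknown" separately if that distinction matters.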

In [10]:
((df.person_injury_severity.value_counts())/ len(df)) * 100 

person_injury_severity
N - NOT INJURED                 48.268776
B - SUSPECTED MINOR INJURY      19.305019
C - POSSIBLE INJURY             13.052684
A - SUSPECTED SERIOUS INJURY    12.193299
99 - UNKNOWN                     4.514884
K - FATAL INJURY                 2.665338
Name: count, dtype: float64
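
The same percentages can be obtained without dividing by len(df) by passing normalize=True to value_counts. A minimal sketch on invented labels:

```python
import pandas as pd

# Toy severity labels, invented for illustration only.
severity = pd.Series(
    ["N - NOT INJURED", "N - NOT INJURED", "C - POSSIBLE INJURY", "K - FATAL INJURY"]
)

# normalize=True returns fractions that sum to 1; scale to percent.
pct = severity.value_counts(normalize=True).mul(100).round(2)

print(pct)
```

Note that value_counts drops missing values by default, so the two approaches only agree when the column has no NaN entries (as is the case for person_injury_severity here, which is fully populated).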

In [11]:
df.charge.value_counts()

charge
FAIL TO CONTROL SPEED                                                                   214
UNSAFE SPEED                                                                            142
FAILED TO CONTROL SPEED                                                                  59
DRIVING WHILE INTOXICATED                                                                57
DWI                                                                                      40
                                                                                       ... 
FAIL TO YIELD ROW TURNING ON RED SIGNAL                                                   1
WRONG SIDE OF THE ROADWAY                                                                 1
DISREGARD TRAFFIC CONTROL DEVICES- STRAIGHT IN LEFT TURN LN                               1
FAIL TO YIELD RIGHT OF WAY-STOP SIGN                                                      1
DRIVING WHILE LICENSE INVALID, DISPLAY EXPIRED LICENSE PLATES/ REGISTRATI
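
The charge column is free text, so the same offence appears under several spellings (for example 'FAIL TO CONTROL SPEED' and 'FAILED TO CONTROL SPEED', or 'DWI' and 'DRIVING WHILE INTOXICATED'). One possible way to consolidate them is sketched below. The `canonical` mapping is a hypothetical, hand-written example and would need to be extended after inspecting the real values.

```python
import pandas as pd

# A few charge strings taken from the output above, used as toy input.
charges = pd.Series([
    "FAIL TO CONTROL SPEED",
    "FAILED TO CONTROL SPEED",
    "DWI",
    "DRIVING WHILE INTOXICATED",
    "  unsafe   speed ",
])

# Hypothetical mapping from variant spellings to one canonical label.
canonical = {
    "FAILED TO CONTROL SPEED": "FAIL TO CONTROL SPEED",
    "DWI": "DRIVING WHILE INTOXICATED",
}

cleaned = (
    charges.str.upper()                      # unify case
    .str.strip()                             # drop leading/trailing spaces
    .str.replace(r"\s+", " ", regex=True)    # collapse repeated whitespace
    .replace(canonical)                      # map exact variants to canonical form
)

counts = cleaned.value_counts()
print(counts)
```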

In [12]:
df.person_ethnicity.value_counts()

person_ethnicity
W - WHITE                          8801
H - HISPANIC                       3647
B - BLACK                          2132
99 - UNKNOWN                        623
98 - OTHER                          379
A - ASIAN                           346
No Data                              90
I - AMER. INDIAN/ALASKAN NATIVE      40
Name: count, dtype: int64

In [13]:
df.person_ethnicity 

0           B - BLACK
1        H - HISPANIC
2           B - BLACK
3        H - HISPANIC
4           W - WHITE
             ...     
16053    H - HISPANIC
16054       W - WHITE
16055       W - WHITE
16056       W - WHITE
16057       W - WHITE
Name: person_ethnicity, Length: 16058, dtype: object

In [14]:
injury_by_age = pd.crosstab(index = df.person_age, columns = df.person_injury_severity)

In [15]:
df.person_age

0        51
1        40
2        55
3        31
4        45
         ..
16053    17
16054    48
16055    39
16056    34
16057    22
Name: person_age, Length: 16058, dtype: object
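
person_age is stored as object rather than a numeric dtype, which usually means at least one non-numeric entry is present. The sketch below assumes that entry is a placeholder string such as 'No Data' (an assumption, not verified here), coerces the column to numbers, and bins ages before cross-tabulating so the table has a handful of rows instead of one per distinct age. The bin edges are arbitrary illustrative choices.

```python
import pandas as pd

# Toy rows, invented for illustration only.
toy = pd.DataFrame({
    "person_age": ["51", "40", "No Data", "17", "22"],
    "person_injury_severity": [
        "N - NOT INJURED",
        "C - POSSIBLE INJURY",
        "N - NOT INJURED",
        "B - SUSPECTED MINOR INJURY",
        "N - NOT INJURED",
    ],
})

# Non-numeric entries become NaN instead of raising.
age = pd.to_numeric(toy["person_age"], errors="coerce")

# Left-closed bins: [0, 18), [18, 30), [30, 50), [50, 120).
age_group = pd.cut(
    age,
    bins=[0, 18, 30, 50, 120],
    right=False,
    labels=["0-17", "18-29", "30-49", "50+"],
)

# Rows with a missing age are dropped by crosstab.
injury_by_age_group = pd.crosstab(age_group, toy["person_injury_severity"])
print(injury_by_age_group)
```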

In [16]:
# for cols in df.columns:
#     print(df[cols].value_counts())

In [17]:
(((df.person_injury_severity == '99 - UNKNOWN').sum()) / len(df)) * 100

4.514883547141612

In [18]:
(df.person_alcohol_specimen_type_taken.value_counts() / len(df)) * 100 

person_alcohol_specimen_type_taken
96 - NONE                            79.144352
No Data                              16.932370
2 - BLOOD                             2.982937
98 - OTHER (EXPLAIN IN NARRATIVE)     0.429692
4 - REFUSED                           0.261552
1 - BREATH                            0.230415
3 - URINE                             0.018682
Name: count, dtype: float64
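
Several columns in this file use a 'CODE - LABEL' pattern ('2 - BLOOD', '96 - NONE') mixed with a plain 'No Data' placeholder. A sketch of splitting the code from the label with a regular expression follows; values that do not match the pattern come back as NaN, which makes the placeholder easy to spot. The input values are copied from the output above.

```python
import pandas as pd

specimen = pd.Series([
    "96 - NONE",
    "No Data",
    "2 - BLOOD",
    "98 - OTHER (EXPLAIN IN NARRATIVE)",
])

# Named groups become the column names of the resulting DataFrame.
parts = specimen.str.extract(r"^(?P<code>\w+) - (?P<label>.+)$")

print(parts)
```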

In [19]:
unique_crash_id = df.crash_id.unique()