# Clean the Texas OIS dataset for analysis -- civilians shot data

* Inputs:
   * `OIS.xlsx`
   * `agencies_and_counties.csv` (used to add county information)
* Output: `shot_civilians.csv`

##### Author: Everett Wetchler (everett.wetchler@gmail.com)

## 1. Setup and read data

In [1]:
CLEANED_FILENAME = 'shot_civilians.csv'
DTW_OIS_PROJECT = 'tji/officer-involved-shootings'

In [2]:
import datadotworld as dw
import numpy as np
import pandas as pd
pd.set_option('display.max_rows', 100)
pd.set_option('display.max_columns', 100)

%load_ext watermark
%watermark -a "Everett Wetchler" -d -t -z -r -g -w -p datadotworld,numpy,pandas

Everett Wetchler 2018-09-17 15:42:58 PDT

datadotworld 1.6.0
numpy 1.14.5
pandas 0.23.3
Git hash: fec42e11d9c956535a7938208aebfdd0f564d5cc
Git repo: git@github.com:texas-justice-initiative/data-processing.git
watermark 1.6.1


In [3]:
from lib.standardize_police_agency_names import standardize_agency_name
from lib.cleaning_tools import *

In [4]:
datasets = dw.load_dataset('tji/auxiliary-datasets', force_update=True)
agencies = datasets.dataframes['agencies_and_counties']

In [5]:
shootings = read_dtw_excel(DTW_OIS_PROJECT, 'original/OIS.xlsx')['OISTable']
print("Found %d OIS civilians-shot incidents from %s to %s" % (
    len(shootings),
    shootings['Date of Incident'].min().strftime('%Y-%m-%d'),
    shootings['Date of Incident'].max().strftime('%Y-%m-%d')))
shootings.head()

Writing excel file to temp file: /var/folders/dc/8cbxbsh515s908xl0zyprszm0000gn/T/tmppg50xrt0
Found 520 OIS civilians-shot incidents from 2015-09-02 to 2018-09-03


Unnamed: 0,No.,Number of Reports Filed,Date of Report 1,Date AG Received,Name of Agency 1,City of Agency 1,Zip code of Agency 1,Date of Incident,Time of Incident,Name of Person 1 Filling out Form,Email Address of Person 1 Filling out Form,Date of Report 2,Name of Agency 2,City of Agency 2,Zip code of Agency 2,Name of Person 2 Filling out form,Email Address of Person 2 Filling out Form,Date of Report 3,Name of Agency 3,City of Agency 3,Zip code of Agency 3,Name of Person 3 Filling out form,Email Address of Person 3 Filling out Form,Date of Report 4,Name of Agency 4,City of Agency 4,Zip code of Agency 4,Name of Person 4 Filling out form,Email Address of Person 4 Filling out Form,Date of Report 5,Name of Agency 5,City of Agency 5,Zip code of Agency 5,Name of Person 5 Filling out form,Email Address of Person 5 Filling out Form,Date of Report 6,Name of Agency 6,City of Agency 6,Zip code of Agency 6,Name of Person 6 Filling out form,Email Address of Person 6 Filling out Form,Date of Report 7,Name of Agency 7,City of Agency 7,Zip code of Agency 7,Name of Person 7 Filling out form,Email Address of Person 7 Filling out Form,Date of Report 8,Name of Agency 8,City of Agency 8,...,Peace Officer 1's Age,Peace Officer 1's Race/Ethnicity,Peace Officer 2's Gender,Peace Officer 2's Age,Peace Officer 2's Race/Ethnicity,Peace Officer 3's Gender,Peace Officer 3's Age,Peace Officer 3's Race/Ethnicity,Peace Officer 4's Gender,Peace Officer 4's Age,Peace Officer 4's Race/Ethnicity,Peace Officer 5's Gender,Peace Officer 5's Age,Peace Officer 5's Race/Ethnicity,Peace Officer 6's Gender,Peace Officer 6's Age,Peace Officer 6's Race/Ethnicity,Peace Officer 7's Gender,Peace Officer 7's Age,Peace Officer 7's Race/Ethnicity,Peace Officer 8's Gender,Peace Officer 8's Age,Peace Officer 8's Race/Ethnicity,Peace Officer 9's Gender,Peace Officer 9's Age,Peace Officer 9's Race/Ethnicity,Peace Officer 10's Gender,Peace Officer 10's Age,Peace Officer 10's Race/Ethnicity,Peace Officer 11's Gender,Peace 
Officer 11's Age,Peace Officer 11's Race/Ethnicity,On Duty or Off Duty,Peace Officer Responding With 1 or More Officers,Incident Occurred During or as a Result of,Incident Occurred During or as a Result of 2,Incident Occurred During or as a Result of 3,"If Other, Specify Type of Call",Deadly Weapon Description,NEWS 1,NEWS 2,NEWS 3,NEWS 4,CDR?,CDR Narrative,Narrative Published by Law Enforcement,Column1,Column2,SHORTER,EXTRAS
0,1,1,9/16/2015,NaT,Freeport Police Department,Freeport,77541,2015-09-02,,Pamela Morris,pmorris@freeport.tx.us,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,...,27,Hispanic or Latino,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,On Duty,Yes,Traffic stop,,,Narcotic Stop and Evading Arrest,,ABC 13,Your Southest Texas,,,,,,,,,
1,2,1,10/1/2015,NaT,Plano Police Department,Plano,75074,2015-09-03,,Curtis Howard,curtish@plano.gov,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,...,30,Hispanic or Latino,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,On Duty,No,Other - Specify type of call,,,Accidental discharge ricochet during range act...,,,,,,,,,,,,
2,3,1,10/6/2015,NaT,Parker County Sheriff's Office,Weatherford,76086,2015-09-04,,Meredith Gray,meredith.gray@parkercountytx.com,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,...,27,Anglo or White,Male,56.0,Anglo or White,,,,,,,,,,,,,,,,,,,,,,,,,,,,On Duty,Yes,Other - Specify type of call,,,Investigation of criminal activity,Firearm,WFAA,DFW CBS Local,Star Telegram,Fox 4 News,YES,Decedent shot a rifle at LE Officers whom retu...,,,,fired at officers,
3,4,1,9/11/2015,NaT,Houston Police Department,Houston,77002,2015-09-05,,Odon Belmarez,odon.belmarez@houstonpolice.org,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,...,28,Anglo or White,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,On Duty,Yes,Emergency Call or Request for Assistance,,,,Firearm,Chron,ABC 13,Click 2 Houston,,,,An officer was dispatched to a weapons disturb...,,,,
4,5,1,10/15/2015,NaT,Irving Police Department,Irving,75061,2015-09-08,,Michael Coleman,mcoleman@cityofirving.com,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,...,38,Hispanic or Latino,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,On Duty,No,Other - Specify type of call,,,Training Exercise - ricochet fragments resulti...,,,,,,,,,,,,


## 2. Begin cleaning

In [6]:
# Normalize column names
shootings.columns = [c.strip().lower() for c in shootings.columns]
# Normalize string cell values
upcase_strip_string_cells(shootings)
# Drop irrelevant columns
shootings.drop(['no.', 'column1', 'column2', 'extras'], axis=1, inplace=True)

In [7]:
max_reports_per_incident = max(int(c.strip().split()[-1]) for c in shootings.columns
                               if c.startswith("date of report"))
print("Deaths may have as many as %d OIS reports filed" % max_reports_per_incident)

Deaths may have as many as 11 OIS reports filed
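
The report count above is read off the column names rather than hard-coded. A minimal sketch of the same idea, using a made-up list of column names (the names in `columns` are illustrative, not the full OIS header):

```python
# Hypothetical, shortened list of normalized column names.
columns = [
    "number of reports filed",
    "date of report 1",
    "date of report 2",
    "date of report 11",
    "name of agency 1",
]

# Keep only the "date of report N" columns and parse the trailing integer N.
report_numbers = [int(c.split()[-1]) for c in columns
                  if c.startswith("date of report")]
max_reports = max(report_numbers)
print("Largest report index found:", max_reports)
```

Deriving the maximum this way means later loops can follow the spreadsheet if more report slots are added.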


In [8]:
# Make the column names more machine-friendly
col_renames = {
    "number of reports filed": "num_reports_filed",
    "date ag received": "date_ag_received",
    "date of incident": "date_incident",
    "time of incident": "time_incident",
    "injured or deceased's first name": "civilian_name_first",
    "injured or deceased's last name": "civilian_name_last",
    "injured or deceased's gender": "civilian_gender",
    "injured or deceased's age": "civilian_age",
    "injured or deceased's race/ethnicity": "civilian_race",
    "street address of incident": "incident_address",
    "city of incident": "incident_city",
    "county of incident": "incident_county",
    "zip code of incident": "incident_zip",
    "incident resulted in": "incident_resulted_in",
    "carried, exhibited or used deadly weapon": "deadly_weapon",
    "on duty or off duty": "officer_on_duty",
    "peace officer responding with 1 or more officers": "multiple_officers_involved",
    "incident occurred during or as a result of": "incident_result_of",
    "if other, specify type of call": "incident_call_other",
    "deadly weapon description": "weapon_reported_by_media",
    "cdr?": "custodial_death_report",
    "cdr narrative": "cdr_narrative",
    "narrative published by law enforcement": "lea_narrative_published",
    "shorter": "lea_narrative_shorter",
}

# Some columns have multiple copies for different individuals,
# e.g. agency_name_1 for the first officer's agency, then
# agency_name_2 for the second, etc. To avoid typing out
# all these numbers, we loop through such columns and rename
# them appropriately.
numerical_renames = {
    "date of report %d": "agency_report_date_%d",
    "name of agency %d": "agency_name_%d",
    "city of agency %d": "agency_city_%d",
    "zip code of agency %d": "agency_zip_%d",
    "name of person %d filling out form": "agency_name_person_filling_out_%d",
    "email address of person %d filling out form": "agency_email_person_filling_out_%d",
    "peace officer %d's gender": "officer_gender_%d",
    "peace officer %d's age": "officer_age_%d",
    "peace officer %d's race/ethnicity": "officer_race_%d",
    "incident occurred during or as a result of %d": "officer_caused_injury_%d",
    "news %d": "news_coverage_%d",
}
for i in range(1, max_reports_per_incident + 1):
    for k, v in numerical_renames.items():
        k = (k % i)
        v = v % i
        col_renames[k] = v

shootings.columns = [col_renames[c] for c in shootings.columns]
shootings.head()

Unnamed: 0,num_reports_filed,agency_report_date_1,date_ag_received,agency_name_1,agency_city_1,agency_zip_1,date_incident,time_incident,agency_name_person_filling_out_1,agency_email_person_filling_out_1,agency_report_date_2,agency_name_2,agency_city_2,agency_zip_2,agency_name_person_filling_out_2,agency_email_person_filling_out_2,agency_report_date_3,agency_name_3,agency_city_3,agency_zip_3,agency_name_person_filling_out_3,agency_email_person_filling_out_3,agency_report_date_4,agency_name_4,agency_city_4,agency_zip_4,agency_name_person_filling_out_4,agency_email_person_filling_out_4,agency_report_date_5,agency_name_5,agency_city_5,agency_zip_5,agency_name_person_filling_out_5,agency_email_person_filling_out_5,agency_report_date_6,agency_name_6,agency_city_6,agency_zip_6,agency_name_person_filling_out_6,agency_email_person_filling_out_6,agency_report_date_7,agency_name_7,agency_city_7,agency_zip_7,agency_name_person_filling_out_7,agency_email_person_filling_out_7,agency_report_date_8,agency_name_8,agency_city_8,agency_zip_8,...,incident_resulted_in,deadly_weapon,officer_gender_1,officer_age_1,officer_race_1,officer_gender_2,officer_age_2,officer_race_2,officer_gender_3,officer_age_3,officer_race_3,officer_gender_4,officer_age_4,officer_race_4,officer_gender_5,officer_age_5,officer_race_5,officer_gender_6,officer_age_6,officer_race_6,officer_gender_7,officer_age_7,officer_race_7,officer_gender_8,officer_age_8,officer_race_8,officer_gender_9,officer_age_9,officer_race_9,officer_gender_10,officer_age_10,officer_race_10,officer_gender_11,officer_age_11,officer_race_11,officer_on_duty,multiple_officers_involved,incident_result_of,officer_caused_injury_2,officer_caused_injury_3,incident_call_other,weapon_reported_by_media,news_coverage_1,news_coverage_2,news_coverage_3,news_coverage_4,custodial_death_report,cdr_narrative,lea_narrative_published,lea_narrative_shorter
0,1,9/16/2015,NaT,FREEPORT POLICE DEPARTMENT,FREEPORT,77541,2015-09-02,,PAMELA MORRIS,PMORRIS@FREEPORT.TX.US,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,...,INJURY,NO,MALE,27,HISPANIC OR LATINO,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,ON DUTY,YES,TRAFFIC STOP,,,NARCOTIC STOP AND EVADING ARREST,,ABC 13,YOUR SOUTHEST TEXAS,,,,,,
1,1,10/1/2015,NaT,PLANO POLICE DEPARTMENT,PLANO,75074,2015-09-03,,CURTIS HOWARD,CURTISH@PLANO.GOV,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,...,INJURY,YES,MALE,30,HISPANIC OR LATINO,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,ON DUTY,NO,OTHER - SPECIFY TYPE OF CALL,,,ACCIDENTAL DISCHARGE RICOCHET DURING RANGE ACT...,,,,,,,,,
2,1,10/6/2015,NaT,PARKER COUNTY SHERIFF'S OFFICE,WEATHERFORD,76086,2015-09-04,,MEREDITH GRAY,MEREDITH.GRAY@PARKERCOUNTYTX.COM,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,...,DEATH,YES,MALE,27,ANGLO OR WHITE,MALE,56.0,ANGLO OR WHITE,,,,,,,,,,,,,,,,,,,,,,,,,,,,ON DUTY,YES,OTHER - SPECIFY TYPE OF CALL,,,INVESTIGATION OF CRIMINAL ACTIVITY,FIREARM,WFAA,DFW CBS LOCAL,STAR TELEGRAM,FOX 4 NEWS,YES,DECEDENT SHOT A RIFLE AT LE OFFICERS WHOM RETU...,,FIRED AT OFFICERS
3,1,9/11/2015,NaT,HOUSTON POLICE DEPARTMENT,HOUSTON,77002,2015-09-05,,ODON BELMAREZ,ODON.BELMAREZ@HOUSTONPOLICE.ORG,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,...,INJURY,YES,MALE,28,ANGLO OR WHITE,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,ON DUTY,YES,EMERGENCY CALL OR REQUEST FOR ASSISTANCE,,,,FIREARM,CHRON,ABC 13,CLICK 2 HOUSTON,,,,AN OFFICER WAS DISPATCHED TO A WEAPONS DISTURB...,
4,1,10/15/2015,NaT,IRVING POLICE DEPARTMENT,IRVING,75061,2015-09-08,,MICHAEL COLEMAN,MCOLEMAN@CITYOFIRVING.COM,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,...,INJURY,NO,MALE,38,HISPANIC OR LATINO,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,ON DUTY,NO,OTHER - SPECIFY TYPE OF CALL,,,TRAINING EXERCISE - RICOCHET FRAGMENTS RESULTI...,,,,,,,,,
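
To make the numbered renaming concrete, here is a small sketch of how the `%d` templates expand. The two templates are taken from the cell above; `max_n` and the `columns` list are illustrative assumptions.

```python
# Two of the templates from the renaming cell.
templates = {
    "name of agency %d": "agency_name_%d",
    "peace officer %d's age": "officer_age_%d",
}
max_n = 3  # illustrative; the notebook derives this number from the data

# Fill in each template for i = 1..max_n.
expanded = {}
for i in range(1, max_n + 1):
    for old, new in templates.items():
        expanded[old % i] = new % i

# Renaming by direct lookup raises KeyError on any unmapped column,
# which surfaces unexpected headers instead of silently keeping them.
columns = ["name of agency 2", "peace officer 3's age"]
renamed = [expanded[c] for c in columns]
print(renamed)
```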


### Standardize common column types

In [9]:
standardize_gender_cols(shootings)
standardize_race_cols(shootings)
numericalize_age_cols(shootings)
convert_date_cols(shootings)

Numericalizing column civilian_age
Replaced 1 bad values with NA:
Unique bad values: {'NOT AVAILABLE'}
Numericalizing column officer_age_1
Numericalizing column officer_age_2
Numericalizing column officer_age_3
Numericalizing column officer_age_4
Numericalizing column officer_age_5
Numericalizing column officer_age_6
Numericalizing column officer_age_7
Numericalizing column officer_age_8
Numericalizing column officer_age_9
Numericalizing column officer_age_10
Numericalizing column officer_age_11
Converting column agency_report_date_1 to datetime
Converting column date_ag_received to datetime
Converting column date_incident to datetime
Converting column agency_report_date_2 to datetime
Converting column agency_report_date_3 to datetime
Converting column agency_report_date_4 to datetime
Converting column agency_report_date_5 to datetime
Converting column agency_report_date_6 to datetime
Converting column agency_report_date_7 to datetime
Converting column agency_report_date_8 to datetime


### Create a column for the number of officers whose information was recorded here, `num_officers_recorded`

We use the presence of an officer's gender as the indicator that the officer was recorded. Not all incidents use a separate `agency_name_X` column per officer, so we can't count those instead.

In [10]:
# Cover every officer slot, including officer_gender_11
officer_gender_cols = ['officer_gender_%d' % i for i in range(1, max_reports_per_incident + 1)]
shootings['num_officers_recorded'] = shootings[officer_gender_cols].notnull().sum(axis=1)
shootings[shootings['num_officers_recorded'] > 5].head()

Unnamed: 0,num_reports_filed,agency_report_date_1,date_ag_received,agency_name_1,agency_city_1,agency_zip_1,date_incident,time_incident,agency_name_person_filling_out_1,agency_email_person_filling_out_1,agency_report_date_2,agency_name_2,agency_city_2,agency_zip_2,agency_name_person_filling_out_2,agency_email_person_filling_out_2,agency_report_date_3,agency_name_3,agency_city_3,agency_zip_3,agency_name_person_filling_out_3,agency_email_person_filling_out_3,agency_report_date_4,agency_name_4,agency_city_4,agency_zip_4,agency_name_person_filling_out_4,agency_email_person_filling_out_4,agency_report_date_5,agency_name_5,agency_city_5,agency_zip_5,agency_name_person_filling_out_5,agency_email_person_filling_out_5,agency_report_date_6,agency_name_6,agency_city_6,agency_zip_6,agency_name_person_filling_out_6,agency_email_person_filling_out_6,agency_report_date_7,agency_name_7,agency_city_7,agency_zip_7,agency_name_person_filling_out_7,agency_email_person_filling_out_7,agency_report_date_8,agency_name_8,agency_city_8,agency_zip_8,...,deadly_weapon,officer_gender_1,officer_age_1,officer_race_1,officer_gender_2,officer_age_2,officer_race_2,officer_gender_3,officer_age_3,officer_race_3,officer_gender_4,officer_age_4,officer_race_4,officer_gender_5,officer_age_5,officer_race_5,officer_gender_6,officer_age_6,officer_race_6,officer_gender_7,officer_age_7,officer_race_7,officer_gender_8,officer_age_8,officer_race_8,officer_gender_9,officer_age_9,officer_race_9,officer_gender_10,officer_age_10,officer_race_10,officer_gender_11,officer_age_11,officer_race_11,officer_on_duty,multiple_officers_involved,incident_result_of,officer_caused_injury_2,officer_caused_injury_3,incident_call_other,weapon_reported_by_media,news_coverage_1,news_coverage_2,news_coverage_3,news_coverage_4,custodial_death_report,cdr_narrative,lea_narrative_published,lea_narrative_shorter,num_officers_recorded
62,8,2016-01-05,NaT,ODESSA POLICE DEPARTMENT,ODESSA,78761,2015-12-23,,DAVID LARA,DLARA@ODESSA-TX.GOV,2016-01-05,ODESSA POLICE DEPARTMENT,ODESSA,79761,DAVID LARA,DLARA@ODESSA-TX.GOV,2016-01-05,ODESSA POLICE DEPARTMENT,ODESSA,79761,DAVID LARA,DLARA@ODESSA-TX.GOV,2016-01-05,ODESSA POLICE DEPARTMENT,ODESSA,79761,DAVID LARA,DLARA@ODESSA-TX.GOV,2016-01-05,ODESSA POLICE DEPARTMENT,ODESSA,79761,DAVID LARA,DLARA@ODESSA-TX.GOV,2016-01-05,ODESSA POLICE DEPARTMENT,ODESSA,79761.0,DAVID LARA,DLARA@ODESSA-TX.GOV,2016-01-05,ODESSA POLICE DEPARTMENT,ODESSA,79761.0,DAVID LARA,DLARA@ODESSA-TX.GOV,2016-01-05,ODESSA POLICE DEPARTMENT,ODESSA,79761.0,...,YES,MALE,26.0,HISPANIC,MALE,26.0,WHITE,MALE,40.0,WHITE,MALE,24.0,WHITE,MALE,28.0,HISPANIC,MALE,42.0,HISPANIC,MALE,27.0,HISPANIC,MALE,41.0,HISPANIC,,,,,,,,,,ON DUTY,YES,EXECUTION OF A WARRANT,,,,FIREARM,MY SAN ANTONIO,NEWS WEST 9,CBS 7,KSAT,,,,,8
157,10,2017-02-23,2017-02-23,DART POLICE DEPARTMENT,DALLAS,75203,2016-07-07,21:00:00,LT. SHERRI PLUNK #43,SPLUNK@DART.ORG,2017-02-23,DART POLICE DEPARTMENT,DALLAS,75203,LT. SHERRI PLUNK #43,SPLUNK@DART.ORG,2017-02-23,DART POLICE DEPARTMENT,DALLAS,75203,LT. SHERRI PLUNK #43,SPLUNK@DART.ORG,2017-02-23,DART POLICE DEPARTMENT,DALLAS,75203,LT. SHERRI PLUNK #43,SPLUNK@DART.ORG,2017-02-07,DALLAS POLICE DEPARTMENT,DALLAS,75215,SGT. E. MERRITT #8112,E.MERRITT@DPD.DALLASCITYHALL.COM,2017-02-07,DALLAS POLICE DEPARTMENT,DALLAS,75215.0,SGT. E. MERRITT #8112,E.MERRITT@DPD.DALLASCITYHALL.COM,2017-02-07,DALLAS POLICE DEPARTMENT,DALLAS,75215.0,SGT. E. MERRITT #8112,E.MERRITT@DPD.DALLASCITYHALL.COM,2017-02-07,DALLAS POLICE DEPARTMENT,DALLAS,75215.0,...,YES,MALE,63.0,WHITE,MALE,44.0,WHITE,MALE,43.0,WHITE,MALE,37.0,BLACK,MALE,27.0,WHITE,MALE,31.0,HISPANIC,MALE,34.0,OTHER,MALE,39.0,WHITE,MALE,41.0,WHITE,MALE,46.0,WHITE,,,,ON DUTY,YES,OTHER - SPECIFY TYPE OF CALL,,,AMBUSH OF OFFICERS BY SUSPECT. WOUNDED ON 7/7/...,,DPD PRESS RELEASE,,,,YES,"ON JULY 7, 2016, AT APPROXIMATELY 8:57 P.M., D...",,FIRED AT OFFICERS,10
271,8,2017-03-15,2017-03-15,CLUTE POLICE DEPARTMENT,CLUTE,77531,2017-02-24,14:20:00,CHIEF RANDY BRATTON,CHIEF RANDY BRATTON,2017-02-25,LAKE JACKSON POLICE DEPARTMENT,LAKE JACKSON,77566,CHIEF RICHARD J. PARK,RPARK@LAKEJACKSONPD.NET,2017-02-25,LAKE JACKSON POLICE DEPARTMENT,LAKE JACKSON,77566,CHIEF RICHARD J. PARK,RPARK@LAKEJACKSONPD.NET,2017-02-25,LAKE JACKSON POLICE DEPARTMENT,LAKE JACKSON,77566,CHIEF RICHARD J. PARK,RPARK@LAKEJACKSONPD.NET,2017-03-15,CLUTE POLICE DEPARTMENT,CLUTE,77531,CHIEF RANDY BRATTON,RBRATTON@CLUTEPD.COM,2017-02-25,LAKE JACKSON POLICE DEPARTMENT,LAKE JACKSON,77566.0,CHIEF RICHARD J. PARK,RPARK@LAKEJACKSONPD.NET,2017-03-07,FREEPORT POLICE DEPARTMENT,FREEPORT,77541.0,DET. COREY BRINKMAN,CBRINKMAN@FREEPORT.TX.US,2017-03-07,FREEPORT POLICE DEPARTMENT,FREEPORT,77541.0,...,YES,MALE,29.0,WHITE,MALE,32.0,WHITE,MALE,27.0,WHITE,MALE,26.0,WHITE,MALE,27.0,WHITE,FEMALE,36.0,WHITE,MALE,29.0,WHITE,MALE,35.0,BLACK,,,,,,,,,,ON DUTY,YES,TRAFFIC STOP,,,EXECUTION OF A WARRANT,HANDGUN,THE FACTS,KPRC,,,YES,"IN ANGLETON, THE SUSPECT FLED FROM OFFICERS IN...",,POINTED A GUN AT OFFICERS,8
295,6,2017-05-15,2017-05-16,WACO POLICE DEPARTMENT,WACO,76708,2017-04-10,21:12:00,SGT. V.R. PRICE JR.,JPRICE@WACOTX.GOV,2017-05-15,WACO POLICE DEPARTMENT,WACO,76708,SGT. V.R. PRICE JR.,JPRICE@WACOTX.GOV,2017-05-15,WACO POLICE DEPARTMENT,WACO,76708,SGT. V.R. PRICE JR.,JPRICE@WACOTX.GOV,2017-05-15,WACO POLICE DEPARTMENT,WACO,76708,SGT. V.R. PRICE JR.,JPRICE@WACOTX.GOV,2017-05-15,WACO POLICE DEPARTMENT,WACO,76708,SGT. V.R. PRICE JR.,JPRICE@WACOTX.GOV,2017-05-15,WACO POLICE DEPARTMENT,WACO,76708.0,SGT. V.R. PRICE JR.,JPRICE@WACOTX.GOV,NaT,,,,,,NaT,,,,...,YES,MALE,36.0,WHITE,MALE,26.0,WHITE,MALE,38.0,WHITE,MALE,50.0,WHITE,FEMALE,36.0,HISPANIC,MALE,27.0,WHITE,,,,,,,,,,,,,,,,ON DUTY,NO,"HOSTAGE, BARRICADE, OR OTHER EMERGENCY SITUATION",,VEHICLE PURSUIT OF ARMED ROBBERY SUSPECT WHICH...,OTHER - SPECIFY TYPE OF CALL,,,,,,,,,,6
303,6,2017-05-04,2017-05-08,HOUSTON POLICE DEPARTMENT,HOUSTON,77002,2017-04-24,22:00:00,SGT. ODON BELMAREZ,ODON.BELMAREZ@HOUSTONPOLICE.ORG,2017-05-04,HOUSTON POLICE DEPARTMENT,HOUSTON,77002,SGT. ODON BELMAREZ,ODON.BERLMAREZ@HOUSTONPOLICE.ORG,2017-05-04,HOUSTON POLICE DEPARTMENT,HOUSTON,77002,SGT. ODON BELMAREZ,ODON.BELMAREZ@HOUSTONPOLICE.ORG,2017-04-24,TEXAS DEPARTMENT OF PUBLIC SAFETY,HOUSTON,77065,DARON PARKER,DARON.PARKER@DPS.TEXAS.GOV,2017-04-24,TEXAS DEPARTMENT OF PUBLIC SAFETY,HOUSTON,77065,DARON PARKER,DARON.PARKER@DPS.TEXAS.GOV,2017-05-04,HOUSTON POLICE DEPARTMENT,HOUSTON,77002.0,SGT. ODON BELMAREZ,ODON.BELMAREZ@HOUSTONPOLICE.ORG,NaT,,,,,,NaT,,,,...,YES,MALE,57.0,WHITE,MALE,49.0,WHITE,MALE,31.0,WHITE,MALE,35.0,HISPANIC,MALE,29.0,WHITE,MALE,49.0,WHITE,,,,,,,,,,,,,,,,ON DUTY,YES,OTHER - SPECIFY TYPE OF CALL,,,ROBBERY STING,,HOUSTON CHRONICLE,KHOU,,,YES,THE DECEDENT AND TWO ACCOMPLICES ARMED WITH WE...,,,6
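
The counting step can be checked on a toy frame. This is a sketch with invented rows; selecting the gender columns by prefix (rather than by a fixed range) is our assumption here, not what the cell above does.

```python
import numpy as np
import pandas as pd

# Toy data: each row is an incident, each column one officer slot.
toy = pd.DataFrame({
    "officer_gender_1": ["MALE", "MALE", "FEMALE"],
    "officer_gender_2": [np.nan, "MALE", "MALE"],
    "officer_gender_3": [np.nan, np.nan, "MALE"],
})

# Pick up every officer_gender_* column, however many there are.
gender_cols = [c for c in toy.columns if c.startswith("officer_gender_")]

# notnull() gives a True/False grid; summing across columns counts
# the officer slots that were filled in for each incident.
toy["num_officers_recorded"] = toy[gender_cols].notnull().sum(axis=1)
print(toy["num_officers_recorded"].tolist())
```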


In [11]:
# Treat missing answers as "no"; startswith() also tolerates empty strings
shootings['custodial_death_report'] = shootings['custodial_death_report'].fillna('n').apply(lambda c: c.strip().lower().startswith('y'))
shootings['multiple_officers_involved'] = shootings['multiple_officers_involved'].fillna('n').apply(lambda c: c.strip().lower().startswith('y'))

shootings['civilian_died'] = shootings['incident_resulted_in'].apply(lambda x: x.strip().lower()) == 'death'
shootings.drop('incident_resulted_in', axis=1, inplace=True)
shootings['incident_result_of'] = shootings['incident_result_of'].apply(lambda x: x.strip().lower())

shootings['officer_on_duty'] = shootings['officer_on_duty'].apply(lambda x: x if pd.isnull(x) else (x.strip().lower().startswith('on ')))
shootings['weapon_reported_by_media'] = shootings['weapon_reported_by_media'].apply(lambda w: w if pd.isnull(w) else w.strip().lower())

### NOTE: Data quirk

It's unclear how many officers were actually at the scene.
* The `multiple_officers_involved` column is a yes/no column, but there are also columns listing the agency, gender, etc. for each officer. These do not always agree. Sometimes `multiple_officers_involved` is yes, but only one officer's details are recorded. And sometimes we have details for many officers, but `multiple_officers_involved` is no. See below.
* The punchline is to interpret these columns with caution.

In [12]:
pd.crosstab(shootings.multiple_officers_involved, shootings.num_officers_recorded)

num_officers_recorded,1,2,3,4,5,6,7,8,10
multiple_officers_involved,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1
False,90,2,2,0,1,1,0,0,0
True,266,86,30,20,10,3,4,2,3
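
Beyond the crosstab, it can help to pull out the individual rows where the two signals disagree. A minimal sketch on invented data; the `officer_count_conflict` column name is hypothetical, and a flagged row is only a possible inconsistency (a form may legitimately record details for just one of several officers).

```python
import pandas as pd

toy = pd.DataFrame({
    "multiple_officers_involved": [False, True, False, True],
    "num_officers_recorded": [1, 1, 3, 2],
})

# "No" to multiple officers, yet details exist for more than one officer.
no_but_many = ~toy["multiple_officers_involved"] & (toy["num_officers_recorded"] > 1)
# "Yes" to multiple officers, yet only one officer's details are recorded.
yes_but_one = toy["multiple_officers_involved"] & (toy["num_officers_recorded"] == 1)

toy["officer_count_conflict"] = no_but_many | yes_but_one
print(toy[toy["officer_count_conflict"]])
```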


### Handle weapons-related questions

In [13]:
shootings['deadly_weapon'].value_counts()

YES                    434
NO                      85
YES - CHIEF SAID NO      1
Name: deadly_weapon, dtype: int64

In [14]:
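# Convert yes/no to boolean. Only an exact "yes" counts as True, so the
# annotated answer 'YES - CHIEF SAID NO' becomes False.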
shootings['deadly_weapon'] = shootings['deadly_weapon'].apply(
    lambda weapon: weapon.lower().strip() == 'yes')
shootings['deadly_weapon'].value_counts()

True     434
False     86
Name: deadly_weapon, dtype: int64
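
The jump from 85 to 86 `False` values comes from the exact-match comparison: the single annotated answer is not equal to `'yes'`, so it lands with the `NO` rows. A small sketch contrasting exact and prefix matching on made-up answers (the `answers` series is illustrative):

```python
import pandas as pd

answers = pd.Series(["YES", "NO", "YES - CHIEF SAID NO", " yes "])
normalized = answers.str.strip().str.lower()

# Exact match: annotated answers are treated as False.
exact = normalized == "yes"
# Prefix match: anything starting with "yes" is treated as True.
prefix = normalized.str.startswith("yes")

print(pd.DataFrame({"answer": answers, "exact": exact, "prefix": prefix}))
```

Which rule is appropriate depends on how the annotation should be read; the notebook uses the exact match.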

In [15]:
# Check for mistakes. Sometimes the "was there a deadly weapon?" question
# is answered with "No" while a description of a deadly weapon is given.
pd.crosstab(shootings['deadly_weapon'], shootings['weapon_reported_by_media'].notnull())

weapon_reported_by_media,False,True
deadly_weapon,Unnamed: 1_level_1,Unnamed: 2_level_1
False,69,17
True,64,370


In [16]:
# Let's look at these instances to be sure
s = shootings[~shootings['deadly_weapon'] & pd.notnull(shootings['weapon_reported_by_media'])][['deadly_weapon', 'weapon_reported_by_media']]
print(len(s))
s

17


Unnamed: 0,deadly_weapon,weapon_reported_by_media
40,False,vehicle
56,False,firearm
162,False,bb gun
227,False,vehicle
252,False,vehicle
268,False,vehicle
286,False,vehicle
300,False,vehicle
321,False,took officer's taser
323,False,fell while getting out of car?


In [17]:
# What weapons do people use? Can we categorize them?
sorted(list(x for x in set(shootings.weapon_reported_by_media) if pd.notnull(x)))

['"agent was assaulted"',
 'air soft gun',
 'armed',
 'arms (choking)',
 'assault rifle',
 'axe',
 'baseball bat',
 'baseball bat and fireplace poker',
 'bb gun',
 'body',
 'bomb',
 'box cutter',
 'butcher knife',
 'car',
 'club, bat',
 "deputy's gun",
 'fell while getting out of car?',
 'firearm',
 'folding knife',
 'glock 40',
 'gun',
 'handgun',
 'hatchet',
 'imitation weapon',
 'knife',
 'knife - not opened',
 'knife, gun',
 'knives',
 'knives (2)',
 'long gun',
 'machete',
 'metal flashlight',
 "officer's gun",
 'pellet gun',
 'pencil',
 'pickaxe',
 'pistol',
 'reports say unarmed',
 'revolver',
 'rifle',
 'rock',
 'samurai sword',
 'sawed-off shotgun',
 'scissors, screwdriver',
 'semi-automatic rifle',
 'sharp metal object (piece of headphones)',
 'shotgun',
 'sword',
 "took officer's knife",
 "took officer's taser",
 'truck',
 'vehicle',
 'vehicle, gun',
 'weapon',
 'weed-cutter']

In [18]:
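# Manual categorization of weapons. Lookup is by exact match, so any
# description not listed here (e.g. 'folding knife') ends up as 'OTHER'.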
weapon_types = {
    'FIREARM': [
        'handgun', 'sawed-off shotgun', 'revolver', 'rifle',
        'assault rifle', 'firearm', 'shotgun', 'long gun', 'gun',
        'glock 40', 'pistol', 'knife, gun', "deputy's gun",
        'vehicle, gun', 'semi-automatic rifle'],
    'KNIFE/CUTTING': [
        'hatchet', 'butcher knife', 'knife', 'knives', 'box cutter',
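        'knife - not opened', 'machete', 'sword', 'axe', 'knives (2)',
        # NOTE: "officer's gun" is listed under KNIFE/CUTTING here, which looks
        # like a slip for FIREARM; the counts below reflect the listing as written.
        "officer's gun", 'pickaxe', 'samurai sword', "took officer's knife",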
        'sharp metal object (piece of headphones)', 'scissors, screwdriver'],
    'VEHICLE': ['car', 'truck', 'vehicle'],
}
type_lookup = {}
for w_type, weapons in weapon_types.items():
    for w in weapons:
        if w in type_lookup:
            print("DUPLICATE:", w_type, w)
        type_lookup[w] = w_type

weapons = []
for has_weapon, desc in zip(shootings['deadly_weapon'], shootings['weapon_reported_by_media']):
    if pd.isnull(desc) or not desc:
        if has_weapon:
            weapons.append('(DETAILS MISSING)')
        else:
            weapons.append(np.nan)
        continue
    weapons.append(type_lookup.get(desc, 'OTHER'))

shootings['weapon_reported_by_media_category'] = weapons
shootings['weapon_reported_by_media_category'].value_counts()

FIREARM              258
(DETAILS MISSING)     64
KNIFE/CUTTING         62
VEHICLE               37
OTHER                 30
Name: weapon_reported_by_media_category, dtype: int64

In [19]:
sorted(set(shootings['weapon_reported_by_media'][shootings['weapon_reported_by_media_category'] == 'OTHER']))

['"agent was assaulted"',
 'air soft gun',
 'armed',
 'arms (choking)',
 'baseball bat',
 'baseball bat and fireplace poker',
 'bb gun',
 'body',
 'bomb',
 'club, bat',
 'fell while getting out of car?',
 'folding knife',
 'imitation weapon',
 'metal flashlight',
 'pellet gun',
 'pencil',
 'reports say unarmed',
 'rock',
 "took officer's taser",
 'weapon',
 'weed-cutter']
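
The categorization rules are easier to see on a handful of rows. The sketch below mirrors the logic of the cell above with a deliberately tiny lookup table; the `categorize` helper and the toy rows are hypothetical.

```python
import numpy as np
import pandas as pd

# Tiny stand-in for the full type_lookup built above.
type_lookup = {"handgun": "FIREARM", "knife": "KNIFE/CUTTING", "vehicle": "VEHICLE"}

def categorize(has_weapon, desc):
    """Exact-match lookup with explicit handling of missing descriptions."""
    if pd.isnull(desc) or not desc:
        # Weapon reported but not described vs. no weapon and no description.
        return "(DETAILS MISSING)" if has_weapon else np.nan
    return type_lookup.get(desc, "OTHER")

toy = pd.DataFrame({
    "deadly_weapon": [True, True, False, True, False],
    "weapon_reported_by_media": ["handgun", None, None, "folding knife", "vehicle"],
})
toy["category"] = [categorize(h, d) for h, d in
                   zip(toy["deadly_weapon"], toy["weapon_reported_by_media"])]
print(toy)
```

Because the match is exact, a description such as `'folding knife'` is not caught by the `'knife'` entry, which is why reviewing the `OTHER` bucket, as done above, is worthwhile.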

### Uppercase the content of all columns

In [20]:
print('Uppercasing columns: ', end='')
for col, dt in shootings.dtypes.items():
    if dt == 'object':
        print(col, end=' ')
        shootings[col] = shootings[col].apply(lambda s: s if type(s) != str else s.upper())

Uppercasing columns: agency_name_1 agency_city_1 agency_zip_1 time_incident agency_name_person_filling_out_1 agency_email_person_filling_out_1 agency_name_2 agency_city_2 agency_zip_2 agency_name_person_filling_out_2 agency_email_person_filling_out_2 agency_name_3 agency_city_3 agency_zip_3 agency_name_person_filling_out_3 agency_email_person_filling_out_3 agency_name_4 agency_city_4 agency_zip_4 agency_name_person_filling_out_4 agency_email_person_filling_out_4 agency_name_5 agency_city_5 agency_zip_5 agency_name_person_filling_out_5 agency_email_person_filling_out_5 agency_name_6 agency_city_6 agency_name_person_filling_out_6 agency_email_person_filling_out_6 agency_name_7 agency_city_7 agency_name_person_filling_out_7 agency_email_person_filling_out_7 agency_name_8 agency_city_8 agency_name_person_filling_out_8 agency_email_person_filling_out_8 agency_name_9 agency_city_9 agency_name_person_filling_out_9 agency_email_person_filling_out_9 agency_name_10 agency_city_10 agency_name_per

### Standardize police agency names

In [21]:
for i in range(1, max_reports_per_incident + 1):
    shootings['agency_name_%d' % i] = shootings['agency_name_%d' % i].apply(standardize_agency_name)

shootings.head()

Unnamed: 0,num_reports_filed,agency_report_date_1,date_ag_received,agency_name_1,agency_city_1,agency_zip_1,date_incident,time_incident,agency_name_person_filling_out_1,agency_email_person_filling_out_1,agency_report_date_2,agency_name_2,agency_city_2,agency_zip_2,agency_name_person_filling_out_2,agency_email_person_filling_out_2,agency_report_date_3,agency_name_3,agency_city_3,agency_zip_3,agency_name_person_filling_out_3,agency_email_person_filling_out_3,agency_report_date_4,agency_name_4,agency_city_4,agency_zip_4,agency_name_person_filling_out_4,agency_email_person_filling_out_4,agency_report_date_5,agency_name_5,agency_city_5,agency_zip_5,agency_name_person_filling_out_5,agency_email_person_filling_out_5,agency_report_date_6,agency_name_6,agency_city_6,agency_zip_6,agency_name_person_filling_out_6,agency_email_person_filling_out_6,agency_report_date_7,agency_name_7,agency_city_7,agency_zip_7,agency_name_person_filling_out_7,agency_email_person_filling_out_7,agency_report_date_8,agency_name_8,agency_city_8,agency_zip_8,...,officer_age_1,officer_race_1,officer_gender_2,officer_age_2,officer_race_2,officer_gender_3,officer_age_3,officer_race_3,officer_gender_4,officer_age_4,officer_race_4,officer_gender_5,officer_age_5,officer_race_5,officer_gender_6,officer_age_6,officer_race_6,officer_gender_7,officer_age_7,officer_race_7,officer_gender_8,officer_age_8,officer_race_8,officer_gender_9,officer_age_9,officer_race_9,officer_gender_10,officer_age_10,officer_race_10,officer_gender_11,officer_age_11,officer_race_11,officer_on_duty,multiple_officers_involved,incident_result_of,officer_caused_injury_2,officer_caused_injury_3,incident_call_other,weapon_reported_by_media,news_coverage_1,news_coverage_2,news_coverage_3,news_coverage_4,custodial_death_report,cdr_narrative,lea_narrative_published,lea_narrative_shorter,num_officers_recorded,civilian_died,weapon_reported_by_media_category
0,1,2015-09-16,NaT,FREEPORT POLICE DEPT,FREEPORT,77541,2015-09-02,,PAMELA MORRIS,PMORRIS@FREEPORT.TX.US,NaT,,,,,,NaT,,,,,,NaT,,,,,,NaT,,,,,,NaT,,,,,,NaT,,,,,,NaT,,,,...,27.0,HISPANIC,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,True,True,TRAFFIC STOP,,,NARCOTIC STOP AND EVADING ARREST,,ABC 13,YOUR SOUTHEST TEXAS,,,False,,,,1,False,
1,1,2015-10-01,NaT,PLANO POLICE DEPT,PLANO,75074,2015-09-03,,CURTIS HOWARD,CURTISH@PLANO.GOV,NaT,,,,,,NaT,,,,,,NaT,,,,,,NaT,,,,,,NaT,,,,,,NaT,,,,,,NaT,,,,...,30.0,HISPANIC,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,True,False,OTHER - SPECIFY TYPE OF CALL,,,ACCIDENTAL DISCHARGE RICOCHET DURING RANGE ACT...,,,,,,False,,,,1,False,(DETAILS MISSING)
2,1,2015-10-06,NaT,PARKER CO SHERIFFS OFFICE,WEATHERFORD,76086,2015-09-04,,MEREDITH GRAY,MEREDITH.GRAY@PARKERCOUNTYTX.COM,NaT,,,,,,NaT,,,,,,NaT,,,,,,NaT,,,,,,NaT,,,,,,NaT,,,,,,NaT,,,,...,27.0,WHITE,MALE,56.0,WHITE,,,,,,,,,,,,,,,,,,,,,,,,,,,,True,True,OTHER - SPECIFY TYPE OF CALL,,,INVESTIGATION OF CRIMINAL ACTIVITY,FIREARM,WFAA,DFW CBS LOCAL,STAR TELEGRAM,FOX 4 NEWS,True,DECEDENT SHOT A RIFLE AT LE OFFICERS WHOM RETU...,,FIRED AT OFFICERS,2,True,FIREARM
3,1,2015-09-11,NaT,HOUSTON POLICE DEPT,HOUSTON,77002,2015-09-05,,ODON BELMAREZ,ODON.BELMAREZ@HOUSTONPOLICE.ORG,NaT,,,,,,NaT,,,,,,NaT,,,,,,NaT,,,,,,NaT,,,,,,NaT,,,,,,NaT,,,,...,28.0,WHITE,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,True,True,EMERGENCY CALL OR REQUEST FOR ASSISTANCE,,,,FIREARM,CHRON,ABC 13,CLICK 2 HOUSTON,,False,,AN OFFICER WAS DISPATCHED TO A WEAPONS DISTURB...,,1,False,FIREARM
4,1,2015-10-15,NaT,IRVING POLICE DEPT,IRVING,75061,2015-09-08,,MICHAEL COLEMAN,MCOLEMAN@CITYOFIRVING.COM,NaT,,,,,,NaT,,,,,,NaT,,,,,,NaT,,,,,,NaT,,,,,,NaT,,,,,,NaT,,,,...,38.0,HISPANIC,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,True,False,OTHER - SPECIFY TYPE OF CALL,,,TRAINING EXERCISE - RICOCHET FRAGMENTS RESULTI...,,,,,,False,,,,1,False,


### Add county information

In [22]:
dept_to_county = dict(zip(agencies.agency, agencies.county))
for i in range(1, 11):
    shootings = insert_col_after(
        shootings,
        shootings['agency_name_%d' % i].apply(lambda d: dept_to_county.get(d, np.nan)),
        'agency_county_%d' % i,
        'agency_city_%d' % i)

shootings.agency_county_1.isnull().value_counts()

False    515
True       5
Name: agency_county_1, dtype: int64
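
The lookup above can be tried on a toy frame. This is a minimal sketch: `insert_col_after_sketch` is a hypothetical stand-in for `insert_col_after` from `lib.cleaning_tools` (whose real implementation is not shown in this notebook), and the toy rows are illustrative only.

```python
import pandas as pd

def insert_col_after_sketch(df, values, new_name, after_name):
    """Hypothetical stand-in for lib.cleaning_tools.insert_col_after."""
    out = df.copy()
    # Place the new column immediately to the right of `after_name`.
    position = list(out.columns).index(after_name) + 1
    out.insert(position, new_name, values)
    return out

# Toy lookup table and toy records (illustrative, not the real data).
toy_agencies = pd.DataFrame({
    'agency': ['PLANO POLICE DEPT', 'HOUSTON POLICE DEPT'],
    'county': ['COLLIN', 'HARRIS'],
})
toy_shootings = pd.DataFrame({
    'agency_name_1': ['PLANO POLICE DEPT', 'JAL POLICE DEPT', 'HOUSTON POLICE DEPT'],
    'agency_city_1': ['PLANO', 'JAL', 'HOUSTON'],
    'agency_zip_1': ['75074', None, '77002'],
})

dept_to_county = dict(zip(toy_agencies.agency, toy_agencies.county))
# Series.map with a dict yields NaN for agencies missing from the lookup.
counties = toy_shootings['agency_name_1'].map(dept_to_county)
toy_shootings = insert_col_after_sketch(
    toy_shootings, counties, 'agency_county_1', 'agency_city_1')
print(toy_shootings)
```

Agencies that are absent from the lookup table (here `JAL POLICE DEPT`) end up with a null county, which is what the `isnull().value_counts()` check above counts.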

In [23]:
shootings[shootings.agency_county_1.isnull()].agency_name_1.tolist()

['JAL POLICE DEPT',
 'DART POLICE DEPT',
 'DRUG ENFORCEMENT ADMINISTRATION US DOJ',
 'TEXAS DEPT OF PUBLIC SAFETY CRIMINAL INVESTIGATIONS DIVISION',
 'CYPRESS-FAIRBANKS ISD POLICE DEPT']

### Other analysis revealed some typos in agency names. We'll demonstrate them here before correcting them.

In [24]:
tmp = shootings.groupby(['incident_county', 'incident_city']).size().sort_values().unstack().T
tmax = tmp.max()
county_to_biggest_city = {}
for c in tmp.columns:
    x = tmp[c][tmp[c] == tmax[c]]
    county_to_biggest_city[c] = x.index[0]

In [25]:
TOP5 = list(shootings.incident_county.value_counts().head(5).index)
TOP5_CITIES = [county_to_biggest_city.get(c) for c in TOP5]
print(TOP5)
print(TOP5_CITIES)

['HARRIS', 'DALLAS', 'BEXAR', 'TARRANT', 'TRAVIS']
['HOUSTON', 'DALLAS', 'SAN ANTONIO', 'FORT WORTH', 'AUSTIN']
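
The county-to-biggest-city mapping could probably also be built with `idxmax` instead of an explicit loop. A minimal sketch on toy data (the rows below are illustrative, not taken from the dataset):

```python
import pandas as pd

toy = pd.DataFrame({
    'incident_county': ['HARRIS', 'HARRIS', 'HARRIS', 'DALLAS', 'DALLAS'],
    'incident_city': ['HOUSTON', 'HOUSTON', 'BAYTOWN', 'DALLAS', 'DALLAS'],
})

# Count incidents per (county, city) pair.
counts = toy.groupby(['incident_county', 'incident_city']).size()
# Counties become rows and cities become columns; absent pairs count as 0.
city_counts = counts.unstack(fill_value=0)
# For each county, take the city column holding the largest count.
# Ties resolve to the first such column.
county_to_biggest_city = city_counts.idxmax(axis=1).to_dict()
print(county_to_biggest_city)
```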


In [26]:
for county in TOP5:
    print("-- %s --" % county)
    print(shootings[shootings.incident_county == county]['agency_county_1'].value_counts())
    print()

-- HARRIS --
HARRIS        101
WALLER          1
MONTGOMERY      1
BRAZORIA        1
TEXAS           1
Name: agency_county_1, dtype: int64

-- DALLAS --
DALLAS     46
TEXAS       2
KAUFMAN     1
Name: agency_county_1, dtype: int64

-- BEXAR --
BEXAR     42
TEXAS      5
WILSON     1
Name: agency_county_1, dtype: int64

-- TARRANT --
TARRANT    36
DALLAS      3
PARKER      1
Name: agency_county_1, dtype: int64

-- TRAVIS --
TRAVIS    23
Name: agency_county_1, dtype: int64
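
To survey every incident/agency county mismatch at once rather than county by county, something like the following might work. It is a sketch on toy data, using the same column names as above.

```python
import pandas as pd

toy = pd.DataFrame({
    'incident_county': ['HARRIS', 'HARRIS', 'TARRANT', 'TARRANT', 'TRAVIS'],
    'agency_county_1': ['HARRIS', 'WALLER', 'TARRANT', 'DALLAS', 'TRAVIS'],
})

# Keep rows where the reporting agency's county differs from the incident county.
# Note: a missing county compares as "not equal", so such rows are kept as well.
mismatched = toy[toy.incident_county != toy.agency_county_1]
# Count how often each (incident county, agency county) pair occurs.
pairs = mismatched.groupby(['incident_county', 'agency_county_1']).size()
print(pairs)
```

Pairs of adjacent counties are usually unremarkable; pairs that are far apart are the ones worth inspecting by hand.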



### 'HARRIS' and 'HARRISON' counties are nowhere near each other, so a typo or data error seems more likely than four incidents in HARRIS county involving officers from HARRISON county. Let's check them out.

In [27]:
shootings[(shootings.incident_county == 'HARRIS') & (shootings.agency_county_1 == 'HARRISON')]

Unnamed: 0,num_reports_filed,agency_report_date_1,date_ag_received,agency_name_1,agency_city_1,agency_county_1,agency_zip_1,date_incident,time_incident,agency_name_person_filling_out_1,agency_email_person_filling_out_1,agency_report_date_2,agency_name_2,agency_city_2,agency_county_2,agency_zip_2,agency_name_person_filling_out_2,agency_email_person_filling_out_2,agency_report_date_3,agency_name_3,agency_city_3,agency_county_3,agency_zip_3,agency_name_person_filling_out_3,agency_email_person_filling_out_3,agency_report_date_4,agency_name_4,agency_city_4,agency_county_4,agency_zip_4,agency_name_person_filling_out_4,agency_email_person_filling_out_4,agency_report_date_5,agency_name_5,agency_city_5,agency_county_5,agency_zip_5,agency_name_person_filling_out_5,agency_email_person_filling_out_5,agency_report_date_6,agency_name_6,agency_city_6,agency_county_6,agency_zip_6,agency_name_person_filling_out_6,agency_email_person_filling_out_6,agency_report_date_7,agency_name_7,agency_city_7,agency_county_7,...,officer_age_1,officer_race_1,officer_gender_2,officer_age_2,officer_race_2,officer_gender_3,officer_age_3,officer_race_3,officer_gender_4,officer_age_4,officer_race_4,officer_gender_5,officer_age_5,officer_race_5,officer_gender_6,officer_age_6,officer_race_6,officer_gender_7,officer_age_7,officer_race_7,officer_gender_8,officer_age_8,officer_race_8,officer_gender_9,officer_age_9,officer_race_9,officer_gender_10,officer_age_10,officer_race_10,officer_gender_11,officer_age_11,officer_race_11,officer_on_duty,multiple_officers_involved,incident_result_of,officer_caused_injury_2,officer_caused_injury_3,incident_call_other,weapon_reported_by_media,news_coverage_1,news_coverage_2,news_coverage_3,news_coverage_4,custodial_death_report,cdr_narrative,lea_narrative_published,lea_narrative_shorter,num_officers_recorded,civilian_died,weapon_reported_by_media_category


### Yup. These are officers from Baytown and Spring, which border Houston (Harris, not Harrison county). Their emails are also from `hctx.net`, which is Harris County's domain. Their given agency names must be errors.

In [28]:
city_county_corrections = [
    ['BAYTOWN', 'HARRISON', 'HARRIS'],
    ['SPRING', 'HARRISON', 'HARRIS'],
]
corrected = set()
for city, wrong_county, right_county in city_county_corrections:
    for i in range(1, 11):
        tmp = shootings[(shootings['agency_city_%d' % i] == city) & (shootings['agency_county_%d' % i] == wrong_county)]
        if len(tmp) == 0:
            continue  # no match in this agency slot; later slots may still match
        shootings.loc[tmp.index, 'agency_county_%d' % i] = right_county
        for idx, name in tmp['agency_name_%d' % i].items():
            shootings.loc[idx, 'agency_name_%d' % i] = name.replace(wrong_county, right_county)
            corrected.add(idx)

shootings.loc[sorted(corrected)]  # newer pandas versions reject a set as an indexer

Unnamed: 0,num_reports_filed,agency_report_date_1,date_ag_received,agency_name_1,agency_city_1,agency_county_1,agency_zip_1,date_incident,time_incident,agency_name_person_filling_out_1,agency_email_person_filling_out_1,agency_report_date_2,agency_name_2,agency_city_2,agency_county_2,agency_zip_2,agency_name_person_filling_out_2,agency_email_person_filling_out_2,agency_report_date_3,agency_name_3,agency_city_3,agency_county_3,agency_zip_3,agency_name_person_filling_out_3,agency_email_person_filling_out_3,agency_report_date_4,agency_name_4,agency_city_4,agency_county_4,agency_zip_4,agency_name_person_filling_out_4,agency_email_person_filling_out_4,agency_report_date_5,agency_name_5,agency_city_5,agency_county_5,agency_zip_5,agency_name_person_filling_out_5,agency_email_person_filling_out_5,agency_report_date_6,agency_name_6,agency_city_6,agency_county_6,agency_zip_6,agency_name_person_filling_out_6,agency_email_person_filling_out_6,agency_report_date_7,agency_name_7,agency_city_7,agency_county_7,...,officer_age_1,officer_race_1,officer_gender_2,officer_age_2,officer_race_2,officer_gender_3,officer_age_3,officer_race_3,officer_gender_4,officer_age_4,officer_race_4,officer_gender_5,officer_age_5,officer_race_5,officer_gender_6,officer_age_6,officer_race_6,officer_gender_7,officer_age_7,officer_race_7,officer_gender_8,officer_age_8,officer_race_8,officer_gender_9,officer_age_9,officer_race_9,officer_gender_10,officer_age_10,officer_race_10,officer_gender_11,officer_age_11,officer_race_11,officer_on_duty,multiple_officers_involved,incident_result_of,officer_caused_injury_2,officer_caused_injury_3,incident_call_other,weapon_reported_by_media,news_coverage_1,news_coverage_2,news_coverage_3,news_coverage_4,custodial_death_report,cdr_narrative,lea_narrative_published,lea_narrative_shorter,num_officers_recorded,civilian_died,weapon_reported_by_media_category
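
The effect of one correction rule can be checked on a toy frame. This is a sketch only: the agency name below is made up for illustration, and it handles a single agency slot with a boolean mask rather than looping over all slots.

```python
import pandas as pd

toy = pd.DataFrame({
    # 'HARRISON CO CONST PCT 0' is a made-up name, not a real record.
    'agency_name_1': ['HARRISON CO CONST PCT 0', 'HOUSTON POLICE DEPT'],
    'agency_city_1': ['BAYTOWN', 'HOUSTON'],
    'agency_county_1': ['HARRISON', 'HARRIS'],
})

city, wrong_county, right_county = 'BAYTOWN', 'HARRISON', 'HARRIS'
# Rows whose agency sits in the given city but carries the wrong county.
mask = (toy['agency_city_1'] == city) & (toy['agency_county_1'] == wrong_county)
toy.loc[mask, 'agency_county_1'] = right_county
# Fix the county inside the agency name too (plain substring replace, no regex).
toy.loc[mask, 'agency_name_1'] = (
    toy.loc[mask, 'agency_name_1'].str.replace(wrong_county, right_county, regex=False))
print(toy)
```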


### While we're at it, are there any mistakes the other way? (Harris county officers showing up in Harrison county)
#### Answer: no.

In [29]:
shootings[(shootings.incident_county == 'HARRISON') & (shootings.agency_county_1 == 'HARRIS')]

Unnamed: 0,num_reports_filed,agency_report_date_1,date_ag_received,agency_name_1,agency_city_1,agency_county_1,agency_zip_1,date_incident,time_incident,agency_name_person_filling_out_1,agency_email_person_filling_out_1,agency_report_date_2,agency_name_2,agency_city_2,agency_county_2,agency_zip_2,agency_name_person_filling_out_2,agency_email_person_filling_out_2,agency_report_date_3,agency_name_3,agency_city_3,agency_county_3,agency_zip_3,agency_name_person_filling_out_3,agency_email_person_filling_out_3,agency_report_date_4,agency_name_4,agency_city_4,agency_county_4,agency_zip_4,agency_name_person_filling_out_4,agency_email_person_filling_out_4,agency_report_date_5,agency_name_5,agency_city_5,agency_county_5,agency_zip_5,agency_name_person_filling_out_5,agency_email_person_filling_out_5,agency_report_date_6,agency_name_6,agency_city_6,agency_county_6,agency_zip_6,agency_name_person_filling_out_6,agency_email_person_filling_out_6,agency_report_date_7,agency_name_7,agency_city_7,agency_county_7,...,officer_age_1,officer_race_1,officer_gender_2,officer_age_2,officer_race_2,officer_gender_3,officer_age_3,officer_race_3,officer_gender_4,officer_age_4,officer_race_4,officer_gender_5,officer_age_5,officer_race_5,officer_gender_6,officer_age_6,officer_race_6,officer_gender_7,officer_age_7,officer_race_7,officer_gender_8,officer_age_8,officer_race_8,officer_gender_9,officer_age_9,officer_race_9,officer_gender_10,officer_age_10,officer_race_10,officer_gender_11,officer_age_11,officer_race_11,officer_on_duty,multiple_officers_involved,incident_result_of,officer_caused_injury_2,officer_caused_injury_3,incident_call_other,weapon_reported_by_media,news_coverage_1,news_coverage_2,news_coverage_3,news_coverage_4,custodial_death_report,cdr_narrative,lea_narrative_published,lea_narrative_shorter,num_officers_recorded,civilian_died,weapon_reported_by_media_category


### Flag records that are part of the same incident

In [30]:
incident_records = shootings.groupby(['incident_address', 'date_incident']).size().sort_values(ascending=False).to_dict()
num_records = []
for address, date in zip(shootings.incident_address, shootings.date_incident):
    if pd.notnull(address) and pd.notnull(date):
        num_records.append(incident_records[(address, date)])
    else:
        num_records.append(1)
# TODO(wetchler) -- add this step to the data cleaning
shootings['num_rows_about_this_incident'] = num_records
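
The same per-incident count could likely be computed without the Python loop by using a grouped `transform`. A minimal sketch on toy data (addresses and dates are illustrative), keeping the rule that rows lacking an address or date count as 1:

```python
import pandas as pd

toy = pd.DataFrame({
    'incident_address': ['1 MAIN ST', '1 MAIN ST', '2 OAK AVE', None],
    'date_incident': pd.to_datetime(
        ['2016-01-01', '2016-01-01', '2016-01-01', '2016-02-02']),
})

# Only rows with both keys present can be matched to other rows.
has_key = toy.incident_address.notnull() & toy.date_incident.notnull()
# Size of each (address, date) group, broadcast back onto its member rows.
sizes = (toy[has_key]
         .groupby(['incident_address', 'date_incident'])['incident_address']
         .transform('size'))
toy['num_rows_about_this_incident'] = 1
toy.loc[has_key, 'num_rows_about_this_incident'] = sizes
print(toy)
```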

### Add a column for full name

In [31]:
s = shootings.civilian_name_first.fillna('') + ' ' + shootings.civilian_name_last.fillna('')
s = s.apply(lambda s: ' '.join(s.strip().split()))
s[s == ''] = np.nan
shootings = insert_col_after(shootings, s, 'civilian_name_full', 'civilian_name_last')
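
A quick check of the name-joining logic on toy rows, as a sketch: it covers a complete name, a fully missing name, and a name with stray whitespace.

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame({
    'civilian_name_first': ['SULLY JOE', None, ' RICKEY '],
    'civilian_name_last': ['LANIER', None, 'MAYBERRY'],
})

# Join first and last names, treating missing parts as empty strings.
full = toy.civilian_name_first.fillna('') + ' ' + toy.civilian_name_last.fillna('')
# Collapse repeated or leading/trailing whitespace.
full = full.apply(lambda name: ' '.join(name.split()))
# A name with no parts at all becomes NaN rather than an empty string.
full = full.replace('', np.nan)
print(full.tolist())
```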

### Re-order columns more sensibly
* **Disclaimer**: This code is gross and probably an OCD attempt to make the final result look nice. Just destroy this cell if it proves too confusing or opaque.

In [32]:
section_order = ['date', 'incident', 'civilian', 'agency', 'news']
agency_fields = ['name', 'report_date', 'city', 'county', 'zip', 'name_person_filling_out', 'email_person_filling_out']
officer_fields = ['age', 'race', 'gender', 'on_duty', 'caused_injury']
new_col_order = []
remaining = set(shootings.columns)
for s in section_order:
    if s == 'agency':
        for i in range(1, 11):
            for field in officer_fields:
                c = 'officer_%s_%d' % (field, i)
                if c in remaining:
                    new_col_order.append(c)
                    remaining.remove(c)
            for field in agency_fields:
                c = 'agency_%s_%d' % (field, i)
                new_col_order.append(c)
                remaining.remove(c)
    else:
        this_section_cols = []
        for c in shootings.columns:
            if c in remaining and c.startswith(s + '_'):
                this_section_cols.append(c)
                remaining.remove(c)
        new_col_order.extend(this_section_cols)

new_col_order.extend(sorted(remaining))
shape_before = shootings.shape
shootings = shootings[new_col_order]
assert shootings.shape == shape_before
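
The core idea of the cell above is to group columns by name prefix, in a chosen section order, and then append whatever is left alphabetically. That can be sketched in isolation as follows. `order_columns` is a hypothetical helper, and it deliberately leaves out the per-agency interleaving of officer and agency fields.

```python
def order_columns(columns, section_order):
    """Hypothetical helper: order columns by prefix, leftovers sorted last."""
    remaining = list(columns)
    ordered = []
    for prefix in section_order:
        # Columns of this section keep their original relative order.
        matched = [c for c in remaining if c.startswith(prefix + '_')]
        ordered.extend(matched)
        remaining = [c for c in remaining if c not in matched]
    # Anything that matched no section goes last, alphabetically.
    return ordered + sorted(remaining)

cols = ['num_reports_filed', 'agency_name_1', 'date_incident',
        'civilian_age', 'incident_city', 'cdr_narrative']
new_order = order_columns(cols, ['date', 'incident', 'civilian', 'agency'])
print(new_order)
```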

In [33]:
shootings = shootings.sort_values(['date_incident', 'incident_county', 'agency_county_1'])
shootings.head(100)

Unnamed: 0,date_ag_received,date_incident,incident_address,incident_city,incident_county,incident_zip,incident_result_of,incident_call_other,civilian_name_first,civilian_name_last,civilian_name_full,civilian_gender,civilian_age,civilian_race,civilian_died,officer_age_1,officer_race_1,officer_gender_1,agency_name_1,agency_report_date_1,agency_city_1,agency_county_1,agency_zip_1,agency_name_person_filling_out_1,agency_email_person_filling_out_1,officer_age_2,officer_race_2,officer_gender_2,officer_caused_injury_2,agency_name_2,agency_report_date_2,agency_city_2,agency_county_2,agency_zip_2,agency_name_person_filling_out_2,agency_email_person_filling_out_2,officer_age_3,officer_race_3,officer_gender_3,officer_caused_injury_3,agency_name_3,agency_report_date_3,agency_city_3,agency_county_3,agency_zip_3,agency_name_person_filling_out_3,agency_email_person_filling_out_3,officer_age_4,officer_race_4,officer_gender_4,...,agency_county_8,agency_zip_8,agency_name_person_filling_out_8,agency_email_person_filling_out_8,officer_age_9,officer_race_9,officer_gender_9,agency_name_9,agency_report_date_9,agency_city_9,agency_county_9,agency_zip_9,agency_name_person_filling_out_9,agency_email_person_filling_out_9,officer_age_10,officer_race_10,officer_gender_10,agency_name_10,agency_report_date_10,agency_city_10,agency_county_10,agency_zip_10,agency_name_person_filling_out_10,agency_email_person_filling_out_10,news_coverage_1,news_coverage_2,news_coverage_3,news_coverage_4,agency_city_11,agency_email_person_filling_out_11,agency_name_11,agency_name_person_filling_out_11,agency_report_date_11,agency_zip_11,cdr_narrative,custodial_death_report,deadly_weapon,lea_narrative_published,lea_narrative_shorter,multiple_officers_involved,num_officers_recorded,num_reports_filed,num_rows_about_this_incident,officer_age_11,officer_gender_11,officer_on_duty,officer_race_11,time_incident,weapon_reported_by_media,weapon_reported_by_media_category
0,NaT,2015-09-02,1010 MAGNOLIA STREET,FREEPORT,BRAZORIA,77541.0,TRAFFIC STOP,NARCOTIC STOP AND EVADING ARREST,RICKEY,MAYBERRY,RICKEY MAYBERRY,MALE,30.0,BLACK,False,27.0,HISPANIC,MALE,FREEPORT POLICE DEPT,2015-09-16,FREEPORT,BRAZORIA,77541,PAMELA MORRIS,PMORRIS@FREEPORT.TX.US,,,,,,NaT,,,,,,,,,,,NaT,,,,,,,,,...,,,,,,,,,NaT,,,,,,,,,,NaT,,,,,,ABC 13,YOUR SOUTHEST TEXAS,,,,,,,NaT,,,False,False,,,True,1,1,1,,,True,,,,
1,NaT,2015-09-03,4840 E. PLANO PARKWAY,PLANO,COLLIN,75074.0,OTHER - SPECIFY TYPE OF CALL,ACCIDENTAL DISCHARGE RICOCHET DURING RANGE ACT...,,,,MALE,55.0,WHITE,False,30.0,HISPANIC,MALE,PLANO POLICE DEPT,2015-10-01,PLANO,COLLIN,75074,CURTIS HOWARD,CURTISH@PLANO.GOV,,,,,,NaT,,,,,,,,,,,NaT,,,,,,,,,...,,,,,,,,,NaT,,,,,,,,,,NaT,,,,,,,,,,,,,,NaT,,,False,True,,,False,1,1,1,,,True,,,,(DETAILS MISSING)
2,NaT,2015-09-04,101 COUCH CT.,SPRINGTOWN,PARKER,76082.0,OTHER - SPECIFY TYPE OF CALL,INVESTIGATION OF CRIMINAL ACTIVITY,SULLY JOE,LANIER,SULLY JOE LANIER,MALE,36.0,WHITE,True,27.0,WHITE,MALE,PARKER CO SHERIFFS OFFICE,2015-10-06,WEATHERFORD,PARKER,76086,MEREDITH GRAY,MEREDITH.GRAY@PARKERCOUNTYTX.COM,56.0,WHITE,MALE,,,NaT,,,,,,,,,,,NaT,,,,,,,,,...,,,,,,,,,NaT,,,,,,,,,,NaT,,,,,,WFAA,DFW CBS LOCAL,STAR TELEGRAM,FOX 4 NEWS,,,,,NaT,,DECEDENT SHOT A RIFLE AT LE OFFICERS WHOM RETU...,True,True,,FIRED AT OFFICERS,True,2,1,1,,,True,,,FIREARM,FIREARM
3,NaT,2015-09-05,4926 CHENNAULT ROAD,HOUSTON,HARRIS,77033.0,EMERGENCY CALL OR REQUEST FOR ASSISTANCE,,,,,MALE,21.0,BLACK,False,28.0,WHITE,MALE,HOUSTON POLICE DEPT,2015-09-11,HOUSTON,HARRIS,77002,ODON BELMAREZ,ODON.BELMAREZ@HOUSTONPOLICE.ORG,,,,,,NaT,,,,,,,,,,,NaT,,,,,,,,,...,,,,,,,,,NaT,,,,,,,,,,NaT,,,,,,CHRON,ABC 13,CLICK 2 HOUSTON,,,,,,NaT,,,False,True,AN OFFICER WAS DISPATCHED TO A WEAPONS DISTURB...,,True,1,1,1,,,True,,,FIREARM,FIREARM
4,NaT,2015-09-08,1500 RANGE ROAD,"DFW, AIRPORT",TARRANT,75261.0,OTHER - SPECIFY TYPE OF CALL,TRAINING EXERCISE - RICOCHET FRAGMENTS RESULTI...,,,,MALE,44.0,WHITE,False,38.0,HISPANIC,MALE,IRVING POLICE DEPT,2015-10-15,IRVING,DALLAS,75061,MICHAEL COLEMAN,MCOLEMAN@CITYOFIRVING.COM,,,,,,NaT,,,,,,,,,,,NaT,,,,,,,,,...,,,,,,,,,NaT,,,,,,,,,,NaT,,,,,,,,,,,,,,NaT,,,False,False,,,False,1,1,1,,,True,,,,
5,NaT,2015-09-13,300 GALLEY WAY,FREEPORT,BRAZORIA,77541.0,EMERGENCY CALL OR REQUEST FOR ASSISTANCE,,JEFFREY,BROOKS,JEFFREY BROOKS,MALE,29.0,WHITE,True,42.0,WHITE,FEMALE,ALVIN POLICE DEPT,2015-10-01,ALVIN,BRAZORIA,77511,ROBERT E. LEE,RLEE@CITYOFALVIN.COM,41.0,HISPANIC,MALE,EXECUTION OF A WARRANT,PEARLAND POLICE DEPT,2015-10-05,PEARLAND,BRAZORIA,77584.0,J.C. DOYLE,CDOYLE@PEARLANDTX.GOV,51.0,HISPANIC,MALE,,GALVESTON CO SHERIFFS OFFICE,2015-10-13,GALVESTON,GALVESTON,77551.0,DARRELL ISAACKS,DARRELL.ISSACKS@CO.GALVESTON.TX.US,43.0,WHITE,MALE,...,,,,,,,,,NaT,,,,,,,,,,NaT,,,,,,CBS 19,TYLER PAPER,CHRON,FOX 10 TV,,,,,NaT,,"ON THURSDAY, 9/10/15, BROOKS COMMITTED AN AGGR...",True,True,,DROVE VEHICLE AT OFFICERS,True,4,4,1,,,True,,,FIREARM,FIREARM
6,NaT,2015-09-16,11513 MARRIOTT,BALCH SPRINGS,DALLAS,75180.0,OTHER - SPECIFY TYPE OF CALL,VICIOUS DOG,,,,FEMALE,,WHITE,False,42.0,HISPANIC,MALE,BALCH SPRINGS POLICE DEPT,2015-09-17,BALCH SPRINGS,DALLAS,75180,MARK MARET,MARET239@BALCHSPRINGSPD.COM,,,,,,NaT,,,,,,,,,,,NaT,,,,,,,,,...,,,,,,,,,NaT,,,,,,,,,,NaT,,,,,,,,,,,,,,NaT,,,False,False,,,False,1,1,1,,,True,,,,
7,NaT,2015-09-18,5027 PECAN GROVE,SAN ANTONIO,BEXAR,78222.0,EMERGENCY CALL OR REQUEST FOR ASSISTANCE,,,,,MALE,22.0,BLACK,False,48.0,HISPANIC,MALE,BEXAR CO CONST PCT 4,2015-10-06,SAN ANTONIO,BEXAR,78222,FERNANDO MARTINEZ,FMARTINEZ@BEXAR.ORG,47.0,HISPANIC,MALE,,BEXAR CO CONST PCT 4,2015-10-06,SAN ANTONIO,BEXAR,78222.0,FERNANDO MARTINEZ,FMARTINEZ@BEXAR.ORG,38.0,HISPANIC,MALE,,BEXAR CO CONST PCT 4,2015-10-06,SAN ANTONIO,BEXAR,78222.0,FERNANDO MARTINEZ,FMARTINEZ@BEXAR.ORG,,,,...,,,,,,,,,NaT,,,,,,,,,,NaT,,,,,,MY SAN ANTONIO,FOX SAN ANTONIO,NEWS 4 SAN ANTONIO,,,,,,NaT,,,False,True,,,True,3,3,1,,,True,,,FIREARM,FIREARM
9,NaT,2015-09-21,10800 STONE CANYON ROAD,DALLAS,DALLAS,75230.0,OTHER - SPECIFY TYPE OF CALL,OFFICER WORKING UNRELATED CALL WHEN THEY HEARD...,GERALDO,RAMIREZ,GERALDO RAMIREZ,MALE,24.0,HISPANIC,True,46.0,WHITE,MALE,DALLAS POLICE DEPT,2015-09-25,DALLAS,DALLAS,75215,E. MERRITT,E.MERRITT@DPD.DALLASCITYHALL.COM,32.0,WHITE,MALE,,DALLAS POLICE DEPT,2015-09-25,DALLAS,DALLAS,75215.0,E. MERRITT,E.MERRITT@DPD.DALLASCITYHALL.COM,,,,,,NaT,,,,,,,,,...,,,,,,,,,NaT,,,,,,,,,,NaT,,,,,,DFW CBS LOCAL,NBC DFW,DALLAS NEWS,CHRON,,,,,NaT,,"ON SEPTEMBER 21, 2015, AT APPROXIMATELY 12:05 ...",True,True,"ON SUNDAY, SEPTEMBER 20, 2015, AT APPROXIMATEL...",FIRED AT OFFICERS,True,2,2,1,,,True,,,FIREARM,FIREARM
10,NaT,2015-09-21,9030 BETEL,EL PASO,EL PASO,79915.0,OTHER - SPECIFY TYPE OF CALL,CALL FOR SERVICE - BURGLARY OF VEHICLE IN PROG...,,,,MALE,21.0,HISPANIC,False,24.0,HISPANIC,FEMALE,EL PASO POLICE DEPT,2015-09-22,EL PASO,EL PASO,79905,DAVID CAMACHO,2689@ELPASOTEXAS.GOV,,,,,,NaT,,,,,,,,,,,NaT,,,,,,,,,...,,,,,,,,,NaT,,,,,,,,,,NaT,,,,,,KVIA,EL PASO TIMES,KFOX TV,,,,,,NaT,,,False,True,,,True,1,1,1,,,True,,,CAR,VEHICLE


## 3. Write

In [34]:
with dw.open_remote_file(DTW_OIS_PROJECT, CLEANED_FILENAME) as w:
    print("Writing to data.world: %s/%s" % (DTW_OIS_PROJECT, CLEANED_FILENAME))
    shootings.to_csv(w, index=False)

Writing to data.world: tji/officer-involved-shootings/shot_civilians.csv
