# Clean the Texas OIS dataset for analysis -- civilians shot data

### Latest run covers incidents from 2015-09-02 to 2018-04-16

* Inputs:
   * `OIS.xlsx` (currently local -- TODO add to data.world)
   * `texas_law_enforcement_agencies_and_counties.csv` (from data.world; used to add county information)
* Output: `shot_civilians.csv`

##### Author: Everett Wetchler (everett.wetchler@gmail.com)

## 1. Setup and read data

In [1]:
DTW_PROJECT_KEY = 'tji/auxiliary-datasets'
AGENCY_COUNTY_DATAFRAME_NAME = 'texas_law_enforcement_agencies_and_counties'
# TODO(@wetchler): move OIS data to data.world
RAW_FILENAME = '../data/raw/OIS.xlsx'
CLEANED_FILENAME = '../data/clean/shot_civilians.csv'

In [2]:
import datadotworld as dw
import numpy as np
import pandas as pd
pd.set_option('display.max_rows', 100)
pd.set_option('display.max_columns', 100)

%load_ext watermark
%watermark -a "Everett Wetchler" -d -t -z -r -g -w -p datadotworld,numpy,pandas

Everett Wetchler 2018-05-07 20:31:50 CDT

datadotworld 1.6.0
numpy 1.14.3
pandas 0.22.0
Git hash: e4fec23b644d31575a9c5facc6948ef54e21a178
Git repo: git@github.com:texas-justice-initiative/data-processing.git
watermark 1.6.0


In [3]:
from lib.standardize_police_agency_names import standardize_agency_name
from lib.cleaning_tools import *

In [4]:
datasets = dw.load_dataset(DTW_PROJECT_KEY, force_update=True)
agencies = datasets.dataframes[AGENCY_COUNTY_DATAFRAME_NAME]

In [5]:
shootings = pd.read_excel(RAW_FILENAME, sheet_name='OISTable')
print("OIS civilians-shot incidents from %s to %s" % (
    shootings['Date of Incident'].min().strftime('%Y-%m-%d'),
    shootings['Date of Incident'].max().strftime('%Y-%m-%d')))
shootings.head()


OIS civilians-shot incidents from 2015-09-02 to 2018-04-16


Unnamed: 0,No.,Number of Reports Filed,Date of Report 1,Date AG Received,Name of Agency 1,City of Agency 1,Zip code of Agency 1,Date of Incident,Time of Incident,Name of Person 1 Filling out Form,Email Address of Person 1 Filling out Form,Date of Report 2,Name of Agency 2,City of Agency 2,Zip code of Agency 2,Name of Person 2 Filling out form,Email Address of Person 2 Filling out Form,Date of Report 3,Name of Agency 3,City of Agency 3,Zip code of Agency 3,Name of Person 3 Filling out form,Email Address of Person 3 Filling out Form,Date of Report 4,Name of Agency 4,City of Agency 4,Zip code of Agency 4,Name of Person 4 Filling out form,Email Address of Person 4 Filling out Form,Date of Report 5,Name of Agency 5,City of Agency 5,Zip code of Agency 5,Name of Person 5 Filling out form,Email Address of Person 5 Filling out Form,Date of Report 6,Name of Agency 6,City of Agency 6,Zip code of Agency 6,Name of Person 6 Filling out form,Email Address of Person 6 Filling out Form,Date of Report 7,Name of Agency 7,City of Agency 7,Zip code of Agency 7,Name of Person 7 Filling out form,Email Address of Person 7 Filling out Form,Date of Report 8,Name of Agency 8,City of Agency 8,...,Incident Resulted In,"Carried, Exhibited or Used Deadly Weapon",Peace Officer 1's Gender,Peace Officer 1's Age,Peace Officer 1's Race/Ethnicity,Peace Officer 2's Gender,Peace Officer 2's Age,Peace Officer 2's Race/Ethnicity,Peace Officer 3's Gender,Peace Officer 3's Age,Peace Officer 3's Race/Ethnicity,Peace Officer 4's Gender,Peace Officer 4's Age,Peace Officer 4's Race/Ethnicity,Peace Officer 5's Gender,Peace Officer 5's Age,Peace Officer 5's Race/Ethnicity,Peace Officer 6's Gender,Peace Officer 6's Age,Peace Officer 6's Race/Ethnicity,Peace Officer 7's Gender,Peace Officer 7's Age,Peace Officer 7's Race/Ethnicity,Peace Officer 8's Gender,Peace Officer 8's Age,Peace Officer 8's Race/Ethnicity,Peace Officer 9's Gender,Peace Officer 9's Age,Peace Officer 9's Race/Ethnicity,Peace Officer 10's Gender,Peace Officer 10's Age,Peace Officer 10's Race/Ethnicity,On Duty or Off Duty,Peace Officer Responding With 1 or More Officers,Incident Occurred During or as a Result of,Incident Occurred During or as a Result of 2,Incident Occurred During or as a Result of 3,"If Other, Specify Type of Call",Deadly Weapon Description,NEWS 1,NEWS 2,NEWS 3,NEWS 4,CDR?,CDR Narrative,Narrative Published by Law Enforcement,Column1,Column2,SHORTER,EXTRAS
0,1,1,9/16/2015,NaT,Freeport Police Department,Freeport,77541,2015-09-02,,Pamela Morris,pmorris@freeport.tx.us,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,...,Injury,No,Male,27,Hispanic or Latino,,,,,,,,,,,,,,,,,,,,,,,,,,,,On Duty,Yes,Traffic stop,,,Narcotic Stop and Evading Arrest,,ABC 13,Your Southest Texas,,,,,,,,,
1,2,1,10/1/2015,NaT,Plano Police Department,Plano,75074,2015-09-03,,Curtis Howard,curtish@plano.gov,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,...,Injury,Yes,Male,30,Hispanic or Latino,,,,,,,,,,,,,,,,,,,,,,,,,,,,On Duty,No,Other - Specify type of call,,,Accidental discharge ricochet during range act...,,,,,,,,,,,,
2,3,1,10/6/2015,NaT,Parker County Sheriff's Office,Weatherford,76086,2015-09-04,,Meredith Gray,meredith.gray@parkercountytx.com,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,...,Death,Yes,Male,27,Anglo or White,Male,56.0,Anglo or White,,,,,,,,,,,,,,,,,,,,,,,,,On Duty,Yes,Other - Specify type of call,,,Investigation of criminal activity,Firearm,WFAA,DFW CBS Local,Star Telegram,Fox 4 News,YES,Decedent shot a rifle at LE Officers whom retu...,,,,fired at officers,
3,4,1,9/11/2015,NaT,Houston Police Department,Houston,77002,2015-09-05,,Odon Belmarez,odon.belmarez@houstonpolice.org,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,...,Injury,Yes,Male,28,Anglo or White,,,,,,,,,,,,,,,,,,,,,,,,,,,,On Duty,Yes,Emergency Call or Request for Assistance,,,,Firearm,Chron,ABC 13,Click 2 Houston,,,,An officer was dispatched to a weapons disturb...,,,,
4,5,1,10/15/2015,NaT,Irving Police Department,Irving,75061,2015-09-08,,Michael Coleman,mcoleman@cityofirving.com,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,...,Injury,No,Male,38,Hispanic or Latino,,,,,,,,,,,,,,,,,,,,,,,,,,,,On Duty,No,Other - Specify type of call,,,Training Exercise - ricochet fragments resulti...,,,,,,,,,,,,


## 2. Begin cleaning

In [6]:
# Remove whitespace from column names
shootings.columns = [c.strip() for c in shootings.columns]

In [7]:
# Drop irrelevant columns
shootings.drop(['No.', 'Column1', 'Column2'], axis=1, inplace=True)

In [8]:
# Get rid of any stray formatting on string values - remove whitespace, and lowercase
for c in shootings.columns:
    shootings[c] = shootings[c].apply(lambda s: s.strip().lower() if isinstance(s, str) else s)

In [9]:
# Make the column names more machine-friendly
col_renames = {
    'Number of Reports Filed': 'num_reports_filed',
    'Date AG Received': 'date_ag_received',
    'Date of Incident': 'date_incident',
    'Time of Incident': 'time_incident',
    "Injured or Deceased's First Name": 'civilian_name_first',
    "Injured or Deceased's Last Name": 'civilian_name_last',
    "Injured or Deceased's Gender": "civilian_gender",
    "Injured or Deceased's Age": "civilian_age",
    "Injured or Deceased's Race/Ethnicity": "civilian_race",
    "Street Address of Incident": "incident_address",
    "City of Incident": "incident_city",
    "County of Incident": "incident_county",
    "Zip Code of Incident": "incident_zip",
    "Incident Resulted In": "incident_resulted_in",
    "Carried, Exhibited or Used Deadly Weapon": "deadly_weapon",
    "On Duty or Off Duty": "on_duty",
    "Peace Officer Responding With 1 or More Officers": "multiple_officers_involved",
    "Incident Occurred During or as a Result of": "incident_result_of",
    "If Other, Specify Type of Call": "incident_call_other",
    "Deadly Weapon Description": "deadly_weapon_description",
    "CDR?": "custodial_death_report",
    "CDR Narrative": "cdr_narrative",
    "Narrative Published by Law Enforcement": "lea_narrative_published",
    "SHORTER": "lea_narrative_shorter",
}
# Keys that do not match an existing column are ignored by rename()
shootings.rename(columns=col_renames, inplace=True)
shootings.head()

Unnamed: 0,num_reports_filed,Date of Report 1,date_ag_received,Name of Agency 1,City of Agency 1,Zip code of Agency 1,date_incident,time_incident,Name of Person 1 Filling out Form,Email Address of Person 1 Filling out Form,Date of Report 2,Name of Agency 2,City of Agency 2,Zip code of Agency 2,Name of Person 2 Filling out form,Email Address of Person 2 Filling out Form,Date of Report 3,Name of Agency 3,City of Agency 3,Zip code of Agency 3,Name of Person 3 Filling out form,Email Address of Person 3 Filling out Form,Date of Report 4,Name of Agency 4,City of Agency 4,Zip code of Agency 4,Name of Person 4 Filling out form,Email Address of Person 4 Filling out Form,Date of Report 5,Name of Agency 5,City of Agency 5,Zip code of Agency 5,Name of Person 5 Filling out form,Email Address of Person 5 Filling out Form,Date of Report 6,Name of Agency 6,City of Agency 6,Zip code of Agency 6,Name of Person 6 Filling out form,Email Address of Person 6 Filling out Form,Date of Report 7,Name of Agency 7,City of Agency 7,Zip code of Agency 7,Name of Person 7 Filling out form,Email Address of Person 7 Filling out Form,Date of Report 8,Name of Agency 8,City of Agency 8,Zip code of Agency 8,...,Latitude of Incident,Longitude of Incident,incident_resulted_in,deadly_weapon,Peace Officer 1's Gender,Peace Officer 1's Age,Peace Officer 1's Race/Ethnicity,Peace Officer 2's Gender,Peace Officer 2's Age,Peace Officer 2's Race/Ethnicity,Peace Officer 3's Gender,Peace Officer 3's Age,Peace Officer 3's Race/Ethnicity,Peace Officer 4's Gender,Peace Officer 4's Age,Peace Officer 4's Race/Ethnicity,Peace Officer 5's Gender,Peace Officer 5's Age,Peace Officer 5's Race/Ethnicity,Peace Officer 6's Gender,Peace Officer 6's Age,Peace Officer 6's Race/Ethnicity,Peace Officer 7's Gender,Peace Officer 7's Age,Peace Officer 7's Race/Ethnicity,Peace Officer 8's Gender,Peace Officer 8's Age,Peace Officer 8's Race/Ethnicity,Peace Officer 9's Gender,Peace Officer 9's Age,Peace Officer 9's Race/Ethnicity,Peace Officer 10's Gender,Peace Officer 10's Age,Peace Officer 10's Race/Ethnicity,on_duty,multiple_officers_involved,incident_result_of,Incident Occurred During or as a Result of 2,Incident Occurred During or as a Result of 3,incident_call_other,deadly_weapon_description,NEWS 1,NEWS 2,NEWS 3,NEWS 4,custodial_death_report,cdr_narrative,lea_narrative_published,lea_narrative_shorter,EXTRAS
0,1,9/16/2015,NaT,freeport police department,freeport,77541,2015-09-02,,pamela morris,pmorris@freeport.tx.us,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,...,28.944891,-95.356262,injury,no,male,27,hispanic or latino,,,,,,,,,,,,,,,,,,,,,,,,,,,,on duty,yes,traffic stop,,,narcotic stop and evading arrest,,abc 13,your southest texas,,,,,,,
1,1,10/1/2015,NaT,plano police department,plano,75074,2015-09-03,,curtis howard,curtish@plano.gov,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,...,33.008128,-96.642308,injury,yes,male,30,hispanic or latino,,,,,,,,,,,,,,,,,,,,,,,,,,,,on duty,no,other - specify type of call,,,accidental discharge ricochet during range act...,,,,,,,,,,
2,1,10/6/2015,NaT,parker county sheriff's office,weatherford,76086,2015-09-04,,meredith gray,meredith.gray@parkercountytx.com,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,...,32.916724,-97.634193,death,yes,male,27,anglo or white,male,56.0,anglo or white,,,,,,,,,,,,,,,,,,,,,,,,,on duty,yes,other - specify type of call,,,investigation of criminal activity,firearm,wfaa,dfw cbs local,star telegram,fox 4 news,yes,decedent shot a rifle at le officers whom retu...,,fired at officers,
3,1,9/11/2015,NaT,houston police department,houston,77002,2015-09-05,,odon belmarez,odon.belmarez@houstonpolice.org,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,...,29.681655,-95.344966,injury,yes,male,28,anglo or white,,,,,,,,,,,,,,,,,,,,,,,,,,,,on duty,yes,emergency call or request for assistance,,,,firearm,chron,abc 13,click 2 houston,,,,an officer was dispatched to a weapons disturb...,,
4,1,10/15/2015,NaT,irving police department,irving,75061,2015-09-08,,michael coleman,mcoleman@cityofirving.com,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,...,32.899809,-97.040335,injury,no,male,38,hispanic or latino,,,,,,,,,,,,,,,,,,,,,,,,,,,,on duty,no,other - specify type of call,,,training exercise - ricochet fragments resulti...,,,,,,,,,,


In [10]:
# Some columns have multiple copies for different individuals,
# e.g. agency_name_1 for the first officer's agency, then
# agency_name_2, for the second, etc. To avoid typing out
# all these numbers, we loop through such columns and rename
# them appropriately.
numerical_renames = {
    "Date of Report %d":"date_report_",
    "Name of Agency %d":"agency_name_",
    "City of Agency %d":"agency_city_",
    "Zip code of Agency %d":"agency_zip_",
    "Name of Person %d Filling out Form":"name_person_filling_out_",
    "Email Address of Person %d Filling out Form":"email_person_filling_out_",
    "Peace Officer %d's Gender":"officer_gender_",
    "Peace Officer %d's Age":"officer_age_",
    "Peace Officer %d's Race/Ethnicity":"officer_race_",
    "Incident Occurred During or as a Result of %d":"officer_caused_injury_",
    "NEWS %d": "news_coverage_",
}
renames = {}
for i in range(1, 11):
    for k, v in numerical_renames.items():
        k = (k % i).lower()
        v = v + str(i)
        renames[k] = v
shootings.columns = [c.lower().strip() for c in shootings.columns]
shootings.rename(columns=renames, inplace=True)
shootings.head()

Unnamed: 0,num_reports_filed,date_report_1,date_ag_received,agency_name_1,agency_city_1,agency_zip_1,date_incident,time_incident,name_person_filling_out_1,email_person_filling_out_1,date_report_2,agency_name_2,agency_city_2,agency_zip_2,name_person_filling_out_2,email_person_filling_out_2,date_report_3,agency_name_3,agency_city_3,agency_zip_3,name_person_filling_out_3,email_person_filling_out_3,date_report_4,agency_name_4,agency_city_4,agency_zip_4,name_person_filling_out_4,email_person_filling_out_4,date_report_5,agency_name_5,agency_city_5,agency_zip_5,name_person_filling_out_5,email_person_filling_out_5,date_report_6,agency_name_6,agency_city_6,agency_zip_6,name_person_filling_out_6,email_person_filling_out_6,date_report_7,agency_name_7,agency_city_7,agency_zip_7,name_person_filling_out_7,email_person_filling_out_7,date_report_8,agency_name_8,agency_city_8,agency_zip_8,...,latitude of incident,longitude of incident,incident_resulted_in,deadly_weapon,officer_gender_1,officer_age_1,officer_race_1,officer_gender_2,officer_age_2,officer_race_2,officer_gender_3,officer_age_3,officer_race_3,officer_gender_4,officer_age_4,officer_race_4,officer_gender_5,officer_age_5,officer_race_5,officer_gender_6,officer_age_6,officer_race_6,officer_gender_7,officer_age_7,officer_race_7,officer_gender_8,officer_age_8,officer_race_8,officer_gender_9,officer_age_9,officer_race_9,officer_gender_10,officer_age_10,officer_race_10,on_duty,multiple_officers_involved,incident_result_of,officer_caused_injury_2,officer_caused_injury_3,incident_call_other,deadly_weapon_description,news_coverage_1,news_coverage_2,news_coverage_3,news_coverage_4,custodial_death_report,cdr_narrative,lea_narrative_published,lea_narrative_shorter,extras
0,1,9/16/2015,NaT,freeport police department,freeport,77541,2015-09-02,,pamela morris,pmorris@freeport.tx.us,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,...,28.944891,-95.356262,injury,no,male,27,hispanic or latino,,,,,,,,,,,,,,,,,,,,,,,,,,,,on duty,yes,traffic stop,,,narcotic stop and evading arrest,,abc 13,your southest texas,,,,,,,
1,1,10/1/2015,NaT,plano police department,plano,75074,2015-09-03,,curtis howard,curtish@plano.gov,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,...,33.008128,-96.642308,injury,yes,male,30,hispanic or latino,,,,,,,,,,,,,,,,,,,,,,,,,,,,on duty,no,other - specify type of call,,,accidental discharge ricochet during range act...,,,,,,,,,,
2,1,10/6/2015,NaT,parker county sheriff's office,weatherford,76086,2015-09-04,,meredith gray,meredith.gray@parkercountytx.com,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,...,32.916724,-97.634193,death,yes,male,27,anglo or white,male,56.0,anglo or white,,,,,,,,,,,,,,,,,,,,,,,,,on duty,yes,other - specify type of call,,,investigation of criminal activity,firearm,wfaa,dfw cbs local,star telegram,fox 4 news,yes,decedent shot a rifle at le officers whom retu...,,fired at officers,
3,1,9/11/2015,NaT,houston police department,houston,77002,2015-09-05,,odon belmarez,odon.belmarez@houstonpolice.org,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,...,29.681655,-95.344966,injury,yes,male,28,anglo or white,,,,,,,,,,,,,,,,,,,,,,,,,,,,on duty,yes,emergency call or request for assistance,,,,firearm,chron,abc 13,click 2 houston,,,,an officer was dispatched to a weapons disturb...,,
4,1,10/15/2015,NaT,irving police department,irving,75061,2015-09-08,,michael coleman,mcoleman@cityofirving.com,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,...,32.899809,-97.040335,injury,no,male,38,hispanic or latino,,,,,,,,,,,,,,,,,,,,,,,,,,,,on duty,no,other - specify type of call,,,training exercise - ricochet fragments resulti...,,,,,,,,,,
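
The template expansion used for the numbered columns can be seen in isolation with a small sketch. It uses only two of the templates and three indices (the notebook uses all templates and `range(1, 11)`); the shortened `templates` dict is for illustration only.

```python
# Toy illustration of the "%d" template expansion; only two templates shown.
templates = {
    "Name of Agency %d": "agency_name_",
    "Peace Officer %d's Age": "officer_age_",
}

renames = {}
for i in range(1, 4):  # the notebook loops over range(1, 11)
    for template, prefix in templates.items():
        # Fill in the index, lowercase to match the lowercased column names
        renames[(template % i).lower()] = prefix + str(i)

print(renames["name of agency 2"])       # agency_name_2
print(renames["peace officer 3's age"])  # officer_age_3
```

Lowercasing both the keys and the column names is what lets one template match the inconsistent `Form` / `form` capitalization in the raw headers.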


### Validate gender columns

In [12]:
validate_gender_cols(shootings)
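
`validate_gender_cols` comes from `lib.cleaning_tools`, and its implementation is not shown in this notebook. As a rough idea of what such a check might involve, here is a minimal sketch. The function name `check_gender_cols`, the allowed values, and the raise-on-failure behavior are all assumptions, not the library's actual code.

```python
import pandas as pd

def check_gender_cols(df, allowed=("male", "female")):
    """Hypothetical stand-in: raise if a *gender* column holds an unexpected value."""
    for col in df.columns:
        if "gender" not in col:
            continue
        # Missing values are tolerated; anything else must be in `allowed`
        unexpected = set(df[col].dropna().unique()) - set(allowed)
        if unexpected:
            raise ValueError("%s has unexpected values: %s" % (col, sorted(unexpected)))

toy = pd.DataFrame({
    "civilian_gender": ["male", "female", None],
    "officer_gender_1": ["male", "male", "female"],
})
check_gender_cols(toy)  # passes silently on clean data
```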

### Make age columns numerical (float)

In [13]:
numericalize_age_cols(shootings)
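
`numericalize_age_cols` is also defined in `lib.cleaning_tools` and not shown here. One plausible approach, sketched under the assumption that unparseable ages should become `NaN`, uses `pd.to_numeric`. The helper name `ages_to_float` and the column-matching rule are hypothetical.

```python
import pandas as pd

def ages_to_float(df):
    """Hypothetical stand-in: coerce every age column to float, junk becomes NaN."""
    for col in df.columns:
        # Match 'civilian_age' and 'officer_age_1', but not 'agency_name_1'
        # (a plain substring test for 'age' would also hit 'agency').
        if col.endswith("_age") or "_age_" in col:
            df[col] = pd.to_numeric(df[col], errors="coerce").astype(float)

toy = pd.DataFrame({
    "civilian_age": [27, "30", "unknown", None],
    "agency_name_1": ["a", "b", "c", "d"],
})
ages_to_float(toy)
print(toy["civilian_age"].tolist()[:2])  # [27.0, 30.0]
```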

### Simplify race names

In [14]:
standardize_race_cols(shootings)
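
`standardize_race_cols` is another `lib.cleaning_tools` helper whose code is not included in this notebook. The rows displayed in this notebook suggest it maps labels such as `hispanic or latino` and `anglo or white` to short upper-case categories (`HISPANIC`, `WHITE`; `BLACK` and `OTHER` also appear in later output). The sketch below covers only those two observed labels. The `RACE_MAP` dict, the `simplify_race` name, and leaving unlisted labels unchanged are assumptions.

```python
import pandas as pd

# Only the two raw labels visible in the sample rows are mapped here;
# the real helper presumably covers more categories.
RACE_MAP = {
    "hispanic or latino": "HISPANIC",
    "anglo or white": "WHITE",
}

def simplify_race(value):
    """Hypothetical stand-in: map a raw label to a short category, keep NaN as-is."""
    if pd.isnull(value):
        return value
    key = value.strip().lower()
    return RACE_MAP.get(key, value)  # unlisted labels pass through unchanged

toy = pd.Series(["hispanic or latino", "Anglo or White ", None, "something else"])
simplified = toy.apply(simplify_race)
print(simplified.tolist()[:2])  # ['HISPANIC', 'WHITE']
```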

### Create a column for the number of officers whose information was recorded here, `num_officers_recorded`

We treat a non-null officer gender as the indicator that an officer's details were recorded. Not every incident fills in a separate `agency_name_X` column per officer, so the agency columns can't be used for the count.

In [15]:
officer_gender_cols = ['officer_gender_%d' % i for i in range(1, 11)]
shootings['num_officers_recorded'] = shootings[officer_gender_cols].notnull().sum(axis=1)
shootings[shootings['num_officers_recorded'] > 5].head()

Unnamed: 0,num_reports_filed,date_report_1,date_ag_received,agency_name_1,agency_city_1,agency_zip_1,date_incident,time_incident,name_person_filling_out_1,email_person_filling_out_1,date_report_2,agency_name_2,agency_city_2,agency_zip_2,name_person_filling_out_2,email_person_filling_out_2,date_report_3,agency_name_3,agency_city_3,agency_zip_3,name_person_filling_out_3,email_person_filling_out_3,date_report_4,agency_name_4,agency_city_4,agency_zip_4,name_person_filling_out_4,email_person_filling_out_4,date_report_5,agency_name_5,agency_city_5,agency_zip_5,name_person_filling_out_5,email_person_filling_out_5,date_report_6,agency_name_6,agency_city_6,agency_zip_6,name_person_filling_out_6,email_person_filling_out_6,date_report_7,agency_name_7,agency_city_7,agency_zip_7,name_person_filling_out_7,email_person_filling_out_7,date_report_8,agency_name_8,agency_city_8,agency_zip_8,...,longitude of incident,incident_resulted_in,deadly_weapon,officer_gender_1,officer_age_1,officer_race_1,officer_gender_2,officer_age_2,officer_race_2,officer_gender_3,officer_age_3,officer_race_3,officer_gender_4,officer_age_4,officer_race_4,officer_gender_5,officer_age_5,officer_race_5,officer_gender_6,officer_age_6,officer_race_6,officer_gender_7,officer_age_7,officer_race_7,officer_gender_8,officer_age_8,officer_race_8,officer_gender_9,officer_age_9,officer_race_9,officer_gender_10,officer_age_10,officer_race_10,on_duty,multiple_officers_involved,incident_result_of,officer_caused_injury_2,officer_caused_injury_3,incident_call_other,deadly_weapon_description,news_coverage_1,news_coverage_2,news_coverage_3,news_coverage_4,custodial_death_report,cdr_narrative,lea_narrative_published,lea_narrative_shorter,extras,num_officers_recorded
62,8,1/5/2016,NaT,odessa police department,odessa,78761,2015-12-23,,david lara,dlara@odessa-tx.gov,1/5/2016,odessa police department,odessa,79761,david lara,dlara@odessa-tx.gov,1/5/2016,odessa police department,odessa,79761,david lara,dlara@odessa-tx.gov,1/5/2016,odessa police department,odessa,79761,david lara,dlara@odessa-tx.gov,1/5/2016,odessa police department,odessa,79761,david lara,dlara@odessa-tx.gov,1/5/2016,odessa police department,odessa,79761.0,david lara,dlara@odessa-tx.gov,1/5/2016,odessa police department,odessa,79761.0,david lara,dlara@odessa-tx.gov,1/5/2016,odessa police department,odessa,79761.0,...,-102.33585,injury,yes,male,26.0,HISPANIC,male,26.0,WHITE,male,40.0,WHITE,male,24.0,WHITE,male,28.0,HISPANIC,male,42.0,HISPANIC,male,27.0,HISPANIC,male,41.0,HISPANIC,,,,,,,on duty,yes,execution of a warrant,,,,firearm,my san antonio,news west 9,cbs 7,ksat,,,,,,8
157,10,2/23/2017,2017-02-23,dart police department,dallas,75203,2016-07-07,21:00:00,lt. sherri plunk #43,splunk@dart.org,2/23/2017,dart police department,dallas,75203,lt. sherri plunk #43,splunk@dart.org,2/23/2017,dart police department,dallas,75203,lt. sherri plunk #43,splunk@dart.org,2/23/2017,dart police department,dallas,75203,lt. sherri plunk #43,splunk@dart.org,2/7/2017,dallas police department,dallas,75215,sgt. e. merritt #8112,e.merritt@dpd.dallascityhall.com,2/7/2017,dallas police department,dallas,75215.0,sgt. e. merritt #8112,e.merritt@dpd.dallascityhall.com,2/7/2017,dallas police department,dallas,75215.0,sgt. e. merritt #8112,e.merritt@dpd.dallascityhall.com,2/7/2017,dallas police department,dallas,75215.0,...,-96.805209,death,yes,male,63.0,WHITE,male,44.0,WHITE,male,43.0,WHITE,male,37.0,BLACK,male,27.0,WHITE,male,31.0,HISPANIC,male,34.0,OTHER,male,39.0,WHITE,male,41.0,WHITE,male,46.0,WHITE,on duty,yes,other - specify type of call,,,ambush of officers by suspect. wounded on 7/7/...,,dpd press release,,,,yes,"on july 7, 2016, at approximately 8:57 p.m., d...",,fired at officers,,10
271,8,3/15/2017,2017-03-15,clute police department,clute,77531,2017-02-24,14:20:00,chief randy bratton,chief randy bratton,2/25/2017,lake jackson police department,lake jackson,77566,chief richard j. park,rpark@lakejacksonpd.net,2/25/2017,lake jackson police department,lake jackson,77566,chief richard j. park,rpark@lakejacksonpd.net,2/25/2017,lake jackson police department,lake jackson,77566,chief richard j. park,rpark@lakejacksonpd.net,3/15/2017,clute police department,clute,77531,chief randy bratton,rbratton@clutepd.com,2/25/2017,lake jackson police department,lake jackson,77566.0,chief richard j. park,rpark@lakejacksonpd.net,3/7/2017,freeport police department,freeport,77541.0,det. corey brinkman,cbrinkman@freeport.tx.us,3/7/2017,freeport police department,freeport,77541.0,...,-95.4353,death,yes,male,29.0,WHITE,male,32.0,WHITE,male,27.0,WHITE,male,26.0,WHITE,male,27.0,WHITE,female,36.0,WHITE,male,29.0,WHITE,male,35.0,BLACK,,,,,,,on duty,yes,traffic stop,,,execution of a warrant,handgun,the facts,kprc,,,yes,"in angleton, the suspect fled from officers in...",,pointed a gun at officers,,8
295,6,5/15/2017,2017-05-16,waco police department,waco,76708,2017-04-10,21:12:00,sgt. v.r. price jr.,jprice@wacotx.gov,5/15/2017,waco police department,waco,76708,sgt. v.r. price jr.,jprice@wacotx.gov,5/15/2017,waco police department,waco,76708,sgt. v.r. price jr.,jprice@wacotx.gov,5/15/2017,waco police department,waco,76708,sgt. v.r. price jr.,jprice@wacotx.gov,5/15/2017,waco police department,waco,76708,sgt. v.r. price jr.,jprice@wacotx.gov,5/15/2017,waco police department,waco,76708.0,sgt. v.r. price jr.,jprice@wacotx.gov,,,,,,,,,,,...,-97.170468,injury,yes,male,36.0,WHITE,male,26.0,WHITE,male,38.0,WHITE,male,50.0,WHITE,female,36.0,HISPANIC,male,27.0,WHITE,,,,,,,,,,,,,on duty,no,"hostage, barricade, or other emergency situation",,vehicle pursuit of armed robbery suspect which...,other - specify type of call,,,,,,,,,,,6
303,6,5/4/2017,2017-05-08,houston police department,houston,77002,2017-04-24,22:00:00,sgt. odon belmarez,odon.belmarez@houstonpolice.org,5/4/2017,houston police department,houston,77002,sgt. odon belmarez,odon.berlmarez@houstonpolice.org,5/4/2017,houston police department,houston,77002,sgt. odon belmarez,odon.belmarez@houstonpolice.org,4/24/2017,texas department of public safety,houston,77065,daron parker,daron.parker@dps.texas.gov,4/24/2017,texas department of public safety,houston,77065,daron parker,daron.parker@dps.texas.gov,5/4/2017,houston police department,houston,77002.0,sgt. odon belmarez,odon.belmarez@houstonpolice.org,,,,,,,,,,,...,-95.644635,death,yes,male,57.0,WHITE,male,49.0,WHITE,male,31.0,WHITE,male,35.0,HISPANIC,male,29.0,WHITE,male,49.0,WHITE,,,,,,,,,,,,,on duty,yes,other - specify type of call,,,robbery sting,,houston chronicle,khou,,,yes,the decedent and two accomplices armed with we...,,,,6
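
The counting step is easier to see on a toy frame. This sketch uses three made-up rows and three gender columns rather than the ten in the real data.

```python
import pandas as pd

toy = pd.DataFrame({
    "officer_gender_1": ["male", "male", "female"],
    "officer_gender_2": [None, "female", "male"],
    "officer_gender_3": [None, None, "male"],
})

# notnull() gives a True/False grid; summing across axis=1 counts the
# non-missing gender entries in each row.
counts = toy.notnull().sum(axis=1)
print(counts.tolist())  # [1, 2, 3]
```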


In [16]:
# Convert yes/no answers to booleans; missing answers count as "no"
shootings['custodial_death_report'] = shootings['custodial_death_report'].fillna('n').apply(lambda c: c.strip().lower()[0] == 'y')
shootings['multiple_officers_involved'] = shootings['multiple_officers_involved'].fillna('n').apply(lambda c: c.strip().lower()[0] == 'y')

# Replace the injury/death outcome with a single boolean flag
shootings['civilian_died'] = shootings['incident_resulted_in'].apply(lambda x: x.strip().lower()) == 'death'
shootings.drop('incident_resulted_in', axis=1, inplace=True)
shootings['incident_result_of'] = shootings['incident_result_of'].apply(lambda x: x.strip().lower())

# on_duty becomes a boolean; missing values are left as NaN
shootings['on_duty'] = shootings['on_duty'].apply(lambda x: x if pd.isnull(x) else (x.strip().lower() == 'on duty'))
shootings['deadly_weapon_description'] = shootings['deadly_weapon_description'].apply(lambda w: w if pd.isnull(w) else w.strip().lower())

### NOTE: Data quirk

It's unclear how many officers were actually at the scene.
* The `multiple_officers_involved` column is a yes/no column, but there are also columns listing the agency, gender, etc. for each officer. These do not always agree. Sometimes `multiple_officers_involved` is yes but only one officer's details are recorded, and sometimes details are recorded for several officers but `multiple_officers_involved` is no. See below.
* The upshot: interpret these columns with caution.

In [17]:
pd.crosstab(shootings.multiple_officers_involved, shootings.num_officers_recorded)

multiple_officers_involved \ num_officers_recorded,1,2,3,4,5,6,7,8,10
False,77,2,2,0,1,1,0,0,0
True,234,75,28,17,9,1,4,2,1
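
To inspect the disagreeing rows rather than just count them, one option is a boolean flag column. The sketch below runs on a made-up four-row frame; the column name `officer_count_conflict` is hypothetical and not part of the notebook's output.

```python
import pandas as pd

toy = pd.DataFrame({
    "multiple_officers_involved": [True, False, True, False],
    "num_officers_recorded": [1, 1, 3, 2],
})

# "yes" to multiple officers, but only one officer's details recorded
yes_but_one = toy["multiple_officers_involved"] & (toy["num_officers_recorded"] == 1)
# "no" to multiple officers, but several officers' details recorded
no_but_many = ~toy["multiple_officers_involved"] & (toy["num_officers_recorded"] > 1)

toy["officer_count_conflict"] = yes_but_one | no_but_many
print(toy["officer_count_conflict"].tolist())  # [True, False, False, True]
```

Filtering the real frame on such a flag would list the individual incidents behind the off-diagonal cells of the crosstab.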


### Handle weapons-related questions

In [18]:
# Convert yes/no to boolean
shootings['deadly_weapon'] = shootings['deadly_weapon'].apply(lambda x: x if pd.isnull(x) else (x.strip().lower() == 'yes'))
shootings['deadly_weapon'].value_counts()

True     377
False     77
Name: deadly_weapon, dtype: int64

In [19]:
# Check for mistakes. Sometimes the "was there a deadly weapon?" question
# is answered with "No" while a description of a deadly weapon is given.
pd.crosstab(shootings['deadly_weapon'], shootings['deadly_weapon_description'].notnull())

deadly_weapon \ deadly_weapon_description (not null),False,True
False,62,15
True,58,319


In [20]:
# Let's look at these instances to be sure
s = shootings[~shootings['deadly_weapon'] & pd.notnull(shootings['deadly_weapon_description'])][['deadly_weapon', 'deadly_weapon_description']]
print(len(s))
s

15


Unnamed: 0,deadly_weapon,deadly_weapon_description
40,False,vehicle
56,False,firearm
162,False,bb gun
227,False,vehicle
252,False,vehicle
268,False,vehicle
286,False,vehicle
300,False,vehicle
321,False,took officer's taser
323,False,fell while getting out of car?


In [21]:
# It seems clear that we should correct the 'deadly_weapon' category
# in these cases.
#
# For cases where the user said there WAS a deadly weapon,
# but did not give a weapon description, we'll assume there really
# was a weapon but use a special category (see cells below).
shootings['deadly_weapon'] = (shootings['deadly_weapon'] | pd.notnull(shootings['deadly_weapon_description']))
shootings['deadly_weapon'].value_counts()

True     392
False     62
Name: deadly_weapon, dtype: int64
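
The effect of the correction can be checked on a toy frame covering all four combinations of flag and description. The rows are made up for illustration.

```python
import pandas as pd

toy = pd.DataFrame({
    "deadly_weapon": [False, False, True, True],
    "deadly_weapon_description": ["vehicle", None, None, "knife"],
})

# A row counts as "deadly weapon present" if the flag says so OR a
# description was supplied; only flag=False with no description stays False.
corrected = toy["deadly_weapon"] | toy["deadly_weapon_description"].notnull()
print(corrected.tolist())  # [True, False, True, True]
```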

In [22]:
# What weapons do people use? Can we categorize them?
sorted(list(x for x in set(shootings.deadly_weapon_description) if pd.notnull(x)))

['"agent was assaulted"',
 'air soft gun',
 'armed',
 'arms (choking)',
 'assault rifle',
 'axe',
 'baseball bat',
 'baseball bat and fireplace poker',
 'bb gun',
 'body',
 'bomb',
 'box cutter',
 'butcher knife',
 'car',
 'club, bat',
 "deputy's gun",
 'fell while getting out of car?',
 'firearm',
 'glock 40',
 'gun',
 'handgun',
 'hatchet',
 'imitation weapon',
 'knife',
 'knife - not opened',
 'knife, gun',
 'knives',
 'long gun',
 'machete',
 'metal flashlight',
 'pellet gun',
 'pencil',
 'pickaxe',
 'pistol',
 'reports say unarmed',
 'revolver',
 'rifle',
 'rock',
 'sawed-off shotgun',
 'scissors, screwdriver',
 'semi-automatic rifle',
 'sharp metal object (piece of headphones)',
 'shotgun',
 'sword',
 "took officer's knife",
 "took officer's taser",
 'truck',
 'vehicle',
 'weapon',
 'weed-cutter']

In [23]:
# Manual categorization of weapons
weapon_types = {
    'FIREARM': ['handgun', 'sawed-off shotgun', 'revolver', 'rifle',
                'assault rifle', 'firearm', 'shotgun', 'long gun', 'gun',
                'glock 40', 'pistol', 'knife, gun', "deputy's gun"],
    'KNIFE/CUTTING': ['hatchet', 'butcher knife', 'knife', 'knives', 'box cutter',
                'knife - not opened', 'machete', 'sword', 'axe'],
    'VEHICLE': ['car', 'truck', 'vehicle'],
}
type_lookup = {}
for k, v in weapon_types.items():
    for w in v:
        if w in type_lookup:
            print("DUPLICATE:", k, w)
        type_lookup[w] = k

weapons = []
for has_weapon, desc in zip(shootings['deadly_weapon'], shootings['deadly_weapon_description']):
    if pd.isnull(desc) or not desc:
        if has_weapon:
            weapons.append('(DETAILS MISSING)')
        else:
            weapons.append(np.nan)
        continue
    weapons.append(type_lookup.get(desc, 'OTHER'))

shootings['deadly_weapon_category'] = weapons    
shootings['deadly_weapon_category'].value_counts()

FIREARM              221
(DETAILS MISSING)     58
KNIFE/CUTTING         46
OTHER                 34
VEHICLE               33
Name: deadly_weapon_category, dtype: int64

In [24]:
sorted(list(shootings['deadly_weapon_description'][shootings['deadly_weapon_category'] == 'OTHER']))

['"agent was assaulted"',
 'air soft gun',
 'armed',
 'arms (choking)',
 'baseball bat',
 'baseball bat and fireplace poker',
 'bb gun',
 'bb gun',
 'bb gun',
 'bb gun',
 'body',
 'body',
 'bomb',
 'club, bat',
 'fell while getting out of car?',
 'imitation weapon',
 'metal flashlight',
 'pellet gun',
 'pellet gun',
 'pencil',
 'pickaxe',
 'reports say unarmed',
 'rock',
 'scissors, screwdriver',
 'semi-automatic rifle',
 'semi-automatic rifle',
 'sharp metal object (piece of headphones)',
 "took officer's knife",
 "took officer's taser",
 "took officer's taser",
 'weapon',
 'weapon',
 'weapon',
 'weed-cutter']
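
The `OTHER` list above still contains entries such as 'semi-automatic rifle', which the exact-match lookup in cell 23 misses because that string is not in `weapon_types`. One possible alternative, sketched below, is token-based keyword matching. The `KEYWORDS` table and the `categorize` helper are hypothetical and not part of this notebook. The keyword choices, the first-match-wins ordering, and the decision to keep replicas such as 'bb gun' in `OTHER` are all assumptions that would need review.

```python
import re

# Hypothetical keyword table; order matters, the first matching category wins.
KEYWORDS = [
    # Assumption: replicas and air guns stay in OTHER rather than FIREARM
    ('OTHER', {'bb', 'pellet', 'air', 'imitation'}),
    ('FIREARM', {'gun', 'handgun', 'firearm', 'rifle', 'shotgun',
                 'pistol', 'revolver', 'glock'}),
    ('KNIFE/CUTTING', {'knife', 'knives', 'machete', 'sword', 'axe',
                       'hatchet', 'cutter'}),
    ('VEHICLE', {'car', 'truck', 'vehicle'}),
]


def categorize(description):
    """Return the first category whose keyword set shares a token with the description."""
    tokens = set(re.findall(r"[a-z0-9]+", description.lower()))
    for category, words in KEYWORDS:
        if tokens & words:
            return category
    return 'OTHER'


for desc in ['semi-automatic rifle', 'bb gun', 'box cutter']:
    print(desc, '->', categorize(desc))
```

Keyword matching trades one kind of error for another: 'fell while getting out of car?' contains the token 'car' and would land in `VEHICLE`, so a manual pass over the results would still be needed.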

### Uppercase the content of all columns

In [25]:
print('Uppercasing columns: ', end='')
for col, dt in shootings.dtypes.items():
    if dt == 'object':
        print(col, end=' ')
        shootings[col] = shootings[col].apply(lambda s: s.upper() if isinstance(s, str) else s)

Uppercasing columns: date_report_1 agency_name_1 agency_city_1 agency_zip_1 time_incident name_person_filling_out_1 email_person_filling_out_1 date_report_2 agency_name_2 agency_city_2 agency_zip_2 name_person_filling_out_2 email_person_filling_out_2 date_report_3 agency_name_3 agency_city_3 agency_zip_3 name_person_filling_out_3 email_person_filling_out_3 date_report_4 agency_name_4 agency_city_4 agency_zip_4 name_person_filling_out_4 email_person_filling_out_4 date_report_5 agency_name_5 agency_city_5 agency_zip_5 name_person_filling_out_5 email_person_filling_out_5 date_report_6 agency_name_6 agency_city_6 name_person_filling_out_6 email_person_filling_out_6 date_report_7 agency_name_7 agency_city_7 name_person_filling_out_7 email_person_filling_out_7 date_report_8 agency_name_8 agency_city_8 name_person_filling_out_8 email_person_filling_out_8 date_report_9 agency_name_9 agency_city_9 name_person_filling_out_9 email_person_filling_out_9 date_report_10 agency_name_10 agency_city_10 
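
The loop above uppercases only `str` cells and leaves other values in object columns (numbers, `NaN`) untouched. A minimal sketch of the same idea on a made-up frame, assuming `select_dtypes` is an acceptable way to pick out the object columns:

```python
import numpy as np
import pandas as pd

# Made-up frame: one mixed-type object column and one numeric column
toy = pd.DataFrame({'name': ['Freeport', np.nan, 77541], 'n': [1, 2, 3]})

for col in toy.select_dtypes(include='object').columns:
    # Uppercase strings only; pass every other value through unchanged
    toy[col] = toy[col].map(lambda s: s.upper() if isinstance(s, str) else s)

print(toy['name'].tolist())
```

Calling `.str.upper()` directly would be shorter, but it turns non-string cells such as `77541` into `NaN`, which is why the per-cell type check is kept here.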

### Standardize police agency names

In [26]:
for i in range(1, 11):
    shootings['agency_name_%d' % i] = shootings['agency_name_%d' % i].apply(standardize_agency_name)

shootings.head()

Unnamed: 0,num_reports_filed,date_report_1,date_ag_received,agency_name_1,agency_city_1,agency_zip_1,date_incident,time_incident,name_person_filling_out_1,email_person_filling_out_1,date_report_2,agency_name_2,agency_city_2,agency_zip_2,name_person_filling_out_2,email_person_filling_out_2,date_report_3,agency_name_3,agency_city_3,agency_zip_3,name_person_filling_out_3,email_person_filling_out_3,date_report_4,agency_name_4,agency_city_4,agency_zip_4,name_person_filling_out_4,email_person_filling_out_4,date_report_5,agency_name_5,agency_city_5,agency_zip_5,name_person_filling_out_5,email_person_filling_out_5,date_report_6,agency_name_6,agency_city_6,agency_zip_6,name_person_filling_out_6,email_person_filling_out_6,date_report_7,agency_name_7,agency_city_7,agency_zip_7,name_person_filling_out_7,email_person_filling_out_7,date_report_8,agency_name_8,agency_city_8,agency_zip_8,...,deadly_weapon,officer_gender_1,officer_age_1,officer_race_1,officer_gender_2,officer_age_2,officer_race_2,officer_gender_3,officer_age_3,officer_race_3,officer_gender_4,officer_age_4,officer_race_4,officer_gender_5,officer_age_5,officer_race_5,officer_gender_6,officer_age_6,officer_race_6,officer_gender_7,officer_age_7,officer_race_7,officer_gender_8,officer_age_8,officer_race_8,officer_gender_9,officer_age_9,officer_race_9,officer_gender_10,officer_age_10,officer_race_10,on_duty,multiple_officers_involved,incident_result_of,officer_caused_injury_2,officer_caused_injury_3,incident_call_other,deadly_weapon_description,news_coverage_1,news_coverage_2,news_coverage_3,news_coverage_4,custodial_death_report,cdr_narrative,lea_narrative_published,lea_narrative_shorter,extras,num_officers_recorded,civilian_died,deadly_weapon_category
0,1,9/16/2015,NaT,FREEPORT POLICE DEPT,FREEPORT,77541,2015-09-02,,PAMELA MORRIS,PMORRIS@FREEPORT.TX.US,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,...,False,MALE,27.0,HISPANIC,,,,,,,,,,,,,,,,,,,,,,,,,,,,True,True,TRAFFIC STOP,,,NARCOTIC STOP AND EVADING ARREST,,ABC 13,YOUR SOUTHEST TEXAS,,,False,,,,,1,False,
1,1,10/1/2015,NaT,PLANO POLICE DEPT,PLANO,75074,2015-09-03,,CURTIS HOWARD,CURTISH@PLANO.GOV,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,...,True,MALE,30.0,HISPANIC,,,,,,,,,,,,,,,,,,,,,,,,,,,,True,False,OTHER - SPECIFY TYPE OF CALL,,,ACCIDENTAL DISCHARGE RICOCHET DURING RANGE ACT...,,,,,,False,,,,,1,False,(DETAILS MISSING)
2,1,10/6/2015,NaT,PARKER CO SHERIFFS OFFICE,WEATHERFORD,76086,2015-09-04,,MEREDITH GRAY,MEREDITH.GRAY@PARKERCOUNTYTX.COM,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,...,True,MALE,27.0,WHITE,MALE,56.0,WHITE,,,,,,,,,,,,,,,,,,,,,,,,,True,True,OTHER - SPECIFY TYPE OF CALL,,,INVESTIGATION OF CRIMINAL ACTIVITY,FIREARM,WFAA,DFW CBS LOCAL,STAR TELEGRAM,FOX 4 NEWS,True,DECEDENT SHOT A RIFLE AT LE OFFICERS WHOM RETU...,,FIRED AT OFFICERS,,2,True,FIREARM
3,1,9/11/2015,NaT,HOUSTON POLICE DEPT,HOUSTON,77002,2015-09-05,,ODON BELMAREZ,ODON.BELMAREZ@HOUSTONPOLICE.ORG,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,...,True,MALE,28.0,WHITE,,,,,,,,,,,,,,,,,,,,,,,,,,,,True,True,EMERGENCY CALL OR REQUEST FOR ASSISTANCE,,,,FIREARM,CHRON,ABC 13,CLICK 2 HOUSTON,,False,,AN OFFICER WAS DISPATCHED TO A WEAPONS DISTURB...,,,1,False,FIREARM
4,1,10/15/2015,NaT,IRVING POLICE DEPT,IRVING,75061,2015-09-08,,MICHAEL COLEMAN,MCOLEMAN@CITYOFIRVING.COM,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,...,False,MALE,38.0,HISPANIC,,,,,,,,,,,,,,,,,,,,,,,,,,,,True,False,OTHER - SPECIFY TYPE OF CALL,,,TRAINING EXERCISE - RICOCHET FRAGMENTS RESULTI...,,,,,,False,,,,,1,False,


### Add county information

In [27]:
dept_to_county = dict(zip(agencies.agency, agencies.county))
for i in range(1, 11):
    shootings = insert_col_after(
        shootings,
        shootings['agency_name_%d' % i].apply(lambda d: dept_to_county.get(d, np.nan)),
        'agency_county_%d' % i,
        'agency_city_%d' % i)

shootings.agency_county_1.isnull().value_counts()

False    450
True       4
Name: agency_county_1, dtype: int64

In [28]:
shootings[shootings.agency_county_1.isnull()].agency_name_1.tolist()

['JAL POLICE DEPT',
 'DART POLICE DEPT',
 'DRUG ENFORCEMENT ADMINISTRATION US DOJ',
 'TEXAS DEPT OF PUBLIC SAFETY CRIMINAL INVESTIGATIONS DIVISION']
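
The lookup in cell 27 boils down to one dictionary lookup per agency name, with `NaN` for names missing from the reference table. The sketch below shows the same pattern with `Series.map`, which also yields `NaN` for absent keys. The agency and county names are placeholders, not rows from the data.world file.

```python
import pandas as pd

# Placeholder lookup table and agency names, for illustration only
dept_to_county = {'ALPHA POLICE DEPT': 'ALPHA', 'BETA CO SHERIFFS OFFICE': 'BETA'}
names = pd.Series(['ALPHA POLICE DEPT', 'GAMMA POLICE DEPT', 'BETA CO SHERIFFS OFFICE'])

# Series.map with a dict returns NaN for keys that are absent
counties = names.map(dept_to_county)

# Names without a county are the ones that need manual follow-up
unmatched = names[counties.isnull()].tolist()
print(unmatched)
```

Listing the unmatched names this way makes it easy to tell a genuine gap in the reference table (for example an out-of-state or federal agency) from a spelling that `standardize_agency_name` did not normalize.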

### Other analysis revealed some typos in agency names. We'll demonstrate them here before correcting them.

In [29]:
tmp = shootings.groupby(['incident_county', 'incident_city']).size().sort_values().unstack().T
tmax = tmp.max()
county_to_biggest_city = {}
for c in tmp.columns:
    x = tmp[c][tmp[c] == tmax[c]]
    county_to_biggest_city[c] = x.index[0]
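
The loop above picks, for each county, the city with the most incidents. If breaking ties by first occurrence is acceptable, `idxmax` on the grouped counts is a more compact way to build a similar mapping. The sketch below uses a made-up frame; only the column names `incident_county` and `incident_city` come from the notebook.

```python
import pandas as pd

# Made-up incidents for illustration
toy = pd.DataFrame({
    'incident_county': ['HARRIS', 'HARRIS', 'HARRIS', 'BEXAR', 'BEXAR'],
    'incident_city': ['HOUSTON', 'HOUSTON', 'BAYTOWN', 'SAN ANTONIO', 'SAN ANTONIO'],
})

# Rows = counties, columns = cities, values = incident counts (0 where absent)
counts = toy.groupby(['incident_county', 'incident_city']).size().unstack(fill_value=0)

# idxmax(axis=1) returns the column label (city) holding each row's maximum
county_to_biggest_city = counts.idxmax(axis=1).to_dict()
print(county_to_biggest_city)
```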

In [30]:
TOP5 = list(shootings.incident_county.value_counts().head(5).index)
TOP5_CITIES = [county_to_biggest_city.get(c) for c in TOP5]
print(TOP5)
print(TOP5_CITIES)

['HARRIS', 'BEXAR', 'DALLAS', 'TARRANT', 'TRAVIS']
['HOUSTON', 'SAN ANTONIO', 'DALLAS', 'FORT WORTH', 'AUSTIN']


In [31]:
for county in TOP5:
    print("-- %s --" % county)
    print(shootings[shootings.incident_county == county]['agency_county_1'].value_counts())
    print()

-- HARRIS --
HARRIS        90
MONTGOMERY     1
WALLER         1
BRAZORIA       1
STATE          1
Name: agency_county_1, dtype: int64

-- BEXAR --
BEXAR     36
STATE      5
WILSON     1
Name: agency_county_1, dtype: int64

-- DALLAS --
DALLAS     37
KAUFMAN     1
STATE       1
Name: agency_county_1, dtype: int64

-- TARRANT --
TARRANT    34
PARKER      1
DALLAS      1
Name: agency_county_1, dtype: int64

-- TRAVIS --
TRAVIS    21
Name: agency_county_1, dtype: int64



### 'HARRIS' and 'HARRISON' counties are nowhere near each other, so it seems more likely that there is a typo or data error than that four incidents in HARRIS county involved officers from HARRISON county. Let's check them out.

In [32]:
shootings[(shootings.incident_county == 'HARRIS') & (shootings.agency_county_1 == 'HARRISON')]

Unnamed: 0,num_reports_filed,date_report_1,date_ag_received,agency_name_1,agency_city_1,agency_county_1,agency_zip_1,date_incident,time_incident,name_person_filling_out_1,email_person_filling_out_1,date_report_2,agency_name_2,agency_city_2,agency_county_2,agency_zip_2,name_person_filling_out_2,email_person_filling_out_2,date_report_3,agency_name_3,agency_city_3,agency_county_3,agency_zip_3,name_person_filling_out_3,email_person_filling_out_3,date_report_4,agency_name_4,agency_city_4,agency_county_4,agency_zip_4,name_person_filling_out_4,email_person_filling_out_4,date_report_5,agency_name_5,agency_city_5,agency_county_5,agency_zip_5,name_person_filling_out_5,email_person_filling_out_5,date_report_6,agency_name_6,agency_city_6,agency_county_6,agency_zip_6,name_person_filling_out_6,email_person_filling_out_6,date_report_7,agency_name_7,agency_city_7,agency_county_7,...,deadly_weapon,officer_gender_1,officer_age_1,officer_race_1,officer_gender_2,officer_age_2,officer_race_2,officer_gender_3,officer_age_3,officer_race_3,officer_gender_4,officer_age_4,officer_race_4,officer_gender_5,officer_age_5,officer_race_5,officer_gender_6,officer_age_6,officer_race_6,officer_gender_7,officer_age_7,officer_race_7,officer_gender_8,officer_age_8,officer_race_8,officer_gender_9,officer_age_9,officer_race_9,officer_gender_10,officer_age_10,officer_race_10,on_duty,multiple_officers_involved,incident_result_of,officer_caused_injury_2,officer_caused_injury_3,incident_call_other,deadly_weapon_description,news_coverage_1,news_coverage_2,news_coverage_3,news_coverage_4,custodial_death_report,cdr_narrative,lea_narrative_published,lea_narrative_shorter,extras,num_officers_recorded,civilian_died,deadly_weapon_category


### Yup. These are officers from Baytown and Spring, which border Houston (Harris County, not Harrison County). Their email addresses also use `hctx.net`, a Harris County domain, so the agency names they gave must be errors.

In [33]:
city_county_corrections = [
    ['BAYTOWN', 'HARRISON', 'HARRIS'],
    ['SPRING', 'HARRISON', 'HARRIS'],
]
corrected = set()
for city, wrong_county, right_county in city_county_corrections:
    for i in range(1, 11):
        tmp = shootings[(shootings['agency_city_%d' % i] == city) & (shootings['agency_county_%d' % i] == wrong_county)]
        if len(tmp) == 0:
            # No match in this agency slot; later slots may still match
            continue
        shootings.loc[tmp.index, 'agency_county_%d' % i] = right_county
        for idx, name in tmp['agency_name_%d' % i].items():
            shootings.loc[idx, 'agency_name_%d' % i] = name.replace(wrong_county, right_county)
            corrected.add(idx)

# .loc expects a list-like of labels, so convert the set to a sorted list
shootings.loc[sorted(corrected)]

Unnamed: 0,num_reports_filed,date_report_1,date_ag_received,agency_name_1,agency_city_1,agency_county_1,agency_zip_1,date_incident,time_incident,name_person_filling_out_1,email_person_filling_out_1,date_report_2,agency_name_2,agency_city_2,agency_county_2,agency_zip_2,name_person_filling_out_2,email_person_filling_out_2,date_report_3,agency_name_3,agency_city_3,agency_county_3,agency_zip_3,name_person_filling_out_3,email_person_filling_out_3,date_report_4,agency_name_4,agency_city_4,agency_county_4,agency_zip_4,name_person_filling_out_4,email_person_filling_out_4,date_report_5,agency_name_5,agency_city_5,agency_county_5,agency_zip_5,name_person_filling_out_5,email_person_filling_out_5,date_report_6,agency_name_6,agency_city_6,agency_county_6,agency_zip_6,name_person_filling_out_6,email_person_filling_out_6,date_report_7,agency_name_7,agency_city_7,agency_county_7,...,deadly_weapon,officer_gender_1,officer_age_1,officer_race_1,officer_gender_2,officer_age_2,officer_race_2,officer_gender_3,officer_age_3,officer_race_3,officer_gender_4,officer_age_4,officer_race_4,officer_gender_5,officer_age_5,officer_race_5,officer_gender_6,officer_age_6,officer_race_6,officer_gender_7,officer_age_7,officer_race_7,officer_gender_8,officer_age_8,officer_race_8,officer_gender_9,officer_age_9,officer_race_9,officer_gender_10,officer_age_10,officer_race_10,on_duty,multiple_officers_involved,incident_result_of,officer_caused_injury_2,officer_caused_injury_3,incident_call_other,deadly_weapon_description,news_coverage_1,news_coverage_2,news_coverage_3,news_coverage_4,custodial_death_report,cdr_narrative,lea_narrative_published,lea_narrative_shorter,extras,num_officers_recorded,civilian_died,deadly_weapon_category


### While we're at it, are there any mistakes the other way? (Harris county officers showing up in Harrison county)
#### Answer: no.

In [34]:
shootings[(shootings.incident_county == 'HARRISON') & (shootings.agency_county_1 == 'HARRIS')]

Unnamed: 0,num_reports_filed,date_report_1,date_ag_received,agency_name_1,agency_city_1,agency_county_1,agency_zip_1,date_incident,time_incident,name_person_filling_out_1,email_person_filling_out_1,date_report_2,agency_name_2,agency_city_2,agency_county_2,agency_zip_2,name_person_filling_out_2,email_person_filling_out_2,date_report_3,agency_name_3,agency_city_3,agency_county_3,agency_zip_3,name_person_filling_out_3,email_person_filling_out_3,date_report_4,agency_name_4,agency_city_4,agency_county_4,agency_zip_4,name_person_filling_out_4,email_person_filling_out_4,date_report_5,agency_name_5,agency_city_5,agency_county_5,agency_zip_5,name_person_filling_out_5,email_person_filling_out_5,date_report_6,agency_name_6,agency_city_6,agency_county_6,agency_zip_6,name_person_filling_out_6,email_person_filling_out_6,date_report_7,agency_name_7,agency_city_7,agency_county_7,...,deadly_weapon,officer_gender_1,officer_age_1,officer_race_1,officer_gender_2,officer_age_2,officer_race_2,officer_gender_3,officer_age_3,officer_race_3,officer_gender_4,officer_age_4,officer_race_4,officer_gender_5,officer_age_5,officer_race_5,officer_gender_6,officer_age_6,officer_race_6,officer_gender_7,officer_age_7,officer_race_7,officer_gender_8,officer_age_8,officer_race_8,officer_gender_9,officer_age_9,officer_race_9,officer_gender_10,officer_age_10,officer_race_10,on_duty,multiple_officers_involved,incident_result_of,officer_caused_injury_2,officer_caused_injury_3,incident_call_other,deadly_weapon_description,news_coverage_1,news_coverage_2,news_coverage_3,news_coverage_4,custodial_death_report,cdr_narrative,lea_narrative_published,lea_narrative_shorter,extras,num_officers_recorded,civilian_died,deadly_weapon_category


### Neatly order columns and save the cleaned file.

In [35]:
order = []
numbered = []
for c in shootings.columns:
    if c[-1].isdigit():
        numbered.append(c)
    else:
        order.append(c)

order = order + sorted(numbered, key=lambda c: int(c.split('_')[-1]))
shootings = shootings[order]
shootings.head()

Unnamed: 0,num_reports_filed,date_ag_received,date_incident,time_incident,civilian_name_first,civilian_name_last,civilian_gender,civilian_age,civilian_race,incident_address,incident_city,incident_county,incident_zip,latitude of incident,longitude of incident,deadly_weapon,on_duty,multiple_officers_involved,incident_result_of,incident_call_other,deadly_weapon_description,custodial_death_report,cdr_narrative,lea_narrative_published,lea_narrative_shorter,extras,num_officers_recorded,civilian_died,deadly_weapon_category,date_report_1,agency_name_1,agency_city_1,agency_county_1,agency_zip_1,name_person_filling_out_1,email_person_filling_out_1,officer_gender_1,officer_age_1,officer_race_1,news_coverage_1,date_report_2,agency_name_2,agency_city_2,agency_county_2,agency_zip_2,name_person_filling_out_2,email_person_filling_out_2,officer_gender_2,officer_age_2,officer_race_2,...,date_report_6,agency_name_6,agency_city_6,agency_county_6,agency_zip_6,name_person_filling_out_6,email_person_filling_out_6,officer_gender_6,officer_age_6,officer_race_6,date_report_7,agency_name_7,agency_city_7,agency_county_7,agency_zip_7,name_person_filling_out_7,email_person_filling_out_7,officer_gender_7,officer_age_7,officer_race_7,date_report_8,agency_name_8,agency_city_8,agency_county_8,agency_zip_8,name_person_filling_out_8,email_person_filling_out_8,officer_gender_8,officer_age_8,officer_race_8,date_report_9,agency_name_9,agency_city_9,agency_county_9,agency_zip_9,name_person_filling_out_9,email_person_filling_out_9,officer_gender_9,officer_age_9,officer_race_9,date_report_10,agency_name_10,agency_city_10,agency_county_10,agency_zip_10,name_person_filling_out_10,email_person_filling_out_10,officer_gender_10,officer_age_10,officer_race_10
0,1,NaT,2015-09-02,,RICKEY,MAYBERRY,MALE,30.0,BLACK,1010 MAGNOLIA STREET,FREEPORT,BRAZORIA,77541.0,28.944891,-95.356262,False,True,True,TRAFFIC STOP,NARCOTIC STOP AND EVADING ARREST,,False,,,,,1,False,,9/16/2015,FREEPORT POLICE DEPT,FREEPORT,BRAZORIA,77541,PAMELA MORRIS,PMORRIS@FREEPORT.TX.US,MALE,27.0,HISPANIC,ABC 13,,,,,,,,,,,...,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
1,1,NaT,2015-09-03,,,,MALE,55.0,WHITE,4840 E. PLANO PARKWAY,PLANO,COLLIN,75074.0,33.008128,-96.642308,True,True,False,OTHER - SPECIFY TYPE OF CALL,ACCIDENTAL DISCHARGE RICOCHET DURING RANGE ACT...,,False,,,,,1,False,(DETAILS MISSING),10/1/2015,PLANO POLICE DEPT,PLANO,COLLIN,75074,CURTIS HOWARD,CURTISH@PLANO.GOV,MALE,30.0,HISPANIC,,,,,,,,,,,,...,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
2,1,NaT,2015-09-04,,SULLY JOE,LANIER,MALE,36.0,WHITE,101 COUCH CT.,SPRINGTOWN,PARKER,76082.0,32.916724,-97.634193,True,True,True,OTHER - SPECIFY TYPE OF CALL,INVESTIGATION OF CRIMINAL ACTIVITY,FIREARM,True,DECEDENT SHOT A RIFLE AT LE OFFICERS WHOM RETU...,,FIRED AT OFFICERS,,2,True,FIREARM,10/6/2015,PARKER CO SHERIFFS OFFICE,WEATHERFORD,PARKER,76086,MEREDITH GRAY,MEREDITH.GRAY@PARKERCOUNTYTX.COM,MALE,27.0,WHITE,WFAA,,,,,,,,MALE,56.0,WHITE,...,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
3,1,NaT,2015-09-05,,,,MALE,21.0,BLACK,4926 CHENNAULT ROAD,HOUSTON,HARRIS,77033.0,29.681655,-95.344966,True,True,True,EMERGENCY CALL OR REQUEST FOR ASSISTANCE,,FIREARM,False,,AN OFFICER WAS DISPATCHED TO A WEAPONS DISTURB...,,,1,False,FIREARM,9/11/2015,HOUSTON POLICE DEPT,HOUSTON,HARRIS,77002,ODON BELMAREZ,ODON.BELMAREZ@HOUSTONPOLICE.ORG,MALE,28.0,WHITE,CHRON,,,,,,,,,,,...,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
4,1,NaT,2015-09-08,,,,MALE,44.0,WHITE,1500 RANGE ROAD,"DFW, AIRPORT",TARRANT,75261.0,32.899809,-97.040335,False,True,False,OTHER - SPECIFY TYPE OF CALL,TRAINING EXERCISE - RICOCHET FRAGMENTS RESULTI...,,False,,,,,1,False,,10/15/2015,IRVING POLICE DEPT,IRVING,DALLAS,75061,MICHAEL COLEMAN,MCOLEMAN@CITYOFIRVING.COM,MALE,38.0,HISPANIC,,,,,,,,,,,,...,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
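
The ordering rule in cell 35 keeps un-numbered columns in their existing order and then sorts the numbered columns by their trailing integer. Because Python's `sorted` is stable, columns that share a suffix keep their original relative order. A small sketch using a handful of column names from this dataset:

```python
columns = ['agency_name_2', 'date_incident', 'agency_name_10',
           'officer_age_1', 'agency_name_1', 'civilian_age']

unnumbered = [c for c in columns if not c[-1].isdigit()]
numbered = [c for c in columns if c[-1].isdigit()]

# Sort on the integer suffix so that _10 comes after _2
# (a plain string sort would put 'agency_name_10' before 'agency_name_2')
order = unnumbered + sorted(numbered, key=lambda c: int(c.split('_')[-1]))
print(order)
```

Note that the rule relies on every column ending in a digit having the form `name_<number>`; a column such as `zip5` would make `int(...)` raise a `ValueError`.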


### 3. Write

In [36]:
shootings.to_csv(CLEANED_FILENAME, index=False)
print('Done')

Done
