# Clean and reformat CDR data from its multi-tab Excel file into a single CSV

### About the data

CDR (custodial death report) data is tricky: the form used by law enforcement has changed over time, first in 2005 and again in 2016. Reporting before 2005 is known to be sparse and poorly enforced, so we ignore those entries. The 2005 and 2016 versions of the form overlap in some fields and differ in others, so we must be careful in how we merge them.

**In this repo you can find blank versions of the [2005](https://github.com/texas-justice-initiative/data-processing/blob/master/forms/CDR%20Form%20Version%202005.pdf) and [2016](https://github.com/texas-justice-initiative/data-processing/blob/master/forms/CDR%20Form%20Version%202016.pdf) forms, to see for yourself exactly what fields are collected and how.**

### Datasets used


* Input:
  * `tji/tx-deaths-in-custody-2005-2015/original/CDR Reports All.xlsx`
  * `tji/auxiliary-datasets/agencies_and_counties`
* Output:
  * `tji/tx-deaths-in-custody-2005-2015/cleaned_custodial_death_reports.csv`
  
##### Author: Everett Wetchler (everett.wetchler@gmail.com)

## I. Setup and read data

In [1]:
DTW_PROJECT_KEY_CDR = 'tji/tx-deaths-in-custody-2005-2015'
RAW_FILENAME = 'original/CDR Reports All.xlsx'
SHEETNAMES = ['Form Version 2005', 'Form Version 2016']
CLEANED_FILENAME = 'cleaned_custodial_death_reports.csv'

In [2]:
import datadotworld as dw
import json
import numpy as np
import pandas as pd

from lib import cleaning_tools

pd.set_option('display.max_rows', 100)
pd.set_option('display.max_columns', 100)

%load_ext watermark
%watermark -a "Everett Wetchler" -d -t -z -w -p pandas,datadotworld

Everett Wetchler 2018-05-21 11:03:44 CDT

pandas 0.22.0
datadotworld 1.6.0
watermark 1.6.0


In [3]:
from lib.standardize_police_agency_names import standardize_agency_name

In [4]:
datasets = dw.load_dataset('tji/auxiliary-datasets', force_update=True)
agency_county = datasets.dataframes['agencies_and_counties']
agency_county = agency_county.set_index('agency')['county'].to_dict()

In [5]:
df0, df1 = [cleaning_tools.read_dtw_excel(DTW_PROJECT_KEY_CDR, RAW_FILENAME, sheet_name=name) for name in SHEETNAMES]
df0['form_version'] = 'V_2005'
df1['form_version'] = 'V_2016'

Writing excel file to temp file: /var/folders/dc/8cbxbsh515s908xl0zyprszm0000gn/T/tmp7ngp255k
Writing excel file to temp file: /var/folders/dc/8cbxbsh515s908xl0zyprszm0000gn/T/tmp2rfcywwt


### A quick look at the raw data

In [6]:
df0.head()

Unnamed: 0,CDR: CDR Name,Report Date,Status,Version Type,Version Number,Department Type,Agency Name,Agency Phone,Agency Address,Agency City,Agency County,Agency State,Agency Zip,Director Salutation,Director First Name,Director Middle Name,Director Last Name,Reporter Name Original CDR,Reporter Email,Street Address,City,County,Longitude,Latitude,Census Tract,Type of Custody,Specific Type of Custody/Facility,Custody Type Facility,Entry Date Time,Entry Date Time N/A,Death Location,Death Location Elsewhere,First Name,Middle Name,Last Name,Suffix,Date of Birth,Sex,Ethnicity,Ethnicity Other,Death Date and Time,Age At Time Of Death,Date/Time of Custody or Incident,Custody Date NA,Medical Examinor/Coroner Evalution?,Manner of Death,Manner of Death Description,Death Reason,Medical Cause of Death,Medical Treatment,Medical Treatment Description,Who caused the death?,Death Causer Other,Type of Death Weapon,Death Weapon Other Description,Pre existing medical condition?,Means of Death,Means of Death Other,Offense 1,Offense 2,Offense 3,Were the Charges:,Type of Offense,"Type of Offense, Other",Injured By,Threaten the officer(s) involved,Resist being handcuffed or arrested?,Try to escape/flee from custody,"Grab, hit or fight with the officer(s)",Other Behavior,Specify Other Behavior,Appear intoxicated (alcohol or drugs),Use weapon threaten/assault officer(s),Entry Behavior,Under Restraint,Type of Restraint,"Other device, specify",form_version
0,PA05001C,2005-03-02 14:48:00,Submitted,ORIGINAL VERSION,1,POLICE,Fort Worth Police Dept.,8178778022,350 W. Belknap,Fort Worth,TARRANT,TX,76102,Chief,Ralph,,Mendoza,Renee Gray,renee.gray@oag.state.tx.us,1509 W. Hammond,Fort Worth,Tarrant,,,,Police Custody (pre-booking),Custody of Peace Officer during/fleeing arrest,,NaT,1,At the crime/arrest scene,,Dino,,Gomez,,1964-04-03,Male,Hispanic,,2005-01-01 15:04:00,40,2005-01-01 15:04:00,0,"Yes, results are available",Justifiable Homicide,,Injuries only,Gunshot Wound to the Chest,Not Applicable,,Law enforcement/correctional staff,,Handgun,,Not Applicable; cause of death was accidental ...,Firearm,,Aggravated Assault,,,Not filed at time of death,,,Injured by Officer,Yes,Yes,Yes,No,0,,No,1,,No,,,V_2005
1,PA05002CJ,2005-03-03 14:07:00,Submitted,ORIGINAL VERSION,1,SHERIFF,Travis County Sheriff's Dept.,5128549770,P. O. Box 1748,Austin,TRAVIS,TX,78767,Sheriff,Margo,L.,Frasier,Renee Gray,renee.gray@oag.state.tx.us,3614 Bill Price Road,Del Valle,Travis,,,,County Jail,Jail - multiple occupancy cell,,2004-10-20 20:56:00,0,At medical facility,,Michael,Darnell,Dickson,,1953-03-04,Male,African-American,,2005-01-01 15:59:00,51,2004-10-20 20:56:00,0,"Yes, results are available",Natural Causes/Illness,Cardiac Arrest,Medical condition only (e.g. heart attack),Myocardial infarction,Yes,"Treatment for hypertension, administered Proca...","Not applicable; cause of death was suicide, in...",,Not Applicable,,Pre-existing medical condition,Not applicable; cause of death was intoxicatio...,,Man/Del/Sell/Poss Controlled Substance,,,Filed,,,Injured by NA,No,No,No,No,0,,No,0,Medical,No,,,V_2005
2,PA05003P,2005-03-03 14:16:00,Submitted,ORIGINAL VERSION,1,STAGENCY,Texas Department Of Criminal Justice,9364376716,P.O. Box 4003,Huntsville,TRAVIS,TX,773424003,Mr.,Chris,H.,Stallings,Renee Gray,renee.gray@oag.state.tx.us,21 FM 247,Huntsville,Walker,,,,Penitentiary,TDCJ,Byrd,2004-12-17 09:00:00,0,At medical facility,,Darryl,Glenn,Wallace,,1960-12-03,Male,African-American,,2005-01-02 04:35:00,44,2004-12-17 09:00:00,0,"Yes, results are available",Natural Causes/Illness,Cardiac,Medical condition only (e.g. heart attack),Cardiac death secondary to cardiac ischemia th...,Yes,Nitroglycerin,"Not applicable; cause of death was suicide, in...",,Not Applicable,,Pre-existing medical condition,Not applicable; cause of death was intoxicatio...,,Burglary of Habitation,,,Convicted,,,Injured by NA,No,No,No,No,0,,No,0,,No,,,V_2005
3,PA05004C,2005-03-03 14:40:00,Submitted,ORIGINAL VERSION,1,POLICE,San Antonio Police Dept.,2102077449,"214 W. Nueva, Suite 331",San Antonio,BEXAR,TX,78207,Chief,Albert,,Ortiz,Renee Gray,renee.gray@oag.state.tx.us,2102 Goliad Road,San Antonio,Bexar,,,,Police Custody (pre-booking),Custody of Peace Officer during/fleeing arrest,,NaT,1,At the crime/arrest scene,,Albert,Chavez,Enriquez,,1983-11-28,Male,Hispanic,,2005-01-03 15:00:00,21,2005-01-03 14:50:00,0,"Yes, results are available",Justifiable Homicide,,Injuries only,Multiple Gunshot Wounds,Not Applicable,,Law enforcement/correctional staff,,Handgun,,Not Applicable; cause of death was accidental ...,Firearm,,Aggravated Robbery,,,Not filed at time of death,,,Injured by Officer,Yes,Yes,Yes,Yes,0,,No,1,,No,,,V_2005
4,PA05005C,2005-03-03 14:51:00,Submitted,ORIGINAL VERSION,1,POLICE,Houston Police Dept.,7133081778,"1200 Travis, 17th Floor",Houston,HARRIS,TX,77002,Chief,Clarence,O.,Bradford,Renee Gray,renee.gray@oag.state.tx.us,3600 block of Telephone Rd,Houston,Harris,,,,Police Custody (pre-booking),Custody of Peace Officer during/fleeing arrest,,NaT,1,At the crime/arrest scene,,Alex,,Mendez,,1978-12-30,Male,Hispanic,,2005-01-03 17:15:00,26,2005-01-03 17:15:00,0,"Yes, results are available",Justifiable Homicide,,Injuries only,Multiple Gunshot Wounds,Not Applicable,,Law enforcement/correctional staff,,Handgun,,Not Applicable; cause of death was accidental ...,Firearm,,Traffic Violation,,,Not filed at time of death,,Traffic Violation,Injured by Officer,Yes,No,No,No,0,,No,1,,No,,,V_2005


In [7]:
df1.head()

Unnamed: 0,CDR: CDR Name,Version Type,Version Number,Report Date,Status,Agency Name,Agency Address,Agency City,Agency State,Agency Zip,Director Salutation,Director First Name,Director Middle Name,Director Last Name,Reporter Name,Reporter Email,First Name,Middle Name,Last Name,Suffix,Date of Birth,Sex,Race,Age At Time Of Death,Date/Time of Custody or Incident,Death Date and Time,Medical Examinor/Coroner Evalution?,Manner of Death,Manner of Death Description,Medical Cause of Death,Medical Treatment,Who caused the death?,Type of weapon that caused death?,"Other weapon, specify",Pre existing medical condition?,Means of Death,Means of Death Other,Street Address,City,County,Zip,Longitude,Latitude,Census Tract,Location Category,Other Location Category,Type of Custody,Specific Type of Custody/Facility,TDCJ - Specify Unit,Entry Date Time,Entry Date Time N/A,Death Location,Death Location Elsewhere,Other Agencies Respond?,Offense 1,Offense 2,Offense 3,Were the Charges:,Type of Offense,"Type of Offense, Other",Decedent display/use of weapons,Decedent Display or Use Weapon Details,Specify Weapon Used,Attempt to Injure Others?,Appear intoxicated (alcohol or drugs),Make suicidal statements?,Exhibit any mental health problems?,Exhibit any medical problems?,Barricade self or initiate standoff?,Resist being handcuffed or arrested?,Physically attempt/assault officer(s),Gain possession of officer's weapon,Verbally threaten other(s) including law,Escape or attempt to escape/flee custody,Attempt gain possession officer's weapon,Under Restraint,Type of Restraint,"Other device, specify",form_version
0,16-3-C,AMENDED,4,2016-12-12 13:02:00,Submitted,Texas Department Of Public Safety,PO BOX 4087,Austin,TX,78773,Director,Steven,,McCraw,Joanne Scarbrough,joanne.scarbrough@dps.texas.gov,Ivory,Charles,Pantallion,III,1980-09-14,Male,Black or African American,36,2016-11-22 09:26:00,2016-11-22 09:26:00,"Yes, results are available",Homicide (includes Justifiable Homicide),,Multiple Gunshot Wounds,Not Applicable,Law enforcement/correctional personnel,Handgun; Rifle/shotgun,,Not Applicable; cause of death was accidental ...,Firearm,,7300 Interstate 10 W,Baytown,Harris,77521.0,,,,Roadway/highway/street/sidewalk,,Police Custody (pre-booking),Custody of Law Enforcement Personnel during/fl...,,NaT,1,Scene of incident,,Yes,Aggravated Assault on Peace Officer,Evading Arrest or Detention,,Not filed at time of death,Violent Crime Against Persons,,"Yes, mark all that apply",Displayed firearm without discharge,,Yes (select all that apply),No,No,No,No,Unknown,Yes,Yes,No,Unknown,Yes,No,No,,,V_2016
1,14-1-C,AMENDED,2,2016-12-13 16:20:00,Submitted,Texas Department Of Public Safety,PO BOX 4087,Austin,TX,78773,Director,Steven,,McCraw,Joanne Scarbrough,joanne.scarbrough@dps.texas.gov,James,Earl,Nicholas,,1966-04-24,Male,Anglo or White,48,2014-09-11 17:00:00,2014-09-11 17:00:00,"Yes, results are available",Could not be determined,,Multiple Gunshot Wounds,Not Applicable,Law enforcement/correctional personnel,"Firearm, unspecified",,Not Applicable; cause of death was accidental ...,Firearm,,2030 Jacintoport Blvd.,Houston,Harris,77015.0,,,,Roadway/highway/street/sidewalk,,Police Custody (pre-booking),Custody of Law Enforcement Personnel during/fl...,,NaT,1,Scene of incident,,Yes,Capital Murder - Filed,Aggravated Assault on a Peace Officer - Not Filed,Evading Arrest or Detention - Not Filed,Filed,Violent Crime Against Persons,,"Yes, mark all that apply",Discharged firearm,,Yes (select all that apply),No,No,No,No,No,Yes,Yes,No,Unknown,Yes,No,No,,,V_2016
2,16-4-P,ORIGINAL VERSION,1,2016-12-14 15:27:00,Submitted,TDCJ/Office of the Inspector General,"2503 Lake Road, Suite 5",Huntsville,TX,77340,Other,John,,West,Analou Sievers,analou.sievers@tdcj.texas.gov,Percy,,Froman,,1969-12-19,Male,Anglo or White,46,2000-01-28 00:00:00,2016-11-17 08:25:00,"No, evaluation not planned",Natural,,Colon Cancer,No,Not applicable,Not Applicable,,Pre-existing medical condition,"Not applicable, cause of death was illness/nat...",,8602 Peach Street,Lubbock,Lubbock,79404.0,,,,Law Enforcement Facility,,Penitentiary,"TDCJ, specify",Montford,2000-01-28 00:00:00,0,Medical facility,,No,Aggravated Robbery,,,Convicted,Violent Crime Against Persons,,No,,,No,,,,,,,,,,,,No,,,V_2016
3,16-5-C,ORIGINAL VERSION,1,2016-12-14 18:37:00,Submitted,San Antonio Police Dept.,315 S. Santa Rosa,San Antonio,TX,78207,Chief,William,,McManus,Leroy Carrion,leroy.carrion@sanantonio.gov,Andrew,,Moreno,,1991-12-23,Male,Hispanic or Latino,24,2016-11-19 21:00:00,2016-11-20 05:34:00,"Yes, results are available",Homicide (includes Justifiable Homicide),,Multiple gunshot wounds,No,Law enforcement/correctional personnel,Rifle/shotgun,,Not Applicable; cause of death was accidental ...,Firearm,,5814 Shadow Glen #4,San Antonio,Bexar,78238.0,,,,Residence/Home,,Police Custody (pre-booking),Custody of Law Enforcement Personnel during/fl...,,2016-11-19 21:00:00,0,Scene of incident,,Yes,Aggravated Assault with a Deadly Weapon,,,Filed,Violent Crime Against Persons,,"Yes, mark all that apply",Displayed firearm without discharge,,Yes (select all that apply),Unknown,Yes,No,No,Yes,No,Yes,No,Yes,No,No,No,,,V_2016
4,16-6-MJ,ORIGINAL VERSION,1,2016-12-15 11:20:00,Submitted,Rosenberg Police Dept.,2120 Fourth St.,Rosenberg,TX,77471,Chief,Dallis,,Warren,Charles Crocker,justin.crocker@rosenbergtx.gov,Roberto,Eduardo,Velasquez,,1977-01-31,Male,Hispanic or Latino,39,2016-12-02 22:32:00,2016-12-03 09:15:00,"Yes, results pending",Pending autopsy results,,Pending autopsy results.,No,Not applicable,Not Applicable,,Pre-existing medical condition,"Not applicable, cause of death was illness/nat...",,1910 Louise #41,Rosenberg,Fort Bend,77471.0,,,,Residence/Home,,Municipal Jail,Jail - detox cell,,2016-12-02 22:32:00,0,Medical facility,,No,TCIC warrant - Failure to Appear (Child Neglect),,,Filed,"Other, specify",Investigation of disturbance call at residence...,No,,,No,Yes,No,No,No,No,No,No,No,No,No,No,No,,,V_2016


## II. Cleaning

### 1. Merge the two sheets into one, keeping the columns we care about

In [8]:
keep_text = '''Both forms

- Age At Time Of Death
- Agency Address
- Agency City
- Agency Name
- Agency Zip
- CDR: CDR Name
- City
- County
- Date of Birth
- Date/Time of Custody or Incident
- Death Date and Time
- Death Location
- Death Location Elsewhere
- Entry Date Time
- Entry Date Time N/A
- First Name
- Middle Name
- Last Name
- Suffix
- Manner of Death
- Manner of Death Description
- Means of Death
- Means of Death Other
- Medical Cause of Death
- Medical Examinor/Coroner Evalution?
- Medical Treatment
- Offense 1
- Offense 2
- Offense 3
- Pre existing medical condition?
- Sex
- Specific Type of Custody/Facility
- Street Address
- Type of Custody
- Type of Offense
- Type of Offense, Other
- Version Number
- Version Type
- Were the Charges:
- Who caused the death?
- form_version

2005 form

- Agency County
- Custody Date NA
- Custody Type Facility
- Death Causer Other
- Death Reason
- Department Type
- Entry Behavior
- Ethnicity
- Ethnicity Other
- Other Behavior
- Specify Other Behavior

2016 form

- Exhibit any medical problems?
- Exhibit any mental health problems?
- Make suicidal statements?
- Race'''
keep_cols = []
for line in keep_text.splitlines():
    if line.startswith('- '):
        keep_cols.append(line[2:])

In [9]:
col_renames = {}
for c in keep_cols:
    new_name = ''.join([ch if ch.isalnum() else ' ' for ch in c.lower()])
    new_name = '_'.join(new_name.strip().split())
    col_renames[c] = new_name
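The renaming rule above can also be written as a single regular expression. This is a minimal sketch, not the notebook's code. `to_snake_case` is a hypothetical helper name, and it matches the loop only for ASCII headers, because `str.isalnum` also accepts non-ASCII letters and the regex does not.

```python
import re


def to_snake_case(header: str) -> str:
    """Lower-case a header and join its alphanumeric runs with underscores."""
    # Any run of non-alphanumeric characters (spaces, '/', ':', '?', ',')
    # collapses to one underscore; leading/trailing underscores are trimmed.
    return re.sub(r'[^0-9a-z]+', '_', header.lower()).strip('_')


for header in ['CDR: CDR Name', 'Medical Examinor/Coroner Evalution?', 'Entry Date Time N/A']:
    print(f'{header!r} -> {to_snake_case(header)!r}')
```

These three headers produce `cdr_cdr_name`, `medical_examinor_coroner_evalution` and `entry_date_time_n_a`, which are the column names used later in the notebook.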

In [10]:
cdr = pd.concat([df0, df1])
cdr = cdr[list(col_renames.keys())]
cdr.rename(col_renames, inplace=True, axis=1)

### 1.b Summarize what columns are populated and how frequently

In [11]:
notnull = cdr.notnull().mean()
notnull05 = cdr[cdr.form_version == 'V_2005'].notnull().mean()
notnull16 = cdr[cdr.form_version == 'V_2016'].notnull().mean()
frame = pd.concat([notnull, notnull05, notnull16], axis=1)
frame.columns = ['all data', '2005 forms', '2016 forms']
frame.sort_index(inplace=True)
frame

Unnamed: 0,all data,2005 forms,2016 forms
age_at_time_of_death,1.0,1.0,1.0
agency_address,1.0,1.0,1.0
agency_city,1.0,1.0,1.0
agency_county,0.83153,0.999612,0.0
agency_name,1.0,1.0,1.0
agency_zip,1.0,1.0,1.0
cdr_cdr_name,1.0,1.0,1.0
city,0.998869,0.998835,0.999039
county,1.0,1.0,1.0
custody_date_na,0.831853,1.0,0.0
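The fill-rate table works because `notnull()` yields booleans, and the mean of a boolean column is the share of `True` values. The per-version columns can also come from one `groupby` instead of two filtered copies. Here is a sketch on a made-up toy frame; `toy` and its values are illustrative, not taken from the dataset.

```python
import numpy as np
import pandas as pd

# Toy data: 'agency_county' is only filled on 2005 forms, as in the real sheets.
toy = pd.DataFrame({
    'form_version': ['V_2005', 'V_2005', 'V_2016', 'V_2016'],
    'agency_county': ['TARRANT', 'TRAVIS', np.nan, np.nan],
    'city': ['FORT WORTH', 'DEL VALLE', 'BAYTOWN', np.nan],
})

present = toy.drop(columns='form_version').notnull()
fill_rates = present.groupby(toy['form_version']).mean().T  # one column per form version
fill_rates['all data'] = present.mean()                     # overall share, aligned on column names
print(fill_rates)
```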


### 1.c Upcase everything

In [12]:
cleaning_tools.upcase_cells(cdr)
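`cleaning_tools.upcase_cells` is a project helper whose source isn't shown here. Its return value is not used, so it presumably modifies the frame in place. The sketch below is only a guess at what such a helper could look like, under the hypothetical name `upcase_string_cells`. It upper-cases text and leaves numbers, timestamps and missing values untouched.

```python
import pandas as pd
from pandas.api.types import is_datetime64_any_dtype, is_numeric_dtype


def upcase_string_cells(df: pd.DataFrame) -> None:
    """Hypothetical stand-in for cleaning_tools.upcase_cells: upper-case
    every string cell in place."""
    for col in df.columns:
        # Skip columns that cannot hold text (numbers, booleans, timestamps)
        if is_numeric_dtype(df[col]) or is_datetime64_any_dtype(df[col]):
            continue
        # Only str values change; NaN/None and other objects pass through
        df[col] = df[col].map(lambda v: v.upper() if isinstance(v, str) else v)


toy_reports = pd.DataFrame({'sex': ['Male', 'female', None], 'age': [40, 51, 44]})
upcase_string_cells(toy_reports)
print(toy_reports)
```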

### 1.d Get rid of everything before 2005

In [13]:
before = len(cdr)
cdr = cdr[cdr.date_time_of_custody_or_incident >= '2005']
after = len(cdr)
print('Dropped %d (of %d) reports with a custody/incident date before 2005 or missing, leaving %d' % (before - after, before, after))

Dropped 1527 (of 6191) reports with a custody/incident date before 2005 or missing, leaving 4664
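Two details of this filter are easy to miss. First, pandas parses the string `'2005'` as `Timestamp('2005-01-01')`, and missing dates (`NaT`) compare as `False`, so those rows are dropped too. Second, the filter keys on the custody/incident date, not the death date. A death in 2005 or later is therefore also dropped when custody began earlier. The check below uses custody dates copied from the raw-sheet previews above; the `None` entry is added for illustration.

```python
import pandas as pd

# Custody/incident dates from the raw previews above; the first row's death
# happened on 2005-01-02 and the last row's on 2016-11-17.
incident = pd.Series(pd.to_datetime([
    '2004-12-17 09:00', '2005-01-01 15:04', None, '2000-01-28 00:00',
]))

keep = incident >= '2005'  # string is parsed as Timestamp('2005-01-01')
print(keep.tolist())       # [False, True, False, False]
```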


### 2. Merge race columns: the 2005 form calls it 'ethnicity', the 2016 form calls it 'race'

#### Have a look at the values first

In [14]:
cdr.race.value_counts()

ANGLO OR WHITE               344
HISPANIC OR LATINO           256
BLACK OR AFRICAN AMERICAN    195
OTHER                          5
ASIAN OR PACIFIC ISLANDER      2
Name: race, dtype: int64

In [15]:
cdr.ethnicity.value_counts()

ANGLO                               1562
HISPANIC                            1128
AFRICAN-AMERICAN                    1101
ASIAN                                 24
OTHER                                 19
MIDDLE EAST                           12
AMERICAN INDIAN/ALASKA NATIVE         11
NATIVE HAWAIIAN/PACIFIC ISLANDER       5
Name: ethnicity, dtype: int64

In [16]:
# When 'Other' is chosen for ethnicity on the 2005 form, a follow-up
# field asks the filer to specify. Clearly, though, some of these are
# not truly 'other' ethnicities. See:
cdr.ethnicity_other.value_counts()

WHITE                  7
CAUCASIAN              4
UNKNOWN                3
SUDANESE BLACK         1
WHITE NON HISPANIC     1
UNITED KINGDOM         1
ANGLO & MIDDLE EAST    1
CUBAN                  1
Name: ethnicity_other, dtype: int64

In [17]:
# Let's make sure nobody is filling out the "other ethnicity" column when they shouldn't...
cdr[((cdr.ethnicity != 'OTHER') & cdr.ethnicity_other.notnull())][['ethnicity', 'ethnicity_other']]

Unnamed: 0,ethnicity,ethnicity_other


In [18]:
# Good. Let's transfer those specified ethnicity_other values into
# the 'ethnicity' column, so we can merge everything at once.
other_eth = (cdr.ethnicity == 'OTHER')
print('Merging %d "ethnicity_other" values into the main "ethnicity" column' % other_eth.sum())
cdr.loc[other_eth, 'ethnicity'] = cdr.ethnicity_other[other_eth]
cdr.drop('ethnicity_other', axis=1, inplace=True)

Merging 19 "ethnicity_other" values into the main "ethnicity" column
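The `.loc` assignment above can also be written with `Series.mask`, which swaps in values from another series wherever a condition holds. Here is a sketch on made-up rows; `eth` is a hypothetical toy frame, and `CUBAN` and `UNKNOWN` are taken from the free-text counts above.

```python
import numpy as np
import pandas as pd

eth = pd.DataFrame({
    'ethnicity': ['HISPANIC', 'OTHER', 'OTHER', np.nan],
    'ethnicity_other': [np.nan, 'CUBAN', 'UNKNOWN', np.nan],
})

# Where the checkbox says OTHER, use the free-text value; elsewhere keep the checkbox.
is_other = eth['ethnicity'] == 'OTHER'   # NaN compares as False, so missing stays missing
eth['ethnicity'] = eth['ethnicity'].mask(is_other, eth['ethnicity_other'])
eth = eth.drop(columns='ethnicity_other')
print(eth['ethnicity'].tolist())
```

As in the notebook's version, a row marked OTHER with a blank free-text field ends up missing.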


In [19]:
# Make a single 'race' column that has merged, simplified values of race or ethnicity.
race_eth_list = []
for race, eth in zip(cdr.race, cdr.ethnicity):
    # At most one of (race, eth) should be set
    assert pd.isnull(race) or pd.isnull(eth)
    if pd.isnull(race):
        if pd.isnull(eth):
            race_eth_list.append(None)
            continue
        x = eth
    else:
        x = race
    race_eth_list.append(cleaning_tools.standardize_race(x))
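The loop can also be expressed with whole-column operations. `combine_first` takes `race` where present and `ethnicity` otherwise, and a dictionary lookup then simplifies the labels. This is only a sketch. `RACE_MAP` and `merge_race` are hypothetical, the mapping is partial and built from the labels printed above, and the rule that any other non-missing label becomes `OTHER` is an assumption. It will not necessarily reproduce `cleaning_tools.standardize_race` exactly.

```python
import pandas as pd

# Hypothetical, partial mapping; the notebook's real rules live in
# cleaning_tools.standardize_race.
RACE_MAP = {
    'ANGLO OR WHITE': 'WHITE', 'ANGLO': 'WHITE', 'WHITE': 'WHITE', 'CAUCASIAN': 'WHITE',
    'HISPANIC OR LATINO': 'HISPANIC', 'HISPANIC': 'HISPANIC',
    'BLACK OR AFRICAN AMERICAN': 'BLACK', 'AFRICAN-AMERICAN': 'BLACK',
}


def merge_race(race: pd.Series, ethnicity: pd.Series) -> pd.Series:
    # At most one of the two columns should be filled on any row
    assert not (race.notna() & ethnicity.notna()).any()
    combined = race.combine_first(ethnicity)
    simplified = combined.map(RACE_MAP)  # unmapped labels become NaN
    # Unmapped but present labels -> 'OTHER'; rows missing both stay missing
    return simplified.where(simplified.notna() | combined.isna(), 'OTHER')


toy_race = pd.Series(['ANGLO OR WHITE', None, None, None])
toy_ethnicity = pd.Series([None, 'AFRICAN-AMERICAN', 'MIDDLE EAST', None])
print(merge_race(toy_race, toy_ethnicity).tolist())
```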

In [20]:
cdr['race'] = race_eth_list
cdr.drop('ethnicity', axis=1, inplace=True)
cdr.race.value_counts()

WHITE       1919
HISPANIC    1384
BLACK       1297
OTHER         64
Name: race, dtype: int64

### 3. Fix agency-related columns

In [21]:
# Standardize agency name (so we can join/compare across datasets)
cdr['agency_name'] = cdr['agency_name'].apply(standardize_agency_name)

# Look up the county by agency name. If the agency is not in the lookup
# table, fall back on the county given on the form, if any.
cdr['agency_county'] = cdr['agency_county'].str.upper()
county_lookup = cdr['agency_name'].apply(lambda name: agency_county.get(name, np.nan))
cdr['agency_county'] = county_lookup.fillna(cdr['agency_county'])

# Manually handle one major agency
cdr.loc[cdr['agency_name'] == 'TDCJOFFICE OF THE INSPECTOR GENERAL', 'agency_county'] = 'STATE'
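`Series.map` with a dict returns `NaN` for names missing from the lookup, so it gives the same lookup-with-fallback without the `apply`/`get` pair. Here is a sketch with a hypothetical two-entry table, `toy_agency_county`, standing in for `agencies_and_counties`. `HYPOTHETICAL PD` is likewise an invented name.

```python
import numpy as np
import pandas as pd

# Hypothetical excerpt of the agency -> county lookup
toy_agency_county = {'HOUSTON POLICE DEPT': 'HARRIS', 'SAN ANTONIO POLICE DEPT': 'BEXAR'}

toy_reports = pd.DataFrame({
    'agency_name': ['HOUSTON POLICE DEPT', 'ECTOR CO SHERIFFS OFC', 'HYPOTHETICAL PD'],
    'agency_county': ['Harris', 'Ector', np.nan],  # county as typed on the form
})

looked_up = toy_reports['agency_name'].map(toy_agency_county)  # NaN where the name is unknown
toy_reports['agency_county'] = looked_up.fillna(toy_reports['agency_county'].str.upper())
print(toy_reports)
```

As in the notebook, the lookup table wins when both sources exist, and the form's value is used only as a fallback.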

In [22]:
# Check that we are only missing counties for a paltry few records now.
cdr[cdr['agency_county'].isnull()]['agency_name'].value_counts()

TEXAS JUVENILE JUSTICE DEPT OFFICE OF INSPECTOR GENERAL     2
ECTOR CO SHERIFFS OFC                                       1
METROPOLITAN TRANSIT AUTH OF HARRIS CO                      1
ELLIS CO SHERIFFS OFFICE WAYNE MCCOLLUM DETENTION CENTER    1
Name: agency_name, dtype: int64

### 4. For death-related columns, group and rename values so they are consistent across forms

In [23]:
cdr['death_location'].value_counts()

AT MEDICAL FACILITY                         2243
AT LAW ENFORCEMENT FACILITY                  817
AT THE CRIME/ARREST SCENE                    667
MEDICAL FACILITY                             515
SCENE OF INCIDENT                            131
LAW ENFORCEMENT FACILITY/BOOKING CENTER      119
ELSEWHERE                                     82
EN ROUTE TO MEDICAL FACILITY                  49
ELSEWHERE, SPECIFY                            19
DEAD ON ARRIVAL AT MEDICAL FACILITY           18
EN ROUTE TO BOOKING CENTER/POLICE LOCKUP       4
Name: death_location, dtype: int64

In [24]:
replacements = {
    'AT MEDICAL FACILITY': 'MEDICAL FACILITY',
    'AT LAW ENFORCEMENT FACILITY': 'LAW ENFORCEMENT FACILITY',
    'AT THE CRIME/ARREST SCENE': 'CRIME/ARREST SCENE',
    'LAW ENFORCEMENT FACILITY/BOOKING CENTER': 'LAW ENFORCEMENT FACILITY',
    'DEAD ON ARRIVAL AT MEDICAL FACILITY': 'EN ROUTE TO MEDICAL FACILITY',
    'EN ROUTE TO BOOKING CENTER/POLICE LOCKUP': 'EN ROUTE TO LAW ENFORCEMENT FACILITY',
    'ELSEWHERE, SPECIFY': 'ELSEWHERE',
}
cdr['death_location'] = cdr['death_location'].apply(lambda x: replacements.get(x.strip(), x))
cdr['death_location'].value_counts()

MEDICAL FACILITY                        2758
LAW ENFORCEMENT FACILITY                 936
CRIME/ARREST SCENE                       667
SCENE OF INCIDENT                        131
ELSEWHERE                                101
EN ROUTE TO MEDICAL FACILITY              67
EN ROUTE TO LAW ENFORCEMENT FACILITY       4
Name: death_location, dtype: int64
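The `replacements.get(x.strip(), x)` pattern recurs throughout this section. One caveat: it raises `AttributeError` if a column contains `NaN`, since floats have no `.strip()`. The sketch below is a NaN-tolerant variant built on the `.str` accessor and `Series.replace`, under the hypothetical name `normalize_values`. Unlike the lambda, it returns the stripped value for unmatched entries rather than the original.

```python
import numpy as np
import pandas as pd


def normalize_values(series: pd.Series, replacements: dict) -> pd.Series:
    """Strip surrounding whitespace, then map known variants to a canonical label.
    Missing values pass through unchanged."""
    return series.str.strip().replace(replacements)


locations = pd.Series(['AT MEDICAL FACILITY ', 'SCENE OF INCIDENT', np.nan])
print(normalize_values(locations, {'AT MEDICAL FACILITY': 'MEDICAL FACILITY'}).tolist())
```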

In [25]:
pd.crosstab(cdr.means_of_death, cdr.form_version)

form_version,V_2005,V_2016
means_of_death,Unnamed: 1_level_1,Unnamed: 2_level_1
BATON / BLUNT INSTRUMENT,0,1
BLUNT INSTRUMENT,7,0
DON'T KNOW,94,0
DON\'T KNOW,32,0
DRUG OVERDOSE,141,22
FIREARM,888,167
"HANGING, STRANGULATION",440,71
KNIFE / EDGED INSTRUMENT,0,3
"KNIFE, CUTTING INSTRUMENT",25,0
"NOT APPLICABLE, CAUSE OF DEATH WAS ILLNESS/NATURAL CAUSE",2,442


In [26]:
replacements = {
    'NOT APPLICABLE, CAUSE OF DEATH WAS ILLNESS/NATURAL CAUSE': 'NOT APPLICABLE',
    'NOT APPLICABLE; CAUSE OF DEATH WAS INTOXICATION OR ILLNESS/NATURAL CAUSES': 'NOT APPLICABLE',
    'OTHER': 'OTHER, SPECIFY',
    'KNIFE, CUTTING INSTRUMENT': 'KNIFE / EDGED INSTRUMENT',
    'BLUNT INSTRUMENT': 'BATON / BLUNT INSTRUMENT',
    "DON'T KNOW": 'UNKNOWN',
    "DON\\'T KNOW": 'UNKNOWN',
}
cdr['means_of_death'] = cdr['means_of_death'].apply(lambda x: replacements.get(x.strip(), x))
pd.crosstab(cdr.means_of_death, cdr.form_version)

form_version,V_2005,V_2016
means_of_death,Unnamed: 1_level_1,Unnamed: 2_level_1
BATON / BLUNT INSTRUMENT,7,1
DRUG OVERDOSE,141,22
FIREARM,888,167
"HANGING, STRANGULATION",440,71
KNIFE / EDGED INSTRUMENT,25,3
NOT APPLICABLE,2034,442
"OTHER, SPECIFY",200,23
UNKNOWN,126,66
VEHICLE ACCIDENT,1,7
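The dictionary needs both `"DON'T KNOW"` and `"DON\\'T KNOW"` because some raw cells contain a literal backslash before the apostrophe: these are the `DON\'T KNOW` rows in the crosstab. The doubled backslash in the Python literal stands for that single backslash. An alternative is to remove the stray backslash once, before any mapping, so that a single `"DON'T KNOW"` key suffices. A sketch:

```python
import pandas as pd

answers = pd.Series(["DON'T KNOW", "DON\\'T KNOW", 'FIREARM'])

# "\\'" is two characters: a backslash followed by an apostrophe
cleaned = answers.str.replace("\\'", "'", regex=False)
print(cleaned.tolist())
```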


In [27]:
pd.crosstab(cdr.form_version, cdr.manner_of_death_description.notnull()).T

form_version,V_2005,V_2016
manner_of_death_description,Unnamed: 1_level_1,Unnamed: 2_level_1
False,1909,790
True,1953,12


In [28]:
pd.crosstab(cdr.form_version, cdr.manner_of_death).T

form_version,V_2005,V_2016
manner_of_death,Unnamed: 1_level_1,Unnamed: 2_level_1
ACCIDENTAL,2,31
ACCIDENTAL INJURY CAUSED BY OTHERS,20,0
ACCIDENTAL INJURY TO SELF,128,0
ALCOHOL/DRUG INTOXICATION,287,6
COULD NOT BE DETERMINED,3,7
HOMICIDE (INCLUDES JUSTIFIABLE HOMICIDE),15,126
JUSTIFIABLE HOMICIDE,677,0
NATURAL,12,399
NATURAL CAUSES/ILLNESS,1812,0
OTHER,120,0


In [29]:
replacements = {
    'NATURAL': 'NATURAL CAUSES/ILLNESS',
    'JUSTIFIABLE HOMICIDE': 'HOMICIDE',
    'OTHER HOMICIDE': 'HOMICIDE',
    'HOMICIDE (INCLUDES JUSTIFIABLE HOMICIDE)': 'HOMICIDE',
    'ACCIDENTAL INJURY CAUSED BY OTHERS': 'ACCIDENTAL',
    'ACCIDENTAL INJURY TO SELF': 'ACCIDENTAL',
    'OTHER': 'OTHER, SPECIFY',
    'OTHER - SPECIFY': 'OTHER, SPECIFY'
}
cdr['manner_of_death'] = cdr['manner_of_death'].apply(lambda x: replacements.get(x.strip(), x))
pd.crosstab(cdr.form_version, cdr.manner_of_death).T

form_version,V_2005,V_2016
manner_of_death,Unnamed: 1_level_1,Unnamed: 2_level_1
ACCIDENTAL,150,31
ALCOHOL/DRUG INTOXICATION,287,6
COULD NOT BE DETERMINED,3,7
HOMICIDE,784,126
NATURAL CAUSES/ILLNESS,1824,399
"OTHER, SPECIFY",122,9
PENDING AUTOPSY RESULTS,0,121
SUICIDE,692,103


In [30]:
cdr[cdr['manner_of_death'] == 'OTHER, SPECIFY'][['manner_of_death', 'manner_of_death_description']].head()

Unnamed: 0,manner_of_death,manner_of_death_description
201,"OTHER, SPECIFY",UNDETERMINED
230,"OTHER, SPECIFY",UNDETERMINED
329,"OTHER, SPECIFY",UNDETERMINED
392,"OTHER, SPECIFY",BLUNT FORCE HEAD AND NECK INJURY
509,"OTHER, SPECIFY",PENDING TOXICOLOGY


In [31]:
undetermined = cdr['manner_of_death'] == 'COULD NOT BE DETERMINED'
cdr.loc[undetermined, 'manner_of_death'] = 'OTHER, SPECIFY'
cdr.loc[undetermined, 'manner_of_death_description'] = 'UNDETERMINED'
pd.crosstab(cdr.form_version, cdr.manner_of_death).T

form_version,V_2005,V_2016
manner_of_death,Unnamed: 1_level_1,Unnamed: 2_level_1
ACCIDENTAL,150,31
ALCOHOL/DRUG INTOXICATION,287,6
HOMICIDE,784,126
NATURAL CAUSES/ILLNESS,1824,399
"OTHER, SPECIFY",125,16
PENDING AUTOPSY RESULTS,0,121
SUICIDE,692,103


In [32]:
pd.crosstab(cdr.form_version, cdr.pre_existing_medical_condition).T

form_version,V_2005,V_2016
pre_existing_medical_condition,Unnamed: 1_level_1,Unnamed: 2_level_1
COULD NOT BE DETERMINED,0,386
DECEASED DEVELOPED CONDITION AFTER ADMISSION,0,30
DEVELOPED CONDITION AFTER ADMISSION,63,0
DON'T KNOW,736,0
DON\'T KNOW,534,0
"NOT APPLICABLE; CAUSE OF DEATH WAS ACCIDENTAL INJURY, INTOXICATION, SUICIDE OR HOMICIDE",1870,297
PRE-EXISTING MEDICAL CONDITION,659,89


In [33]:
replacements = {
    'DECEASED DEVELOPED CONDITION AFTER ADMISSION': 'DEVELOPED CONDITION AFTER ADMISSION',
    "DON'T KNOW": 'UNKNOWN',
    "DON\\'T KNOW": 'UNKNOWN',
    'NOT APPLICABLE; CAUSE OF DEATH WAS ACCIDENTAL INJURY, INTOXICATION, SUICIDE OR HOMICIDE': 'NOT APPLICABLE',
    'COULD NOT BE DETERMINED': 'UNKNOWN',
    'PRE-EXISTING MEDICAL CONDITION': 'YES',
}
cdr['pre_existing_medical_condition'] = cdr['pre_existing_medical_condition'].apply(lambda x: replacements.get(x.strip(), x))
pd.crosstab(cdr.form_version, cdr.pre_existing_medical_condition).T

form_version,V_2005,V_2016
pre_existing_medical_condition,Unnamed: 1_level_1,Unnamed: 2_level_1
DEVELOPED CONDITION AFTER ADMISSION,63,30
NOT APPLICABLE,1870,297
UNKNOWN,1270,386
YES,659,89


In [34]:
pd.crosstab(cdr['who_caused_the_death'], cdr.form_version)

form_version,V_2005,V_2016
who_caused_the_death,Unnamed: 1_level_1,Unnamed: 2_level_1
DECEASED,141,0
DECEDENT,0,146
DON'T KNOW,103,0
DON\'T KNOW,33,0
LAW ENFORCEMENT/CORRECTIONAL PERSONNEL,1,134
LAW ENFORCEMENT/CORRECTIONAL STAFF,726,0
NOT APPLICABLE,0,500
"NOT APPLICABLE; CAUSE OF DEATH WAS SUICIDE, INTOXICATION OR ILLNESS/NATURAL CAUSES",2786,0
OTHER CIVILIAN(S),0,1
OTHER DETAINEE(S),0,7


In [35]:
replacements = {
    'DECEASED': 'DECEDENT',
    "DON'T KNOW": 'UNKNOWN',
    "DON\\'T KNOW": 'UNKNOWN',
    'LAW ENFORCEMENT/CORRECTIONAL STAFF': 'LAW ENFORCEMENT/CORRECTIONAL PERSONNEL',
    'NOT APPLICABLE; CAUSE OF DEATH WAS SUICIDE, INTOXICATION OR ILLNESS/NATURAL CAUSES': 'NOT APPLICABLE',
    'OTHER DETAINEES': 'OTHER DETAINEE(S)',
    'OTHER PERSONS': 'OTHER CIVILIAN(S)',
    'ACCIDENTAL INJURY TO SELF': 'ACCIDENTAL',
    'UNKNOWN PERSON(S) CAUSED THE INJURY': 'UNKNOWN',
    'UNKNOWN WHETHER DECEDENT SUSTAINED A FATAL INJURY': 'UNKNOWN',
}
cdr['who_caused_the_death'] = cdr['who_caused_the_death'].apply(lambda x: replacements.get(x.strip(), x))
pd.crosstab(cdr['who_caused_the_death'], cdr.form_version)

form_version,V_2005,V_2016
who_caused_the_death,Unnamed: 1_level_1,Unnamed: 2_level_1
DECEDENT,141,146
LAW ENFORCEMENT/CORRECTIONAL PERSONNEL,727,134
NOT APPLICABLE,2786,500
OTHER CIVILIAN(S),38,1
OTHER DETAINEE(S),34,7
UNKNOWN,136,14


In [36]:
pd.crosstab(cdr['were_the_charges'], cdr.form_version)

form_version,V_2005,V_2016
were_the_charges,Unnamed: 1_level_1,Unnamed: 2_level_1
A PROBATION/PAROLE VIOLATION,0,12
CONVICTED,1579,471
FILED,830,89
NOT FILED AT TIME OF DEATH,1342,230
PROBATION/PAROLE,111,0


In [37]:
replacements = {
    'PROBATION/PAROLE': 'PROBATION/PAROLE VIOLATION',
    'A PROBATION/PAROLE VIOLATION': 'PROBATION/PAROLE VIOLATION',
}
cdr['were_the_charges'] = cdr['were_the_charges'].apply(lambda x: replacements.get(x.strip(), x))
pd.crosstab(cdr['were_the_charges'], cdr.form_version)

form_version,V_2005,V_2016
were_the_charges,Unnamed: 1_level_1,Unnamed: 2_level_1
CONVICTED,1579,471
FILED,830,89
NOT FILED AT TIME OF DEATH,1342,230
PROBATION/PAROLE VIOLATION,111,12


#### Identify and drop a range of unnecessary columns

In [38]:
cdr.groupby([cdr.entry_date_time.isnull(), cdr.entry_date_time_n_a]).size().unstack()

entry_date_time_n_a,0,1
entry_date_time,Unnamed: 1_level_1,Unnamed: 2_level_1
False,2893.0,
True,,1771.0


In [39]:
pd.crosstab(cdr.custody_date_na, cdr['date_time_of_custody_or_incident'].isnull())

date_time_of_custody_or_incident,False
custody_date_na,Unnamed: 1_level_1
0.0,3861
1.0,1


In [40]:
cdr.drop(['entry_date_time_n_a', 'custody_date_na'], axis=1, inplace=True)
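The two crosstabs above are what justify this drop. `entry_date_time_n_a` is 1 exactly where `entry_date_time` is missing. Every remaining report also has a custody date, including the one flagged as N/A. The first check can be generalized into a small helper. `flag_is_redundant` is a hypothetical name, and the sketch assumes the flag column has no missing entries of its own (`NaN` would cast to `True`).

```python
import pandas as pd


def flag_is_redundant(flag: pd.Series, values: pd.Series) -> bool:
    """True if a 0/1 'N/A' flag is set exactly where `values` is missing,
    i.e. the flag adds nothing beyond values.isnull()."""
    return bool((flag.astype(bool) == values.isnull()).all())


# Entry dates modeled on the raw previews: one present, one marked N/A
entry = pd.Series(pd.to_datetime(['2004-10-20 20:56', None]))
print(flag_is_redundant(pd.Series([0, 1]), entry))

# A flag that says N/A while a date is present is inconsistent, not redundant
custody = pd.Series(pd.to_datetime(['2005-01-01 15:04', '2004-12-17 09:00']))
print(flag_is_redundant(pd.Series([0.0, 1.0]), custody))
```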

In [41]:
pd.crosstab(cdr.type_of_offense.notnull(), cdr.form_version)

form_version,V_2005,V_2016
type_of_offense,Unnamed: 1_level_1,Unnamed: 2_level_1
False,3852,1
True,10,801


In [42]:
pd.crosstab(cdr.type_of_offense_other.notnull(), cdr.form_version)

form_version,V_2005,V_2016
type_of_offense_other,Unnamed: 1_level_1,Unnamed: 2_level_1
False,3138,679
True,724,123


In [43]:
cdr.drop(['type_of_offense', 'type_of_offense_other'], axis=1, inplace=True)

In [44]:
cdr['other_behavior'].value_counts()

0.0    3687
1.0     175
Name: other_behavior, dtype: int64

In [45]:
pd.crosstab(cdr['other_behavior'], cdr['specify_other_behavior'].notnull())

specify_other_behavior,False,True
other_behavior,Unnamed: 1_level_1,Unnamed: 2_level_1
0.0,3687,0
1.0,0,175


In [46]:
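# other_behavior is 1 exactly when specify_other_behavior is filled in,
# so replace the flag with the free-text description itself.
cdr['other_behavior'] = cdr['specify_other_behavior']
cdr.drop('specify_other_behavior', axis=1, inplace=True)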

### 5. Tweak bookkeeping columns

In [47]:
cdr['num_revisions'] = cdr['version_number'] - 1
cdr.drop(['version_type', 'version_number'], axis=1, inplace=True)

In [48]:
col_renames = {
    'first_name': 'name_first',
    'middle_name': 'name_middle',
    'last_name': 'name_last',
    'suffix': 'name_suffix',
    'cdr_cdr_name': 'record_number',
    'death_location': 'death_location_description',
    'death_location_elsewhere': 'death_location_description_elsewhere',
    'city': 'death_location_city',
    'county': 'death_location_county',
    'street_address': 'death_location_street_address',
    'entry_date_time': 'facility_entry_date_time',
    'department_type': 'agency_type',
    'pre_existing_medical_condition': 'death_from_pre_existing_medical_condition'
}

In [49]:
cdr.rename(col_renames, axis=1, inplace=True)

In [50]:
before = cdr.shape
cdr = cdr[[
    # Record indexing columns
    'record_number',
    'num_revisions',
    'form_version',
    'date_time_of_custody_or_incident',

    # Deceased personal information, demographics
    'name_first',
    'name_last',
    'name_middle',
    'name_suffix',
    'date_of_birth',
    'age_at_time_of_death',
    'sex',
    'race',

    # Death event information
    'death_date_and_time',
    'death_location_county',
    'death_location_city',
    'death_location_street_address',
    'death_location_description',
    'death_location_description_elsewhere',
    'death_reason',
    'death_causer_other',
    'death_from_pre_existing_medical_condition',
    'manner_of_death',
    'manner_of_death_description',
    'means_of_death',
    'means_of_death_other',
    'medical_cause_of_death',
    'medical_examinor_coroner_evalution',
    'medical_treatment',
    'who_caused_the_death',

    # Criminal information on deceased
    'offense_1',
    'offense_2',
    'offense_3',
    'were_the_charges',

    # Facility and agency information
    'facility_entry_date_time',
    'type_of_custody',
    'custody_type_facility',
    'specific_type_of_custody_facility',
    'agency_address',
    'agency_city',
    'agency_county',
    'agency_name',
    'agency_zip',
    'agency_type',
    
    # Deceased behavior upon entry or custody
    'entry_behavior',
    'other_behavior',
    'exhibit_any_medical_problems',
    'exhibit_any_mental_health_problems',
    'make_suicidal_statements',
]]
after = cdr.shape
assert before == after
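The `before == after` shape check guards against accidentally leaving a column out of the hand-written ordering. A shape check alone has a blind spot, though. It would still pass if one column were listed twice while another was omitted. The sketch below uses set comparisons instead, which catch that case and also name the offending columns. The helper name `reorder_columns` is hypothetical.

```python
import pandas as pd


def reorder_columns(df, ordered):
    """Return df with columns in `ordered`, requiring an exact match of names."""
    dropped = set(df.columns) - set(ordered)
    unknown = set(ordered) - set(df.columns)
    assert not dropped, 'columns missing from ordering: %s' % sorted(dropped)
    assert not unknown, 'ordering names unknown columns: %s' % sorted(unknown)
    # A duplicate entry would otherwise keep the shape unchanged.
    assert len(ordered) == len(set(ordered)), 'ordering lists a column twice'
    return df[ordered]


toy = pd.DataFrame({'sex': ['MALE'], 'record_number': ['X1'], 'race': ['WHITE']})
toy = reorder_columns(toy, ['record_number', 'sex', 'race'])
print(list(toy.columns))
```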

In [51]:
cdr.sample(10)

,record_number,num_revisions,form_version,date_time_of_custody_or_incident,name_first,name_last,name_middle,name_suffix,date_of_birth,age_at_time_of_death,sex,race,death_date_and_time,death_location_county,death_location_city,death_location_street_address,death_location_description,death_location_description_elsewhere,death_reason,death_causer_other,death_from_pre_existing_medical_condition,manner_of_death,manner_of_death_description,means_of_death,means_of_death_other,medical_cause_of_death,medical_examinor_coroner_evalution,medical_treatment,who_caused_the_death,offense_1,offense_2,offense_3,were_the_charges,facility_entry_date_time,type_of_custody,custody_type_facility,specific_type_of_custody_facility,agency_address,agency_city,agency_county,agency_name,agency_zip,agency_type,entry_behavior,other_behavior,exhibit_any_medical_problems,exhibit_any_mental_health_problems,make_suicidal_statements
48,16-52-P,0,V_2016,2014-08-01 00:00:00,KELLY,BABB,,,1958-03-07,58,MALE,WHITE,2016-12-20 08:08:00,GALVESTON,DICKINSON,5509 ATTWATER AVENUE,MEDICAL FACILITY,,,,UNKNOWN,NATURAL CAUSES/ILLNESS,,NOT APPLICABLE,,SEPTIC SHOCK,"NO, EVALUATION NOT PLANNED",UNKNOWN,NOT APPLICABLE,SEXUAL ASSAULT OF A CHILD,,,CONVICTED,2014-08-01 00:00:00,PENITENTIARY,,"TDCJ, SPECIFY","2503 LAKE ROAD, SUITE 5",HUNTSVILLE,STATE,TDCJOFFICE OF THE INSPECTOR GENERAL,77340,,,,UNKNOWN,UNKNOWN,UNKNOWN
4215,PA15155P,0,V_2005,2015-04-01 00:00:00,MATTHEW,NELSON,,,1977-07-27,37,MALE,WHITE,2015-04-01 18:42:00,ANDERSON,PALESTINE,1385 FM 3328,MEDICAL FACILITY,,NOT APPLICABLE,,NOT APPLICABLE,SUICIDE,,"HANGING, STRANGULATION",,ASPHYXIATION DUE TO HANGING,"YES, RESULTS ARE AVAILABLE",NOT APPLICABLE,NOT APPLICABLE,POSSESSION OF A CONTROLLED SUBSTANCE PG1 4,,,CONVICTED,2015-04-01 00:00:00,PENITENTIARY,GURNEY UNIT,TDCJ,"2503 LAKE ROAD, SUITE 5",HUNTSVILLE,WALKER,TEXAS DEPT OF CRIMINAL JUSTICE,77340,STAGENCY,,,,,
3494,PA14261P,0,V_2005,2014-01-13 00:00:00,BRAD,EASTMAN,,,1982-09-29,31,MALE,BLACK,2014-05-25 03:32:00,GALVESTON,GALVESTON,809 HARBORSIDE DRIVE,MEDICAL FACILITY,,MEDICAL CONDITION ONLY (E.G. HEART ATTACK),,YES,NATURAL CAUSES/ILLNESS,MULTIORGAN FAILURE,NOT APPLICABLE,,MULTIORGAN FAILURE (DUE TO HIV),"NO, EVALUATION NOT PLANNED",YES,NOT APPLICABLE,BURGLARY OF A HABITATION,,,CONVICTED,2014-01-13 00:00:00,PENITENTIARY,HOSPITAL GALVESTON,TDCJ,"2503 LAKE ROAD, SUITE 5",HUNTSVILLE,WALKER,TEXAS DEPT OF CRIMINAL JUSTICE,77340,STAGENCY,,,,,
31,16-35-P,1,V_2016,2016-10-27 00:00:00,RENE,DE LA CERDA,,,1948-08-29,68,MALE,HISPANIC,2016-12-08 07:18:00,ANGELINA,DIBOLL,1502 SOUTH 1ST STREET,MEDICAL FACILITY,,,,UNKNOWN,NATURAL CAUSES/ILLNESS,,NOT APPLICABLE,,SUDDEN CARDIAC DEATH DUE TO ATHEROSCLEROTIC CO...,"YES, RESULTS ARE AVAILABLE",UNKNOWN,NOT APPLICABLE,INDECENCY WITH A CHILD,,,CONVICTED,2016-10-27 00:00:00,PENITENTIARY,,"TDCJ, SPECIFY","2503 LAKE ROAD, SUITE 5",HUNTSVILLE,STATE,TDCJOFFICE OF THE INSPECTOR GENERAL,77340,,,,YES,UNKNOWN,NO
4539,PA15688P,0,V_2005,2015-08-31 00:00:00,ALVIN,HUDSON,,,1967-09-24,48,MALE,BLACK,2015-12-28 18:17:00,WOOD,WINNSBORO,703 AIRPORT ROAD,MEDICAL FACILITY,,MEDICAL CONDITION ONLY (E.G. HEART ATTACK),,UNKNOWN,NATURAL CAUSES/ILLNESS,BACTERIAL INFECTION,NOT APPLICABLE,,"WIDELY-SYSTEMIC BACTERIAL INFECTION, PROBABLY ...","YES, RESULTS ARE AVAILABLE",NOT APPLICABLE,NOT APPLICABLE,DRIVING WHILE INTOXICATED - 3RD,,,CONVICTED,2015-08-31 00:00:00,PENITENTIARY,JOHNSTON,TDCJ,"2503 LAKE ROAD, SUITE 5",HUNTSVILLE,WALKER,TEXAS DEPT OF CRIMINAL JUSTICE,77340,STAGENCY,,,,,
4863,PA16312CJ,0,V_2005,2016-06-30 10:22:00,MORGAN,ANGERBAUER,CHRISTY-RUTH,,1995-07-09,20,FEMALE,WHITE,2016-07-01 05:07:00,BOWIE,TEXARKANA,100 NORTH STATE LINE,LAW ENFORCEMENT FACILITY,,NOT APPLICABLE,,YES,NATURAL CAUSES/ILLNESS,DIABETIC KETOACIDOSIS,NOT APPLICABLE,,DIABETIC KETOACIDOSIS,"YES, RESULTS ARE AVAILABLE",YES,NOT APPLICABLE,POSSESSION OF A CONTROLLED SUBSTANCE,CONTEMPT OF COURT,,FILED,2016-06-30 10:22:00,COUNTY JAIL,,JAIL - SINGLE CELL,100 NORTH STATE LINE AVE.,TEXARKANA,BOWIE,BOWIE CO SHERIFFS OFFICE,75501,SHERIFF,,,,,
1707,PA10077C,0,V_2005,2010-04-01 23:42:00,ZACKERY,WELCH,WAYLAND,,1987-04-12,22,MALE,WHITE,2010-04-01 23:46:00,LIVE_OAK,5.8 MILES S. GEORGE WEST,"US 281, 5.8 MILES S. GEORGE WEST",CRIME/ARREST SCENE,,INJURIES ONLY,,NOT APPLICABLE,SUICIDE,,FIREARM,,GUNSHOT WOUND TO THE HEAD,"YES, RESULTS ARE AVAILABLE",NOT APPLICABLE,NOT APPLICABLE,AGGRAVATED ASSAULT AGAINST A POLICE OFFICER,ATTEMPTED CAPITAL MURDER,FLEEING WITH VEHICLE,NOT FILED AT TIME OF DEATH,NaT,POLICE CUSTODY (PRE-BOOKING),,CUSTODY OF PEACE OFFICER DURING/FLEEING ARREST,P. O. BOX 4087,AUSTIN,STATE,TEXAS DEPT OF PUBLIC SAFETY,78773,STAGENCY,,,,,
268,PA05266P,0,V_2005,2005-11-29 12:00:00,HOWARD,ELLIS,LAMAR,,1972-02-25,33,MALE,BLACK,2005-12-13 00:34:00,WALKER,HUNTSVILLE,21 FM 247,MEDICAL FACILITY,,MEDICAL CONDITION ONLY (E.G. HEART ATTACK),,YES,NATURAL CAUSES/ILLNESS,CARDIO-PULMONARY ARREST,NOT APPLICABLE,,DIALYSIS PATIENT REFUSED DIALYSIS. CARDIO-PULM...,"YES, RESULTS ARE AVAILABLE",YES,NOT APPLICABLE,POSSESSION OF COCAINE,,,CONVICTED,2005-11-29 12:00:00,PENITENTIARY,BYRD UNIT,TDCJ,P.O. BOX 4003,HUNTSVILLE,TRAVIS,TEXAS DEPT OF CRIMINAL JUSTICE,773424003,STAGENCY,MEDICAL,,,,
3728,PA14497P,0,V_2005,2010-06-03 00:00:00,RANDY,SYLVESTER,,,1980-10-13,33,MALE,BLACK,2014-09-26 22:54:00,TYLER,WOODVILLE,777 FM 3497,MEDICAL FACILITY,,NOT APPLICABLE,,NOT APPLICABLE,SUICIDE,,"HANGING, STRANGULATION",,ASPHYXIATION BY HANGING,"YES, RESULTS ARE AVAILABLE",NOT APPLICABLE,NOT APPLICABLE,CAPITAL MURDER/MULTIPLE,,,CONVICTED,2010-06-03 00:00:00,PENITENTIARY,LEWIS UNIT,TDCJ,"2503 LAKE ROAD, SUITE 5",HUNTSVILLE,WALKER,TEXAS DEPT OF CRIMINAL JUSTICE,77340,STAGENCY,,,,,
448,PA06162C,0,V_2005,2006-07-13 01:20:00,RENE,SANCHEZ,MIGUEL,,1971-06-09,35,MALE,HISPANIC,2006-07-13 01:20:00,HARRIS,HOUSTON,9715 KIRKVILLE,CRIME/ARREST SCENE,,INJURIES ONLY,,NOT APPLICABLE,HOMICIDE,,FIREARM,,GUNSHOT WOUND TO LEFT SIDE OF NECK AND HEAD,"YES, RESULTS ARE AVAILABLE",NOT APPLICABLE,LAW ENFORCEMENT/CORRECTIONAL PERSONNEL,AGG. ASSAULT - POLICE OFFICER WITH A DEADLY WE...,,,NOT FILED AT TIME OF DEATH,NaT,POLICE CUSTODY (PRE-BOOKING),,CUSTODY OF PEACE OFFICER DURING/FLEEING ARREST,1200 TRAVIS,HOUSTON,HARRIS,HOUSTON POLICE DEPT,77002,POLICE,,,,,


## Write

In [52]:
with dw.open_remote_file(DTW_PROJECT_KEY_CDR, CLEANED_FILENAME) as w:
    print("Writing to data.world:", CLEANED_FILENAME)
    cdr.to_csv(w, index=False)

Writing to data.world: cleaned_custodial_death_reports.csv
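Before uploading, you may want to confirm that the cleaned frame survives a CSV round trip. `to_csv` writes missing values, including `NaT`, as empty fields and writes datetimes as text, so the types must be re-parsed on read. The sketch below does that check with an in-memory buffer in place of data.world. The data is made up, and this step is not part of the original pipeline.

```python
import io

import numpy as np
import pandas as pd

frame = pd.DataFrame({
    'record_number': ['X1', 'X2'],
    'death_date_and_time': pd.to_datetime(['2016-12-20 08:08:00', None]),
    'offense_2': ['CONTEMPT OF COURT', np.nan],
})

# Write to an in-memory buffer, then read it back.
buf = io.StringIO()
frame.to_csv(buf, index=False)
buf.seek(0)
reloaded = pd.read_csv(buf, parse_dates=['death_date_and_time'])

# Same columns in the same order, and missing values stay missing.
assert list(reloaded.columns) == list(frame.columns)
assert reloaded['offense_2'].isnull().tolist() == frame['offense_2'].isnull().tolist()
print(reloaded.dtypes)
```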
