# Inputs to crime severity from scraped sources

TODO: REVISE

Not every crime is created equal, so when gauging the "amount" of crime in aggregate, we need an appropriate weighting. This severity weighting can be measured in many ways; here we assume that the punishment a crime receives under the law appropriately reflects its severity.
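As a minimal sketch of the weighting idea, assume we already have incident counts per IUCR code and one severity weight per code (for example, a typical sentence length in years). The severity-weighted "amount" of crime is then a count-weighted sum. All counts and weights below are hypothetical illustration values, not results from this notebook.

```python
import pandas as pd

# Hypothetical incident counts per IUCR code (illustrative only)
counts = pd.DataFrame({"iucr": ["0110", "0141", "0260"], "incidents": [2, 10, 40]})

# Hypothetical severity weights per IUCR code, e.g. a typical sentence in years
severity = pd.DataFrame({"iucr": ["0110", "0141", "0260"], "weight": [25.0, 1.5, 0.5]})

# Attach a weight to each offense type, then sum incidents * weight
weighted = counts.merge(severity, on="iucr", how="left")
weighted["severity_units"] = weighted["incidents"] * weighted["weight"]
total_severity = weighted["severity_units"].sum()

print(total_severity)  # 2*25 + 10*1.5 + 40*0.5 = 85.0
```

An unweighted count would treat these 52 incidents as equal; the weighting lets two serious offenses outweigh dozens of minor ones.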

TODO: Explain use of convictions dataset

In [1]:
import pandas as pd
import requests
import numpy as np
import io
import re
import matplotlib.pyplot as plt
import seaborn as sns

%matplotlib inline

## Crime severity according to 2005-2009 convictions

In [2]:
humble_factor_for_non_prison_sentence = 0.0

In [3]:
from external_lib import statute

In [4]:
crime_convictions = pd.read_csv('./data/cook_county_conviction_records__2005_2009.csv')

In [5]:
x = list(map(str, crime_convictions['chrgdesc'].unique()))

In [10]:
crime_convictions.columns

Index(['case_number', 'sequence_number', 'st_address', 'city', 'state',
       'zipcode', 'arrest_date', 'initial_date', 'sex', 'statute', 'chrgdesc',
       'chrgtype', 'chrgtype2', 'chrgclass', 'chrgdisp', 'chrgdispdate',
       'ammndchargstatute', 'ammndchrgdescr', 'ammndchrgtype',
       'ammndchrgclass', 'minsent_years', 'minsent_months', 'minsent_days',
       'minsent_life', 'minsent_death', 'maxsent_years', 'maxsent_months',
       'maxsent_days', 'maxsent_life', 'maxsent_death', 'amtoffine'],
      dtype='object')

In [38]:
tmp = crime_convictions.copy()
tmp['chrgdesc'] = np.where(tmp['chrgdesc'].isnull(), '', tmp['chrgdesc'])
tmp[tmp['chrgdesc'].str.contains('ASS')]['chrgdesc']

45                  AGG CRIM SEX ASSAULT
46                  AGG CRIM SEX ASSAULT
47                  AGG CRIM SEX ASSAULT
54                  AGG CRIM SEX ASSAULT
55                  AGG CRIM SEX ASSAULT
92                  AGG CRIM SEX ASSAULT
93                  AGG CRIM SEX ASSAULT
126                 AGG CRIM SEX ASSAULT
127                 AGG CRIM SEX ASSAULT
221                 AGG CRIM SEX ASSAULT
232               ATT (AGG CRIM SEX ASSA
233               ATT (CRIM SEX ASSAULT)
258                 AGG CRIM SEX ASSAULT
259                 AGG CRIM SEX ASSAULT
289               ATT (AGG CRIM SEX ASSA
290               ATT (AGG CRIM SEX ASSA
291               ATT (AGG CRIM SEX ASSA
292               ATT (AGG CRIM SEX ASSA
293               ATT (AGG CRIM SEX ASSA
294               ATT (AGG CRIM SEX ASSA
299                 AGG CRIM SEX ASSAULT
343                 AGG CRIM SEX ASSAULT
467                          AGG ASSAULT
468                          AGG ASSAULT
550             

In [16]:
# number of convictions by charge disposition (sentence type)
crime_convictions.groupby('chrgdisp').count()['case_number']

chrgdisp
COND DISCH AND OTHER DISC COND       271
CREDIT DEFENDANT FOR TIME SERV    109213
DEF SENT CCDOC, PERIODIC IMP          18
DEF SENT CONDITIONAL DISCHARGE      2211
DEF SENT TO HOME CONFINEMENT         298
DEF SENT TO INDETERMINATE TERM         6
DEF SENT TO LIFE IMPRISONMENT        346
DEF SENT TO PAY RESTITUTION         4214
DEF SENTENCED ILLINOIS DOC        101995
DEF SENTENCED TO COOK CNTY DOC     19827
DEF SENTENCED TO DEATH                 8
DEF SENTENCED TO PROBATION         42445
ELIGIBLE IMPACT PROG-BOOT CAMP      3075
PROB AND OTHER DISC CONDS          22724
SENT PER IMP + OTH DISC COND          10
SENT/CRT SUPV-SOC SERV-SPC CND        37
SENT/CRT SUPV/SPEC CONDS             116
SENTENCED CCDOC BOOT CAMP           4460
SENTENCED TO COMMUNITY SERVICE     10103
SUPERVISION - SOCIAL SERV DEPT       213
Name: case_number, dtype: int64

In [17]:
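# drop convictions disposed as credit for time served; na=False keeps missing dispositions from breaking the boolean mask
crime_convictions_no_credit = crime_convictions[~crime_convictions['chrgdisp'].str.contains('CREDIT', na=False)]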
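A toy illustration of the credit-for-time-served filter, assuming that rows with a missing disposition should be kept. Passing `na=False` to `Series.str.contains` makes missing values evaluate to `False`, so the mask is strictly boolean and can be inverted safely. The three rows below are made up.

```python
import pandas as pd

# Toy dispositions; None stands in for a missing disposition
toy = pd.DataFrame({
    "case_number": [1, 2, 3],
    "chrgdisp": ["CREDIT DEFENDANT FOR TIME SERV", "DEF SENTENCED ILLINOIS DOC", None],
})

# Missing dispositions count as "not credit", so they survive the filter
is_credit = toy["chrgdisp"].str.contains("CREDIT", na=False)
toy_no_credit = toy[~is_credit]

print(toy_no_credit["case_number"].tolist())  # [2, 3]
```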

In [18]:
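def parse_code(cd):
    """Parse a raw statute string with statute.parse_statute; return [] if it cannot be parsed."""
    try:
        # collapse repeated whitespace before parsing
        return statute.parse_statute(' '.join(cd.split()))
    except Exception:
        return []

def parse_iucr(cd):
    """Map a raw statute string to its IUCR offense(s); return [] if it cannot be parsed."""
    try:
        return statute.get_iucr(statute.parse_statute(' '.join(cd.split())))
    except Exception:
        # e.g. a NaN statute (a float has no .split) or a statute the parser does not recognize
        return []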

In [19]:
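def parse_term(days, months, yrs, life_sent):
    # Convert a sentence to years (a year is approximated as 360 days);
    # a life sentence adds 80 years to the term
    return days / 360 + months / 12 + yrs + (80 if life_sent else 0)

# Dispositions whose terms are kept at full length: jail, prison, life and death sentences
PRISON_DISPOSITIONS = ['DEF SENTENCED TO COOK CNTY DOC', 'DEF SENTENCED ILLINOIS DOC',
                       'DEF SENT TO LIFE IMPRISONMENT', 'DEF SENTENCED TO DEATH']

def humbled_term(days, months, yrs, life_sent, chrgdisp):
    # All other dispositions (probation, community service, ...) are scaled
    # by humble_factor_for_non_prison_sentence
    humble_factor = 1.0 if chrgdisp in PRISON_DISPOSITIONS else humble_factor_for_non_prison_sentence
    return parse_term(days, months, yrs, life_sent) * humble_factor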
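A few worked values make the term conversion concrete. The sketch below copies `parse_term` and `humbled_term` so it runs on its own, with the humble factor fixed at `0.0` as in the notebook; `PRISON_DISPOSITIONS` is a name introduced here for the list of full-length dispositions.

```python
HUMBLE_FACTOR_FOR_NON_PRISON_SENTENCE = 0.0

# Dispositions whose terms are kept at full length (copied from the notebook's list)
PRISON_DISPOSITIONS = {"DEF SENTENCED TO COOK CNTY DOC", "DEF SENTENCED ILLINOIS DOC",
                       "DEF SENT TO LIFE IMPRISONMENT", "DEF SENTENCED TO DEATH"}

def parse_term(days, months, yrs, life_sent):
    # 360-day year; a life sentence adds 80 years
    return days / 360 + months / 12 + yrs + (80 if life_sent else 0)

def humbled_term(days, months, yrs, life_sent, chrgdisp):
    factor = 1.0 if chrgdisp in PRISON_DISPOSITIONS else HUMBLE_FACTOR_FOR_NON_PRISON_SENTENCE
    return parse_term(days, months, yrs, life_sent) * factor

two_and_a_half = humbled_term(0, 6, 2, False, "DEF SENTENCED ILLINOIS DOC")   # 0.5 + 2 = 2.5
half_year_jail = humbled_term(180, 0, 0, False, "DEF SENTENCED TO COOK CNTY DOC")  # 180/360 = 0.5
life = humbled_term(0, 0, 0, True, "DEF SENT TO LIFE IMPRISONMENT")  # 80.0
probation = humbled_term(0, 0, 3, False, "DEF SENTENCED TO PROBATION")  # humbled to 0.0

# Caveat: a float NaN is truthy in Python, so a missing life flag counts as a life sentence
nan_life_flag = parse_term(0, 0, 1, float("nan"))  # 1 + 80 = 81.0
```

The last line is worth keeping in mind: if `minsent_life` or `maxsent_life` is ever NaN rather than a falsy value, `80 if life_sent else 0` adds 80 years, so the encoding of those columns is worth checking.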

In [20]:
# map each raw statute to its (possibly several) IUCR offenses
parsed_convictions_degenerate_rows = crime_convictions_no_credit[['statute']].copy()
parsed_convictions_degenerate_rows['statute'] = parsed_convictions_degenerate_rows['statute'].map(parse_iucr)

In [21]:
# minimum sentence in years; all non-imprisonment sentences are humbled
parsed_convictions_degenerate_rows['min_sent'] =\
    crime_convictions_no_credit[['minsent_days', 'minsent_months', 'minsent_years', 'minsent_life', 'chrgdisp']]\
    .apply(lambda row: humbled_term(*row), axis=1)

In [22]:
# maximum sentence in years, humbled the same way
parsed_convictions_degenerate_rows['max_sent'] =\
    crime_convictions_no_credit[['maxsent_days', 'maxsent_months', 'maxsent_years', 'maxsent_life', 'chrgdisp']]\
    .apply(lambda row: humbled_term(*row), axis=1)

Split statutes that map to several (degenerate) IUCR codes so that each IUCR code gets its own row.
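The same split can be expressed with `DataFrame.explode`, sketched here on toy data. `Offense` is a hypothetical stand-in for whatever `statute.get_iucr` returns; only a `.code` attribute is assumed. As with the list comprehension in the notebook, statutes that parsed to `[]` produce no rows.

```python
from dataclasses import dataclass

import pandas as pd

@dataclass(frozen=True)
class Offense:
    # Hypothetical stand-in for a parsed IUCR offense; only .code is assumed
    code: str

degenerate = pd.DataFrame({
    "statute": [[Offense("0141"), Offense("0142")], [Offense("0110")], []],
    "min_sent": [3.0, 20.0, 1.0],
    "max_sent": [5.0, 40.0, 2.0],
})

# One row per offense; an empty list explodes to NaN, which is then dropped
split = (
    degenerate.explode("statute")
    .dropna(subset=["statute"])
    .assign(iucr=lambda d: d["statute"].map(lambda offense: offense.code))
    [["iucr", "min_sent", "max_sent"]]
    .reset_index(drop=True)
)
print(split)
```

Both sentence columns are repeated for every IUCR code a statute maps to, so a statute covering two codes contributes its sentence to both.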

Note that some convictions have unreasonably large sentence values (999) or missing sentences (NaN); let's cut those out.
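A toy version of the cut, with made-up values. Comparisons against NaN already evaluate to `False`, so `lt(100)` alone drops both the 999 sentinel and the missing value; the explicit `notna()` check is a readable safeguard rather than a necessity.

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame({"iucr": ["0110", "0110", "0141", "0260"],
                    "minsent_yrs": [20.0, 999.0, np.nan, 4.0]})

# Keep plausible, non-missing sentences only
keep = toy["minsent_yrs"].lt(100) & toy["minsent_yrs"].notna()
cleaned = toy[keep]

print(cleaned["minsent_yrs"].tolist())  # [20.0, 4.0]
```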

In [23]:
iucr_row_split = [ [offense.code] + list(row[1:])\
                   for row in parsed_convictions_degenerate_rows.values for offense in row[0] ]

In [24]:
parsed_convictions = pd.DataFrame(iucr_row_split)
parsed_convictions.columns = ['iucr', 'minsent_yrs', 'maxsent_yrs']
parsed_convictions['iucr_category'] = parsed_convictions['iucr'].str[:2]
parsed_convictions = parsed_convictions[(parsed_convictions['minsent_yrs'] < 100) &\
                                        ~parsed_convictions['minsent_yrs'].isnull()]
# Include category-level for matching later on. IUCR categories are defined by the first 2 digits in the IUCR code
parsed_convictions_category_level = parsed_convictions.copy()
parsed_convictions_category_level['iucr'] = parsed_convictions_category_level['iucr'].str[:2] + 'XX'
parsed_convictions = pd.concat([parsed_convictions, parsed_convictions_category_level])
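On a toy frame, the category-level rollup looks like this: each conviction is duplicated under its two-digit category code (`01XX`), so after the `groupby` every row contributes to both its exact IUCR statistics and its category statistics. The sentence values are made up.

```python
import pandas as pd

convictions = pd.DataFrame({"iucr": ["0110", "0141", "0142"],
                            "minsent_yrs": [20.0, 3.0, 4.0]})

# Category rows: keep the first two IUCR digits and pad with 'XX'
category_level = convictions.assign(iucr=convictions["iucr"].str[:2] + "XX")

# Stack exact and category rows, then summarize per code
stacked = pd.concat([convictions, category_level], ignore_index=True)
summary = stacked.groupby("iucr")["minsent_yrs"].agg(["mean", "median", "min", "max"])
print(summary)
```

Here `01XX` pools all three convictions (mean 9.0, median 4.0), which is why its statistics in the real table sit between those of the individual `01..` codes.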

In [25]:
parsed_convictions_minsent = parsed_convictions.groupby('iucr')['minsent_yrs']\
    .agg(['mean', 'median', 'min', 'max'])

In [26]:
parsed_convictions_minsent

| iucr | mean | median | min | max |
|------|------|--------|-----|-----|
| 0110 | 26.341289 | 22.000000 | 0.000000 | 95.0 |
| 0115 | 23.722222 | 30.000000 | 0.000000 | 50.0 |
| 0130 | 9.500000 | 7.000000 | 4.000000 | 20.0 |
| 0141 | 3.738734 | 3.000000 | 0.000000 | 20.0 |
| 0142 | 3.738734 | 3.000000 | 0.000000 | 20.0 |
| 0160 | 4.013523 | 5.000000 | 0.000000 | 14.0 |
| 0165 | 6.600000 | 5.000000 | 4.000000 | 14.0 |
| 0170 | 7.400000 | 7.000000 | 4.000000 | 10.0 |
| 01XX | 23.678793 | 20.000000 | 0.000000 | 95.0 |
| 0260 | 5.079697 | 4.000000 | 0.000000 | 30.0 |


In [27]:
parsed_convictions_minsent.head()

| iucr | mean | median | min | max |
|------|------|--------|-----|-----|
| 110 | 26.341289 | 22.0 | 0.0 | 95.0 |
| 115 | 23.722222 | 30.0 | 0.0 | 50.0 |
| 130 | 9.5 | 7.0 | 4.0 | 20.0 |
| 141 | 3.738734 | 3.0 | 0.0 | 20.0 |
| 142 | 3.738734 | 3.0 | 0.0 | 20.0 |


In [28]:
parsed_convictions_minsent.to_csv('./data/parsed_convictions_minsent.csv')
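IUCR codes carry leading zeros (the `head()` output shows codes rendered as `110`, `115`, ...), so when reading the saved file back it is a cheap safeguard to force `iucr` to strings. The sketch below also shows one possible way to use the category-level rows for matching: look up the exact code and fall back to its `XX` category. `lookup_severity` is a hypothetical helper, not part of the notebook, and the inline CSV stands in for `./data/parsed_convictions_minsent.csv` using two rows from the output above.

```python
import io

import pandas as pd

# Stand-in for the saved CSV (two rows copied from the table above)
csv_text = (
    "iucr,mean,median,min,max\n"
    "0110,26.341289,22.0,0.0,95.0\n"
    "01XX,23.678793,20.0,0.0,95.0\n"
)

# dtype=str keeps codes like '0110' from being parsed as the integer 110
severity = pd.read_csv(io.StringIO(csv_text), dtype={"iucr": str}).set_index("iucr")

def lookup_severity(iucr, table, stat="median"):
    """Hypothetical helper: exact IUCR match first, then the 2-digit 'XX' category row."""
    if iucr in table.index:
        return table.at[iucr, stat]
    category = iucr[:2] + "XX"
    if category in table.index:
        return table.at[category, stat]
    return float("nan")  # no exact or category-level match

print(lookup_severity("0110", severity))  # 22.0 (exact match)
print(lookup_severity("0120", severity))  # 20.0 (not in this toy table, falls back to 01XX)
```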