# Data Normalization & Classification: 
# Police Violence in North America

### Workshop/Lecture by June Skeeter

# Learning Outcomes:
1) Investigate how data normalization impacts the way we perceive patterns in a dataset

2) Look at different data classification methods and how they impact the way we perceive patterns in a dataset
    
* A) Revisit measurement scales: how are they related to classification methods?
* B) Choropleth mapping

# Content Warning:
* This lecture/workshop deals with a difficult and painful subject that may be triggering to some people.  The datasets we're using today describe incidents of police killings in Canada and the United States

# Pre-Lecture Poll questions:

### 1) Which country has a higher frequency of police violence?
    A) Canada
    B) The United States
    C) They're about equal
    
### 2) Which country has a greater racial disparity in incidents of police violence?
    A) Canada
    B) The United States
    C) They're about equal

In [9]:
## This module aggregates the data into a format that's easy for us to work with
import ParseData_BU as ParseData
import numpy as np
import pandas as pd
import scipy.stats as stats
## We'll use matplotlib to make some plots
import matplotlib
import matplotlib.patches as mpatches
import matplotlib.pyplot as plt
import pytablewriter
%matplotlib notebook
## Calling "GetData" does all of our preprocessing
Data = ParseData.GetData()#start_year=2016)

In [10]:
# print(Data.CA_PoliceKillings.loc[Data.CA_PoliceKillings['prov']=='SK'])
print(Data.CA_PoliceKillings.race.unique())
# Saskatoon Police Service

['Indigenous' 'White' 'Unknown' 'Asian' 'Black' 'Middle Eastern'
 'South Asian' 'Visible minority, n.i.e' 'Latin American']


# Canadian Police Violence Data


This data was collected by the CBC and is available for download here: 
    https://newsinteractives.cbc.ca/fatalpoliceencounters/
    
* "There is no government database listing deaths at the hands of the police available to the public in Canada, so CBC News created its own. The CBC’s research librarians have collected detailed information on each case, such as ethnicity, the role of mental illness or substance abuse, the type of weapon used and the police service involved, to create a picture of who is dying in police encounters. "
    
    
* This is not an official count because police departments in Canada are not mandated to collect all of this information.  Rather, this dataset is a collection of second-hand information in the form of press releases, news articles, etc.  Some records are incomplete, and the total number of incidents is likely higher than detailed here.

# 1) Police killings by year
* There were 556 killings between January 2000 - June 2020
    * Increasing trend: 0.85 killings/year.
    * 2020 is on pace to be a record breaking year.

In [13]:
CA_Total=Data.CA_PoliceKillings['prov'].count()
print('Total Police Killings in Canada: Jan 2000 - June 2020: '+str(CA_Total.sum()))
ByYear=Data.CA_PoliceKillings.resample('YS').count()['prov']

Data.CA_PoliceKillings['Year']=Data.CA_PoliceKillings.index.year
PID_Canada = Data.CA_PoliceKillings.groupby(['Year','data_source']).count()['prov'].unstack().fillna(0)

print(PID_Canada)
LR = stats.linregress(ByYear.index.year,ByYear.values)
# print(LR)

# print(Year_Data)
# # print(CBC_2018)

fig,ax=plt.subplots(1,1)
ax.set_title('Police Involved Deaths by Year',loc='left')
ax.bar(PID_Canada.index,PID_Canada['CBC: Deadly Force'],
       color=[1,0.5,0.5],edgecolor='k',label='CBC: Deadly Force')
ax.bar(PID_Canada.index,PID_Canada['Both'],bottom = PID_Canada['CBC: Deadly Force'],
       color=[1,0,0],edgecolor='k',label='Both')
ax.bar(PID_Canada.index,PID_Canada['KillerCopsCanada'],bottom = PID_Canada['CBC: Deadly Force']+PID_Canada['Both'],
       color=[0.5,0.5,0.5],edgecolor='k',label='KillerCopsCanada')
# ax.bar([2020],[ByYear.values[-1]],color=[1,.5,.5],edgecolor='k',label='Total Jan-Nov')
ax.plot(PID_Canada.index,PID_Canada.index*LR[0]+LR[1],
        color='k',label = 'Trend Line: '+str(np.round(LR[0],2))+' killings per year')
plt.grid()
plt.legend()
plt.tight_layout()
plt.savefig('Content/CA_Trendline.png')

Total Police Killings in Canada: Jan 2000 - June 2020: 797
data_source  Both  CBC: Deadly Force  KillerCopsCanada
Year                                                  
2000          0.0               19.0               0.0
2001          0.0               15.0               0.0
2002          0.0               14.0               0.0
2003          0.0               15.0               0.0
2004          0.0               27.0               0.0
2005          0.0               31.0               0.0
2006          0.0               22.0               0.0
2007          1.0               27.0               0.0
2008          1.0               24.0               0.0
2009          0.0               25.0               1.0
2010          1.0               28.0               2.0
2011          0.0               29.0               0.0
2012          1.0               22.0               5.0
2013          3.0               21.0               1.0
2014          7.0               17.0               3.0
2015  


# 2) Age distribution of victims

Histograms show the shape and spread of a dataset.
* Here we see the age distribution of victims in 5 year increments.
    * The youngest was 15 and the oldest was 77
    * The mean age is 35.6, the standard deviation is 11.6
* The histogram shows us that the age distribution is slightly right-skewed
    * The distribution has a longer tail toward older ages
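
A minimal sketch of how skew shows up in summary statistics, using a small made-up list of ages (not the CBC data). When a distribution has a tail toward older ages, the mean sits above the median and the skewness coefficient is positive. The `ages` array is hypothetical; the bin edges mirror the 5-year bins used for the histogram in this notebook.

```python
import numpy as np
from scipy import stats

# Hypothetical ages for illustration only (not taken from the dataset)
ages = np.array([18, 22, 24, 25, 27, 29, 30, 31, 33, 35, 36, 38, 41, 45, 52, 60, 77])

# Count how many ages fall in each 5-year bin
counts, edges = np.histogram(ages, bins=np.arange(0, 85, 5))

mean_age = ages.mean()
median_age = np.median(ages)
std_age = ages.std(ddof=1)   # sample standard deviation, as pandas reports by default
skewness = stats.skew(ages)  # positive values indicate a tail toward older ages

print(f"mean={mean_age:.1f}, median={median_age:.1f}, std={std_age:.1f}, skew={skewness:.2f}")
```

Comparing the mean to the median is a quick check that does not require a plot; the histogram then shows where the tail actually lies.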

In [6]:
print(Data.CA_PoliceKillings['age'].describe())
fig,ax=plt.subplots(1,1)
Data.CA_PoliceKillings['age'].hist(bins = np.arange(0,85,5),color=[1,0,0],edgecolor='k',ax=ax)

plt.axvline(Data.CA_PoliceKillings['age'].mean(), color='k', linestyle='dashed', linewidth=2,label='Mean')
plt.axvline(Data.CA_PoliceKillings['age'].mean()+Data.CA_PoliceKillings['age'].std(), 
            color='b', linestyle='dashed', linewidth=2,label='1 Standard Deviation')
plt.axvline(Data.CA_PoliceKillings['age'].mean()-Data.CA_PoliceKillings['age'].std(), 
            color='b', linestyle='dashed', linewidth=2)
# plt.axvline(Data.CA_PoliceKillings['age'].quantile(.25), color='b', linestyle='dashed', linewidth=1)
# plt.axvline(Data.CA_PoliceKillings['age'].quantile(.25), color='b', linestyle='dashed', linewidth=1)
ax.set_title('Age of Victims',loc='left')
plt.legend()
plt.tight_layout()
plt.savefig('Content/CA_AgeHist.png')

count    743.000000
mean      37.059219
std       13.157423
min        1.000000
25%       27.000000
50%       36.000000
75%       45.000000
max       94.000000
Name: age, dtype: float64



# 3) What type of weapon (if any) did the victim have?
* Nearly 30% of victims were unarmed.
    * Note - Being armed does not justify any individual police killing.
    * However, in aggregate a higher number of killings of unarmed people can indicate a predisposition towards excessive use of force.

In [7]:
fig,ax=plt.subplots(1,1)
ax.set_title('Weapon Type',loc='left')
Weaopn_Type=Data.CA_PoliceKillings.groupby('armed_type').count()['prov'].sort_values()
# print(Weaopn_Type)
ax.pie(Weaopn_Type.values,labels=Weaopn_Type.index,
    autopct='%1.1f%%')
plt.tight_layout()
plt.savefig('Content/CA_Weapon.png')


In [6]:
print(Data.CA_PoliceKillings.loc[((Data.CA_PoliceKillings.prov=='QC') &
                                  (Data.CA_PoliceKillings.index.year == 2019))])

                 date  first_name  last_name  middle_name   age gender  \
date                                                                     
2019-07-05 2019-07-05       Sandy      Alaku  Unspecified  48.0   Male   
2019-06-28 2019-06-28       Denis  Chalifoux  Unspecified  50.0   Male   
2019-03-30 2019-03-30  Jean-Louis    D'amour          NaN  77.0   Male   

                  race prov                              department  \
date                                                                  
2019-07-05  Indigenous   QC           Kativik Regional Police Force   
2019-06-28     Unknown   QC  Service de police de la Ville de Laval   
2019-03-30     Unknown   QC                        Sûreté du Québec   

           cause_death  ... sixth KCC ID seventh KCC link seventh KCC ID  \
date                    ...                                                
2019-07-05     Vehicle  ...          NaN              NaN            NaN   
2019-06-28         NaN  ...          NaN     

In [7]:
# Data.CA_PoliceKillings.race.fillna('Unknown',inplace=True)
Cat = 'race'
Departments=Data.CA_PoliceKillings.groupby(['department','prov',Cat]).count()['summary'].unstack()
Departments=Departments.reset_index().set_index('department')
Departments=Departments.fillna(value=0)

Departments['Total'] = Departments[Data.CA_PoliceKillings[Cat].unique()].sum(axis=1)

Departments['NAME']=Departments.index
Departments['NAME']=Departments['NAME'].str.replace(' Department','')
Departments['NAME']=Departments['NAME'].str.replace(' Services','')
Departments['NAME']=Departments['NAME'].str.replace(' Service','')
Departments['NAME']=Departments['NAME'].str.replace(' Force','')
Departments['NAME']=Departments['NAME'].str.replace('Service de police de la Ville de ','')
Departments['NAME']=Departments['NAME'].str.replace('Service de la sécurité publique de ','')
Departments['NAME']=Departments['NAME'].str.replace('Service de police de ','')
Departments['NAME']=Departments['NAME'].str.replace('Régie intermunicipale de police ','')
Departments['NAME']=Departments['NAME'].str.replace('Service de sécurité publique de ','')
Departments['NAME']=Departments['NAME'].str.replace('Sécurité publique de ','')

Departments['City']=Departments['NAME'].str.replace('Ontario Provincial Police','')
Departments['City']=Departments['City'].str.replace(' Police','')
Departments['City']=Departments['City'].str.replace('RCMP','')
Departments['City']=Departments['City'].str.replace('Sûreté du Québec','')
Departments['City']=Departments['City'].str.replace(' Regional','')
Departments['City']=Departments['City'].str.replace('Royal Newfoundland Constabulary','')
Departments['City']=Departments['City'].str.replace(' Community','')
Departments['City']=Departments['City'].str.replace('South Coast British Columbia Transit Authority','')
Departments['City']=Departments['City'].str.replace("l'agglomération de ",'')
Departments['City']=Departments['City'].str.replace('du ','')

Departments['TYPE']='Municipal/Regional'
# Departments.loc[Departments.index.str.contains('Regional')==True,'TYPE']='Regional'
# Departments.loc[Departments.index.str.contains('Toronto')==True,'TYPE']='Regional'
# Departments.loc[Departments.index.str.contains("Service de police de l'agglomération de ")==True,'TYPE']='Regional'
# Departments.loc[Departments.index.str.contains('Transit Authority Police Service')==True,'TYPE']='Regional'
# Departments.loc[Departments.index.str.contains('Régie intermunicipale de police')==True,'TYPE']='Regional'
# Departments.loc[Departments.index.str.contains('Service de police de la Ville de Montréal')==True,'TYPE']='Regional'
Departments.loc[Departments.index.str.contains('RCMP')==True,'TYPE']='RCMP'
Departments.loc[Departments.index.str.contains('Ontario Provincial Police')==True,'TYPE']='Provincial'
Departments.loc[Departments.index.str.contains('OPP')==True,'TYPE']='Provincial'
Departments.loc[Departments.index.str.contains('Sûreté du Québec')==True,'TYPE']='Provincial'
Departments.loc[Departments.index.str.contains('Royal Newfoundland Constabulary')==True,'TYPE']='Provincial'

writer = pytablewriter.MarkdownTableWriter()
writer.table_name = "Deadliest Police Departments in Canada"
writer.header_list = ['Rank',"Department", "Province", "Killings"]
TB = Departments.loc[Departments['Total']>=10].sort_values(by='Total',ascending=False).reset_index()#[0:10]
writer.value_matrix = [[index+1,value['department'],value['prov'],value['Total']]for index,value in TB.iterrows()]
# list(Departments.groupby('TYPE').count()['NAME'])
# print(Departments.head())
writer.write_table()

print(Departments['Total'].count())
print(Departments.loc[Departments['Total']>=10,'Total'].sum())
print(Departments.loc[Departments['Total']>=10,'Total'].count())

print(Departments.loc[Departments['Total']>=10,'Total'].sum()/CA_Total.sum())

print(Departments.loc[Departments.index == 'RCMP'].sum()['Total'])

print(Departments.loc[Departments.index == 'RCMP'].sum()['Total']/CA_Total.sum())

# print(80/CA_Total.sum())


# Deadliest Police Departments in Canada
|Rank|       Department        |Province|Killings|
|---:|-------------------------|--------|-------:|
|   1|RCMP                     |BC      |      53|
|   2|Ontario Provincial Police|ON      |      31|
|   3|RCMP                     |AB      |      28|
|   4|Toronto Police Service   |ON      |      24|
|   5|Edmonton Police Service  |AB      |      19|
|   6|Winnipeg Police Service  |MB      |      18|
|   7|Calgary Police Service   |AB      |      17|
|   8|Sûreté du Québec         |QC      |      17|
|   9|Surete du Quebec         |QC      |      13|
|  10|Peel Regional Police     |ON      |      12|
|  11|Ottawa Police Service    |ON      |      11|
76
243.0
11
0.6214833759590793
111.0
0.28388746803069054


In [8]:
PID_Canada=Data.CA_PoliceKillings.groupby(['city_town','prov','race']).count()['department']#.reset_index().sort_values(by='department')
PID_Canada = PID_Canada.unstack().reset_index().fillna(0)
races = Data.CA_PoliceKillings.race.unique()
PID_Canada['Total']=PID_Canada[races].sum(axis=1)
PID_Canada

race,city_town,prov,Asian,Black,Indigenous,Latin American,Middle Eastern,South Asian,Unknown,"Visible minority, n.i.e",White,Total
0,Ahtahkakoop,SK,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0
1,Airdrie,AB,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,1.0
2,Akulivik,QC,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0
3,Alix,AB,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,1.0
4,Amherst,NS,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,1.0
...,...,...,...,...,...,...,...,...,...,...,...,...
219,Winnipeg,MB,0.0,1.0,12.0,0.0,0.0,0.0,5.0,0.0,1.0,19.0
220,Winnipeg,ON,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,1.0
221,York,ON,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,1.0
222,kugluktuk,NU,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0


In [9]:
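import os
from geopy.geocoders import MapBox
import geopandas as gpd
# Read the Mapbox token from an environment variable instead of hard-coding it in the notebook
# (MAPBOX_API_KEY is an assumed variable name; set it before running this cell)
api_key=os.environ["MAPBOX_API_KEY"]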

geolocator = MapBox(api_key=api_key)

# print(Departments)
PID_Canada['Lat']=np.nan
PID_Canada['Lon']=np.nan
PID_Canada['Geocoding_notes']=''
print(PID_Canada.shape)
for index,row in PID_Canada.iterrows():
    print(index)
    try:
        # Geocode "city, province, Canada"; geocode() returns None when nothing matches
        Point = geolocator.geocode(row.city_town+', '+Data.can_province_names[row.prov]+', Canada')#,exactly_one=False)
        PID_Canada.loc[index,['Lat','Lon']]=[Point.latitude,Point.longitude]
    except Exception:
        # Flag rows that could not be geocoded so they can be reviewed later
        PID_Canada.loc[index,'Geocoding_notes']='Geocoding Failed'



(224, 15)
0
1
...
223


In [10]:
print(PID_Canada.loc[PID_Canada['Geocoding_notes']=='Geocoding Failed'])

Empty DataFrame
Columns: [city_town, prov, Asian, Black, Indigenous, Latin American, Middle Eastern, South Asian, Unknown, Visible minority, n.i.e, White, Total, Lat, Lon, Geocoding_notes]
Index: []


In [11]:
# from geopy.geocoders import MapBox
import folium
# # from IPython.display import clear_output


def plot_point(Map,X,Y,Popup_Text,Color='olive',Radius=5,Opacity=.75,LineColor='black',LineWidth=.15):
    folium.CircleMarker(
        # The coordinates, as [latitude, longitude]
        location=[X,Y],
        # Text description
        popup=Popup_Text,
        # Sets the fill color for the point
        fill_color=Color,
        # Size of the marker
        radius=Radius,
        # Opacity of the circle
        fill_opacity = Opacity,
        # Sets the line color for the edge
        color=LineColor,
        # Width of the border line (the Leaflet path option is named "weight")
        weight=LineWidth,
    ).add_to(Map)

Scale,Offset=1,3
    
Final_Map = folium.Map(
    location=[60,-91.5],
    zoom_start=3,
    tiles='Stamen Toner'
)

group0 = folium.FeatureGroup(name='Police Involved Deaths')
# group1 = folium.FeatureGroup(name='Provincial Police Forces')
# group2 = folium.FeatureGroup(name='Municipal/Regional Police Forces')
# Cats=Data.CA_PoliceKillings['race'].unique()
for index, row in PID_Canada.loc[PID_Canada['Geocoding_notes']!='Geocoding Failed'].iterrows():
#     if row['geocoding_Notes'] != 'Geocoding Failed':
        
#     popuptext='''<body> <h2>'''+row['city_town']+'''</h2> 
#                      <table style="width:100%">
#                      <tr>
#                      <th>Number of Deaths &nbsp </th>
#                      <th>'''+str(row['department'])+ '''</th>
#                      </tr>
#                       '''
    
    # ''' indicates we're writing multiline strings
    # We'll write the header and the top row of the table
    popuptext='''<body> <h2>'''+row['city_town']+'''</h2> 
                <table style="width:100%">
                <tr>
                <th>race</th>
                <th>Number of Killings &nbsp </th>
                <th>Percentage of Total</th>
                </tr>
                 '''

    # We'll sort the counts so the race with the most killings appears at the top of the table
    Sorted = row[races].sort_values(ascending=False)
    for i,v in zip(Sorted.index,Sorted.values):
        # Only add a table row for races with at least one recorded killing
        if (np.isnan(v)==False) and (v>0):
            popuptext+= '''<tr>
                        <td>'''+i+''' &nbsp </td>
                        <td>'''+str(int(v))+'''</td>
                        <td>'''+str(int(v/row['Total']*100))+'''%</td>
                        </tr>'''

    # We'll add a row at the bottom of the table with the total
    popuptext+='''<tr>
                  <th>Total </th>
                  <th>'''+str(int(row['Total']))+'''</th>
                  <th> </th>
                  </tr>'''

    # We'll convert the text to html
    test = folium.Html(popuptext, script=True)

    # This defines the parameters for the popup text box
    popup = folium.Popup(test, max_width=400,min_width=300)
        
#         # Now we can send the popup to the CircleMarker
#         if row['TYPE']=='RCMP':
    plot_point(Map=group0,
               X=row.Lat,#['latitude'],
               Y=row.Lon,#['longitude'],
               Popup_Text=popup,
               # This is hexcode for the official red of Canada
               Color='#FF0000',
               # We'll scale the radius by the number of killings, plus a fixed offset so small counts stay visible
               Radius=(row['Total'])*Scale+Offset,
              )
            
#         elif row['TYPE']=='Provincial':
#             plot_point(Map=group1,
#                        X=row.Lat,#['latitude'],
#                        Y=row.Lon,#['longitude'],
#                        Popup_Text=popup,
#                        # This is hexcode for the official red of Canada
#                        Color='#9400D3',
#                        # We'll scale the radius by the number of killings + 2
#                        Radius=(row['Total'])*Scale+Offset,
#                       )
#         else:
#             plot_point(Map=group2,
#                    X=row.Lat,#['latitude'],
#                    Y=row.Lon,#['longitude'],
#                    Popup_Text=popup,
#                    # This is hexcode for the official red of Canada
#                    Color='#0000ff',
#                    # We'll scale the radius by the number of killings + 2
#                    Radius=(row['Total'])*Scale+Offset,
#                   )
group0.add_to(Final_Map)


# group2.add_to(Final_Map)


# group1.add_to(Final_Map)
    
folium.LayerControl().add_to(Final_Map)
    
Final_Map.save('Content/PoliceViolenceIncidents.html')
Final_Map


# 5) The racial breakdown of police killings.
* The majority of people killed by police are white
    * The second largest demographic is "Unknown", which in most cases means this information was not recorded by the police.  Since this data was collected and reported by a third party rather than through a central database, some information is missing.  This information should be mandatory for police departments to collect and publish.
* Demographic groups are not evenly represented in the population
    * Canada is about 73.4% White, but only 4.7% Indigenous and 3.4% Black
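
One way to make the point about uneven representation concrete is to compare each group's share of killings with its share of the population. The sketch below is illustrative only: the counts are copied from the groupby output printed in this notebook, the population shares are the ones quoted above, and the variable names are assumptions introduced for this example.

```python
# Counts of killings by race, copied from the groupby output printed in this notebook
killings = {"White": 99, "Indigenous": 76, "Black": 20}
total_killings = 391  # all recorded killings in that output, including 'Unknown'

# Population shares quoted in the text above
population_share = {"White": 0.734, "Indigenous": 0.047, "Black": 0.034}

ratios = {}
for group, count in killings.items():
    killing_share = count / total_killings
    # A ratio above 1 means the group is over-represented among victims
    ratios[group] = killing_share / population_share[group]
    print(f"{group}: {killing_share:.1%} of killings, "
          f"{population_share[group]:.1%} of population, ratio {ratios[group]:.1f}")
```

Because nearly half of the records in that output have an unknown race, these ratios should be read as rough indicators rather than precise estimates.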


In [12]:
print(Data.CA.sum()['Black']/Data.CA.sum()['Total'])

races = (Data.CA_PoliceKillings['race'].unique())
Pop = Data.CA[races].sum().to_frame()
print(Pop)
# print(Data.CA[Data.CA_PoliceKillings['race'].unique()])

fig,ax=plt.subplots(1,1)
ax.grid(axis='x',zorder=0)
ax.set_title('Racial Distribution of Police Killings',loc='left')
Byrace=Data.CA_PoliceKillings.groupby('race').count()['prov'].sort_values()
Byrace=Byrace.to_frame()
# Byrace = Byrace.join(Pop/Pop.sum(),lsuffix='_Pop')
print(Byrace)
# Byrace[['prov',0]].plot(kind='barh')
ax.barh(np.arange(Byrace.index.shape[0]),Byrace['prov'].values,color='#b01005',edgecolor='k',
        height=.8,label='Proportion of Police Killings',zorder=2)
# ax.barh(np.arange(Byrace.index.shape[0]),Byrace[0].values,color='#eb4034',edgecolor='k',
#         height=.4,label='Proportion of Population')
ax.set_yticks(np.arange(Byrace.index.shape[0]))
ax.set_yticklabels(Byrace.index)
ax.set_xlabel('Killings')
# ax.legend()
plt.tight_layout()
plt.savefig('Content/CA_race.png')



fig,ax=plt.subplots(1,1)
ax.grid(axis='x',zorder=0)

ax.set_title('Proportion of Total: Police Killings and Population',loc='left')
Byrace=Data.CA_PoliceKillings.groupby('race').count()['prov'].sort_values()
Byrace=(Byrace/Byrace.sum()).to_frame()
Byrace = Byrace.join(Pop/Pop.sum(),lsuffix='_Pop')
print(Byrace)
# Byrace[['prov',0]].plot(kind='barh')
ax.barh(np.arange(Byrace.index.shape[0])-.4,Byrace['prov'].values*100,color='#b01005',edgecolor='k',
        height=.4,label='Proportion of Police Killings',zorder=2)
ax.barh(np.arange(Byrace.index.shape[0]),Byrace[0].values*100,color='#eb4034',edgecolor='k',
        height=.4,label='Proportion of Population',zorder=2)
ax.set_yticks(np.arange(Byrace.index.shape[0]))
ax.set_yticklabels(Byrace.index)
ax.set_xlabel('Percent %')
ax.legend()
plt.tight_layout()

plt.savefig('Content/CA_race_Proportional.png')


fig,ax=plt.subplots(1,1)
ax.grid(axis='x',zorder=0)

ax.set_title('Police Involved Death Rates by Race',loc='left')
Byrace=Data.CA_PoliceKillings.groupby('race').count()['prov'].sort_values()
Byrace=Byrace.to_frame()
Byrace = Byrace.join(Pop,lsuffix='_Pop')
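# Deaths per million people per year: dividing by 21 averages over the years of records (2000-2020)
Byrace['Norm']=Byrace['prov'].values/Byrace[0].values*1e6/21
# Sort by rate and drop the last row ('Unknown', which has no population to normalize by)
Byrace=Byrace.sort_values(by='Norm')[:-1]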
print(Byrace)
# Byrace[['prov',0]].plot(kind='barh')
# ax.barh(np.arange(Byrace.index.shape[0])-.4,Byrace['prov'].values,color='#b01005',edgecolor='k',
#         height=.4,label='Proportion of Police Killings',zorder=2)
ax.barh(np.arange(Byrace.index.shape[0]),Byrace['Norm'],color='#b01005',edgecolor='k',
        height=.8,label='Proportion of Population',zorder=2)
ax.set_yticks(np.arange(Byrace.index.shape[0]))
ax.set_yticklabels(Byrace.index)
ax.set_xlabel('Deaths per Million People per Year')
# ax.legend()
plt.tight_layout()

plt.savefig('Content/CA_race_Normalized.png')#_Even_If



print(CA_Total.sum()/Data.CA.Total.sum()/21*1e6)
# print(Data.CA.Total.sum())
print(CA_Total.sum())

0.034096332333932486
                                0
Indigenous                1673780
Unknown                         0
White                    25803358
Middle Eastern             523235
Black                     1198545
Asian                     3216380
South Asian               1924640
Visible minority, n.i.e    364460
Latin American             447330



                         prov
race                         
Latin American              1
Middle Eastern              1
Visible minority, n.i.e     2
Asian                       3
South Asian                 3
Black                      20
Indigenous                 76
White                      99
Unknown                   186



                             prov         0
race                                       
Latin American           0.002558  0.012726
Middle Eastern           0.002558  0.014885
Visible minority, n.i.e  0.005115  0.010368
Asian                    0.007673  0.091500
South Asian              0.007673  0.054752
Black                    0.051151  0.034096
Indigenous               0.194373  0.047616
White                    0.253197  0.734057
Unknown                  0.475703  0.000000



                         prov         0      Norm
race                                             
Asian                       3   3216380  0.044416
South Asian                 3   1924640  0.074225
Middle Eastern              1    523235  0.091009
Latin American              1    447330  0.106452
White                      99  25803358  0.182700
Visible minority, n.i.e     2    364460  0.261313
Black                      20   1198545  0.794614
Indigenous                 76   1673780  2.162200
0.5296765956725547
391




In [13]:
PID_Canada

race,city_town,prov,Asian,Black,Indigenous,Latin American,Middle Eastern,South Asian,Unknown,"Visible minority, n.i.e",White,Total,Lat,Lon,Geocoding_notes
0,Ahtahkakoop,SK,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,53.909005,-106.133622,
1,Airdrie,AB,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,1.0,51.292222,-114.014167,
2,Akulivik,QC,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,60.805000,-78.202778,
3,Alix,AB,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,1.0,52.400000,-113.179722,
4,Amherst,NS,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,1.0,45.833392,-64.212968,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
219,Winnipeg,MB,0.0,1.0,12.0,0.0,0.0,0.0,5.0,0.0,1.0,19.0,49.884444,-97.146389,
220,Winnipeg,ON,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,1.0,44.120210,-77.548092,
221,York,ON,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,1.0,43.022400,-79.889800,
222,kugluktuk,NU,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,67.826667,-115.093330,


In [14]:
# print(C)
# print(PID_Canada.city_town)
# PID_Canada.loc[PID_Canada.city_town == C]#.values#.flatten()

In [15]:
Add = Data.CA_PoliceKillings['race'].loc[Data.CA_PoliceKillings['race']=='Unknown'].shape[0]
print(Byrace.loc[Byrace.index=='White','prov'])
if Byrace.loc[Byrace.index=='White','prov'].values[0] < Add:
    Byrace.loc[Byrace.index=='White','prov']+=Add
from scipy.stats import chisquare

print(Byrace)


R = Byrace.index.values#.drop('Unknown').values
print(R)
# Dept = ['Winnipeg Police Service','Vancouver Police Department','Toronto Police Service','Calgary Police Service','Edmonton Police Service']
City = ['Winnipeg','Vancouver','Toronto','Calgary','Edmonton']
Prov = ['MB','BC','ON','AB','AB']
FOBS = []
FEXP = []
for C,P in zip(City,Prov):
    F_obs = PID_Canada.loc[((PID_Canada.city_town == C)&
                           (PID_Canada.prov == P)),R].values.flatten()
    F_exp = Data.Municipal_Boundaries.loc[Data.Municipal_Boundaries['Name'] == C,R].values.flatten()

    F_exp = F_exp/F_exp.sum()*Data.CA_PoliceKillings.loc[((Data.CA_PoliceKillings['city_town']==C)&
                                                         (Data.CA_PoliceKillings['prov']==P)),'prov'].count()#F_obs.sum()

    FOBS.append(F_obs)
    FEXP.append(F_exp)
    res = chisquare(F_obs, f_exp=F_exp)

    print(res)

R2 = [r+'_Killings'for r in R]


# R = Byrace.index#.drop('Unknown').values


F_obs=Data.CA[R2].fillna(0).sum().values
F_exp=Data.CA[R].fillna(0).sum().values

F0 = F_exp.tolist()
print(F_exp)
F0.append(F_exp.sum())
F0.append(0)
F_exp = F_exp*(Data.CA_PoliceKillings.count().race/F_exp.sum())

print()
res = chisquare(F_obs, f_exp=F_exp)
print(res)


R = R.tolist()
R.append('Total')

# R = np.append(R,'Unknown')

F1 = F_exp.tolist()
F1.append(F_exp.sum())
# F1.append(0)
F2 = F_obs.tolist()
F2.append(Data.CA_PoliceKillings.count().race)
# F2.append(Data.CA['Unknown_Killings'].sum())

from tabulate import tabulate
d = {"By race": ["Total Population (Millions)", "Expected Distribution", "Observed Killings"]}
for i,r in enumerate(R):
    d[r]=[int(F0[i]/1e5)/10,int(F1[i]),int(F2[i])]
df = pd.DataFrame(d).set_index("By race")
df = df.T.sort_values(by='Total Population (Millions)',ascending=False).T
# df.loc[df.index!='Observed Killings','Unknown']='--'


print(tabulate(df.T, tablefmt="pipe", headers="keys"))
# print(Data.CA['Total_Killings'].sum()/Data.CA['Total'].sum()*1e6/21)
# print(Data.CA['Total_Killings'].sum())
# print(Data.CA_PoliceKillings.count())

# T = (Data.CA_PoliceKillings.loc[Data.CA_PoliceKillings['race']=='Unknown'].groupby('POLICE SERVICE').count()['GENDER'].sort_values())

# print(T/T.sum())

race
White    99
Name: prov, dtype: int64
                         prov         0      Norm
race                                             
Asian                       3   3216380  0.044416
South Asian                 3   1924640  0.074225
Middle Eastern              1    523235  0.091009
Latin American              1    447330  0.106452
White                     285  25803358  0.182700
Visible minority, n.i.e     2    364460  0.261313
Black                      20   1198545  0.794614
Indigenous                 76   1673780  2.162200
['Asian' 'South Asian' 'Middle Eastern' 'Latin American' 'White'
 'Visible minority, n.i.e' 'Black' 'Indigenous']
Power_divergenceResult(statistic=55.86794759620718, pvalue=1.0031269250018728e-09)
Power_divergenceResult(statistic=3.4229843328220344, pvalue=0.8433143125425777)
Power_divergenceResult(statistic=6.951358575293645, pvalue=0.4339619259409242)
Power_divergenceResult(statistic=9.969537196550402, pvalue=0.19030635098169235)
Power_divergenceResult

### Think about how comparing the total killings for population groups of very different sizes might impact the way you perceive patterns.  Using this chart, what demographic group do you think is most likely to be killed by the police in Canada?
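
To see why group size matters, it can help to ask how many killings each group would be expected to have if risk were simply proportional to population, and then compare that with what was observed. The sketch below does this with a chi-square goodness-of-fit test. It uses the counts and population totals printed in this notebook's output and leaves out the "Unknown" records, which is a simplifying assumption for this example rather than the notebook's own handling of them.

```python
import numpy as np
from scipy.stats import chisquare

groups = ["Asian", "South Asian", "Middle Eastern", "Latin American",
          "White", "Visible minority, n.i.e", "Black", "Indigenous"]
# Observed killings and population totals as printed in this notebook's output
observed = np.array([3, 3, 1, 1, 99, 2, 20, 76])
population = np.array([3216380, 1924640, 523235, 447330,
                       25803358, 364460, 1198545, 1673780])

# Expected killings if risk were proportional to population size
expected = population / population.sum() * observed.sum()

for g, o, e in zip(groups, observed, expected):
    print(f"{g:<24} observed {o:>3}   expected {e:6.1f}")

# Goodness-of-fit test: a small p-value means the observed counts are
# unlikely under the "proportional to population" assumption
result = chisquare(observed, f_exp=expected)
print(result.statistic, result.pvalue)
```

Several of the expected counts here are below five, where the chi-square approximation is less reliable, so the test is best treated as a rough check alongside the per-capita rates.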

# 4) Which police departments are responsible for the most killings?
Here are all departments which have killed at least ten people in the last 20 years.
* Provincial police services and large municipal police departments are responsible for the most deaths
* The RCMP serves as the provincial police in eight provinces and the territories.
    * Altogether, the RCMP is responsible for 34% of deaths 

In [16]:
# print(Data.CA.Total.sort_values()/Data.CA.Total.sum())
ARMED_TYPE=(Data.CA_PoliceKillings.groupby(['department','armed_type']).count()['age'].unstack())
# print(ARMED_TYPE)
ARMED_TYPE_RCMP_prov=(Data.CA_PoliceKillings.loc[Data.CA_PoliceKillings['department']=='RCMP'].groupby(['prov','armed_type']).count()['age'].unstack())
ARMED_TYPE_RCMP_prov['Name'] = [Data.can_province_names[x] for x in ARMED_TYPE_RCMP_prov.index]
ARMED_TYPE_RCMP_prov['department']=ARMED_TYPE_RCMP_prov['Name']+' RCMP'

ARMED_TYPE_RCMP_prov=ARMED_TYPE_RCMP_prov.set_index(ARMED_TYPE_RCMP_prov['department'])
# print(ARMED_TYPE_RCMP_prov)
# print(ARMED_TYPE)

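# DataFrame.append was removed in pandas 2.0; pd.concat is the equivalent
ARMED_TYPE = pd.concat([ARMED_TYPE.loc[ARMED_TYPE.index != 'RCMP'], ARMED_TYPE_RCMP_prov])


Types = ARMED_TYPE.columns
# Sum only the numeric weapon-type columns (skips the 'Name' and 'department' text columns)
ARMED_TYPE['Total'] = ARMED_TYPE.sum(axis=1, numeric_only=True)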

fig,ax=plt.subplots(1,1)
ax.set_title('4) Canada: department',loc='left')
# Byrace=Data.CA_PoliceKillings.groupby('department').count()['age'].sort_values()[-10:]
ARMED_TYPE = ARMED_TYPE.loc[ARMED_TYPE['Total']>10].sort_values(by='Total')
ax.barh(ARMED_TYPE.index,ARMED_TYPE['Total'].values,color=[1,0,0],edgecolor='k')
# ax.yaxis.set_tick_params(pad=160)
# ax.set_yticklabels(ARMED_TYPE.index, ha = 'left')
ax.yaxis.set_label_position("right")
ax.yaxis.tick_right()
plt.grid(axis='x')
plt.tight_layout()
plt.savefig('Content/CA_PoliceServices.png')


print(ARMED_TYPE['Total']/ARMED_TYPE['Total'].sum())
# fig,ax=plt.subplots(1,1)
# RCMP = Data.CA_PoliceKillings.loc[Data.CA_PoliceKillings['department']=='RCMP'].groupby('prov').count().sort_values(by='race')
# ax.set_title('6) RCMP: By Province',loc='left')
# # ARMED_TYPE = ARMED_TYPE.loc[ARMED_TYPE['Total']>10].sort_values(by='Total')
# ax.barh(RCMP.index,RCMP['race'].values,color=[1,0,0],edgecolor='k')
# plt.tight_layout()



department
Peel Regional Police         0.056122
Surete du Quebec             0.066327
Calgary Police Service       0.081633
Edmonton Police Service      0.086735
Sûreté du Québec             0.086735
Winnipeg Police Service      0.086735
Toronto Police Service       0.122449
Alberta RCMP                 0.132653
British Columbia RCMP        0.137755
Ontario Provincial Police    0.142857
Name: Total, dtype: float64


# Data Normalization

Normalization is the process of scaling (AKA normalizing) one number by another.
* For example, we can ask the question:
    * Which police departments are most likely to kill an unarmed person?
* We need two pieces of information for each police department:
    * A) The total number of unarmed victims
    * B) The total number of victims
* Dividing A by B and multiplying by 100 tells us what percentage of each department's victims were unarmed.
* So our normalization calculation would look like:

    
\begin{align}
\text{Percent Unarmed} & = \left(\frac{\text{Unarmed Victims}}{\text{Total Victims}}\right) \times 100 \\
\end{align}

This shows different patterns in the data that are easy to overlook when using raw counts
* Nearly half the people killed by BC RCMP did not have a weapon
    * Vancouver Police are the fourth most likely to kill an unarmed person.  Nearly 40% of their victims are unarmed.
    * Killing of unarmed people by police in our region is a severe problem.
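
As a minimal sketch of the percent-unarmed calculation, the snippet below uses three made-up police services. The counts and the names `Service A`, `Service B`, `Service C` are hypothetical and are not taken from the CBC dataset; they only show how the ranking can change once counts are normalized.

```python
import pandas as pd

# Hypothetical counts for three made-up police services (not from the CBC dataset)
counts = pd.DataFrame(
    {"unarmed": [5, 12, 3], "total": [20, 60, 6]},
    index=["Service A", "Service B", "Service C"],
)

# Normalize: percentage of each service's victims who were unarmed
counts["unarmed_pct"] = counts["unarmed"] / counts["total"] * 100

# Rank the services by raw count and by percentage
by_count = counts.sort_values("unarmed", ascending=False).index.tolist()
by_pct = counts.sort_values("unarmed_pct", ascending=False).index.tolist()
print(by_count)
print(by_pct)
```

In this made-up example the service with the fewest unarmed victims has the highest unarmed percentage, which is the kind of pattern raw counts tend to hide.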
    
# This information should be widely known and available.  The RCMP and other Police Services across Canada need to be held accountable.

In [17]:
import matplotlib.ticker as mtick
ARMED_TYPE['Unarmed%']=ARMED_TYPE['None']/ARMED_TYPE['Total']*100
ARMED_TYPE = ARMED_TYPE.fillna(0)
fig,ax=plt.subplots(2,1,figsize=(5,7))

ax[0].set_title('6 A) Canada: Unarmed Victims by Police Service',loc='left')
ARMED_TYPE = ARMED_TYPE.loc[ARMED_TYPE['Total']>10].sort_values(by='None')
ax[0].barh(ARMED_TYPE.index,ARMED_TYPE['None'].values,color=[1,0,0],edgecolor='k')
ax[0].yaxis.tick_right()
ax[0].grid(axis='x')

ax[1].set_title('6 B) Canada: Unarmed Victims % by Police Service',loc='left')
ARMED_TYPE = ARMED_TYPE.loc[ARMED_TYPE['Total']>10].sort_values(by='Unarmed%')
ax[1].barh(ARMED_TYPE.index,ARMED_TYPE['Unarmed%'].values,color=[1,0,0],edgecolor='k')
ax[1].yaxis.tick_right()
ax[1].xaxis.set_major_formatter(mtick.PercentFormatter())
ax[1].grid(axis='x')
plt.tight_layout()
plt.savefig('Content/CA_UnarmedFraction.png')


In [18]:

plt.figure()
plt.scatter(Departments.White,Departments.Indigenous)
LR = stats.linregress(Departments.White,Departments.Indigenous)
plt.plot(Departments.White,Departments.White*LR[0]+LR[1])
print(LR)


LinregressResult(slope=0.31010075776501, intercept=0.5960529602797896, rvalue=0.34276322337375736, pvalue=0.002437411926362382, stderr=0.09879919289363145, intercept_stderr=0.27829490585313243)


# Questions:

What are some other applications for data normalization?

What metric(s) might you want to consider when looking at the total number of electric cars in each province to gauge electric car adoption?

A) Kilometers driven
B) Cars per family
C) Median Income
D) Total Population
E) Average Car Price

# The United States Data
 
The United States data is collected by a collaboration of researchers and data scientists and is available for download here: https://mappingpoliceviolence.org/

"We believe the data represented on this site is the most comprehensive accounting of people killed by police since 2013. Note that the Mapping Police Violence database is more comprehensive than the Washington Post police shootings database: while WaPo only tracks cases where people are fatally shot by on-duty police officers, our database includes additional incidents such as cases where police kill someone through use of a chokehold, baton, taser or other means as well as cases such as killings by off-duty police."

This is not an official count.
* This dataset is a collection of second-hand information in the form of press releases, news articles, etc.
* Some records are incomplete, and the total number of incidents is likely higher than detailed here.

In [19]:
US_Total=Data.US_PoliceKillings["State"].count()

print(Data.US_PoliceKillings["AGE"].describe())

fig,ax=plt.subplots(2,2,figsize=(8,6))
ax[0,0].set_title('1) Police Killings by Year',loc='left')
ByYear=Data.US_PoliceKillings.resample('YS').count()['AGE']
ax[0,0].bar(ByYear.index.year,ByYear.values,color=[0,0,1],edgecolor='k')
ax[0,0].bar([2021],[ByYear.values[-1]],color=[.5,.5,1],edgecolor='k')
# plt.tight_layout()

ax[0,0].grid(axis='y')
# plt.savefig('Content/US_ByYear.png')

# fig,ax=plt.subplots(1,1)
ax[1,0].set_title('3) RACE',loc='left')
ByRACE=Data.US_PoliceKillings.groupby('RACE').count()['AGE'].sort_values()

print(ByRACE/ByRACE.sum())
ax[1,0].barh(ByRACE.index,ByRACE.values,color=[0,0,1],edgecolor='k')

ax[1,0].yaxis.tick_right()
ax[1,0].grid(axis='x')

# plt.tight_layout()
# fig,ax=plt.subplots(1,1)
ax[1,1].set_title('4) Armed Type',loc='left')
ByRACE=Data.US_PoliceKillings.groupby('Armed/Unarmed Status').count()['AGE'].sort_values()
ax[1,1].pie(ByRACE.values,labels=ByRACE.index,
    autopct='%1.1f%%')
# plt.tight_layout()



# fig,ax=plt.subplots(1,1)
Data.US_PoliceKillings['AGE'].hist(bins = np.arange(0,110,5),color=[0,0,1],edgecolor='k',ax=ax[0,1])
ax[0,1].set_title('2) AGE Distribution of Victims')
ax[0,1].grid(axis='x')
plt.tight_layout()



plt.savefig('Content/US_Data.png')



count    5826.000000
mean       37.015105
std        12.919484
min         1.000000
25%        27.000000
50%        35.000000
75%        45.000000
max        91.000000
Name: AGE, dtype: float64



RACE
Pacific Islander    0.007209
Asian               0.012358
Indigenous          0.016650
Unknown             0.097666
Hispanic            0.179197
Black               0.248026
White               0.438895
Name: AGE, dtype: float64


# Part 2) Comparing to the United States  

There are more police killings in the United States than in Canada, but the raw totals are not directly comparable.

* What factors do we need to look at to compare police killings between Canada and the United States?

* The United States has nearly ten times the population of Canada.  If we don't account for that, our comparison won't make any sense.
    * The graph below compares two countries with very different populations and two datasets with different periods of record.

In [20]:

fig,ax=plt.subplots()
ax.bar([0],CA_Total,color=[1,0,0],label='Canada\n1/2000 - 6/2020')
ax.bar([1],US_Total,color=[0,0,1],label='United States\n1/2013 - 10/2020')
ax.set_xticks([0,1])
ax.set_xticklabels(['Canada','United States'])
ax.set_title('Police Killings',loc='left')
ax.grid(axis='y')
ax.legend()
plt.tight_layout()
plt.savefig('Content/RawComparison.png')


# What to account for

### A) Record Length
The time periods of these datasets are different.
We could look only at the period when they overlap, but this would require us to ignore some of the data.
Alternatively, we can calculate the average number of killings per year.
The data are not from the same periods, but they will be on the same time scale, and they will be as inclusive as possible.
### B) Population
Canada has about 35 million residents.  The US has about 327 million.
To make the datasets directly comparable, we need to normalize by the total population of each country.  This will allow us to calculate the police killing rate.
### C) Scale
Dividing by the population would give us the average number of police killings per person per year.  This will be a very small decimal.  Larger numbers are easier to interpret, so we can divide by the population in millions instead, which gives killings per million residents per year.
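
The three adjustments above (record length, population, scale) can be combined into one small function. This is only a sketch: the function name `killing_rate` and the inputs for "Region A" and "Region B" are hypothetical and chosen to make the arithmetic easy to follow; they are not values from either dataset.

```python
def killing_rate(total_killings, population, record_years, scale=1e6):
    """Average killings per `scale` residents per year."""
    return total_killings / (population * record_years) * scale

# Hypothetical Region A: 100 killings, 5 million residents, 10-year record
rate_a = killing_rate(total_killings=100, population=5e6, record_years=10)

# Hypothetical Region B: 900 killings, 30 million residents, 5-year record
rate_b = killing_rate(total_killings=900, population=30e6, record_years=5)

print("Raw count ratio (B/A):", 900 / 100)
print("Rate ratio (B/A):", round(rate_b / rate_a, 2))
```

Region B has nine times as many killings as Region A in this made-up example, but once population and record length are accounted for its rate is only three times higher.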

# Police Killing Rates
* By normalizing, we can more directly compare patterns between geographic regions with different characteristics (e.g. population) and datasets of different lengths

In [21]:
CA_Rate = CA_Total /(Data.CA.Total.sum()*Data.CA_Length) * 1e6
US_Rate = US_Total /(Data.US.Total.sum()*Data.US_Length) * 1e6

fig,ax=plt.subplots()
ax.bar([0],CA_Rate,color=[1,0,0],label='Canada\n1/2000 - 6/2020',edgecolor='k')
ax.bar([1],US_Rate,color=[0,0,1],label='United States\n1/2013 - 10/2020',edgecolor='k')
ax.set_xticks([0,1])
ax.set_xticklabels(['Canada','United States'])
ax.set_title('Police Killing Rates',loc='left')
ax.set_ylabel('Killings per Million Residents per Year')

ax.grid(axis='y')
ax.legend()
plt.tight_layout()
plt.savefig('Content/NormalizedComparison.png')

print(US_Rate/CA_Rate)


1.6824720199306755


# Racial Disparities

Systemic racism is pervasive on both sides of the border.

* The police violence dataset and census for each country use different demographic groupings
    * We'll compare the police killing rates of three demographic groups (White, Black, and Indigenous) because they are in both datasets.
        * White people are the majority in both countries, while Black and Indigenous people are disproportionately impacted by police killings on both sides of the border.
* One caveat: the race of the victim is unknown for 24% of Canadian records and 10% of United States records.
    * This adds uncertainty to the comparison.  It also means that the police killing rates by race are underestimated, especially for Canada.

# Systemic Racism in Policing

* Scaled to their respective populations, we can see that Indigenous and Black people are much more likely to be killed by the police than White people in both Canada and the United States
* The overall rates for each race are higher in the US than in Canada
    * However, the disparity between races is actually greater in Canada than in the United States
    
* To show this, we can divide the Black and Indigenous rates for each country by the White rate.
    * This will tell us how many times more likely a Black or Indigenous individual is to be killed by the police than a White individual in each country.
    * Using the ratios printed below, Indigenous and Black Canadians are about 12.0 and 4.4 times more likely to be killed by police than a White Canadian
        * These disparities are higher than in the US (about 3.3 and 3.2)
            * By this metric, you could suggest that police in Canada may be more racially biased than police in the US
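
The division by the White rate can be sketched with a small table. The rates below are hypothetical and only illustrate the arithmetic; the real values come from `Data.Summary` in the cells that follow.

```python
import pandas as pd

# Hypothetical rates (killings per million residents per year), not real values
rates = pd.DataFrame(
    {"US": [4.0, 12.0, 10.0], "CA": [1.0, 5.0, 8.0]},
    index=["White", "Black", "Indigenous"],
)

# Divide every group's rate by the White rate in the same country
disparity = rates.div(rates.loc["White"], axis="columns")
print(disparity.loc[["Black", "Indigenous"]])
```

In this made-up table every rate is higher in the US column, yet the ratios relative to the White rate are larger in the CA column. Absolute rates and relative disparities answer different questions.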

In [22]:
print(Data.CA_PoliceKillings['race'].unique())
# print(Data.Combined)

Data.ScaleData(scale=1e6)
print('Police Killing Rates:')
# print(Data.Summary)
Data.Summary = Data.Summary.dropna()
Data.Summary = Data.Summary.loc[Data.Summary.index!='Asian']

fig,ax=plt.subplots(figsize=(6,5))
Data.Summary[['US','CA']].plot.barh(color=[[0,0,1],[1,0,0]],edgecolor='k',ax=ax,zorder=2)
ax.set_title('Police Killing Rates',loc='left')
ax.set_xlabel('Killings per Million Residents per Year')
ax.yaxis.tick_right()
ax.grid(axis='x',zorder=0)
handles, labels = ax.get_legend_handles_labels()
ax.legend(handles[::-1], labels[::-1])


plt.tight_layout()
plt.savefig('Content/Racial_Comparison.png')


['Indigenous' 'Unknown' 'White' 'Middle Eastern' 'Black' 'Asian'
 'South Asian' 'Visible minority, n.i.e' 'Latin American']
Police Killing Rates:



In [23]:
# Divide the Black and Indigenous rates by the White rate in each country
# (DataFrame.append was removed in pandas 2.0; pd.concat is the equivalent)
White = Data.Summary.loc[Data.Summary.index=='White'].values
Ratio = pd.concat([Data.Summary.loc[Data.Summary.index=='Black']/White,
                   Data.Summary.loc[Data.Summary.index=='Indigenous']/White])

print(Ratio)
fig,ax=plt.subplots()
Ratio.plot.barh(color=[[0,0,1],[1,0,0]],edgecolor='k',ax=ax,zorder = 2)
ax.set_title('Racial Disparities in Police Killings',loc='left')
ax.set_xlabel('PKR Relative to White People')
ax.yaxis.tick_right()
ax.grid(axis='x',zorder=0)
handles, labels = ax.get_legend_handles_labels()
ax.legend(handles[::-1], labels[::-1])


plt.tight_layout()
plt.savefig('Content/Racial_Disparities.png')


                  US         CA
Black       3.190866   4.393653
Indigenous  3.307761  11.955435



# Systemic Racism in Policing is a Canadian Problem

This issue isn't restricted to the United States; it's pervasive in Canada as well and cannot be overlooked.

* The RCMP were created by Prime Minister John A. Macdonald.  He got the idea for the Mounties from the Royal Irish Constabulary, a paramilitary police force the British created to keep the Irish under control.  Initially called the "North West Mounted Rifles", their primary purpose was to clear Indigenous people off their land.  The name was changed to "North-West Mounted Police" because officials in the United States raised concerns that an armed force along the border was a prelude to a military buildup.  The force became the Royal Northwest Mounted Police in 1904 and the Royal Canadian Mounted Police in 1920.

# Questions
* Which country is displaying normalized data?
A) Canada
B) The United States
C) Both
D) Neither

In [24]:
print(Data.Combined)

    Indigenous_Killings  Indigenous_Rate  Indigenous  Unknown_Killings  \
NL                  NaN         0.000000       45725               NaN   
PE                  NaN         0.000000        2735               NaN   
NS                  NaN         0.000000       51490               2.0   
NB                  2.0        12.254240       29380               1.0   
QC                  9.0         8.858511      182890              24.0   
..                  ...              ...         ...               ...   
ME                  1.0        20.881246        8566               3.0   
HI                  NaN         0.000000        3237               1.0   
NH                  NaN         0.000000        3562               1.0   
AZ                 12.0         6.492533      330599              42.0   
RI                  NaN         0.000000        4341               NaN   

    Unknown_Rate  White_Killings  White_Rate    White  \
NL      0.000000             3.0    1.168457   462186 

In [25]:
Rate = 'Total'
n_classes=4
Data.Breaks(column='Total_Killings',classes=n_classes,Manual_Bins=[1,4,7,91,175])
Data.US=Data.US.to_crs(Data.CA.crs)

labels=Data.CA[Rate+'_Killings_NB'].unique().sort_values()
colors = []
Grey = .85
for c in range(n_classes):
    colors.append(matplotlib.colors.to_hex([Grey+(c/(n_classes-1)*(1-Grey)),Grey-(c/(n_classes-1)*Grey),Grey-(c/(n_classes-1)*Grey)]))
CA_Color = {key:value for key,value in zip(labels,colors)}
# print(CA_Color)

# import matplotlib
fig,ax=plt.subplots(figsize=(7.5,7.5))
CA_Patches = []  # legend handles; the legend title below already names Canada
for i,klass in enumerate(Data.CA[Rate+'_Killings_NB'].unique().sort_values()):
#     try:
    kwargs = {'facecolor':CA_Color[klass],
             'edgecolor':'black',
             'linewidth':.5,
             'label':str(np.round(Data.CA_jenks[i],1))+' - '+str(np.round(Data.CA_jenks[i+1],1))}
    if Data.CA.loc[Data.CA[Rate+'_Killings_NB']==klass].count()['PRNAME']>0:
        Data.CA.loc[Data.CA[Rate+'_Killings_NB']==klass].plot(
            ax=ax,
            **kwargs
                 )
    CA_Patches.append(mpatches.Patch(**kwargs))

Data.ScaleData(scale=1e6)
Data.Breaks(column=Rate+'_Rate',classes=n_classes,Manual_Bins=[0,.5,1,2,10])

labels=Data.US[Rate+'_Rate_NB'].unique().sort_values()
colors = []
for c in range(n_classes):
    colors.append(matplotlib.colors.to_hex([Grey-(c/(n_classes-1)*Grey),Grey-(c/(n_classes-1)*Grey),Grey+(c/(n_classes-1)*(1-Grey))]))
US_Color = {key:value for key,value in zip(labels,colors)}

US_Patches = []
# US_Patches.append(mpatches.Patch(**{'facecolor':'None',
#                  'edgecolor':'None',
#                  'linewidth':.5,'label':'United States'}))
for i,klass in enumerate(Data.US[Rate+'_Rate_NB'].unique().sort_values()):
#     try:
    kwargs = {'facecolor':US_Color[klass],
             'edgecolor':'black',
             'linewidth':.5,
             'label':str(np.round(Data.US_jenks[i],1))+' - '+str(np.round(Data.US_jenks[i+1],1))}
    Data.US.loc[Data.US[Rate+'_Rate_NB']==klass].plot(
        ax=ax,
        **kwargs
             )
    US_Patches.append(mpatches.Patch(**kwargs))
first_legend = plt.legend(handles=CA_Patches, loc='upper left',
      title='Canada: Total Killings 2000-2020')

# Add the legend manually to the current Axes.
plt.gca().add_artist(first_legend)

# Create another legend for the second line.
ax.legend(handles=(US_Patches), loc='lower left',
      title='United States: Annual Killings 2013-2020\nper Million Residents')
    
# ax.legend(handles={'PKR':Patches},) 
ax.get_xaxis().set_visible(False)
ax.get_yaxis().set_visible(False)
# ax.set_title('Police Killings')

plt.tight_layout()
plt.savefig('Content/IsItNormalalized_Map.png',bbox_inches='tight')




# Part 3) Histograms, Data Classification, & Choropleth Mapping


# Rates by Province/State

Police killing rates vary by administrative division (e.g. state/province).
* If we want to compare rates, the first step is to look at histograms.
* A histogram shows us the frequency distribution of a given variable
    * Data is grouped into a set of bins and counted
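
To make the "grouped into bins and counted" step concrete, here is a minimal sketch using `numpy.histogram`. The ten rates are hypothetical values picked for illustration, not values from the combined dataset.

```python
import numpy as np

# Hypothetical rates for ten regions (illustrative values only)
rates = np.array([0.7, 1.0, 1.1, 1.4, 2.0, 2.3, 3.1, 4.8, 6.9, 10.6])

# Four equal-width bins spanning 0 to 12
counts, edges = np.histogram(rates, bins=4, range=(0, 12))

# Print each bin's lower edge, upper edge, and the number of values inside it
for lo, hi, n in zip(edges[:-1], edges[1:], counts):
    print(f"{lo:4.1f} to {hi:4.1f}: {n}")
```

Most of the values land in the first bin and only one or two in each of the others, which is the right-skewed shape the plots in this section show.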


In [26]:
Rate = 'Total'
n_classes=4
Data.Breaks(column=Rate+'_Rate',classes=n_classes,Manual_Bins=[0,.5,1,2,10])
# plt.figure()
import numpy as np
fig,ax=plt.subplots()#1,2)
# print(Data.CA['Total_Killings'].describe())
Data.Combined['Total_Rate'].hist(ax=ax,color='#b01005',edgecolor='k',zorder=2)
ax.grid(axis='x')
ax.set_ylabel('Frequency')
ax.set_xlabel('Killings per Million Residents per Year')
ax.set_title('Police Killing Rates by Province/State',loc='left')
ax.axvline(Data.Combined['Total_Rate'].mean(), color='k', linestyle='dashed', linewidth=2,label='Mean')
ax.axvline(Data.Combined['Total_Rate'].mean()+Data.Combined['Total_Rate'].std(), 
            color='b', linestyle='dashed', linewidth=2,label='1 Standard Deviation')
ax.axvline(Data.Combined['Total_Rate'].mean()-Data.Combined['Total_Rate'].std(), 
            color='b', linestyle='dashed', linewidth=2)
ax.legend()
plt.savefig('Content/Combined_Rate_Hist.png')
# print(Data.Combined.index,Data.CA.index)
# print(Data.Combined['Total_Rate'].sort_values())
# print(Data.CA['Total_Rate'].sort_values())


# Outliers
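Histograms can be useful for spotting outliers in a dataset
* The Indigenous police killing rate histogram shows an outlier
    * Vermont's rate (102.6) is nearly three times the next-highest value (35.3, Nunavut)
    * Vermont's rate is based on a single killing in an Indigenous population of 1,743, so small populations can produce extreme rates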

In [27]:

Data.ScaleData(scale=1e6)
Rate = 'Indigenous'
print(Data.Combined[Rate+'_Rate'].describe())
fig,ax=plt.subplots()
Data.Combined[Rate+'_Rate'].hist(color='#eb4034',
                                 edgecolor='k',bins=10,label='All Data',zorder=2)
# ax.set_title('Indigenous Police Killing Rates by State in US States')

Data.Combined.loc[Data.Combined[Rate+'_Rate']<50,Rate+'_Rate'].hist(color='#b01005',
                                edgecolor='k',bins=10,label='Excluding Vermont',zorder=2)
ax.set_title('Indigenous Police Killing Rates by Province/State',loc='left')

ax.set_ylabel('Frequency')
ax.set_xlabel('Killings per Million Residents per Year')

ax.legend()
ax.grid(axis='x')
plt.tight_layout()
plt.savefig('Content/Combined_Hist_by_race.png')


Data.Combined[Rate+'_Fraction'] = Data.Combined[Rate]/Data.Combined['Total']*100
Data.Combined[[Rate+'_Rate',Rate+'_Killings',Rate,Rate+'_Fraction']].sort_values(Rate+'_Rate',ascending=False).round(3)[:5].reset_index()

count     64.000000
mean       7.460333
std       14.361590
min        0.000000
25%        0.000000
50%        1.336316
75%       11.110058
max      102.621203
Name: Indigenous_Rate, dtype: float64




   index  Indigenous_Rate  Indigenous_Killings  Indigenous  Indigenous_Fraction
0     VT          102.621                  1.0        1743                0.278
1     NU           35.349                  6.0       30555               85.007
2     ND           26.005                  6.0       41270                 5.43
3     ME           20.881                  1.0        8566                 0.64
4     NE            19.28                  2.0       18555                0.962


# Classification Methods

We'll cover five classification methods

1) Equal Interval
* Data is split into bins of equal width regardless of distribution

2) Quantiles
* Data is split by percentiles

3) Natural Breaks
* Data is split using the Jenks algorithm

4) Standard Deviation
* Data is split into bins based on distance from the mean

5) Manual Breaks
* We define our own splits
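
The sketch below applies four of these methods to the same small, right-skewed set of hypothetical rates using `pandas.cut` and `pandas.qcut`, so the class counts can be compared side by side. The values and the manual break points are assumptions made for illustration. Natural Breaks is left out because the Jenks algorithm is not part of numpy or pandas and needs a separate implementation.

```python
import numpy as np
import pandas as pd

# Hypothetical right-skewed rates (illustrative values only)
rates = pd.Series([0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 4.0, 10.0])

# 1) Equal interval: four bins of equal width between the min and the max
equal_edges = np.linspace(rates.min(), rates.max(), 5)
equal = pd.cut(rates, bins=equal_edges, include_lowest=True)

# 2) Quantiles: four bins holding equal numbers of observations
quant = pd.qcut(rates, q=4)

# 4) Standard deviation: bins one standard deviation wide, measured from the mean
mean, std = rates.mean(), rates.std()
std_edges = mean + std * np.arange(-2, 4)
std_class = pd.cut(rates, bins=std_edges)

# 5) Manual breaks: edges chosen by the analyst
manual = pd.cut(rates, bins=[0, 1, 2, 5, 10])

# Count how many observations fall in each class, in class order
results = {}
for name, classes in [("equal", equal), ("quantile", quant),
                      ("std", std_class), ("manual", manual)]:
    results[name] = classes.value_counts().sort_index().tolist()
    print(name, results[name])
```

The same eight values produce very different class counts depending on the method, which is why the choice of classification changes how a choropleth map reads.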

# Equal Interval

* The simplest classification scheme is to just break the data into classes of equal width
    * e.g. The minimum is .3 and the maximum is 9.8, so the range is 9.5 and we can split that into four bins about 2.4 units wide
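
The arithmetic in the example above (minimum .3, maximum 9.8, four classes) can be checked with a short sketch. The variable names are chosen here for illustration and are not part of `ParseData`.

```python
import numpy as np

vmin, vmax, n_classes = 0.3, 9.8, 4

# Width of each class: the data range divided by the number of classes
width = (vmax - vmin) / n_classes

# Class edges: n_classes + 1 evenly spaced values from vmin to vmax
edges = np.linspace(vmin, vmax, n_classes + 1)

print(round(width, 3))
print(edges.round(3))
```

The exact width is 2.375, which rounds to the 2.4 quoted above.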


In [33]:

Data.ScaleData(scale=1e6)
Rate = 'Total'
n_classes=5
Data.Breaks(column=Rate+'_Rate',classes=n_classes,Manual_Bins=[0,1,2,5,10,35])

labels=Data.EB_bins#.sort_values()
print(labels)
colors = []
Grey = .85

colors=['#fef0d9','#fdcc8a','#fc8d59','#e34a33','#b30000']
Combined_Color = {key:value for key,value in zip(labels[1:],colors)}
# print(Data.Combined[Rate+'_Rate_EB'].unique())
print(Combined_Color)
fig,ax=plt.subplots(figsize=(7.5,7.5))
Combined_Patches = []
for i,klass in enumerate(labels[1:]):
    print(klass)
#     try:
    kwargs = {'facecolor':Combined_Color[klass],
             'edgecolor':'black',
             'linewidth':.5,
             'label':str(np.round(Data.EB_bins[i],1))+' - '+str(np.round(Data.EB_bins[i+1],1))}
    Data.Combined.loc[Data.Combined[Rate+'_Rate_EB']==klass].plot(
        ax=ax,
        **kwargs
             )
    Combined_Patches.append(mpatches.Patch(**kwargs)) 

ax.legend(handles=(Combined_Patches), loc='lower left',ncol=1,title='Killings per Million Residents per Year')

plt.tight_layout()
    
ax.get_xaxis().set_visible(False)
ax.get_yaxis().set_visible(False)
ax.set_title('Equal Interval: Police Killing Rates',loc='left')

plt.savefig('Content/EqualInterval_Map.png',bbox_inches='tight')

fig,ax = plt.subplots(1,1,figsize=(4,2.75),sharex=True)

Data.Combined[Rate+'_Rate'].hist(ax=ax,bins=15,color='#b01005',edgecolor='k',zorder=2)


for v in Data.EB_bins:
    ax.axvline(v, color='k', linestyle='dashed', linewidth=2,label='Mean')

ax.grid(axis='x')
ax.set_xlim(0,40)
ax.set_ylim(0,30)
ax.set_ylabel('Count')
ax.set_title('Province/State',loc='left')
plt.tight_layout()

plt.savefig('Content/EqualInterval_Hist.png')

print(Data.Combined[Rate+'_Rate_EB'].sort_values())
print(Data.Combined[Rate+'_Rate'].sort_values())
print(Data.EB_bins)

[ 0.    6.04 12.08 18.12 24.16 30.2 ]
{6.040000000000001: '#fef0d9', 12.080000000000002: '#fdcc8a', 18.120000000000005: '#fc8d59', 24.160000000000004: '#e34a33', 30.200000000000003: '#b30000'}



6.040000000000001
12.080000000000002
18.120000000000005
24.160000000000004
30.200000000000003





NL     6.04
LA     6.04
NY     6.04
MI     6.04
ID     6.04
      ...  
NT    12.08
AZ    12.08
MT    12.08
CO    12.08
NU    30.20
Name: Total_Rate_EB, Length: 64, dtype: category
Categories (5, float64): [6.04 < 12.08 < 18.12 < 24.16 < 30.20]
NB     0.722853
RI     0.845863
MA     1.010683
NL     1.039114
NY     1.098353
        ...    
OK     7.983837
NT     8.616033
NM    10.584819
AK    10.672389
NU    30.049207
Name: Total_Rate, Length: 64, dtype: float64
[ 0.    6.04 12.08 18.12 24.16 30.2 ]




In [52]:
# Data.CA['Total_Rate'+'_EB'] = pd.cut(Data.CA['Total_Rate'],
#                     bins=Data.EB_bins,#pd.interval_range(start=start,freq=freq,end=end,closed='neither'),
#                     labels=Data.EB_bins[1:],
#                     include_lowest=True,
#                     duplicates='drop'
#                                )
# Data.CA['Total_Rate'+'_EB']



prov
NL     6.04
PE     6.04
NS     6.04
NB     6.04
QC     6.04
ON     6.04
MB     6.04
SK     6.04
AB     6.04
BC     6.04
YT     6.04
NT    12.08
NU    30.20
Name: Total_Rate_EB, dtype: category
Categories (5, float64): [6.04 < 12.08 < 18.12 < 24.16 < 30.20]

In [None]:
# Data.Combined_jenks
# Data.Combined['White_Rate_NB']
# Data.EB_bins

# Quantiles

In [36]:

labels=Data.Combined[Rate+'_Rate_QB'].unique().sort_values()
# colors = []
Grey = .85

Combined_Color = {key:value for key,value in zip(labels,colors)}

fig,ax=plt.subplots(figsize=(7.5,7.5))
Combined_Patches = []

for i,klass in enumerate(Data.Combined[Rate+'_Rate_QB'].unique().sort_values()):
#     try:
    kwargs = {'facecolor':Combined_Color[klass],
             'edgecolor':'black',
             'linewidth':.5,
             'label':str(np.round(Data.Combined[Rate+'_Rate'].quantile(i/Data.classes),1))+' - '+str(np.round(Data.Combined[Rate+'_Rate'].quantile((i+1)/Data.classes),1))}
    Data.Combined.loc[Data.Combined[Rate+'_Rate_QB']==klass].plot(
        ax=ax,
        **kwargs
             )
    Combined_Patches.append(mpatches.Patch(**kwargs))

    
ax.legend(handles=(Combined_Patches), loc='lower left',ncol=1,title='Killings per Million Residents per Year')

ax.get_xaxis().set_visible(False)
ax.get_yaxis().set_visible(False)
ax.set_title('Quantiles: Police Killing Rates per Million Residents per Year')

plt.tight_layout()
plt.savefig('Content/Quantile_Map.png',bbox_inches='tight')

fig,ax = plt.subplots(1,1,figsize=(4,2.75),sharex=True)

Data.Combined[Rate+'_Rate'].hist(ax=ax,bins=15,color='#b01005',edgecolor='k')

for v in range(Data.classes+1):
    ax.axvline(Data.Combined[Rate+'_Rate'].quantile(v/Data.classes), color='k', linestyle='dashed', linewidth=2,label='Mean')
ax.grid(axis='x')

ax.set_xlim(0,40)
ax.set_ylim(0,30)

ax.set_ylabel('Count')

ax.set_title('Province/State',loc='left')

plt.tight_layout()

plt.savefig('Content/Quantiled_Hist.png')



# Natural Breaks

In [37]:

Rate = 'Total'
labels=Data.Combined[Rate+'_Rate_NB'].unique().sort_values()
# colors = []
Grey = .85
Combined_Color = {key:value for key,value in zip(labels,colors)}
fig,ax=plt.subplots(figsize=(7.5,7.5))
Combined_Patches = []
for i,klass in enumerate(Data.Combined[Rate+'_Rate_NB'].unique().sort_values()):
    try:
        kwargs = {'facecolor':Combined_Color[klass],
                 'edgecolor':'black',
                 'linewidth':.5,
                 'label':str(np.round(Data.Combined_jenks[i],1))+' - '+str(np.round(Data.Combined_jenks[i+1],1))}
        Data.Combined.loc[Data.Combined[Rate+'_Rate_NB']==klass].plot(
            ax=ax,
            **kwargs
                 )
        Combined_Patches.append(mpatches.Patch(**kwargs))
    except:
        pass
    

ax.legend(handles=(Combined_Patches), loc='lower left',title='Killings per Million Residents per Year')

plt.tight_layout()
    
ax.get_xaxis().set_visible(False)
ax.get_yaxis().set_visible(False)
ax.set_title('Natural Breaks: Police Killing Rates per Million Residents per Year')

plt.savefig('Content/NaturalBreaks_Map.png',bbox_inches='tight')

fig,ax = plt.subplots(1,1,figsize=(4,2.75),sharex=True)

Data.Combined[Rate+'_Rate'].hist(ax=ax,bins=15,color='#b01005',edgecolor='k')
for v in Data.Combined_jenks:
    ax.axvline(v, color='k', linestyle='dashed', linewidth=2,label='Class Break')

ax.grid(axis='x')
ax.set_xlim(0,40)
ax.set_ylim(0,30)
ax.set_ylabel('Count')


ax.set_title('Province/State',loc='left')
plt.tight_layout()

plt.savefig('Content/NaturalBreaks_Hist.png')




# Manual Breaks

In [38]:

labels=Data.Combined[Rate+'_Rate_MB'].unique().sort_values()

Combined_Color = {key:value for key,value in zip(labels,colors)}

fig,ax=plt.subplots(figsize=(7.5,7.5))
Combined_Patches = []
labels=Data.Combined[Rate+'_Rate_MB'].unique().sort_values()

for i,klass in enumerate(Data.Combined[Rate+'_Rate_MB'].unique().sort_values()):
#     try:
    kwargs = {'facecolor':Combined_Color[klass],
             'edgecolor':'black',
             'linewidth':.5,
             'label':str(np.round(Data.Manual_Bins[i],1))+' - '+str(np.round(Data.Manual_Bins[i+1],1))}
#     print(Data.Combined.loc[Data.Combined[Rate+'_Rate_MB']==klass].index)
    Data.Combined.loc[Data.Combined[Rate+'_Rate_MB']==klass].plot(
        ax=ax,
        **kwargs
             )
    Combined_Patches.append(mpatches.Patch(**kwargs))
#     except:
#         pass
    
#     print(Combined_Patches)
ax.legend(handles=(Combined_Patches), loc='lower left',title='Killings per Million Residents per Year')

plt.tight_layout()
    
ax.get_xaxis().set_visible(False)
ax.get_yaxis().set_visible(False)
ax.set_title('Manual Breaks: Police Killing Rates per Million Residents per Year')

plt.savefig('Content/ManualBreaks_Map.png',bbox_inches='tight')

fig,ax = plt.subplots(1,1,figsize=(4,2.75),sharex=True)

Data.Combined[Rate+'_Rate'].hist(ax=ax,bins=15,color='#b01005',edgecolor='k')

for v in Data.Manual_Bins:
    ax.axvline(v, color='k', linestyle='dashed', linewidth=2,label='Mean')

ax.grid(axis='x')
ax.set_xlim(0,40)
ax.set_ylim(0,30)
ax.set_ylabel('Count')


ax.set_title('States/Provinces',loc='left')
plt.tight_layout()

plt.savefig('Content/ManualBreaks_Hist.png')

print(Data.Combined[[Rate+'_Rate',Rate+'_Rate_MB']].sort_values(by=Rate+'_Rate_MB'))
print(Data.Combined[Rate+'_Rate_MB'].unique().sort_values())


    Total_Rate  Total_Rate_MB
RI    0.845863  (-0.001, 1.0]
NB    0.722853  (-0.001, 1.0]
NH    1.977969     (1.0, 2.0]
NJ    1.425566     (1.0, 2.0]
CT    1.201582     (1.0, 2.0]
..         ...            ...
CO    6.971893    (5.0, 10.0]
AZ    6.908685    (5.0, 10.0]
NU   30.049207   (10.0, 35.0]
NM   10.584819   (10.0, 35.0]
AK   10.672389   (10.0, 35.0]

[64 rows x 2 columns]
[(-0.001, 1.0], (1.0, 2.0], (2.0, 5.0], (5.0, 10.0], (10.0, 35.0]]
Categories (5, interval[float64]): [(-0.001, 1.0] < (1.0, 2.0] < (2.0, 5.0] < (5.0, 10.0] < (10.0, 35.0]]


# Standard Deviation

In [47]:
import pandas as pd
Rate = 'Total'
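# Class labels: one-standard-deviation-wide intervals from (-1, 0] up to (9, 10]
labels=pd.cut(np.arange(-1,10.1),np.arange(-1,10.1))[1:]

colors_hex = []
colors_rgb = []
n_classes_STD=labels.shape[0]

# Colors are listed from the highest class down; the list is reversed when paired with the labels
Colors = ['#b30000','#b30000','#b30000','#b30000','#b30000','#b30000','#b30000',
          '#e34a33','#fdcc8a','#fc8d59','#67a9cf']
Combined_Color = {key:value for key,value in zip(labels,Colors[::-1])}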
# print(Combined_Color)
Combined_Patches=[]
fig,ax=plt.subplots(figsize=(7.5,7.5))
# for klass in labels:
#     print(klass)
#     kwargs = {'facecolor':Combined_Color[klass],
#              'edgecolor':'black',
#              'linewidth':.5,
#              'label':klass}
#     Combined_Patches.append(mpatches.Patch(**kwargs))

for i,klass in enumerate(Data.Combined[Rate+'_Rate_STD'].unique().sort_values()):
    try:
        facecolor = Combined_Color[klass]
    except KeyError:
        # No color is defined for this class (it falls outside `labels`), so skip it
        continue
    kwargs = {'facecolor':facecolor,
             'edgecolor':'black',
             'linewidth':.5,
             'label':klass}
    Data.Combined.loc[Data.Combined[Rate+'_Rate_STD']==klass].plot(
        ax=ax,
        **kwargs
             )
    Combined_Patches.append(mpatches.Patch(**kwargs))
ax.legend(handles=(Combined_Patches), loc='lower left',title='Standard Deviations from the Mean')
plt.tight_layout()
ax.get_xaxis().set_visible(False)
ax.get_yaxis().set_visible(False)
ax.set_title('Standard Deviation Breaks: Police Killing Rates per Million Residents per Year')

plt.savefig('Content/STDBreaks_Map.png',bbox_inches='tight')

fig,ax = plt.subplots(1,1,figsize=(4,2.75),sharex=True)

Data.Combined[Rate+'_Rate'].hist(ax=ax,bins=15,color='#b01005',edgecolor='k')

rate_mean = Data.Combined[Rate+'_Rate'].mean()
rate_std = Data.Combined[Rate+'_Rate'].std()
for v in range(-4,5):
    # Label only one of the lines so the legend shows a single 'Standard Deviation' entry
    ax.axvline(rate_mean+rate_std*v, color='b', linestyle='dashed', linewidth=2,
               label='Standard Deviation' if v == 4 else '_nolegend_')

ax.axvline(rate_mean, color='k', linestyle='dashed', linewidth=2,label='Mean')
ax.legend()
ax.grid(axis='x')
ax.set_xlim(0,40)
ax.set_ylim(0,30)
ax.set_ylabel('Count')
ax.set_title('States/Provinces',loc='left')
plt.tight_layout()

plt.savefig('Content/STDBreaks_Hist.png')

print(Data.Combined[Rate+'_Rate_STD'].sort_values())


NL    (-1.0, 0.0]
OR    (-1.0, 0.0]
VA    (-1.0, 0.0]
NY    (-1.0, 0.0]
MI    (-1.0, 0.0]
         ...     
TN     (0.0, 1.0]
NM     (1.0, 2.0]
AK     (1.0, 2.0]
NT     (1.0, 2.0]
NU     (6.0, 7.0]
Name: Total_Rate_STD, Length: 64, dtype: category
Categories (16, interval[float64]): [(-8.0, -7.0] < (-7.0, -6.0] < (-6.0, -5.0] < (-5.0, -4.0] ... (4.0, 5.0] < (5.0, 6.0] < (6.0, 7.0] < (7.0, 8.0]]


In [49]:
print(Data.Combined['Total_Rate'].mean())

4.06181957316677


# Ratio to Ordinal
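
Converting ratio data to ordinal data means replacing each number with a ranked class label. The order of the classes is kept, but the size of the difference between observations is lost. Below is a minimal sketch with `pd.cut`. The rates and the two break values are placeholders, not the real Canadian and US averages, and the sketch does not claim to reproduce `Data.Breaks`.

```python
import pandas as pd

# Placeholder rates and break points, for illustration only
rates = pd.Series({'A': 0.5, 'B': 2.0, 'C': 6.0})
ca_rate, us_rate = 1.0, 3.0

# Passing labels turns each interval into a named, ordered category
ordinal = pd.cut(rates,
                 bins=[0, ca_rate, us_rate, 15],
                 labels=['Less than Canadian Average',
                         'Less than US Average',
                         'Greater than US Average'],
                 include_lowest=True)
print(ordinal)

# The classes can still be ranked, even though the original magnitudes are gone
print(ordinal < 'Greater than US Average')
```

Descriptive labels like these tell the reader what each class means, whereas "Low", "Medium" and "High" leave the break points unexplained.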

In [100]:
# Classify the rates into three ordinal classes, using the Canadian and US rates as breaks
# CA_Rate and US_Rate are assumed to be defined in an earlier cell
Rate = 'Total'
Data.ScaleData(scale=1e6)
Data.Breaks(column=Rate+'_Rate',classes=3,
            Manual_Bins=[0,CA_Rate,
                         US_Rate,15],
            labels=['Low',
                    'Medium',
                    'High'])


colors=['#fef0d9','#fc8d59','#b30000']

fig,ax=plt.subplots(figsize=(7.5,7.5))
Combined_Patches = []

# Pair each class label with a color, from lightest (lowest class) to darkest (highest class)
labels=Data.Combined[Rate+'_Rate_MB'].unique().sort_values()
Combined_Color = {key:value for key,value in zip(labels,colors)}
for i,klass in enumerate(Data.Combined[Rate+'_Rate_MB'].unique().sort_values()):
#     try:
    print(i,klass)
    kwargs = {'facecolor':Combined_Color[klass],
             'edgecolor':'black',
             'linewidth':.5,
             'label':klass}#str(np.round(Data.Manual_Breaks[i],1))+' - '+str(np.round(Data.Manual_Breaks[i+1],1))}
    Data.Combined.loc[Data.Combined[Rate+'_Rate_MB']==klass].plot(
        ax=ax,
        **kwargs
             )
    Combined_Patches.append(mpatches.Patch(**kwargs))
    

ax.legend(handles=(Combined_Patches), loc='lower left',title='Police Killing Rates')

plt.tight_layout()
    
ax.get_xaxis().set_visible(False)
ax.get_yaxis().set_visible(False)
ax.set_title('Ordinal Data: Nondescript Labels',loc='left')

plt.savefig('Content/Ordinal_Map_Bad_Labels.png',bbox_inches='tight')


Data.ScaleData(scale=1e6)
Data.Breaks(column=Rate+'_Rate',classes=3,
            Manual_Bins=[0,CA_Rate,
                         US_Rate,15],
            labels=['Less than Canadian Average',
                    'Less than US Average',
                    'Greater than US Average'])


Rate = 'Total'

fig,ax=plt.subplots(figsize=(7.5,7.5))
Combined_Patches = []

# Reuse the same three colors for the relabeled classes
labels=Data.Combined[Rate+'_Rate_MB'].unique().sort_values()
Combined_Color = {key:value for key,value in zip(labels,colors)}
for i,klass in enumerate(Data.Combined[Rate+'_Rate_MB'].unique().sort_values()):
    kwargs = {'facecolor':Combined_Color[klass],
             'edgecolor':'black',
             'linewidth':.5,
             'label':klass}
    Data.Combined.loc[Data.Combined[Rate+'_Rate_MB']==klass].plot(
        ax=ax,
        **kwargs
             )
    Combined_Patches.append(mpatches.Patch(**kwargs))

ax.legend(handles=(Combined_Patches), loc='lower left',title='Police Killing Rates')

plt.tight_layout()
    
ax.get_xaxis().set_visible(False)
ax.get_yaxis().set_visible(False)
ax.set_title('Ordinal Data: Descriptive Labels',loc='left')

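plt.savefig('Content/Ordinal_Map.png',bbox_inches='tight')

fig,ax = plt.subplots(1,1,figsize=(4,2.75),sharex=True)

# `Grey` sets the bar transparency (alpha); it is not set in this cell and must already be defined
Data.Combined[Rate+'_Rate'].hist(ax=ax,bins=15,color=[0,0,1,Grey],edgecolor='k')

# Draw a dashed line at each class break
for v in Data.Manual_Bins:
    ax.axvline(v, color='k', linestyle='dashed', linewidth=2,label='Class Break')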

ax.grid(axis='x')
ax.set_xlim(0,11)
ax.set_ylim(0,14)
ax.set_ylabel('Count')


ax.set_title('States/Provinces',loc='left')
plt.tight_layout()

plt.savefig('Content/Ordinal_Hist.png')


['Indigenous_Killings', 'Indigenous_Rate', 'Indigenous', 'White_Killings', 'White_Rate', 'White', 'Unknown_Killings', 'Unknown_Rate', 'Asian_Killings', 'Asian_Rate', 'Asian', 'Black_Killings', 'Black_Rate', 'Black', 'Middle Eastern_Killings', 'Middle Eastern_Rate', 'Middle Eastern', 'South Asian_Killings', 'South Asian_Rate', 'South Asian', 'Visible minority, n.i.e_Killings', 'Visible minority, n.i.e_Rate', 'Visible minority, n.i.e', 'Latin American_Killings', 'Latin American_Rate', 'Latin American', 'Total_Killings', 'Total_Rate', 'Total', 'geometry', 'Country'] ['Indigenous_Killings', 'Indigenous_Rate', 'Indigenous', 'White_Killings', 'White_Rate', 'White', 'Unknown_Killings', 'Unknown_Rate', 'Asian_Killings', 'Asian_Rate', 'Asian', 'Black_Killings', 'Black_Rate', 'Black', 'Hispanic_Killings', 'Hispanic_Rate', 'Hispanic', 'Pacific Islander_Killings', 'Pacific Islander_Rate', 'Pacific Islander', 'Total_Killings', 'Total_Rate', 'Total', 'geometry', 'Country']



0 Low
1 Medium
2 High
['Indigenous_Killings', 'Indigenous_Rate', 'Indigenous', 'White_Killings', 'White_Rate', 'White', 'Unknown_Killings', 'Unknown_Rate', 'Asian_Killings', 'Asian_Rate', 'Asian', 'Black_Killings', 'Black_Rate', 'Black', 'Middle Eastern_Killings', 'Middle Eastern_Rate', 'Middle Eastern', 'South Asian_Killings', 'South Asian_Rate', 'South Asian', 'Visible minority, n.i.e_Killings', 'Visible minority, n.i.e_Rate', 'Visible minority, n.i.e', 'Latin American_Killings', 'Latin American_Rate', 'Latin American', 'Total_Killings', 'Total_Rate', 'Total', 'geometry', 'Country'] ['Indigenous_Killings', 'Indigenous_Rate', 'Indigenous', 'White_Killings', 'White_Rate', 'White', 'Unknown_Killings', 'Unknown_Rate', 'Asian_Killings', 'Asian_Rate', 'Asian', 'Black_Killings', 'Black_Rate', 'Black', 'Hispanic_Killings', 'Hispanic_Rate', 'Hispanic', 'Pacific Islander_Killings', 'Pacific Islander_Rate', 'Pacific Islander', 'Total_Killings', 'Total_Rate', 'Total', 'geometry', 'Country']



In [None]:
# print(Data.Combined.columns)#.sort_values())
# print(Data.CA.append(Data.US
# Categories = [v for v in Data.CA_PoliceKillings.race.unique()]
# for c in Data.US_PoliceKillings.race.unique():
#     if c not in Data.CA_PoliceKillings.race.unique():
#         Categories.append(c)
print(Data.US.columns)
print(Data.US_PoliceKillings.race.unique())

# Categorical
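
Categorical (nominal) classes have no order at all: each region is simply assigned one category. One common way to pick that category is to count incidents per region and group, then take the group with the largest count. The sketch below shows that step on a few made-up records using `idxmax`. The column names are hypothetical and the cell that follows uses its own approach.

```python
import pandas as pd

# Made-up incident records, for illustration only
records = pd.DataFrame({'region': ['X', 'X', 'X', 'Y', 'Y'],
                        'group':  ['a', 'a', 'b', 'b', 'b']})

# One row per region, one column per group, cells hold incident counts
counts = records.groupby(['region', 'group']).size().unstack(fill_value=0)

# For each region, the column name with the largest count
top = counts.idxmax(axis=1)
print(top)
```

Note that the most numerous group and the group with the highest rate are different questions. The first depends heavily on population size, while the second requires normalizing by each group's population.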

In [102]:
# Canada: count victims by province and race, then record the most numerous race in each province
CA_Counts = Data.CA_PoliceKillings.groupby(['prov','race']).count()['age'].unstack()
Max1 = CA_Counts.max(axis=1)
Max2 = CA_Counts.T
Data.CA['Top']=''
for v,i in zip(Max1,Max1.index):
    Data.CA.loc[Data.CA.index == i,'Top']=Max2.loc[Max2[i]==v].index.values[0]
# print(Data.CA['Top'])


# United States: the same steps, by state
US_Counts = Data.US_PoliceKillings.groupby(['State','RACE']).count()['AGE'].unstack()
Max1 = US_Counts.max(axis=1)
Max2 = US_Counts.T
Data.US['Top']=''
for v,i in zip(Max1,Max1.index):
    Data.US.loc[Data.US.index == i,'Top']=Max2.loc[Max2[i]==v].index.values[0]
# print(Data.US['Top'])


# Groups considered when finding the highest rate in each state/province
R = ['White','Black','Indigenous','Latin American','Hispanic','Pacific Islander']
RateCols = [r+'_Rate' for r in R]
Tempp = Data.Combined.copy()
for r in R:
    # Ignore rates based on both a very small population share (<= 2.5%) and very few killings (<= 2)
    Tempp.loc[((Tempp[r]/Tempp['Total']<=.025)&(Tempp[r+'_Killings']<=2)),r+'_Rate']=np.nan

Max = Tempp[RateCols].max(axis=1)
Temp = Tempp[RateCols]
Data.Combined['Top']=''
for index,row in Temp.iterrows():
    # The column holding the row maximum names the group, e.g. 'Black_Rate' -> 'Black'
    Data.Combined.loc[Data.Combined.index==index,'Top']=row.loc[row==Max[index]].index.values[0].split('_')[0]
# print(Max
# print(Data.Combined['Top'])
# print(Data.Combined[['White_Rate','Black_Rate','Indigenous_Rate']])
print(Data.Combined[['Pacific Islander','Pacific Islander_Rate','Black_Rate','Pacific Islander_Killings','Top']].sort_values(by='Pacific Islander').dropna())

    Pacific Islander  Pacific Islander_Rate  Black_Rate  \
ID            2763.0              44.539511   10.191525   
OK            3859.0              31.889782   23.955469   
MI            3907.0              31.497996    4.290542   
PA            5008.0              24.573217    5.513846   
MO            7385.0              33.327737   14.152625   
AK            7958.0              15.464020   19.536083   
NC           10218.0              12.043714    4.706489   
UT           29362.0               8.382445   27.438722   
WA           53924.0              11.410751   13.073469   
HI          144971.0              20.373068    4.344666   
CA          155739.0               7.901853   11.046193   

    Pacific Islander_Killings               Top  
ID                        1.0        Indigenous  
OK                        1.0             Black  
MI                        1.0             Black  
PA                        1.0             Black  
MO                        2.0            

In [104]:
print(Data.CA['Top'].unique())
print(Data.US['Top'].unique())
raceColor={'White':'#FB3640',
           'Black':'#3899C9',
           'Indigenous':'#E8800B',
           'Hispanic':'#FFF07C',
           'Pacific Islander':'#89FFA7',
           'Unknown':'#c2c0c0'}

fig,ax=plt.subplots(figsize=(7.5,7.5))
CA_Patches = []#[matplotlib.text.Annotation('Canada',(0,0))]

# CA_Patches.append(mpatches.Patch(**{'facecolor':'None',
#                  'edgecolor':'None',
#                  'linewidth':.5,'label':'Canada\n2000-2020'}))
for klass in raceColor.keys():
    # Only plot (and add to the legend) the classes that actually appear on the map
    if Data.Combined.loc[Data.Combined['Top']==klass].count().Total_Killings>0:
        kwargs = {'facecolor':raceColor[klass],
                 'edgecolor':'black',
                 'linewidth':.5,
                 'label':klass}
        Data.Combined.loc[Data.Combined['Top']==klass].plot(
            ax=ax,
            **kwargs
                 )
        CA_Patches.append(mpatches.Patch(**kwargs))


ax.legend(handles=(CA_Patches), loc='lower left',ncol=2)

plt.tight_layout()
    
ax.get_xaxis().set_visible(False)
ax.get_yaxis().set_visible(False)
ax.set_title('Categorical: Race Most Likely to be Killed by Police',loc='left')

plt.savefig('Content/HighestRaterace_Map.png',bbox_inches='tight')

# raceColor={'White':'#FB3640',
#            'Black':'#3899C9',
#            'Indigenous':'#E8800B',
#            'Hispanic':'#FFF07C',
#            'Pacific Islander':'#89FFA7'}

fig,ax=plt.subplots(figsize=(7.5,7.5))
CA_Patches = []#[matplotlib.text.Annotation('Canada',(0,0))]

# CA_Patches.append(mpatches.Patch(**{'facecolor':'None',
#                  'edgecolor':'None',
#                  'linewidth':.5,'label':'Canada\n2000-2020'}))
for klass in raceColor.keys():
    kwargs = {'facecolor':raceColor[klass],
             'edgecolor':'black',
             'linewidth':.5,
             'label':klass}
    # Plot the Canadian provinces and US states in this class, if there are any
    if Data.CA.loc[Data.CA['Top']==klass].count().Total_Killings>0:
        Data.CA.loc[Data.CA['Top']==klass].plot(
            ax=ax,
            **kwargs
                 )
    if Data.US.loc[Data.US['Top']==klass].count()['State']>0:
        Data.US.loc[Data.US['Top']==klass].plot(
            ax=ax,
            **kwargs
                 )
    CA_Patches.append(mpatches.Patch(**kwargs))


ax.legend(handles=(CA_Patches), loc='lower left',ncol=2)

plt.tight_layout()
    
ax.get_xaxis().set_visible(False)
ax.get_yaxis().set_visible(False)
ax.set_title('Categorical: Race of Majority of Police Killing Victims',loc='left')

plt.savefig('Content/MostNumerousrace_Map.png',bbox_inches='tight')


['White' 'Unknown' 'Indigenous']
['Black' 'White' 'Hispanic' 'Pacific Islander']

