# Geospatial Analysis & Visualization with Python

## Let's Get Started!

### The first step with any Python project is to import our packages

* We'll be working with three packages today: Pandas, GeoPandas, and Matplotlib
    * Pandas is a general-purpose package for working with tabular datasets
    * GeoPandas is a geospatial extension for Pandas
        * It allows us to work with vector data (points, lines, and polygons)
    * Matplotlib is a powerful plotting library that can be used to make visualizations

In [1]:
import pandas as pd
import geopandas as gpd
import matplotlib.pyplot as plt
%matplotlib notebook

### Now we can load the data

* We'll load the tabular data as a "DataFrame" using Pandas
    * Then we'll convert it into a "GeoDataFrame" using GeoPandas
        * To do this, we must assign the "geometry". In this case it's point data, and the coordinates are in latitude/longitude
            * The original dataset only had addresses, not coordinates, so we used a web service (Mapbox) to geocode them
        * Then we need to assign a Coordinate Reference System (CRS) manually
            * EPSG codes are a standardized way to identify CRSs
            * 'EPSG:4326' refers to the WGS 1984 datum, which our latitude/longitude data is based on
                * This CRS is widely used by web-based platforms such as Google Maps and Mapbox
* Once we have the data loaded, calling .head() will give us a "preview" of our dataset
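If you're curious what an EPSG code actually stands for, pyproj can look it up for you. pyproj is the projection library that GeoPandas relies on. A minimal sketch:

```python
from pyproj import CRS

# Look up the CRS behind the EPSG code used for our latitude/longitude data
wgs84 = CRS.from_epsg(4326)

print(wgs84.name)           # human-readable name of the CRS
print(wgs84.is_geographic)  # True: coordinates are angles in degrees, not metres
print(wgs84.to_epsg())      # converts back to the integer EPSG code
```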

In [2]:
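# Import the Police Killings file, parsing the 'date' column as datetimes
police_Killings_Tabular = pd.read_csv('Data/PoliceKillings.csv', parse_dates=['date'])

# Convert the pandas DataFrame into a GeoPandas "GeoDataFrame".
# points_from_xy takes x (longitude) first, then y (latitude).
police_Killings = gpd.GeoDataFrame(
    police_Killings_Tabular,
    geometry=gpd.points_from_xy(police_Killings_Tabular.longitude, police_Killings_Tabular.latitude),
)

# Assign the CRS: the coordinates are WGS 1984 latitude/longitude (EPSG:4326).
# The older {'init': 'epsg:4326'} dictionary syntax is deprecated in recent pyproj versions.
police_Killings = police_Killings.set_crs('EPSG:4326')

# Let's take a quick look at the first five rows
police_Killings.head()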

Unnamed: 0,date,id_incident,date.1,day_week,prov,city_town,postal_code,location_type,id_victim,Name,...,taser_deployed,injured_officer,excited_delirium,mentral_distress_disorder,substance_abuse,summary,latitude,longitude,geocoding_Notes,geometry
0,2012-01-06,1,2012-01-06,Fri,QC,Montreal,H3B 4W5,Urban,0001-V1,*****,...,No,Yes,No,Yes,Yes,Farshad Mohammadi was carrying a knife when he...,45.498173,-73.567157,,POINT (-73.56716 45.49817)
1,2012-01-11,2,2012-01-11,Wed,AB,Onoway,T0E 1V0,Rural,0002-V1,*****,...,No,No,No,Unknown,Yes,RCMP called to an apartment complex in respons...,53.68876,-114.19944,,POINT (-114.19944 53.68876)
2,2012-01-12,3,2012-01-12,Thu,ON,Oakville,L6H 0G6,Urban,0003-V1,*****,...,No,No,No,Unknown,Unknown,Kyle Newman intentionally and repeatedly struc...,43.477098,-79.702193,,POINT (-79.70219 43.47710)
3,2012-02-03,4,2012-02-03,Fri,ON,Toronto,M4C 1X5,Urban,0004-V1,*****,...,No,No,No,Yes,No,"Michael Eligon, who had been involuntarily com...",43.68756,-79.321,,POINT (-79.32100 43.68756)
4,2012-02-13,5,2012-02-13,Mon,ON,Hamilton,L8K 5J4,Urban,0005-V1,*****,...,No,No,No,Yes,Unknown,"Police had a stolen van under surveillance, an...",43.21957,-79.79493,,POINT (-79.79493 43.21957)


# Before we dig into the data, let's make a quick map.

* We have a shapefile (denoted by the .shp extension) of the provinces in Canada.
    * .shp is a standard format for vector data, and GeoPandas can read it as-is


## Layers need to be in the same coordinate system to match up properly on a map!

* We can re-project the police_Killings layer with the .to_crs() method, setting its CRS to match the provinces layer
    * The provinces layer uses the Canada Lambert Conformal Conic projection (LCC). This is the standard projection used by Statistics Canada and is well suited to displaying the whole country.
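Here's a small sketch of what .to_crs() does to a single point. It uses one incident location from the preview above. It also assumes EPSG:3347 (NAD83 / Statistics Canada Lambert) as a stand-in for the provinces layer's CRS. In the notebook itself you'd pass Provinces.crs rather than a hard-coded code.

```python
import geopandas as gpd
from shapely.geometry import Point

# One incident location from the preview above (Montreal), in WGS 1984 lat/long
pts = gpd.GeoDataFrame(
    {'city_town': ['Montreal']},
    geometry=[Point(-73.567157, 45.498173)],  # Point(x=longitude, y=latitude)
    crs='EPSG:4326',
)

# ASSUMPTION: EPSG:3347 stands in for Provinces.crs in this standalone sketch
pts_lcc = pts.to_crs('EPSG:3347')
print(pts_lcc.crs.name)
print(pts_lcc.geometry.iloc[0])  # coordinates are now in metres, not degrees

# Re-projecting back recovers the original coordinates (within rounding)
back = pts_lcc.to_crs('EPSG:4326').geometry.iloc[0]
print(round(back.x, 6), round(back.y, 6))
```

The numbers change from degrees to metres, but the location itself doesn't move. That's why re-projecting back gives the original latitude/longitude.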
        
### Once both datasets are in the same coordinate system, we can make a map!
* First we must define a plot using the matplotlib.pyplot package, which we imported earlier as "plt"
    * We use plt.subplots() to create a figure and axes, and we can define how big we want the figure to be
* GeoPandas can then use the .plot() method to draw a map with Matplotlib
    * We simply tell it which axes to draw on with ax=axes
    * Then we set a few other parameters:
        * We just want the provinces as a grey background, so we set the color
        * We want to classify killings by race, so we set race as the column. Then we add a legend to aid interpretation of the data
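The layering pattern above uses one shared set of axes. A grey background goes down first, then points coloured by a column. It can be sketched with two tiny hypothetical layers, so it runs without the course data:

```python
import geopandas as gpd
import matplotlib.pyplot as plt
from shapely.geometry import Point, box

# Hypothetical toy layers standing in for Provinces and police_Killings
background = gpd.GeoDataFrame(geometry=[box(0, 0, 10, 10)])
points = gpd.GeoDataFrame(
    {'race': ['A', 'B', 'A']},
    geometry=[Point(2, 3), Point(5, 5), Point(8, 7)],
)

fig, ax = plt.subplots(figsize=(6, 6))

# Layer 1: a plain grey background drawn on the shared axes
background.plot(ax=ax, color='grey')

# Layer 2: points coloured by a categorical column, with a legend
points.plot(ax=ax, column='race', edgecolor='k', markersize=10, legend=True)
```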

In [90]:
# We can import shapefiles directly using GeoPandas
Provinces = gpd.read_file('Data/lpr_000b16a_e.shp')

# Use .to_crs() to re-project the police killings layer into the provinces layer's CRS
police_Killings = police_Killings.to_crs(Provinces.crs)

# Create a figure and axes using Matplotlib (plt), defining the figure size in inches
fig, axes = plt.subplots(figsize=(6, 6))

# Add the provinces with .plot(): draw them on our axes in a plain grey
Provinces.plot(
    ax=axes,
    color='grey'
)

# Then add police_Killings on the same axes. Setting column='race' colours the
# points by race; the other parameters style the markers and add a legend
police_Killings.plot(
    ax=axes,
    column='race',
    edgecolor='k',
    markersize=10,
    legend=True,
    legend_kwds={'loc': 'upper right', 'fontsize': 10}
)

Unnamed: 0,PRUID,PRNAME,PRENAME,PRFNAME,PREABBR,PRFABBR,AREA_LCC,AREA_AEA,Area_Merc,geometry
0,10,Newfoundland and Labrador,Newfoundland and Labrador,Terre-Neuve-et-Labrador,N.L.,T.-N.-L.,397598.0,406998.0,1124050.0,"MULTIPOLYGON (((8307365.589 2582136.711, 83083..."
1,11,Prince Edward Island,Prince Edward Island,Île-du-Prince-Édouard,P.E.I.,Î.-P.-É.,6023.0,5893.29,12384.0,"MULTIPOLYGON (((8435711.754 1679935.966, 84358..."
2,12,Nova Scotia,Nova Scotia,Nouvelle-Écosse,N.S.,N.-É.,57534.5,55643.3,111891.0,"MULTIPOLYGON (((8470851.646 1624745.011, 84710..."
3,13,New Brunswic,New Brunswick,Nouveau-Brunswick,N.B.,N.-B.,74525.4,73050.6,154848.0,"MULTIPOLYGON (((8176224.537 1722553.460, 81762..."
4,24,Quebec,Quebec,Québec,Que.,Qc,1476350.0,1509750.0,4309780.0,"MULTIPOLYGON (((8399709.494 2261445.703, 84005..."


## And now you've made your first map with python!

* But notice, it doesn't look great. This is just the quick-and-dirty way to look at data

* To make things more presentable, we'll have to be more explicit in our definitions. But that's a task for later.

* For now, let's move on and look at the dataset in more detail.

# Let's explore this dataset a bit further

## This dataset has information on many aspects of each incident. The aspects we'll investigate today include age, gender, race, armed_type, and mentral_distress_disorder.
* Pandas & Geopandas have some nice features to quickly summarize our dataset.
* We can use .count() to get the number of non-missing values in each column, which tells us how many incidents have data for it.
    * Calling .count() on its own gives a list of all the columns and a count for each. Most columns are "full" (462 values), but 'latitude' and 'longitude' have only 458, and the 'geocoding_Notes' column has 4 entries: those 4 addresses couldn't be matched to coordinates. This suggests there was an error in the data entry process. We don't need to worry about this, though.
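To see how .count() relates to missing data, here's a sketch on three hypothetical rows. Comparing it with len() and .isna().sum() shows which columns have gaps:

```python
import numpy as np
import pandas as pd

# Hypothetical rows: one age and one latitude are missing (NaN)
df = pd.DataFrame({
    'age': [25, np.nan, 40],
    'latitude': [45.5, 53.7, np.nan],
    'geocoding_Notes': [None, None, 'address not found'],
})

print(len(df))          # total number of rows (incidents)
print(df.count())       # non-missing values per column
print(df.isna().sum())  # missing values per column (the complement of .count())
```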

In [4]:
police_Killings.count()

date                             462
id_incident                      462
date.1                           462
day_week                         462
prov                             462
city_town                        462
postal_code                      462
location_type                    462
id_victim                        462
Name                             462
age                              455
gender                           462
race                             462
ethnic_ancestry                  107
immigrant_refugee_naturalized    462
armed_type                       462
cause_death                      462
taser_deployed                   462
injured_officer                  462
excited_delirium                 462
mentral_distress_disorder        462
substance_abuse                  462
summary                          462
latitude                         458
longitude                        458
geocoding_Notes                    4
geometry                         462

* We can select the ['age'] column and then call .mean(), .std(), .min(), and .max() on it to get some vital statistics on the age of victims.
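Selecting the column first keeps the statistics unambiguous. Series.describe() also bundles most of them into one call. A quick sketch on a handful of hypothetical ages:

```python
import pandas as pd

# Hypothetical ages standing in for police_Killings['age']
age = pd.Series([15, 30, 36, 41, 77], name='age')

print('Youngest:', age.min())  # smallest value
print('Oldest:  ', age.max())  # largest value
print(age.describe())          # count, mean, std, min, quartiles and max in one call
```

Note that .min() gives the youngest victim and .max() the oldest.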

In [21]:
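# Select the 'age' column first, then compute summary statistics on it.
# (Calling .mean() on the whole GeoDataFrame fails in recent pandas versions
# because most columns aren't numeric.)
age = police_Killings['age']

print('Age Distribution of Victims')
print()
print('Mean:                ', age.mean())
print()
print('Standard Deviation:  ', age.std())
print()
# The youngest victim has the minimum age, the oldest the maximum
print('Youngest:            ', age.min())
print()
print('Oldest:              ', age.max())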

Age Distribution of Victims

Mean:                 36.73186813186813

Standard Deviation:   11.775739256991544

Youngest:             15.0

Oldest:               77.0


# Let's say we want to ask a more detailed question: what's the distribution of police killings by race?

* We can use .groupby() to aggregate data by one or more categories. After the groupby, we specify how to aggregate: whether we want a count, a mean, etc.

* The .sort_values() method is handy because it makes lists easier to interpret.
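Here's a minimal sketch of the groupby, count, sort pattern on a few hypothetical incidents:

```python
import pandas as pd

# Hypothetical incidents with a categorical 'race' column
df = pd.DataFrame({
    'race': ['Black', 'Caucasian', 'Indigenous', 'Caucasian', 'Indigenous', 'Caucasian'],
    'date': pd.to_datetime(['2012-01-06', '2012-01-11', '2012-01-12',
                            '2012-02-03', '2012-02-13', '2012-03-01']),
})

# Count non-missing dates per group, then sort from smallest to largest
by_race = df.groupby('race').count()['date'].sort_values()
print(by_race)

# .size() counts rows per group directly, regardless of missing values
sizes = df.groupby('race').size().sort_values()
print(sizes)
```

.count() counts non-missing values in each column, so picking a column that's always filled (like 'date') gives the number of incidents. .size() counts rows outright.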

In [63]:
Race_Breakdown = police_Killings.groupby(['race']).count()['date'].sort_values()

Race_Breakdown

race
Latin American      3
Arab                5
Other               5
South Asian        10
Asian              15
Black              43
Indigenous         70
Unknown            99
Caucasian         212
Name: date, dtype: int64

# The racial demographics of Canada aren't evenly split, however!


* ## So this information is misleading as it is.


* ## We need to normalize our data by population statistics.

* Let's import provincial-level Canadian census data.
* We'll set the index to the 'PRUID' column. This is a unique provincial identifier used by Statistics Canada

In [91]:
Census = pd.read_csv('Data/Census.csv',index_col=['PRUID'])
Census.head()

PRUID,prov,Total,South Asian,Chinese,Black,Filipino,Latin American,Arab,Southeast Asian,West Asian,Korean,Japansese,"Visible minority, n.i.e",Mixed,Indigenous,Caucasian
1,CA,35151728,1924635,1577060,1198545,780125,447320,523235,313260,264305,188710,92920,132090,232375,1673780,25803368
10,NL,519716,2645,2325,2350,1385,635,1375,335,220,75,60,145,255,45725,462186
11,PE,142907,925,2570,825,670,255,585,145,215,210,110,50,85,2735,133527
12,NS,923598,7905,8645,21910,3400,1685,8115,1195,1540,1540,695,635,1390,51490,813453
13,NB,747101,2535,3925,6995,1975,1285,2960,1230,735,1685,230,300,680,29380,693186


## Stats Canada's racial data categories don't perfectly match our police violence data

* But we do have Caucasian (white), Indigenous, and Black, which are the three largest identified groups in the police killings dataset. So we'll focus on these groups.

* Exploring the other races on your own time could be a good way to get some practice.

* The first row (PRUID = 1) holds the numbers for the whole country.
    * Let's use that info to calculate the national police killing rate
    
# We can query the datasets using the .loc[] indexer

## This query uses the .index, but we can also search by column values. Anyone have a guess how we could reformat this query to search for prov == 'CA' instead?

In [82]:
Can_Pop = Census.loc[Census.index==1,['Total','Caucasian','Black','Indigenous']]

print(Can_Pop)

          Total  Caucasian    Black  Indigenous
PRUID                                          
1      35151728   25803368  1198545     1673780
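One possible answer to the question above is to build the boolean mask from the 'prov' column instead of the index. Here's a sketch on a two-row excerpt of the census preview, keeping only the columns we need:

```python
import pandas as pd

# Two rows copied from the Census preview above, indexed by PRUID
Census = pd.DataFrame(
    {'prov': ['CA', 'NL'],
     'Total': [35151728, 519716],
     'Caucasian': [25803368, 462186],
     'Black': [1198545, 2350],
     'Indigenous': [1673780, 45725]},
    index=pd.Index([1, 10], name='PRUID'),
)

cols = ['Total', 'Caucasian', 'Black', 'Indigenous']

# Query by index value (as in the cell above)...
by_index = Census.loc[Census.index == 1, cols]

# ...or by a column value: a boolean mask on 'prov' selects the same row
by_column = Census.loc[Census['prov'] == 'CA', cols]
print(by_column)
```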


# From here, we can calculate the police killing rate.

* Dividing the total number of killings by the population gives us ...

In [83]:
Race_Breakdown.sum() / Can_Pop['Total']

PRUID
1    0.000013
Name: Total, dtype: float64

# This number isn't that meaningful, though. It represents the number of killings "per person" over the whole study period.

## Let's convert the rate to a more meaningful unit: Killings / Million Residents / Year

### The 'date' column holds datetime values (pandas Timestamps), because we passed parse_dates=['date'] to pd.read_csv().
* This adds some extra functionality, like being able to query the year, month, and day

### How might we use this info to calculate our police killing rate?
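One thing to watch: subtracting the first year from the last counts the gaps between years. That can differ from the time actually elapsed. A sketch with hypothetical dates shows both measures side by side:

```python
import pandas as pd

# Hypothetical first, middle and last incident dates (pandas Timestamps)
dates = pd.Series(pd.to_datetime(['2000-01-15', '2008-06-30', '2017-12-20']))

first, last = dates.min(), dates.max()

# Difference of the calendar years
year_gap = last.year - first.year

# Elapsed time in years, from the exact number of days between the dates
elapsed_years = (last - first).days / 365.25

print(year_gap, round(elapsed_years, 2))
```

If the records cover nearly all of the first and last calendar years, the year difference undercounts the study length by almost a year. It's worth checking which definition you want before scaling the rate.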

In [84]:
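# Pull the year out of the earliest and latest incident dates
First_Year = police_Killings['date'].min().year
Last_Year = police_Killings['date'].max().year

Scale = 1e6                        # per million residents
Duration = Last_Year - First_Year  # study length, in years
rate_Conversion = Scale / Duration

# Killings / Million Residents / Year
Rate = Race_Breakdown.sum() / Can_Pop['Total'] * rate_Conversion
print(Rate)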

PRUID
1    0.773119
Name: Total, dtype: float64
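Written out, the calculation above takes the total number of incidents (462, the sum of Race_Breakdown) and divides it by the national population. It then scales the result to a million residents and spreads it over the study duration. Working backwards from the printed result, Duration comes out to 17 years:

$$
\text{Rate} = \frac{\text{killings}}{\text{population}} \times \frac{10^{6}}{\text{Duration}}
= \frac{462}{35\,151\,728} \times \frac{10^{6}}{17} \approx 0.773 \ \text{killings per million residents per year}
$$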


# How does the rate vary by race?

In [85]:
Races = ['Caucasian','Black','Indigenous']
Rates = Race_Breakdown[Races]/Can_Pop[Races]*rate_Conversion
Rates

PRUID,Caucasian,Black,Indigenous
1,0.483293,2.110402,2.460089
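Dividing each group's rate by the Caucasian rate gives the relative disparity:

$$
\frac{2.460089}{0.483293} \approx 5.09 \ (\text{Indigenous}), \qquad \frac{2.110402}{0.483293} \approx 4.37 \ (\text{Black})
$$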


# Police killing rates are about 5x higher for Indigenous people and over 4x higher for Black people than for white people.

# This is an abhorrent example of systemic racism in Canadian policing.

* We can make a graph to visually highlight this severe racial disparity.

In [87]:
# Horizontal bar chart of the national rate for each group
fig, ax = plt.subplots(figsize=(6, 6))
ax.barh(Races, Rates.values[0])  # Rates has a single row (PRUID = 1)
ax.set_title('Police Killings by Race in Canada')
ax.set_xlabel('Killings/Million Residents/Year')
plt.tight_layout()


# To proceed, let's join the census data to the provinces layer. This will let us map the disparity by province and do a more detailed analysis

* PRUID is a "unique identifier" that represents the provinces.  We can use it as the join key.

In [92]:
Test_Join = Provinces.set_index('PRUID').join(Census)
Test_Join.head()

PRUID,PRNAME,PRENAME,PRFNAME,PREABBR,PRFABBR,AREA_LCC,AREA_AEA,Area_Merc,geometry,prov,...,Latin American,Arab,Southeast Asian,West Asian,Korean,Japansese,"Visible minority, n.i.e",Mixed,Indigenous,Caucasian
10,Newfoundland and Labrador,Newfoundland and Labrador,Terre-Neuve-et-Labrador,N.L.,T.-N.-L.,397598.0,406998.0,1124050.0,"MULTIPOLYGON (((8307365.589 2582136.711, 83083...",,...,,,,,,,,,,
11,Prince Edward Island,Prince Edward Island,Île-du-Prince-Édouard,P.E.I.,Î.-P.-É.,6023.0,5893.29,12384.0,"MULTIPOLYGON (((8435711.754 1679935.966, 84358...",,...,,,,,,,,,,
12,Nova Scotia,Nova Scotia,Nouvelle-Écosse,N.S.,N.-É.,57534.5,55643.3,111891.0,"MULTIPOLYGON (((8470851.646 1624745.011, 84710...",,...,,,,,,,,,,
13,New Brunswic,New Brunswick,Nouveau-Brunswick,N.B.,N.-B.,74525.4,73050.6,154848.0,"MULTIPOLYGON (((8176224.537 1722553.460, 81762...",,...,,,,,,,,,,
24,Quebec,Quebec,Québec,Que.,Qc,1476350.0,1509750.0,4309780.0,"MULTIPOLYGON (((8399709.494 2261445.703, 84005...",,...,,,,,,,,,,


# But our join fails!

### Why? Let's look at the join keys from both files. Maybe we have a datatype mismatch?

In [93]:
print(Provinces['PRUID'])
print(Census.index)


0     10
1     11
2     12
3     13
4     24
5     35
6     46
7     47
8     48
9     59
10    60
11    61
12    62
Name: PRUID, dtype: object
Int64Index([1, 10, 11, 12, 13, 24, 35, 46, 47, 48, 59, 60, 61, 62], dtype='int64', name='PRUID')


## Sure enough! The Provinces PRUID column has dtype "object" (text), not an integer type like the census index.

* We can fix that easily and then do the join!
    * Just pass 'int64' to .astype() so it matches the other layer!
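The mismatch is easy to reproduce. This sketch uses two hypothetical rows. The key '10' (text) never equals 10 (integer), so the join runs without an error but matches nothing:

```python
import pandas as pd

# Hypothetical layers: the shapefile stores PRUID as text, the census as integers
provinces = pd.DataFrame({'PRUID': ['10', '11'],
                          'PRNAME': ['Newfoundland and Labrador', 'Prince Edward Island']})
census = pd.DataFrame({'prov': ['NL', 'PE']}, index=pd.Index([10, 11], name='PRUID'))

# '10' (str) never equals 10 (int), so every joined value comes back NaN
failed = provinces.set_index('PRUID').join(census)
print(failed['prov'].isna().all())

# Casting the key to int64 makes both indexes comparable
provinces['PRUID'] = provinces['PRUID'].astype('int64')
joined = provinces.set_index('PRUID').join(census)
print(joined['prov'].tolist())
```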

In [95]:
dtype = 'int64'  # what should we type here??
Provinces['PRUID'] = Provinces['PRUID'].astype(dtype)
Provinces_Join = Provinces.set_index('PRUID').join(Census)
Provinces_Join.head()

PRUID,PRNAME,PRENAME,PRFNAME,PREABBR,PRFABBR,AREA_LCC,AREA_AEA,Area_Merc,geometry,prov,...,Latin American,Arab,Southeast Asian,West Asian,Korean,Japansese,"Visible minority, n.i.e",Mixed,Indigenous,Caucasian
10,Newfoundland and Labrador,Newfoundland and Labrador,Terre-Neuve-et-Labrador,N.L.,T.-N.-L.,397598.0,406998.0,1124050.0,"MULTIPOLYGON (((8307365.589 2582136.711, 83083...",NL,...,635,1375,335,220,75,60,145,255,45725,462186
11,Prince Edward Island,Prince Edward Island,Île-du-Prince-Édouard,P.E.I.,Î.-P.-É.,6023.0,5893.29,12384.0,"MULTIPOLYGON (((8435711.754 1679935.966, 84358...",PE,...,255,585,145,215,210,110,50,85,2735,133527
12,Nova Scotia,Nova Scotia,Nouvelle-Écosse,N.S.,N.-É.,57534.5,55643.3,111891.0,"MULTIPOLYGON (((8470851.646 1624745.011, 84710...",NS,...,1685,8115,1195,1540,1540,695,635,1390,51490,813453
13,New Brunswic,New Brunswick,Nouveau-Brunswick,N.B.,N.-B.,74525.4,73050.6,154848.0,"MULTIPOLYGON (((8176224.537 1722553.460, 81762...",NB,...,1285,2960,1230,735,1685,230,300,680,29380,693186
24,Quebec,Quebec,Québec,Que.,Qc,1476350.0,1509750.0,4309780.0,"MULTIPOLYGON (((8399709.494 2261445.703, 84005...",QC,...,133920,213740,62825,32405,8055,4575,9840,23040,182890,6949091


## Now we want to normalize the number of killings by population to get a rate

* We have a few more steps to go through first.
    * The police killings data and the provinces shapefile use different abbreviations (e.g. 'QC' vs 'Que.'), so we can't join them on PREABBR
    * Luckily, the census table's 'prov' column uses the same two-letter codes as the police killings data, and it came along with the census join
* So we can summarize the killings by province and join the counts to the Provinces_Join layer, using 'prov' as the key
    * Note that Prince Edward Island doesn't have any recorded killings, so its count will be missing (NaN)

In [96]:
# Count victims per province ('prov') and join the counts onto the provinces layer,
# re-indexed by its census 'prov' code so the keys match
Provinces_Killings = Provinces_Join.set_index('prov').join(
    police_Killings.groupby('prov').count()['id_victim']
)
Provinces_Killings

prov,PRNAME,PRENAME,PRFNAME,PREABBR,PRFABBR,AREA_LCC,AREA_AEA,Area_Merc,geometry,Total,...,Arab,Southeast Asian,West Asian,Korean,Japansese,"Visible minority, n.i.e",Mixed,Indigenous,Caucasian,id_victim
NL,Newfoundland and Labrador,Newfoundland and Labrador,Terre-Neuve-et-Labrador,N.L.,T.-N.-L.,397598.0,406998.0,1124050.0,"MULTIPOLYGON (((8307365.589 2582136.711, 83083...",519716,...,1375,335,220,75,60,145,255,45725,462186,3.0
PE,Prince Edward Island,Prince Edward Island,Île-du-Prince-Édouard,P.E.I.,Î.-P.-É.,6023.0,5893.29,12384.0,"MULTIPOLYGON (((8435711.754 1679935.966, 84358...",142907,...,585,145,215,210,110,50,85,2735,133527,
NS,Nova Scotia,Nova Scotia,Nouvelle-Écosse,N.S.,N.-É.,57534.5,55643.3,111891.0,"MULTIPOLYGON (((8470851.646 1624745.011, 84710...",923598,...,8115,1195,1540,1540,695,635,1390,51490,813453,3.0
NB,New Brunswic,New Brunswick,Nouveau-Brunswick,N.B.,N.-B.,74525.4,73050.6,154848.0,"MULTIPOLYGON (((8176224.537 1722553.460, 81762...",747101,...,2960,1230,735,1685,230,300,680,29380,693186,4.0
QC,Quebec,Quebec,Québec,Que.,Qc,1476350.0,1509750.0,4309780.0,"MULTIPOLYGON (((8399709.494 2261445.703, 84005...",8164361,...,213740,62825,32405,8055,4575,9840,23040,182890,6949091,87.0
ON,Ontario,Ontario,Ontario,Ont.,Ont.,980244.0,986723.0,2448600.0,"MULTIPOLYGON (((6378815.614 2295412.440, 63787...",13448494,...,210435,133860,154670,88940,30835,97970,128590,374395,9188499,152.0
MB,Manitoba,Manitoba,Manitoba,Man.,Man.,627595.0,649630.0,1979250.0,"MULTIPOLYGON (((6039656.509 2636304.343, 60396...",1278365,...,5030,8565,2695,4375,1850,3200,6480,223310,838205,19.0
SK,Saskatchewan,Saskatchewan,Saskatchewan,Sask.,Sask.,632214.0,652385.0,1940680.0,"POLYGON ((5248633.914 2767057.263, 5249285.640...",1098352,...,4300,5740,2070,1880,955,1145,2815,175020,807467,17.0
AB,Alberta,Alberta,Alberta,Alta.,Alb.,639937.0,663251.0,2044880.0,"POLYGON ((5228304.177 2767597.891, 5228098.463...",4067175,...,56700,43985,20980,21275,12165,9905,28355,258640,2875365,71.0
BC,British Columbia,British Columbia,Colombie-Britannique,B.C.,C.-B.,917733.0,948292.0,2870620.0,"MULTIPOLYGON (((4018904.414 3410247.271, 40194...",4648055,...,19840,54920,48695,60495,41230,8765,40465,270585,2996225,98.0


In [77]:
print(Provinces)

    PRUID                     PRNAME                    PRENAME  \
0      10  Newfoundland and Labrador  Newfoundland and Labrador   
1      11       Prince Edward Island       Prince Edward Island   
2      12                Nova Scotia                Nova Scotia   
3      13               New Brunswic              New Brunswick   
4      24                     Quebec                     Quebec   
5      35                    Ontario                    Ontario   
6      46                   Manitoba                   Manitoba   
7      47               Saskatchewan               Saskatchewan   
8      48                    Alberta                    Alberta   
9      59           British Columbia           British Columbia   
10     60                      Yukon                      Yukon   
11     61      Northwest Territories      Northwest Territories   
12     62                    Nunavut                    Nunavut   

                      PRFNAME PREABBR   PRFABBR   AREA_LCC   

### Prince Edward Island's missing count can be fixed easily using the .fillna() method

### Then we can do the normalization!

* We'll calculate the rate of police killings per ten thousand residents over the whole study period
    * Divide the number of killings by the total population to get the per-person rate
    * Then multiply by 10,000
    * Then sort the values

* Then we'll plot it on a map, with a bar graph below for extra context!
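Here's the same fill-then-normalize sequence on two rows copied from the joined table above (Prince Edward Island and Nova Scotia):

```python
import pandas as pd

# Two rows from the joined table above: PE has no recorded killings (NaN)
df = pd.DataFrame(
    {'Total': [142907, 923598], 'id_victim': [float('nan'), 3.0]},
    index=pd.Index(['PE', 'NS'], name='prov'),
)

# Replace the missing count with 0 so PE gets a rate of 0 instead of NaN
df['Victims'] = df['id_victim'].fillna(0)

# Killings per 10,000 residents over the whole study period
df['Rate'] = df['Victims'] / df['Total'] * 1e4
print(df.sort_values(by='Rate'))
```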

In [None]:
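# PEI has no recorded killings, so replace its missing count with 0
Provinces_Killings['Victims'] = Provinces_Killings['id_victim'].fillna(0)
# Killings per 10,000 residents over the whole study period
Provinces_Killings['Rate'] = Provinces_Killings['Victims'] / Provinces_Killings['Total'] * 1e4
Provinces_Killings = Provinces_Killings.sort_values(by='Rate')
print(Provinces_Killings)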

# Okay, now let's normalize the national killing counts by race

In [None]:
# Merge any 'East Asian' victims into the 'Asian' category used elsewhere in the dataset
police_Killings.loc[police_Killings['race'] == 'East Asian', 'race'] = 'Asian'

# ASSUMPTION: the census has no single 'Asian' column, so we approximate one by
# summing the East and Southeast Asian groups (South Asian stays separate)
Provinces_Killings['Asian'] = Provinces_Killings[
    ['Chinese', 'Filipino', 'Southeast Asian', 'Korean', 'Japansese']
].sum(axis=1)

# Number of victims in each racial group
Total = police_Killings.groupby('race').count()['date']
print(Total)

# National killings per million residents per year for each group
# (Duration, the study length in years, was computed earlier)
Races = ['Asian', 'South Asian', 'Caucasian', 'Black', 'Indigenous']
Rate = [Total[race] / Provinces_Killings[race].sum() * 1e6 / Duration for race in Races]
Normalized_Rates = pd.DataFrame(index=Races, data={'Rate': Rate})

# Horizontal bar chart of the normalized rates
fig, ax = plt.subplots(figsize=(8, 8))
ax.barh(Normalized_Rates.index, Normalized_Rates.Rate, edgecolor='k', color=[1, .2, .25], linewidth=1)
ax.set_title("Police Killing Rates by Race in Canada (2000-2017)", fontsize=16)
ax.set_xlabel('Victims per Year per Million Residents', fontsize=12)
ax.grid(axis='x', color='grey')
ax.set_axisbelow(True)  # draw the grid behind the bars

# Data-source annotation box; '\ ' keeps the space inside the bold mathtext
textstr = ('$\\bf{Data\\ Sources}$\n'
           'Demographics:  Stats Canada\n'
           'Police Killings:    Jacques Marcoux &\n'
           '                          Katie Nicholson (CBC)')
props = dict(facecolor='white')
ax.text(0.46, 0.16, textstr, transform=ax.transAxes, fontsize=12,
        verticalalignment='top', bbox=props)

plt.savefig('Rates.png', dpi=400)

In [None]:

# Lay out a 2x2 grid of panels on a 6x6 GridSpec
fig = plt.figure(figsize=(8, 8))
gs = fig.add_gridspec(6, 6)
ax1 = fig.add_subplot(gs[0:3, :3])
ax2 = fig.add_subplot(gs[0:3, 3:])
ax3 = fig.add_subplot(gs[3:, :3])
ax4 = fig.add_subplot(gs[3:, 3:])

# Share of each province's population that is Indigenous / Black
Provinces_Killings['Indigenous_pct'] = Provinces_Killings['Indigenous'] / Provinces_Killings['Total']
Provinces_Killings['Black_pct'] = Provinces_Killings['Black'] / Provinces_Killings['Total']

# Choropleth maps classified with natural breaks (the scheme option requires mapclassify)
Provinces_Killings.plot(ax=ax1, column='Rate', edgecolor='k', legend=True, scheme='natural_breaks')
ax1.set_title('Killings per 10,000 Residents')
Provinces_Killings.plot(ax=ax2, column='Black_pct', edgecolor='k', legend=True, scheme='natural_breaks')
ax2.set_title('Black Share of Population')
Provinces_Killings.plot(ax=ax3, column='Indigenous_pct', edgecolor='k', legend=True, scheme='natural_breaks')
ax3.set_title('Indigenous Share of Population')

# Reuse the national rates from the previous cell. Dividing national counts by
# Provinces_Killings[race] and taking .values[0] would use only one province's population.
Races = ['Black', 'Indigenous', 'Caucasian', 'Asian', 'South Asian']
National_Rates = Normalized_Rates.loc[Races]
ax4.bar(National_Rates.index, National_Rates.Rate, edgecolor='k')
ax4.set_title('Police Killing Rates in Canada')
ax4.set_ylabel('Killings/Year/Million Residents')
ax4.tick_params(axis='x', labelrotation=30)

plt.suptitle('Police Killing Rates Across Canada')

### Nunavut obviously has a serious problem. More than 75% of its population is Inuit, and its rate is an order of magnitude higher than those of the other provinces and territories.

### BC has the worst rate of the provinces. Let's select BC and investigate further.

* We can select BC using its two-letter 'prov' code ('BC').
* We'll also re-project into UTM Zone 10N (EPSG:26910, NAD83 / UTM zone 10N), a more accurate coordinate system for this region

* Then we can look at the racial breakdown of killings in BC using the .groupby() method again.
    * We'll create a new dataframe normalizing total killings for different races by the total population within those groups
    * Then we can make a bar plot highlighting racial biases
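UTM zones are 6 degrees of longitude wide, numbered eastward from 180°W. That means the zone for a location follows directly from its longitude. The sketch below uses a hypothetical helper, utm_zone(), and an approximate longitude for Vancouver:

```python
import math
from pyproj import CRS

def utm_zone(longitude):
    """Hypothetical helper: UTM zone number for a longitude in degrees (-180 <= lon < 180)."""
    # Zones are 6 degrees wide, numbered eastward starting at 180 degrees W
    return int(math.floor((longitude + 180) / 6)) + 1

# Vancouver is at roughly 123.1 degrees W
print(utm_zone(-123.1))

# EPSG:26910 is UTM zone 10 North on the NAD83 datum
print(CRS.from_epsg(26910).name)
```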


In [None]:
# Select BC from both layers using the two-letter 'prov' code ('BC').
# (Provinces_Killings is indexed by 'prov', and police_Killings has no PREABBR column.)
# Then re-project both into NAD83 / UTM zone 10N (EPSG:26910).
BC = Provinces_Killings.loc[Provinces_Killings.index == 'BC'].to_crs(26910)
BC_Killings = police_Killings.loc[police_Killings['prov'] == 'BC'].to_crs(26910)

# Number of BC victims in each racial group
Total_BC = BC_Killings.groupby('race').count()['date']

# BC killings per million residents per year for each group;
# .get(race, 0) treats a group with no recorded victims as zero
Rate = []
Races = ['Black', 'Indigenous', 'Caucasian', 'Asian', 'South Asian']
for race in Races:
    Rate.append(Total_BC.get(race, 0) / BC[race].iloc[0] * 1e6 / Duration)
Normalized_Rates = pd.DataFrame(index=Races, data={'Rate': Rate})

plt.figure()
plt.bar(Normalized_Rates.index, Normalized_Rates.Rate, edgecolor='k')
plt.title('Police Killing Rates in BC')
plt.ylabel('Killings/Year/Million Residents')
plt.savefig('BCPoliceKillings.png')

## Let's make an infographic for BC, summarizing some of the key points
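The "Were They Armed?" pie chart shortens the dataset's long category names before counting them. Here's that relabel-then-count step on a few hypothetical values. Series.replace() leaves anything not in the mapping unchanged:

```python
import pandas as pd

# Hypothetical armed_type values using the dataset's long category names
armed = pd.Series([
    'Knife, axe, other cutting instruments',
    'Air gun, replica gun',
    'Knife, axe, other cutting instruments',
    'Unknown',
], name='armed_type')

# Map long labels to short ones for the pie chart; unmapped values pass through unchanged
short = armed.replace({
    'Knife, axe, other cutting instruments': 'Knife/Axe',
    'Bat, club, other swinging object': 'Bat/Club',
    'Air gun, replica gun': 'Fake Gun',
})

# One count per category: these become the pie-chart wedge sizes
counts = short.value_counts()
print(counts)
```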

In [None]:
fig = plt.figure(figsize=(8, 8))
gs = fig.add_gridspec(10, 10)

# Panel 1: map of BC with each killing coloured by race
ax0 = fig.add_subplot(gs[0:6, 0:6])
BC.plot(ax=ax0, color='grey', edgecolor='k')
BC_Killings.plot(ax=ax0, column='race', legend=True, edgecolor='k')
ax0.get_xaxis().set_visible(False)
ax0.get_yaxis().set_visible(False)
ax0.set_title('Police Killings in BC 2000-2017')

# Panel 2: share of killings that involved a mental health crisis
ax1 = fig.add_subplot(gs[1:5, 6:])
Mental_Health = BC_Killings.groupby('mentral_distress_disorder').count()
ax1.pie(Mental_Health['id_victim'], labels=Mental_Health.index, textprops={'fontsize': 8},
        autopct='%1.1f%%', wedgeprops={'edgecolor': 'k', 'linewidth': 1, 'linestyle': 'dashed'})
ax1.set_title('Was it a Mental Health Crisis?')

# Panel 3: BC killing rates by race (Normalized_Rates from the previous cell)
ax2 = fig.add_subplot(gs[6:, 0:6])
ax2.bar(Normalized_Rates.index, Normalized_Rates.Rate, edgecolor='k')
ax2.set_title('Police Killing Rates by Race in BC')
ax2.set_ylabel('Killings/Year/Million Residents')
ax2.tick_params(axis='x', labelrotation=30)

# Panel 4: shorten the long armed_type labels, then show the share of each category.
# 'Unknown' stays its own category: an unknown weapon status is not the same as unarmed.
BC_Killings['armed_type'] = BC_Killings['armed_type'].replace({
    'Knife, axe, other cutting instruments': 'Knife/Axe',
    'Bat, club, other swinging object': 'Bat/Club',
    'Air gun, replica gun': 'Fake Gun',
})
Armed = BC_Killings.groupby('armed_type').count()
ax3 = fig.add_subplot(gs[6:, 6:])
ax3.pie(Armed['id_victim'], labels=Armed.index, textprops={'fontsize': 8},
        autopct='%1.1f%%', wedgeprops={'edgecolor': 'k', 'linewidth': 1, 'linestyle': 'dashed'})
ax3.set_title('Were They Armed?')

plt.tight_layout()