![](http://i.imgur.com/hYgDqD4.png)

# 5 different views for "Philadelphia Crime Data: Ten Years of Crime Data, by OpenDataPhilly"

Pandas is an open source, BSD-licensed library providing high-performance, easy-to-use data structures and data analysis tools for the Python programming language. First of all, import pandas:

In [None]:
import pandas as pd

Read the data from the CSV file (https://www.kaggle.com/mchirico/philadelphiacrimedata):

In [None]:
data = pd.read_csv('crime.csv')

To view a small sample of a Series or DataFrame object, use the head() and tail() methods. The default number of elements to display is five, but you may pass a custom number.

In [None]:
data.head()

In [None]:
data.tail()
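
As a minimal sketch of passing a custom number of rows, the toy frame below stands in for the crime data (its values are made up for illustration):

```python
import pandas as pd

# Toy frame standing in for the crime data (values are made up).
toy = pd.DataFrame({"Dc_Dist": [1, 2, 3, 5, 6, 7, 8], "Value": [1] * 7})

first_three = toy.head(3)  # first 3 rows instead of the default 5
last_two = toy.tail(2)     # last 2 rows

print(first_three)
print(last_two)
```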

Number of non-null observations for each variable (notice that some variables, such as UCR_General, have fewer non-null values than others):

In [None]:
data.count()
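
To make the gap between columns concrete, here is a small sketch on made-up data: subtracting the non-null counts from the number of rows gives the number of missing values per column.

```python
import pandas as pd

# Made-up rows; two of the four UCR_General codes are missing.
toy = pd.DataFrame({
    "Dc_Dist": [1, 2, 2, 3],
    "UCR_General": [100.0, None, 300.0, None],
})

non_null = toy.count()         # non-null values per column
missing = len(toy) - non_null  # rows minus non-null = missing per column

print(missing)
```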

In [None]:
# Add a constant column of 1s so that summing 'Value' counts records
data['Value'] = 1

In [None]:
data.head()

The variables are described in fields.csv:

In [None]:
field = pd.read_csv('fields.csv')

In [None]:
field

## How many districts are there?

In [None]:
data['Dc_Dist'].unique()

In [None]:
data['Dc_Dist'].unique().shape[0]
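
A shorter way to get the same number is `Series.nunique()`. The sketch below uses made-up district numbers; note that `nunique()` skips missing values by default, while `unique()` keeps them, so the two can differ by one when the column has gaps.

```python
import pandas as pd

# Made-up district numbers.
districts = pd.Series([1, 2, 2, 3, 3, 3], name="Dc_Dist")

n_via_unique = districts.unique().shape[0]
n_via_nunique = districts.nunique()

# With a missing value, unique() counts NaN as a value, nunique() does not.
with_gap = pd.Series([1, 2, None])
n_gap_unique = with_gap.unique().shape[0]
n_gap_nunique = with_gap.nunique()

print(n_via_unique, n_via_nunique, n_gap_unique, n_gap_nunique)
```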

### What is the total number of crimes in the data?

In [None]:
data['Value'].sum()

## Visualize number of crimes per district

In [None]:
import matplotlib.pyplot as plt

In [None]:
import matplotlib
matplotlib.style.use('ggplot')
%matplotlib inline
matplotlib.rcParams['figure.figsize'] = [16.0, 8.0]

In [None]:
data['Dc_Dist'] = data['Dc_Dist'].astype("category")

In [None]:
data.head()

In [None]:
data.dtypes

In [None]:
data['Dc_Dist'].value_counts()

In [None]:
data['Dc_Dist'].value_counts().plot.bar(title='Crimes by district (2006-2017)')

In [None]:
data['Text_General_Code'].value_counts().plot.barh(title='Crimes by type (2006-2017)')

## Visualize data by date/time

In [None]:
data['Dispatch_Date_Time'] = pd.to_datetime(data['Dispatch_Date_Time'])

In [None]:
data['year'] = data['Dispatch_Date_Time'].dt.year

In [None]:
data.set_index(['Dispatch_Date_Time']).Value.resample('H').sum().plot(title='Hourly evolution of total crimes')

In [None]:
data.set_index(['Dispatch_Date_Time']).Value.resample('D').sum().plot(title='Daily evolution of total crimes')

In [None]:
data.set_index(['Dispatch_Date_Time']).Value.resample('M').sum().plot(title='Monthly evolution of total crimes')
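
To see what `resample(...).sum()` does to the `Value` column, here is a small sketch on three made-up dispatch times. Each bin sums the 1s that fall into it, and a day without records becomes 0 rather than disappearing.

```python
import pandas as pd

# Three made-up dispatch times: two on 1 January, none on 2 January, one on 3 January.
toy = pd.DataFrame({
    "Dispatch_Date_Time": pd.to_datetime([
        "2016-01-01 08:15", "2016-01-01 23:40", "2016-01-03 12:00",
    ]),
    "Value": [1, 1, 1],
})

# Index by timestamp, then sum the 1s inside each calendar day.
daily = toy.set_index("Dispatch_Date_Time")["Value"].resample("D").sum()

print(daily)
```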

In [None]:
data.groupby('year')['Value'].sum().drop([2017]).plot(title='Yearly evolution of total crimes')

In [None]:
pd.pivot_table(data, index=data.Dc_Dist, columns=data.Dispatch_Date_Time.dt.year, values='Value', aggfunc='sum').plot.bar(title='Evolution of crime by district')

In [None]:
pd.pivot_table(data, index=data.Text_General_Code, columns=data.Dispatch_Date_Time.dt.year, values='Value', aggfunc='sum').plot.bar(title='Evolution of crime by type')
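
The shape of these pivot tables is easier to see on a few made-up rows. This sketch passes column names instead of Series, which is an assumption made for brevity and not how the cells above are written.

```python
import pandas as pd

# Made-up records: one row per crime, with its district and year.
toy = pd.DataFrame({
    "Dc_Dist": [1, 1, 2, 2, 2],
    "year": [2015, 2016, 2015, 2016, 2016],
    "Value": [1, 1, 1, 1, 1],
})

# One row per district, one column per year, each cell the number of crimes.
table = pd.pivot_table(toy, index="Dc_Dist", columns="year",
                       values="Value", aggfunc="sum")

print(table)
```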

In [None]:
data.groupby(data.Dispatch_Date_Time.dt.hour)['Value'].sum().plot(title='Total crimes per hour')

## Visualize location data

[gmaps](https://github.com/pbugnion/gmaps) is a Jupyter plugin for embedding Google Maps in the notebook. It is built around the idea of adding layers to a base map. You have to be authenticated with Google Maps; here the API key is read from the GOOGLE_API_KEY environment variable.

In [None]:
import gmaps
import gmaps.datasets
import os
gmaps.configure(api_key=os.environ["GOOGLE_API_KEY"])
map2016 = gmaps.Map()
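
If the environment variable is not set, `os.environ[...]` fails with a bare `KeyError`. A small wrapper can give a clearer message; `read_api_key` below is a hypothetical helper for illustration, not part of gmaps.

```python
import os


def read_api_key(name="GOOGLE_API_KEY"):
    """Return the key stored in the environment, or raise a readable error."""
    key = os.environ.get(name)
    if not key:
        raise RuntimeError(
            f"Set the {name} environment variable before configuring gmaps"
        )
    return key


# Usage (left commented out because it needs a real key and the gmaps package):
# gmaps.configure(api_key=read_api_key())
```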

In [None]:
locations2016 = data[data.year == 2016][['Lat', 'Lon']].dropna(how='any')

In [None]:
map2016.add_layer(gmaps.heatmap_layer(locations2016, point_radius=6))

### Distribution of crime in 2016

In [None]:
map2016

In [None]:
locations2006 = data[data.year == 2006][['Lat', 'Lon']].dropna(how='any')

In [None]:
m2006 = gmaps.Map()

In [None]:
m2006.add_layer(gmaps.heatmap_layer(locations2006, point_radius=6))

### Distribution of crime in 2006

In [None]:
m2006

### Visualize data by year and district

OpenDataPhilly provides the police district boundaries: https://www.opendataphilly.org/dataset/police-districts/resource/25dfa174-245f-4560-a3ae-0f21a3b59a3d We use the GeoJSON format:

In [None]:
import json
with open("Boundaries_District.geojson") as f:
    geometry = json.load(f)
    
district_map = gmaps.Map()
geojson_layer = gmaps.geojson_layer(geometry)
district_map.add_layer(geojson_layer)
district_map

In [None]:
print(geometry['features'][0])

The `properties` entry of each feature holds meta-information such as the DIST_NUM. We will use this number to look up the crime count for that district and year and translate it into a colour.

In [None]:
pd.pivot_table(data,values=['Value'],index=['Dc_Dist'],columns=['year'],aggfunc='sum',margins=True).xs(2016, level=1, axis=1)

In [None]:
pd.pivot_table(data,values=['Value'],index=['Dc_Dist'],columns=['year'],aggfunc='sum').xs(2016, level=1, axis=1)['Value']

In [None]:
# Key the lookup by district number (the pivot index), not by row position
districtTo2016crimeAmmount = pd.pivot_table(data,values=['Value'],index=['Dc_Dist'],columns=['year'],aggfunc='sum').xs(2016, level=1, axis=1)['Value'].to_dict()
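
For the lookup by DIST_NUM to work, the dictionary keys have to be the district numbers themselves. The sketch below shows one way to build such a lookup on made-up rows, using a group-by instead of a pivot table; the names `crimes_2016` and `district_to_crimes` are illustrative only.

```python
import pandas as pd

# Made-up records for districts 1, 5 and 9.
toy = pd.DataFrame({
    "Dc_Dist": [1, 1, 5, 5, 5, 9],
    "year": [2016, 2016, 2016, 2015, 2016, 2016],
    "Value": [1] * 6,
})

# Keep 2016 only, then count crimes per district.
crimes_2016 = toy[toy["year"] == 2016].groupby("Dc_Dist")["Value"].sum()

# Keys are the district numbers (the Series index), not row positions.
district_to_crimes = crimes_2016.to_dict()

print(district_to_crimes)
print(district_to_crimes.get(7))  # a district with no records has no entry
```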

We can now use districtTo2016crimeAmmount to colour each district. A matplotlib [colormap](http://matplotlib.org/api/cm_api.html) maps each crime amount to a colour; we use the [inferno color scale](http://matplotlib.org/examples/color/colormaps_reference.html).

In [None]:
from matplotlib.cm import inferno
from matplotlib.colors import to_hex

# Scale the crime value to lie between 0 and 1
min_crime = min(districtTo2016crimeAmmount.values())
max_crime = max(districtTo2016crimeAmmount.values())

crime_range = max_crime - min_crime

def calculate_color(crime_amount):
    normalized_crime_amount = (crime_amount - min_crime) / crime_range
    
    # invert crime_amount so that a high crime value gives a dark color
    inverse_crime = 1.0 - normalized_crime_amount
    
    #transform the crime amount to a matplotlib color
    mpl_color = inferno(inverse_crime)
    
    # to a valid CSS color
    gmaps_color = to_hex(mpl_color, keep_alpha = False)
    
    return gmaps_color

colors = []
for feature in geometry['features']:
    district = feature['properties']['DIST_NUM']
    try:
        crime_amount = districtTo2016crimeAmmount[district]
        color = calculate_color(crime_amount)
    except KeyError:
        # No crime amount for that district: use the default color
        color = (0, 0, 0, 0.3)
    colors.append(color)
    
district_map_2016 = gmaps.Map()
crime_2016_layer = gmaps.geojson_layer(
    geometry,
    fill_color=colors,
    stroke_color=colors,
    fill_opacity=0.8
)
district_map_2016.add_layer(crime_2016_layer)
district_map_2016
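
The scaling step inside `calculate_color` can be checked on its own, without matplotlib or gmaps. The sketch below is a plain-Python version on made-up counts; `normalize_inverted` is a hypothetical helper, and the guard for identical counts is an addition that the cell above does not have.

```python
def normalize_inverted(values):
    """Min-max scale values to [0, 1], then invert so the maximum maps to 0."""
    low, high = min(values), max(values)
    span = high - low
    if span == 0:
        # All counts equal: avoid dividing by zero and use the midpoint.
        return [0.5 for _ in values]
    return [1.0 - (v - low) / span for v in values]


toy_counts = [100, 250, 400]  # made-up crime counts for three districts
scaled = normalize_inverted(toy_counts)

print(scaled)
```

The district with the most crimes ends up at 0.0, which is the dark end of the inferno scale, and the district with the fewest ends up at 1.0, the bright end.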