
# Point Pattern Analysis

In our previous lab, we looked at spatial autocorrelation as a way to test the statistical significance of our data's spatial clustering tendencies. We did so by summarizing point data by small geographic boundaries, spatially joining arrest data to census block groups. But what if we did not want to summarize data by geographic boundaries, and instead looked only at the locations of the points themselves to deduce spatial patterns? In this lab, we look at various methods to conduct point pattern analysis, while also introducing interactive notebook widgets to explore our data.

## Libraries

In [None]:
# !pip install pysal

In [None]:
import geopandas as gpd
import matplotlib.pyplot as plt
import pandas as pd

# for basemaps
import contextily as ctx

# to import data from LA Data portal
from sodapy import Socrata

# data viz!
import seaborn as sns

import plotly.express as px

# to explore point patterns
from pointpats import centrography

## Arrest Data

In [None]:
# connect to the data portal
client = Socrata("data.lacity.org", None)

results = client.get("amvf-fr72", 
                     limit=50000,
                     where = "arst_date between '2020-03-01T00:00:00' and '2020-10-30T00:00:00'",
                     order='arst_date desc')

# Convert to pandas DataFrame
arrests = pd.DataFrame.from_records(results)


In [None]:
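# connect to the data portal and load reported-crime records;
# this table (df), not `arrests`, is what the rest of the lab analyzes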
client = Socrata("data.lacity.org", None)

results = client.get("2nrs-mtv8", 
                     limit=50000,
                     where = "date_rptd >= '2020-09-01T00:00:00'",
                     order='date_rptd desc')

# Convert to pandas DataFrame
df = pd.DataFrame.from_records(results)


In [None]:
df.shape

In [None]:
# convert lat/lon to floats (the portal returns strings) and age to integers
# before building point geometries from them
df.lon = df.lon.astype('float')
df.lat = df.lat.astype('float')
df.vict_age = df.vict_age.astype('int')

In [None]:
# convert pandas dataframe to geodataframe
df = gpd.GeoDataFrame(df, 
                      crs='EPSG:4326',
                      geometry=gpd.points_from_xy(df.lon, df.lat))

In [None]:
# drop the unmapped rows
df.drop(df[df.lon==0].index,inplace=True)

In [None]:
# filter columns
df=df[['date_rptd','area_name','vict_age','vict_sex','vict_descent','crm_cd_desc','geometry']]

In [None]:
# rename columns
df.columns = ['date','area','age','sex','race','crime','geometry']

In [None]:
# project to web mercator
df=df.to_crs('EPSG:3857')

In [None]:
# drop rows with null values
df = df.dropna()

In [None]:
# drop rows where age == 0
df = df[df.age!=0]

In [None]:
df.shape

In [None]:
df.sample(20)

## Heat maps
This lab focuses on visualizing point densities in a variety of ways. Before we begin, let's look at the crime data in its "raw" format by creating a simple point map: one point for each incident at its location on a grid.

In [None]:
df.plot(figsize=(12,12),
        markersize=0.5)

The resulting plot tells us a lot about the data we have imported into the notebook. If you are familiar with Los Angeles, the overall shape traces the physical space defined by the city boundary. Even in the absence of basemaps, satellite imagery, and other layers of information, the divided City of Angels comes to life: the "Valley" in the northwest; the Santa Monica Mountains that separate it from the Westside, where the empty rectangle of Santa Monica stands out; the blob at center right that outlines downtown Los Angeles; and the narrow corridor heading south to the port toward Long Beach. Through this cacophony of points, we can begin to detect patterns: points line up along streets, and certain neighborhoods appear more concentrated than others. As much as the blue dots represent actual data points, their absence is also informative.

To begin our exploration of point patterns, let's first build some interactive tools for browsing the data.

## Interactive exploration

Jupyter notebooks are a unique coding platform that lets you mix documentation (markdown cells) with interactive code cells. There is, however, another level of interactivity that can be developed: output that uses the interactive features of the web, letting users manipulate it with dropdowns and sliders.

These interactive widgets let us explore the data without constantly modifying code cells to change parameters. They are, in a sense, a snazzy and useful addition to your notebook.

To add interactivity to your cell output, the following steps are required:

- import the interact library
- create a function with at least one argument
- decorate the function with `@interact`
- if the argument is numeric, a slider will be generated
- if the argument is categorical, provide a list of values to generate a dropdown menu

For this section, we will build an interactive map of Los Angeles showing the locations of reported crimes by crime type. A dropdown menu will let you change the crime type and update the map.

In [None]:
# import the interact library
import ipywidgets as widgets
from ipywidgets import interact, interact_manual

In [None]:
# get the top 50 crime types into a list
top_50_crimes = df.crime.value_counts().head(50).index.tolist()
top_50_crimes

In [None]:
df[df.crime == 'BATTERY - SIMPLE ASSAULT'].head()

In [None]:
# use display instead of print if it is not the last output in a cell
display(df[df.crime == 'BATTERY - SIMPLE ASSAULT'].head()) 

# a regular filtered data output
ax = df[df.crime == 'BATTERY - SIMPLE ASSAULT'].plot(figsize=(9,9), markersize=1)
ax.axis('off')

# add a basemap
ctx.add_basemap(ax,source=ctx.providers.CartoDB.DarkMatter)

In [None]:
# create a function
def crime_by(crime='BATTERY - SIMPLE ASSAULT'):
    # use display instead of print if it is not the last output in a cell
    display(df[df.crime == crime].head()) 

    # a regular filtered data output
    ax = df[df.crime == crime].plot(figsize=(9,9), markersize=2)
    ax.axis('off')
    
    # add a basemap
    ctx.add_basemap(ax,source=ctx.providers.CartoDB.DarkMatter)

In [None]:
crime_by(crime = 'BURGLARY')

Next, we use the `@interact` decorator to create a dropdown for our function.

In [None]:
@interact
def arrests_by(crime=top_50_crimes):
    # use display instead of print if it is not the last output in a cell
    display(df[df.crime == crime].head()) 

    # a regular filtered data output
    ax = df[df.crime == crime].plot(figsize=(9,9), markersize=10)
    ax.axis('off')
    # add a basemap
    ctx.add_basemap(ax,source=ctx.providers.CartoDB.DarkMatter)

In [None]:
@interact
def arrests_by(crime=top_50_crimes,
               area=df['area'].unique().tolist()):
    # use display instead of print if it is not the last output in a cell
    display(df[(df.crime == crime) & (df['area'] == area)].head()) 

    # a regular filtered data output
    ax = df[(df.crime == crime) & (df['area'] == area)].plot(figsize=(9,9), markersize=10)
    ax.axis('off')
    # add a basemap
    ctx.add_basemap(ax,source=ctx.providers.CartoDB.DarkMatter)

## Seaborn Plots
> Seaborn is a Python data visualization library based on matplotlib. It provides a high-level interface for drawing attractive and informative statistical graphics.

Source: https://seaborn.pydata.org/

In [None]:
# we'll work in Web Mercator (already set above; re-projecting to the same CRS changes nothing)
df = df.to_crs('EPSG:3857')

In [None]:
# seaborn needs an x and y column so let's extract it from the geometry field
df['x'] = df.geometry.x
df['y'] = df.geometry.y
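Because the data were projected to Web Mercator (EPSG:3857), the new `x` and `y` columns are in meters rather than degrees. As a rough illustration of what `to_crs` does here, the sketch below applies the standard spherical Web Mercator formulas with NumPy. The helper name `lonlat_to_web_mercator` is hypothetical; in practice, `to_crs` (which relies on pyproj) is the right tool.

```python
import numpy as np

# Earth radius in meters used by Web Mercator, which treats the Earth as a sphere.
R = 6378137.0

def lonlat_to_web_mercator(lon, lat):
    """Project longitude/latitude in degrees to Web Mercator x/y in meters."""
    lon_rad = np.radians(lon)
    lat_rad = np.radians(lat)
    x = R * lon_rad
    y = R * np.log(np.tan(np.pi / 4 + lat_rad / 2))
    return x, y

# Approximate location of downtown Los Angeles.
x, y = lonlat_to_web_mercator(-118.2437, 34.0522)
print(round(x), round(y))
```

Working in meters is what makes distance-based summaries later in the lab, such as the standard distance, interpretable.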

In [None]:
df.head()

In [None]:
sns.relplot(data=df,
            x='x', 
            y='y',
            hue='area')

In [None]:
sns.relplot(data=df[df['area']=='Hollywood'],
            x='x', 
            y='y')

In [None]:
sns.relplot(data=df,
            x='x', 
            y='y',
            hue='sex')

In [None]:
sns.relplot(data=df,
            x='x', 
            y='y',
            hue='sex',
            style='sex')

In [None]:
sns.relplot(data=df,
            x='x', 
            y='y',
            hue='sex',
            style='sex',
            col='race',
            col_wrap=4)

## Distribution plots

In [None]:
# create a subset: incidents with victim descent H (Hispanic/Latin/Mexican) or B (Black)
data_mini = df[df.race.isin(['H','B'])]

In [None]:
g = sns.jointplot(data = data_mini,
                  x='x', 
                  y='y',
                  s=10)

In [None]:
g = sns.jointplot(data = data_mini,
                  x='x', 
                  y='y',
                  hue='race',
                  s=10)

In [None]:
sns.jointplot(data = data_mini,
              x='x', 
              y='y', 
              kind="hist",
              hue='race')
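With `kind='hist'`, the joint panel bins the x/y plane into rectangles and shades each one by the number of points it contains. NumPy's `histogram2d` does the same kind of counting. The six coordinates below are made-up values in meters, not drawn from the crime data, and the 2 x 2 grid is chosen only to keep the output readable.

```python
import numpy as np

# Made-up projected coordinates (meters) for six incidents.
x = np.array([0, 50, 120, 130, 140, 900])
y = np.array([0, 40, 110, 125, 135, 880])

# Split the extent into a 2 x 2 grid and count the points in each cell.
# counts[i, j] is the number of points in x-bin i and y-bin j.
counts, x_edges, y_edges = np.histogram2d(x, y, bins=2)
print(counts)
```

Five points fall in the lower-left cell and one in the upper-right cell; seaborn picks its bin sizes automatically, but the counting idea is the same.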

In [None]:
sns.jointplot(data = data_mini,
              x='x', 
              y='y', 
              kind='kde')

In [None]:
sns.jointplot(data = data_mini,
              x='x', 
              y='y', 
              kind='kde',
              hue='race')

In [None]:
# combining seaborn charts
g = sns.jointplot(data = data_mini,
                  x='x', 
                  y='y', 
                  hue='race',
                  s=10,
                  alpha=0.5)
g.plot_joint(sns.kdeplot, 
             hue='race')

## Heatmap

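The `kde` jointplot computes a kernel density estimate of where points concentrate. Below, we draw the same kind of estimate with `sns.kdeplot` on its own axis so that it can be layered over a basemap as a heatmap.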

In [None]:
# Set up figure and axis
f, ax = plt.subplots(1, figsize=(9, 9))

# Generate and add a KDE of incidents with victim descent "H":
# 20 unfilled contour levels, 50% transparency, and the 'Reds' colormap
# (`levels` and `fill` replace the deprecated `n_levels` and `shade`)
sns.kdeplot(x=df[df.race=='H'].x, 
            y=df[df.race=='H'].y,
            levels=20, 
            fill=False,
            alpha=0.5, 
            cmap='Reds',
            ax=ax)

# Remove axes
ax.set_axis_off()

# add a basemap
ctx.add_basemap(ax,source=ctx.providers.CartoDB.DarkMatter)
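A kernel density estimate places a smooth kernel (here a Gaussian) on every point and adds them up, so places with many nearby points get high density. The NumPy sketch below evaluates a 2D Gaussian KDE on a grid with a single, hand-picked bandwidth `h`. That fixed, circular bandwidth is an assumption made for clarity: seaborn chooses the bandwidth automatically (Scott's rule by default) and shapes it with the data's covariance. The points are synthetic.

```python
import numpy as np

def gaussian_kde_2d(points, grid_x, grid_y, h):
    """Evaluate an isotropic 2D Gaussian KDE with bandwidth h at every grid node."""
    gx, gy = np.meshgrid(grid_x, grid_y)  # grid node coordinates, shape (ny, nx)
    # Offsets from every grid node to every point: shape (ny, nx, n_points)
    dx = gx[..., None] - points[:, 0]
    dy = gy[..., None] - points[:, 1]
    d2 = dx**2 + dy**2
    # Each kernel is a normalized 2D Gaussian, so the average integrates to 1.
    kernels = np.exp(-d2 / (2 * h**2)) / (2 * np.pi * h**2)
    return kernels.sum(axis=-1) / len(points)

rng = np.random.default_rng(0)
# Synthetic points in meters: a tight cluster near (0, 0) plus a sparse scatter.
cluster = rng.normal(0, 200, size=(200, 2))
scatter = rng.uniform(-3000, 3000, size=(20, 2))
pts = np.vstack([cluster, scatter])

grid = np.linspace(-4000, 4000, 81)  # 100 m spacing
density = gaussian_kde_2d(pts, grid, grid, h=300)

# The densest grid node should sit inside the cluster.
iy, ix = np.unravel_index(density.argmax(), density.shape)
print(grid[ix], grid[iy])
```

Contouring `density` at a handful of levels is essentially what the heatmap above draws; a larger `h` gives smoother, broader hot spots.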

## Centrography

In [None]:
# create x and y columns from the geometry (same as above, repeated so this section runs on its own)
df['x'] = df.geometry.x
df['y'] = df.geometry.y

In [None]:
# compute the mean and median centers
mean_center = centrography.mean_center(df[['x','y']])
med_center = centrography.euclidean_median(df[['x','y']])
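The two centers answer slightly different questions. The mean center is the average x and average y. The Euclidean median is the location that minimizes the total straight-line distance to all points, so outliers pull it less. The NumPy sketch below computes both. For the median, it uses Weiszfeld's iterative algorithm, a standard method for this problem; this is not necessarily how `pointpats` computes it. The five points are made up.

```python
import numpy as np

def mean_center(points):
    """Average x and average y."""
    return points.mean(axis=0)

def euclidean_median(points, tol=1e-9, max_iter=1000):
    """Weiszfeld's algorithm: repeatedly take an inverse-distance-weighted average."""
    current = points.mean(axis=0)  # start from the mean center
    for _ in range(max_iter):
        dist = np.linalg.norm(points - current, axis=1)
        dist = np.maximum(dist, 1e-12)  # avoid division by zero at a data point
        weights = 1.0 / dist
        new = (points * weights[:, None]).sum(axis=0) / weights.sum()
        if np.linalg.norm(new - current) < tol:
            return new
        current = new
    return current

# Four points on a unit square plus one far-away outlier.
pts = np.array([[0, 0], [1, 0], [0, 1], [1, 1], [100, 100]], dtype=float)
print(mean_center(pts))       # pulled strongly toward the outlier
print(euclidean_median(pts))  # stays near the square
```

The mean center lands far outside the square, while the median stays inside it. This is why the two markers on the maps below can sit apart when incidents are unevenly spread.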

In [None]:
print(mean_center[1])

In [None]:
# Set up figure and axis
f, ax = plt.subplots(1, figsize=(9, 9))

# Plot points
ax.scatter(df['x'], df['y'], s=0.75)
ax.scatter(*mean_center, color='red', marker='x', label='Mean Center')
ax.scatter(*med_center, color='limegreen', marker='o', label='Median Center')

ax.legend()

# add a basemap
ctx.add_basemap(ax,source=ctx.providers.CartoDB.DarkMatter)
# Display
plt.show()

In [None]:
centrography.std_distance(df[['x','y']])
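The standard distance summarizes spread in a single number: the square root of the average squared distance from each point to the mean center, expressed in the units of the projection (meters here). A minimal NumPy version of that usual definition, on four made-up points:

```python
import numpy as np

def standard_distance(points):
    """Root-mean-square distance of the points from their mean center."""
    center = points.mean(axis=0)
    sq_dist = ((points - center) ** 2).sum(axis=1)  # squared distance per point
    return np.sqrt(sq_dist.mean())

# Four points 1 unit from (0, 0) in each compass direction.
pts = np.array([[1, 0], [-1, 0], [0, 1], [0, -1]], dtype=float)
print(standard_distance(pts))  # 1.0
```

Scaling all coordinates by a factor scales the standard distance by the same factor, so a value of a few thousand here means incidents typically lie a few kilometers from the mean center.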

In [None]:
major, minor, rotation = centrography.ellipse(df[['x','y']])
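The standard deviational ellipse extends the standard distance to two directions: a long axis along the direction of greatest spread, a short axis perpendicular to it, and a rotation. One common way to sketch it, shown below, takes the eigenvalues and eigenvectors of the coordinates' covariance matrix. This is an assumed, simplified formulation for intuition only. `centrography.ellipse` uses its own formula, so its axis lengths and angle convention may differ from these numbers. The points are synthetic.

```python
import numpy as np

def covariance_ellipse(points):
    """Semi-axes (one standard deviation) and rotation in radians from the covariance matrix."""
    cov = np.cov(points, rowvar=False, bias=True)  # 2 x 2 covariance of x and y
    eigvals, eigvecs = np.linalg.eigh(cov)         # eigenvalues in ascending order
    minor, major = np.sqrt(eigvals)                # standard deviation along each axis
    vx, vy = eigvecs[:, 1]                         # direction of greatest spread
    rotation = np.arctan2(vy, vx)                  # counterclockwise angle from the x-axis
    return major, minor, rotation

rng = np.random.default_rng(1)
# Synthetic points stretched along a line at 45 degrees.
t = rng.normal(0, 10, 500)
noise = rng.normal(0, 1, 500)
pts = np.column_stack([t + noise, t - noise])

major, minor, rotation = covariance_ellipse(pts)
print(round(major, 1), round(minor, 1), round(np.degrees(rotation) % 180))
```

To draw an ellipse from semi-axes like these, matplotlib's `Ellipse` expects full widths and an angle in degrees, which is why the cells below pass `major*2`, `minor*2`, and `rad2deg(rotation)`.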

In [None]:
from matplotlib.patches import Ellipse
import numpy

In [None]:
# filter the data by race
race = 'W'
crime_filtered = df[df.race == race]

# mean center and median
mean_center = centrography.mean_center(crime_filtered[['x','y']])
med_center = centrography.euclidean_median(crime_filtered[['x','y']])

# standard ellipse
major, minor, rotation = centrography.ellipse(crime_filtered[['x','y']])

# Set up figure and axis
f, ax = plt.subplots(1, figsize=(9, 9))

# plot arrest points
ax.scatter(crime_filtered['x'], crime_filtered['y'], s=0.75)

# add the mean and median center points
ax.scatter(*mean_center, color='red', marker='x', label='Mean Center')
ax.scatter(*med_center, color='limegreen', marker='o', label='Median Center')

# heatmap
sns.kdeplot(x = crime_filtered.geometry.x, 
            y = crime_filtered.geometry.y,
            levels=20, 
            fill=False,
            thresh=0.1,
            alpha=0.3, 
            cmap='Reds', 
            ax=ax)

# Construct the standard ellipse using matplotlib
ellipse = Ellipse(xy=mean_center, # center the ellipse on our mean center
                  width=major*2, # centrography.ellipse only gives half the axis
                  height=minor*2, 
                  angle = numpy.rad2deg(rotation), # Angles for this are in degrees, not radians
                  facecolor='none', 
                  edgecolor='red', linestyle='--',
                  label='Std. Ellipse')

ax.add_patch(ellipse)

ax.legend()

ax.axis('off')

ax.set_title(str(len(crime_filtered)) + ' incidents of crime with reported victim descent "' + race + '"')

# add a basemap
ctx.add_basemap(ax,source=ctx.providers.CartoDB.DarkMatter)
# Display
plt.show()

In [None]:
@interact
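def crime_ellipse(race=df.race.unique().tolist()):
    # filter the data by race
    crime_filtered = df[df.race == race]

    # the KDE and the ellipse need several points; skip very small groups
    if len(crime_filtered) < 3:
        print(f'Only {len(crime_filtered)} incident(s) with reported victim descent "{race}"; skipping.')
        return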

    # mean center and median
    mean_center = centrography.mean_center(crime_filtered[['x','y']])
    med_center = centrography.euclidean_median(crime_filtered[['x','y']])

    # standard ellipse
    major, minor, rotation = centrography.ellipse(crime_filtered[['x','y']])

    # Set up figure and axis
    f, ax = plt.subplots(1, figsize=(9, 9))

    # plot arrest points
    ax.scatter(crime_filtered['x'], crime_filtered['y'], s=0.75)

    # add the mean and median center points
    ax.scatter(*mean_center, color='red', marker='x', label='Mean Center')
    ax.scatter(*med_center, color='limegreen', marker='o', label='Median Center')

    # heatmap
    sns.kdeplot(x = crime_filtered.geometry.x, 
                y = crime_filtered.geometry.y,
                levels=20, 
                fill=False,
                thresh=0.1,
                alpha=0.3, 
                cmap='Reds', 
                ax=ax)

    # Construct the standard ellipse using matplotlib
    ellipse = Ellipse(xy=mean_center, # center the ellipse on our mean center
                      width=major*2, # centrography.ellipse only gives half the axis
                      height=minor*2, 
                      angle = numpy.rad2deg(rotation), # Angles for this are in degrees, not radians
                      facecolor='none', 
                      edgecolor='red', linestyle='--',
                      label='Std. Ellipse')

    ax.add_patch(ellipse)

    ax.legend()

    ax.axis('off')

    ax.set_title(str(len(crime_filtered)) + ' incidents of crime with reported victim descent "' + race + '"')

    # add a basemap
    ctx.add_basemap(ax,source=ctx.providers.CartoDB.DarkMatter)
    # Display
    plt.show()

In [None]:
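# plot every descent group in turn; @interact returns the original function,
# so crime_ellipse can also be called directly
races = df.race.unique().tolist()
for race in races:
    crime_ellipse(race)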