# Spatial Joins

Spatial joins are what make place-based exploration meaningful. Your project may need to investigate how points of interest are distributed across space. Are hospitals, schools, police stations, and fire stations located where they can effectively serve their populations? How many crimes are recorded in each neighborhood? Which census tracts have the highest counts of traffic incidents, and what are the characteristics of those tracts?

So how do spatial joins work? Unlike an attribute join, a "spatial join" joins two spatial datasets based on where their features are located relative to each other. For example, if you have a point dataset that you want to join to a polygon dataset, you can spatially join them to produce a new layer that tells you which polygon each point falls inside. Or vice versa! You can find out *how many* points fall inside each polygon.
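
To make the idea concrete, here is a minimal from-scratch sketch of a point-in-polygon join in plain Python. It is only an illustration of the concept, not how geopandas implements `sjoin`; the polygon names, coordinates, and the `point_in_polygon` helper are all made up for this example.

```python
from collections import Counter

def point_in_polygon(point, polygon):
    """Ray-casting test: True if point (x, y) lies inside polygon,
    where polygon is a list of (x, y) vertices."""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # does this edge straddle the horizontal line through the point?
        if (y1 > y) != (y2 > y):
            # x coordinate where the edge crosses that horizontal line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

# two hypothetical square "neighborhoods" and four hypothetical points
polygons = {
    'west': [(0, 0), (2, 0), (2, 2), (0, 2)],
    'east': [(2, 0), (4, 0), (4, 2), (2, 2)],
}
points = [(0.5, 0.5), (1.5, 1.0), (3.0, 1.0), (5.0, 5.0)]

# direction 1: which polygon does each point fall inside?
point_to_polygon = {}
for pt in points:
    for name, poly in polygons.items():
        if point_in_polygon(pt, poly):
            point_to_polygon[pt] = name
            break

# direction 2: how many points fall inside each polygon?
counts = Counter(point_to_polygon.values())
print(point_to_polygon)
print(counts)
```

Points that fall outside every polygon, such as `(5.0, 5.0)` here, simply drop out of the result, which is the behavior of an inner join.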

In this lab, we want to ask the question: Are there spatial correlations for different travel behaviors within different Los Angeles neighborhoods?

To answer this question, we will look at two different datasets:

1. Census data with data for "Means of transportation to work"
2. Neighborhood boundaries from the Los Angeles Times

In [1]:
import pandas as pd
import geopandas as gpd
import contextily as ctx

## Census Tracts

Use [Census Reporter](https://censusreporter.org/) to grab census data at the tract level.

In [2]:
tracts = gpd.read_file('trans.geojson')

In [None]:
tracts.head()

In [None]:
# first row is the total for the county so drop it
tracts=tracts.drop([0])

In [None]:
# look at tracts data again
tracts.head()

In [None]:
tracts.shape

## Metadata
What is the metadata? When you download data from Census Reporter (censusreporter.org), it comes with a metadata.json file. You can open this with any text editor (even your browser) to see its contents.
- [metadata](metadata.json)

```
B08105A001: {
indent: 0,
name: "Total:"
},
B08105A002: {
indent: 1,
name: "Car, truck, or van - drove alone"
},
B08105A003: {
indent: 1,
name: "Car, truck, or van - carpooled"
},
B08105A004: {
indent: 1,
name: "Public transportation (excluding taxicab)"
},
B08105A005: {
indent: 1,
name: "Walked"
},
B08105A006: {
indent: 1,
name: "Taxicab, motorcycle, bicycle, or other means"
},
B08105A007: {
indent: 1,
name: "Worked at home"
}
```
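
The metadata can also be used to build the human-readable column names programmatically. The sketch below hand-copies the excerpt above into a Python dict (`columns_meta` is a name chosen for this example) and assumes that each margin-of-error column is named as the column code followed by `, Error`; check this against your own column list before relying on it.

```python
# excerpt of the column metadata shown above, hand-copied into a dict
columns_meta = {
    'B08105A001': {'indent': 0, 'name': 'Total:'},
    'B08105A002': {'indent': 1, 'name': 'Car, truck, or van - drove alone'},
    'B08105A003': {'indent': 1, 'name': 'Car, truck, or van - carpooled'},
    'B08105A004': {'indent': 1, 'name': 'Public transportation (excluding taxicab)'},
    'B08105A005': {'indent': 1, 'name': 'Walked'},
    'B08105A006': {'indent': 1, 'name': 'Taxicab, motorcycle, bicycle, or other means'},
    'B08105A007': {'indent': 1, 'name': 'Worked at home'},
}

rename_map = {}
for code, info in columns_meta.items():
    label = info['name'].rstrip(':')  # drop the trailing colon on "Total:"
    rename_map[code] = label
    # assumption: error columns are named "<code>, Error"
    rename_map[code + ', Error'] = label + ', Error'

for old, new in rename_map.items():
    print(old, '->', new)
```

A mapping like this could then be passed to `tracts.rename(columns=rename_map)` instead of retyping every column name by hand.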

In [None]:
# columns
tracts.columns.to_list()

In [None]:
# rename to human readable column names
tracts.columns=['geoid',
 'name',
 'Total',
 'Total, Error',
 'Drove alone',
 'Drove alone, Error',
 'Carpooled',
 'Carpooled, Error',
 'Public transportation',
 'Public transportation, Error',
 'Walked',
 'Walked, Error',
 'Other',
 'Other, Error',
 'Worked from home',
 'Worked from home, Error',
 'geometry']

In [None]:
# get rid of the error columns
tracts = tracts[['geoid',
 'name',
 'Total',
 'Drove alone',
 'Carpooled',
 'Public transportation',
 'Walked',
 'Other',
 'Worked from home',
 'geometry']]

In [None]:
tracts.sample(5)

## Map plot

In [None]:
ax = tracts.plot(figsize=(12,12),
                 column='Public transportation',
                 legend=True,
                 scheme='equal_interval')

## Clean it up!
- add a basemap
- remove the axis
- add a title

In [None]:
# add a basemap with contextily

# 1. first reproject to web mercator
tracts_web_mercator = tracts.to_crs(epsg=3857)

In [None]:
ax = tracts_web_mercator.plot(figsize=(15,15),
                 column='Public transportation',
                 legend=True,
                 alpha=0.8,
#                  scheme='equal_interval'
                             )

# remove the axis
ax.axis('off')

# add a title
ax.set_title('Public Transportation Users in Los Angeles')

ctx.add_basemap(ax,source=ctx.providers.CartoDB.Positron)

## Bubble map

### Using centroids

In [None]:
# how about centroids?
tracts_web_mercator['centroid'] = tracts_web_mercator['geometry'].centroid

In [None]:
tracts_web_mercator.head()

In [None]:
# switch the geometry column from polygon to centroid
tracts_web_mercator = tracts_web_mercator.set_geometry('centroid')

In [None]:
tracts_web_mercator.plot(figsize=(12,12))

In [None]:
ax = tracts_web_mercator.plot(figsize=(15,15),
                 markersize='Public transportation',
                 column='Public transportation',
                 alpha=0.4, 
                 legend=True,
#                  categorical=True,
#                  scheme='quantiles',
                 cmap='RdYlGn_r',
#                  legend_kwds={'loc':'upper left','bbox_to_anchor':(1,1)}
                )
# remove the axis
ax.axis('off')

# add a title
ax.set_title('Public Transportation Users in Los Angeles')

ctx.add_basemap(ax,source=ctx.providers.CartoDB.Positron)

## LA Neighborhoods

Bring in neighborhood boundaries from the LA Times.


In [None]:
neighborhoods = gpd.read_file('http://s3-us-west-2.amazonaws.com/boundaries.latimes.com/archive/1.0/boundary-set/la-county-neighborhoods-v5.geojson')

What is the coordinate system?

In [None]:
neighborhoods.crs

In [None]:
# reproject to Web Mercator
neighborhoods_web_mercator = neighborhoods.to_crs(epsg=3857)

In [None]:
# map it
ax = neighborhoods_web_mercator.plot(figsize=(12,12),alpha=0.8)
ax.axis('off')
ctx.add_basemap(ax,source=ctx.providers.CartoDB.Positron)

## Unique neighborhoods

In [None]:
neighborhoods_web_mercator.name.unique()

## One neighborhood at a time

In [None]:
neighborhoods_web_mercator[neighborhoods_web_mercator.name=='Westwood']

In [None]:
westwood = neighborhoods_web_mercator[neighborhoods_web_mercator.name=='Westwood']

In [None]:
ax = westwood.plot(figsize=(12,12),alpha=0.6)
ax.axis('off')
ctx.add_basemap(ax,source=ctx.providers.CartoDB.Positron)

## Spatial join: find census tracts within a neighborhood

In [None]:
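# find the census tracts whose centroids intersect Westwood
# (`predicate` replaces the deprecated `op` argument in geopandas >= 0.10)
tracts_in_neighborhood = gpd.sjoin(tracts_web_mercator, westwood, how="inner", predicate='intersects')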

In [None]:
tracts_in_neighborhood.head()

In [None]:
ax = tracts_in_neighborhood.plot(figsize=(15,15),
                 markersize='Public transportation',
                 column='Public transportation',
                 alpha=0.4, 
                 legend=True,
#                  categorical=True,
#                  scheme='quantiles',
                 cmap='RdYlGn_r',
#                  legend_kwds={'loc':'upper left','bbox_to_anchor':(1,1)}
                )
# remove the axis
ax.axis('off')

# add a title
ax.set_title('Public Transportation Users in Los Angeles')

ctx.add_basemap(ax,source=ctx.providers.CartoDB.Positron)

In [None]:
# put the geometry back to polygon
tracts_in_neighborhood = tracts_in_neighborhood.set_geometry('geometry')

In [None]:
ax = tracts_in_neighborhood.plot(figsize=(15,15),
                 markersize='Public transportation',
                 column='Public transportation',
                 alpha=0.4, 
                 legend=True,
                 cmap='RdYlGn_r'
                )
# remove the axis
ax.axis('off')

# add a title
ax.set_title('Public Transportation Users in Los Angeles')

ctx.add_basemap(ax,source=ctx.providers.CartoDB.Positron)

## Create a function

In [None]:
tracts_web_mercator.plot()

In [None]:
# create a function
def neighborhood_tracts(name='Westwood'):
    # subset neighborhoods by name
    neighborhood = neighborhoods_web_mercator[neighborhoods_web_mercator.name==name]
    
    # spatial join to get tracts whose centroids intersect the neighborhood
    # (`predicate` replaces the deprecated `op` argument in geopandas >= 0.10)
    tracts_in_neighborhood = gpd.sjoin(tracts_web_mercator,neighborhood, how="inner", predicate='intersects')
    
    # switch the geometry column from centroid back to polygon
    tracts_in_neighborhood = tracts_in_neighborhood.set_geometry('geometry')

    ax = tracts_in_neighborhood.plot(figsize=(15,15),
                     markersize='Public transportation',
                     column='Public transportation',
                     alpha=0.4, 
                     legend=True,
                     cmap='RdYlGn_r'
                    )
    # remove the axis
    ax.axis('off')

    # add a title
    ax.set_title('Public Transportation Users in '+name)

    ctx.add_basemap(ax,source=ctx.providers.CartoDB.Positron)

    # return the joined tracts so they can be reused later
    return tracts_in_neighborhood

In [None]:
downtown = neighborhood_tracts('Downtown')
downtown

In [None]:
koreatown = neighborhood_tracts('Koreatown')

# Spatial Autocorrelation

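Tobler's first law of geography: "Everything is related to everything else, but near things are more related than distant things."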

How similar are census tracts to their neighbors based on their usage of public transportation?

In [None]:
import esda
from esda.moran import Moran, Moran_Local

In [None]:
import splot
from splot.esda import moran_scatterplot, plot_moran, lisa_cluster

import libpysal as lps

To calculate Queen contiguity spatial weights, we use PySAL.

In [None]:
w =  lps.weights.Queen.from_dataframe(tracts)
w.transform = 'r'

## Spatial Weights and Spatial Lag
Spatial weights are how we determine each area’s neighborhood. There are several statistical methods for determining spatial weights, and an in-depth explanation of each is beyond the scope of this lab. One of the most commonly used is the Queen contiguity matrix, which we use here. The diagram below shows how Queen contiguity works (Rook contiguity is included for comparison).

![Queen](https://www.researchgate.net/profile/Matthew_Tenney/publication/304782766/figure/fig8/AS:380175423426567@1467652292591/Rooks-vs-Queens-Contiguity.png)
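
The difference between the two rules is easiest to see on a regular grid. The sketch below is a toy illustration in plain Python (the `grid_neighbors` helper is hypothetical and not part of PySAL): Queen contiguity counts cells that share an edge or a corner, while Rook contiguity counts only cells that share an edge.

```python
def grid_neighbors(row, col, n_rows, n_cols, rule='queen'):
    """Return the neighbor cells of (row, col) on a regular grid."""
    neighbors = []
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            if dr == 0 and dc == 0:
                continue  # a cell is not its own neighbor
            if rule == 'rook' and dr != 0 and dc != 0:
                continue  # rook skips diagonal (corner-only) cells
            r, c = row + dr, col + dc
            if 0 <= r < n_rows and 0 <= c < n_cols:
                neighbors.append((r, c))
    return neighbors

# center and corner cells of a 3x3 grid
center_queen = grid_neighbors(1, 1, 3, 3, 'queen')
center_rook = grid_neighbors(1, 1, 3, 3, 'rook')
corner_queen = grid_neighbors(0, 0, 3, 3, 'queen')
corner_rook = grid_neighbors(0, 0, 3, 3, 'rook')

print('center:', len(center_queen), 'queen vs', len(center_rook), 'rook')
print('corner:', len(corner_queen), 'queen vs', len(corner_rook), 'rook')
```

Census tracts are irregular polygons rather than grid cells, but the same logic applies: under the Queen rule, two tracts are neighbors if their boundaries touch at even a single point.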

Next, calculate the spatial lag. In other words, get the average of the values from neighboring tracts as defined by the contiguity weights above.

In [None]:
tracts['w_public'] = lps.weights.lag_spatial(w, tracts['Public transportation'])

In [None]:
tracts[['Public transportation','w_public']].sample(5)
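
To see what a row-standardized spatial lag amounts to, here is a minimal sketch in plain Python on four hypothetical areas arranged in a line (0-1-2-3). The `neighbors` dict, the `values` list, and the `spatial_lag` helper are invented for this example; with row-standardized weights, the lag of each area is simply the mean of its neighbors' values.

```python
# hypothetical neighbor lists: area 0 touches 1, area 1 touches 0 and 2, etc.
neighbors = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}

# hypothetical counts of public transportation users per area
values = [10.0, 20.0, 60.0, 100.0]

def spatial_lag(neighbors, values):
    """Row-standardized spatial lag: the mean value of each area's neighbors."""
    lag = []
    for i in range(len(values)):
        nbrs = neighbors[i]
        lag.append(sum(values[j] for j in nbrs) / len(nbrs))
    return lag

lag = spatial_lag(neighbors, values)
for v, l in zip(values, lag):
    print('value:', v, 'spatial lag:', l)
```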

In [None]:
# plotly express is needed for the interactive scatter plot
import plotly.express as px

px.scatter(tracts,x='Public transportation',y='w_public')

## Global Spatial Autocorrelation
Global spatial autocorrelation describes the overall pattern in the dataset: whether similar values tend to cluster together across the whole study area. Moran’s I statistic is typically used to measure global spatial autocorrelation, so let us calculate that.


In [None]:
y = tracts['Public transportation']
moran = Moran(y, w)
moran.I
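
For intuition about what the number means, the sketch below computes Moran's I by hand in plain Python for row-standardized weights, where the statistic reduces to the sum of each deviation times its spatial lag, divided by the sum of squared deviations. The `morans_i` helper and the four-area chain are hypothetical, and the sketch omits the permutation-based inference that `esda` provides.

```python
def morans_i(values, neighbors):
    """Global Moran's I, assuming row-standardized weights."""
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]  # deviations from the mean

    numerator = 0.0
    for i in range(n):
        nbrs = neighbors[i]
        # spatial lag of the deviations: mean deviation of the neighbors
        lag_z = sum(z[j] for j in nbrs) / len(nbrs)
        numerator += z[i] * lag_z

    denominator = sum(zi * zi for zi in z)
    return numerator / denominator

# four hypothetical areas in a line: 0-1-2-3
neighbors = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}

i_clustered = morans_i([1, 1, 5, 5], neighbors)    # similar values side by side
i_alternating = morans_i([1, 5, 1, 5], neighbors)  # checkerboard-like pattern

print('clustered:', i_clustered)
print('alternating:', i_alternating)
```

Positive values indicate that similar values sit next to each other, and negative values indicate that neighbors tend to be dissimilar.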

In [None]:
fig, ax = moran_scatterplot(moran, aspect_equal=True)
# plt.show()

## Local Spatial Autocorrelation
So far, we have only measured the global spatial autocorrelation of public transportation use across census tracts. But we have not detected where the clusters are. Local Indicators of Spatial Association (LISA) are used to do that. LISA classifies areas into four groups: high values near high values (HH), low values near low values (LL), low values surrounded by high values (LH), and high values surrounded by low values (HL).
We have already calculated the weights (w) and set public transportation use as our variable of interest (y). To calculate Moran Local, we use PySAL’s functionality.

In [None]:
# calculate Moran Local 
m_local = Moran_Local(y, w)
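
The four groups can be illustrated with a small sketch in plain Python. It only assigns quadrant labels by comparing each area's deviation from the mean with the mean deviation of its neighbors; it does not perform the significance testing that `Moran_Local` does. The `quadrant_labels` helper and the six-area chain are hypothetical.

```python
def quadrant_labels(values, neighbors):
    """Label each area HH, LL, LH, or HL (own value first, neighbors second)."""
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]  # deviations from the mean

    labels = []
    for i in range(n):
        nbrs = neighbors[i]
        lag_z = sum(z[j] for j in nbrs) / len(nbrs)
        own = 'H' if z[i] > 0 else 'L'       # is this area above the mean?
        around = 'H' if lag_z > 0 else 'L'   # are its neighbors above the mean?
        labels.append(own + around)
    return labels

# six hypothetical areas in a line: 0-1-2-3-4-5
neighbors = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4]}
values = [1, 1, 9, 1, 8, 9]

labels = quadrant_labels(values, neighbors)
print(labels)
```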

And plot Moran’s Local Scatter Plot.

In [None]:
# Plot
fig, ax = moran_scatterplot(m_local, p=0.005)
ax.set_xlabel('Uses Public Transportation')
ax.set_ylabel('Spatial Lag of Public Transportation')
ax.text(1.95, 0.5, 'HH', fontsize=25)
ax.text(1.95, -1.5, 'HL', fontsize=25)
ax.text(-2, 1, 'LH', fontsize=25)
ax.text(-1, -1, 'LL', fontsize=25)
fig.show()

In [None]:
from splot.esda import lisa_cluster

In [None]:
lisa_cluster(m_local, tracts, p=0.05, figsize = (18,18))
# plt.show()

In [None]:
# prefer a choropleth?
tracts = tracts.set_geometry('geometry')
tracts.plot()

In [None]:
fig,ax = lisa_cluster(m_local, tracts, p=0.05, figsize = (18,18))
# plt.show()

In [None]:
from splot.esda import plot_local_autocorrelation

In [None]:
fig,ax = plot_local_autocorrelation(m_local, tracts, 'Public transportation')

In [None]:
def SA_by_neighborhood(name,variable):
    # neighborhood_tracts must return the joined tracts for this to work
    neighborhood = neighborhood_tracts(name)
    w =  lps.weights.Queen.from_dataframe(neighborhood)
    w.transform = 'r'
    y = neighborhood[variable]
    moran = Moran(y, w)
    # a bare expression shows nothing inside a function, so print it
    print("Moran's I:", moran.I)
    # calculate Moran Local 
    m_local = Moran_Local(y, w)
    plot_local_autocorrelation(m_local, neighborhood, variable)
    

In [None]:
SA_by_neighborhood('Long Beach',variable='Drove alone')

In [None]:
SA_by_neighborhood('Koreatown',variable='Drove alone')