# Does it correlate?

In this part, you will explore spatial autocorrelation on your own.

## Scottish Index of Multiple Deprivation again

In the exercise on pandas, you worked with the Scottish Index of Multiple Deprivation (SIMD). Because you had not yet learned how to work with spatial data, no geometry was assigned. That will change today.

1. Download the ESRI Shapefile version of the Scottish Index of Multiple Deprivation 2020 from this link.
2. Read it as a GeoDataFrame and assign a column you think would be the best as an index.
3. Back up the data.
4. Filter the data to work only with Glasgow.
5. Create contiguity weights based on the reduced dataset.

In [None]:
import geopandas as gpd
import esda
import matplotlib.pyplot as plt
import seaborn as sns
from libpysal import graph

In [None]:
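# Path to the downloaded shapefile; adjust it to your own location.
data = r'C:\Computation\Scripts\Packages\sds-zapletalj-cze\05_esda\data\SG_SIMD_2020.shp'
gdf_simd = gpd.read_file(data)
# Use the DataZone code of each area as the index.
gdf_simd = gdf_simd.set_index("DataZone")

# Keep only Glasgow. The .copy() makes an independent GeoDataFrame, so columns
# added later do not trigger pandas' SettingWithCopyWarning.
gdf_simd_glasgow = gdf_simd.loc[gdf_simd['LAName'] == 'Glasgow City'].copy()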


## Global spatial autocorrelation

With the data prepared like this, work through the two measures below.

### Join Counts

- Create a binary variable from "Rankv2" encoding areas with a rank above the city-wide mean.
- Measure the Join Counts statistic on your new variable.
- What conclusions can you reach from the Join Counts?


In [None]:
# Queen contiguity (rook=False): areas sharing an edge or a vertex are neighbours.
contiguity = graph.Graph.build_contiguity(gdf_simd_glasgow, rook=False)

# Binary variable: 1 where the rank is above the city-wide mean, 0 otherwise.
mean_rank = gdf_simd_glasgow['Rankv2'].mean()
gdf_simd_glasgow.loc[:, "bin_rank"] = (gdf_simd_glasgow["Rankv2"] > mean_rank).astype(int)

gdf_simd_glasgow_join_counts = esda.Join_Counts(
    gdf_simd_glasgow["bin_rank"],
    contiguity,
)
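
To read the Join Counts result, it helps to see what the statistic actually counts. The sketch below is a toy illustration in plain Python, not the `esda` implementation: the six areas, their values and the `neighbours` adjacency list are all made up for the example. It counts how many neighbouring pairs are 1-1 ("BB"), 0-0 ("WW") or mixed ("BW").

```python
# Toy example: six areas in a row, each bordering its immediate neighbours.
# 1 = rank above the mean ("black"), 0 = rank at or below the mean ("white").
values = {"a": 1, "b": 1, "c": 1, "d": 0, "e": 0, "f": 1}

# Hypothetical adjacency list standing in for the contiguity graph.
neighbours = {
    "a": ["b"],
    "b": ["a", "c"],
    "c": ["b", "d"],
    "d": ["c", "e"],
    "e": ["d", "f"],
    "f": ["e"],
}


def join_counts(values, neighbours):
    """Count BB, WW and BW joins, visiting each undirected pair once."""
    counts = {"BB": 0, "WW": 0, "BW": 0}
    seen = set()
    for focal, adjacent in neighbours.items():
        for other in adjacent:
            pair = frozenset((focal, other))
            if pair in seen:
                continue  # this join was already counted from the other side
            seen.add(pair)
            total = values[focal] + values[other]
            counts[{2: "BB", 0: "WW", 1: "BW"}[total]] += 1
    return counts


counts = join_counts(values, neighbours)
print(counts)
```

Many more BB (or WW) joins than a random reshuffling of the same values would produce points to positive spatial autocorrelation; an excess of BW joins points to negative autocorrelation. `esda.Join_Counts` makes that comparison through permutations and, as far as I know, exposes the observed counts as `bb`, `ww` and `bw`.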




### Moran’s I

- Visualise the main "Rankv2" with a Moran Plot.
- Calculate Moran’s I.
- What conclusions can you reach from the Moran Plot and Moran’s I? What’s the main spatial pattern? Does it agree with the Join Counts?

In [None]:
# data standardization
gdf_simd_glasgow["std_Rankv2"] = (
    gdf_simd_glasgow["Rankv2"] - gdf_simd_glasgow["Rankv2"].mean()
) / gdf_simd_glasgow["Rankv2"].std()

# transform contiguity
contiguity_r = contiguity.transform("r")

gdf_simd_glasgow["std_lag_Rankv2"] = contiguity_r.lag(gdf_simd_glasgow["std_Rankv2"])

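# Moran's I, using the same row-standardised weights as the spatial lag above
moran = esda.Moran(gdf_simd_glasgow["Rankv2"], contiguity_r)

# Moran Plot: standardised values against their spatial lag
f, ax = plt.subplots(1, figsize=(6, 6))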
sns.regplot(
    x="std_Rankv2",
    y="std_lag_Rankv2",
    data=gdf_simd_glasgow,
    marker=".",
    scatter_kws={"alpha": 0.2},
    line_kws=dict(color="lightcoral")
)
ax.set_aspect('equal')
plt.axvline(0, c="black", alpha=0.5)
plt.axhline(0, c="black", alpha=0.5)
plt.text(2.3, 2.7, "High-high", fontsize=10)
plt.text(2.3, -2.7, "High-low", fontsize=10)
plt.text(-4.4, 2.7, "Low-high", fontsize=10)
plt.text(-4.4, -2.7, "Low-low", fontsize=10)
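
The link between the Moran Plot and Moran’s I can be checked by hand. With row-standardised weights, Moran’s I equals the slope of the line fitted through the plot: the sum of each deviation times its spatial lag, divided by the sum of squared deviations. The sketch below uses made-up values and a hypothetical `neighbours` mapping for six areas in a row; it is an illustration of the formula, not a replacement for `esda.Moran`, and it gives no significance test.

```python
def morans_i(values, neighbours):
    """Moran's I with row-standardised weights, computed from first principles."""
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]  # deviations from the mean
    # Spatial lag: the average deviation among each area's neighbours.
    lag = [sum(z[j] for j in neighbours[i]) / len(neighbours[i]) for i in range(n)]
    numerator = sum(zi * li for zi, li in zip(z, lag))
    denominator = sum(zi * zi for zi in z)
    return numerator / denominator


# Hypothetical chain of six areas: each borders the previous and the next one.
neighbours = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4]}

clustered = [10.0, 12.0, 11.0, 3.0, 2.0, 4.0]  # high values next to high values
alternating = [1.0, 9.0, 1.0, 9.0, 1.0, 9.0]   # high values next to low values

i_clustered = morans_i(clustered, neighbours)
i_alternating = morans_i(alternating, neighbours)
print(i_clustered, i_alternating)
```

The clustered series yields a clearly positive value and the alternating series a negative one, which mirrors the upward or downward slope you would see in the corresponding Moran Plot.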

## Local spatial autocorrelation

Now that you have a good sense of the overall pattern in the SIMD dataset, let’s move to the local scale:

1. Calculate LISA statistics for the areas.
2. Make a map of significant clusters at the 5% level.
3. Can you identify hotspots or coldspots? If so, what do they mean? What about spatial outliers?
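- Hot spots (High-High): areas with high values of the variable surrounded by neighbours with high values. Since SIMD rank 1 marks the most deprived data zone, these are the least deprived areas. In Glasgow, such clusters appear around Kelvinside and Crossmyloof.
- Cold spots (Low-Low): areas with low values of the variable surrounded by neighbours with low values, i.e. the most deprived areas. In Glasgow, such clusters appear around Barrowfield and Possilpark.
- Spatial outliers (High-Low, Low-High): areas whose values differ markedly from those of their neighbours.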



In [None]:
lisa = esda.Moran_Local(gdf_simd_glasgow['std_Rankv2'], contiguity_r)
gdf_simd_glasgow['cluster'] = lisa.get_cluster_labels(crit_value=0.05)

lisa.explore(
    gdf_simd_glasgow,
    crit_value=0.05,
    prefer_canvas=True,
    tiles="CartoDB Positron",
)
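
The cluster labels follow a simple rule: an area is labelled only if its pseudo p-value falls below the chosen threshold, and the label then combines the sign of its own standardised value with the sign of its spatial lag. The sketch below spells that rule out in plain Python. The function `cluster_label` and the five areas are hypothetical, and the logic is an approximation of what `get_cluster_labels` does, not its actual implementation.

```python
def cluster_label(z, lag, p_value, crit_value=0.05):
    """Label one area from its standardised value, its spatial lag and a p-value."""
    if p_value >= crit_value:
        return "Insignificant"
    own = "High" if z > 0 else "Low"       # the area itself
    around = "High" if lag > 0 else "Low"  # the average of its neighbours
    return f"{own}-{around}"


# Hypothetical areas: (standardised value, spatial lag, pseudo p-value)
areas = {
    "A": (1.4, 1.1, 0.01),
    "B": (-1.2, -0.9, 0.03),
    "C": (1.0, -0.8, 0.04),
    "D": (-0.7, 0.9, 0.02),
    "E": (1.3, 1.2, 0.20),
}

labels = {name: cluster_label(*row) for name, row in areas.items()}
for name, label in labels.items():
    print(name, label)
```

Area E sits in the same quadrant as area A, yet it stays "Insignificant" because its p-value is above the threshold. This is why the map shows far fewer coloured areas than the Moran Plot has points in each quadrant.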


### Warning

The last task is a bit more sophisticated. Put all your brain power into it and you’ll achieve it!

- Create cluster maps for significance levels 1% and 10%; compare them with the one we obtained. What are the main changes? Why?
- Create a single map that combines all three significance levels and changes the alpha to distinguish them.
- Can you create both interactive and static versions of those maps?
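
One way to think about the combined map is that the three significance levels are nested: every area significant at 1% is also significant at 5% and 10%. Each area can therefore be given a single opacity based on the strictest level it passes. The sketch below shows that idea in plain Python; the p-values, the `LEVELS` table and its opacity values are assumptions chosen for illustration, not values from the exercise.

```python
# (threshold, opacity) pairs, strictest level first. Opacities are arbitrary.
LEVELS = [(0.01, 0.9), (0.05, 0.6), (0.10, 0.3)]


def opacity_for(p_value, levels=LEVELS, fallback=0.0):
    """Return the opacity of the strictest significance level the p-value passes."""
    for threshold, opacity in levels:
        if p_value < threshold:
            return opacity
    return fallback  # not significant at any level


# Hypothetical pseudo p-values for six areas.
p_values = [0.002, 0.03, 0.08, 0.4, 0.009, 0.049]

opacities = [opacity_for(p) for p in p_values]
significant = {t: sum(p < t for p in p_values) for t, _ in LEVELS}

print(opacities)
print(significant)
```

The counts in `significant` can only grow as the threshold loosens, which is the main change to expect between the 1%, 5% and 10% maps. A per-area opacity column built this way could then be used to style either a static or an interactive map.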

In [None]:
gdf_simd_glasgow.loc[:, 'cluster10'] = lisa.get_cluster_labels(crit_value=0.1)
gdf_simd_glasgow.loc[:, 'cluster1'] = lisa.get_cluster_labels(crit_value=0.01)
gdf_simd_glasgow.loc[:, 'cluster5'] = lisa.get_cluster_labels(crit_value=0.05)


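# create interactive map: the base call draws the default (5%) clusters
m = lisa.explore(
    gdf=gdf_simd_glasgow,
    tiles='CartoDB Positron',
)

# Overlay the three levels on the same map. geopandas' explore() has no `alpha`
# argument; polygon opacity is set through style_kwds (fillOpacity). This assumes
# lisa.explore forwards extra keyword arguments to geopandas' explore().
n = lisa.explore(gdf_simd_glasgow, m=m, crit_value=0.1, style_kwds={"fillOpacity": 0.01})

o = lisa.explore(gdf_simd_glasgow, m=n, crit_value=0.05, style_kwds={"fillOpacity": 0.3})

lisa.explore(gdf_simd_glasgow, m=o, crit_value=0.01, style_kwds={"fillOpacity": 0.7})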

In [None]:
# create three subplots, one per significance level

fig, axs = plt.subplots(1, 3, figsize=(18, 6))

# Colour assigned to each cluster label
cluster_colors = {
    "Insignificant": "lightgrey",
    "High-High": "#d7191c",
    "Low-Low": "#2c7bb6",
    "Low-High": "#abd9e9",
    "High-Low": "#fdae61",
}

# Column holding the labels for each level, with the panel title
levels = [
    ("cluster10", "10% Significance Level"),
    ("cluster5", "5% Significance Level"),
    ("cluster1", "1% Significance Level"),
]

for ax, (column, title) in zip(axs, levels):
    for label, color in cluster_colors.items():
        subset = gdf_simd_glasgow.loc[gdf_simd_glasgow[column] == label]
        if not subset.empty:  # skip labels that do not occur at this level
            subset.plot(ax=ax, color=color)
    ax.set_title(title)

plt.tight_layout()
plt.show()