# Graph here, graph there
---

It is time to work with spatial weights matrices by yourself.

## Zones of suburbanisation
You are familiar with Prague from the last section, so let’s zoom out to the zones of suburbanisation around Prague and other Czech cities. Head over to the DataHub of the Faculty of Science and download the dataset called “Zóny rezidenční suburbanizace 2008-2016”, which contains the zones of residential suburbanisation outlined by Ouředníček, Klsák, and Špačková (2019). Open it with geopandas and use the "OBJECTID" column as the index (the other candidate columns are not unique; they contain duplicated entries).


## Interaction with Graphs
- Create a contiguity matrix using the Queen criterion
- Let’s focus on Prague (ID 891 in the table). How many neighbours does it have?
- Reproduce the previous section’s zoom plot with Prague and its neighbours. Can you make that plot as both static and interactive maps?
- Create a block spatial weights matrix where every geometry is connected to other geometries in the NUTS2 region.
- Create KNN weights with 5 neighbours. Remember that KNN expects point geometry.
- Compare the number of neighbours by geometry for the three weights matrices. Which one has more? Why? Can you compare the distributions of the number of neighbours as KDE plots?

In [20]:
import geopandas as gpd
from libpysal import graph
import contextily as ctx

# load the suburbanisation zones and index them by the unique "OBJECTID" column
gdf_suburban = gpd.read_file(r'data\zony_suburbanizace_2008_2016.shp')
gdf_suburban = gdf_suburban.set_index('OBJECTID')
gdf_suburban.head()

| OBJECTID | obec_kod | obec_nazev | obyv_31122 | obyv_311_1 | prist_09_1 | byty_09_16 | POU_kod | POU_nazev | ORP_kod | ORP_nazev | ... | zmena_vy_2 | jadro_1__4 | jadro_1__5 | jadro_2__4 | jadro_2__5 | jadro_3__4 | jadro_3__5 | SHAPE_Leng | SHAPE_Area | geometry |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 500011 | Želechovice nad Dřevnicí | 1943 | 1864 | 296.0 | 15 | 72131 | Zlín | 7213 | Zlín | ... | nové suburbium | Zlín | 585068.0 | | | | | 28403.65986 | 16025720.0 | POLYGON ((-516000.47 -1165924.58, -515794.17 -... |
| 2 | 500020 | Petrov nad Desnou | ? | 1185 | 287.0 | 14 | 71112 | Šumperk | 7111 | Šumperk | ... | nové suburbium | Šumperk | 523704.0 | | | | | 16097.676743 | 12087010.0 | POLYGON ((-556406.04 -1072328.53, -556374.35 -... |
| 3 | 500291 | Vřesina | 2605 | 2903 | 750.0 | 83 | 81191 | Ostrava | 8119 | Ostrava | ... | | Ostrava | 554821.0 | | | | | 12601.626837 | 8652943.0 | POLYGON ((-481995.85 -1100599.6, -481850.23 -1... |
| 4 | 500496 | Olomouc | 100373 | 100378 | 17480.0 | 2924 | 71072 | Olomouc | 7107 | Olomouc | ... | | | | | | | | 91422.80238 | 103334400.0 | POLYGON ((-546976.47 -1114550.74, -547037.34 -... |
| 5 | 500526 | Bělkovice-Lašťany | 2087 | 2256 | 490.0 | 69 | 71072 | Olomouc | 7107 | Olomouc | ... | | Olomouc | 500496.0 | | | | | 26373.625742 | 15296510.0 | POLYGON ((-540894.02 -1109008.6, -540901.44 -1... |


In [18]:
# look up the index label (OBJECTID) of Prague by its name
prague_idx = gdf_suburban.index[gdf_suburban['obec_nazev'] == 'Praha'].to_list()[0]

# build queen contiguity
queen = graph.Graph.build_contiguity(gdf_suburban, rook=False)
# count neighbours
neighbours = queen[prague_idx].count()
print(f'Prague has {neighbours} neighbours')

# ids of Prague's neighbours and the subgraph connecting them
queen_prg_neighbor = queen[prague_idx].index.tolist()
queen_prg = queen.subgraph(queen_prg_neighbor)

# GeoDataFrame with the geometries of Prague's neighbours
gdf_prague_neighbor = gdf_suburban.loc[queen_prg_neighbor]

Prague has 38 neighbours
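
To make the Queen criterion more concrete, here is a minimal sketch on a hypothetical 3×3 grid of square cells rather than the real polygons. It is plain Python, not libpysal, and the helper name `grid_neighbours` is made up for illustration. Under the Queen criterion, cells that share an edge or only a corner are neighbours; under Rook, only a shared edge counts.

```python
def grid_neighbours(n_rows, n_cols, queen=True):
    """Return {cell: set of neighbouring cells} for a regular grid.

    Cells are identified by (row, col). With queen=True, cells touching at
    a corner count as neighbours; with queen=False (rook), only cells
    sharing an edge do.
    """
    neighbours = {}
    for r in range(n_rows):
        for c in range(n_cols):
            found = set()
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    if dr == 0 and dc == 0:
                        continue  # a cell is not its own neighbour
                    if not queen and dr != 0 and dc != 0:
                        continue  # rook ignores diagonal (corner) contacts
                    rr, cc = r + dr, c + dc
                    if 0 <= rr < n_rows and 0 <= cc < n_cols:
                        found.add((rr, cc))
            neighbours[(r, c)] = found
    return neighbours


queen_grid = grid_neighbours(3, 3, queen=True)
rook_grid = grid_neighbours(3, 3, queen=False)

centre = (1, 1)
print(len(queen_grid[centre]), len(rook_grid[centre]))  # 8 and 4
```

Real municipal boundaries rarely meet at a single point, so on this kind of data the two criteria usually differ much less than on a regular grid.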


### Interactive map

In [None]:
# interactive map
m = gdf_suburban.loc[queen[prague_idx].index].explore(color="#25b497", highlight=False)
gdf_suburban.loc[[prague_idx]].explore(m=m, color="#fa94a5", highlight=False)

### Static map


In [None]:
# static map: neighbouring polygons, the graph edges between them and a basemap
ax = gdf_prague_neighbor.plot(color='white', edgecolor='black')
queen_prg.plot(ax=ax, gdf=gdf_prague_neighbor, nodes=False, edge_kws={"linewidth": .5})
ctx.add_basemap(ax=ax, crs=gdf_prague_neighbor.crs, source="CartoDB Positron")
ax.set_axis_off()

### Aggregation of spatial weights

In [None]:
block = graph.Graph.build_block_contiguity(gdf_suburban['NUTS2_naze'])
ax = block.plot(gdf_suburban, nodes=False, edge_kws={"linewidth": .01})
ctx.add_basemap(ax=ax, crs=gdf_suburban.crs, source="CartoDB Positron")
ax.set_axis_off()
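
As a rough illustration of what block weights encode, the sketch below links every observation to all others that share its group label. It is plain Python on made-up labels, not the libpysal implementation; `block_neighbours` and the toy region names are assumptions used only for this example.

```python
from collections import defaultdict


def block_neighbours(labels):
    """Connect every observation to all others sharing its label.

    `labels` maps an observation id to its group (e.g. a region name).
    """
    members = defaultdict(list)
    for obs_id, group in labels.items():
        members[group].append(obs_id)
    return {
        obs_id: [other for other in members[group] if other != obs_id]
        for obs_id, group in labels.items()
    }


# hypothetical municipalities and regions, not taken from the dataset
regions = {"a": "North", "b": "North", "c": "North",
           "d": "South", "e": "South", "f": "East"}
block_toy = block_neighbours(regions)

# the number of neighbours is always the group size minus one
cardinalities = {obs_id: len(n) for obs_id, n in block_toy.items()}
print(cardinalities)
```

Because every member of a group is tied to every other member, the number of neighbours grows with the size of the region, which is worth keeping in mind when comparing the graphs later.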

### Graph of K-nearest neighbours

In [None]:
# KNN needs point geometry, so represent each polygon by its centroid
gdf_suburban_centroid = gdf_suburban.copy()
gdf_suburban_centroid["centroid"] = gdf_suburban_centroid.centroid
gdf_suburban_centroid = gdf_suburban_centroid.set_geometry("centroid")

# connect every centroid to its 5 nearest neighbours
graph_suburban_knn5 = graph.Graph.build_knn(gdf_suburban_centroid, k=5)

# visualize
ax = graph_suburban_knn5.plot(gdf_suburban_centroid, nodes=True, edge_kws={"linewidth": .5,}, node_kws={"s": 1})
ctx.add_basemap(ax=ax, crs=gdf_suburban_centroid.crs, source="CartoDB Positron")
ax.set_axis_off()
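
The idea behind KNN weights can be sketched with a brute-force search over a handful of points. This is a plain-Python illustration under the assumption of Euclidean distance; the helper `knn_neighbours` and the toy coordinates are hypothetical and unrelated to the dataset.

```python
import math


def knn_neighbours(points, k):
    """Return {id: list of the k nearest other ids} by Euclidean distance."""
    result = {}
    for i, p in points.items():
        others = sorted(
            (j for j in points if j != i),
            key=lambda j: math.dist(p, points[j]),
        )
        result[i] = others[:k]
    return result


# hypothetical centroids: three close together and one far away
toy_points = {"a": (0, 0), "b": (1, 0), "c": (0, 1), "d": (10, 12)}
knn2 = knn_neighbours(toy_points, k=2)

for point_id, neighbours in knn2.items():
    print(point_id, neighbours)
```

Two properties follow from the construction: every point has exactly k neighbours regardless of how isolated it is, and the relation need not be symmetric (here "c" is a neighbour of "d", but "d" is not a neighbour of "c").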



Compare the number of neighbours by geometry for the three weights matrices. Which one has more? Why? Can you compare the distributions of the number of neighbours as KDE plots?

In [None]:
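# distance band: connect centroids lying within 20,000 CRS units of each other
graph_suburban_dist_band = graph.Graph.build_distance_band(gdf_suburban_centroid, 20000)
# neighbours of Prague under the distance band
graph_suburban_dist_band[prague_idx]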

# visualize
ax = graph_suburban_dist_band.plot(gdf_suburban_centroid, nodes=True, edge_kws={"linewidth": .05,}, node_kws={"s": 1})
ctx.add_basemap(ax=ax, crs=gdf_suburban_centroid.crs, source="CartoDB Positron")
ax.set_axis_off()
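
To see why the number of neighbours differs so much between graph types, the following sketch builds a distance band over a few made-up points and tallies how many neighbours each one gets. It is plain Python rather than libpysal; `distance_band_neighbours`, the coordinates and the 20,000 threshold are assumptions chosen to mirror the cell above.

```python
import math
from collections import Counter


def distance_band_neighbours(points, threshold):
    """Return {id: list of other ids within `threshold` distance}."""
    return {
        i: [j for j, q in points.items()
            if j != i and math.dist(p, q) <= threshold]
        for i, p in points.items()
    }


# hypothetical centroids: a dense cluster, a loose pair and an isolate
toy_points = {
    "a": (0, 0), "b": (5_000, 0), "c": (0, 5_000), "d": (5_000, 5_000),
    "e": (60_000, 0), "f": (70_000, 0),
    "g": (200_000, 200_000),
}
band = distance_band_neighbours(toy_points, threshold=20_000)

# number of neighbours per point, and how often each count occurs
cardinalities = {i: len(n) for i, n in band.items()}
print(cardinalities)
print(Counter(cardinalities.values()))
```

In a distance band the count follows local density, so clustered points collect many neighbours while remote ones may end up with none. For the actual graphs, the per-geometry counts should be available through the `cardinalities` attribute of each libpysal `Graph` (a pandas Series), which can then be drawn with pandas' `.plot.kde()`.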

## Spatial lag
Let’s have a look at spatial lag.
- Before proceeding, you will probably need to pre-process the population column ("obyv_31122"), since it is stored as strings. Assuming the GeoDataFrame is called suburbanisation, you can cast it to float like this:

      suburbanisation["obyv_31122"] = suburbanisation["obyv_31122"].replace("?", None).astype(float)

- Measure the spatial lag (as a mean, so don’t forget to standardise your weights) of the "obyv_31122" column using all the weights matrices you have created.
- What is the difference in the results for Prague? Can you explain why?

In [19]:
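# data preprocessing: "?" marks a missing value, so replace it before casting to float
gdf_suburban["obyv_31122"] = (gdf_suburban["obyv_31122"].replace("?", None).astype(float))

# row-standardise the weights so that the lag is the mean of the neighbours' values
queen_lag = queen.transform("R").lag(gdf_suburban["obyv_31122"])
gdf_suburban['pop_lag'] = queen_lag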
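
The effect of standardising the weights can be checked by hand on a tiny example. The sketch below is plain Python with made-up populations and a hypothetical helper `spatial_lag`; it only illustrates the arithmetic, not how libpysal computes the lag internally.

```python
def spatial_lag(neighbours, values, standardise=True):
    """Spatial lag of `values` over binary `neighbours` lists.

    With standardise=True each neighbour gets weight 1 / (number of
    neighbours), so the lag is the mean of neighbouring values; otherwise
    every weight is 1 and the lag is their sum.
    """
    lag = {}
    for i, neigh in neighbours.items():
        total = sum(values[j] for j in neigh)
        lag[i] = total / len(neigh) if standardise and neigh else total
    return lag


# hypothetical populations: one large city surrounded by small municipalities
population = {"city": 1_000_000, "v1": 1_000, "v2": 2_000, "v3": 3_000}
toy_graph = {
    "city": ["v1", "v2", "v3"],
    "v1": ["city", "v2"],
    "v2": ["city", "v1", "v3"],
    "v3": ["city", "v2"],
}

mean_lag = spatial_lag(toy_graph, population)
sum_lag = spatial_lag(toy_graph, population, standardise=False)
print(mean_lag["city"], sum_lag["city"])
print(mean_lag["v1"], sum_lag["v1"])
```

The city's own value never enters its lag, so its lag reflects only the small places around it, while each of those places gets a lag dominated by the city. The same reasoning is a useful starting point for interpreting Prague's values under the different graphs, since each graph gives Prague a different set of neighbours.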