# Urban morphometrics

Morphometric assessment measures a wide range of characters of urban form to derive a complex description of built-up patterns composed of enclosed tessellation, buildings and street network.

All algorithms used within this notebook are part of the `momepy` Python toolkit and can be used from there. We have extracted them from `momepy`, adapted them for `dask` and `pygeos`, and used them in raw form tailored directly to our use case. The algorithms which were enhanced are pushed back to `momepy` and will be part of `momepy` 0.4.0.

All steps within this notebook are parallelised using `dask`. The first part, which measures aspects of individual elements (and does not need to know their context), uses a pre-release of `dask-geopandas`. The rest uses `dask` to manage parallel iteration over geo-chunks with single-core algorithms.

Some functions are imported from a `momepy_utils.py` file stored within this directory. Those are either helper functions taken directly from `momepy` or their enhanced versions, all of which will be included in the next release of `momepy`:

- `get_edge_ratios` is implemented in momepy 0.4.0 as `get_network_ratio`
- `get_nodes` is included in `get_node_id`
- remaining functions have been used to refactor existing momepy classes.


## Individual elements

Note: This section requires `dask-geopandas` and the current master of `geopandas`, which supports the dask version.

In [None]:
# !pip install git+git://github.com/jsignell/dask-geopandas.git
# !pip install git+git://github.com/geopandas/geopandas.git

In [4]:
import os
import warnings
from time import time

import dask.dataframe as dd
import dask_geopandas
import geopandas
import libpysal
import momepy
import networkx as nx
import numpy as np
import pandas as pd
import pygeos
import scipy.sparse
from dask.distributed import Client, LocalCluster, as_completed
from libpysal.weights import Queen
from sqlalchemy import create_engine
from tqdm.notebook import tqdm
from momepy_utils import (
    _circle_radius,
    centroid_corner,
    elongation,
    get_corners,
    get_edge_ratios,
    get_nodes,
    solar_orientation_poly,
    squareness,
)

We are using a single machine with 14 cores within this notebook, so we start a local dask cluster with 14 workers.

In [None]:
client = Client(LocalCluster(n_workers=14))
client

`dask-geopandas` is still under development and raises a few warnings at the moment, all of which can be ignored.

In [None]:
warnings.filterwarnings('ignore', message='.*initial implementation of Parquet.*')
warnings.filterwarnings('ignore', message='.*Assigning CRS to a GeoDataFrame without a geometry.*')

### Measuring buildings and enclosed cells

In the first step, we iterate over geo-chunks, merge the enclosed tessellation and buildings into a single `geopandas.GeoDataFrame` and convert it to a `dask_geopandas.GeoDataFrame`. The rest of the code is mostly extracted from the momepy source code and adapted for dask.

In [None]:
for chunk_id in tqdm(range(103), total=103):
    
    # Load data and merge them together
    blg = geopandas.read_parquet(f"../../urbangrammar_samba/spatial_signatures/buildings/blg_{chunk_id}.pq")
    tess = geopandas.read_parquet(f"../../urbangrammar_samba/spatial_signatures/tessellation/tess_{chunk_id}.pq")
    
    blg = blg.rename_geometry('buildings')
    tess = tess.rename_geometry('tessellation')

    df = tess.merge(blg, on='uID', how='left')
    
    # Convert to dask.GeoDataFrame
    ddf = dask_geopandas.from_geopandas(df, npartitions=14)
    
    ## Measure morphometric characters
    # Building area
    ddf['sdbAre'] = ddf.buildings.area
    
    # Building perimeter
    ddf['sdbPer'] = ddf.buildings.length
    
    # Courtyard area
    exterior_area = ddf.buildings.map_partitions(lambda series: pygeos.area(pygeos.polygons(series.exterior.values.data)), meta='float')
    ddf['sdbCoA'] = exterior_area - ddf['sdbAre']

    # Circular compactness
    hull = ddf.buildings.convex_hull.exterior

    radius = hull.apply(lambda g: _circle_radius(list(g.coords)) if g is not None else None, meta='float')
    ddf['ssbCCo'] = ddf['sdbAre'] / (np.pi * radius ** 2)

    # Corners
    ddf['ssbCor'] = ddf.buildings.apply(lambda g: get_corners(g), meta='float')

    # Squareness
    ddf['ssbSqu'] = ddf.buildings.apply(lambda g: squareness(g), meta='float')
    
    # Equivalent rectangular index
    bbox = ddf.buildings.apply(lambda g: g.minimum_rotated_rectangle if g is not None else None, meta=geopandas.GeoSeries())
    ddf['ssbERI'] = (ddf['sdbAre'] / bbox.area).pow(1./2) * (bbox.length / ddf['sdbPer'])

    # Elongation
    ddf['ssbElo'] = bbox.map_partitions(lambda s: elongation(s), meta='float')
    
    # Centroid corner mean distance and deviation
    def _centroid_corner(series):
        ccd = series.apply(lambda g: centroid_corner(g))
        return pd.DataFrame(ccd.to_list(), index=series.index)

    
    ddf[['ssbCCM', 'ssbCCD']] = ddf.buildings.map_partitions(_centroid_corner, meta=pd.DataFrame({0: [0.1], 1: [1.1]}))
    
    # Solar orientation
    ddf['stbOri'] = bbox.apply(lambda g: solar_orientation_poly(g), meta='float')
    
    # Tessellation longest axis length
    hull = ddf.tessellation.convex_hull.exterior

    ddf['sdcLAL'] = hull.apply(lambda g: _circle_radius(list(g.coords)), meta='float') * 2
    
    # Tessellation area
    ddf['sdcAre'] = ddf.tessellation.area
    
    # Circular compactness
    radius = hull.apply(lambda g: _circle_radius(list(g.coords)), meta='float')
    ddf['sscCCo'] = ddf['sdcAre'] / (np.pi * radius ** 2)
    
    # Equivalent rectangular index
    bbox = ddf.tessellation.apply(lambda g: g.minimum_rotated_rectangle, meta=geopandas.GeoSeries())
    ddf['sscERI'] = (ddf['sdcAre'] / bbox.area).pow(1./2) * (bbox.length / ddf.tessellation.length)
    
    # Solar orientation
    ddf['stcOri'] = bbox.apply(lambda g: solar_orientation_poly(g), meta='float')
    
    # Covered area ratio
    ddf['sicCAR'] = ddf['sdbAre'] / ddf['sdcAre']
    
    # Building-cell alignment
    ddf['stbCeA'] = (ddf['stbOri'] - ddf['stcOri']).abs()
    
    # Compute all characters using dask
    df = ddf.compute()
    
    # Save to parquet file
    df.to_parquet(f"../../urbangrammar_samba/spatial_signatures/morphometrics/cells/cells_{chunk_id}.pq")
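    client.restart()
    # `time` is bound to the function imported via `from time import time`,
    # so `sleep` is imported explicitly here.
    from time import sleep
    sleep(5)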
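
The two shape indices used repeatedly in the loop above can be written out on plain numbers. The following is a minimal sketch in pure Python, not the `momepy` implementation; the function names and the example shapes are hypothetical and chosen only so the results are easy to check by hand.

```python
import math


def circular_compactness(area, enclosing_radius):
    """Area of the shape divided by the area of its minimum enclosing circle."""
    return area / (math.pi * enclosing_radius ** 2)


def equivalent_rectangular_index(area, perimeter, bbox_area, bbox_perimeter):
    """sqrt(area / bbox_area) * (bbox_perimeter / perimeter), as in the loop above."""
    return math.sqrt(area / bbox_area) * (bbox_perimeter / perimeter)


# Hypothetical 2 x 2 square: its enclosing circle has radius sqrt(2)
# and its minimum rotated rectangle is the square itself.
square_cco = circular_compactness(area=4.0, enclosing_radius=math.sqrt(2))
square_eri = equivalent_rectangular_index(4.0, 8.0, bbox_area=4.0, bbox_perimeter=8.0)

# Hypothetical L-shape made of three unit squares (area 3, perimeter 8)
# whose minimum rotated rectangle is the 2 x 2 square.
l_shape_eri = equivalent_rectangular_index(3.0, 8.0, bbox_area=4.0, bbox_perimeter=8.0)
```

A shape identical to its minimum rotated rectangle scores 1 on the equivalent rectangular index, while the L-shape scores lower because it fills only three quarters of its rectangle.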

### Measuring enclosures

All enclosures are loaded as a single dask.GeoDataFrame and measured at once.

In [None]:
%%time
# Load data
encl = dask_geopandas.read_parquet("../../urbangrammar_samba/spatial_signatures/enclosures/encl_*.pq")

# Area
encl['ldeAre'] = encl.geometry.area

# Perimeter
encl['ldePer'] = encl.geometry.length

# Circular compactness
hull = encl.geometry.convex_hull.exterior

radius = hull.apply(lambda g: _circle_radius(list(g.coords)) if g is not None else None, meta='float')
encl['lseCCo'] = encl['ldeAre'] / (np.pi * radius ** 2)

# Equivalent rectangular index
bbox = encl.geometry.apply(lambda g: g.minimum_rotated_rectangle if g is not None else None, meta=geopandas.GeoSeries())
encl['lseERI'] = (encl['ldeAre'] / bbox.area).pow(1./2) * (bbox.length / encl['ldePer'])

# Compactness-weighted axis
longest_axis = hull.apply(lambda g: _circle_radius(list(g.coords)), meta='float') * 2
encl['lseCWA'] = longest_axis * ((4 / np.pi) - (16 * encl['ldeAre']) / ((encl['ldePer']) ** 2))

# Solar orientation
encl['lteOri'] = bbox.apply(lambda g: solar_orientation_poly(g), meta='float')

# Compute data and return geopandas.GeoDataFrame
encl_df = encl.compute()

# Weighted number of neighbors
inp, res = encl_df.sindex.query_bulk(encl_df.geometry, predicate='intersects')
indices, counts = np.unique(inp, return_counts=True)
encl_df['neighbors'] = counts - 1
encl_df['lteWNB'] = encl_df['neighbors'] / encl_df['ldePer']

# Load complete enclosed tessellation as a dask DataFrame (only counts per enclosure are needed)
tess = dd.read_parquet("../../urbangrammar_samba/spatial_signatures/tessellation/tess_*.pq")

# Measure weighted cells within enclosure
encl_counts = tess.groupby('enclosureID').count().compute()
merged = encl_df[['enclosureID', 'ldeAre']].merge(encl_counts[['geometry']], how='left', on='enclosureID')
encl_df['lieWCe'] = merged['geometry'] / merged['ldeAre']

# Save data to parquet
encl_df.drop(columns='geometry').to_parquet("../../urbangrammar_samba/spatial_signatures/morphometrics/enclosures.pq")
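
The compactness-weighted axis formula from the cell above is easier to read on two reference shapes. This is a small sketch in pure Python under the assumption that the longest axis equals the diameter of the minimum enclosing circle (as the code above computes it); the function name and inputs are hypothetical.

```python
import math


def compactness_weighted_axis(longest_axis, area, perimeter):
    """longest_axis * (4 / pi - 16 * area / perimeter ** 2)."""
    return longest_axis * ((4 / math.pi) - (16 * area) / (perimeter ** 2))


# Hypothetical circle of radius 1: the bracket is 4/pi - 16*pi/(2*pi)**2 = 0.
circle_cwa = compactness_weighted_axis(2.0, math.pi, 2 * math.pi)

# Hypothetical 1 x 1 square: the longest axis is the diagonal, sqrt(2).
square_cwa = compactness_weighted_axis(math.sqrt(2), 1.0, 4.0)
```

The value is zero for a circle and positive for a square, so in this sketch the character behaves as a longest axis scaled by how far the shape departs from circular compactness.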

We can now close the dask client.

In [None]:
client.close()

## Generate spatial weights (W)

Subsequent steps require an understanding of the context of each tessellation cell, expressed as spatial weights matrices (Queen contiguity and inclusive Queen contiguity of the 3rd order). We generate them beforehand and store them as `npz` files, each representing a sparse matrix.

Each geo-chunk is loaded together with the relevant cross-chunk tessellation cells (to avoid edge effects). We use dask to parallelise the iteration. The number of workers is smaller now to ensure enough memory for each chunk.

In [None]:
workers = 8
client = Client(LocalCluster(n_workers=workers, threads_per_worker=1))
client

First we have to specify a function doing the processing itself, where the only argument is the `chunk_id`.

In [None]:
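# Indices of cells from neighbouring chunks that are added to each chunk
cross_chunk = pd.read_parquet('../../urbangrammar_samba/spatial_signatures/cross-chunk_indices.pq')

def generate_w(chunk_id):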
    # load cells of a chunk
    cells = geopandas.read_parquet(f"../../urbangrammar_samba/spatial_signatures/morphometrics/cells/cells_{chunk_id}.pq")
    
    # add neighbouring cells from other chunks
    cross_chunk_cells = []
    
    for chunk, inds in cross_chunk.loc[chunk_id].indices.iteritems():
        add_cells = geopandas.read_parquet(f"../../urbangrammar_samba/spatial_signatures/morphometrics/cells/cells_{chunk}.pq").iloc[inds]
        cross_chunk_cells.append(add_cells)
    
    df = cells.append(pd.concat(cross_chunk_cells, ignore_index=True), ignore_index=True)

    w = libpysal.weights.Queen.from_dataframe(df, geom_col='tessellation')
    w3 = momepy.sw_high(k=3, weights=w)
    
    scipy.sparse.save_npz(f"../../urbangrammar_samba/spatial_signatures/weights/w_{chunk_id}.npz", w.sparse)
    scipy.sparse.save_npz(f"../../urbangrammar_samba/spatial_signatures/weights/w3_{chunk_id}.npz", w3.sparse)
    
    return f"Chunk {chunk_id} processed sucessfully."
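
To illustrate what an inclusive higher-order contiguity means, the sketch below collects all neighbours reachable within `k` steps using a breadth-first search over a plain adjacency dictionary. It is not how `momepy.sw_high` is implemented; `neighbours_within` and the toy chain of cells are hypothetical.

```python
from collections import deque


def neighbours_within(adjacency, k):
    """Return, for every node, all nodes reachable in at most k steps (self excluded)."""
    result = {}
    for start in adjacency:
        seen = {start}
        queue = deque([(start, 0)])
        while queue:
            node, depth = queue.popleft()
            if depth == k:
                # Do not expand beyond k topological steps.
                continue
            for nxt in adjacency[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append((nxt, depth + 1))
        seen.discard(start)
        result[start] = sorted(seen)
    return result


# Hypothetical chain of five cells: 0 - 1 - 2 - 3 - 4
queen = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
w3 = neighbours_within(queen, k=3)
```

In the chain, cell 0 reaches cells 1, 2 and 3 but not cell 4, which lies four steps away.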

Then we use dask to iterate over all 103 chunks. The following script sends the first 8 chunks to dask together and then submits a new chunk as soon as any of the previous ones finishes (courtesy of Matthew Rocklin). That way we process only 8 chunks at once, ensuring that the cluster will not run out of memory.

In [None]:
%%time
inputs = iter(range(103))
futures = [client.submit(generate_w, next(inputs)) for i in range(workers)]
ac = as_completed(futures)
for finished_future in ac:
    # submit new future 
    try:
        new_future = client.submit(generate_w, next(inputs))
        ac.add(new_future)
    except StopIteration:
        pass
    print(finished_future.result())
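
The same submit-then-top-up pattern can be tried without a dask cluster. The sketch below is a standard-library analogue built on `concurrent.futures`, not the dask code used in this notebook; `process` and `run_bounded` are hypothetical names.

```python
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait
from itertools import islice


def process(chunk_id):
    """Hypothetical stand-in for a per-chunk function such as generate_w."""
    return f"Chunk {chunk_id} processed"


def run_bounded(func, inputs, workers):
    """Keep at most `workers` tasks in flight and top up as tasks finish."""
    inputs = iter(inputs)
    results = []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Send the first batch together.
        pending = {pool.submit(func, item) for item in islice(inputs, workers)}
        while pending:
            done, pending = wait(pending, return_when=FIRST_COMPLETED)
            results.extend(future.result() for future in done)
            # Submit one replacement per finished task while inputs remain.
            for item in islice(inputs, len(done)):
                pending.add(pool.submit(func, item))
    return results


messages = run_bounded(process, range(10), workers=3)
```

Results arrive in completion order rather than submission order, which is why the notebook prints each message as soon as its future finishes.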

In [None]:
client.close()

## Spatial distribution and network analysis

To measure the spatial distribution of elements, we use single-core algorithms and parallelise the iteration.

In [None]:
workers = 8
client = Client(LocalCluster(n_workers=workers, threads_per_worker=1))
client

We will need to load street network data from a PostGIS database, so we prepare a connection URL which will be used within the loop.

In [None]:
cross_chunk = pd.read_parquet('../../urbangrammar_samba/spatial_signatures/cross-chunk_indices.pq')
chunks = geopandas.read_parquet('../../urbangrammar_samba/spatial_signatures/local_auth_chunks.pq')

user = os.environ.get('DB_USER')
pwd = os.environ.get('DB_PWD')
host = os.environ.get('DB_HOST')
port = os.environ.get('DB_PORT')

db_connection_url = f"postgresql+psycopg2://{user}:{pwd}@{host}:{port}/built_env"

Within the same function below we measure spatial distribution of elements and network-based characters.

In [None]:
def measure(chunk_id):
    # load cells of a chunk
    cells = geopandas.read_parquet(f"../../urbangrammar_samba/spatial_signatures/morphometrics/cells/cells_{chunk_id}.pq")
    cells['keep'] = True
    
    # add neighbouring cells from other chunks
    cross_chunk_cells = []
    
    for chunk, inds in cross_chunk.loc[chunk_id].indices.iteritems():
        add_cells = geopandas.read_parquet(f"../../urbangrammar_samba/spatial_signatures/morphometrics/cells/cells_{chunk}.pq").iloc[inds]
        add_cells['keep'] = False
        cross_chunk_cells.append(add_cells)
    
    df = cells.append(pd.concat(cross_chunk_cells, ignore_index=True), ignore_index=True)

    # read W
    w = libpysal.weights.WSP(scipy.sparse.load_npz(f"../../urbangrammar_samba/spatial_signatures/weights/w_{chunk_id}.npz")).to_W()
    
    # alignment
    def alignment(x, orientation='stbOri'):
        orientations = df[orientation].iloc[w.neighbors[x]]
        return abs(orientations - df[orientation].iloc[x]).mean()
    
    df['mtbAli'] = [alignment(x) for x in range(len(df))]

    # mean neighbour distance
    def neighbor_distance(x):
        geom = df.buildings.iloc[x]
        if geom is None:
            return np.nan
        return df.buildings.iloc[w.neighbors[x]].distance(df.buildings.iloc[x]).mean()

    df['mtbNDi'] = [neighbor_distance(x) for x in range(len(df))]
    
    # weighted neighbours
    df['mtcWNe'] = pd.Series([w.cardinalities[x] for x in range(len(df))], index=df.index) / df.tessellation.length
    
    # area covered by neighbours
    def area_covered(x, area='sdcAre'):
        neighbours = [x]
        neighbours += w.neighbors[x]

        return df[area].iloc[neighbours].sum()

    df['mdcAre'] = [area_covered(x) for x in range(len(df))]
    
    # read W3
    w3 = libpysal.weights.WSP(scipy.sparse.load_npz(f"../../urbangrammar_samba/spatial_signatures/weights/w3_{chunk_id}.npz")).to_W()
      
    # weighted reached enclosures
    def weighted_reached_enclosures(x, area='sdcAre', enclosure_id='enclosureID'):
        neighbours = [x]
        neighbours += w3.neighbors[x]

        vicinity = df[[area, enclosure_id]].iloc[neighbours]

        return vicinity[enclosure_id].unique().shape[0] / vicinity[area].sum()
    
    df['ltcWRE'] = [weighted_reached_enclosures(x) for x in range(len(df))]
    
    # mean interbuilding distance
    # define adjacency list from libpysal
    adj_list = w.to_adjlist(remove_symmetric=False)
    adj_list["weight"] = (
        df.buildings.iloc[adj_list.focal]
        .reset_index(drop=True)
        .distance(df.buildings.iloc[adj_list.neighbor].reset_index(drop=True)).values
    )

    G = nx.from_pandas_edgelist(
            adj_list, source="focal", target="neighbor", edge_attr="weight"
        )
    ibd = []
    for i in range(len(df)):
        try:
            sub = nx.ego_graph(G, i, radius=3)
            ibd.append(np.nanmean([x[-1] for x in list(sub.edges.data('weight'))]))
        except Exception:
            ibd.append(np.nan)

    df['ltbIBD'] = ibd
    
    # Reached neighbors and area on 3 topological steps on tessellation
    df['ltcRea'] = [w3.cardinalities[i] for i in range(len(df))]
    df['ltcAre'] = [df.sdcAre.iloc[w3.neighbors[i]].sum() for i in range(len(df))]

    # Save cells to parquet keeping only within-chunk data not the additional neighboring
    df[df['keep']].drop(columns=['keep']).to_parquet(f"../../urbangrammar_samba/spatial_signatures/morphometrics/cells/cells_{chunk_id}.pq")

    # Load street network for an extended chunk area
    chunk_area = chunks.geometry.iloc[chunk_id].buffer(5000)  # we extend the area by 5km to minimise edge effect
    engine = create_engine(db_connection_url)
    sql = f"SELECT * FROM openroads_200803_topological WHERE ST_Intersects(geometry, ST_GeomFromText('{chunk_area.wkt}',27700))"
    streets = geopandas.read_postgis(sql, engine, geom_col='geometry')
    
    # Street profile (measures width, width deviation and openness)
    sp = street_profile(streets, blg)
    streets['sdsSPW'] = sp[0]
    streets['sdsSWD'] = sp[1]
    streets['sdsSPO'] = sp[2]
    
    # Street segment length
    streets['sdsLen'] = streets.length
    
    # Street segment linearity
    streets['sssLin'] = momepy.Linearity(streets).series
    
    # Convert geopandas.GeoDataFrame to networkx.Graph for network analysis
    G = momepy.gdf_to_nx(streets)
    
    # Node degree
    G = momepy.node_degree(G)
    
    # Subgraph analysis (meshedness, proportion of 0, 3 and 4 way intersections, local closeness)
    G = momepy.subgraph(
        G,
        radius=5,
        meshedness=True,
        cds_length=False,
        mode="sum",
        degree="degree",
        length="mm_len",
        mean_node_degree=False,
        proportion={0: True, 3: True, 4: True},
        cyclomatic=False,
        edge_node_ratio=False,
        gamma=False,
        local_closeness=True,
        closeness_weight="mm_len",
        verbose=False
    )
    
    # Cul-de-sac length
    G = momepy.cds_length(G, radius=3, name="ldsCDL", verbose=False)
    
    # Square clustering
    G = momepy.clustering(G, name="xcnSCl")
    
    # Mean node distance
    G = momepy.mean_node_dist(G, name="mtdMDi", verbose=False)
    
    # Convert networkx.Graph back to GeoDataFrames and W (denoting relationships between nodes)
    nodes, edges, sw = momepy.nx_to_gdf(G, spatial_weights=True)
    
    # Generate inclusive higher order weights
    edges_w3 = momepy.sw_high(k=3, gdf=edges)
    
    # Mean segment length
    edges["ldsMSL"] = momepy.SegmentsLength(edges, spatial_weights=edges_w3, mean=True, verbose=False).series
    
    # Generate inclusive higher order weights
    nodes_w5 = momepy.sw_high(k=5, weights=sw)
    
    # Node density
    nodes["lddNDe"] = momepy.NodeDensity(nodes, edges, nodes_w5, verbose=False).series
    
    # Weighted node density
    nodes["linWID"] = momepy.NodeDensity(nodes, edges, nodes_w5, weighted=True, node_degree="degree", verbose=False).series
    
    # Save to parquets
    edges.to_parquet(f"../../urbangrammar_samba/spatial_signatures/morphometrics/edges/edges_{chunk_id}.pq")
    nodes.to_parquet(f"../../urbangrammar_samba/spatial_signatures/morphometrics/nodes/nodes_{chunk_id}.pq")


    return f"Chunk {chunk_id} processed sucessfully."
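
Two of the contextual characters measured in the function above reduce to simple aggregations over a neighbour list. The sketch below restates them on plain Python lists; the data and the function signatures are hypothetical and only mirror the logic of the nested `alignment` and `area_covered` helpers.

```python
def alignment(orientations, neighbors, i):
    """Mean absolute difference between the orientation of cell i and its neighbours."""
    nbrs = neighbors[i]
    if not nbrs:
        return float("nan")
    return sum(abs(orientations[n] - orientations[i]) for n in nbrs) / len(nbrs)


def area_covered(areas, neighbors, i):
    """Total area of cell i together with its neighbours."""
    return sum(areas[n] for n in [i] + neighbors[i])


# Hypothetical data for three mutually adjacent cells
orientations = [10.0, 14.0, 40.0]
areas = [100.0, 250.0, 50.0]
neighbors = {0: [1, 2], 1: [0, 2], 2: [0, 1]}

ali = [alignment(orientations, neighbors, i) for i in range(3)]
covered = [area_covered(areas, neighbors, i) for i in range(3)]
```

Because all three cells neighbour each other, the covered area is the same for each of them, while the alignment differs with each cell's own orientation.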

Again we use dask to iterate over all 103 chunks. The following script sends the first 8 chunks to dask together and then submits a new chunk as soon as any of the previous ones finishes. That way we process only 8 chunks at once, ensuring that the cluster will not run out of memory.

In [None]:
inputs = iter(range(103))
futures = [client.submit(measure, next(inputs)) for i in range(workers)]
ac = as_completed(futures)
for finished_future in ac:
    # submit new future 
    try:
        new_future = client.submit(measure, next(inputs))
        ac.add(new_future)
    except StopIteration:
        pass
    print(finished_future.result())

In [None]:
client.close()

## Link elements together

For further analysis, we need to link the data measured on individual elements together. We link cells to edges based on the proportion of overlap (if a cell intersects more than one edge) and to nodes based on proximity (with a restriction: the node has to be on a linked edge). Enclosures are linked based on enclosure ID.

As above, we define a single-core function and use dask to manage parallel iteration.

In [None]:
def link(chunk_id):
    s = time()
    cells = geopandas.read_parquet(f"../../urbangrammar_samba/spatial_signatures/morphometrics/cells/cells_{chunk_id}.pq")
    edges = geopandas.read_parquet(f"../../urbangrammar_samba/spatial_signatures/morphometrics/edges/edges_{chunk_id}.pq")
    nodes = geopandas.read_parquet(f"../../urbangrammar_samba/spatial_signatures/morphometrics/nodes/nodes_{chunk_id}.pq")
    
    cells['edgeID'] = get_edge_ratios(cells, edges)
    cells['nodeID'] = get_nodes(cells, nodes, edges, 'nodeID', 'edgeID', 'node_start', 'node_end')
    
    characters = ['sdsSPW', 'sdsSWD', 'sdsSPO', 'sdsLen', 'sssLin', 'ldsMSL']
    l = []
    for d in cells.edgeID:
        l.append((edges.iloc[list(d.keys())][characters].multiply(list(d.values()), axis='rows')).sum(axis=0))
    cells[characters] = pd.DataFrame(l, index=cells.index)
    
    cells = cells.merge(nodes.drop(columns=['geometry']), on='nodeID', how='left')
    cells = cells.rename({'degree': 'mtdDeg', 'meshedness': 'lcdMes', 'proportion_3': 'linP3W', 'proportion_4': 'linP4W',
                     'proportion_0': 'linPDE', 'local_closeness': 'lcnClo'}, axis='columns')
    
    cells['edgeID_keys'] = cells.edgeID.apply(lambda d: list(d.keys()))
    cells['edgeID_values'] = cells.edgeID.apply(lambda d: list(d.values()))
    
    cells.drop(columns='edgeID').to_parquet(f"../../urbangrammar_samba/spatial_signatures/morphometrics/cells/cells_{chunk_id}.pq")
    
    return f"Chunk {chunk_id} processed sucessfully in {time() - s} seconds."
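
The transfer of edge-based characters to cells in the function above is a weighted sum over the edges a cell is linked to. A minimal sketch, assuming each cell carries a dictionary of edge position to overlap ratio (as `d.keys()` and `d.values()` are used above) and that the ratios of one cell sum to 1; the names and numbers are hypothetical.

```python
def weighted_edge_value(ratios, edge_values):
    """Sum edge values weighted by the share of the cell linked to each edge."""
    return sum(edge_values[edge_id] * share for edge_id, share in ratios.items())


# Hypothetical street widths indexed by edge position
street_width = [12.0, 20.0, 8.0]

# Hypothetical cell overlapping edge 0 by 75 % and edge 1 by 25 %
cell_ratios = {0: 0.75, 1: 0.25}
cell_width = weighted_edge_value(cell_ratios, street_width)
```

A cell linked to a single edge with ratio 1 simply inherits that edge's value.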

In [None]:
workers = 14
client = Client(LocalCluster(n_workers=workers, threads_per_worker=1))
client

In [None]:
%%time
inputs = iter(range(103))
futures = [client.submit(link, next(inputs)) for i in range(workers)]
ac = as_completed(futures)
for finished_future in ac:
    # submit new future 
    try:
        new_future = client.submit(link, next(inputs))
        ac.add(new_future)
    except StopIteration:
        pass
    print(finished_future.result())

In [None]:
client.close()

Enclosures are linked via a simple attribute join and, since the operation does not require any computation, it is done as a simple loop.

In [None]:
enclosures = pd.read_parquet(f"../../urbangrammar_samba/spatial_signatures/morphometrics/enclosures.pq")

In [None]:
for chunk_id in range(103):
    s = time()
    cells = geopandas.read_parquet(f"../../urbangrammar_samba/spatial_signatures/morphometrics/cells/cells_{chunk_id}.pq")
    
    cells = cells.merge(enclosures.drop(columns=['neighbors']), on='enclosureID', how='left')
    
    cells.to_parquet(f"../../urbangrammar_samba/spatial_signatures/morphometrics/cells/cells_{chunk_id}.pq")
    
    print(f"Chunk {chunk_id} processed sucessfully in {time() - s} seconds.")

## Inter-element characters

The remaining morphometric characters are based on relations between multiple elements. The implementation mirrors the approach above.

In [None]:
workers = 8
client = Client(LocalCluster(n_workers=workers, threads_per_worker=1))
client

In [None]:
def measure(chunk_id):
    s = time()
    # Load data
    cells = geopandas.read_parquet(f"../../urbangrammar_samba/spatial_signatures/morphometrics/cells/cells_{chunk_id}.pq")
    edges = geopandas.read_parquet(f"../../urbangrammar_samba/spatial_signatures/morphometrics/edges/edges_{chunk_id}.pq")
    nodes = geopandas.read_parquet(f"../../urbangrammar_samba/spatial_signatures/morphometrics/nodes/nodes_{chunk_id}.pq")
    
    # Street Alignment
    edges['orient'] = momepy.Orientation(edges, verbose=False).series
    edges['edgeID'] = range(len(edges))
    keys = cells.edgeID_values.apply(lambda a: np.argmax(a))
    cells['edgeID_primary'] = [inds[i] for inds, i in zip(cells.edgeID_keys, keys)]
    cells['stbSAl'] = momepy.StreetAlignment(cells, 
                                             edges, 
                                             'stbOri', 
                                             left_network_id='edgeID_primary', 
                                             right_network_id='edgeID').series
   
    # Area Covered by each edge
    vals = {x:[] for x in range(len(edges))}
    for i, keys in enumerate(cells.edgeID_keys):
        for k in keys:
            vals[k].append(i)
    area_sums = []
    for inds in vals.values():
        area_sums.append(cells.sdcAre.iloc[inds].sum())
    edges['sdsAre'] = area_sums
    
    # Building per meter
    bpm = []
    for inds, l in zip(vals.values(), edges.sdsLen):
        bpm.append(cells.buildings.iloc[inds].notna().sum() / l if len(inds) > 0 else 0)
    edges['sisBpM'] = bpm
    
    # Cell area
    nodes['sddAre'] = nodes.nodeID.apply(lambda nid: cells[cells.nodeID == nid].sdcAre.sum())
    
    # Area covered by neighboring edges + count of reached cells
    edges_W = Queen.from_dataframe(edges)
    
    areas = []
    reached_cells = []
    for i in range(len(edges)):
        neighbors = [i] + edges_W.neighbors[i]
    #     areas
        areas.append(edges.sdsAre.iloc[neighbors].sum())
    #     reached cells
        ids = []
        for n in neighbors:
             ids += vals[n]
        reached_cells.append(len(set(ids)))

    edges['misCel'] = reached_cells
    edges['mdsAre'] = areas
    
    # Area covered by neighboring (3 steps) edges + count of reached cells
    edges_W3 = momepy.sw_high(k=3, weights=edges_W)
    
    areas = []
    reached_cells = []
    for i in range(len(edges)):
        neighbors = [i] + edges_W3.neighbors[i]
    #     areas
        areas.append(edges.sdsAre.iloc[neighbors].sum())
    #     reached cells
        ids = []
        for n in neighbors:
             ids += vals[n]
        reached_cells.append(len(set(ids)))

    edges['lisCel'] = reached_cells
    edges['ldsAre'] = areas

    # Link together 
    e_to_link = ['sdsAre', 'sisBpM', 'misCel', 'mdsAre', 'lisCel', 'ldsAre']
    n_to_link = 'sddAre'

    cells = cells.merge(nodes[['nodeID', 'sddAre']], on='nodeID', how='left')

    l = []
    for keys, values in zip(cells.edgeID_keys, cells.edgeID_values):
        l.append((edges.iloc[keys][e_to_link].multiply(values, axis='rows')).sum(axis=0))  # weighted by the proportion
    cells[e_to_link] = pd.DataFrame(l, index=cells.index)
    
    # Reached neighbors and area on 3 topological steps on tessellation
    cells['keep'] = True
    
    # add neighbouring cells from other chunks
    cross_chunk_cells = []
    
    for chunk, inds in cross_chunk.loc[chunk_id].indices.iteritems():
        add_cells = geopandas.read_parquet(f"../../urbangrammar_samba/spatial_signatures/morphometrics/cells/cells_{chunk}.pq").iloc[inds]
        add_cells['keep'] = False
        cross_chunk_cells.append(add_cells)
    
    df = cells.append(pd.concat(cross_chunk_cells, ignore_index=True), ignore_index=True)
    w3 = libpysal.weights.WSP(scipy.sparse.load_npz(f"../../urbangrammar_samba/spatial_signatures/weights/w3_{chunk_id}.npz")).to_W()
    
    # Reached cells in 3 topological steps
    df['ltcRea'] = [w3.cardinalities[i] for i in range(len(df))]
    
    # Reached area in 3 topological steps
    df['ltcAre'] = [df.sdcAre.iloc[w3.neighbors[i]].sum() for i in range(len(df))]
    
    # Save
    df[df['keep']].drop(columns=['keep']).to_parquet(f"../../urbangrammar_samba/spatial_signatures/morphometrics/cells/cells_{chunk_id}.pq")
    
    return f"Chunk {chunk_id} processed sucessfully in {time() - s} seconds."
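
The count of reached cells relies on inverting the cell-to-edge links and then taking the union of cells across neighbouring edges. The following is a small sketch of that logic on plain Python structures; the helper names and the toy network are hypothetical.

```python
def cells_per_edge(cell_edge_keys, n_edges):
    """Invert the cell -> edges mapping into edge -> list of cell positions."""
    linked = {edge: [] for edge in range(n_edges)}
    for cell, edges in enumerate(cell_edge_keys):
        for edge in edges:
            linked[edge].append(cell)
    return linked


def reached_cells(linked, edge_neighbors, edge):
    """Count unique cells linked to an edge or to any of its neighbouring edges."""
    ids = set()
    for e in [edge] + edge_neighbors[edge]:
        ids.update(linked[e])
    return len(ids)


# Hypothetical data: four cells, three edges forming a chain 0 - 1 - 2
cell_edge_keys = [[0], [0, 1], [1], [2]]
edge_neighbors = {0: [1], 1: [0, 2], 2: [1]}

linked = cells_per_edge(cell_edge_keys, n_edges=3)
reached = [reached_cells(linked, edge_neighbors, e) for e in range(3)]
```

Using a set means a cell linked to two neighbouring edges (cell 1 here) is counted only once.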

In [None]:
%%time
inputs = iter(range(103))
futures = [client.submit(measure, next(inputs)) for i in range(workers)]
ac = as_completed(futures)
for finished_future in ac:
    # submit new future 
    try:
        new_future = client.submit(measure, next(inputs))
        ac.add(new_future)
    except StopIteration:
        pass
    print(finished_future.result())

In [None]:
client.close()

At this point, all primary morphometric characters are measured and stored as chunked parquet files.

## Convolution

Morphometric variables are an input of cluster analysis, which should result in a delineation of spatial signatures. However, primary morphometric characters can't be used directly. We have to understand them in context. For that reason, we introduce a convolution step. Each of the characters above will be expressed as the first, second (median) and third quartile within 3 topological steps on the enclosed tessellation. The resulting convolutional data will then be used as an input of cluster analysis.
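
As a rough illustration of the convolution step, the sketch below computes the three quartiles of a character within each cell's neighbourhood using the standard library. It assumes that the cell itself is part of its own neighbourhood and that quartiles are linearly interpolated (`method="inclusive"`); both are assumptions of this sketch, and `contextualise` and the toy data are hypothetical.

```python
from statistics import quantiles


def contextualise(values, neighbors):
    """Return (Q1, median, Q3) of each cell's neighbourhood, the cell itself included."""
    out = []
    for i in range(len(values)):
        vicinity = [values[i]] + [values[n] for n in neighbors[i]]
        out.append(tuple(quantiles(vicinity, n=4, method="inclusive")))
    return out


# Hypothetical character values and neighbourhoods
values = [1.0, 2.0, 3.0, 4.0, 5.0]
neighbors = {0: [1, 2, 3, 4], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
context = contextualise(values, neighbors)
```

Each cell thus receives three contextual values per character, which describe the distribution around it rather than the cell alone.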

### Generate weights of 10th order

In [7]:
cross_chunk = pd.read_parquet('../../urbangrammar_samba/spatial_signatures/cross-chunk_indices_10.pq')

def generate_w(chunk_id):
    s = time()
    # load cells of a chunk
    cells = geopandas.read_parquet(f"../../urbangrammar_samba/spatial_signatures/morphometrics/cells/cells_{chunk_id}.pq")
    
    # add neighbouring cells from other chunks
    cross_chunk_cells = []
    
    for chunk, inds in cross_chunk.loc[chunk_id].indices.iteritems():
        add_cells = geopandas.read_parquet(f"../../urbangrammar_samba/spatial_signatures/morphometrics/cells/cells_{chunk}.pq").iloc[inds]
        cross_chunk_cells.append(add_cells)
    
    df = cells.append(pd.concat(cross_chunk_cells, ignore_index=True), ignore_index=True)

    w = libpysal.weights.Queen.from_dataframe(df, geom_col='tessellation', silence_warnings=True)
    w10 = momepy.sw_high(k=10, weights=w)
    
    scipy.sparse.save_npz(f"../../urbangrammar_samba/spatial_signatures/weights/w10_queen_{chunk_id}.npz", w.sparse)
    scipy.sparse.save_npz(f"../../urbangrammar_samba/spatial_signatures/weights/w10_10_{chunk_id}.npz", w10.sparse)
    
    return f"Chunk {chunk_id} processed sucessfully in {time() - s} seconds."

In [None]:
# I am afraid that we would run out of memory if we did this in parallel
for i in tqdm(range(103), total=103):
    print(generate_w(i))

Chunk 0 processed sucessfully in 316.37095499038696 seconds.
Chunk 1 processed sucessfully in 491.681759595871 seconds.
Chunk 2 processed sucessfully in 431.6562063694 seconds.
Chunk 3 processed sucessfully in 415.92933177948 seconds.
Chunk 4 processed sucessfully in 703.5903306007385 seconds.
Chunk 5 processed sucessfully in 622.5808844566345 seconds.
Chunk 6 processed sucessfully in 972.744900226593 seconds.
Chunk 7 processed sucessfully in 582.5596327781677 seconds.
Chunk 8 processed sucessfully in 649.1616985797882 seconds.
Chunk 9 processed sucessfully in 489.80875730514526 seconds.
Chunk 10 processed sucessfully in 440.14411187171936 seconds.
Chunk 11 processed sucessfully in 410.46586871147156 seconds.
Chunk 12 processed sucessfully in 723.2832217216492 seconds.
Chunk 13 processed sucessfully in 523.529534816742 seconds.
Chunk 14 processed sucessfully in 608.9691398143768 seconds.
Chunk 15 processed sucessfully in 431.5302448272705 seconds.
Chunk 16 processed sucessfully in 359.

In [20]:
def generate_distance_w(chunk_id):
    s = time()
    # load cells of a chunk
    cells = geopandas.read_parquet(f"../../urbangrammar_samba/spatial_signatures/morphometrics/cells/cells_{chunk_id}.pq", columns=["tessellation"])
    
    # add neighbouring cells from other chunks
    cross_chunk_cells = []
    
    for chunk, inds in cross_chunk.loc[chunk_id].indices.iteritems():
        add_cells = geopandas.read_parquet(f"../../urbangrammar_samba/spatial_signatures/morphometrics/cells/cells_{chunk}.pq", columns=["tessellation"]).iloc[inds]
        cross_chunk_cells.append(add_cells)
    
    df = cells.append(pd.concat(cross_chunk_cells, ignore_index=True), ignore_index=True)

    w = libpysal.weights.WSP(scipy.sparse.load_npz(f"../../urbangrammar_samba/spatial_signatures/weights/w10_10_{chunk_id}.npz")).to_W()

    df.geometry = df.centroid
    for i, geom in enumerate(df.geometry):
        neighbours = w.neighbors[i]
        vicinity = df.iloc[neighbours]
        distance = vicinity.distance(geom)
        w.weights[i] = distance.to_list()
    
    scipy.sparse.save_npz(f"../../urbangrammar_samba/spatial_signatures/weights/w10_10_distance_{chunk_id}.npz", w.sparse)
    
    return f"Chunk {chunk_id} processed sucessfully in {time() - s} seconds."
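
The function above keeps the neighbour structure of the 10th-order weights and replaces each binary weight with the distance between centroids. A minimal pure-Python sketch of that replacement, with hypothetical centroids and neighbour lists:

```python
import math


def distance_weights(centroids, neighbors):
    """Replace binary weights with centroid-to-centroid distances."""
    return {
        i: [math.dist(centroids[i], centroids[n]) for n in nbrs]
        for i, nbrs in neighbors.items()
    }


# Hypothetical centroids and neighbour lists
centroids = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)]
neighbors = {0: [1, 2], 1: [0, 2], 2: [0, 1]}
weights = distance_weights(centroids, neighbors)
```

The order of the distances follows the order of each neighbour list, which is what allows them to be assigned back as weights.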

In [21]:
# I am afraid that we would run out of memory if we did this in parallel
for i in tqdm(range(103), total=103):
    print(generate_distance_w(i))

 There are 2 disconnected components.
 There is 1 island with id: 117408.


Chunk 0 processed sucessfully in 148.43301725387573 seconds.


 There are 17 disconnected components.
 There are 7 islands with ids: 129987, 133329, 154196, 164082, 173764, 174296, 178630.


Chunk 1 processed sucessfully in 233.17103719711304 seconds.


 There are 8 disconnected components.
 There are 2 islands with ids: 128344, 155326.


Chunk 2 processed sucessfully in 211.11699414253235 seconds.


 There are 11 disconnected components.
 There are 5 islands with ids: 111576, 115679, 130190, 135459, 144200.


Chunk 3 processed sucessfully in 196.47370409965515 seconds.


 There are 9 disconnected components.
 There are 3 islands with ids: 197896, 219887, 225719.


Chunk 4 processed sucessfully in 325.46193265914917 seconds.


 There are 216 disconnected components.
 There are 162 islands with ids: 234211, 234213, 234707, 234708, 234725, 234783, 234785, 234789, 234796, 234801, 234803, 234810, 234870, 234958, 234965, 234969, 234983, 235019, 235024, 235042, 235045, 235047, 235055, 235058, 235059, 235062, 235063, 235066, 235067, 235071, 235076, 235077, 235080, 235082, 235089, 235094, 235101, 235104, 235106, 235107, 235111, 235120, 235121, 235138, 235142, 235143, 235146, 235148, 235149, 235150, 235152, 235155, 235157, 235158, 235167, 235168, 235169, 235175, 235179, 235188, 235190, 235214, 235216, 235217, 235220, 235221, 235226, 235229, 235230, 235235, 235242, 235250, 235255, 235271, 235285, 235287, 235315, 235321, 235338, 235348, 235362, 235388, 235393, 235399, 235409, 235413, 235425, 235441, 235442, 235443, 235444, 235445, 235446, 235447, 235453, 235455, 235461, 235462, 235464, 235465, 235471, 235472, 235490, 235491, 235493, 235501, 235502, 235503, 235506, 235511, 235516, 235517, 235537, 235541, 235557, 235559,

Chunk 5 processed sucessfully in 292.6534535884857 seconds.


 There are 17 disconnected components.
 There are 7 islands with ids: 261627, 261628, 265375, 270336, 271107, 273452, 325283.


Chunk 6 processed sucessfully in 445.8685245513916 seconds.


 There are 11 disconnected components.
 There are 6 islands with ids: 146067, 157019, 178978, 179921, 196088, 201462.


Chunk 7 processed sucessfully in 266.2336826324463 seconds.


 There are 9 disconnected components.
 There are 5 islands with ids: 149961, 156543, 186429, 210738, 211707.


Chunk 8 processed sucessfully in 282.93740034103394 seconds.


 There are 11 disconnected components.
 There are 6 islands with ids: 131784, 144438, 145139, 148759, 178003, 182317.


Chunk 9 processed sucessfully in 229.047461271286 seconds.


 There are 6 disconnected components.
 There is 1 island with id: 163162.


Chunk 10 processed sucessfully in 210.61869072914124 seconds.


 There are 7 disconnected components.
 There are 2 islands with ids: 119450, 134297.


Chunk 11 processed sucessfully in 195.23993682861328 seconds.


 There are 8 disconnected components.
 There are 4 islands with ids: 209165, 209170, 213280, 230710.


Chunk 12 processed sucessfully in 332.61838126182556 seconds.


 There are 14 disconnected components.
 There are 8 islands with ids: 141991, 141993, 170904, 171323, 172109, 172779, 173220, 189518.


Chunk 13 processed sucessfully in 237.32561421394348 seconds.


 There are 26 disconnected components.
 There are 10 islands with ids: 149934, 149961, 150595, 151291, 173515, 185651, 194258, 195351, 196488, 204577.


Chunk 14 processed sucessfully in 271.8659632205963 seconds.


 There are 13 disconnected components.
 There are 4 islands with ids: 104333, 106066, 135397, 155471.


Chunk 15 processed sucessfully in 202.72529363632202 seconds.


 There are 11 disconnected components.
 There are 4 islands with ids: 120266, 129179, 135636, 150457.


Chunk 16 processed sucessfully in 176.17322421073914 seconds.


 There are 5 disconnected components.
 There are 3 islands with ids: 119002, 124069, 125308.


Chunk 17 processed sucessfully in 184.20350122451782 seconds.


 There are 7 disconnected components.
 There are 3 islands with ids: 105404, 117589, 128371.


Chunk 18 processed sucessfully in 179.24176859855652 seconds.


 There are 5 disconnected components.
 There are 2 islands with ids: 160236, 167180.


Chunk 19 processed sucessfully in 202.80839371681213 seconds.


 There are 16 disconnected components.
 There are 5 islands with ids: 108513, 108636, 108941, 141461, 158970.


Chunk 20 processed sucessfully in 207.90735244750977 seconds.


 There are 21 disconnected components.
 There are 5 islands with ids: 117072, 120055, 120436, 139715, 165715.


Chunk 21 processed sucessfully in 233.11504578590393 seconds.


 There are 8 disconnected components.
 There are 2 islands with ids: 113667, 137661.


Chunk 22 processed sucessfully in 206.59698796272278 seconds.


 There are 8 disconnected components.
 There are 4 islands with ids: 109056, 116572, 129537, 140451.


Chunk 23 processed sucessfully in 220.47490882873535 seconds.


 There are 3 disconnected components.
 There are 2 islands with ids: 159859, 186073.


Chunk 24 processed sucessfully in 243.22397184371948 seconds.


 There are 8 disconnected components.


Chunk 25 processed sucessfully in 190.69647121429443 seconds.


 There are 7 disconnected components.
 There are 3 islands with ids: 157186, 162337, 167339.


Chunk 26 processed sucessfully in 251.88402652740479 seconds.


 There are 6 disconnected components.
 There are 3 islands with ids: 149990, 167520, 187452.


Chunk 27 processed sucessfully in 238.72793292999268 seconds.


 There are 6 disconnected components.
 There are 3 islands with ids: 110532, 119551, 123624.


Chunk 28 processed sucessfully in 189.12012243270874 seconds.


 There are 27 disconnected components.
 There are 17 islands with ids: 123468, 123475, 123524, 123585, 123599, 123631, 123642, 123649, 123718, 123726, 123761, 123782, 123787, 124473, 124492, 152280, 153614.


Chunk 29 processed sucessfully in 199.6555633544922 seconds.


 There are 11 disconnected components.
 There is 1 island with id: 227210.


Chunk 30 processed sucessfully in 311.71409463882446 seconds.


 There are 13 disconnected components.
 There are 11 islands with ids: 133908, 134101, 134218, 134245, 134271, 134318, 134338, 134358, 134372, 134400, 134418.


Chunk 31 processed sucessfully in 174.53720831871033 seconds.


 There are 26 disconnected components.
 There are 21 islands with ids: 209487, 209488, 209489, 209490, 209491, 209494, 209498, 209500, 209503, 209505, 209506, 209517, 209527, 209529, 209530, 209531, 209532, 219045, 231905, 234629, 249854.


Chunk 32 processed sucessfully in 319.5335350036621 seconds.


 There are 37 disconnected components.
 There are 21 islands with ids: 128457, 128461, 128490, 128647, 128667, 129117, 129218, 129233, 129237, 129263, 129267, 129303, 129359, 129392, 129657, 129672, 129685, 129771, 129773, 129775, 138894.


Chunk 33 processed sucessfully in 193.7149510383606 seconds.


 There are 29 disconnected components.
 There are 17 islands with ids: 116984, 117291, 117477, 117513, 117714, 117939, 118012, 118169, 118191, 118194, 118220, 118348, 118382, 121095, 124441, 134933, 155793.


Chunk 34 processed sucessfully in 188.13902521133423 seconds.


 There are 10 disconnected components.
 There are 6 islands with ids: 135378, 140632, 142768, 162965, 162970, 168824.


Chunk 35 processed sucessfully in 238.97440838813782 seconds.


 There are 10 disconnected components.
 There are 6 islands with ids: 107209, 107395, 107535, 108741, 109485, 109490.


Chunk 36 processed sucessfully in 148.72315406799316 seconds.


 There are 18 disconnected components.
 There are 12 islands with ids: 152613, 152677, 152862, 153043, 153044, 153257, 153750, 154010, 154064, 186500, 189194, 190010.


Chunk 37 processed sucessfully in 229.74570775032043 seconds.


 There are 25 disconnected components.
 There are 18 islands with ids: 226545, 226754, 227199, 227209, 227243, 228265, 228287, 228311, 228350, 228351, 228451, 228502, 228638, 232585, 235563, 245268, 248536, 255031.


Chunk 38 processed sucessfully in 331.59811091423035 seconds.


 There are 4 disconnected components.


Chunk 39 processed sucessfully in 231.28053855895996 seconds.


 There are 12 disconnected components.
 There are 6 islands with ids: 244447, 254820, 254871, 261906, 279876, 303924.


Chunk 40 processed sucessfully in 393.5891122817993 seconds.


 There are 11 disconnected components.
 There are 4 islands with ids: 133359, 134960, 149114, 164789.


Chunk 41 processed sucessfully in 227.68449091911316 seconds.


 There are 28 disconnected components.
 There are 22 islands with ids: 149910, 149913, 149930, 149970, 149978, 149984, 150003, 150019, 150353, 150670, 150877, 151109, 152111, 152133, 152141, 152143, 152187, 152261, 152374, 152379, 152806, 152810.


Chunk 42 processed sucessfully in 219.58051896095276 seconds.


 There are 12 disconnected components.
 There are 5 islands with ids: 132612, 133460, 133573, 170278, 170951.


Chunk 43 processed sucessfully in 208.21176767349243 seconds.


 There are 8 disconnected components.
 There are 2 islands with ids: 125145, 131133.


Chunk 44 processed sucessfully in 198.68476033210754 seconds.


 There are 7 disconnected components.
 There are 3 islands with ids: 148136, 162903, 188070.


Chunk 45 processed sucessfully in 229.89559864997864 seconds.


 There are 6 disconnected components.
 There is 1 island with id: 183791.


Chunk 46 processed sucessfully in 249.5990493297577 seconds.


 There are 18 disconnected components.
 There are 11 islands with ids: 111746, 111788, 111880, 111890, 111892, 111910, 111918, 111934, 111966, 112191, 120918.


Chunk 47 processed sucessfully in 175.11259937286377 seconds.


 There are 25 disconnected components.
 There are 15 islands with ids: 124961, 124962, 124967, 124978, 124988, 125005, 125008, 125009, 125014, 125015, 125027, 125107, 125109, 125238, 137148.


Chunk 48 processed sucessfully in 247.36505031585693 seconds.


 There are 11 disconnected components.
 There are 7 islands with ids: 134005, 134039, 134585, 134669, 138792, 153783, 157829.


Chunk 49 processed sucessfully in 233.3647243976593 seconds.


 There are 14 disconnected components.
 There are 7 islands with ids: 152034, 159569, 172767, 172769, 178756, 187119, 196194.


Chunk 50 processed sucessfully in 274.49847769737244 seconds.


 There are 13 disconnected components.
 There are 3 islands with ids: 144989, 150449, 165906.


Chunk 51 processed sucessfully in 217.49440360069275 seconds.


 There are 6 disconnected components.
 There are 4 islands with ids: 122150, 129518, 145800, 146784.


Chunk 52 processed sucessfully in 181.56513929367065 seconds.


 There are 10 disconnected components.
 There are 4 islands with ids: 152864, 161153, 166517, 185820.


Chunk 53 processed sucessfully in 271.16719484329224 seconds.


 There are 13 disconnected components.
 There are 2 islands with ids: 172689, 179046.


Chunk 54 processed sucessfully in 257.327924489975 seconds.


 There are 10 disconnected components.
 There are 6 islands with ids: 137274, 139874, 139997, 141567, 142676, 155879.


Chunk 55 processed sucessfully in 202.45160293579102 seconds.


 There are 6 disconnected components.
 There are 2 islands with ids: 165140, 175433.


Chunk 56 processed sucessfully in 223.3841724395752 seconds.


 There are 15 disconnected components.
 There are 10 islands with ids: 130725, 130729, 130878, 131107, 131134, 131175, 131201, 131260, 145552, 146082.


Chunk 57 processed sucessfully in 158.50740456581116 seconds.


 There are 44 disconnected components.
 There are 23 islands with ids: 118910, 118912, 118923, 118924, 118930, 118970, 118975, 118977, 118982, 118997, 119010, 119081, 119098, 119104, 119121, 119123, 119183, 119256, 119260, 119289, 119513, 122046, 136583.


Chunk 58 processed sucessfully in 198.86790132522583 seconds.


 There are 8 disconnected components.
 There are 5 islands with ids: 130961, 133798, 143597, 176070, 179637.


Chunk 59 processed sucessfully in 223.31676125526428 seconds.


 There are 11 disconnected components.
 There are 4 islands with ids: 124266, 128999, 134552, 149861.


Chunk 60 processed sucessfully in 209.40075492858887 seconds.


 There are 14 disconnected components.
 There are 4 islands with ids: 236333, 280329, 287828, 306669.


Chunk 61 processed sucessfully in 427.454585313797 seconds.


 There are 7 disconnected components.
 There are 2 islands with ids: 142574, 160842.


Chunk 62 processed sucessfully in 253.92991995811462 seconds.


 There are 11 disconnected components.
 There are 6 islands with ids: 120538, 133082, 136761, 157193, 160484, 160717.


Chunk 63 processed sucessfully in 196.85611534118652 seconds.


 There are 50 disconnected components.
 There are 25 islands with ids: 216853, 217415, 217996, 218924, 218965, 219048, 219050, 219181, 219319, 219334, 219338, 219368, 219379, 219439, 219452, 219455, 219512, 219519, 219892, 219907, 219937, 219939, 229633, 231210, 238465.


Chunk 64 processed sucessfully in 354.77829933166504 seconds.


 There are 17 disconnected components.
 There are 6 islands with ids: 156647, 156653, 157156, 157257, 159513, 165602.


Chunk 65 processed sucessfully in 263.1460394859314 seconds.


 There are 10 disconnected components.
 There are 3 islands with ids: 166190, 166954, 195471.


Chunk 66 processed sucessfully in 254.1169135570526 seconds.


 There are 5 disconnected components.
 There are 2 islands with ids: 235233, 252343.


Chunk 67 processed sucessfully in 368.75104236602783 seconds.


 There are 12 disconnected components.
 There are 5 islands with ids: 155088, 155160, 167986, 191013, 196669.


Chunk 68 processed sucessfully in 264.6607723236084 seconds.


 There are 11 disconnected components.
 There are 2 islands with ids: 158286, 159068.


Chunk 69 processed sucessfully in 290.5715596675873 seconds.


 There are 12 disconnected components.
 There are 6 islands with ids: 150978, 166808, 167170, 178361, 178384, 193905.


Chunk 70 processed sucessfully in 234.2130560874939 seconds.


 There are 9 disconnected components.
 There are 5 islands with ids: 129657, 131156, 138104, 147618, 177723.


Chunk 71 processed sucessfully in 253.7894606590271 seconds.


 There are 12 disconnected components.
 There are 6 islands with ids: 130098, 147514, 152653, 156044, 159793, 164178.


Chunk 72 processed sucessfully in 263.50014424324036 seconds.


 There are 30 disconnected components.
 There are 18 islands with ids: 159450, 159452, 159454, 159455, 159457, 159459, 159460, 159461, 159465, 159466, 159467, 159468, 159469, 159471, 200198, 205753, 209350, 210274.


Chunk 73 processed sucessfully in 299.1839327812195 seconds.


 There are 13 disconnected components.
 There are 5 islands with ids: 133214, 142247, 162225, 171526, 171662.


Chunk 74 processed sucessfully in 231.59208178520203 seconds.


 There are 16 disconnected components.
 There are 9 islands with ids: 175128, 184864, 185710, 217674, 221634, 225009, 225316, 229729, 239190.


Chunk 75 processed sucessfully in 304.73678612709045 seconds.


 There are 16 disconnected components.
 There are 8 islands with ids: 149664, 150112, 151215, 162296, 162297, 181738, 183735, 213768.


Chunk 76 processed sucessfully in 276.6728549003601 seconds.


 There are 6 disconnected components.
 There are 5 islands with ids: 137306, 137355, 137363, 137406, 155001.


Chunk 77 processed sucessfully in 220.53749704360962 seconds.


 There are 16 disconnected components.
 There are 9 islands with ids: 109475, 109675, 110994, 111063, 111138, 115290, 122922, 128689, 155812.


Chunk 78 processed sucessfully in 207.42325520515442 seconds.


 There are 8 disconnected components.
 There are 5 islands with ids: 123412, 128408, 149735, 149736, 152636.


Chunk 79 processed sucessfully in 207.57619166374207 seconds.


 There are 10 disconnected components.
 There are 5 islands with ids: 167921, 167922, 183242, 200627, 208682.


Chunk 80 processed sucessfully in 242.84630632400513 seconds.


 There are 11 disconnected components.
 There are 6 islands with ids: 108879, 109619, 109791, 109794, 109795, 109933.


Chunk 81 processed sucessfully in 156.10908246040344 seconds.


 There are 11 disconnected components.
 There are 3 islands with ids: 145108, 159652, 189674.


Chunk 82 processed sucessfully in 237.42833352088928 seconds.


 There are 29 disconnected components.
 There are 16 islands with ids: 142220, 142245, 142256, 142266, 142269, 142369, 142417, 142529, 145065, 145696, 145718, 146889, 147156, 147278, 165073, 168672.


Chunk 83 processed sucessfully in 232.90090823173523 seconds.


 There are 5 disconnected components.
 There are 2 islands with ids: 110833, 111462.


Chunk 84 processed sucessfully in 180.23269033432007 seconds.


 There are 40 disconnected components.
 There are 24 islands with ids: 173357, 173761, 173798, 173880, 173883, 173976, 174174, 174194, 174298, 174305, 177736, 177755, 177771, 177778, 177780, 177796, 177798, 177818, 177823, 177909, 178447, 178513, 178557, 178799.


Chunk 85 processed sucessfully in 237.13694953918457 seconds.


 There are 13 disconnected components.
 There are 9 islands with ids: 143103, 143866, 145455, 150180, 177294, 184733, 191892, 193767, 198540.


Chunk 86 processed sucessfully in 248.50655436515808 seconds.


 There are 721 disconnected components.
 There are 651 islands with ids: 142916, 142931, 143078, 143132, 143133, 143134, 143167, 143168, 143184, 143193, 143197, 143208, 143243, 143380, 143383, 143409, 143411, 143417, 143429, 143436, 143437, 143441, 143471, 143472, 143517, 143526, 143544, 143648, 143663, 143682, 143691, 143695, 143704, 143761, 143771, 143773, 143818, 143832, 143835, 143836, 143842, 143872, 143875, 143891, 143900, 143919, 143928, 143931, 143933, 143939, 143942, 143950, 143952, 143969, 143977, 143981, 144001, 144022, 144046, 144049, 144055, 144056, 144061, 144069, 144076, 144085, 144089, 144127, 144143, 144185, 144225, 144227, 144249, 144250, 144255, 144259, 144262, 144272, 144274, 144295, 144297, 144304, 144314, 144320, 144342, 144352, 144362, 144376, 144389, 144405, 144412, 144524, 144559, 144561, 144572, 144573, 144576, 144589, 144609, 144646, 144648, 144671, 144710, 144731, 144739, 144740, 144741, 144742, 144754, 144806, 144819, 144842, 144987, 145085, 145110, 145112,

Chunk 87 processed sucessfully in 227.92471551895142 seconds.


 There are 11 disconnected components.
 There are 5 islands with ids: 172530, 173713, 174052, 190075, 190076.


Chunk 88 processed sucessfully in 233.84631967544556 seconds.


 There are 68 disconnected components.
 There are 49 islands with ids: 227459, 227460, 227461, 227462, 227463, 227466, 227468, 227484, 227485, 227486, 227488, 227489, 227492, 227493, 227494, 227495, 227497, 227498, 227499, 227500, 227502, 227503, 227504, 227505, 227508, 227510, 227514, 227524, 227527, 227529, 227531, 227533, 227539, 227540, 227541, 227729, 227735, 227785, 227793, 228342, 228991, 229121, 229546, 229549, 229550, 229636, 230115, 231834, 233730.


Chunk 89 processed sucessfully in 284.80965065956116 seconds.


 There are 8 disconnected components.
 There are 2 islands with ids: 150406, 154817.


Chunk 90 processed sucessfully in 211.46636080741882 seconds.


 There are 7 disconnected components.
 There is 1 island with id: 167260.


Chunk 91 processed sucessfully in 233.61219358444214 seconds.


 There are 13 disconnected components.
 There are 5 islands with ids: 178434, 178538, 219117, 238999, 267437.


Chunk 92 processed sucessfully in 364.09398126602173 seconds.


 There are 5 disconnected components.
 There are 2 islands with ids: 125054, 134814.


Chunk 93 processed sucessfully in 220.23269653320312 seconds.


 There are 12 disconnected components.
 There are 3 islands with ids: 139292, 140529, 169298.


Chunk 94 processed sucessfully in 237.6123263835907 seconds.


 There are 7 disconnected components.
 There are 3 islands with ids: 135388, 138476, 146980.


Chunk 95 processed sucessfully in 247.24600315093994 seconds.
Chunk 96 processed sucessfully in 201.34737181663513 seconds.


 There are 315 disconnected components.
 There are 267 islands with ids: 111786, 111840, 111952, 111953, 111959, 111974, 111976, 111979, 111984, 111985, 111991, 111993, 111994, 111996, 111997, 111999, 112002, 112003, 112005, 112009, 112014, 112015, 112016, 112018, 112025, 112027, 112029, 112030, 112031, 112033, 112034, 112049, 112050, 112062, 112087, 112096, 112142, 112155, 112164, 112168, 112226, 112276, 112284, 112308, 112324, 112358, 112378, 112524, 112693, 112696, 112711, 112738, 112745, 112747, 112748, 112752, 112759, 112783, 112899, 112955, 112965, 112982, 112985, 113003, 113012, 113044, 113053, 113062, 113082, 113140, 113143, 113237, 113255, 113284, 113288, 113294, 113308, 113311, 113315, 113320, 113327, 113355, 113367, 113383, 113412, 113452, 113465, 113472, 113483, 113530, 113597, 113613, 113620, 113640, 114008, 114180, 114199, 114212, 114226, 114370, 114403, 114458, 114466, 114513, 114520, 114556, 114618, 114640, 114718, 114762, 114832, 114849, 114853, 114869, 114885, 114887,

Chunk 97 processed sucessfully in 230.589741230011 seconds.


 There are 8 disconnected components.
 There are 3 islands with ids: 192665, 238648, 248981.


Chunk 98 processed sucessfully in 310.0511486530304 seconds.


 There are 46 disconnected components.
 There are 27 islands with ids: 136939, 136946, 136953, 136956, 137450, 137497, 137536, 137538, 137563, 137571, 137573, 138003, 138034, 138044, 138087, 143233, 144070, 146219, 152038, 154368, 156244, 156245, 164971, 166175, 173534, 186205, 186370.


Chunk 99 processed sucessfully in 237.9031059741974 seconds.


 There are 11 disconnected components.
 There are 4 islands with ids: 145018, 159712, 166989, 166990.


Chunk 100 processed sucessfully in 252.1358687877655 seconds.


 There are 11 disconnected components.
 There are 4 islands with ids: 245250, 245382, 273303, 302782.


Chunk 101 processed sucessfully in 385.7198483943939 seconds.


 There are 9 disconnected components.
 There are 5 islands with ids: 139758, 154451, 154461, 164041, 172042.


Chunk 102 processed sucessfully in 235.5640685558319 seconds.


In [None]:
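def convolute(chunk_id):
    """Measure quartiles of each character within the neighbourhood of every cell.

    The neighbourhood is the cell itself plus its neighbours in the stored `w3` weights.
    Only cells belonging to `chunk_id` are saved; cells borrowed from other chunks are dropped.
    """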
    s = time()
    cells = geopandas.read_parquet(f"../../urbangrammar_samba/spatial_signatures/morphometrics/cells/cells_{chunk_id}.pq")
    cells['keep'] = True
    # add neighbouring cells from other chunks
    cross_chunk_cells = []

    for chunk, inds in cross_chunk.loc[chunk_id].indices.items():
        add_cells = geopandas.read_parquet(f"../../urbangrammar_samba/spatial_signatures/morphometrics/cells/cells_{chunk}.pq").iloc[inds]
        add_cells['keep'] = False
        cross_chunk_cells.append(add_cells)

    # DataFrame.append was removed in pandas 2.0; pd.concat is the supported equivalent
    df = pd.concat([cells, *cross_chunk_cells], ignore_index=True)

    # read W
    w = libpysal.weights.WSP(scipy.sparse.load_npz(f"../../urbangrammar_samba/spatial_signatures/weights/w3_{chunk_id}.npz")).to_W()

    # list characters (selected as the columns whose names are six characters long)
    characters = [x for x in df.columns if len(x) == 6]
    
    # prepare dictionary to store results
    convolutions = {}
    for c in characters:
        convolutions[c] = []
        
    # measure convolutions
    for i in range(len(df)):
        neighbours = [i]
        neighbours += w.neighbors[i]

        vicinity = df.iloc[neighbours]

        for c in characters:
            convolutions[c].append(np.nanpercentile(vicinity[c], [25, 50, 75], interpolation='midpoint'))
    
    # save convolutions to parquet file
    conv = pd.DataFrame(convolutions)
    exploded = pd.concat([pd.DataFrame(conv[c].to_list(), columns=[c + '_q1', c + '_q2',c + '_q3']) for c in characters], axis=1)
    exploded[df.keep].to_parquet(f"../../urbangrammar_samba/spatial_signatures/morphometrics/convolutions/conv_{chunk_id}.pq")
        
    return f"Chunk {chunk_id} processed successfully in {time() - s} seconds."
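
To make the convolution step easier to follow, here is a small sketch of what it measures for a single character: the first, second and third quartile of the values found in a cell and its neighbours, ignoring missing values. The data and the helper `midpoint_percentile` are hypothetical, and the helper is a pure-Python approximation of what we understand `np.nanpercentile` with midpoint interpolation to compute, not the NumPy implementation itself.

```python
import math


def midpoint_percentile(values, q):
    """Return percentile q (0-100) of the non-NaN values.

    When the rank falls between two order statistics, their mean is returned.
    """
    data = sorted(v for v in values if not math.isnan(v))
    rank = q / 100 * (len(data) - 1)
    lower, upper = math.floor(rank), math.ceil(rank)
    return (data[lower] + data[upper]) / 2


# Hypothetical toy data: one character measured on five cells (one value missing)
# and a neighbour list standing in for `w.neighbors`.
character = [10.0, 20.0, float("nan"), 40.0, 50.0]
neighbors = {0: [1, 2], 1: [0, 2, 3], 2: [1, 3], 3: [1, 2, 4], 4: [3]}

convolution = {}
for i, neigh in neighbors.items():
    # the vicinity is the focal cell plus its neighbours
    vicinity = [character[j] for j in [i] + neigh]
    convolution[i] = [midpoint_percentile(vicinity, q) for q in (25, 50, 75)]

for i, quartiles in convolution.items():
    print(i, quartiles)
```

The sketch assumes every vicinity contains at least one non-missing value; an all-NaN vicinity would need separate handling.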

In [None]:
workers = 8
client = Client(LocalCluster(n_workers=workers, threads_per_worker=1))
client

In [None]:
%%time
inputs = iter(range(103))
futures = [client.submit(convolute, next(inputs)) for i in range(workers)]
ac = as_completed(futures)
for finished_future in ac:
    # keep the workers busy: submit the next chunk as soon as one finishes
    try:
        new_future = client.submit(convolute, next(inputs))
        ac.add(new_future)
    except StopIteration:
        pass
    print(finished_future.result())
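
The cell above keeps at most `workers` chunks in flight: it seeds the cluster with one task per worker and submits a new one each time a task finishes. The same pattern can be sketched with the standard library alone. Note that this swaps `dask.distributed` for `concurrent.futures`, and that `process` is a hypothetical stand-in for `convolute`, so it only illustrates the scheduling logic.

```python
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait
from itertools import islice


def process(chunk_id):
    # hypothetical stand-in for `convolute`
    return f"Chunk {chunk_id} processed."


workers = 4
inputs = iter(range(10))
results = []

with ThreadPoolExecutor(max_workers=workers) as pool:
    # seed the pool with one task per worker
    pending = {pool.submit(process, i) for i in islice(inputs, workers)}
    while pending:
        done, pending = wait(pending, return_when=FIRST_COMPLETED)
        for future in done:
            results.append(future.result())
            # top up with one new task per finished task, if any inputs remain
            for i in islice(inputs, 1):
                pending.add(pool.submit(process, i))

print(len(results))
```

Bounding the number of submitted tasks this way limits how many chunks are held in memory at once, at the cost of results arriving in completion order rather than input order.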