# Comparison between Spatial Signatures and GHSL classes

In [34]:
import pandas
import geopandas
import rasterio
import rioxarray
import dask_geopandas
from rasterstats import zonal_stats

## GHSL

The global file was originally downloaded from [here](https://jeodpp.jrc.ec.europa.eu/ftp/jrc-opendata/GHSL/GHS_BUILT_C_GLOBE_R2022A/GHS_BUILT_C_MSZ_GLOBE_R2022A/GHS_BUILT_C_MSZ_E2018_GLOBE_R2022A_54009_10/V1-0/).

In [18]:
ghsl_path = (
    'GHS_BUILT_S_P2030LIN_GLOBE_R2022A_54009_100_V1_0/'
    'GHS_BUILT_S_P2030LIN_GLOBE_R2022A_54009_100_V1_0.tif'
)
ghsl = rasterio.open(ghsl_path)

In [43]:
ghsl = rioxarray.open_rasterio(ghsl_path, chunks=5000)

## Spatial Signatures

For convenience, we use a simplified version of the file that is available to us internally:

In [9]:
ss = geopandas.read_parquet(
    'signatures_form_combined_levels_simplified.pq'
).to_crs(ghsl.rio.crs)  # `ghsl` is a rioxarray DataArray, so its CRS is under `.rio`

The officially released file with all detail (and a larger footprint) is available from the Figshare repository [here](https://figshare.com/articles/dataset/Geographical_Characterisation_of_British_Urban_Form_and_Function_using_the_Spatial_Signatures_Framework/16691575/2?file=36049196).

## Overlap

- Single-core approach

In [25]:
%%time
stats = zonal_stats(ss.head(100), ghsl_path, categorical=True)

CPU times: user 644 ms, sys: 1.23 s, total: 1.88 s
Wall time: 3.34 s


- Multi-core on `dask`

In [29]:
ss_ddf = dask_geopandas.from_geopandas(ss, npartitions=16)

In [31]:
tmp = ss_ddf.map_partitions(
    lambda df: zonal_stats(df, ghsl_path, categorical=True)
)

ValueError: Metadata inference failed in `lambda`.

You have supplied a custom function and Dask is unable to 
determine the type of output that that function returns. 

To resolve this please provide a meta= keyword.
The docstring of the Dask function you ran should have more information.

Original error is below:
------------------------
ValueError('width and height must be > 0')

Traceback:
---------
  File "/opt/conda/lib/python3.9/site-packages/dask/dataframe/utils.py", line 182, in raise_on_meta_error
    yield
  File "/opt/conda/lib/python3.9/site-packages/dask/dataframe/core.py", line 6247, in _emulate
    return func(*_extract_meta(args, True), **_extract_meta(kwargs, True))
  File "/tmp/ipykernel_147/3647013166.py", line 2, in <lambda>
    lambda df: zonal_stats(df, ghsl_path, categorical=True)
  File "/opt/conda/lib/python3.9/site-packages/rasterstats/main.py", line 31, in zonal_stats
    return list(gen_zonal_stats(*args, **kwargs))
  File "/opt/conda/lib/python3.9/site-packages/rasterstats/main.py", line 163, in gen_zonal_stats
    rv_array = rasterize_geom(geom, like=fsrc, all_touched=all_touched)
  File "/opt/conda/lib/python3.9/site-packages/rasterstats/utils.py", line 41, in rasterize_geom
    rv_array = features.rasterize(
  File "/opt/conda/lib/python3.9/site-packages/rasterio/env.py", line 387, in wrapper
    return f(*args, **kwds)
  File "/opt/conda/lib/python3.9/site-packages/rasterio/features.py", line 353, in rasterize
    raise ValueError("width and height must be > 0")
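
The traceback shows Dask calling the lambda on its empty metadata placeholder (`_emulate`), where `zonal_stats` fails with "width and height must be > 0". One possible workaround, sketched below under stated assumptions, is to return a table with a fixed set of columns and to pass `meta` explicitly. The names `rows_from_stats`, `partition_zonal_stats` and `classes` are hypothetical, and the sketch assumes the raster values of interest are known in advance; it has not been run against the data used here.

```python
def rows_from_stats(stats, classes):
    """Turn categorical zonal_stats output (a list of {value: count} dicts)
    into rows with one count per class, using 0 for absent classes."""
    return [[float(s.get(c, 0)) for c in classes] for s in stats]


def partition_zonal_stats(df, raster_path, classes):
    """Hypothetical helper: categorical zonal stats for one partition."""
    # Imports are deferred so the function is self-contained on Dask workers
    import pandas
    from rasterstats import zonal_stats

    # Empty partitions (including Dask's meta placeholder) never touch the raster
    stats = zonal_stats(df, raster_path, categorical=True) if len(df) else []
    return pandas.DataFrame(
        rows_from_stats(stats, classes),
        index=df.index,
        columns=classes,
        dtype="float64",
    )


# Usage sketch (not run here; `classes` is an assumed list of raster values):
# classes = [0, 15, 63]
# meta = partition_zonal_stats(ss.head(0), ghsl_path, classes)
# tmp = ss_ddf.map_partitions(
#     partition_zonal_stats, ghsl_path, classes, meta=meta
# )
```

Fixing the columns up front keeps every partition's output aligned with `meta`, at the cost of dropping any raster value not listed in `classes`.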


In [23]:
pandas.DataFrame(stats).fillna(0)

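|   | 0 | 15 | 191 | 63 | 70 | 90 | 140 | 1090 | 488 | 531 | 776 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 36 | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 1 | 6 | 0.0 | 0.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 |
| 2 | 2 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 3 | 1 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 4 | 3 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 1.0 | 1.0 |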
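
Each row holds pixel counts per raster value for one signature polygon. Since polygons differ in size, it may help to express the counts as shares before comparing them. This is a suggestion rather than a step taken in this notebook; `to_shares` is a hypothetical helper that works on the list of dictionaries returned by `zonal_stats(..., categorical=True)`.

```python
def to_shares(stats):
    """Convert per-polygon {raster value: pixel count} dicts to shares."""
    shares = []
    for counts in stats:
        total = sum(counts.values())
        # Polygons covering no valid pixels yield an empty dict; keep it empty
        shares.append(
            {value: n / total for value, n in counts.items()} if total else {}
        )
    return shares
```

The result can be passed to `pandas.DataFrame(...).fillna(0)` in the same way as the raw counts.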


## Comparison