The output data should be Hilbert sorted, but we're using the wrong sort at vecorel-cli/conversion/base.py line 411:
gdf.sort_values("geometry", inplace=True, ignore_index=True)
That’s a lexicographic comparison of the WKB bytes, not a Hilbert-curve ordering. WKB byte order has essentially no spatial meaning (the first byte is endianness, the next four are the geometry type code, etc.), so the rows come out in roughly random spatial order.
We can verify this with the published datasets:
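To illustrate the point about WKB byte order, here is a small pure-Python sketch. It assumes little-endian WKB (byte order flag 1, what most machines emit) and hand-encodes points with `struct` rather than going through shapely; `point_wkb` is a hypothetical helper for this example only. Sorting the encodings as raw bytes does not even preserve the x order of three points on a line, because the coordinate bytes are compared starting from the least significant byte of x. For polygons such as field boundaries, the bytes right after the header are the ring count and the vertex count of the exterior ring, so a byte sort would group rows by those before any coordinate is compared.

```python
import struct


def point_wkb(x: float, y: float) -> bytes:
    """Hypothetical helper: encode a 2D point as little-endian WKB."""
    # "<" = little-endian, no padding: 1-byte order flag + 4-byte type code (1 = Point)
    # + two 8-byte doubles = 21 bytes in total.
    return struct.pack("<BIdd", 1, 1, x, y)


points = [(1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]

# Sorting on the raw bytes is a plain lexicographic comparison.
by_wkb = sorted(points, key=lambda p: point_wkb(*p))

# The 5-byte header is identical for every point, and bytes 5..12 hold x in
# little-endian order. For x = 1, 2, 3 the first differing byte is byte 11
# (0xF0, 0x00, 0x08), which puts x = 2 first and x = 1 last.
print(by_wkb)  # [(2.0, 0.0), (3.0, 0.0), (1.0, 0.0)]
```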
gpio check https://data.source.coop/fiboa/data/at/at-2025.parquet
GeoParquet Metadata:
✓ Version 1.1.0
ℹ️ GeoParquet 2.0 is available, with native spatial stats and filter pushdown. Run: gpio convert geoparquet input.parquet output.parquet --geoparquet-version 2.0
Spatial Order Analysis:
⚠️ Data may not be optimally spatially ordered
Consider running 'gpio sort hilbert' to improve spatial locality
An open question: the ST_Hilbert function computes the curve index relative to a bounding box. Should we use the dataset's own bounding box (as geoparquet-IO seems to do) or the world bounds for EPSG:4326? I would prefer the latter, so that we can efficiently merge and compare datasets.
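A minimal pure-Python sketch of what fixed world bounds would buy us. It uses the classic iterative xy-to-distance Hilbert mapping on a 2^16 × 2^16 grid; `ORDER`, `WORLD_BOUNDS`, `hilbert_d` and `hilbert_key` are hypothetical names chosen for this example, and it is not claimed to reproduce ST_Hilbert's exact cell assignment. Points stand in for geometries (e.g. bounding-box centers), and the coordinates are rough city locations used only as sample input.

```python
import heapq

# Hypothetical parameters for this sketch: 16 bits per axis, EPSG:4326 world bounds.
ORDER = 16
WORLD_BOUNDS = (-180.0, -90.0, 180.0, 90.0)


def hilbert_d(x: int, y: int, order: int = ORDER) -> int:
    """Distance along a Hilbert curve filling a 2**order x 2**order grid."""
    n = 1 << order
    d = 0
    s = n >> 1
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate/flip the quadrant so the sub-curve has the expected orientation.
        if ry == 0:
            if rx == 1:
                x = n - 1 - x
                y = n - 1 - y
            x, y = y, x
        s >>= 1
    return d


def hilbert_key(lon: float, lat: float, bounds=WORLD_BOUNDS, order: int = ORDER) -> int:
    """Snap a coordinate to the grid spanned by `bounds`, then take its Hilbert distance."""
    minx, miny, maxx, maxy = bounds
    cells = (1 << order) - 1
    gx = int((lon - minx) / (maxx - minx) * cells)
    gy = int((lat - miny) / (maxy - miny) * cells)
    # Clamp so points on or outside the bounds still map to a valid cell.
    gx = min(max(gx, 0), cells)
    gy = min(max(gy, 0), cells)
    return hilbert_d(gx, gy, order)


# Two hypothetical datasets, given as representative points.
dataset_a = [(16.37, 48.21), (13.04, 47.80), (14.29, 48.31)]
dataset_b = [(11.58, 48.14), (8.68, 50.11), (13.40, 52.52)]

# Each dataset is sorted independently against the same world bounds...
sorted_a = sorted(dataset_a, key=lambda p: hilbert_key(*p))
sorted_b = sorted(dataset_b, key=lambda p: hilbert_key(*p))

# ...so a streaming merge gives the same order as sorting the combined data.
merged = list(heapq.merge(sorted_a, sorted_b, key=lambda p: hilbert_key(*p)))
print(merged == sorted(dataset_a + dataset_b, key=lambda p: hilbert_key(*p)))  # True

# With per-dataset bounds, the same point gets a different key in each file.
vienna = (16.37, 48.21)
own_bounds_a = (13.04, 47.80, 16.37, 48.31)   # bbox of dataset_a
own_bounds_ab = (8.68, 47.80, 16.37, 52.52)   # bbox of dataset_a + dataset_b
print(hilbert_key(*vienna, bounds=own_bounds_a) == hilbert_key(*vienna, bounds=own_bounds_ab))  # False
```

With world bounds, a feature's key depends only on its own coordinates, so independently sorted files can be combined with a streaming merge and compared row for row. The cost is resolution: 16 bits per axis over 360° × 180° gives cells of roughly 600 m × 300 m at the equator, so many small field parcels can share a cell (in this sketch they keep their input order within it). Python integers are unbounded, so raising `ORDER` here is cheap if finer cells are needed.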