Hilbert Sorting

The output data should be Hilbert-sorted, but we're using the wrong sort in [vecorel_cli/conversion/base.py:411](https://github.com/vecorel/cli/blob/main/vecorel_cli/conversion/base.py#L411):

```python
gdf.sort_values("geometry", inplace=True, ignore_index=True)
```

That’s a lexicographic comparison of the WKB bytes, not a Hilbert curve ordering. WKB byte order has essentially no spatial meaning (the first byte is endianness, the next four are the geometry type code, etc.), so the rows come out in roughly random spatial order.
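To make the WKB point concrete, here is a small stdlib-only sketch that hand-encodes 2D points as little-endian WKB (byte-order flag `1`, type code `1` for Point, then two doubles) and sorts them by their raw bytes. The `point_wkb` helper is hypothetical and only handles points; the header bytes are identical for every point, so only the coordinate bytes decide the order. Real polygons put ring and point counts ahead of the coordinates, so a byte-wise sort groups them by those counts before any coordinate is even compared.

```python
import struct

def point_wkb(x: float, y: float) -> bytes:
    """Encode a 2D point as little-endian WKB (hypothetical helper, points only)."""
    # Byte 0: byte-order flag (1 = little endian).
    # Bytes 1-4: geometry type code as uint32 (1 = Point).
    # Bytes 5-20: x and y as IEEE 754 doubles, least-significant byte first.
    return struct.pack("<BIdd", 1, 1, x, y)

points = [(1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]

# Sorting by the raw bytes is a lexicographic byte comparison, like sorting on WKB.
by_wkb = sorted(points, key=lambda p: point_wkb(*p))
print(by_wkb)  # [(2.0, 0.0), (3.0, 0.0), (1.0, 0.0)]
```

Because the doubles are stored least-significant byte first, the first byte that differs is not the most significant one, so x = 2 and x = 3 end up ahead of x = 1.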

We can confirm this on the published datasets:

> `gpio check https://data.source.coop/fiboa/data/at/at-2025.parquet`
> 
> GeoParquet Metadata:
> ✓ Version 1.1.0
> ℹ️  GeoParquet 2.0 is available, with native spatial stats and filter pushdown. Run: gpio convert geoparquet input.parquet output.parquet --geoparquet-version 2.0
> 
> Spatial Order Analysis:
> ⚠️  Data may not be optimally spatially ordered
> Consider running 'gpio sort hilbert' to improve spatial locality
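The output above doesn't say how `gpio check` decides that data is "not optimally spatially ordered", so the following is only a hypothetical stand-in, not gpio's heuristic: the mean distance (in CRS units) between the centroids of consecutive rows. A well-ordered file should have a small mean step; a shuffled one tends towards the average distance between random pairs of features. The `mean_step` name and the toy data are assumptions for illustration.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]

def mean_step(centroids: Sequence[Point]) -> float:
    """Mean Euclidean distance between consecutive centroids (hypothetical locality metric)."""
    if len(centroids) < 2:
        return 0.0
    total = sum(math.dist(a, b) for a, b in zip(centroids, centroids[1:]))
    return total / (len(centroids) - 1)

# Ten points on a line, in spatial order.
line = [(float(x), 0.0) for x in range(10)]
# The same points, alternating between the two ends of the line: 0, 9, 1, 8, ...
jumbled = [line[i] for i in (0, 9, 1, 8, 2, 7, 3, 6, 4, 5)]

print(mean_step(line))     # 1.0
print(mean_step(jumbled))  # 5.0
```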

An open question: the `ST_Hilbert` function computes its index relative to a bounding box. Should we use the dataset's own bounding box (as geoparquet-io seems to do) or the world bounds of EPSG:4326 (-180, -90, 180, 90)? I would prefer the latter: with fixed bounds, a given location gets the same Hilbert index in every dataset, so datasets can be merged and compared efficiently without re-sorting.
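A rough sketch of the world-bounds option, in plain Python rather than `ST_Hilbert` (whose exact implementation may differ): map each feature's bounding-box centre onto a 2^16 × 2^16 grid spanning the EPSG:4326 extent and compute its position along the curve with the classic `xy2d` algorithm. The helper names, the use of the bbox centre, and the 16-bit order are assumptions for illustration. Because the bounds are fixed, a feature's key depends only on its own geometry, so two independently sorted datasets can be combined with a streaming merge instead of a full re-sort.

```python
import heapq
from typing import Iterable, List, Tuple

BBox = Tuple[float, float, float, float]  # (minx, miny, maxx, maxy)

# Full EPSG:4326 extent, assuming lon/lat axis order.
WORLD_BOUNDS: BBox = (-180.0, -90.0, 180.0, 90.0)

def hilbert_index(order: int, x: int, y: int) -> int:
    """Classic xy2d: position of grid cell (x, y) along a Hilbert curve of side 2**order."""
    n = 1 << order
    d = 0
    s = n >> 1
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate/flip the quadrant so the next level is in canonical orientation.
        if ry == 0:
            if rx == 1:
                x = n - 1 - x
                y = n - 1 - y
            x, y = y, x
        s >>= 1
    return d

def to_cell(value: float, lo: float, hi: float, order: int) -> int:
    """Scale a coordinate into an integer grid cell in [0, 2**order - 1]."""
    n = 1 << order
    cell = int((value - lo) / (hi - lo) * n)
    return min(max(cell, 0), n - 1)  # clamp so the max edge stays inside the grid

def hilbert_key(bbox: BBox, bounds: BBox = WORLD_BOUNDS, order: int = 16) -> int:
    """Hypothetical sort key: Hilbert index of the bbox centre relative to `bounds`."""
    minx, miny, maxx, maxy = bbox
    gx = to_cell((minx + maxx) / 2, bounds[0], bounds[2], order)
    gy = to_cell((miny + maxy) / 2, bounds[1], bounds[3], order)
    return hilbert_index(order, gx, gy)

def hilbert_sorted(bboxes: Iterable[BBox]) -> List[BBox]:
    return sorted(bboxes, key=hilbert_key)

# Two hypothetical datasets, each sorted on its own.
a = hilbert_sorted([(16.3, 48.2, 16.4, 48.3), (13.0, 47.8, 13.1, 47.9)])
b = hilbert_sorted([(5.0, 52.0, 5.1, 52.1), (11.5, 48.1, 11.6, 48.2)])

# With shared world bounds, merging the sorted inputs gives the same order as re-sorting.
merged = list(heapq.merge(a, b, key=hilbert_key))
assert merged == hilbert_sorted(a + b)
```

The trade-off: with order 16 on the world extent a cell is about 0.0055° × 0.0027° (a few hundred metres at the equator), so many small fields share a key, whereas the dataset's own bounds give finer cells but a different key for the same field in every file.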

Hilbert Sorting #22