Hilbert Sorting #22

@ivorbosloper

The output data should be Hilbert sorted. We're currently using the wrong sort in vecorel-cli/conversion/base.py:411:

  gdf.sort_values("geometry", inplace=True, ignore_index=True)

That’s a lexicographic compare on the WKB bytes, not a Hilbert curve. WKB byte order has essentially no spatial meaning (the first byte is endianness, the next four are the geometry type code, etc.), so the rows come out in roughly random spatial order.
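A minimal sketch of why byte order carries no spatial meaning, assuming little-endian WKB and that the sort really does compare raw WKB bytes as described above. The helper `point_wkb` is hypothetical and builds the WKB of a 2D point by hand with the standard library only:

```python
import struct


def point_wkb(x: float, y: float) -> bytes:
    """Little-endian WKB for a 2D point (hypothetical helper for illustration)."""
    # Layout: 1 byte byte-order flag (1 = little-endian),
    # 4 bytes geometry type (1 = Point), then x and y as float64.
    return struct.pack("<BIdd", 1, 1, x, y)


# Three points on the same horizontal line, ordered west to east.
xs = [1.0, 1.5, 2.0]

# Sort them the way a lexicographic byte compare would.
by_wkb = sorted(xs, key=lambda x: point_wkb(x, 0.0))

# The first 5 bytes are identical for every point, and the little-endian
# doubles put the least significant mantissa bytes first, so the resulting
# order is unrelated to position.
print(by_wkb)  # [2.0, 1.0, 1.5]
```

The three points lie along one line, yet the byte-wise order puts the easternmost one first.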

We can verify with the published datasets:

gpio check https://data.source.coop/fiboa/data/at/at-2025.parquet

GeoParquet Metadata:
✓ Version 1.1.0
ℹ️ GeoParquet 2.0 is available, with native spatial stats and filter pushdown. Run: gpio convert geoparquet input.parquet output.parquet --geoparquet-version 2.0

Spatial Order Analysis:
⚠️ Data may not be optimally spatially ordered
Consider running 'gpio sort hilbert' to improve spatial locality

An open question: the ST_Hilbert function sorts with respect to a bounding box. Should we use the dataset's own bounding box (as geoparquet-IO seems to do), or the world bounds for EPSG:4326? I would prefer the latter: Hilbert values computed against the same fixed bounds are comparable across datasets, so we can merge and compare datasets efficiently.
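To make the bounding-box question concrete, here is a rough sketch of a Hilbert key computed against fixed world bounds. It is not how ST_Hilbert or geoparquet-IO are implemented; it uses the textbook grid-cell-to-curve-distance algorithm, and `WORLD_BOUNDS`, `hilbert_index`, `hilbert_key` and the `level` default are names and values made up for illustration:

```python
# Assumed world bounds for EPSG:4326: xmin, ymin, xmax, ymax.
WORLD_BOUNDS = (-180.0, -90.0, 180.0, 90.0)


def hilbert_index(n: int, x: int, y: int) -> int:
    """Distance along the Hilbert curve of cell (x, y) in an n-by-n grid.

    n must be a power of two; x and y are integer cell coordinates.
    """
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate/flip the quadrant so the sub-curve is oriented correctly.
        if ry == 0:
            if rx == 1:
                x, y = n - 1 - x, n - 1 - y
            x, y = y, x
        s //= 2
    return d


def hilbert_key(lon: float, lat: float, bounds=WORLD_BOUNDS, level: int = 16) -> int:
    """Hilbert key of a lon/lat point relative to fixed bounds."""
    xmin, ymin, xmax, ymax = bounds
    n = 1 << level
    # Scale into grid cells and clamp so the max edge falls in the last cell.
    ix = min(n - 1, max(0, int((lon - xmin) / (xmax - xmin) * n)))
    iy = min(n - 1, max(0, int((lat - ymin) / (ymax - ymin) * n)))
    return hilbert_index(n, ix, iy)


# Example: sort a few lon/lat points by their world-bounds Hilbert key.
points = [(16.37, 48.21), (4.90, 52.37), (13.04, 47.80)]
points_sorted = sorted(points, key=lambda p: hilbert_key(*p))
```

Because the bounds are fixed, a given point gets the same key in every dataset, which is what makes merging sorted files cheap. For polygons we would need a representative point first, for example the center of each geometry's bounding box (an assumption, not something decided here). GeoPandas also ships `GeoSeries.hilbert_distance(total_bounds=..., level=...)`, which might be usable for this instead of a hand-rolled key.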
