Skip to content

Add dask-native out-of-core reproject and merge module#1031

Merged
brendancol merged 5 commits intomasterfrom
reproject-module
Mar 19, 2026
Merged

Add dask-native out-of-core reproject and merge module#1031
brendancol merged 5 commits intomasterfrom
reproject-module

Conversation

@brendancol
Copy link
Contributor

Summary

  • Adds xrspatial.reproject() and xrspatial.merge() for CRS reprojection and multi-raster mosaicking
  • Supports all 4 backends: numpy, cupy, dask+numpy, dask+cupy
  • Uses an approximate transform (bilinear interp on a coarse pyproj control grid) paired with numba-JIT resampling kernels for nearest, bilinear, and cubic
  • pyproj is an optional dependency with a lazy import and clear error message

Performance vs rioxarray (GDAL warp)

Benchmarked on synthetic terrain, EPSG:4326 to EPSG:3857:

Resampling 256x256 512x512 1024x1024 2048x2048
nearest xrs 1.6x faster xrs 1.4x ~tied rio 1.3x
bilinear xrs 2.0x faster xrs 2.7x xrs 2.5x xrs 1.8x
cubic xrs 2.7x faster xrs 4.5x xrs 5.7x xrs 2.2x

Full results in benchmarks/REPROJECT_BENCHMARKS.md.

Consistency vs rioxarray

All 18 comparisons have correlation > 0.999 when sampled at matching geographic coordinates. Identity reprojections (4326 to 4326) produce bit-identical results (correlation = 1.000000).

What's included

New files (10):

  • xrspatial/reproject/ -- 6 modules (__init__.py, _transform.py, _interpolate.py, _grid.py, _merge.py, _crs_utils.py)
  • xrspatial/tests/test_reproject.py -- 53 tests covering CRS detection, approximate transform accuracy, resampling, grid computation, merge strategies, end-to-end reproject/merge for all backends, accessor integration, integer rasters
  • benchmarks/REPROJECT_BENCHMARKS.md
  • examples/user_guide/34_Reproject.ipynb with preview image

Modified files (3):

  • setup.cfg -- reproject = pyproj extra, pyproj added to test deps
  • xrspatial/__init__.py -- exports reproject, merge
  • xrspatial/accessor.py -- adds .xrs.reproject() method

Design notes

  • Cannot use map_overlap or map_blocks because the output grid has entirely different geometry from the input. Uses dask.delayed + da.block tile assembly (same pattern as rasterize.py).
  • CRS objects are serialized to WKT before passing into delayed functions (pyproj.Transformer is not picklable).
  • The approximate transform evaluates pyproj on a 17x17 control grid per chunk, then uses a numba parallel bilinear interpolator for all output pixels. About 45x faster than scipy's RegularGridInterpolator for this workload.
  • Cubic resampling uses Catmull-Rom (same as GDAL), not B-spline. Handles NaN inline in a single pass rather than the two-pass approach scipy requires.

Test plan

  • 53 unit tests pass (pytest xrspatial/tests/test_reproject.py)
  • Backend parity: 144 cross-backend comparisons all PASS (numpy vs dask+numpy vs cupy vs dask+cupy)
  • Consistency vs rioxarray: all correlations > 0.999
  • Notebook executes cleanly (jupyter nbconvert --execute)
  • CI

Adds xrspatial.reproject with CRS reprojection and multi-raster merge,
supporting all 4 backends (numpy, cupy, dask+numpy, dask+cupy).

Uses an approximate transform (bilinear interpolation on a coarse pyproj
control grid) and numba-JIT resampling kernels (nearest, bilinear, cubic).
Faster than rioxarray/GDAL for bilinear (1.8-2.9x) and cubic (2.2-5.8x)
on single-chunk arrays, with no compiled C dependencies beyond numba.

pyproj is an optional dependency (lazy import, clear error if missing).

New files:
- xrspatial/reproject/ (6 modules)
- xrspatial/tests/test_reproject.py (53 tests)
- benchmarks/REPROJECT_BENCHMARKS.md
- examples/user_guide/34_Reproject.ipynb

Modified files:
- setup.cfg (pyproj extras)
- xrspatial/__init__.py (export reproject, merge)
- xrspatial/accessor.py (.xrs.reproject method)
@github-actions github-actions bot added the performance PR touches performance-sensitive code label Mar 19, 2026
Fixes:
- 1x1 raster crash: control grid with zero span caused division by zero
  in the numba interpolation kernel. Guard with max(height-1, 1).
- Antimeridian blow-up: rasters near +/-180 longitude produced output
  grids spanning the full globe. Clamp geographic boundary samples to
  +/-179.99 and fall back to IQR-based bounds when the transformed
  extent exceeds 50x the source extent.
- Sentinel nodata: non-NaN nodata values (e.g., -9999) are now converted
  to NaN before resampling so the numba kernels propagate them correctly.

Adds 14 edge case tests covering 1x1/2x2 rasters, antimeridian east/west,
arctic/antarctic, polar stereographic, south-up orientation, UTM roundtrip,
all-NaN input, sentinel nodata, merge with gap, CONUS to Albers, and
extreme aspect ratios.
Adds three sections and six functions that were exported from
xrspatial but not listed in the README:

- Reproject / Merge: reproject, merge (new section)
- Flood: vegetation_roughness, vegetation_curve_number,
  flood_depth_vegetation (3 functions added to existing section)
- Dasymetric: disaggregate, pycnophylactic (new section)
numba @njit(parallel=True) with prange crashes when called from inside
dask's thread pool (nested TBB/OpenMP threading causes SIGABRT on CI
across all platforms). The rest of the codebase uses @ngjit (nogil=True)
without parallel for the same reason.

Switch all reproject numba kernels from parallel=True to nogil=True.
Sequential numba is still faster than scipy for bilinear/cubic (the
workloads are memory-bound, not compute-bound), and dask provides
chunk-level parallelism anyway.

Updated REPROJECT_BENCHMARKS.md with new timings. Performance is
consistent: xrspatial is still 2-2.7x faster than rioxarray/GDAL for
bilinear and cubic at all sizes tested.
xarray's plot.imshow sets aspect ratio based on coordinate extent.
When mixing degree-scale (EPSG:4326) and meter-scale (EPSG:3857)
panels in the same figure, the meter-scale panels get squeezed to
near-zero width. Fix by calling set_aspect('auto') on all subplot
axes so each panel fills its allocated space regardless of coordinate
scale.
@brendancol brendancol merged commit 820c0aa into master Mar 19, 2026
11 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

performance PR touches performance-sensitive code

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant