Add dask-native out-of-core reproject and merge module#1031
Merged
brendancol merged 5 commits into master on Mar 19, 2026
Adds xrspatial.reproject with CRS reprojection and multi-raster merge, supporting all 4 backends (numpy, cupy, dask+numpy, dask+cupy). Uses an approximate transform (bilinear interpolation on a coarse pyproj control grid) and numba-JIT resampling kernels (nearest, bilinear, cubic).

Faster than rioxarray/GDAL for bilinear (1.8-2.9x) and cubic (2.2-5.8x) on single-chunk arrays, with no compiled C dependencies beyond numba. pyproj is an optional dependency (lazy import, clear error if missing).

New files:
- xrspatial/reproject/ (6 modules)
- xrspatial/tests/test_reproject.py (53 tests)
- benchmarks/REPROJECT_BENCHMARKS.md
- examples/user_guide/34_Reproject.ipynb

Modified files:
- setup.cfg (pyproj extras)
- xrspatial/__init__.py (export reproject, merge)
- xrspatial/accessor.py (.xrs.reproject method)
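The approximate-transform idea described above can be sketched roughly as follows. This is a minimal NumPy illustration, not the module's actual code: `exact_transform` is a hypothetical stand-in for a pyproj call (a closed-form EPSG:4326 to EPSG:3857 formula, so the sketch needs no pyproj), and `approx_transform`, `frac_index`, and `n_ctrl` are names assumed for this example.

```python
import numpy as np

R = 6378137.0  # Web Mercator sphere radius in meters


def exact_transform(lon, lat):
    """Stand-in for an exact, per-point CRS transform (EPSG:4326 -> EPSG:3857)."""
    x = R * np.radians(lon)
    y = R * np.log(np.tan(np.pi / 4 + np.radians(lat) / 2))
    return x, y


def frac_index(coords, n_ctrl):
    """Fractional position of each pixel coordinate on the control grid."""
    span = coords[-1] - coords[0]
    if span == 0:  # single-pixel axis: avoid dividing by zero
        span = 1.0
    return (coords - coords[0]) / span * (n_ctrl - 1)


def approx_transform(lons, lats, n_ctrl=17):
    """Transform only an n_ctrl x n_ctrl control grid exactly, then
    bilinearly interpolate the result to every (lat, lon) pixel."""
    ctrl_lon = np.linspace(lons[0], lons[-1], n_ctrl)
    ctrl_lat = np.linspace(lats[0], lats[-1], n_ctrl)
    cx, cy = exact_transform(*np.meshgrid(ctrl_lon, ctrl_lat))

    fx, fy = frac_index(lons, n_ctrl), frac_index(lats, n_ctrl)
    ix = np.clip(np.floor(fx).astype(int), 0, n_ctrl - 2)
    iy = np.clip(np.floor(fy).astype(int), 0, n_ctrl - 2)
    tx = (fx - ix)[None, :]  # horizontal weight inside the control cell
    ty = (fy - iy)[:, None]  # vertical weight inside the control cell

    def interp(c):
        top = c[np.ix_(iy, ix)] * (1 - tx) + c[np.ix_(iy, ix + 1)] * tx
        bot = c[np.ix_(iy + 1, ix)] * (1 - tx) + c[np.ix_(iy + 1, ix + 1)] * tx
        return top * (1 - ty) + bot * ty

    return interp(cx), interp(cy)


lons = np.linspace(-10.0, 10.0, 200)
lats = np.linspace(50.0, 30.0, 150)  # north-up raster: latitudes descend
x_approx, y_approx = approx_transform(lons, lats)
```

The trade-off is that only `n_ctrl * n_ctrl` points go through the expensive transform; the interpolation error shrinks as the control grid gets denser.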
Fixes:
- 1x1 raster crash: control grid with zero span caused division by zero in the numba interpolation kernel. Guard with max(height-1, 1).
- Antimeridian blow-up: rasters near +/-180 longitude produced output grids spanning the full globe. Clamp geographic boundary samples to +/-179.99 and fall back to IQR-based bounds when the transformed extent exceeds 50x the source extent.
- Sentinel nodata: non-NaN nodata values (e.g., -9999) are now converted to NaN before resampling so the numba kernels propagate them correctly.

Adds 14 edge case tests covering 1x1/2x2 rasters, antimeridian east/west, arctic/antarctic, polar stereographic, south-up orientation, UTM roundtrip, all-NaN input, sentinel nodata, merge with gap, CONUS to Albers, and extreme aspect ratios.
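Two of the fixes above are small enough to sketch in isolation. The helper names `mask_nodata` and `row_fraction` are hypothetical and chosen for this illustration; only the `max(height-1, 1)` guard and the sentinel-to-NaN conversion come from the commit message.

```python
import numpy as np


def mask_nodata(data, nodata):
    """Return a float64 copy with sentinel nodata values replaced by NaN,
    so NaN-aware resampling kernels can propagate missing data."""
    out = np.asarray(data, dtype=np.float64).copy()
    if nodata is not None and not np.isnan(nodata):
        out[out == nodata] = np.nan
    return out


def row_fraction(row, height):
    """Position of a pixel row in [0, 1] along the control grid.
    max(height - 1, 1) keeps a 1-row raster from dividing by zero."""
    return row / max(height - 1, 1)


masked = mask_nodata(np.array([[1, -9999], [3, 4]]), nodata=-9999)
```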
Adds three sections and six functions that were exported from xrspatial but not listed in the README:
- Reproject / Merge: reproject, merge (new section)
- Flood: vegetation_roughness, vegetation_curve_number, flood_depth_vegetation (3 functions added to existing section)
- Dasymetric: disaggregate, pycnophylactic (new section)
numba @njit(parallel=True) with prange crashes when called from inside dask's thread pool (nested TBB/OpenMP threading causes SIGABRT on CI across all platforms). The rest of the codebase uses @ngjit (nogil=True) without parallel for the same reason.

Switch all reproject numba kernels from parallel=True to nogil=True. Sequential numba is still faster than scipy for bilinear/cubic (the workloads are memory-bound, not compute-bound), and dask provides chunk-level parallelism anyway.

Updated REPROJECT_BENCHMARKS.md with new timings. Performance is consistent: xrspatial is still 2-2.7x faster than rioxarray/GDAL for bilinear and cubic at all sizes tested.
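A rough sketch of the nogil pattern described above: a sequential kernel compiled with `nogil=True`, called concurrently from a thread pool that stands in for dask's threaded scheduler. The kernel `resample_nearest` is a hypothetical example, not one of the module's kernels, and the no-op `njit` fallback is an assumption added so the sketch still runs where numba is not installed.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np

try:
    from numba import njit
except ImportError:  # assumption: fall back to plain Python without numba
    def njit(*args, **kwargs):
        def wrap(fn):
            return fn
        return wrap


@njit(nogil=True)  # sequential kernel; no parallel=True / prange
def resample_nearest(src, src_rows, src_cols):
    """Nearest-neighbour lookup of src at fractional pixel coordinates."""
    h, w = src_rows.shape
    out = np.empty((h, w), dtype=np.float64)
    for i in range(h):
        for j in range(w):
            r = int(np.floor(src_rows[i, j] + 0.5))
            c = int(np.floor(src_cols[i, j] + 0.5))
            if 0 <= r < src.shape[0] and 0 <= c < src.shape[1]:
                out[i, j] = src[r, c]
            else:
                out[i, j] = np.nan  # outside the source raster
    return out


src = np.arange(64 * 48, dtype=np.float64).reshape(64, 48)
rows, cols = np.meshgrid(np.arange(64.0), np.arange(48.0), indexing="ij")

# Chunk-level parallelism comes from the caller, one row band per task.
bands = [slice(i, i + 16) for i in range(0, 64, 16)]
with ThreadPoolExecutor(max_workers=4) as pool:
    tiles = list(pool.map(lambda s: resample_nearest(src, rows[s], cols[s]), bands))
result = np.vstack(tiles)
```

Because the kernel releases the GIL, the threads can run concurrently without any nested threading layer inside numba.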
xarray's plot.imshow sets aspect ratio based on coordinate extent.
When mixing degree-scale (EPSG:4326) and meter-scale (EPSG:3857)
panels in the same figure, the meter-scale panels get squeezed to
near-zero width. Fix by calling set_aspect('auto') on all subplot
axes so each panel fills its allocated space regardless of coordinate
scale.
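
A minimal sketch of that fix, assuming matplotlib is available. Plain `ax.imshow` with an `extent` stands in here for xarray's `plot.imshow`, and the extents are made-up values for a degree-scale and a meter-scale panel.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

data = np.random.default_rng(0).random((20, 20))
fig, axes = plt.subplots(1, 2, figsize=(8, 4))

# Left panel in degrees, right panel in meters (illustrative extents).
axes[0].imshow(data, extent=(-10, 10, 30, 50))
axes[1].imshow(data, extent=(-1.1e6, 1.1e6, 3.5e6, 6.4e6))

# Let every panel fill its allocated space regardless of coordinate scale.
for ax in axes.ravel():
    ax.set_aspect("auto")
```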
Summary
- xrspatial.reproject() and xrspatial.merge() for CRS reprojection and multi-raster mosaicking

Performance vs rioxarray (GDAL warp)
Benchmarked on synthetic terrain, EPSG:4326 to EPSG:3857:
Full results in benchmarks/REPROJECT_BENCHMARKS.md.

Consistency vs rioxarray
All 18 comparisons have correlation > 0.999 when sampled at matching geographic coordinates. Identity reprojections (4326 to 4326) produce bit-identical results (correlation = 1.000000).
What's included
New files (10):
- xrspatial/reproject/ -- 6 modules (__init__.py, _transform.py, _interpolate.py, _grid.py, _merge.py, _crs_utils.py)
- xrspatial/tests/test_reproject.py -- 53 tests covering CRS detection, approximate transform accuracy, resampling, grid computation, merge strategies, end-to-end reproject/merge for all backends, accessor integration, integer rasters
- benchmarks/REPROJECT_BENCHMARKS.md
- examples/user_guide/34_Reproject.ipynb with preview image

Modified files (3):
- setup.cfg -- reproject = pyproj extra, pyproj added to test deps
- xrspatial/__init__.py -- exports reproject, merge
- xrspatial/accessor.py -- adds .xrs.reproject() method

Design notes
- Does not use map_overlap or map_blocks because the output grid has entirely different geometry from the input. Uses dask.delayed + da.block tile assembly (same pattern as rasterize.py).

Test plan
- Run the test suite (pytest xrspatial/tests/test_reproject.py)
- Execute the notebook (jupyter nbconvert --execute)
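
The dask.delayed + da.block tile assembly mentioned in the design notes can be sketched as below. This is an illustration of the general pattern, not the code in the PR: `render_tile` and `assemble` are hypothetical names, and a nearest-neighbour rescale stands in for the real per-tile reprojection.

```python
import dask
import dask.array as da
import numpy as np


def render_tile(src, r0, r1, c0, c1, out_shape):
    """Hypothetical per-tile worker: fill output rows r0:r1, cols c0:c1 by
    nearest-neighbour rescaling, reading whatever source pixels it needs."""
    rows = (np.arange(r0, r1) * src.shape[0]) // out_shape[0]
    cols = (np.arange(c0, c1) * src.shape[1]) // out_shape[1]
    return src[np.ix_(rows, cols)]


def assemble(src, out_shape, tile):
    """Tile the OUTPUT grid, wrap each tile in dask.delayed, and stitch the
    lazy tiles together with da.block."""
    blocks = []
    for r0 in range(0, out_shape[0], tile):
        r1 = min(r0 + tile, out_shape[0])
        row = []
        for c0 in range(0, out_shape[1], tile):
            c1 = min(c0 + tile, out_shape[1])
            task = dask.delayed(render_tile)(src, r0, r1, c0, c1, out_shape)
            row.append(da.from_delayed(task, shape=(r1 - r0, c1 - c0), dtype=src.dtype))
        blocks.append(row)
    return da.block(blocks)


src = np.arange(12.0).reshape(3, 4)
out = assemble(src, out_shape=(7, 9), tile=4)  # lazy until .compute()
```

Tiling the output rather than the input is what lets each task read an arbitrary source window, which the block-aligned map_blocks and map_overlap helpers do not allow.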