
Fix perlin/terrain dask backends: enable parallelism and out-of-core support #870

Merged
brendancol merged 3 commits into master from fix/perlin-dask-chunking
Feb 24, 2026

Conversation

@brendancol
Contributor

Summary

  • da.linspace without chunks= in both perlin.py and terrain.py produced single-chunk coordinate arrays, so da.map_blocks processed everything as one block — no actual parallelism
  • Diamond dependency in normalization ((data - da.min(data)) / da.ptp(data)) forced the scheduler to hold all source blocks in memory simultaneously, making the dask path OOM on larger-than-memory inputs
  • terrain.py used np.min/np.ptp instead of da.min/da.ptp, relying on fragile __array_function__ dispatch, and used setitem (data[data < 0.3] = 0) which isn't dask-native

Fixes

  1. Pass chunks= to da.linspace matching the input data's chunk structure so da.map_blocks distributes work across blocks
  2. Compute reductions in a separate pass via dask.compute(da.min(data), da.ptp(data)), then normalize with concrete scalars — breaks the diamond dependency so blocks can be processed and released independently
  3. Replace np.min/np.ptp with explicit da.min/da.ptp and data[data < 0.3] = 0 with da.where

Fixes #869
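A minimal sketch of fix 1 (not the actual PR diff; shapes and the sample block function are illustrative): passing `chunks=` to `da.linspace` so the coordinate arrays share the input data's block structure, which lets `da.map_blocks` emit one task per block instead of one task total.

```python
import numpy as np
import dask.array as da

height, width = 8, 16
chunks = (4, 4)  # yields a 2x4 grid of blocks

data = da.zeros((height, width), chunks=chunks)

# Without chunks=, these would be single-chunk arrays, collapsing
# map_blocks to a single task. Matching the input chunking restores
# per-block parallelism.
ys = da.linspace(0.0, 1.0, height, chunks=chunks[0])
xs = da.linspace(0.0, 1.0, width, chunks=chunks[1])
yy, xx = da.meshgrid(ys, xs, indexing="ij")

# Placeholder block function standing in for the noise kernel.
out = da.map_blocks(lambda d, y, x: d + np.sin(x * y), data, yy, xx,
                    dtype=data.dtype)

# Chunked input produces chunked output: a 2x4 block grid.
assert out.numblocks == (2, 4)
```

The assertion mirrors the test plan's check that a 2x4 block input yields a 2x4 block output.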

Test plan

  • test_perlin_cpu, test_perlin_dask_cpu, test_perlin_gpu pass
  • test_terrain_cpu, test_terrain_dask_cpu, test_terrain_gpu pass
  • Verified chunked input produces chunked output (e.g. 2x4 block input → 2x4 block output)

Pass chunks= to da.linspace so coordinate arrays match the input
data's chunk structure. Without this, da.linspace created single-chunk
arrays, making da.map_blocks process everything as one block.

Fixes #869

Same issue as perlin(): da.linspace was called without chunks=,
producing single-chunk coordinate arrays and negating any parallelism
from da.map_blocks.

Refs #869

The normalization `(data - da.min(data)) / da.ptp(data)` creates a
diamond dependency in the task graph: every source block feeds both
the global reduction and the final elementwise op. The scheduler
cannot release any block until both paths complete, so all blocks
must be in memory simultaneously — OOM for larger-than-memory inputs.

Fix by computing reductions in a separate pass via dask.compute(),
producing concrete scalars before building the elementwise graph.
Each block can then be processed and released independently.
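A sketch of the two-pass shape described above (assumed form of the fix, with a small random array standing in for real terrain data): the reductions run as their own pass via `dask.compute`, returning plain scalars, so the subsequent elementwise graph has no edge back to the reduction and blocks can be released as they finish.

```python
import dask
import dask.array as da

data = da.random.random((8, 16), chunks=(4, 4))

# Pass 1: both reductions share a single traversal of the blocks and
# come back as concrete Python floats, not lazy arrays.
dmin, dptp = dask.compute(da.min(data), da.ptp(data))

# Pass 2: the elementwise graph depends only on scalars, so each block
# is loaded, normalized, and released independently (no diamond).
normalized = (data - dmin) / dptp
result = normalized.compute()
assert result.min() == 0.0 and result.max() == 1.0
```

Computing both reductions in one `dask.compute` call matters: two separate `.compute()` calls would traverse the source blocks twice.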

Also in terrain.py:
- Replace np.min/np.ptp with da.min/da.ptp (explicit dask ops
  instead of relying on __array_function__ dispatch)
- Replace data[data < 0.3] = 0 with da.where (dask-native)
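The setitem replacement can be illustrated like this (toy data; the 0.3 threshold is the one from the commit message):

```python
import numpy as np
import dask.array as da

data = da.from_array(np.array([[0.1, 0.5], [0.29, 0.8]]), chunks=1)

# Instead of the in-place data[data < 0.3] = 0, which mutates state
# and is awkward for a lazy task graph, build an elementwise task:
data = da.where(data < 0.3, 0, data)

assert (data.compute() == np.array([[0.0, 0.5], [0.0, 0.8]])).all()
```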

Refs #869
@brendancol brendancol merged commit 1f3a7cd into master Feb 24, 2026
10 checks passed
@brendancol brendancol mentioned this pull request Feb 24, 2026
@brendancol brendancol deleted the fix/perlin-dask-chunking branch February 25, 2026 06:53


Development

Successfully merging this pull request may close these issues.

perlin() dask backend does not actually parallelize — coordinate arrays are single-chunk
