Summary
_build_row_csr_numba in xrspatial/rasterize.py allocates row_ptr and diff as np.int32, then computes total = row_ptr[height] and uses it to size col_idx. For very tall rasters with many long edges, the cumulative sum overflows int32 (max 2**31 - 1, ~2.15e9) and wraps to a negative or wrong-positive value. The subsequent np.empty(total, dtype=np.int32) then either raises a confusing ValueError (negative size) or allocates an undersized buffer that the Pass 2 fill writes past, corrupting memory inside the numba kernel.
Trigger
The worst case is a CSR fan-out in which every edge spans most of the raster. With height = 50_000 rows and ~50_000 edges each spanning the full height, total = sum of edges_per_row across rows ~= 2.5e9, which exceeds the int32 maximum. Realistic polygon rasterizations rarely hit this, but len(edge_y_min) is now bounded only by the raster guard added in #1223, so a deliberately tall raster with many long polygon edges can reach this regime.
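As a quick sanity check of the arithmetic above, the following sketch computes the worst-case total exactly and the value it would take after a two's-complement wrap into int32. The names are illustrative, and it assumes every edge contributes one entry to every row.

```python
import numpy as np

INT32_MAX = np.iinfo(np.int32).max  # 2_147_483_647

height = 50_000
n_edges = 50_000  # assumption: each edge spans all `height` rows

# Exact total number of (row, edge) entries, computed with Python ints.
total = height * n_edges

# Value the same quantity would hold after wrapping into a signed 32-bit int.
wrapped = ((total + 2**31) % 2**32) - 2**31

print(total, total > INT32_MAX, wrapped)
```

In this particular configuration the wrapped value is negative, which corresponds to the ValueError path rather than the undersized-buffer path.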
Site
xrspatial/rasterize.py, _build_row_csr_numba:
diff = np.zeros(height + 1, dtype=np.int32)
...
row_ptr = np.empty(height + 1, dtype=np.int32)
row_ptr[0] = 0
running = np.int32(0)
for r in range(height):
    running += diff[r]
    row_ptr[r + 1] = row_ptr[r] + running
total = row_ptr[height]
col_idx = np.empty(total, dtype=np.int32)
running is forced to int32 by the explicit np.int32(0) cast, and row_ptr storage is int32, so the per-row addition wraps without warning under numba.
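The silent wrap can be reproduced at small scale in plain NumPy, without numba. This is only a stand-in for the kernel's loop, not the kernel itself: it uses np.cumsum with an int32 accumulator and a made-up edges_per_row array of three rows.

```python
import numpy as np

# Hypothetical per-row edge counts: 3 rows, each crossed by 2**30 edges.
edges_per_row = np.full(3, 2**30, dtype=np.int32)

# int32 offsets: the running sum passes 2**31 - 1 on the second row and wraps.
row_ptr32 = np.zeros(4, dtype=np.int32)
row_ptr32[1:] = np.cumsum(edges_per_row, dtype=np.int32)

# int64 offsets: the same prefix sum stays exact.
row_ptr64 = np.zeros(4, dtype=np.int64)
row_ptr64[1:] = np.cumsum(edges_per_row, dtype=np.int64)

print(row_ptr32)  # last entry is negative after the wrap
print(row_ptr64)  # last entry is 3 * 2**30
```

NumPy raises no error or warning for the int32 array operation, which matches the "wraps without warning" behaviour described for the kernel.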
Fix
Switch row_ptr, diff, running, and offsets to int64. The CSR offsets index into col_idx, so it is the offset range that needs to grow; the values stored in col_idx are edge indices and stay int32. Downstream consumers (_scanline_fill_gpu and np.diff(row_ptr).max()) accept int64 without modification.
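A minimal sketch of the proposed dtype change, written in plain Python/NumPy rather than as the actual numba kernel. The helper name row_ptr_from_diff and the toy diff array are hypothetical; the elided part of the original function (how diff is populated) is not reproduced here.

```python
import numpy as np

def row_ptr_from_diff(diff, height):
    """Hypothetical helper: prefix-sum a difference array into int64 CSR offsets."""
    row_ptr = np.empty(height + 1, dtype=np.int64)
    row_ptr[0] = 0
    running = np.int64(0)  # int64 accumulator instead of np.int32(0)
    for r in range(height):
        running += diff[r]                      # edges active on row r
        row_ptr[r + 1] = row_ptr[r] + running   # cumulative entry count
    return row_ptr

# Toy difference array for height = 3: two edges start at row 0,
# one of them stops being active from row 2 onward.
diff = np.zeros(4, dtype=np.int64)
diff[0] += 2
diff[2] -= 1

row_ptr = row_ptr_from_diff(diff, 3)

# Offsets are int64, but the stored edge indices can remain int32.
col_idx = np.empty(int(row_ptr[3]), dtype=np.int32)

print(row_ptr, np.diff(row_ptr).max(), col_idx.shape)
```

Only the offset arrays widen; col_idx keeps its int32 element type, so the per-entry memory cost of the CSR structure is unchanged.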
Audit reference
Flagged as MEDIUM Cat 2 in the rasterize security audit row, deferred from PR #1223 / #1224 (the original HIGH allocation-cap fix).