Reduce CuPy host round-trips and remove redundant copies in reproject #1457

@brendancol

Description

Two small performance items in xrspatial/reproject/__init__.py.

Batch the four CuPy .get() calls per chunk

_reproject_chunk_cupy (around lines 357-364) issues four sequential .get() calls to bring the nanmin/nanmax of the row/col pixel arrays back to the host:

r_min_val = float(cp.nanmin(src_row_px).get())
if not np.isfinite(r_min_val):
    return cp.full(chunk_shape, nodata, dtype=cp.float64)
r_max_val = float(cp.nanmax(src_row_px).get())
c_min_val = float(cp.nanmin(src_col_px).get())
c_max_val = float(cp.nanmax(src_col_px).get())

Each .get() is a synchronous device-to-host transfer that stalls the GPU pipeline. Stacking the four reductions into a single 4-element CuPy array and pulling it across in one .get() cuts the round-trips from four to one per chunk. The finite checks then run on host scalars, which costs nothing extra.

The same pattern repeats in _reproject_dask_cupy around lines 1122-1128.
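A minimal sketch of the batched version, assuming src_row_px and src_col_px are the CuPy arrays from the snippet above (the helper name _src_window_bounds is made up for illustration):

import cupy as cp
import numpy as np

def _src_window_bounds(src_row_px, src_col_px):
    # Stack all four reductions into one 4-element device array so a
    # single .get() moves everything to the host at once.
    bounds = cp.stack([
        cp.nanmin(src_row_px),
        cp.nanmax(src_row_px),
        cp.nanmin(src_col_px),
        cp.nanmax(src_col_px),
    ])
    r_min_val, r_max_val, c_min_val, c_max_val = (float(v) for v in bounds.get())
    return r_min_val, r_max_val, c_min_val, c_max_val

The call site keeps the existing early return; the finite check now runs on a host float: if not np.isfinite(r_min_val): return cp.full(chunk_shape, nodata, dtype=cp.float64).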

Drop redundant .copy() after .astype()

numpy.ndarray.astype() and cupy.ndarray.astype() both default to copy=True, so they always return a new array. The follow-up .copy() in:

  • _reproject_chunk_numpy multi-band path (line ~290)
  • _reproject_chunk_numpy single-band path (line ~305)
  • _reproject_chunk_cupy (line ~443)
  • _reproject_dask_cupy (line ~1193)

is therefore redundant and can be removed. No correctness change; one fewer array allocation per chunk.
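A quick host-side check of the default copy semantics (the same contract holds for cupy.ndarray.astype):

import numpy as np

a = np.arange(4, dtype=np.float32)
b = a.astype(np.float64)  # copy=True is the default
assert not np.shares_memory(a, b)  # astype already returned a fresh array
# A trailing b.copy() would allocate a third array for nothing.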

Impact

For an N-chunk reprojection on GPU, the batching eliminates roughly 3 * N synchronous device-to-host transfers. The .copy() removal saves one window-sized allocation per chunk. Existing parity tests cover correctness.
