Reason or Problem
`_write_vrt_tiled` in `xrspatial/geotiff/__init__.py` (line 1708) writes a dask-backed DataArray to a directory of tiled GeoTIFFs by building one `dask.delayed` task per tile, then executing them all with:

```python
dask.compute(*delayed_tasks, scheduler='synchronous')
```
The synchronous scheduler runs every task one at a time on the current thread. Tile writes are independent (different chunks, different output files, no shared mutable state inside `_write_single_tile`), so parallelism is left on the table.
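For concreteness, a minimal self-contained sketch of that pattern follows; the trivial tile writer, chunk list, and output paths are placeholders, not the actual xrspatial internals:

```python
import os
import tempfile

import dask
import numpy as np


def _write_single_tile(chunk, path):
    # Placeholder for the real per-tile GeoTIFF writer: each task gets its own
    # chunk and its own output file, so tasks share no mutable state.
    np.save(path, chunk)


out_dir = tempfile.mkdtemp()
chunks = [np.random.rand(256, 256).astype('float32') for _ in range(16)]
tile_paths = [os.path.join(out_dir, f'tile_{i}') for i in range(16)]

delayed_tasks = [
    dask.delayed(_write_single_tile)(chunk, path)
    for chunk, path in zip(chunks, tile_paths)
]

# Current behaviour: the synchronous scheduler writes every tile serially on
# the calling thread, even though the tasks are independent.
dask.compute(*delayed_tasks, scheduler='synchronous')
```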
Microbench on a single 16-thread machine, 4096x4096 float32 random data, chunks=256, zstd compression, 256 output tiles:
| Scheduler | Wall time |
| --- | --- |
| synchronous (current) | 0.49 s |
| threads (monkey-patched) | 0.33 s |
That is a ~33% reduction with zero correctness risk on this path. The gain grows with tile count and with codec cost (zstd level 9 / LERC spend more CPU per tile).
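A rough way to reproduce the comparison is sketched below, using a toy compression-only tile task rather than xrspatial's real write path; absolute numbers will differ from the table, but the relative gap between schedulers is the point.

```python
import time
import zlib

import dask
import numpy as np


def fake_tile_write(chunk):
    # Stand-in for per-tile codec work; zlib at level 9 approximates the
    # zstd/LERC CPU cost and, like those codecs, releases the GIL while compressing.
    return len(zlib.compress(chunk.tobytes(), 9))


chunks = [np.random.rand(256, 256).astype('float32') for _ in range(256)]
tasks = [dask.delayed(fake_tile_write)(chunk) for chunk in chunks]

for scheduler in ('synchronous', 'threads'):
    start = time.perf_counter()
    dask.compute(*tasks, scheduler=scheduler)
    print(f'{scheduler:12s} {time.perf_counter() - start:.2f} s')
```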
Proposal
Switch to the threaded scheduler explicitly:
```python
dask.compute(*delayed_tasks, scheduler='threads')
```
Considerations:
- `_write_single_tile` opens a fresh file path per tile and never mutates shared Python state, so threading is safe.
- zstd/zlib/LZW release the GIL during compression, so threading delivers real parallelism on the compression stage.
- File-write concurrency is bounded by the dask thread-pool default (a few threads on a typical box). Local filesystems handle that fine; the OS write-back cache coalesces the writes.
- On dask+cupy data, each thread lands in `chunk_data.get()` independently. cupy releases the GIL on D2H transfers, so threading is still safe.
If a future caller wants to write to a slow networked filesystem and serialise writes intentionally, the scheduler should be exposed as an argument rather than left to the environment: an explicit `scheduler=` keyword passed to `dask.compute` takes precedence over `DASK_SCHEDULER=synchronous`, so an environment override alone would not take effect. A sketch of that escape hatch follows.
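A minimal sketch of the escape hatch, assuming a keyword argument on the write helper (the simplified signature and the default are hypothetical, not the current xrspatial API):

```python
import dask


def _write_vrt_tiled(delayed_tasks, scheduler='threads'):
    # Hypothetical simplified signature: default to the threaded scheduler, but
    # let callers pass scheduler='synchronous' to serialise writes deliberately.
    # The explicit keyword wins over DASK_SCHEDULER / dask.config, so this
    # override is reliable where the environment variable would not be.
    dask.compute(*delayed_tasks, scheduler=scheduler)
```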
Acceptance criteria
- `_write_vrt_tiled` uses `scheduler='threads'`.
- Existing tests in `xrspatial/geotiff/tests/test_vrt_tiled_metadata_1606.py`, `test_polish_1488.py`, and the dask-backed VRT write paths continue to pass.
- Microbench (`to_geotiff(da, 'out.vrt', compression='zstd')` on a 4096x4096, chunks=256 dask DataArray) shows a wall-time reduction comparable to the ~33% measured above; a timing sketch follows this list.
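A rough timing sketch for that check, assuming `to_geotiff` is importable from `xrspatial.geotiff` and accepts the `compression=` keyword as quoted above (both are assumptions about the package API):

```python
import time

import dask.array as dsa
import xarray as xr

from xrspatial.geotiff import to_geotiff  # import path assumed from the issue text

# 4096x4096 float32 dask-backed DataArray with 256x256 chunks, as in the microbench.
data = dsa.random.random((4096, 4096), chunks=256).astype('float32')
agg = xr.DataArray(data, dims=('y', 'x'), name='data')

start = time.perf_counter()
to_geotiff(agg, 'out.vrt', compression='zstd')  # call form quoted from the acceptance criteria
print(f'wall time: {time.perf_counter() - start:.2f} s')
```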
Context
Found via a deep-sweep performance audit on 2026-05-12. Cat 2 (Dask chunking): a synchronous scheduler on an embarrassingly parallel write loop.
The original code from #1083 / #1085 (May 2025) used `synchronous` without a documented rationale; the comment block above the call does not mention threading. It looks like a defensive default that no longer needs to be defensive.