Merged · Changes from all commits · 48 commits
- 1878c9f Add lightweight GeoTIFF/COG reader and writer (brendancol, Mar 20, 2026)
- 3710354 Add multi-band write, integer nodata, PackBits, dask reads, BigTIFF w… (brendancol, Mar 20, 2026)
- 576c7d2 Skip unneeded strips in windowed reads (brendancol, Mar 20, 2026)
- 1421cae Add ZSTD compression support (tag 50000) (brendancol, Mar 20, 2026)
- e898b0d Handle planar configuration (separate band planes) on read (brendancol, Mar 20, 2026)
- 171c95e Handle sub-byte bit depths: 1-bit, 2-bit, 4-bit, 12-bit (brendancol, Mar 20, 2026)
- 0161d37 Add palette/indexed-color TIFF support with automatic colormap (brendancol, Mar 20, 2026)
- 9cf43ab Move palette plot to da.xrs.plot() accessor (brendancol, Mar 20, 2026)
- 75737ad Thread-safe reads via reference-counted mmap cache (brendancol, Mar 20, 2026)
- f90791f Atomic writes via temp file + os.replace (brendancol, Mar 20, 2026)
- 61178c3 Add overview resampling options: nearest, min, max, median, mode, cubic (brendancol, Mar 20, 2026)
- 77e8bfb Read and write resolution/DPI tags (282, 283, 296) (brendancol, Mar 20, 2026)
- 45dfb02 Expose full GeoKey metadata: CRS names, units, datum, ellipsoid, vert… (brendancol, Mar 20, 2026)
- 6601bcf Reuse HTTP connections via urllib3 pool for COG range requests (brendancol, Mar 20, 2026)
- cfaf93d Add WKT/PROJ CRS support via pyproj (brendancol, Mar 20, 2026)
- 7cc65b2 Preserve GDALMetadata XML (tag 42112) through read/write (brendancol, Mar 20, 2026)
- cc77511 Preserve arbitrary TIFF tags through read/write round-trip (brendancol, Mar 20, 2026)
- a7df688 Fix BigTIFF auto-detection and add bigtiff= parameter (brendancol, Mar 20, 2026)
- ed1e40f Handle big-endian pixel data correctly on read (brendancol, Mar 20, 2026)
- cf3183d Add cloud storage support via fsspec (S3, GCS, Azure) (brendancol, Mar 20, 2026)
- af14140 Add VRT (Virtual Raster Table) reader (brendancol, Mar 20, 2026)
- 4a3791c Fix 8 remaining gaps for production readiness (brendancol, Mar 20, 2026)
- 1caf519 Replace rioxarray with xrspatial.geotiff in examples (brendancol, Mar 20, 2026)
- f6b374e Add matplotlib and zstandard as core dependencies (brendancol, Mar 20, 2026)
- d69d34f Add GPU-accelerated TIFF reader via Numba CUDA (brendancol, Mar 20, 2026)
- 95c2a48 Add CUDA inflate (deflate decompression) kernel (brendancol, Mar 20, 2026)
- 25c0d84 Add nvCOMP batch decompression fast path for GPU reads (brendancol, Mar 20, 2026)
- 53c63e3 Fix nvCOMP ctypes binding: ZSTD batch decompress working (brendancol, Mar 20, 2026)
- 1553d03 Add KvikIO GDS (GPUDirect Storage) path for GPU reads (brendancol, Mar 20, 2026)
- 339581f Fix KvikIO GDS error handling and ZSTD GPU fallback (brendancol, Mar 20, 2026)
- 26b6404 Fix nvCOMP deflate: use CUDA backend (backend=2) instead of DEFAULT (brendancol, Mar 20, 2026)
- 7ad20fe Update README with GeoTIFF I/O feature matrix and GPU benchmarks (brendancol, Mar 20, 2026)
- eee2245 Reorder README feature matrix by GIS workflow frequency (brendancol, Mar 20, 2026)
- ce64901 Move Reproject to #2 and Utilities to #3 in feature matrix (brendancol, Mar 20, 2026)
- b1ed372 Add GPU-accelerated GeoTIFF write via nvCOMP batch compress (brendancol, Mar 20, 2026)
- 9cca00b Update README benchmarks and enable all backend write support (brendancol, Mar 20, 2026)
- 4c53027 Enable Dask+CuPy for GPU read and write (brendancol, Mar 20, 2026)
- 230573c Unified API: read_geotiff/write_geotiff auto-dispatch CPU/GPU/Dask (brendancol, Mar 20, 2026)
- 72b580a Update README: unified API with all 5 backends in feature matrix (brendancol, Mar 20, 2026)
- fd22dc9 Pass chunks= and gpu= through open_cog to read_geotiff (brendancol, Mar 20, 2026)
- 3ffd82a Deprecate open_cog -- read_geotiff handles all sources (brendancol, Mar 20, 2026)
- 66fc110 Simplify public API to 3 functions (brendancol, Mar 20, 2026)
- e8448c8 Add JPEG 2000 codec with optional nvJPEG2000 GPU acceleration (#1049) (brendancol, Mar 22, 2026)
- 66bb549 Rename GeoTIFF API to xarray conventions (#1047) (#1056) (brendancol, Mar 23, 2026)
- 8fca78a Numba/CUDA projection kernels for reproject, README update (#1046) (brendancol, Mar 23, 2026)
- 8cfe62c added datum transformation support (brendancol, Mar 23, 2026)
- 8895419 Merge origin/master into geotiff-reader-writer (brendancol, Mar 23, 2026)
- 6f31a11 Fix test_unsupported_compression: use bzip2 instead of jpeg (brendancol, Mar 23, 2026)
148 changes: 148 additions & 0 deletions .claude/commands/dask-notebook.md
@@ -0,0 +1,148 @@
# Dask ETL Notebook

Create a Jupyter notebook that sets up a Dask distributed LocalCluster and walks
through an ETL (Extract, Transform, Load) workflow. The prompt is: $ARGUMENTS

Use the prompt to determine the data domain, transformations, and output format.
If no prompt is given, use a geospatial raster ETL as the default domain
(consistent with the xarray-spatial project).

---

## Notebook structure

Every Dask ETL notebook follows this cell sequence:

```
0 [markdown] # Title + one-line description of the pipeline
1 [markdown] ### Overview (what the pipeline does, what you'll learn)
2 [markdown] One-liner about the imports
3 [code ] Imports
4 [markdown] ## Cluster Setup
5 [code ] Create and inspect a dask.distributed LocalCluster + Client
6 [markdown] Brief note on the dashboard URL and how to read it
7 [markdown] ## Extract
8 [code ] Load or generate source data as lazy Dask arrays
9 [markdown] Describe the raw data: shape, dtype, chunk layout
10 [code ] Inspect / visualize a sample of the raw data
11 [markdown] ## Transform
12 [code ] Apply transformations (filtering, rechunking, computation)
13 [markdown] Explain what the transform does and why it benefits from Dask
14 [code ] (Optional) Additional transform step(s)
15 [markdown] ## Load
16 [code ] Write results to disk (Zarr, Parquet, GeoTIFF, etc.)
17 [markdown] Confirm output and show summary statistics
18 [code ] Read back and verify the output
19 [markdown] ## Cleanup
20 [code ] Close the client and cluster
21 [markdown] ### Summary + next steps
```

Sections can be repeated or extended when the prompt calls for more transform
steps. The core requirement is that every notebook has all five phases: Cluster
Setup, Extract, Transform, Load, Cleanup.

---

## Cluster Setup cell

Always use this pattern for the cluster:

```python
from dask.distributed import Client, LocalCluster

cluster = LocalCluster(
    n_workers=4,
    threads_per_worker=2,
    memory_limit="2GB",
)
client = Client(cluster)
client
```

Include a markdown cell after the cluster cell noting:
- The dashboard link (usually `http://localhost:8787/status`)
- That `n_workers` and `memory_limit` should be tuned for the machine

If the prompt asks for a specific cluster configuration (GPU workers, adaptive
scaling, remote scheduler), adjust accordingly but keep the default simple.

---

## Code conventions

### Imports

Standard import block for a Dask ETL notebook:

```python
import numpy as np
import xarray as xr
import dask
import dask.array as da
from dask.distributed import Client, LocalCluster
```

Add extras only when needed (e.g. `import pandas as pd`, `import rioxarray`,
`from xrspatial import slope`). Keep the import cell minimal.

### Dask best practices to demonstrate

- **Lazy by default**: build the computation graph before calling `.compute()`.
Show the repr of a lazy array at least once so the reader sees the task graph.
- **Chunking**: explain chunk choices. Use `dask.array.from_array(..., chunks=)`
or `xr.open_dataset(..., chunks={})` depending on the source.
- **Avoid full materialization mid-pipeline**: no `.values` or `.compute()` until
the Load phase unless there is a good reason (and if so, explain why).
- **Persist when reused**: if an intermediate result is used in multiple
downstream steps, call `client.persist(result)` and explain why.
- **Progress feedback**: use `dask.diagnostics.ProgressBar` or point the reader
to the dashboard.
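
A minimal sketch that strings these practices together (synthetic data and arbitrary sizes; it uses the local threaded scheduler, so `ProgressBar` applies here while the dashboard covers the distributed case):

```python
import dask.array as da
from dask.diagnostics import ProgressBar

# Lazy by default: nothing computes until .compute() is called.
arr = da.random.random((4000, 4000), chunks=(1000, 1000))
centered = arr - arr.mean()        # still lazy
result = (centered ** 2).sum()     # still lazy

print(result)  # repr shows a lazy 0-d array backed by a task graph

# Materialize once, at the end, with progress feedback.
with ProgressBar():
    total = result.compute()
```

With a distributed `Client`, reused intermediates would go through `client.persist(...)` instead of being recomputed by each downstream step.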

### Data handling

- Generate or load data lazily. For synthetic data, use `dask.array.random` or
wrap numpy arrays with `da.from_array(..., chunks=...)`.
- For file-based sources, prefer `xr.open_dataset` / `xr.open_mfdataset` with
explicit `chunks=` to get lazy Dask-backed arrays.
- For the Load phase, prefer Zarr (`to_zarr()`) as the default output format
since it supports parallel writes natively. Mention Parquet or GeoTIFF as
alternatives when relevant.
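
For example, wrapping an in-memory array and checking the chunk layout (sizes here are arbitrary; the same `chunks=` idea applies to `xr.open_dataset`):

```python
import numpy as np
import dask.array as da
import xarray as xr

# Wrap eager data lazily; chunks control task granularity.
data = np.random.rand(2048, 2048).astype("float32")
lazy = da.from_array(data, chunks=(512, 512))

# 4 x 4 = 16 blocks; each elementwise op becomes 16 parallel tasks.
print(lazy.numblocks)   # (4, 4)

# Wrap in xarray so downstream ops keep labeled dims.
raster = xr.DataArray(lazy, dims=("y", "x"), name="elevation")
print(raster.chunks)    # ((512, 512, 512, 512), (512, 512, 512, 512))
```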

### Cleanup

Always close the client and cluster at the end:

```python
client.close()
cluster.close()
```

---

## Writing rules

1. **Run all markdown cells and code comments through `/humanizer`.**
2. Never use em dashes.
3. Short and direct. Technical but not sterile.
4. Title cell (h1): describe the pipeline, e.g.
`Dask ETL: Raster Slope Analysis at Scale` or
`Dask ETL: Aggregating Sensor Readings to Parquet`.
5. Overview cell: 2-3 sentences on what the pipeline does and what Dask concepts
the reader will pick up. No hype.
6. Each phase (Extract, Transform, Load) gets a brief markdown intro (2-4
sentences) explaining what happens and why.
7. Use inline comments in code cells sparingly. Let the markdown cells carry the
explanation.

---

## Checklist

When creating the notebook:

1. Pick a data domain from the prompt (or default to geospatial raster).
2. Write the full cell sequence following the structure above.
3. Verify all code cells are syntactically correct and self-contained.
4. Run all markdown through `/humanizer`.
5. Ensure the notebook cleans up after itself (cluster closed, temp files noted).
102 changes: 80 additions & 22 deletions README.md
@@ -79,13 +79,15 @@

:fast_forward: Scalable with [Dask](http://dask.pydata.org)

:desktop_computer: GPU-accelerated with [CuPy](https://cupy.dev/) and [Numba CUDA](https://numba.readthedocs.io/en/stable/cuda/index.html)

:confetti_ball: Free of GDAL / GEOS Dependencies

:earth_africa: General-Purpose Spatial Processing, Geared Towards GIS Professionals

-------

- Xarray-Spatial implements common raster analysis functions using Numba and provides an easy-to-install, easy-to-extend codebase for raster analysis.
+ Xarray-Spatial is a Python library for raster analysis built on xarray. It has 100+ functions for surface analysis, hydrology (D8, D-infinity, MFD), fire behavior, flood modeling, multispectral indices, proximity, classification, pathfinding, and interpolation. Functions dispatch automatically across four backends (NumPy, Dask, CuPy, Dask+CuPy). A built-in GeoTIFF/COG reader and writer handles raster I/O without GDAL.

### Installation
```bash
@@ -119,9 +121,9 @@ In all the above, the command will download and store the files into your curren

`xarray-spatial` grew out of the [Datashader project](https://datashader.org/), which provides fast rasterization of vector data (points, lines, polygons, meshes, and rasters) for use with xarray-spatial.

- `xarray-spatial` does not depend on GDAL / GEOS, which makes it fully extensible in Python but does limit the breadth of operations that can be covered. xarray-spatial is meant to include the core raster-analysis functions needed for GIS developers / analysts, implemented independently of the non-Python geo stack.
+ `xarray-spatial` does not depend on GDAL or GEOS. Raster I/O, reprojection, compression codecs, and coordinate handling are all pure Python and Numba -- no C/C++ bindings anywhere in the stack.

- Our documentation is still under construction, but [docs can be found here](https://xarray-spatial.readthedocs.io/en/latest/).
+ [API reference docs](https://xarray-spatial.readthedocs.io/en/latest/) and [33+ user guide notebooks](examples/user_guide/) cover every module.

#### Raster-huh?

@@ -132,7 +134,7 @@ In the GIS world, rasters are used for representing continuous phenomena (e.g. e
#### Supported Spatial Functions with Supported Inputs
✅ = native backend    🔄 = accepted (CPU fallback)

- [Classification](#classification) · [Diffusion](#diffusion) · [Focal](#focal) · [Morphological](#morphological) · [Fire](#fire) · [Multispectral](#multispectral) · [Multivariate](#multivariate) · [Pathfinding](#pathfinding) · [Proximity](#proximity) · [Reproject / Merge](#reproject--merge) · [Raster / Vector Conversion](#raster--vector-conversion) · [Surface](#surface) · [Hydrology](#hydrology) · [Flood](#flood) · [Interpolation](#interpolation) · [Dasymetric](#dasymetric) · [Zonal](#zonal) · [Utilities](#utilities)
+ [Classification](#classification) · [Diffusion](#diffusion) · [Focal](#focal) · [Morphological](#morphological) · [Fire](#fire) · [Multispectral](#multispectral) · [Multivariate](#multivariate) · [MCDA](#multi-criteria-decision-analysis-mcda) · [Pathfinding](#pathfinding) · [Proximity](#proximity) · [Reproject / Merge](#reproject--merge) · [Raster / Vector Conversion](#raster--vector-conversion) · [Surface](#surface) · [Hydrology](#hydrology) · [Flood](#flood) · [Interpolation](#interpolation) · [Dasymetric](#dasymetric) · [Zonal](#zonal) · [Utilities](#utilities)

-------
### **GeoTIFF / COG I/O**
@@ -148,6 +150,8 @@ Native GeoTIFF and Cloud Optimized GeoTIFF reader/writer. No GDAL required.
`open_geotiff` and `to_geotiff` auto-dispatch to the correct backend:

```python
from xrspatial.geotiff import open_geotiff, to_geotiff

open_geotiff('dem.tif') # NumPy
open_geotiff('dem.tif', chunks=512) # Dask
open_geotiff('dem.tif', gpu=True) # CuPy (nvCOMP + GDS)
@@ -166,9 +170,9 @@ da.xrs.to_geotiff('out.tif', compression='lzw') # write from DataArray
ds.xrs.open_geotiff('large_dem.tif') # read windowed to Dataset extent
```

- **Compression codecs:** Deflate, LZW (Numba JIT), ZSTD, PackBits, JPEG (Pillow), uncompressed
+ **Compression codecs:** Deflate, LZW (Numba JIT), ZSTD, PackBits, JPEG (Pillow), JPEG 2000 (glymur), uncompressed

- **GPU codecs:** Deflate and ZSTD via nvCOMP; LZW via Numba CUDA; JPEG via nvJPEG
+ **GPU codecs:** Deflate and ZSTD via nvCOMP batch API; JPEG 2000 via nvJPEG2000; LZW via Numba CUDA kernels

**Features:**
- Tiled, stripped, BigTIFF, multi-band (RGB/RGBA), sub-byte (1/2/4/12-bit)
@@ -209,17 +213,70 @@ ds.xrs.open_geotiff('large_dem.tif') # read windowed to Dataset
| 4096x4096 | deflate | 1.68s | 447ms | **302ms** |
| 8192x8192 | deflate | 6.84s | 2.03s | **1.11s** |
| 8192x8192 | zstd | 847ms | 822ms | 1.03s |

**Consistency:** 100% pixel-exact match vs rioxarray on all tested files (Landsat 8, Copernicus DEM, USGS 1-arc-second, USGS 1-meter).

-----------
### **Reproject / Merge**

| Name | Description | Source | NumPy xr.DataArray | Dask xr.DataArray | CuPy GPU xr.DataArray | Dask GPU xr.DataArray |
|:----------:|:------------|:------:|:----------------------:|:--------------------:|:-------------------:|:------:|
- | [Reproject](xrspatial/reproject/__init__.py) | Reprojects a raster to a new CRS using an approximate transform and numba JIT resampling | Standard (inverse mapping) | ✅️ | ✅️ | ✅️ | ✅️ |
+ | [Reproject](xrspatial/reproject/__init__.py) | Reprojects a raster to a new CRS with Numba JIT / CUDA coordinate transforms and resampling. Supports vertical datums (EGM96, EGM2008) and horizontal datum shifts (NAD27, OSGB36, etc.) | Standard (inverse mapping) | ✅️ | ✅️ | ✅️ | ✅️ |
| [Merge](xrspatial/reproject/__init__.py) | Merges multiple rasters into a single mosaic with configurable overlap strategy | Standard (mosaic) | ✅️ | ✅️ | 🔄 | 🔄 |

Built-in Numba JIT and CUDA projection kernels bypass pyproj for per-pixel coordinate transforms. pyproj is used only for CRS metadata parsing (~1ms, once per call) and output grid boundary estimation (~500 control points, once per call). Any CRS pair without a built-in kernel falls back to pyproj automatically.

| Projection | EPSG examples | CPU Numba | CUDA GPU |
|:-----------|:-------------|:---------:|:--------:|
| Web Mercator | 3857 | ✅️ | ✅️ |
| UTM / Transverse Mercator | 326xx, 327xx, State Plane | ✅️ | ✅️ |
| Ellipsoidal Mercator | 3395 | ✅️ | ✅️ |
| Lambert Conformal Conic | 2154, 2229, State Plane | ✅️ | ✅️ |
| Albers Equal Area | 5070 | ✅️ | ✅️ |
| Cylindrical Equal Area | 6933 | ✅️ | ✅️ |
| Sinusoidal | MODIS grids | ✅️ | ✅️ |
| Lambert Azimuthal Equal Area | 3035, 6931, 6932 | ✅️ | ✅️ |
| Polar Stereographic | 3031, 3413, 3996 | ✅️ | ✅️ |
| Oblique Stereographic | custom WGS84 | ✅️ | pyproj fallback |
| Oblique Mercator (Hotine) | 3375 (RSO) | implemented, disabled | pyproj fallback |
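
The kernels themselves are not shown in this README, but the inverse-mapping structure is easy to sketch. Here is a standalone spherical Web Mercator forward/inverse pair (illustrative only, not xrspatial's actual kernel code; EPSG:3857 uses this spherical formulation):

```python
import numpy as np

R = 6378137.0  # WGS84 semi-major axis; EPSG:3857 treats the Earth as a sphere

def forward(lon_deg, lat_deg):
    """lon/lat in degrees -> Web Mercator x/y in metres."""
    lon = np.radians(lon_deg)
    lat = np.radians(lat_deg)
    return R * lon, R * np.log(np.tan(np.pi / 4 + lat / 2))

def inverse(x, y):
    """Web Mercator x/y in metres -> lon/lat in degrees."""
    lon = np.degrees(x / R)
    lat = np.degrees(2 * np.arctan(np.exp(y / R)) - np.pi / 2)
    return lon, lat

# Inverse mapping: for each output pixel centre, compute the source
# coordinate with inverse(), then sample the input grid there.
x, y = forward(-105.0, 40.0)
lon, lat = inverse(x, y)
print(round(lon, 6), round(lat, 6))  # -105.0 40.0
```

Reprojection then walks the output grid, applies `inverse()` per pixel (this is the part Numba JIT or CUDA accelerates), and resamples from the source raster.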

**Vertical datum support:** `geoid_height`, `ellipsoidal_to_orthometric`, `orthometric_to_ellipsoidal` convert between ellipsoidal (GPS) and orthometric (map/MSL) heights using EGM96 (vendored, 2.6MB) or EGM2008 (77MB, downloaded on first use). Reproject can apply vertical shifts during reprojection via the `vertical_crs` parameter.
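
The conversion itself is a geoid lookup plus a subtraction: orthometric height is ellipsoidal height minus the geoid undulation N at that location. A sketch with a hypothetical undulation value (the real functions interpolate N from the EGM grids rather than taking it as an argument):

```python
def ellipsoidal_to_orthometric(h_ellipsoidal, geoid_undulation_n):
    """H (orthometric, ~mean sea level) = h (ellipsoidal, GPS) - N (geoid height)."""
    return h_ellipsoidal - geoid_undulation_n

def orthometric_to_ellipsoidal(h_orthometric, geoid_undulation_n):
    return h_orthometric + geoid_undulation_n

# Hypothetical: GPS height 1650.0 m where the geoid sits 17.3 m below the ellipsoid.
print(ellipsoidal_to_orthometric(1650.0, -17.3))  # 1667.3
```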

**Datum shift support:** Reprojection from non-WGS84 datums (NAD27, OSGB36, DHDN, MGI, ED50, BD72, CH1903, D73, AGD66, Tokyo) applies grid-based shifts from PROJ CDN (sub-metre accuracy) with 7-parameter Helmert fallback (1-5m accuracy). 14 grids are registered covering North America, UK, Germany, Austria, Spain, Netherlands, Belgium, Switzerland, Portugal, and Australia.

**ITRF frame support:** `itrf_transform` converts between ITRF2000, ITRF2008, ITRF2014, and ITRF2020 using 14-parameter time-dependent Helmert transforms from PROJ data files. Shifts are mm-level.
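
ITRF transforms add time-dependent rates on top of a core 7-parameter Helmert step, which looks like this in the small-angle position-vector convention (a sketch with placeholder parameters, not real frame values):

```python
import numpy as np

def helmert7(xyz, tx, ty, tz, rx, ry, rz, s_ppm):
    """7-parameter Helmert transform on geocentric coordinates.

    xyz: metres, shape (..., 3); tx/ty/tz: translations in metres;
    rx/ry/rz: rotations in arc-seconds; s_ppm: scale difference in ppm.
    """
    sec2rad = np.pi / (180.0 * 3600.0)
    rx, ry, rz = rx * sec2rad, ry * sec2rad, rz * sec2rad
    scale = 1.0 + s_ppm * 1e-6
    # Small-angle rotation matrix (position-vector sign convention)
    rot = np.array([[1.0, -rz,  ry],
                    [ rz, 1.0, -rx],
                    [-ry,  rx, 1.0]])
    return np.array([tx, ty, tz]) + scale * (xyz @ rot.T)

p = np.array([3980000.0, 0.0, 4970000.0])
print(helmert7(p, 0, 0, 0, 0, 0, 0, 0))  # all-zero parameters: identity
```

The time-dependent 14-parameter form evaluates each of the 7 parameters at the target epoch (parameter + rate * dt) before applying the same step.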

**Reproject performance** (reproject-only, 1024x1024, bilinear, vs rioxarray):

| Transform | xrspatial | rioxarray |
|:---|---:|---:|
| WGS84 -> Web Mercator | 23ms | 14ms |
| WGS84 -> UTM 33N | 24ms | 18ms |
| WGS84 -> Albers CONUS | 41ms | 33ms |
| WGS84 -> LAEA Europe | 57ms | 17ms |
| WGS84 -> Polar Stere S | 44ms | 38ms |
| WGS84 -> LCC France | 44ms | 25ms |
| WGS84 -> Ellipsoidal Merc | 27ms | 14ms |
| WGS84 -> CEA EASE-Grid | 24ms | 15ms |

**Full pipeline** (read 3600x3600 Copernicus DEM + reproject to EPSG:3857 + write GeoTIFF):

| Backend | Time |
|:---|---:|
| NumPy | 2.7s |
| CuPy GPU | 348ms |
| Dask+CuPy GPU | 343ms |
| rioxarray (GDAL) | 418ms |

**Merge performance** (4 overlapping same-CRS tiles, vs rioxarray):

| Tile size | xrspatial | rioxarray | Speedup |
|:---|---:|---:|---:|
| 512x512 | 16ms | 29ms | **1.8x** |
| 1024x1024 | 52ms | 76ms | **1.5x** |
| 2048x2048 | 361ms | 280ms | 0.8x |

Same-CRS tiles skip reprojection entirely and are placed by direct coordinate alignment.
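
A toy version of that fast path (illustrative sketch, not the xrspatial implementation): tiles are placed by pixel offset in the output grid, and an overlap strategy decides which pixels win. Here "first" keeps the earliest tile's pixels:

```python
import numpy as np

def mosaic_first(tiles, offsets, out_shape, nodata=np.nan):
    """Place same-CRS tiles into one grid; 'first' overlap strategy.

    tiles: list of 2D arrays; offsets: (row, col) of each tile's
    upper-left corner in the output grid.
    """
    out = np.full(out_shape, nodata, dtype="float64")
    for tile, (r, c) in zip(tiles, offsets):
        h, w = tile.shape
        window = out[r:r + h, c:c + w]
        empty = np.isnan(window)          # only fill untouched pixels
        window[empty] = tile[empty]
    return out

a = np.ones((4, 4))
b = np.full((4, 4), 2.0)
m = mosaic_first([a, b], [(0, 0), (2, 2)], (6, 6))
print(m[3, 3])  # 1.0 -> tile a wins in the overlap
```

Other strategies ("last", "min", "max") only change how the overlapping window is combined.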

-------

### **Utilities**
@@ -462,6 +519,7 @@ ds.xrs.open_geotiff('large_dem.tif') # read windowed to Dataset

-------


### **Pathfinding**

| Name | Description | Source | NumPy xr.DataArray | Dask xr.DataArray | CuPy GPU xr.DataArray | Dask GPU xr.DataArray |
@@ -496,29 +554,29 @@ ds.xrs.open_geotiff('large_dem.tif') # read windowed to Dataset
Importing `xrspatial` registers an `.xrs` accessor on DataArrays and Datasets, giving you tab-completable access to every spatial operation:

```python
- import xrspatial
- from xrspatial.geotiff import open_geotiff
+ import xrspatial as xrs
+ from xrspatial.geotiff import open_geotiff, to_geotiff

# Read a GeoTIFF (no GDAL required)
elevation = open_geotiff('dem.tif')

- # Surface analysis — call operations directly on the DataArray
+ # Surface analysis
slope = elevation.xrs.slope()
hillshaded = elevation.xrs.hillshade(azimuth=315, angle_altitude=45)
aspect = elevation.xrs.aspect()

# Reproject and write as a Cloud Optimized GeoTIFF
dem_wgs84 = elevation.xrs.reproject(target_crs='EPSG:4326')
to_geotiff(dem_wgs84, 'output.tif', cog=True)

# Classification
classes = elevation.xrs.equal_interval(k=5)
breaks = elevation.xrs.natural_breaks(k=10)

# Proximity
distance = elevation.xrs.proximity(target_values=[1])

- # Multispectral — call on the NIR band, pass other bands as arguments
nir = xr.DataArray(np.random.rand(100, 100), dims=['y', 'x'])
red = xr.DataArray(np.random.rand(100, 100), dims=['y', 'x'])
blue = xr.DataArray(np.random.rand(100, 100), dims=['y', 'x'])

+ # Multispectral
vegetation = nir.xrs.ndvi(red)
enhanced_vi = nir.xrs.evi(red, blue)
```
Expand All @@ -539,14 +597,14 @@ ndvi_result = ds.xrs.ndvi(nir='band_5', red='band_4')

##### Function Import Style

- All operations are also available as standalone functions if you prefer explicit imports:
+ All operations are also available as standalone functions:

```python
- from xrspatial import hillshade, slope, ndvi
+ import xrspatial as xrs

- hillshaded = hillshade(elevation)
- slope_result = slope(elevation)
- vegetation = ndvi(nir, red)
+ hillshaded = xrs.hillshade(elevation)
+ slope_result = xrs.slope(elevation)
+ vegetation = xrs.ndvi(nir, red)
```

Check out the user guide [here](/examples/user_guide/).
@@ -576,7 +634,7 @@ Check out the user guide [here](/examples/user_guide/).

- **Zero GDAL installation hassle.** `pip install xarray-spatial` gets you everything needed to read and write GeoTIFFs, COGs, and VRT files.
- **Pure Python, fully extensible.** All codec, header parsing, and metadata code is readable Python/Numba, not wrapped C/C++.
- - **GPU-accelerated reads.** With optional nvCOMP, compressed tiles decompress directly on the GPU via CUDA -- something GDAL cannot do.
+ - **GPU-accelerated reads.** With optional nvCOMP and nvJPEG2000, compressed tiles decompress directly on the GPU via CUDA -- something GDAL cannot do.

The native reader is pixel-exact against rasterio/GDAL across Landsat 8, Copernicus DEM, USGS 1-arc-second, and USGS 1-meter DEMs. For uncompressed files it reads 5-7x faster than rioxarray; for compressed COGs it is comparable or faster with GPU acceleration.

Expand Down
Loading
Loading