Skip to content

uw-cryo/raster_sampling

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

14 Commits
 
 
 
 
 
 
 
 

Repository files navigation

raster_sampling

Compilation of resources and inter-comparison of raster sampling strategies

Introduction and Motivation

Sampling a raster or ndarray at points is a common, basic need, but there is surprisingly little discussion or intercomparison of available options.

This is such an important topic, and the scientific literature is full of errors related to a lack of expertise when sampling (i.e. using nearest neighbor for large grid cells). Most new users will go with rasterio sample function, without considering consequences.

There is value in general implementation of different approaches in a centralized library, potentially with different wrappers depending on input object (DataFrame, numpy arrays, xarray DataSet). This could be installed as a dependency and used for different applications.

Goals for sprint

  • Continue compiling notes/resources on available approaches
  • Discuss and prepare test data
  • Design intercomparison function/notebook to document spread of values/results when using different methods
  • Discuss potential integration with existing efforts (demquery, icepyx, rioxarray)

Existing approaches

  1. rasterio sample for 2D raster

  2. Zonal statistics (local window or circular region)

  3. scipy.ndimage.map_coordinates

  4. scipy interpolation options (for a local window of raster pixels)

  5. demquery https://github.com/kylebarron/demquery (thanks to @scottyhq!)

  6. xarray interpolation and sampling (http://xarray.pydata.org/en/stable/interpolation.html#advanced-interpolation)

    • Example for GeoDataFrame of points points (creates new dimension "z" to store output):
    import xarray, rioxarray, geopandas
    
    #Open raster dataset to sample
    dem = rioxarray.open_rasterio("dem.tif")
    
    #Open point vector dataset and reproject
    gdf = geopandas.read_file("points.geojson").to_crs(dem.rio.crs)
    
    #Add z dimension to store samples
    x = xarray.DataArray(gdf.geometry.x.values, dims=“z”)
    y = xarray.DataArray(gdf.geometry.y.values, dims=“z”)
    
    #Sample the raster
    samples = dem.interp(x=x, y=y) #default method is linear
    
    #Add to the geodataframe
    gdf[‘dem’] = samples.values
    
    #Compute difference with values in another column
    gdf[‘h_mean diff’] = gdf[‘h_mean’] - gdf[‘dem’]
    
  7. pangeo-pyinterp: https://github.com/CNES/pangeo-pyinterp

  8. Custom implementations in existing tools from @SmithB and @tsutterley

  9. Build bilinear equations in lsq framework, then solve: https://github.com/SmithB/LSsurf

  10. https://geoviews.org/user_guide/Resampling_Grids.html (suggested by Friedrich)

  11. https://github.com/OGGM/oggm

  12. https://pyresample.readthedocs.io/en/latest/

  13. https://docs.generic-mapping-tools.org/dev/grdtrack.html

Scaling Considerations

  1. scipy ndinterp - create interpolator once, use multiple times
  2. @tsutterley has implemention for MPI tile-based loading/sampling, using scipy splines:

Issues

  • Handling of raster nodata/nan - many algorithms will propagate nans differently
  • Performance for large point arrays, large rasters
  • Input raster cell dimensions vs point sampling density
  • Handling of different raster data types (e.g., only use nearest for classified rasters)
  • xarray - great for ndarrays from GCMs, most efficient ways to chunk large point coordinate arrays, perform operations

How to choose?

  • Functions to assess input raster properties (e.g., classified vs. continuous, large gradients, spatial frequency content) and point properties
  • Relative point density for points and raster
  • Potentially add some logic to select best method based on these metrics

Method Comparison

  • For same input points and raster/ndarray, extract values from all samples
  • No "truth", but can look at various statistics
  • ATL06 Already has DEM attributes in ATL06, ArcticDEM/REMA (reverts to GDEM) - details on how sampling is performed unknown (Jeff Lee)

Candidate use cases

  • Simple use case: starting with point coordinates in NumPy arrays, raster DEM as GeoTiff
  • More advanced use case: GCM reanalysis data, xyzt point coordinates

User Guide and Documentation

  • Provide general recommendations based on typical scenarios
  • Document case where different methods yield very different results

Next steps

  • Review resources listed above, compile notes
  • Start creating notebooks with snippet for approach or specific problem, links to documentation

About

Compilation of resources and inter-comparison of raster sampling strategies

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published