# Sample points from a geospatial raster

When you have an N-dimensional raster cube (`Dataset` or `DataArray`) of geospatial data indexed by latitude and longitude or by projected coordinates, you may only need the values from that cube at a small subset of locations. Vector data cubes are an ideal data structure for this use case, as they preserve the structure of the original cube and all its attributes while allowing you to index it by point geometries. The points can lie anywhere within the bounds of the original raster; they do not have to coincide with the grid.
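As a minimal sketch of what "within the bounds" means for a regular grid, the helper below flags which points fall inside the extent spanned by the coordinate values. `within_bounds` is a hypothetical name, and the grid is an assumed global 1.5-degree lattice rather than one read from a real dataset.

```python
import numpy as np


def within_bounds(xs, ys, x_coords, y_coords):
    """Hypothetical helper: boolean mask of points inside the grid extent.

    The extent is taken as the min/max of the coordinate values. If those
    values are cell centres, the true cell edges extend half a cell further.
    """
    xs, ys = np.asarray(xs), np.asarray(ys)
    return (
        (xs >= x_coords.min()) & (xs <= x_coords.max())
        & (ys >= y_coords.min()) & (ys <= y_coords.max())
    )


# Assumed global grid with 1.5-degree spacing (latitude stored north to south).
lon = np.linspace(-180.0, 178.5, 240)
lat = np.linspace(90.0, -90.0, 121)

print(within_bounds([10.3, 179.9], [45.2, 0.0], lon, lat))  # [ True False]
```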

In [3]:
import geopandas as gpd
import numpy as np
import shapely
import xarray as xr
import xvec

This example uses the ERA-Interim reanalysis, monthly averages of upper-level data:

In [2]:
ds = xr.tutorial.open_dataset("eraint_uvz")
ds

This Dataset is indexed by longitude and latitude, which represent the spatial grid. When you sample points with `ds.xvec.sample_points`, these two dimensions are replaced by a single dimension indexed by shapely geometries.
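To make the dimension change concrete, here is a plain NumPy sketch of pointwise sampling, assuming nearest-neighbour selection on a regular grid (this guide does not spell out how Xvec matches points to grid cells). The cube shape, grid and point coordinates are all hypothetical.

```python
import numpy as np

# Hypothetical cube with dimensions (month, latitude, longitude).
lon = np.linspace(-180.0, 178.5, 240)
lat = np.linspace(90.0, -90.0, 121)
months = 2
cube = np.arange(months * lat.size * lon.size, dtype=float).reshape(
    months, lat.size, lon.size
)

# Arbitrary sample locations (x = longitude, y = latitude), not on the grid.
xs = np.array([-60.25, 2.79, 47.58])
ys = np.array([-76.91, 77.52, 13.69])


def nearest_index(coords, values):
    """Index of the grid coordinate closest to each requested value."""
    return np.abs(coords[:, None] - values[None, :]).argmin(axis=0)


ix = nearest_index(lon, xs)
iy = nearest_index(lat, ys)

# Pointwise indexing: the latitude and longitude axes collapse into a
# single axis with one entry per point.
sampled = cube[:, iy, ix]
print(cube.shape, "->", sampled.shape)  # (2, 121, 240) -> (2, 3)
```

In xarray itself, the same pointwise selection can be written with vectorized indexing, e.g. `ds.sel(longitude=xr.DataArray(xs, dims="geometry"), latitude=xr.DataArray(ys, dims="geometry"), method="nearest")`, where both indexers share the new `geometry` dimension.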

## Array of shapely geometries

Create an array of points to sample. Usually, you would have specific locations, such as weather stations, cities or anything else of interest. Here, we create random points within the bounds of the dataset.
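Because `np.random.uniform` draws a new set of points on every run, the sampled values change each time the notebook is executed. A minimal sketch of a reproducible alternative uses a seeded NumPy `Generator`; the seed and the hard-coded bounds below are assumptions (in the notebook, the bounds would come from `ds.longitude` and `ds.latitude`).

```python
import numpy as np

# Assumed raster bounds; in practice use e.g. float(ds.longitude.min()).
lon_min, lon_max = -180.0, 178.5
lat_min, lat_max = -90.0, 90.0

# A seeded Generator yields the same "random" points on every run.
rng = np.random.default_rng(seed=42)
n = 10
x = rng.uniform(lon_min, lon_max, n)
y = rng.uniform(lat_min, lat_max, n)

# x and y can then be passed to shapely.points(x, y) as in the next cell.
```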

In [4]:
points = shapely.points(
    np.random.uniform(ds.longitude.min(), ds.longitude.max(), 10),
    np.random.uniform(ds.latitude.min(), ds.latitude.max(), 10),
)
points

array([<POINT (-60.246 -76.905)>, <POINT (-109.642 -61.446)>,
       <POINT (-54.661 14.712)>, <POINT (-104.115 27.979)>,
       <POINT (79.945 -79.931)>, <POINT (-63.862 -39.181)>,
       <POINT (-165.045 47.703)>, <POINT (47.578 13.693)>,
       <POINT (88.992 82.829)>, <POINT (2.792 77.518)>], dtype=object)

Passing a NumPy array of geometries to the `.xvec.sample_points` method creates a `Dataset` indexed by a `GeometryIndex`:

In [11]:
sampled = ds.xvec.sample_points(points, x_coords="longitude", y_coords="latitude")
sampled

However, since a NumPy array of geometries does not carry any CRS information, the resulting `GeometryIndex` has no CRS assigned.

In [13]:
sampled.xindexes

Indexes:
    level     PandasIndex
    month     PandasIndex
    geometry  GeometryIndex (crs=None)

In that situation, you can (and should) specify the CRS manually using the `crs` keyword:

In [8]:
sampled = ds.xvec.sample_points(
    points, x_coords="longitude", y_coords="latitude", crs=4326
)
sampled

In [9]:
sampled.geometry.crs

<Geographic 2D CRS: EPSG:4326>
Name: WGS 84
Axis Info [ellipsoidal]:
- Lat[north]: Geodetic latitude (degree)
- Lon[east]: Geodetic longitude (degree)
Area of Use:
- name: World.
- bounds: (-180.0, -90.0, 180.0, 90.0)
Datum: World Geodetic System 1984 ensemble
- Ellipsoid: WGS 84
- Prime Meridian: Greenwich

## GeoPandas GeoSeries

If your points are stored as a GeoPandas `GeoSeries` or `GeometryArray`, Xvec will retrieve the CRS automatically.

In [18]:
gs = gpd.GeoSeries(points, crs=4326)
gs

0     POINT (-60.24622 -76.90546)
1    POINT (-109.64171 -61.44555)
2      POINT (-54.66080 14.71159)
3     POINT (-104.11485 27.97929)
4      POINT (79.94457 -79.93142)
5     POINT (-63.86153 -39.18149)
6     POINT (-165.04458 47.70308)
7       POINT (47.57798 13.69327)
8       POINT (88.99217 82.82860)
9        POINT (2.79190 77.51837)
dtype: geometry

In [15]:
sampled = ds.xvec.sample_points(gs, x_coords="longitude", y_coords="latitude")
sampled

## DataArray

The CRS is also retrieved automatically from a `DataArray` of shapely geometries that stores it under a `"crs"` key in its attributes. A typical case is reusing a geometry `DataArray` previously created by Xvec.

In [16]:
sampled.geometry

As shown above, `sampled.geometry` stores its CRS as an attribute.

In [17]:
sampled_da = ds.xvec.sample_points(
    sampled.geometry, x_coords="longitude", y_coords="latitude"
)
sampled_da
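The sources of CRS information covered in this notebook can be summarised as a lookup order. The helper below, `resolve_crs`, is hypothetical and not part of Xvec; in particular, the assumption that an explicitly passed `crs` takes precedence over one attached to the geometries is not confirmed by this guide. Simple stand-in objects replace real GeoPandas and xarray objects so the sketch runs on its own.

```python
from types import SimpleNamespace


def resolve_crs(points, crs=None):
    """Hypothetical helper mirroring the CRS sources described above.

    Assumed order: an explicit ``crs`` argument, then a ``.crs`` attribute
    (as on a GeoSeries or GeometryArray), then a ``"crs"`` entry in
    ``.attrs`` (as on an xarray DataArray). A plain array yields None.
    """
    if crs is not None:
        return crs
    if getattr(points, "crs", None) is not None:
        return points.crs
    attrs = getattr(points, "attrs", None)
    if isinstance(attrs, dict) and attrs.get("crs") is not None:
        return attrs["crs"]
    return None


plain = object()  # like a NumPy array of geometries: no CRS
geoseries_like = SimpleNamespace(crs="EPSG:4326")  # stand-in for a GeoSeries
dataarray_like = SimpleNamespace(attrs={"crs": "EPSG:4326"})  # stand-in for a DataArray

print(resolve_crs(plain))  # None
print(resolve_crs(geoseries_like))  # EPSG:4326
print(resolve_crs(dataarray_like))  # EPSG:4326
print(resolve_crs(plain, crs=4326))  # 4326
```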