# Accessing SNAP's raster database

This notebook covers accessing the raster data provided through SNAP's raster database, which is one of the "back ends" of the SNAP data API. 

All of the datasets hosted here can be thought of as "data cubes" or "hypercubes": data stored in a structure with three or more (usually more) dimensions. Most of these datasets have two spatial axes representing a regular spatial grid, plus a temporal axis such as timestamp, month, year, or decade, or some similar temporal grouping over which the source data has been aggregated. What pushes many of these datasets beyond true "cubes" is that they offer various configurations of modeled data, with additional dimensions for things like the source climate model, the emissions scenario used for a particular model run, or the operation used for a temporal aggregation.

These data cubes are referred to as "coverages" in the protocols we will be using to access this data, so we will stick with that term for the remainder of this notebook. Read more about the concept of a coverage at the [Open Geospatial Consortium's website](https://ogcapi.ogc.org/coverages/overview.html).

Here are the packages we will need:

In [1]:
import requests
import xarray as xr
from lxml import etree as ET

# base packages
import io

## Making a WCS request

Our production rasdaman server is currently `apollo.snap.uaf.edu` and you can browse it at [https://apollo.snap.uaf.edu/rasdaman/ows](https://apollo.snap.uaf.edu/rasdaman/ows).

**Note - this notebook will need to be updated if we ever switch the production server!**

Rasdaman offers a few protocols for data access. The one we will use in this notebook is the [Web Coverage Service](https://www.ogc.org/standard/wcs/), or WCS.
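
WCS also defines a "GetCapabilities" request, which returns an XML document describing the service, including the identifiers of the available coverages. The following is a minimal sketch of pulling those identifiers out of such a document with the standard library. The helper name `list_coverage_ids` is our own, and the sample XML is a small hand-written stand-in rather than real server output:

```python
from xml.etree import ElementTree

# XML namespace used by WCS 2.0 documents
WCS_NS = {"wcs": "http://www.opengis.net/wcs/2.0"}

# GetCapabilities request against the server named above
capabilities_url = (
    "https://apollo.snap.uaf.edu/rasdaman/ows"
    "?&SERVICE=WCS&VERSION=2.0.1&REQUEST=GetCapabilities"
)


def list_coverage_ids(capabilities_xml):
    """Return the text of every wcs:CoverageId element in a GetCapabilities document."""
    root = ElementTree.fromstring(capabilities_xml)
    return [el.text for el in root.iterfind(".//wcs:CoverageId", WCS_NS)]


# Offline demonstration on a tiny hand-written document (NOT real server output;
# "example_coverage" is a made-up identifier)
sample_xml = """<wcs:Capabilities xmlns:wcs="http://www.opengis.net/wcs/2.0">
  <wcs:Contents>
    <wcs:CoverageSummary><wcs:CoverageId>tas_2km_projected</wcs:CoverageId></wcs:CoverageSummary>
    <wcs:CoverageSummary><wcs:CoverageId>example_coverage</wcs:CoverageId></wcs:CoverageSummary>
  </wcs:Contents>
</wcs:Capabilities>"""
coverage_ids = list_coverage_ids(sample_xml)
```

Against the live server, something like `list_coverage_ids(requests.get(capabilities_url).content)` should work in the same way, assuming the response follows the standard WCS 2.0 schema.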

Here is an example request that will return some data from a coverage called "tas_2km_projected", subsetting the dimensions "model", "scenario", "year", and "month" to the grid value of 0, and requesting whatever values intersect the X and Y dimensions at 0 and 1000000, respectively:

https://apollo.snap.uaf.edu/rasdaman/ows?&SERVICE=WCS&VERSION=2.0.1&REQUEST=GetCoverage&COVERAGEID=tas_2km_projected&SUBSET=model(0)&SUBSET=scenario(0)&SUBSET=year(0)&SUBSET=month(0)&SUBSET=X(0)&SUBSET=Y(1000000)&FORMAT=application/json

Let's break this request down. We have the base URL, preceding the actual request:

`https://apollo.snap.uaf.edu/rasdaman/ows`

Which is the same URL provided above that you can use to explore the available coverages. 

Then we have the start of the request, which begins with `?&` and says we want to use WCS, version 2.0.1, and a "GetCoverage" request. The `?` marks the start of the query string, and the individual parts of the request are separated with `&`:

`?&SERVICE=WCS&VERSION=2.0.1&REQUEST=GetCoverage`

Then we specify the coverage we are after. This will be the tas_2km_projected coverage, which contains monthly summaries of modeled near-surface air temperature that has been statistically downscaled to a 2km resolution over Alaska:

`&COVERAGEID=tas_2km_projected`

The next six parts specify how we want to subset the data, with one `SUBSET` operation for each of the six dimensions:

`&SUBSET=model(0)&SUBSET=scenario(0)&SUBSET=year(0)&SUBSET=month(0)&SUBSET=X(0)&SUBSET=Y(1000000)`

In this simple example, we are only using single values to subset, but you can specify a range with a `,`, e.g.:

`&SUBSET=model(0,2)` 

will get models 0, 1 and 2. 

You can omit an axis from the request entirely to access all coordinates in that dimension, e.g. only providing:

`&SUBSET=X(0)&SUBSET=Y(1000000)`

will request that all models, scenarios, years, and months be accessed for those particular X and Y values. 

Finally, we have the format we want to receive the data in. With the example request:

`&FORMAT=application/json`

we are requesting a JSON output, so the returned value is:

`"-7.3 -1.2 -10"`

There are other formats available, including netCDF, which is ideal for accessing coverage subsets which are still a datacube. That is the subject of the next section.
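
The pieces broken down above can also be assembled programmatically, which is handy when the subsets change from request to request. This is a minimal sketch; the helper name `build_getcoverage_url` and its arguments are our own, not part of any library. A single value produces a point subset such as `X(0)`, and a tuple produces a range such as `X(0,400000)`:

```python
BASE_URL = "https://apollo.snap.uaf.edu/rasdaman/ows"


def build_getcoverage_url(coverage_id, subsets, fmt, base_url=BASE_URL):
    """Assemble a WCS 2.0.1 GetCoverage URL from its parts.

    subsets maps an axis name to either a single value or a (low, high) tuple.
    Axes left out of subsets are not subset at all.
    """
    parts = [
        "SERVICE=WCS",
        "VERSION=2.0.1",
        "REQUEST=GetCoverage",
        f"COVERAGEID={coverage_id}",
    ]
    for axis, bounds in subsets.items():
        if isinstance(bounds, (tuple, list)):
            # range subset, e.g. X(0,400000)
            value = ",".join(str(b) for b in bounds)
        else:
            # single-value subset, e.g. X(0)
            value = str(bounds)
        parts.append(f"SUBSET={axis}({value})")
    parts.append(f"FORMAT={fmt}")
    # the "?&" prefix mirrors the example request in this notebook
    return f"{base_url}?&" + "&".join(parts)


point_url = build_getcoverage_url(
    "tas_2km_projected",
    {"model": 0, "scenario": 0, "year": 0, "month": 0, "X": 0, "Y": 1000000},
    "application/json",
)
```

Here `point_url` reproduces the example request above character for character.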

## Working with datacubes returned via WCS

Say that we only want to get data for a spatial subset of the entire tas_2km_projected coverage used above. This will return a structure that is still inherently 6-dimensional, only with smaller X and Y axes. This is the ideal case for returning a netCDF.

We will modify the above request by omitting the subset operations for the other axes, and specifying the X and Y ranges we care about. The following provides an extent around the Kenai Peninsula in the native CRS of the coverage, EPSG:3338:


`SERVICE=WCS&VERSION=2.0.1&REQUEST=GetCoverage&COVERAGEID=tas_2km_projected&SUBSET=X(0,400000)&SUBSET=Y(1000000,1400000)&FORMAT=application/netcdf`

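So, here is the full URL, which we will request using the `requests` library. Note that the URL in the cell below also includes `SUBSET=year(0,1)`, which limits the year axis to grid values 0 and 1: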

In [87]:
url = "https://apollo.snap.uaf.edu/rasdaman/ows?&SERVICE=WCS&VERSION=2.0.1&REQUEST=GetCoverage&COVERAGEID=tas_2km_projected&SUBSET=year(0,1)&SUBSET=X(0,400000)&SUBSET=Y(1000000,1400000)&FORMAT=application/netcdf"
r = requests.get(url)

Check that the request was successful (200 is good):

In [88]:
r

<Response [200]>

We can then wrap the response bytes in an `io.BytesIO` object, which allows us to load the data into an `xarray.Dataset` object using the `xarray.open_dataset` function:

In [90]:
ds = xr.open_dataset(io.BytesIO(r.content))
ds
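
If `xr.open_dataset` fails on a response, one thing worth checking is whether the body is netCDF at all. The sketch below inspects the first bytes of the payload: netCDF classic files begin with `CDF`, and netCDF-4 files begin with the HDF5 signature. The helper names are our own, and the XML branch rests on the assumption that the server reports errors as an XML exception document, as OGC services commonly do:

```python
# signature at the start of every HDF5 file, which netCDF-4 files are
HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"


def looks_like_netcdf(content):
    """Return True if the bytes start with a netCDF classic or HDF5 (netCDF-4) signature."""
    return content.startswith(b"CDF") or content.startswith(HDF5_SIGNATURE)


def describe_payload(content):
    """Roughly classify a response body as 'netcdf', 'xml', or 'unknown'."""
    if looks_like_netcdf(content):
        return "netcdf"
    if content.lstrip().startswith(b"<"):
        # assumption: an XML body here is most likely an error report
        return "xml"
    return "unknown"
```

With the response from above, `describe_payload(r.content)` would be expected to return `"netcdf"`. If it returns `"xml"`, printing `r.text` should show what the server had to say.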

### Assigning metadata (labeling the dataset)

You'll probably have noticed that we are only working with integer values for the discretely-valued axes of this coverage. This is currently the case for all of the coverages, i.e. there is no functionality built-in (to our knowledge) for non-numeric coordinates. 

So, to find out what those integers actually represent, you will need to use a "DescribeCoverage" request, which is slightly different from the "GetCoverage" request we used to fetch the data. 

Here is an example of such a request for this tas_2km_projected coverage:

In [None]:
meta_url = "https://apollo.snap.uaf.edu/rasdaman/ows?&SERVICE=WCS&VERSION=2.0.1&REQUEST=DescribeCoverage&COVERAGEID=tas_2km_projected"
meta_resp = requests.get(meta_url)

In [None]:
import ast


def parse_meta_xml_str(meta_xml_str):
    """Parse the DescribeCoverage XML response and restructure the block called "Encoding" to a dict.

    Arguments:
        meta_xml_str (str): string representation of the byte XML response from the WCS DescribeCoverage request

    Returns:
        dim_encodings (dict): lookup table to match data axes or parameters to integer encodings, e.g., 2: 'GFDL-CM3'
    """
    xml_bytes = meta_xml_str.encode("utf-8")
    meta_tree = ET.XML(xml_bytes)
    encoding_el = meta_tree.findall(".//Encoding")[0]

    dim_encodings = {}
    for dim in encoding_el.iter():
        # dim.text is None for elements without text, so fall back to an empty string
        text = (dim.text or "").strip()
        if text.startswith("{") and text.endswith("}"):
            # literal_eval parses the dict literal without executing arbitrary code
            encoding_di = ast.literal_eval(text)
            for key, value in encoding_di.items():
                if isinstance(value, dict):
                    # nested form: {axis name: {code: label}}
                    dim_encodings[key] = {int(k): v for k, v in value.items()}
                else:
                    # flat form: {code: label}, stored under the element's tag
                    dim_encodings[dim.tag] = {int(k): v for k, v in encoding_di.items()}

    return dim_encodings

In [None]:
meta_di = parse_meta_xml_str(meta_resp.text)

In [None]:
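def assign_metadata_to_dataset(meta_di, ds):
    """Replace the integer-encoded coordinates of a dataset with the labels from the coverage metadata.

    Arguments:
        meta_di (dict): lookup table of integer encodings, as returned by parse_meta_xml_str
        ds (xarray.Dataset): dataset loaded from a WCS GetCoverage response

    Returns:
        ds (xarray.Dataset): dataset with decoded coordinate values
    """
    for k in meta_di.keys():
        try:
            # labels that look like integers (e.g. years) are converted to int
            new_coords = [int(meta_di[k][i]) for i in ds[k].values]
        except ValueError:
            # otherwise keep the labels as they are (e.g. model names)
            new_coords = [meta_di[k][i] for i in ds[k].values]
        ds = ds.assign_coords({k: new_coords})

    return ds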

In [None]:
ds = assign_metadata_to_dataset(meta_di, ds)
ds
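
The same lookup table can be used in the other direction, to find the integer that belongs in a `SUBSET` operation for a given label. This is a minimal sketch that assumes the structure produced by `parse_meta_xml_str`, i.e. a dict mapping each axis name to a `{integer: label}` dict. The helper name `label_to_index` and the example table are purely illustrative:

```python
def label_to_index(encodings, axis, label):
    """Return the integer encoding of a label on the given axis."""
    for index, name in encodings[axis].items():
        if name == label:
            return index
    raise KeyError(f"{label!r} is not a known value for axis {axis!r}")


# Illustrative table only: the real values come from the DescribeCoverage
# response, and "example-model-a" is a made-up label
example_encodings = {"model": {0: "example-model-a", 2: "GFDL-CM3"}}

model_index = label_to_index(example_encodings, "model", "GFDL-CM3")
subset = f"&SUBSET=model({model_index})"
```

With the real metadata, `label_to_index(meta_di, axis, label)` would take the place of the example table.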

#### A note on accessing large data

Many of the datasets hosted in SNAP's rasdaman instance are rather large. Because rasdaman is optimized for subsetting and extraction operations, it is not as efficient for accessing entire datasets. If you are interested in accessing an entire dataset or large chunks of one, please consider finding it via [our data portal](https://catalog.snap.uaf.edu/geonetwork) or contact us at uaf-snap-data-tools@alaska.edu for help.