A place to test the `xarray` functionality
============================================

## Use `intake` to pull in remote data as well as STAC catalogs

* Note: This is **NOT** part of the deliverable library. However, we were tasked with gaining experience in this domain, and this is a good place to do so. See this [example](http://gallery.pangeo.io/repos/pangeo-data/pangeo-tutorial-gallery/intake.html#Intake-xarray-example) by Pangeo.
* [Intake API docs](https://intake.readthedocs.io/en/latest/api_user.html).
* Amazing [list of STAC Catalogs](https://stacindex.org/catalogs?access=public) (some are APIs with access to the data; others are static catalogs that point to data).
* **Note:** `PySTAC` and [`pystac_client`](https://pystac-client.readthedocs.io/en/stable/api.html) (made by STAC) can be used to query by BBOX! See some cool examples [here](https://github.com/stac-utils/pystac-client/blob/6f0c933509f80b724d46042beffc8273b0e5d301/docs/tutorials/item-search-intersects.ipynb).
* **We should use [`stackstac`](https://github.com/gjoseph92/stackstac) to combine raster endpoints into one `DataArray`, but only IF each endpoint has a single band (major limitation).**

### Access Planet's STAC for Surface Reflectance data @ `planet/fusion/14N/29E-188N`

In [12]:
import xarray as xr
import rioxarray as rio
import numpy as np
from pathlib import Path
from typing import Union
import os
import gc
import intake
import hvplot.xarray  # registers the .hvplot accessor on xarray objects
import matplotlib.pyplot as plt
import matplotlib.animation as animation

In [2]:
gee_stac = r'https://radiantearth.github.io/stac-browser/#/external/storage.googleapis.com/earthengine-stac/catalog/catalog.json'
planet_stac = r'https://www.planet.com/data/stac/catalog.json'
planetary_comp_stac = r'https://planetarycomputer.microsoft.com/api/stac/v1'

In [3]:
# open a STAC catalog
stac_catalog = intake.open_stac_catalog(planet_stac)
print(f'The following collections can be accessed from stac_catalog: {list(stac_catalog)}')

The following collections can be accessed from stac_catalog: ['planet-stac-skysat', 'planet-disaster-data', 'sn7', 'planet/fusion/14N/29E-188N']


In [4]:
collection = 'planet/fusion/14N/29E-188N'
s_collections = stac_catalog[collection]
print(f'The following sub-collections are available within {collection}: {list(s_collections)}')

The following sub-collections are available within planet/fusion/14N/29E-188N: ['2021-01-01', '2021-01-02', '2021-01-03', '2021-01-04', '2021-01-05', '2021-01-06', '2021-01-07', '2021-01-08', '2021-01-09', '2021-01-10', '2021-01-11', '2021-01-12', '2021-01-13', '2021-01-14', '2021-01-15', '2021-01-16', '2021-01-17', '2021-01-18', '2021-01-19', '2021-01-20', '2021-01-21', '2021-01-22', '2021-01-23', '2021-01-24', '2021-01-25', '2021-01-26', '2021-01-27', '2021-01-28', '2021-01-29', '2021-01-30', '2021-01-31', '2021-02-01', '2021-02-02', '2021-02-03', '2021-02-04', '2021-02-05', '2021-02-06', '2021-02-07', '2021-02-08', '2021-02-09', '2021-02-10', '2021-02-11', '2021-02-12', '2021-02-13', '2021-02-14', '2021-02-15', '2021-02-16', '2021-02-17', '2021-02-18', '2021-02-19', '2021-02-20', '2021-02-21', '2021-02-22', '2021-02-23', '2021-02-24', '2021-02-25', '2021-02-26', '2021-02-27', '2021-02-28', '2021-03-01', '2021-03-02', '2021-03-03', '2021-03-04', '2021-03-05', '2021-03-06', '2021-03-0

In [5]:
list(s_collections)[0].split('-', 3)

['2021', '01', '01']

### Merge all surface reflectance data into one giant `DataArray`

In [26]:
%%time
collection_data = []
labels = []
past_months = []
# keep only the first available date of each month
for date in list(s_collections):
    month = date.split('-', 3)[1]
    if month not in past_months:
        past_months.append(month)
        labels.append(date)
        # open the 'sr' asset as a dask-backed array and keep rows/columns from index 7500 onward
        data = s_collections[date]['sr'].to_dask()[:, 7500:, 7500:]
        print(f'Added: {date}')
        collection_data.append(data)

Added: 2021-01-01
Added: 2021-02-01
Added: 2021-03-01
Added: 2021-04-01
Added: 2021-05-01
Added: 2021-06-01
Added: 2021-07-01
Added: 2021-08-01
Added: 2021-09-01
Added: 2021-10-01
Added: 2021-11-01
Added: 2021-12-01
CPU times: total: 438 ms
Wall time: 597 ms
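
The loop above keys on the month string alone, which works for a single calendar year. Below is a minimal sketch of the same first-date-per-month selection on plain strings. The helper name `first_date_per_month` is hypothetical, and keying on year and month together is an added assumption so that the logic would also hold for a collection spanning several years.

```python
def first_date_per_month(dates):
    """Return the first date label seen for each (year, month) pair.

    `dates` is an iterable of 'YYYY-MM-DD' strings in chronological order.
    """
    seen = set()
    picked = []
    for date in dates:
        year, month, _day = date.split('-')
        if (year, month) not in seen:
            seen.add((year, month))
            picked.append(date)
    return picked


# made-up labels in the same format as the sub-collection names
sample = ['2021-01-01', '2021-01-02', '2021-02-01', '2021-02-15', '2022-01-03']
print(first_date_per_month(sample))
# ['2021-01-01', '2021-02-01', '2022-01-03']
```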


### Transform the data array into the format we want
* Options include stack/unstacking, multi-indexing, or **just creating a new DataArray (apparently the [fastest option?](https://stackoverflow.com/questions/72264983/how-to-reshape-xarray-dataset-by-collapsing-coordinate))**
* **Takeaways:**
    * Do not convert to numpy! It massively kills performance.
    * On the backend `hvplot` converts to `xr.Dataset`, which is impossible if the name attribute of the `DataArray` matches a variable name. Keep this in mind!
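
The naming clash in the last takeaway can be reproduced without `hvplot`, assuming the failure comes from the `DataArray.to_dataset()` conversion. The sketch below uses a small synthetic array whose name deliberately matches its `x` coordinate; all values are placeholders.

```python
import numpy as np
import xarray as xr

# small synthetic array; the values and coordinates are placeholders
da = xr.DataArray(
    np.zeros((2, 3)),
    dims=['y', 'x'],
    coords={'y': [0, 1], 'x': [10, 20, 30]},
    name='x',  # deliberately the same as the 'x' coordinate
)

try:
    da.to_dataset()
    converted = True
except ValueError as err:
    converted = False
    print(f'Conversion failed: {err}')

# giving the array a name that is not a coordinate avoids the clash
ds = da.rename('NDVI_data').to_dataset()
print(list(ds.data_vars))
```

Picking a distinct name up front, as `stack_data` does with `'NDVI_data'`, sidesteps the problem.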

In [27]:
def ndvi(nir: xr.DataArray, red: xr.DataArray):
    return (nir - red) / (nir + red)
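
As a quick sanity check of the formula, here is a small sketch on made-up reflectance values (not taken from the Planet data). It repeats the `ndvi` definition so that it runs on its own, and shows that a pixel where both bands are zero comes out as NaN.

```python
import numpy as np
import xarray as xr


def ndvi(nir: xr.DataArray, red: xr.DataArray) -> xr.DataArray:
    return (nir - red) / (nir + red)


# made-up reflectance values, not taken from the Planet data
nir = xr.DataArray(np.array([[0.6, 0.5], [0.0, 0.8]]), dims=['y', 'x'])
red = xr.DataArray(np.array([[0.2, 0.5], [0.0, 0.2]]), dims=['y', 'x'])

result = ndvi(nir, red)
print(result.values)
# the pixel where nir + red == 0 evaluates to NaN (0 / 0)
print(int(result.isnull().sum()))
```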

In [28]:
%%time
ndvi_data = [ndvi(d[-1], d[0]) for d in collection_data]
del collection_data

CPU times: total: 141 ms
Wall time: 186 ms


In [29]:
len(ndvi_data)

12

In [30]:
# create a new DataArray from the existing ones
def stack_data(data_arrays: list, labels: list, coords_ref: xr.DataArray):
    # rasterio-backed slices are ordered (y, x) once a band is selected,
    # so the stacked array is (time, y, x)
    return xr.DataArray(data=data_arrays,
             dims=['time', 'y', 'x'],
             coords={
             'x': coords_ref.coords['x'],
             'y': coords_ref.coords['y'],
             'time': labels[:len(data_arrays)],
             'ndvi': (('time', 'y', 'x'), data_arrays)},
             name='NDVI_data')
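
An alternative to building the array by hand is `xr.concat`. This is a different technique from the cell above, sketched here on small synthetic slices rather than the Planet data. With dask-backed inputs it is expected to keep the result lazy, although that is not demonstrated in this example.

```python
import numpy as np
import xarray as xr

# three small synthetic (y, x) slices standing in for the monthly NDVI arrays
slices = [
    xr.DataArray(
        np.full((2, 3), fill_value=float(i)),
        dims=['y', 'x'],
        coords={'y': [0, 1], 'x': [10, 20, 30]},
    )
    for i in range(3)
]
labels = ['2021-01-01', '2021-02-01', '2021-03-01']

# concatenate along a new 'time' dimension, then label it
stacked = xr.concat(slices, dim='time').assign_coords(time=labels)
stacked.name = 'NDVI_data'

print(stacked.dims)
print(stacked.sel(time='2021-02-01').values)
```

The spatial coordinates are carried over from the inputs, so no reference array is needed.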

In [31]:
gc.collect()

791

### Explore visualization
**Notes:**
* Despite not being included in `us_fdr` attributes, `us_fdr.rio.encoded_nodata` returns 255.0. This explains why the plot excludes nodata when we load the dataset with `mask=True`.
* **To manually mask data:** Use `xarray.where` to set the value, and then use `raster.rio.write_nodata(nodata_cell_value, encoded=True, inplace=True)`, which will then make `raster.rio.nodata = nan`, but `raster.rio.encoded_nodata = nodata_cell_value`.
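
The first half of the manual-masking note can be sketched with plain `xarray` on a tiny synthetic raster. The nodata value of 255 is taken from the note above, and the pixel values are made up. The `rio.write_nodata` call from the note would follow this step and is not shown here.

```python
import numpy as np
import xarray as xr

nodata_cell_value = 255  # assumed nodata marker, matching the value quoted above

# tiny synthetic raster with one nodata cell
raster = xr.DataArray(
    np.array([[10, 255], [20, 30]], dtype='uint8'),
    dims=['y', 'x'],
)

# keep valid cells, replace nodata cells with NaN (the result is float)
masked = raster.where(raster != nodata_cell_value)

print(int(masked.isnull().sum()))
# the mean skips NaN cells by default
print(float(masked.mean()))
```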

In [None]:
%%time
stack_array = stack_data(ndvi_data,
                         labels,
                         coords_ref=ndvi_data[0])

In [None]:
%matplotlib widget
stack_array[6].plot(vmin=-1, vmax=1, cmap='PRGn')

In [23]:
%matplotlib widget
stack_array.hvplot.image(groupby='time', cmap='PRGn', clim=(-1, 1), width=500)

In [24]:
#stack_array.plot()