
# How to use `xarray.DataTree` with hierarchical data


## Overview

This notebook demonstrates how to use `xarray.DataTree` with [_GPM IMERG Final Precipitation L3 Half Hourly 0.1 degree x 0.1 degree V07 (GPM_3IMERGHH_07)_](https://disc.gsfc.nasa.gov/datasets/GPM_3IMERGHH_07/summary) and xarray's plotting capabilities to plot precipitation over the Gulf of Mexico during Hurricane Ida. GPM_3IMERGHH_07 is an L3 gridded product with a hierarchical group structure.

In [None]:
import cartopy.crs as ccrs
import matplotlib.pyplot as plt
import xarray as xr

### Opening the dataset with `open_datatree()`

In [None]:
imerghh_730 = xr.open_datatree('~/xarray-data/imerghh_730.hdf5', engine='h5netcdf')
imerghh_730

### Nodes
Groups in a netCDF4 or HDF5 file are represented as "nodes" in the DataTree model.
We can list all of the groups with `.groups`.

In [None]:
imerghh_730.groups

### Accessing variables in nested groups
Nested variables and groups can be accessed with either dict-like syntax or attribute-like syntax.

In [None]:
imerghh_730['/Grid']

# Returns only the data contained in the "/Grid" group

In [None]:
imerghh_730['/Grid/precipitation']

In [None]:
imerghh_730.Grid.precipitation

# Attribute-like syntax

### Get the parent and child nodes from a group

In [None]:
imerghh_730['/Grid/Intermediate'].parent

In [None]:
imerghh_730.Grid.children

### `xarray.DataTree` and `xarray.Dataset` objects share the same key properties:

- `dims`: a dictionary mapping of dimension names to lengths, for the variables in a node, and a node’s ancestors.

- `data_vars`: a dict-like container of DataArrays corresponding to variables in a node.

- `coords`: another dict-like container of DataArrays, corresponding to coordinate variables in a node, and a node’s ancestors.

- `attrs`: dict with metadata relevant to data in a node.

With `DataTree` you can get these properties at any of the nodes (groups) they are defined in.

In [None]:
imerghh_730.dims
# Note there are no dimensions, coordinates, or data variables defined at the root node

In [None]:
imerghh_730.attrs

In [None]:
imerghh_730['/Grid'].dims

In [None]:
imerghh_730['/Grid/Intermediate'].dims

In [None]:
imerghh_730['/Grid/Intermediate'].data_vars

### Creating a DataTree from a dictionary with `DataTree.from_dict()`
You can create a DataTree from a dictionary of `xr.Dataset` objects or `xr.DataTree` objects.
Each key of the dictionary becomes the path of a node (group) in the new DataTree object.

In [None]:
imerghh_830 = xr.open_datatree('~/xarray-data/imerghh_830.hdf5', engine='h5netcdf')
xr.DataTree.from_dict({'time_830': imerghh_830})

### Combining two DataTree objects with `DataTree.from_dict()`
Let's combine our two DataTree objects (`imerghh_730` and `imerghh_830`), one per time stamp, with `DataTree.from_dict()`.
All of the groups in the original datasets remain intact, but now we have two additional groups, `/time_730` and `/time_830`.
The groups `/Grid` and `/Grid/Intermediate` of each file are nested under the ancestor nodes `/time_730` and `/time_830`, respectively. Both `/time_730` and `/time_830` are children of the root node `'/'`.

In [None]:
combined_imerghh_tree = xr.DataTree.from_dict({'time_730': imerghh_730, 'time_830': imerghh_830})
combined_imerghh_tree

In [None]:
combined_imerghh_tree.children

### Combining data with DataTree
DataTree objects (like Dataset objects) can contain `DataArray` objects. We can combine DataArrays from a DataTree with `xr.concat` (along a specified dimension) or `xr.merge`. Let's concatenate the precipitation data from nodes `/time_730` and `/time_830` along `"time"`. Note that these datasets have the same sizes across their `"time"`, `"lat"` and `"lon"` dimensions.


In [None]:
precip_concat = xr.concat(
    [
        combined_imerghh_tree['time_730/Grid/precipitation'],
        combined_imerghh_tree['time_830/Grid/precipitation'],
    ],
    dim='time',
)

### Plotting precipitation data with DataTree
Xarray’s plotting capabilities are centered around DataArray objects. To plot data from a DataTree, we access the relevant DataArrays; in this case, that is our concatenated `DataArray`, `precip_concat`.

We use the `.where()` method to get a subset of precipitation data over the Gulf of Mexico.

In [None]:
precip_concat_sub = precip_concat.where(
    (precip_concat.lat >= 20)
    & (precip_concat.lat <= 35)
    & (precip_concat.lon >= -110)
    & (precip_concat.lon <= -78),
    drop=True,
)
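To illustrate what `drop=True` does, here is a small sketch on a toy array with made-up coordinate values, using the same bounding box as above.

```python
import numpy as np
import xarray as xr

# Toy 5 x 4 grid (coordinate values are made up).
toy = xr.DataArray(
    np.arange(20.0).reshape(5, 4),
    dims=("lon", "lat"),
    coords={
        "lon": [-120.0, -100.0, -90.0, -80.0, -60.0],
        "lat": [10.0, 25.0, 30.0, 40.0],
    },
)

# Keep only points inside the box; drop=True removes the coordinate labels
# that fall entirely outside it instead of leaving rows and columns of NaN.
toy_sub = toy.where(
    (toy.lat >= 20) & (toy.lat <= 35) & (toy.lon >= -110) & (toy.lon <= -78),
    drop=True,
)
print(toy_sub.sizes)
```

Without `drop=True` the result would keep the original 5 x 4 shape, with NaN everywhere outside the box.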

### Data masking
We mask out the precipitation values that are zero. We use the `.where()` method to keep only data values greater than 0.0; all other values become NaN.

In [None]:
precipitation_subset_mask = precip_concat_sub.where(precip_concat_sub > 0.0)
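The masking step can be verified on a tiny array. This is a minimal sketch with made-up values.

```python
import xarray as xr

# Toy precipitation field with two dry (zero) cells.
toy = xr.DataArray([[0.0, 1.5], [2.0, 0.0]], dims=("lon", "lat"))

# Values that fail the condition are replaced with NaN; the shape is unchanged.
toy_masked = toy.where(toy > 0.0)

n_masked = int(toy_masked.isnull().sum())
smallest_wet = float(toy_masked.min())  # reductions skip NaN by default
print(n_masked, smallest_wet)
```

Because reductions such as `.min()` and `.max()` skip NaN by default, the masked array can be used directly to derive color limits for a plot.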

### Plot the data with `.plot()` as a `FacetGrid` object
We can use `xarray.plot.FacetGrid` objects to make plots with multiple axes. Each subplot shows the same relationship conditioned on a different level of some dimension, in our case a different time stamp. Note that since each facet is two-dimensional, the `.plot()` method calls `xarray.plot.pcolormesh()` by default.

In [None]:
# Plot the precipitation data
precip_plot = precipitation_subset_mask.plot(
    figsize=(12, 6),
    transform=ccrs.PlateCarree(),
    subplot_kws={'projection': ccrs.PlateCarree()},
    x="lon",
    y="lat",
    col='time',  # The dimension ("time") we are faceting our plot on
    col_wrap=2,  # Maximum number of columns before wrapping to a new row
    cmap='jet',
    cbar_kwargs={"orientation": "horizontal", "pad": 0.15, "shrink": 0.6},
    vmin=precipitation_subset_mask.min(),
    vmax=precipitation_subset_mask.max(),
)


for ax in precip_plot.axs.flat:
    ax.set_extent([-100, -80, 20, 35])
    ax.coastlines()
    gl = ax.gridlines(linewidth=1, color='black', linestyle='--')
    gl.left_labels = True
    gl.bottom_labels = True