<div style="text-align: left">
    <img src="../../_static/xcdat-logo.png" alt="xCDAT logo" style="display: inline-block; width:600px;">
</div>

# Xarray Climate Data Analysis Tools (xCDAT)

<h3 style="text-align: left;">A Python Package for Simple and Robust Analysis of Climate Data</h3>
<div style="text-align: left">
    <h4><img src="images/scipy-logo.png" alt="SciPy logo" style="display: inline-block; width:50px;">SciPy 2024, 07/11/2024</h4>
</div>
<h4 style="text-align: left; font-style:italic">Core Developers: Tom Vo, Stephen Po-Chedley, Jason Boutte, Jill Zhang, Jiwoo Lee</h4>
<p style="text-align: left;">With thanks to Peter Gleckler, Paul Durack, Karl Taylor, and Chris Golaz</p>

---

_This work is performed under the auspices of the U. S. DOE by Lawrence Livermore National Laboratory under contract No. DE-AC52-07NA27344._


## Notebook Setup

Create an Anaconda environment for this notebook using the command below:

```bash
conda create -n xcdat_scipy_2024 -c conda-forge xcdat=0.7.0 xesmf matplotlib ipython ipykernel cartopy nc-time-axis gsw-xarray jupyter jupyter_contrib_nbextensions rise

conda activate xcdat_scipy_2024
```

Then run:

```bash
jupyter contrib nbextension install --user

jupyter nbextension enable splitcell/splitcell
```


To open Jupyter Notebook GUI:

```bash
jupyter notebook
```

To print notebook as PDF:

```bash
jupyter nbconvert --to html <PATH-TO-NOTEBOOK>

# Then Print HTML to PDF (Browser)
```


## An Overview of this Talk

1. Preliminary introduction to Xarray
2. Brief history, scope, and mission of xCDAT
3. Design philosophy
4. Key features with code examples
5. Parallelizing with Dask
6. xCDAT’s community and how to get involved


<div style="text-align: left; margin-top:10px">
<h3><img src="../../_static/xarray-logo.png" alt="Xarray logo" align="center" style="display: inline-block; width:300px; margin-right:20px">"N-D labeled arrays and datasets in Python"</h3>
</div>

- An evolution of an internal tool developed at The Climate Corporation.
- Released as open source in May 2014
- NumFOCUS fiscally sponsored project since August 2018
- Based on NumPy, heavily inspired by Pandas

<div style="text-align: center;">
  <img src="../../_static/NumFocus-logo.png" alt="NumFOCUS logo" align="center" style="display: inline-block; width:200px;">
  <img src="../../_static/numpy-logo.svg" alt="NumPy logo" align="center" style="display: inline-block; width:200px;">
  <img src="../../_static/pandas-logo.svg" alt="Pandas logo" align="center" style="display: inline-block; width:200px;">
</div>


Why Xarray?

- Introduces labels in the form of dimensions, coordinates, and attributes on top of raw NumPy-like arrays
- Intuitive, more concise, and less error-prone user experience

Key features include:

- File I/O, indexing and selecting, interpolating, grouping, aggregating, parallelism (Dask), and plotting (Matplotlib wrapper)
  - Supports various file formats and data sources, such as netCDF, OPeNDAP, and Zarr, plus conversion to and from Iris
- Interoperability with scientific Python ecosystem such as NumPy, Dask, Pandas, and Matplotlib

<div style="text-align: center;">
  <img src="../../_static/dask-logo.svg" alt="Dask logo" align="center" style="display: inline-block; width:200px;">
  <img src="../../_static/matplotlib-logo.svg" alt="Matplotlib logo" align="center" style="display: inline-block; width:200px;">
</div>


<div style="text-align: left; margin-top:10px">
<h4><img src="../../_static/xcdat-logo.png" alt="xCDAT logo" align="center" style="display: inline-block; width:300px; margin-right:20px">Xarray Climate Data Analysis Tools for Structured Grid Analysis</h4>
</div>

- Collaboration between:

<div style="text-align: center; margin-top:10px">
    <img src="../../_static/e3sm-logo.jpg" alt="E3SM logo" align="center" style="display: inline-block;     margin-right:50px; width:150px;">
    <img src="../../_static/pcmdi-logo.png" alt="PCMDI logo" style="display: inline-block; margin-right:50px; width:150px;">
    <img src="../../_static/seats-logo.png" alt="SEATS logo" style="display: inline-block; width:150px">
</div>

- xCDAT is an extension of Xarray for climate data analysis on structured grids, a modern successor to the Community Data Analysis Tools (CDAT) library
- Scope is focused on routine climate research analysis operations such as loading, averaging, and regridding data
- Aims to provide features and utilities for simple and robust analysis of climate data


<div style="text-align: left; margin-top:10px">
    <h3>xCDAT features and utilities for simple, robust, and less error-prone analysis code</h3>
</div>

- Extension of `xr.open_dataset()` and `xr.open_mfdataset()` with post-processing options
- Generate missing bounds, center time coords, convert lon axis orientation
- Geospatial weighted averaging
- Temporal averaging, climatologies, departures
- Horizontal structured regridding (extension of xESMF and Python port of regrid2)
- Vertical structured regridding (extension of xGCM)

<div style="text-align: center; margin-top:10px">
    <img src="../../_static/xarray-logo.png" alt="Xarray logo" style="display: inline-block; width:200px">
    <img src="../../_static/esmf-logo.png" alt="ESMF logo" style="display: inline-block; margin-right:50px; width:200px;">
    <img src="../../_static/xgcm-logo.png" alt="xGCM logo" align="center" style="display: inline-block;     margin-right:50px; width:200px;">
</div>

<div style="text-align: center;">
    <img src="../../_static/thumbnails/spatial-avg.png" alt="Spatial average chart" style="display: inline-block; width:200px">
    <img src="../../_static/thumbnails/temporal-average.png" alt="Temporal average chart" style="display: inline-block; margin-right:50px; width:200px;">
    <img src="../../_static/thumbnails/regridding-vertical.png" alt="Vertical regridding chart" style="display: inline-block; margin-right:50px; width:175px;">
</div>


<div style="text-align: left; margin-top:10px">
<h3>The Software Design Philosophy of xCDAT</h3>
</div>

- Intentionally designed to **encourage software sustainability** and **reproducible science**
- **Well-documented and configurable features** allow scientists to rapidly develop robust, reusable, less error-prone, more maintainable code
- **Streamline the developer experience** of writing analysis code by reducing complexity
- xCDAT aims to contribute to **Pangeo's** effort to foster an ecosystem of mutually compatible geoscience Python packages

<div style="text-align: center; margin-top:10px">
    <img src="../../_static/pangeo-logo.png" alt="Pangeo logo" align="center" style="display: inline-block; width:200px;">
    <img src="../../demos/1-25-23-cwss-seminar/images/rtd-logo.png" alt="Read the Docs logo" style="display: inline-block; width:200px;">
</div>


### Getting Started with xCDAT

- xCDAT is available for installation through Anaconda on the `conda-forge` channel
  - Install command: `conda install -c conda-forge xcdat`
- Check out xCDAT’s Read the Docs, which we strive to keep up-to-date
  - https://xcdat.readthedocs.io/en/stable/

<div style="text-align: center;">
    <img src="../../demos/1-25-23-cwss-seminar/images/anaconda-logo.png" alt="Anaconda logo" align="center" style="display: inline-block;     margin-right:50px; width:200px;">
    <img src="../../demos/1-25-23-cwss-seminar/images/conda-forge-logo.png" alt="Conda Forge logo" style="display: inline-block; margin-right:50px; width:200px;">
</div>

<div style="text-align: center; margin-top:10px">
  <img src="../../demos/1-25-23-cwss-seminar/images/rtd-logo.png" alt="Read the docs logo" style="display: inline-block; width:200px">
  <img src="../../demos/1-25-23-cwss-seminar/images/rtd-screenshot.png" alt="xCDAT docs" style="display: inline-block; width:200px">
</div>


### How to use xCDAT

- xCDAT extends Xarray `Dataset` objects via "accessor" classes.
- In the example below, spatial functionality is exposed by chaining the `spatial` accessor attribute to a `Dataset` object (`ds.spatial`), which gives access to the accessor's `average()` method (`ds.spatial.average(...)`).

<div style="text-align: center;">
    <img src="../../_static/accessor_api.svg" alt="xCDAT accessor API diagram" align="center" style="display: inline-block;     margin-right:50px; width:500px;">
</div>

- `temporal`
  - `.average()`, `.group_average()`, `.climatology()`, `.departures()`
- `spatial`
  - `.average()`, `.get_weights()`
- `bounds`
  - `.get_bounds()`, `.add_bounds()`, `.add_missing_bounds()`
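The accessor mechanism itself is small. Below is a simplified, hypothetical pure-Python sketch of the pattern: a descriptor creates an accessor object the first time you touch the attribute, caches it, and hands it the underlying data. `ToyDataset`, `register_accessor`, and `ToySpatialAccessor` are illustrative names only; the real entry point in Xarray is `xr.register_dataset_accessor`, which works on `xr.Dataset`.

```python
class CachedAccessor:
    """Descriptor that builds an accessor object on first access and caches it."""

    def __init__(self, name, accessor_cls):
        self._name = name
        self._accessor_cls = accessor_cls

    def __get__(self, obj, owner):
        if obj is None:
            # Accessed on the class itself (e.g., ToyDataset.spatial).
            return self._accessor_cls
        # Cache the accessor on the instance so it is only created once.
        cache = obj.__dict__.setdefault("_accessor_cache", {})
        if self._name not in cache:
            cache[self._name] = self._accessor_cls(obj)
        return cache[self._name]


def register_accessor(name):
    """Hypothetical decorator that attaches an accessor class under `name`."""

    def decorator(accessor_cls):
        setattr(ToyDataset, name, CachedAccessor(name, accessor_cls))
        return accessor_cls

    return decorator


class ToyDataset:
    """Stand-in for xr.Dataset: a dict of variable name -> list of values."""

    def __init__(self, data_vars):
        self.data_vars = data_vars


@register_accessor("spatial")
class ToySpatialAccessor:
    def __init__(self, ds):
        self._ds = ds

    def average(self, name, weights):
        # Weighted mean of one variable, mimicking the ds.spatial.average(...) call style.
        values = self._ds.data_vars[name]
        return sum(v * w for v, w in zip(values, weights)) / sum(weights)


toy_ds = ToyDataset({"tas": [280.0, 290.0, 300.0]})
print(toy_ds.spatial.average("tas", weights=[1.0, 2.0, 1.0]))  # 290.0
```

The real Xarray decorator follows a similar descriptor-and-cache idea, which is why `ds.spatial` behaves like a namespace of methods bound to one dataset.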


xCDAT also provides general utilities in the form of functions:

- `open_dataset()`, `open_mfdataset()`
- `center_times()`, `decode_time()`
- `swap_lon_axis()`
- `create_axis()`
- `create_grid()`
- `get_dim_coords()`
- `get_dim_keys()`

Visit the API Reference page for a complete list: https://xcdat.readthedocs.io/en/latest/api.html


## End-to-End Analysis and Visualization of E3SM Data using NCO and xCDAT

### Overview

This exercise walks through regridding E3SM data to a
rectilinear grid using `ncremap`, then performing analysis and visualization with xCDAT.

### Sections

1. Prerequisite: Set up the E3SM Unified Environment v1.10.0 Python Kernel
2. Setup Code
3. Use NCO to regrid E3SM data to a rectilinear grid
4. I/O
5. Regridding
6. Spatial Averaging
7. Temporal Computations
8. General Dataset Utilities


### Setup Code


In [1]:
# This style import is necessary to properly render Xarray's HTML output with
# the Jupyter RISE extension.
# GitHub Issue: https://github.com/damianavila/RISE/issues/594
# Source: https://github.com/smartass101/xarray-pydata-prague-2020/blob/main/rise.css

from IPython.display import HTML

style = """
<style>
.reveal pre.xr-text-repr-fallback {
    display: none;
}
.reveal ul.xr-sections {
    display: grid
}

.reveal ul ul.xr-var-list {
    display: contents
}
</style>
"""


HTML(style)

In [None]:
import glob

import numpy as np
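# NOTE: `_datetime_to_decimal_year` is a private xarray helper and may change
# between xarray versions.
from xarray.coding.calendar_ops import _datetime_to_decimal_year as dt2decimal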
import xcdat as xc
import cartopy.crs as ccrs
import matplotlib.pyplot as plt

### Use NCO to regrid E3SM data to a rectilinear grid


#### Now call ncremap to regrid the files to a 1 x 1 degree (180 x 360) grid

Typically, a user would call this command directly from the shell or write a batch script to run `ncremap`. Here we use the `%%bash` cell magic in Jupyter to run `ncremap` on a directory of files, using a wildcard to select the `.h0` files, which contain the monthly output we'd like to analyze. We then move the remapped files to a `remapped/` directory.


In [None]:
%%bash
# create output directory
mkdir -p remapped
# source e3sm-unified environment
source /global/common/software/e3sm/anaconda_envs/load_latest_e3sm_unified_pm-cpu.sh
# do regridding
# format: ncremap -m REMAPFILE.nc -t 1 -v VAR_OF_INTEREST /PATH/TO/DATA/*nc
# Subsetting files for "h0", which is the monthly history field.
ncremap -m /global/cfs/cdirs/e3sm/diagnostics/maps/map_ne30pg2_to_cmip6_180x360_aave.20200201.nc -t 1 -v TREFHT /global/cfs/cdirs/e3sm/www/Tutorials/2024/simulations/extendedOutput.v3.LR.historical_0101/archive/atm/hist/extendedOutput.v3.LR.historical_0101.eam.h0.*nc >/dev/null 2>&1
# move output to remapped directory
mv extendedOutput.v3.LR.historical_0101.eam.h0.*nc remapped/

### I/O


#### Now let's load the regridded data and use xCDAT to perform additional calculations on the 1 x 1 degree grid


Use `xc.open_mfdataset()` to open all of the remapped netCDF files as a single `xr.Dataset` object. With xCDAT, you can pass the directory `remapped` and xCDAT will read all netCDF files in it into one `xr.Dataset`. With Xarray, you could instead use a wildcard (`ds = xr.open_mfdataset("remapped/*.nc")`). `open_mfdataset()` performs essentially the same operation in Xarray and xCDAT, but xCDAT also adds missing bounds and can decode some non-CF-compliant time axes (e.g., units of "months since ...").

We will be analyzing a few years of temperature (`TREFHT`) data from E3SM v3.

- Documentation: https://xcdat.readthedocs.io/en/stable/generated/xcdat.open_mfdataset.html


In [None]:
ds = xc.open_mfdataset("remapped")
ds

#### ...but check out the time coordinate:


In [None]:
ds.time.values[0:3]

The monthly time coordinates begin in February 2000, even though our first file is for January 2000. This is because E3SM stamps each monthly history file at 00:00 on the first day of the following month, i.e., at the end of the averaging period. xCDAT can handle this by centering each time coordinate between its monthly time bounds (using `center_times=True`):
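Conceptually, centering replaces each time stamp with the midpoint of its time bounds. A minimal numpy sketch with hypothetical bounds for January to March 2000 (an illustration of the idea, not xCDAT's implementation); hourly precision is used so that half a month is represented exactly:

```python
import numpy as np

# Hypothetical monthly time bounds (lower, upper) for Jan-Mar 2000.
time_bnds = np.array(
    [
        ["2000-01-01", "2000-02-01"],
        ["2000-02-01", "2000-03-01"],
        ["2000-03-01", "2000-04-01"],
    ],
    dtype="datetime64[h]",
)

# E3SM-style time stamps sit at the upper bound (end of the averaging period).
raw_time = time_bnds[:, 1]

# Centering: place each time stamp halfway between its lower and upper bound.
centered_time = time_bnds[:, 0] + (time_bnds[:, 1] - time_bnds[:, 0]) / 2

print(raw_time)
print(centered_time)
```

Because February 2000 has 29 days, its midpoint falls at 12:00 on the 15th, while the 31-day months center at 12:00 on the 16th.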


In [None]:
ds = xc.open_mfdataset("remapped", center_times=True)
ds

### Regridding with xCDAT

We often want to regrid a dataset to a new grid to facilitate data analysis or comparisons with other datasets. The current dataset is at 1 x 1<sup>o</sup> resolution, so we'll start by remapping it to a 4 x 4<sup>o</sup> grid.


#### First, we specify the target grid


In [None]:
# create target axes
nlat = xc.create_axis(
    "lat", np.arange(-88, 90, 4), attrs={"units": "degrees_north", "axis": "Y"}
)
nlon = xc.create_axis(
    "lon", np.arange(2, 360, 4), attrs={"units": "degrees_east", "axis": "X"}
)

Create the target grid using the target axes and bounds.

- Documentation: https://xcdat.readthedocs.io/en/latest/generated/xcdat.create_grid.html#xcdat.create_grid


In [None]:
ngrid = xc.create_grid(x=nlon, y=nlat)
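Why these particular `np.arange` calls? Placing cell centers half a cell (2°) away from ±90° and 0°/360° means the 4° cells tile the globe exactly. A quick numpy check, assuming bounds are simply center ± half the grid spacing (a simplification for illustration; xCDAT generates its own bounds):

```python
import numpy as np

spacing = 4.0
lat = np.arange(-88, 90, 4)  # cell centers, as passed to xc.create_axis above
lon = np.arange(2, 360, 4)


def edges_from_centers(centers, spacing):
    """Bounds as (lower, upper) = center -/+ half the grid spacing."""
    centers = np.asarray(centers, dtype=float)
    return np.stack([centers - spacing / 2, centers + spacing / 2], axis=1)


lat_bnds = edges_from_centers(lat, spacing)
lon_bnds = edges_from_centers(lon, spacing)

print(lat.size, lon.size)         # number of cells in each direction
print(lat_bnds[0], lat_bnds[-1])  # first and last latitude cells
print(lon_bnds[0], lon_bnds[-1])  # first and last longitude cells
```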

#### Call the xESMF regridder

Here we're using bilinear regridding, but other methods may be appropriate (e.g., you may want to use "conservative_normed" for fields that should be conserved globally).
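Why does the method matter? A toy 1D example (not ESMF's algorithm) shows that an overlap-weighted, "conservative" remap preserves the domain integral, while point-wise linear interpolation, the 1D cousin of bilinear, generally does not. The cell edges and values below are made up for illustration:

```python
import numpy as np


def conservative_remap_1d(src_edges, src_vals, dst_edges):
    """Overlap-weighted mean of source cells for each destination cell."""
    src_edges = np.asarray(src_edges, dtype=float)
    src_vals = np.asarray(src_vals, dtype=float)
    out = np.empty(len(dst_edges) - 1)
    for j in range(len(dst_edges) - 1):
        lo, hi = dst_edges[j], dst_edges[j + 1]
        # Length of overlap between each source cell and [lo, hi] (0 if disjoint).
        overlap = np.clip(
            np.minimum(src_edges[1:], hi) - np.maximum(src_edges[:-1], lo), 0.0, None
        )
        out[j] = np.sum(overlap * src_vals) / np.sum(overlap)
    return out


def integral(edges, vals):
    """Sum of value x cell width: the 1D analogue of a global total."""
    return float(np.sum(np.diff(edges) * vals))


src_edges = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
src_vals = np.array([1.0, 3.0, 2.0, 6.0])
dst_edges = np.array([0.0, 3.0, 4.0])  # uneven target cells

cons = conservative_remap_1d(src_edges, src_vals, dst_edges)

# Point-wise linear interpolation evaluated at destination cell centers.
src_centers = 0.5 * (src_edges[:-1] + src_edges[1:])
dst_centers = 0.5 * (dst_edges[:-1] + dst_edges[1:])
lin = np.interp(dst_centers, src_centers, src_vals)

print(integral(src_edges, src_vals), integral(dst_edges, cons), integral(dst_edges, lin))
```

The conservative result keeps the source total, whereas the interpolated field does not, which is why a conservative method is preferable for quantities whose global totals matter.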


Regrid "TREFHT" with the `ngrid` created above using `xesmf` and `bilinear`.

- API Documentation: https://xcdat.readthedocs.io/en/stable/generated/xarray.Dataset.regridder.horizontal.html


In [None]:
ds_xesmf = ds.regridder.horizontal("TREFHT", ngrid, tool="xesmf", method="bilinear")

#### Compare the results (for the first timestep)

Now we just plot the results for comparison.


In [None]:
map_proj = ccrs.Robinson()

# plot original data (first time step)
plt.figure(figsize=(12, 5))
plt.subplot(1, 2, 1, projection=map_proj)
p = ds.TREFHT[0].plot(
    transform=ccrs.PlateCarree(),  # the data's projection
    subplot_kws={"projection": map_proj},
    cbar_kwargs={"orientation": "horizontal"},
    cmap=plt.cm.RdBu_r,
)
ax = plt.gca()
ax.coastlines()
plt.title("Original")

# plot the remapped data (first time step)
plt.subplot(1, 2, 2, projection=map_proj)
p = ds_xesmf.TREFHT[0].plot(
    transform=ccrs.PlateCarree(),  # the data's projection
    subplot_kws={"projection": map_proj},
    cbar_kwargs={"orientation": "horizontal"},
    cmap=plt.cm.RdBu_r,
)
ax = plt.gca()
ax.coastlines()
plt.title(r"xESMF 4$^{\circ}$ x 4$^{\circ}$")  # raw string avoids an invalid "\c" escape

### Vertical Regridding

xcdat can also regrid in the vertical. Here we'll grab some 3D temperature data and regrid it in the vertical. First, we need to remap some 3-dimensional data to a rectilinear grid (like we did for the surface air temperature data, `TREFHT`).


In [None]:
%%bash
# source e3sm-unified environment
source /global/common/software/e3sm/anaconda_envs/load_latest_e3sm_unified_pm-cpu.sh
# remap (we are only remapping one file and specifying the output location)
ncremap -m /global/cfs/cdirs/e3sm/diagnostics/maps/map_ne30pg2_to_cmip6_180x360_aave.20200201.nc -t 1 -v T /global/cfs/cdirs/e3sm/www/Tutorials/2024/simulations/extendedOutput.v3.LR.historical_0101/archive/atm/hist/extendedOutput.v3.LR.historical_0101.eam.h0.2000-01.nc T_extendedOutput.v3.LR.historical_0101.eam.h0.2000-01.nc >/dev/null 2>&1

Now let's load the data:


In [None]:
# specify file we just regridded
fn = "T_extendedOutput.v3.LR.historical_0101.eam.h0.2000-01.nc"

# load regridded data
ds3d = xc.open_dataset(fn)

Next, we will do the vertical remapping...


In [None]:
# first construct the 3D pressure field (Pa): p = hyam * P0 + hybm * PS,
# where P0 = 100000 Pa is the E3SM reference pressure
pressure = ds3d["hyam"] * 100000.0 + ds3d["hybm"] * ds3d["PS"]

# next, construct the target pressure axis
target_plevs = [
    100000,
    92500,
    85000,
    75000,
    70000,
    60000,
    50000,
    40000,
    30000,
    25000,
    20000,
    15000,
    10000,
    7000,
    5000,
    3000,
    1000,
    500,
    300,
    100,
]
nplev = xc.create_grid(z=xc.create_axis("lev", target_plevs))
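The pressure field above follows the hybrid sigma-pressure formula `p = hyam * P0 + hybm * PS`, where `P0` is a reference pressure (100000 Pa in E3SM, as in CAM), and its units must match the target levels (Pa here). The sketch below uses made-up coefficients and surface pressures to show how broadcasting yields one pressure per level and column:

```python
import numpy as np

P0 = 100000.0  # reference pressure (Pa)

# Made-up hybrid coefficients for three levels, listed from near the surface upward.
hyam = np.array([0.0, 0.1, 0.005])
hybm = np.array([0.98, 0.4, 0.0])

# Hypothetical surface pressure (Pa) for two columns.
ps = np.array([100000.0, 90000.0])

# p(level, column) = hyam(level) * P0 + hybm(level) * PS(column), via broadcasting.
p_demo = hyam[:, None] * P0 + hybm[:, None] * ps[None, :]
print(p_demo)
```

Near the surface the `hybm * PS` term dominates, so pressure follows the terrain; aloft `hybm` goes to zero and levels become pure pressure surfaces.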

Regrid the `"T"` variable using `nplev` as the output grid, `"log"` method, and `pressure` as the target data.

- Example Documentation: https://xcdat.readthedocs.io/en/stable/examples/regridding-vertical.html#4:-Remap-cloud-fraction-from-model-hybrid-coordinate-to-pressure-levels


In [None]:
dsvr = ds3d.regridder.vertical("T", nplev, method="log", target_data=pressure)
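With `method="log"`, values are interpolated linearly in the logarithm of pressure, a common choice for pressure coordinates because many atmospheric quantities vary more nearly linearly with log(p) than with p. A single-column numpy sketch with made-up values (not the xCDAT/xGCM code path) compares it with linear-in-pressure interpolation:

```python
import numpy as np

# One hypothetical column: pressure (Pa) and temperature (K) on model levels.
p_src = np.array([100000.0, 50000.0, 10000.0])
t_src = np.array([288.0, 250.0, 220.0])

p_target = 30000.0

# np.interp needs increasing x values; log(p) decreases with height, so flip.
log_p = np.log(p_src)[::-1]
t_log = np.interp(np.log(p_target), log_p, t_src[::-1])

# For comparison: interpolating linearly in p instead of log(p).
t_lin = np.interp(p_target, p_src[::-1], t_src[::-1])

print(round(t_log, 2), round(t_lin, 2))
```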

Finally, we plot the result:


In [None]:
# plot result
dsvr_zonal = dsvr.spatial.average("T", axis=["X"]).squeeze()
dsvr_zonal["T"].plot(cmap=plt.cm.RdBu_r)
plt.gca().invert_yaxis()

### Spatial Averaging with xCDAT

Area-weighted spatial averaging is a common technique to reduce dimensionality in geospatial datasets. xCDAT can perform this calculation over full domains or regions of interest.
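Area weighting matters because grid cells shrink toward the poles. On a regular latitude/longitude grid, a cell's area is proportional to (sin(lat_upper) - sin(lat_lower)) x (lon_upper - lon_lower), computed from the cell bounds. A minimal numpy sketch on a 4° grid (a simplification of what xCDAT derives from the bounds, not its actual code), using a made-up field that is 1 poleward of 62° and 0 elsewhere:

```python
import numpy as np

# 4-degree cell edges: latitude -90..90, longitude 0..360.
lat_edges = np.arange(-90.0, 91.0, 4.0)
lon_edges = np.arange(0.0, 361.0, 4.0)
lat_centers = 0.5 * (lat_edges[:-1] + lat_edges[1:])

# Cell area is proportional to (sin(lat_hi) - sin(lat_lo)) * (lon_hi - lon_lo).
w_lat = np.sin(np.deg2rad(lat_edges[1:])) - np.sin(np.deg2rad(lat_edges[:-1]))
w_lon = np.diff(lon_edges)
weights = np.outer(w_lat, w_lon)  # shape (nlat, nlon)

# Hypothetical field: 1 poleward of 62 degrees (both hemispheres), else 0.
field = np.where(np.abs(lat_centers) > 62, 1.0, 0.0)[:, None] * np.ones(w_lon.size)

weighted = np.sum(field * weights) / np.sum(weights)
unweighted = field.mean()
print(round(weighted, 3), round(unweighted, 3))
```

The weighted mean recovers the true area fraction of the polar caps, 1 - sin(62°), while the unweighted mean (14 of 45 latitude rows) overstates it by more than a factor of two.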


Calculate the spatial average of "TREFHT" and store the results in a Python variable.

- API Documentation: https://xcdat.readthedocs.io/en/stable/generated/xarray.Dataset.spatial.average.html


In [None]:
ds_global = ds.spatial.average("TREFHT")

#### Now let's plot the results.

Note that the spatial averager returns an `xr.Dataset`, so we still need to select `"TREFHT"` to plot the `DataArray`.


In [None]:
dtime = dt2decimal(ds_global.time)  # decimal time
plt.plot(dtime, ds_global["TREFHT"].values)
plt.xlabel("Year")
plt.ylabel("Global Mean Temperature [K]")

Above, we did not specify any spatial constraints, so xCDAT calculated the average over the full (global) domain. Users can also specify their own latitude and longitude bounds.


Calculate the average surface temperature (`"TREFHT"`) in the Niño 3.4 region.

- API Documentation: https://xcdat.readthedocs.io/en/stable/generated/xarray.Dataset.spatial.average.html
- Hint: Pass latitude bounds of (-5, 5) and longitude bounds of (190, 240) and keep the weights.


In [None]:
ds_nino34 = ds_xesmf.spatial.average(
    "TREFHT", lat_bounds=(-5, 5), lon_bounds=(190, 240), keep_weights=True
).load()

In this case, we specified `keep_weights=True`. Grid cells entirely within the Niño 3.4 region receive their full area weight, and grid cells only partially inside the region receive partial weight (we use the 4 x 4 degree grid in this example to make the partial weights visible and to speed up plotting). You can also supply your own weights, but you cannot automatically subset with `lat_bounds` and `lon_bounds` if you do.
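The partial weights come from the overlap between each cell's bounds and the region. A simplified numpy sketch with hypothetical helper names, assuming the region does not cross the 0/360 longitude seam (Niño 3.4 does not):

```python
import numpy as np


def lon_overlap(lon_lo, lon_hi, region):
    """Degrees of each cell's longitude span inside region=(west, east)."""
    west, east = region
    return np.clip(np.minimum(lon_hi, east) - np.maximum(lon_lo, west), 0.0, None)


def lat_overlap(lat_lo, lat_hi, region):
    """sin(latitude) span of each cell inside region=(south, north); proportional to area."""
    south, north = region
    lo = np.deg2rad(np.maximum(lat_lo, south))
    hi = np.deg2rad(np.minimum(lat_hi, north))
    return np.clip(np.sin(hi) - np.sin(lo), 0.0, None)


# A few 4-degree cells around the Nino 3.4 edges (bounds in degrees).
lon_lo = np.array([184.0, 188.0, 192.0, 236.0, 240.0])
lon_hi = lon_lo + 4.0
lat_lo = np.array([-6.0, -2.0, 2.0, 6.0])
lat_hi = lat_lo + 4.0

w_lon = lon_overlap(lon_lo, lon_hi, (190.0, 240.0))
w_lat = lat_overlap(lat_lo, lat_hi, (-5.0, 5.0))
print(w_lon / 4.0)  # fraction of each cell's longitude span inside the region
print(w_lat)
```

The 2D weight of a cell is then the product of its latitude and longitude weights (e.g., `np.outer(w_lat, w_lon)`), so the cell spanning 188°E to 192°E gets half of its full longitude weight.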


In [None]:
# show the nino 3.4 time series
plt.figure(figsize=(10, 2))
plt.subplot(1, 2, 1)
plt.plot(dtime, ds_nino34["TREFHT"].values)
plt.xlabel("Year")
plt.ylabel("Surface Temperature [K]")
plt.title("Niño 3.4 time series")

# show the weights
map_proj = ccrs.PlateCarree(central_longitude=180)
ax = plt.subplot(1, 2, 2, projection=map_proj)
plt.pcolor(
    ds_nino34.lon,
    ds_nino34.lat,
    ds_nino34.lat_lon_wts.T,
    transform=ccrs.PlateCarree(),
    cmap=plt.cm.GnBu,
)
ax.set_extent([120, 300, -30, 30], crs=ccrs.PlateCarree())
ax.coastlines()
plt.colorbar(orientation="horizontal")
plt.title("Nino 3.4 Weights")

### Temporal Computations with xCDAT

In the examples below, we will perform temporal computations on the `xarray.Dataset` object using xCDAT.


#### Annual cycle

In the global mean time series above, there are large seasonal swings in global temperature. Here we compute the seasonal mean climatology.


Calculate the seasonal mean climatology for the `"TREFHT"` variable.

- API Documentation: https://xcdat.readthedocs.io/en/stable/generated/xarray.Dataset.temporal.climatology.html


In [None]:
# compute the climatology
ds_clim = ds.temporal.climatology("TREFHT", freq="season")
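xCDAT's temporal operations weight each time step by the length of its time bounds by default (`weighted=True`), so a 31-day month counts more than a 28-day month. A numpy sketch of a day-weighted seasonal mean for a single made-up year (a real climatology also pools the same season across all years, and xCDAT's DJF handling is configurable):

```python
import numpy as np

# Hypothetical monthly-mean temperatures (K) for Jan..Dec of one non-leap year.
monthly_t = np.array(
    [270.0, 272.0, 278.0, 285.0, 292.0, 298.0, 301.0, 300.0, 294.0, 286.0, 278.0, 272.0]
)
days_in_month = np.array([31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31])

# Calendar months (1-12) belonging to each meteorological season.
seasons = {"DJF": [12, 1, 2], "MAM": [3, 4, 5], "JJA": [6, 7, 8], "SON": [9, 10, 11]}


def seasonal_means(values, days):
    """Day-weighted mean of the months in each season."""
    means = {}
    for name, months in seasons.items():
        idx = np.array(months) - 1  # convert month numbers to 0-based indices
        means[name] = float(np.average(values[idx], weights=days[idx]))
    return means


clim = seasonal_means(monthly_t, days_in_month)
print({k: round(v, 2) for k, v in clim.items()})
```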

#### Now we plot the seasonal means


In [None]:
map_proj = ccrs.Robinson()
titles = ["DJF", "MAM", "JJA", "SON"]
plt.figure(figsize=(12, 10))
for i in range(4):
    plt.subplot(2, 2, i + 1, projection=map_proj)
    p = ds_clim.TREFHT[i].plot(
        transform=ccrs.PlateCarree(),
        subplot_kws={"projection": map_proj},
        cbar_kwargs={"orientation": "horizontal"},
        cmap=plt.cm.RdBu_r,
        vmin=220,
        vmax=310,
    )
    ax = plt.gca()
    ax.coastlines()
    plt.title(titles[i])

#### Departures


It can also be useful to show the departures from the climatological average.

Calculate the monthly departures for the `"TREFHT"` variable. In this case, xCDAT will operate on the global mean time series we calculated above. Note that you can set the climatological reference period (e.g., with `reference_period=("1950-01-01", "1999-12-31")` for historical-era departures).

- API Documentation: https://xcdat.readthedocs.io/en/stable/generated/xarray.Dataset.temporal.departures.html


In [None]:
ds_global_anomaly = ds_global.temporal.departures(
    "TREFHT", freq="month", reference_period=("2000-01-01", "2009-12-31")
)
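Departures remove the seasonal cycle by subtracting, from each time step, the climatology for the same calendar month. A minimal numpy sketch with two made-up (non-leap) years, where the second year is uniformly 0.5 K warmer; xCDAT additionally takes care of time weighting, reference periods, and seasonal frequencies for you:

```python
import numpy as np

# Two hypothetical years of monthly global means (K); year 2 is 0.5 K warmer.
year1 = 287.0 + 2.0 * np.cos(2 * np.pi * np.arange(12) / 12)
series = np.concatenate([year1, year1 + 0.5])
month = np.tile(np.arange(1, 13), 2)  # calendar month of each time step

# Monthly climatology: average each calendar month over the reference period
# (here both years; unweighted, since each month has the same length in both).
clim = np.array([series[month == m].mean() for m in range(1, 13)])

# Departures: subtract the matching calendar month's climatology.
departures = series - clim[month - 1]
print(np.round(departures, 2))
```

The seasonal cycle cancels out, leaving only the year-to-year difference: every month in year 1 sits 0.25 K below the climatology and every month in year 2 sits 0.25 K above it.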

#### Now let's plot the departures from the climatological average.


In [None]:
plt.plot(dtime, ds_global_anomaly.TREFHT.values)
plt.xlabel("Year")
plt.ylabel("Global Mean Surface Temperature Anomaly [K]")

### Group averages

`xcdat` also allows you to calculate group averages (e.g., annual or seasonal mean from monthly data or monthly mean from daily data).


Calculate the annual mean from the anomaly time series.

- API Documentation: https://xcdat.readthedocs.io/en/stable/generated/xarray.Dataset.temporal.group_average.html


In [None]:
# compute annual mean from anomaly time series
ds_global_anomaly_annual = ds_global_anomaly.temporal.group_average(
    "TREFHT", freq="year"
)
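The group average is likewise weighted by the length of each time step by default. A sketch of a day-weighted annual mean that uses Python's `calendar` module for month lengths; it assumes the standard Gregorian calendar, whereas model data often use other calendars (e.g., `noleap`), which xCDAT accounts for through the time bounds. `annual_means` is a hypothetical helper:

```python
import calendar

import numpy as np


def annual_means(monthly, years):
    """Day-weighted annual means from consecutive Jan..Dec monthly values."""
    monthly = np.asarray(monthly, dtype=float).reshape(len(years), 12)
    # Days in each month of each year (accounts for leap-year Februaries).
    days = np.array(
        [[calendar.monthrange(y, m)[1] for m in range(1, 13)] for y in years]
    )
    return np.average(monthly, axis=1, weights=days)


years = [2000, 2001]
ramp = np.tile(np.arange(12.0), 2)  # hypothetical values rising through each year
print(annual_means(ramp, years), ramp[:12].mean())
```

For the rising ramp, the weighted annual mean differs slightly from the plain mean of 5.5 because the short February month is down-weighted.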

#### Now let's plot the results.


In [None]:
# plot data
dtime_annual = dt2decimal(ds_global_anomaly_annual.time) + 0.5
plt.plot(
    dtime, ds_global_anomaly.TREFHT.values, label="Monthly departure", color="gray"
)
plt.plot(
    dtime_annual,
    ds_global_anomaly_annual.TREFHT.values,
    color="k",
    linestyle="",
    marker="_",
    label="Annual Mean",
)
plt.legend(frameon=False)
plt.xlabel("Year")
plt.ylabel("Global Mean Surface Temperature Anomaly [K]")

### General Dataset Utilities

xCDAT includes various utilities for data manipulation, including
reorientation of the longitude axis, centering of time coordinates using time bounds, and adding and getting bounds.


#### Reorient the longitude axis

Longitude can be represented from 0 to 360 E or from 180 W to 180 E. xCDAT allows you to convert between these two conventions.


In [None]:
ds.lon

Use `xc.swap_lon_axis` to swap the longitude axis from (0, 360) to (-180, 180) and view
the new longitude axis.

- Documentation: https://xcdat.readthedocs.io/en/stable/generated/xcdat.swap_lon_axis.html


In [None]:
ds2 = xc.swap_lon_axis(ds, to=(-180, 180))

ds2.lon
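Conceptually, swapping the orientation amounts to remapping each longitude value and reordering the data so that longitudes stay monotonic. A simplified numpy sketch with a hypothetical helper (`xc.swap_lon_axis` also updates the longitude bounds, which this sketch ignores):

```python
import numpy as np


def to_minus180_180(lon, data):
    """Map longitudes from [0, 360) to [-180, 180) and reorder data to match."""
    new_lon = ((np.asarray(lon, dtype=float) + 180.0) % 360.0) - 180.0
    order = np.argsort(new_lon)  # sort so longitude increases monotonically
    return new_lon[order], np.asarray(data)[..., order]


lon = np.arange(2.0, 360.0, 4.0)  # 4-degree grid centers, 0-360 convention
data = lon.copy()  # tag each column with its original longitude

new_lon, new_data = to_minus180_180(lon, data)
print(new_lon[:3], new_lon[-3:])
print(new_data[:3])
```

The columns that were at 182°E, 186°E, and 190°E move to the front, now labeled -178, -174, and -170.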

#### Add missing bounds

Bounds are critical to many `xcdat` operations. For example, they are used in determining the weights in spatial or temporal averages and in regridding operations. `add_missing_bounds()` will attempt to produce bounds if they do not exist in the original dataset.
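For a spatial coordinate, one common way to generate missing bounds is to place cell edges halfway between neighboring coordinate values and extrapolate half a step at each end. A minimal sketch of that midpoint approach with a hypothetical helper (not xCDAT's code; xCDAT's own bounds generation, especially for time and for latitudes near the poles, may differ):

```python
import numpy as np


def midpoint_bounds(coord):
    """Bounds halfway between neighbors; end cells extrapolated by half a step.

    Requires at least two coordinate values.
    """
    coord = np.asarray(coord, dtype=float)
    mids = 0.5 * (coord[:-1] + coord[1:])
    first = coord[0] - (mids[0] - coord[0])
    last = coord[-1] + (coord[-1] - mids[-1])
    edges = np.concatenate([[first], mids, [last]])
    return np.stack([edges[:-1], edges[1:]], axis=1)  # shape (n, 2): lower, upper


print(midpoint_bounds([-88.0, -84.0, -80.0]))
print(midpoint_bounds([0.0, 1.0, 3.0]))
```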


In [None]:
# We are dropping the existing bounds to demonstrate adding bounds.
ds4 = ds.drop_vars("time_bnds")

In [None]:
try:
    ds4.bounds.get_bounds("T")
except KeyError as e:
    print(e)

Add the missing time bounds using `.bounds.add_missing_bounds()`.

- Documentation: https://xcdat.readthedocs.io/en/stable/generated/xarray.Dataset.bounds.add_missing_bounds.html
- Hint: Use the `axes` arg and pass a list containing a single string, `"T"` for time.


In [None]:
ds5 = ds4.bounds.add_missing_bounds(axes=["T"])
ds5

## Parallelism with Dask

<div style="text-align:center">
  <img src="../../_static/dask-logo.svg" alt="Dask logo" style="display: inline-block; width:300px;">
</div>

> Nearly all existing xarray methods have been extended to work automatically with Dask arrays for parallelism
>
> &mdash; <cite>https://docs.xarray.dev/en/stable/user-guide/dask.html#using-dask-with-xarray</cite>


- Parallelized xarray methods include **indexing, computation, concatenating and grouped operations**
- xCDAT APIs that build upon xarray methods inherently support Dask parallelism
- Dask arrays are loaded into memory only when absolutely required (e.g., generating weights for averaging)
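Why can a spatial average stay lazy when the data are chunked along time? Each time step's spatial mean depends only on that time step, so each chunk can be reduced independently and the results concatenated. A plain-numpy illustration of that idea (not Dask's scheduler; the array and weights are made up):

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical (time, lat, lon) array and per-latitude weights.
data = rng.normal(288.0, 5.0, size=(10, 4, 6))
w_lat = np.array([0.5, 1.0, 1.0, 0.5])


def spatial_mean(block):
    """Weighted mean over (lat, lon) for every time step in the block."""
    weights = np.broadcast_to(w_lat[:, None], block.shape[1:])
    return np.sum(block * weights, axis=(1, 2)) / np.sum(weights)


# Process the time axis in two chunks (6 + 4 steps), as Dask would with time chunks.
chunks = [data[:6], data[6:]]
chunked = np.concatenate([spatial_mean(c) for c in chunks])

print(np.allclose(chunked, spatial_mean(data)))
```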


In [14]:
filepath = "http://esgf.nci.org.au/thredds/dodsC/master/CMIP6/CMIP/CSIRO/ACCESS-ESM1-5/historical/r10i1p1f1/Amon/tas/gn/v20200605/tas_Amon_ACCESS-ESM1-5_historical_r10i1p1f1_gn_185001-201412.nc"

# Pass `chunks` to open the data variables as Dask arrays.
# NOTE: `open_mfdataset()` automatically uses one chunk per file, which
# might not be optimal.
ds = xc.open_dataset(filepath, chunks={"time": "auto"})

In [15]:
ds.tas

| | Array | Chunk |
|---|---|---|
| Bytes | 210.28 MiB | 127.97 MiB |
| Shape | (1980, 145, 192) | (1205, 145, 192) |
| Dask graph | 2 chunks in 2 graph layers | 2 chunks in 2 graph layers |
| Data type | float32 numpy.ndarray | float32 numpy.ndarray |


Now let's run the xCDAT `spatial.average()` method and view the output


In [16]:
ds.spatial.average("tas").tas

| | Array | Chunk |
|---|---|---|
| Bytes | 15.47 kiB | 9.41 kiB |
| Shape | (1980,) | (1205,) |
| Dask graph | 2 chunks in 29 graph layers | 2 chunks in 29 graph layers |
| Data type | float64 numpy.ndarray | float64 numpy.ndarray |


Notice that the averaged `tas` array is still a Dask array: the result has not been computed yet.


### Further Dask Guidance

Visit these pages for more guidance (e.g., when to parallelize):

- Ongoing xCDAT Dask Investigation: https://github.com/xCDAT/xcdat/discussions/376
  - Performance metrics, best practices, and possibly a guide
- Parallel computing with Dask: https://docs.xarray.dev/en/stable/user-guide/dask.html
- Xarray with Dask Arrays: https://examples.dask.org/xarray.html


<div style="text-align: left; margin-top:10px">
    <img src="../../_static/github-logo.png" alt="GitHub logo" align="center" style="display: inline-block; width:300px;">
    <img src="../../_static/github-logo-icon.png" alt="GitHub icon" align="center" style="display: inline-block; width:100px;">
    <h3>Get involved and join the xCDAT community!</h3>
</div>

- **Code contributions** are welcome and appreciated
  - GitHub Repository: https://github.com/xCDAT/xcdat
  - Contributing Guide: https://xcdat.readthedocs.io/en/latest/contributing.html
- **Submit and/or address tickets** for feature suggestions, bugs, and documentation updates
  - GitHub Issues: https://github.com/xCDAT/xcdat/issues
- **Participate in forum discussions** on version releases, architecture, feature suggestions, etc.
  - GitHub Discussions: https://github.com/xCDAT/xcdat/discussions


<div style="text-align: left; margin-top:10px">
    <img src="../../_static/xcdat-logo.png" alt="xCDAT logo" align="center" style="display: inline-block; width:400px;">
    <h3>Recap of Key Points</h3>
</div>

- xCDAT is an **extension of Xarray for climate data analysis on structured grids**, a modern successor to the Community Data Analysis Tools (CDAT) library
- **Focused on routine climate research analysis operations**, including loading and wrangling data, temporal and spatial averaging, and regridding
- Designed to encourage **software sustainability and reproducible science**
- **Parallelizable** through Xarray’s support for Dask, which enables efficient processing of large datasets
