<div style="text-align: left">
    <img src="../../_static/xcdat-logo.png" alt="xCDAT logo" style="display: inline-block; width:600px;">
    <div style="text-align: left">
    <h3><img src="images/scipy-logo.png" alt="SciPy logo" style="display: inline-block; width:50px; margin-right:10px">SciPy 2024, 07/11/2024</h3>
    </div>
    <h1>Xarray Climate Data Analysis Tools</h1>
    <h3 style="font-style:italic">
        A Python Package for Simple and Robust Analysis of Climate Data
    </h3>
    <h3>
        Core Developers: Tom Vo, Stephen Po-Chedley, Jason Boutte, Jill Zhang, Jiwoo Lee
    </h3>
    <p>With thanks to Peter Gleckler, Paul Durack, Karl Taylor, and Chris Golaz</p>
</div>

---

<p style="text-align: left; font-size: 12px">This work is performed under the auspices of the U.S. DOE by Lawrence Livermore National Laboratory under contract No. DE-AC52-07NA27344.</p>


## Notebook Setup

Create an Anaconda environment for this notebook using the command below:

```bash
conda create -n xcdat_scipy_2024 -c conda-forge xcdat=0.7.0 nco matplotlib ipython ipykernel cartopy nc-time-axis gsw-xarray jupyter jupyter_contrib_nbextensions rise wget

conda activate xcdat_scipy_2024
```

Then run:

```bash
jupyter contrib nbextension install --user

jupyter nbextension enable splitcell/splitcell
```

To open Jupyter Notebook GUI:

```bash
jupyter notebook
```

To print notebook as PDF:

```bash
jupyter nbconvert --to html <PATH-TO-NOTEBOOK>
# Then "Print HTML to PDF" in your browser
```


## An Overview of this Talk

1. The driving force behind xCDAT
2. Basic intro to Xarray
3. Brief history, scope, and mission of xCDAT
4. Design philosophy and key features
5. Technical demo: end-to-end analysis workflow
6. Parallelism with xCDAT and Dask
7. xCDAT’s community and how to get involved


### The Driving Force Behind xCDAT

- CDAT (Community Data Analysis Tools) library is the predecessor to xCDAT.
- CDAT has provided open-source climate data analysis and visualization packages for over 20 years.

- Since CDAT’s inception, the volume of climate data has grown substantially as a result of:

  - Larger pool of data products
  - Increasing spatiotemporal resolution of model and observational data.

<div style="text-align: center; margin-top:10px">
<img src="../../_static/cdat-logo.png" alt="CDAT logo" align="center" style="display: inline-block; width:300px;">
</div>


#### CDAT has been end-of-life as of December 2023

- This presents an issue for the many users and packages that depend on CDAT
- All of these factors sparked a driving need for new analysis software


### What should this new analysis software (aka xCDAT) offer?

- Offer similar core capabilities as CDAT
  - e.g., geospatial averaging, temporal averaging, regridding
- Use modern technologies in the library’s stack
  - Capable of handling large datasets (e.g., parallelism, lazy operations)
- Be maintainable, extensible, and easy-to-use
  - Python Enhancement Proposals (PEPs)
  - Automated DevOps processes (unit testing, code coverage)
  - Actively maintained documentation
- An open-source community that can sustain the project
  - Encourage GitHub contributions
  - Serve the needs of the climate community in the long term
  - Community engagement efforts (e.g., Pangeo, ESGF)


<div style="text-align: left; margin-top:10px">
<img src="../../_static/xarray-logo.png" alt="Xarray logo" align="center" style="display: inline-block; width:300px;">

<h3>"N-D labeled arrays and datasets in Python"</h3>
</div>

- The core technology of xCDAT
- An evolution of an internal tool developed at The Climate Corporation
- Released as open source in May 2014
- NumFOCUS fiscally sponsored project since August 2018
- Based on NumPy, heavily inspired by Pandas

<div style="text-align: center;">
  <img src="../../_static/NumFocus-logo.png" alt="NumFOCUS logo" align="center" style="display: inline-block; width:200px;">
  <img src="../../_static/numpy-logo.svg" alt="NumPy logo" align="center" style="display: inline-block; width:200px;">
  <img src="../../_static/pandas-logo.svg" alt="Pandas logo" align="center" style="display: inline-block; width:200px;">
</div>


**Why Xarray?**

- Introduces labels in the form of dimensions, coordinates, and attributes on top of raw NumPy-like arrays
- Intuitive, more concise, and less error-prone user experience

**Key features include:**

- File I/O, indexing and selecting, interpolating, grouping, aggregating, parallelism (Dask), plotting (matplotlib wrapper)
- Supports various file formats, including netCDF, Iris, OPeNDAP, Zarr, and more
- Interoperability with scientific Python ecosystem such as NumPy, Dask, Pandas, and Matplotlib

<div style="text-align: center;">
  <img src="../../_static/dask-logo.svg" alt="Dask logo" align="center" style="display: inline-block; width:200px;">
  <img src="../../_static/matplotlib-logo.svg" alt="Matplotlib logo" align="center" style="display: inline-block; width:200px;">
</div>


<div style="text-align: left; margin-top:10px">
<img src="../../_static/xcdat-logo.png" alt="xCDAT logo" align="center" style="display: inline-block; width:300px;">
<h4>Xarray Climate Data Analysis Tools for Structured Grid Analysis</h4>
</div>

- Collaboration between:

<div style="text-align: center; margin-top:10px">
    <img src="../../_static/e3sm-logo.jpg" alt="E3SM logo" align="center" style="display: inline-block; margin-right:50px; width:200px;">
    <img src="../../_static/pcmdi-logo.png" alt="PCMDI logo" style="display: inline-block; margin-right:50px; width:200px;">
    <img src="../../_static/seats-logo.png" alt="SEATS logo" style="display: inline-block; width:200px">
</div>

- xCDAT is an extension of Xarray for climate data analysis on structured grids, a modern successor to the Community Data Analysis Tools (CDAT) library
- Scope is focused on routine climate research analysis operations such as loading, wrangling, averaging, and regridding data
- Aims to provide features and utilities for simple and robust analysis of climate data


<div style="text-align: left; margin-top:10px">
    <h3>xCDAT features and utilities for simple, robust, and less error-prone analysis code</h3>
</div>

- Extension of `xr.open_dataset()` and `xr.open_mfdataset()` with post-processing options
- Generate missing bounds, center time coords, convert lon axis orientation
- Geospatial weighted averaging
- Temporal averaging, climatologies, departures
- Horizontal structured regridding (extension of xESMF and Python port of regrid2)
- Vertical structured regridding (extension of xGCM)

<div style="text-align: center; margin-top:10px">
    <img src="../../_static/xarray-logo.png" alt="Xarray logo" style="display: inline-block; width:300px; margin-right:50px">
    <img src="../../_static/esmf-logo.png" alt="ESMF logo" style="display: inline-block; margin-right:50px; width:300px;">
    <img src="../../_static/xgcm-logo.png" alt="xGCM logo" align="center" style="display: inline-block; margin-right:50px; width:300px;">
</div>

<div style="text-align: center;">
    <img src="../../_static/thumbnails/spatial-avg.png" alt="Spatial average chart" style="display: inline-block; width:250px; margin-right:50px">
    <img src="../../_static/thumbnails/temporal-average.png" alt="Temporal average chart" style="display: inline-block; margin-right:50px; width:250px;">
    <img src="../../_static/thumbnails/regridding-vertical.png" alt="Vertical regridding chart" style="display: inline-block; margin-right:50px; width:250px;">
</div>


<div style="text-align: left; margin-top:10px">
<h3>The Software Design Philosophy of xCDAT</h3>
</div>

- Intentionally designed to **encourage software sustainability** and **reproducible science**
- **Well-documented and configurable features** allow scientists to rapidly develop robust, reusable, less error-prone, more maintainable code
- xCDAT aims to contribute to **Pangeo's** effort to foster an ecosystem of mutually compatible geoscience Python packages

<div style="text-align: center; margin-top:10px">
    <img src="../../_static/pangeo-logo.png" alt="Pangeo logo" align="center" style="display: inline-block; width:300px; margin-right:50px">
    <img src="../../demos/1-25-23-cwss-seminar/images/rtd-logo.png" alt="Read the Docs logo" style="display: inline-block; width:300px;">
</div>


### xCDAT simplifies Xarray code for specific analysis operations

#### A comparison of code to calculate global-mean, monthly anomalies.

- First code block (xCDAT) -> 17 lines, flexible (via API arguments), easy to read and write
- Second code block (Xarray) -> 31 lines, not flexible, hard to read and write


In [None]:
import xcdat as xc

# 1. Open the dataset.
dpath = "https://esgf-data1.llnl.gov/thredds/dodsC/css03_data/CMIP6/CMIP/CSIRO/ACCESS-ESM1-5/historical/r10i1p1f1/Amon/tas/gn/v20200605/tas_Amon_ACCESS-ESM1-5_historical_r10i1p1f1_gn_185001-201412.nc"
ds = xc.open_dataset(dpath)

# 2. Calculate monthly departures.
ds_anom = ds.temporal.departures("tas", freq="month")

# 3. Compute global average.
ds_anom_glb = ds_anom.spatial.average("tas")

# 4. Calculate annual averages
ds_anom_glb_ann = ds_anom_glb.temporal.group_average("tas", freq="year")

In [None]:
import numpy as np
import xarray as xr

# 1. Open the dataset.
dpath = "https://esgf-data1.llnl.gov/thredds/dodsC/css03_data/CMIP6/CMIP/CSIRO/ACCESS-ESM1-5/historical/r10i1p1f1/Amon/tas/gn/v20200605/tas_Amon_ACCESS-ESM1-5_historical_r10i1p1f1_gn_185001-201412.nc"
ds = xr.open_dataset(dpath)

# 2. Calculate monthly departures.
tas_mon = ds.tas.groupby("time.month")
tas_mon_clim = tas_mon.mean(dim="time")
tas_anom = tas_mon - tas_mon_clim

# 3. Compute global average.
coslat = np.cos(np.deg2rad(ds.lat))
tas_anom_wgt = tas_anom.weighted(coslat)
tas_anom_global = tas_anom_wgt.mean(dim="lat").mean(dim="lon")

# 4. Calculate annual averages.
# ncar.github.io/esds/posts/2021/yearly-averages-xarray/
mon_len = tas_anom_global.time.dt.days_in_month
mon_len_by_year = mon_len.groupby("time.year")
wgts = mon_len_by_year / mon_len_by_year.sum()

temp_sum = tas_anom_global * wgts
temp_sum = temp_sum.resample(time="YS").sum(dim="time")
denom_sum = (wgts).resample(time="YS").sum(dim="time")

tas_anom_global_ann = temp_sum / denom_sum

### Getting Started with xCDAT

- xCDAT is available for installation through Anaconda on the `conda-forge` channel
  - Install command: `conda install -c conda-forge xcdat`
- Check out xCDAT’s Read the Docs, which we strive to keep up-to-date
  - https://xcdat.readthedocs.io/en/stable/

<div style="text-align: center;">
    <img src="../../demos/1-25-23-cwss-seminar/images/anaconda-logo.png" alt="Anaconda logo" align="center" style="display: inline-block; margin-right:50px; width:300px;">
    <img src="../../demos/1-25-23-cwss-seminar/images/conda-forge-logo.png" alt="Conda Forge logo" style="display: inline-block; margin-right:50px; width:300px;">
</div>

<div style="text-align: center; margin-top:10px">
  <img src="../../demos/1-25-23-cwss-seminar/images/rtd-logo.png" alt="Read the docs logo" style="display: inline-block; width:300px; margin-right:50px">
  <img src="../../demos/1-25-23-cwss-seminar/images/rtd-screenshot.png" alt="xCDAT docs" style="display: inline-block; width:300px">
</div>


### How to use xCDAT

- xCDAT extends Xarray Dataset objects via "accessor" classes.

<div style="text-align: center;">
    <img src="../../_static/accessor_api.svg" alt="xCDAT accessor API diagram" align="center" style="display: inline-block; margin-right:50px; width:900px;">
      <figcaption style="text-align:center; font-style: italic; font-size: 14px;">In the example above, custom spatial functionality is exposed by chaining the <code>spatial</code> accessor attribute to the Dataset object. This chaining enables access to the underlying <code>spatial.average()</code> method.</figcaption>
</div>
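
The accessor pattern can be sketched in plain Python. This is a toy stand-in, not Xarray's implementation: `ToyDataset`, `register_accessor`, and `ToySpatialAccessor` are hypothetical names invented for illustration. In real code, Xarray provides `xr.register_dataset_accessor` for this purpose, and the real `spatial.average()` applies area weights, which this sketch omits.

```python
class ToyDataset:
    """Hypothetical stand-in for xr.Dataset: a dict of named value lists."""

    def __init__(self, data_vars):
        self.data_vars = data_vars


def register_accessor(name):
    """Attach an accessor class to ToyDataset under the attribute `name`."""

    def decorator(accessor_cls):
        # Each attribute access builds an accessor bound to that dataset.
        setattr(ToyDataset, name, property(lambda self: accessor_cls(self)))
        return accessor_cls

    return decorator


@register_accessor("spatial")
class ToySpatialAccessor:
    """Groups spatial operations under `ds.spatial`."""

    def __init__(self, dataset):
        self._dataset = dataset

    def average(self, key):
        # Unweighted mean, for illustration only.
        values = self._dataset.data_vars[key]
        return sum(values) / len(values)


ds = ToyDataset({"tas": [280.0, 290.0, 300.0]})
print(ds.spatial.average("tas"))  # prints 290.0
```

Grouping methods under a named attribute keeps domain-specific functionality out of the core object's namespace while still letting users chain calls from the dataset itself.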

#### Accessor classes include:

- `spatial`
  - `.average()`, `.get_weights()`
- `temporal`
  - `.average()`, `.group_average()`, `.climatology()`, `.departures()`
- `bounds`
  - `.get_bounds()`, `.add_bounds()`, `.add_missing_bounds()`


#### xCDAT also provides general utilities as Python functions

- `open_dataset()`, `open_mfdataset()`
- `center_times()`, `decode_time()`
- `swap_lon_axis()`
- `create_axis()`
- `create_grid()`
- `get_dim_coords()`
- `get_dim_keys()`

Visit the API Reference page for a complete list: https://xcdat.readthedocs.io/en/latest/api.html


## End-to-End Analysis and Visualization of CMIP Data using xCDAT

### Overview

This exercise walks through regridding E3SM data to a rectilinear grid using `xcdat`.

### Sections

1. Setup Code
2. I/O
3. Horizontal Regridding
4. Vertical Regridding
5. Spatial Averaging
6. Temporal Computations
7. General Dataset Utilities


#### (1) Working with local files downloaded from ESGF

Note: wget bash scripts are provided for you to download the datasets locally. Run them with the commands below. The `-s` flag is required to bypass the ESG credential check.

- `cl` (~17 GB): `bash docs/demos/24-07-11-scipy-2024/cl_Amon_E3SM-2-0_historical_r1i1p1f1_gr/cl_Amon_wget_script_2024-6-25_14-39-4.sh -s`
- `tas` (~11 GB): `bash docs/demos/24-07-11-scipy-2024/tas_day_E3SM-2-0_historical_r1i1p1f1_gr/tas_day_wget_script_2024-6-25_11-31-0.sh -s`

#### (2) Working with remote files on ESGF via OPeNDAP

If the datasets are not downloaded locally, the notebook will automatically open the datasets (lazily) from ESGF using Xarray + OPeNDAP.

**The datasets will be downloaded into memory once computations are performed.**


### 1. Setup Code


In [None]:
# This style import is necessary to properly render Xarray's HTML output with
# the Jupyter RISE extension.
# GitHub Issue: https://github.com/damianavila/RISE/issues/594
# Source: https://github.com/smartass101/xarray-pydata-prague-2020/blob/main/rise.css

from IPython.display import HTML

style = """
<style>
.reveal pre.xr-text-repr-fallback {
    display: none;
}
.reveal ul.xr-sections {
    display: grid
}

.reveal ul ul.xr-var-list {
    display: contents
}
</style>
"""


HTML(style)

In [None]:
import os

import numpy as np
from xarray.coding.calendar_ops import _datetime_to_decimal_year as dt2decimal
import xcdat as xc
import cartopy.crs as ccrs
import matplotlib.pyplot as plt

### 2. I/O


- We will be analyzing a few years of near-surface air temperature (`tas`) data from E3SM v2 CMIP data.


- Use `xc.open_dataset()` to open a single netCDF dataset as an `xr.Dataset` object.
- API Documentation: https://xcdat.readthedocs.io/en/stable/generated/xcdat.open_dataset.html
- [ESGF Search URL](https://aims2.llnl.gov/search?project=CMIP6&activeFacets=%7B%22activity_id%22%3A%22CMIP%22%2C%22source_id%22%3A%22E3SM-2-0%22%2C%22variable_id%22%3A%22tas%22%2C%22table_id%22%3A%22day%22%2C%22frequency%22%3A%22day%22%2C%22variant_label%22%3A%22r1i1p1f1%22%2C%22experiment_id%22%3A%22historical%22%7D)


In [None]:
data_tas_url = "https://esgf-data1.llnl.gov/thredds/dodsC/css03_data/CMIP6/CMIP/CSIRO/ACCESS-ESM1-5/historical/r10i1p1f1/Amon/tas/gn/v20200605/tas_Amon_ACCESS-ESM1-5_historical_r10i1p1f1_gn_185001-201412.nc"

ds = xc.open_dataset(data_tas_url, chunks={"time": "auto"})

In [None]:
ds.tas

### 3. Horizontal Regridding

We often want to regrid a dataset to a new grid to facilitate data analysis or comparisons with other datasets.

The current dataset is on a native atmosphere N96 grid (145 x 192, lat x lon), so we'll start by remapping it to a 4° x 4° grid.
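
As a rough sketch of what a 4° x 4° global grid implies, the plain-Python snippet below lists cell centers and derives their bounds. It assumes centers spaced 4° apart starting at -88° (latitude) and 2° (longitude) with a 2° half-width, so the cells tile the globe exactly. The helper `cell_bounds` is hypothetical and not part of xCDAT.

```python
# Cell centers for a 4-degree global grid (assumed layout, for illustration).
lat_centers = list(range(-88, 90, 4))  # -88, -84, ..., 88
lon_centers = list(range(2, 360, 4))   # 2, 6, ..., 358


def cell_bounds(centers, half_width=2):
    """Return a (lower, upper) bounds pair for each cell center."""
    return [(c - half_width, c + half_width) for c in centers]


lat_bounds = cell_bounds(lat_centers)
lon_bounds = cell_bounds(lon_centers)

print(len(lat_centers), len(lon_centers))  # prints 45 90
print(lat_bounds[0], lat_bounds[-1])       # prints (-90, -86) (86, 90)
print(lon_bounds[0], lon_bounds[-1])       # prints (0, 4) (356, 360)
```

The target grid is therefore much coarser than the 145 x 192 source grid, which is why the remapped field looks smoother.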


#### First, we specify the target grid


In [None]:
# create target axes
nlat = xc.create_axis(
    "lat", np.arange(-88, 90, 4), attrs={"units": "degrees_north", "axis": "Y"}
)
nlon = xc.create_axis(
    "lon", np.arange(2, 360, 4), attrs={"units": "degrees_east", "axis": "X"}
)

#### Create the target grid using the target axes and bounds.

- API Documentation: https://xcdat.readthedocs.io/en/latest/generated/xcdat.create_grid.html#xcdat.create_grid


In [None]:
ngrid = xc.create_grid(x=nlon, y=nlat)

In [None]:
ngrid

#### Call the xESMF regridder

Here we're using bilinear regridding, but other methods may be appropriate (e.g., you may want to use "conservative_normed" for fields that should be conserved globally).

- Regrid `tas` with the `ngrid` created above using `xesmf` and `bilinear`.
- API Documentation: https://xcdat.readthedocs.io/en/stable/generated/xarray.Dataset.regridder.horizontal.html


In [None]:
ds_xesmf = ds.regridder.horizontal("tas", ngrid, tool="xesmf", method="bilinear")
ds_xesmf.compute()

In [None]:
ds_xesmf.tas

#### Compare the results (for the first timestep)

Now we just plot the results for comparison.


In [None]:
map_proj = ccrs.Robinson()

# plot original data (first time step)
plt.figure(figsize=(12, 5))
plt.subplot(1, 2, 1, projection=map_proj)
p = ds.tas[0].plot(
    transform=ccrs.PlateCarree(),  # the data's projection
    subplot_kws={"projection": map_proj},
    cbar_kwargs={"orientation": "horizontal"},
    cmap=plt.cm.RdBu_r,
)
ax = plt.gca()
ax.coastlines()
plt.title("Original")

# plot the remapped data (first time step)
plt.subplot(1, 2, 2, projection=map_proj)
p = ds_xesmf.tas[0].plot(
    transform=ccrs.PlateCarree(),  # the data's projection
    subplot_kws={"projection": map_proj},
    cbar_kwargs={"orientation": "horizontal"},
    cmap=plt.cm.RdBu_r,
)
ax = plt.gca()
ax.coastlines()
plt.title(r"xESMF 4$^{\circ}$ x 4$^{\circ}$")  # raw string avoids the invalid "\c" escape

### 4. Vertical Regridding

xCDAT can also regrid in the vertical dimension.

Here we'll grab some cloud fraction data (`cl`) and regrid it from model hybrid coordinate to pressure levels.
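
On a hybrid sigma-pressure coordinate, the pressure at each model level is commonly written as `p = a * p0 + b * ps`, where `ps` is the surface pressure. The sketch below evaluates that formula for made-up coefficients and surface pressures; all numbers are hypothetical and chosen only to show that the same model level sits at different pressures over low and high terrain.

```python
def hybrid_pressure(a, b, p0, ps):
    """Pressure (Pa) on a hybrid sigma-pressure level: p = a * p0 + b * ps."""
    return a * p0 + b * ps


p0 = 100000.0           # reference pressure (Pa), assumed value
ps_sea = 101325.0       # hypothetical surface pressure at sea level (Pa)
ps_mountain = 70000.0   # hypothetical surface pressure over high terrain (Pa)

# Hypothetical (a, b) pairs: near-surface, mid-level, and upper-level.
levels = [(0.0, 1.0), (0.1, 0.5), (0.05, 0.0)]

for a, b in levels:
    p_sea = hybrid_pressure(a, b, p0, ps_sea)
    p_mtn = hybrid_pressure(a, b, p0, ps_mountain)
    print(f"a={a}, b={b}: sea level {p_sea:.1f} Pa, mountain {p_mtn:.1f} Pa")
```

Levels with `b = 0` are pure pressure levels and do not depend on the surface, while levels with `a = 0` follow the terrain. This is why the model data must be interpolated before it can be compared on fixed pressure levels.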


Documentation: https://xcdat.readthedocs.io/en/stable/examples/regridding-vertical.html#4:-Remap-cloud-fraction-from-model-hybrid-coordinate-to-pressure-levels


In [None]:
data_url = "https://esgf-data2.llnl.gov/thredds/dodsC/user_pub_work/CMIP6/CMIP/E3SM-Project/E3SM-2-0/historical/r1i1p1f1/Amon/cl/gr/v20220830/cl_Amon_E3SM-2-0_historical_r1i1p1f1_gr_185001-189912.nc"

ds_3d = xc.open_dataset(data_url, chunks={"time": "auto"})
ds_3d = ds_3d.sel(time=ds_3d.time.dt.year == 1850)

ds_3d.cl

#### Regrid the `cl` variable using `new_pressure_grid` as the output grid, the `"log"` method, and `pressure_data` as the target data.


Example Documentation: https://xcdat.readthedocs.io/en/stable/examples/regridding-vertical.html#4:-Remap-cloud-fraction-from-model-hybrid-coordinate-to-pressure-levels


In [None]:
def hybrid_coordinate(p0, a, b, ps, **kwargs):
    return a * p0 + b * ps


pressure_data = hybrid_coordinate(**ds_3d.data_vars)
new_pressure_grid = xc.create_grid(z=xc.create_axis("lev", np.linspace(100000, 1, 13)))

ds_vr = ds_3d.regridder.vertical(
    "cl", new_pressure_grid, method="log", target_data=pressure_data
)
ds_vr.compute()

#### Finally, we plot the result:


In [None]:
# plot result
ds_vr_zonal = ds_vr.isel(time=0).spatial.average("cl", axis=["X"]).squeeze()
ds_vr_zonal.cl.plot(cmap=plt.cm.RdBu_r)
plt.gca().invert_yaxis()

### 5. Spatial Averaging

Area-weighted spatial averaging is a common technique to reduce dimensionality in geospatial datasets. xCDAT can perform this calculation over full domains or regions of interest.
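
To see why area weighting matters, the sketch below compares an unweighted mean with a mean weighted by the cosine of latitude. The latitudes and temperatures are made-up values, and cos(lat) weighting is a common approximation (the same one used in the Xarray comparison earlier). It is not a reproduction of how xCDAT derives its weights from coordinate bounds.

```python
import math


def weighted_mean(values, weights):
    """Weighted average: sum(v * w) / sum(w)."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


# Hypothetical zonal-mean temperatures (K) at five latitudes.
lats = [-80.0, -40.0, 0.0, 40.0, 80.0]
temps = [250.0, 280.0, 300.0, 280.0, 250.0]

# Grid cells shrink toward the poles, roughly in proportion to cos(latitude).
weights = [math.cos(math.radians(lat)) for lat in lats]

unweighted = sum(temps) / len(temps)
weighted = weighted_mean(temps, weights)

print(f"unweighted: {unweighted:.2f} K")
print(f"weighted:   {weighted:.2f} K")
```

The unweighted mean gives the cold polar rows the same influence as the much larger tropical rows, so it comes out colder than the area-weighted mean.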


#### Calculate the spatial average of `tas` and store the results in a Python variable.

- API Documentation: https://xcdat.readthedocs.io/en/stable/generated/xarray.Dataset.spatial.average.html


In [None]:
ds_global = ds.spatial.average("tas")

In [None]:
ds_global

#### Now let's plot the results (first 120 timesteps)

Note that the spatial averager returns a `Dataset` object, so we still need to select `tas` to plot the `DataArray`.


In [None]:
ds_global.tas.isel(time=slice(0, 120)).plot()
plt.title("Global Average Surface Temperature")
plt.xlabel("Year")
plt.ylabel("Near Surface Air Temperature [K]")

Above, we did not specify any constraints, so xCDAT calculated the domain (global) average. Users can also specify their own bounds.


#### Calculate the near-surface air temperature (`tas`) in the Niño 3.4 region.


API Documentation: https://xcdat.readthedocs.io/en/stable/generated/xarray.Dataset.spatial.average.html


In [None]:
ds_nino34 = ds_xesmf.spatial.average(
    "tas", lat_bounds=(-5, 5), lon_bounds=(190, 240), keep_weights=True
).load()

In this case, we specified `keep_weights=True`. The weights provide full spatial weighting for grid cells entirely within the Niño 3.4 region.

- If a grid cell is partially in the Niño 3.4 region, it receives partial weight.
- Note that we use the 4° x 4° grid in this example to show the partial weights and to speed up plotting.
- Note that you can also supply your own weights (but you can't automatically subset with `lat_bounds` and `lon_bounds` if you supply your own weights).
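
The idea of partial weights can be sketched as the fraction of each cell that overlaps the region. The snippet below does this along latitude only, for hypothetical 4° cells and the ±5° latitude limits of the Niño 3.4 box. It is a simplification: it ignores the latitude dependence of cell area and is not xCDAT's actual weighting code. `overlap_fraction` is a hypothetical helper.

```python
def overlap_fraction(cell_bounds, region_bounds):
    """Fraction of a cell's extent that lies inside the region bounds."""
    lower = max(cell_bounds[0], region_bounds[0])
    upper = min(cell_bounds[1], region_bounds[1])
    overlap = max(0.0, upper - lower)  # zero when the cell is outside
    return overlap / (cell_bounds[1] - cell_bounds[0])


region_lat = (-5.0, 5.0)

# Latitude bounds of five neighboring 4-degree cells (assumed layout).
cells = [(-10.0, -6.0), (-6.0, -2.0), (-2.0, 2.0), (2.0, 6.0), (6.0, 10.0)]

fractions = [overlap_fraction(cell, region_lat) for cell in cells]
print(fractions)  # prints [0.0, 0.75, 1.0, 0.75, 0.0]
```

Cells straddling the region edge contribute in proportion to how much of them falls inside, so the regional average does not jump when a boundary cuts through a coarse grid cell.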


#### Plot the Niño 3.4 region time series


In [None]:
dtime = dt2decimal(ds_nino34.time)  # decimal time

plt.figure(figsize=(10, 2))
plt.subplot(1, 2, 1)
plt.plot(dtime, ds_nino34["tas"].values)
plt.xlabel("Year")
plt.ylabel("Near-Surface Air Temperature [K]")
plt.title("Niño 3.4 time series")

# show the weights
map_proj = ccrs.PlateCarree(central_longitude=180)
ax = plt.subplot(1, 2, 2, projection=map_proj)
plt.pcolor(
    ds_nino34.lon,
    ds_nino34.lat,
    ds_nino34.lat_lon_wts.T,
    transform=ccrs.PlateCarree(),
    cmap=plt.cm.GnBu,
)
ax.set_extent([120, 300, -30, 30], crs=ccrs.PlateCarree())
ax.coastlines()
plt.colorbar(orientation="horizontal")
plt.title("Niño 3.4 Weights")

### 6. Temporal Computations with xCDAT

In the examples below, we will perform temporal computations on the `xarray.Dataset` object using xCDAT.


#### 6.1 Seasonal cycle mean

In the global mean time series above, there are large seasonal swings in global near-surface air temperature. Here we compute the seasonal mean climatology.


#### Calculate the seasonal mean climatology for the `tas` variable.


API Documentation: https://xcdat.readthedocs.io/en/stable/generated/xarray.Dataset.temporal.climatology.html


In [None]:
ds_clim = ds.temporal.climatology("tas", freq="season")

#### Now we plot the season means


In [None]:
map_proj = ccrs.Robinson()
titles = ["DJF", "MAM", "JJA", "SON"]
plt.figure(figsize=(12, 10))
for i in range(4):
    plt.subplot(2, 2, i + 1, projection=map_proj)
    p = ds_clim.tas[i].plot(
        transform=ccrs.PlateCarree(),
        subplot_kws={"projection": map_proj},
        cbar_kwargs={"orientation": "horizontal"},
        cmap=plt.cm.RdBu_r,
        vmin=220,
        vmax=310,
    )
    ax = plt.gca()
    ax.coastlines()
    plt.title(titles[i])

#### 6.2 Departures

It can also be useful to show the departures from the climatological average.


#### Calculate the monthly mean departures for the `tas` variable.

In this case, `xcdat` will operate on the global mean time series we calculated above.

Note that you can set the climatological reference period (e.g., with `reference_period=("2000-01-01", "2009-12-31")` for historical era departures).


API Documentation: https://xcdat.readthedocs.io/en/stable/generated/xarray.Dataset.temporal.departures.html
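
Conceptually, a departure is each value minus the climatological mean of its group (here, its calendar month). The plain-Python sketch below illustrates that arithmetic on made-up data. It ignores time weighting and reference periods, and the helper names are hypothetical rather than xCDAT functions.

```python
from collections import defaultdict


def monthly_climatology(months, values):
    """Mean of all values that share the same calendar month."""
    groups = defaultdict(list)
    for month, value in zip(months, values):
        groups[month].append(value)
    return {month: sum(vals) / len(vals) for month, vals in groups.items()}


def departures(months, values):
    """Each value minus the climatological mean of its month."""
    clim = monthly_climatology(months, values)
    return [value - clim[month] for month, value in zip(months, values)]


# Three hypothetical years of January (1) and July (7) temperatures (K).
months = [1, 7, 1, 7, 1, 7]
temps = [285.0, 289.0, 286.0, 290.0, 287.0, 291.0]

anoms = departures(months, temps)
print(anoms)  # prints [-1.0, -1.0, 0.0, 0.0, 1.0, 1.0]
```

Subtracting the per-month mean removes the seasonal cycle (the 4 K January-to-July swing here) and leaves only the year-to-year change.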


In [None]:
ds_global_anomaly = ds_global.temporal.departures(
    "tas", freq="month", reference_period=("2000-01-01", "2009-12-31")
)

#### Now let's plot the departures from the climatological average.


In [None]:
plt.plot(dtime, ds_global_anomaly.tas.values)
plt.xlabel("Year")
plt.ylabel("Global Mean Near-Surface Air Temperature Anomaly [K]")

#### 6.3 Group averages

`xcdat` also allows you to calculate group averages (e.g., annual or seasonal mean from monthly data or monthly mean from daily data).
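
When monthly values are averaged into an annual mean, months of different lengths should not count equally. The sketch below weights each month by its number of days. It assumes a standard Gregorian calendar via Python's `calendar` module, whereas model output often uses other calendars (for example, no-leap). The function name is hypothetical.

```python
import calendar


def day_weighted_annual_mean(year, monthly_values):
    """Annual mean of 12 monthly values, weighted by days in each month."""
    days = [calendar.monthrange(year, month)[1] for month in range(1, 13)]
    weighted_sum = sum(v * d for v, d in zip(monthly_values, days))
    return weighted_sum / sum(days)


# Hypothetical anomalies: 1.0 in February, 0.0 in every other month.
values = [1.0 if month == 2 else 0.0 for month in range(1, 13)]

simple = sum(values) / len(values)
weighted = day_weighted_annual_mean(2001, values)

print(f"simple mean:       {simple:.4f}")
print(f"day-weighted mean: {weighted:.4f}")
```

February covers 28 of 365 days in 2001, so it contributes slightly less to the day-weighted mean than the 1/12 share it gets in a simple mean.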


#### Calculate the annual mean from anomaly time series.

- API Documentation: https://xcdat.readthedocs.io/en/stable/generated/xarray.Dataset.temporal.group_average.html


In [None]:
ds_global_anomaly_annual = ds_global_anomaly.temporal.group_average("tas", freq="year")

#### Now let's plot the results.


In [None]:
# plot data
dtime_annual = dt2decimal(ds_global_anomaly_annual.time) + 0.5
plt.plot(dtime, ds_global_anomaly.tas.values, label="Monthly departure", color="gray")
plt.plot(
    dtime_annual,
    ds_global_anomaly_annual.tas.values,
    color="k",
    linestyle="",
    marker="_",
    label="Annual Mean",
)
plt.legend(frameon=False)
plt.xlabel("Year")
plt.ylabel("Global Mean Near-Surface Air Temperature Anomaly [K]")

### 7. General Dataset Utilities

xCDAT includes various utilities for data manipulation, including
reorientation of the longitude axis, centering of time coordinates using time bounds, and adding and getting bounds.


#### 7.1. Reorient the longitude axis

Longitude can be represented from 0 to 360°E or from 180°W to 180°E. xCDAT allows you to convert between these axis systems.
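
The conversion between the two conventions is modular arithmetic. The sketch below shows one common formulation in plain Python. The function names are hypothetical, and a full axis swap also has to reorder the data along the longitude axis, which this sketch only hints at with `sorted`.

```python
def to_180(lon):
    """Map a longitude in degrees east onto the [-180, 180) convention."""
    return ((lon + 180.0) % 360.0) - 180.0


def to_360(lon):
    """Map a longitude in degrees east onto the [0, 360) convention."""
    return lon % 360.0


lons_360 = [0.0, 90.0, 180.0, 270.0, 358.0]
lons_180 = [to_180(lon) for lon in lons_360]

print(lons_180)          # prints [0.0, 90.0, -180.0, -90.0, -2.0]
print(sorted(lons_180))  # prints [-180.0, -90.0, -2.0, 0.0, 90.0]
```

With this formulation, 180°E maps to -180°, so every longitude stays unique after conversion.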


In [None]:
ds.lon

#### Use `xc.swap_lon_axis` to swap the longitude axis from (0, 360) to (-180, 180) and view the new longitude axis.

- Documentation: https://xcdat.readthedocs.io/en/stable/generated/xcdat.swap_lon_axis.html


In [None]:
ds2 = xc.swap_lon_axis(ds, to=(-180, 180))

ds2.lon

#### 7.2. Add missing bounds

Bounds are critical to many `xcdat` operations. For example, they are used in determining the weights in spatial or temporal averages and in regridding operations. `add_missing_bounds()` will attempt to produce bounds if they do not exist in the original dataset.

- Documentation: https://xcdat.readthedocs.io/en/stable/generated/xarray.Dataset.bounds.add_missing_bounds.html
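
One simple way to generate bounds for a numeric axis is to place cell edges at the midpoints between neighboring coordinate points. The sketch below does that for a made-up latitude axis. It is a generic illustration under that midpoint assumption, and xCDAT's own rules (for example, for time axes like the one in this notebook) may differ. `midpoint_bounds` is a hypothetical helper.

```python
def midpoint_bounds(points):
    """Return (lower, upper) bounds with edges midway between points."""
    if len(points) < 2:
        raise ValueError("need at least two points to infer bounds")

    mids = [(a + b) / 2 for a, b in zip(points[:-1], points[1:])]

    # Extrapolate the outer edges using the spacing of the end cells.
    first = points[0] - (mids[0] - points[0])
    last = points[-1] + (points[-1] - mids[-1])

    edges = [first] + mids + [last]
    return list(zip(edges[:-1], edges[1:]))


lat = [-60.0, -30.0, 0.0, 30.0, 60.0]
bounds = midpoint_bounds(lat)
print(bounds)
# prints [(-75.0, -45.0), (-45.0, -15.0), (-15.0, 15.0), (15.0, 45.0), (45.0, 75.0)]
```

Each cell's upper bound equals the next cell's lower bound, so the cells tile the axis without gaps. Weights computed from cell widths rely on that property.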


In [None]:
# We are dropping the existing bounds to demonstrate adding bounds.
ds4 = ds.drop_vars("time_bnds")

In [None]:
try:
    ds4.bounds.get_bounds("T")
except KeyError as e:
    print(e)

In [None]:
ds5 = ds4.bounds.add_missing_bounds(axes=["T"])
ds5

## Parallelism with Dask

<div style="text-align:center">
  <img src="../../_static/dask-logo.svg" alt="Dask logo" style="display: inline-block; width:300px;">
</div>

> "Nearly all existing xarray methods have been extended to work automatically with Dask arrays for parallelism"

- Parallelized xarray methods include **indexing, computation, concatenating and grouped operations**
- xCDAT APIs that build upon xarray methods inherently support Dask parallelism
- Dask arrays are loaded into memory only when absolutely required (e.g., generating weights for averaging)

&mdash; <cite>https://docs.xarray.dev/en/stable/user-guide/dask.html#using-dask-with-xarray</cite>


#### Example of Parallelizing xCDAT with Dask

1. Open ~11 GB dataset using `xc.open_mfdataset()`
2. Chunked on time dimension by 10 (parallel) and by 1 (serial)
3. Compare run times of global spatial averaging
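
The effect of chunk size can be sketched without Dask: split the data into chunks, reduce each chunk independently, then combine the partial results. The snippet below uses the standard library's `ThreadPoolExecutor` purely to illustrate that split-apply-combine structure. It is not how Dask schedules work, Python threads do not speed up pure-Python arithmetic, and all names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(values, chunk_size):
    """Split a list into consecutive chunks of at most `chunk_size` items."""
    return [values[i:i + chunk_size] for i in range(0, len(values), chunk_size)]


def chunk_sum_and_count(chunk):
    """Partial result for one chunk: (sum, number of items)."""
    return sum(chunk), len(chunk)


def chunked_mean(values, chunk_size, max_workers=4):
    """Mean computed from per-chunk partial sums."""
    chunks = split_into_chunks(values, chunk_size)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        partials = list(pool.map(chunk_sum_and_count, chunks))
    total = sum(s for s, _ in partials)
    count = sum(n for _, n in partials)
    return total / count


series = [float(i) for i in range(100)]

print(len(split_into_chunks(series, 10)))  # prints 10
print(len(split_into_chunks(series, 1)))   # prints 100
print(chunked_mean(series, 10))            # prints 49.5
```

Both chunk sizes give the same answer, but a chunk size of 1 creates ten times as many tasks for the same work. When each task is tiny, scheduling overhead can dominate the run time.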


In [None]:
import os

import xcdat as xc

FILENAMES = [
    "tas_day_E3SM-2-0_historical_r1i1p1f1_gr_18500101-18591231.nc",
    "tas_day_E3SM-2-0_historical_r1i1p1f1_gr_18600101-18691231.nc",
    "tas_day_E3SM-2-0_historical_r1i1p1f1_gr_18700101-18791231.nc",
    "tas_day_E3SM-2-0_historical_r1i1p1f1_gr_18800101-18891231.nc",
    "tas_day_E3SM-2-0_historical_r1i1p1f1_gr_19000101-19091231.nc",
    "tas_day_E3SM-2-0_historical_r1i1p1f1_gr_19100101-19191231.nc",
    "tas_day_E3SM-2-0_historical_r1i1p1f1_gr_19300101-19391231.nc",
    "tas_day_E3SM-2-0_historical_r1i1p1f1_gr_19400101-19491231.nc",
    "tas_day_E3SM-2-0_historical_r1i1p1f1_gr_19600101-19691231.nc",
    "tas_day_E3SM-2-0_historical_r1i1p1f1_gr_19700101-19791231.nc",
    "tas_day_E3SM-2-0_historical_r1i1p1f1_gr_19800101-19891231.nc",
    "tas_day_E3SM-2-0_historical_r1i1p1f1_gr_19900101-19991231.nc",
    "tas_day_E3SM-2-0_historical_r1i1p1f1_gr_20000101-20091231.nc",
    "tas_day_E3SM-2-0_historical_r1i1p1f1_gr_20100101-20141231.nc",
]

LOCAL_DIR = "docs/demos/24-07-11-scipy-2024/tas_day_E3SM-2-0_historical_r1i1p1f1_gr"
LOCAL_FILEPATHS = [os.path.join(LOCAL_DIR, file) for file in FILENAMES]

ds_parallel = xc.open_mfdataset(LOCAL_FILEPATHS, chunks={"time": 10})
ds_serial = xc.open_mfdataset(LOCAL_FILEPATHS, chunks={"time": 1})

#### Now let's compare the xCDAT `spatial.average()` performance (parallel vs. serial)

Properly chunking the dataset on the time dimension results in speedups when using xCDAT's
spatial averager.


In [None]:
%%time
ds_parallel_avg = ds_parallel.spatial.average("tas")
ds_parallel_avg = ds_parallel_avg.compute()  # compute the average, not the full dataset

In [None]:
%%time
ds_serial_avg = ds_serial.spatial.average("tas")
ds_serial_avg = ds_serial_avg.compute()  # compute the average, not the full dataset

### Further Dask Guidance

Visit these pages for more guidance (e.g., when to parallelize):

<div style="text-align: center; margin-top:10px">
    <img src="./images/dask-guide.png" alt="Dask Guide" align="center" style="display: inline-block; width:450px;border: 3px solid #555;">
</div>

- Parallel computing with Dask (xCDAT, shown above): https://xcdat.readthedocs.io/en/latest/examples/parallel-computing-with-dask.html
- Parallel computing with Dask (Xarray): https://docs.xarray.dev/en/stable/user-guide/dask.html
- Xarray with Dask Arrays: https://examples.dask.org/xarray.html


<div style="text-align: left; margin-top:10px">
    <img src="../../_static/xcdat-logo.png" alt="xCDAT logo" align="center" style="display: inline-block; width:300px;">
    <h3>Recap of Key Points</h3>
</div>

- xCDAT is an **extension of Xarray for climate data analysis on structured grids**, a modern successor to the Community Data Analysis Tools (CDAT) library
- **Focused on routine climate research analysis operations** including loading, wrangling, averaging (temporal and spatial), and regridding
- Designed to encourage **software sustainability and reproducible science**
- **Parallelizable** through Xarray’s support for Dask, which enables efficient processing of large datasets


<div style="text-align: left; margin-top:10px">
    <h3><img src="../../_static/github-logo.png" alt="GitHub logo" align="center" style="display: inline-block; width:200px;">
    <img src="../../_static/github-logo-icon.png" alt="GitHub icon" align="center" style="display: inline-block; width:75px;">
    Get involved and join the xCDAT community!</h3>
</div>

- **Code contributions** are welcome and appreciated
  - GitHub Repository: https://github.com/xCDAT/xcdat
  - Contributing Guide: https://xcdat.readthedocs.io/en/latest/contributing.html
- **Submit and/or address tickets** for feature suggestions, bugs, and documentation updates
  - GitHub Issues: https://github.com/xCDAT/xcdat/issues
- **Participate in forum discussions** on version releases, architecture, feature suggestions, etc.
  - GitHub Discussions: https://github.com/xCDAT/xcdat/discussions
