<div style="text-align: center">
<img src="https://github.com/xCDAT/xcdat/raw/main/docs/_static/xcdat_logo.png" alt="xCDAT logo" style="display: inline-block; width:35%;">
</div>

<h1 style="text-align: left;">A gentle introduction to xCDAT (Xarray Climate Data Analysis Tools)</h1>
<h2 style="text-align: left;">A package for simple and robust climate data analysis</h2>
<h3 style="text-align: left; font-style:italic">Tom Vo, Stephen Po-Chedley, Jason Boutte, Jill Zhang, Jiwoo Lee</h3>

---
<p style="text-align: left;">With thanks to Peter Gleckler, Paul Durack, Karl Taylor, and Chris Golaz</p>


_This work is performed under the auspices of the U.S. DOE by Lawrence Livermore National Laboratory under contract No. DE-AC52-07NA27344._

## What this talk covers

* What is xCDAT?
* An overview of Xarray
  * Historical context
  * Key features and capabilities
  * The Xarray data models (xr.Dataset, xr.DataArray)
  * Resources for learning Xarray
* How does xCDAT fit in the Xarray ecosystem?
* The API design of xCDAT
  * Understand how to leverage xCDAT with Xarray
* Demo of xCDAT capabilities

## What is xCDAT?

* xCDAT is a lightweight extension of xarray for climate data analysis on structured grids 
* The package is jointly developed by scientists and developers from E3SM and PCMDI
    * The work is carried out for the E3SM and SEATS (Simplifying ESM Analysis Through Standards) projects
    
<div style="text-align: center">   
<img src="https://e3sm.org/wp-content/uploads/2019/05/E3SM_Logo.jpg" alt="E3SM logo" style="display: inline-block; width:25%;">
<img src="https://pcmdi.llnl.gov/Data/media/images/220224_durack1_PCMDILogoWithText-trim-940Wpx-png8.png" alt="PCMDI logo" style="display: inline-block; width:25%;">
<img src="https://www.seatstandards.org/SEATSlogo.png" alt="SEATS logo" style="display: inline-block; width:25%;">

</div>

## Let's Begin with Xarray

### Historical context:
* Xarray is an evolution of an internal tool developed at The Climate Corporation
* Released as open source in May 2014
* A NumFOCUS fiscally sponsored project since August 2018


<div style="text-align: center">
    <img src="https://xarray.pydata.org/en/stable/_static/dataset-diagram-logo.png" alt="xarray logo" style="display: inline-block; width:25%;">
    <img src="https://xarray.dev/NumFOCUS_sponsored_project_logo.svg" alt="NumFOCUS logo" style="display: inline-block; width:25%;">
</div>


### Key Features and Capabilities in Xarray

* “N-D labeled arrays and datasets in Python”
    * Based on NumPy, heavily inspired by Pandas
    * Supports I/O for netCDF, Iris, OPeNDAP, Zarr, and GRIB.
* Interoperable with scientific Python ecosystem including NumPy, Dask, Pandas, and Matplotlib
* Features include:
    * File I/O, indexing and selecting, interpolating, grouping, aggregating, parallelism (Dask), plotting (matplotlib wrapper)
    

<div style="text-align: center">
    <img src="https://xarray.dev/xarray-datastructure.png" alt="xarray data structure diagram" style="display: inline-block; width:45%;">
</div>

Source: https://xarray.dev/#features
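
A few of the features listed above can be sketched on a small synthetic array. The grid, the values, and the variable name `demo` are made up for illustration; only standard xarray calls (`sel`, `isel`, `mean`, `groupby`) are used.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic monthly data on a tiny grid (values are invented for illustration).
time = pd.date_range("2000-01-01", periods=24, freq="MS")
lat = [-45.0, 0.0, 45.0]
lon = [0.0, 90.0, 180.0, 270.0]
data = np.arange(24 * 3 * 4, dtype="float64").reshape(24, 3, 4)

da = xr.DataArray(
    data,
    dims=("time", "lat", "lon"),
    coords={"time": time, "lat": lat, "lon": lon},
    name="demo",
)

# Label-based selection: the grid point nearest to 10N, 100E.
point = da.sel(lat=10.0, lon=100.0, method="nearest")

# Position-based selection: the first time step.
first_step = da.isel(time=0)

# Aggregation over a named dimension instead of an axis number.
time_mean = da.mean(dim="time")

# Grouping: average all Januaries together, all Februaries together, and so on.
monthly_clim = da.groupby("time.month").mean()

print(point.lat.item(), point.lon.item())
print(monthly_clim.sizes)
```

Because every operation refers to dimensions by name, the same code keeps working if the array is stored with its axes in a different order.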


## The Xarray Data Models in a Nutshell

**_"Xarray data models are borrowed from netCDF file format, which provides xarray with a natural and portable serialization format."_**

* `xarray.Dataset`
  * Dictionary-like containers of DataArrays, mapping a variable name to each DataArray

* `xarray.DataArray`
  * A class that attaches dimension names, coordinates, and attributes to multi-dimensional arrays
  
    

Source: https://docs.xarray.dev/en/stable/getting-started-guide/why-xarray.html#core-data-structures
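
As a minimal sketch of how the two models relate, the snippet below builds one `DataArray` and places it in a `Dataset`. The variable name `temperature`, the coordinate values, and the `units` attribute are illustrative assumptions, not taken from a real file.

```python
import numpy as np
import xarray as xr

# A DataArray: an N-D array plus dimension names, coordinates, and attributes.
temperature = xr.DataArray(
    np.zeros((2, 3)),
    dims=("lat", "lon"),
    coords={"lat": [-10.0, 10.0], "lon": [0.0, 120.0, 240.0]},
    attrs={"units": "K"},
)

# A Dataset: a dict-like container that maps variable names to DataArrays.
ds_demo = xr.Dataset({"temperature": temperature})

# Dictionary-style access returns the DataArray again, metadata included.
retrieved = ds_demo["temperature"]

print(list(ds_demo.data_vars))
print(retrieved.dims, retrieved.attrs)
```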

## The `Dataset` Model


Datasets have four key properties:
* `dims`: a dictionary mapping from dimension names to the fixed length of each dimension (e.g., {'x': 6, 'y': 6, 'time': 8})
* `data_vars`: a dict-like container of DataArrays corresponding to variables
* `coords`: another dict-like container of DataArrays intended to label points used in data_vars (e.g., arrays of numbers, datetime objects or strings)
* `attrs`: dict to hold arbitrary metadata


<div style="text-align: center">
    <img src="https://docs.xarray.dev/en/stable/_images/dataset-diagram.png" alt="xarray Dataset diagram" style="display: inline-block; width:50%">
</div>

Source: https://docs.xarray.dev/en/stable/user-guide/data-structures.html#dataset
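
The four properties can be inspected on a small synthetic dataset that reuses the dimension lengths from the example above (`x: 6`, `y: 6`, `time: 8`). The variable name `precip` and the `title` attribute are hypothetical. The sketch reads the name-to-length mapping through `sizes`, which holds the same information as `dims` and, as far as recent xarray releases go, appears to be the preferred spelling for a `Dataset`.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic dataset; names and values are placeholders for illustration.
ds_small = xr.Dataset(
    data_vars={"precip": (("time", "y", "x"), np.ones((8, 6, 6)))},
    coords={
        "time": pd.date_range("2000-01-01", periods=8, freq="D"),
        "x": np.arange(6),
        "y": np.arange(6),
    },
    attrs={"title": "synthetic example"},
)

print(dict(ds_small.sizes))      # dimension name -> length
print(list(ds_small.data_vars))  # names of the data variables
print(list(ds_small.coords))     # names of the coordinates
print(ds_small.attrs)            # arbitrary metadata
```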


### Let's open a netCDF4 dataset!

We will open a CMIP6 netCDF file hosted on ESGF that contains the `tas` variable (near-surface air temperature). The path below is an OPeNDAP URL, so xarray reads the data remotely rather than from a local copy.

In [12]:
import xarray as xr

filepath = "http://esgf.nci.org.au/thredds/dodsC/master/CMIP6/CMIP/CSIRO/ACCESS-ESM1-5/historical/r10i1p1f1/Amon/tas/gn/v20200605/tas_Amon_ACCESS-ESM1-5_historical_r10i1p1f1_gn_185001-201412.nc"

ds = xr.open_dataset(filepath)

In [13]:
print(ds)

<xarray.Dataset>
Dimensions:    (time: 1980, bnds: 2, lat: 145, lon: 192)
Coordinates:
  * time       (time) datetime64[ns] 1850-01-16T12:00:00 ... 2014-12-16T12:00:00
  * lat        (lat) float64 -90.0 -88.75 -87.5 -86.25 ... 86.25 87.5 88.75 90.0
  * lon        (lon) float64 0.0 1.875 3.75 5.625 ... 352.5 354.4 356.2 358.1
    height     float64 ...
Dimensions without coordinates: bnds
Data variables:
    time_bnds  (time, bnds) datetime64[ns] ...
    lat_bnds   (lat, bnds) float64 ...
    lon_bnds   (lon, bnds) float64 ...
    tas        (time, lat, lon) float32 ...
Attributes: (12/49)
    Conventions:                     CF-1.7 CMIP-6.2
    activity_id:                     CMIP
    branch_method:                   standard
    branch_time_in_child:            0.0
    branch_time_in_parent:           87658.0
    creation_date:                   2020-06-05T04:06:11Z
    ...                              ...
    version:                         v20200605
    license:                   

## The `DataArray` model

xarray.DataArray is xarray’s implementation of a labeled, multi-dimensional array. It has several key properties:

* `values`: a numpy.ndarray holding the array’s values
* `dims`: dimension names for each axis (e.g., ('x', 'y', 'z'))
* `coords`: a dict-like container of arrays (coordinates) that label each point (e.g., 1-dimensional arrays of numbers, datetime objects or strings)
* `attrs`: dict to hold arbitrary metadata (attributes)


<div style="text-align: center">
    <img src="https://docs.xarray.dev/en/stable/_images/dataset-diagram.png" alt="xarray data structure diagram" style="display: inline-block; width:50%">
</div>

Source: https://docs.xarray.dev/en/stable/user-guide/data-structures.html#dataarray
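
The same four properties are easy to see on a hand-built array. Everything in this sketch (the name `height`, the labels, the values) is invented for illustration.

```python
import numpy as np
import xarray as xr

# A 2 x 3 array with labeled axes; labels can be numbers or strings.
da_small = xr.DataArray(
    np.array([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]),
    dims=("y", "x"),
    coords={"y": [10, 20], "x": ["a", "b", "c"]},
    attrs={"units": "m"},
    name="height",
)

print(type(da_small.values))        # the underlying numpy.ndarray
print(da_small.dims)                # dimension names, one per axis
print(da_small.coords["x"].values)  # coordinate labels along "x"
print(da_small.attrs)               # arbitrary metadata

# Coordinates make label-based lookup possible: row y=20, column x="b".
value = da_small.sel(y=20, x="b").item()
print(value)
```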

### Let's take a look at `tas`, which is an `xr.DataArray`

In [20]:
print(ds.tas) # or ds["tas"]

<xarray.DataArray 'tas' (time: 1980, lat: 145, lon: 192)>
array([[[245.96303, 245.96303, ..., 245.96303, 245.96303],
        [247.19702, 247.1598 , ..., 247.2934 , 247.24402],
        ...,
        [237.34947, 237.35645, ..., 237.33876, 237.33958],
        [236.75006, 236.75006, ..., 236.75006, 236.75006]],

       [[234.33388, 234.33388, ..., 234.33388, 234.33388],
        [236.57535, 236.52422, ..., 236.6786 , 236.62816],
        ...,
        [232.13614, 232.15143, ..., 232.1143 , 232.12619],
        [231.81165, 231.81165, ..., 231.81165, 231.81165]],

       ...,

       [[234.71193, 234.71193, ..., 234.71193, 234.71193],
        [237.06593, 237.02936, ..., 237.1341 , 237.10274],
        ...,
        [258.94232, 259.02502, ..., 258.7631 , 258.84378],
        [256.24075, 256.24075, ..., 256.24075, 256.24075]],

       [[244.1595 , 244.1595 , ..., 244.1595 , 244.1595 ],
        [244.95708, 244.92542, ..., 245.024  , 244.99197],
        ...,
        [248.52098, 248.5366 , ..., 248.42839

## Resources for Learning Xarray
* Now that you have a general sense of the xarray data models, you can use xarray's APIs on these objects to carry out your analysis.

* Here are some highly recommended resources:

  * [Xarray Documentation](https://docs.xarray.dev/en/stable/index.html)
  * [Xarray API Reference](https://docs.xarray.dev/en/stable/api.html)
  * [Tutorial: "Xarray in 45 minutes"](https://tutorial.xarray.dev/overview/xarray-in-45-min.html#) 

## So how does xCDAT fit in the Xarray Ecosystem?

_"Xarray is designed as a general purpose library, and hence tries to avoid including overly domain specific functionality. But inevitably, the need for more domain specific logic arises."_

* The goal of xCDAT is to provide generalizable climate domain features and utilities for simple and robust analysis of climate data. 
* xCDAT's design philosophy is centered on reducing the overhead required from xarray users to accomplish specific tasks. 


## xCDAT Key Features

## xCDAT API Design
* Accessor classes
* Top-level modules
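
To give a feel for what an accessor class is, here is a minimal sketch built on xarray's own `register_dataset_accessor` decorator. It assumes xCDAT's accessors follow this standard xarray mechanism. The accessor name `demo` and the `anomaly` method are hypothetical and are not part of xCDAT.

```python
import numpy as np
import xarray as xr


@xr.register_dataset_accessor("demo")
class DemoAccessor:
    """Hypothetical accessor used only to illustrate the pattern."""

    def __init__(self, dataset: xr.Dataset):
        # xarray passes in the Dataset the accessor is attached to.
        self._dataset = dataset

    def anomaly(self, data_var: str, dim: str = "time") -> xr.Dataset:
        """Return a copy of the dataset with the mean over `dim` removed."""
        ds = self._dataset.copy()
        ds[data_var] = ds[data_var] - ds[data_var].mean(dim=dim)
        return ds


ds_acc = xr.Dataset({"tas": (("time",), np.array([1.0, 2.0, 3.0]))})

# Once registered, the accessor is available as an attribute on any Dataset.
result = ds_acc.demo.anomaly("tas")
print(result["tas"].values)
```

The appeal of this design is that domain-specific methods live under their own namespace on a regular `xr.Dataset`, so users keep working with plain xarray objects.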

## A Demo of xCDAT Capabilities