# xCDAT: Xarray Climate Data Analysis Tools

_A package for robust and simple climate data analysis._

Tom Vo, Stephen Po-Chedley, Jason Boutte, Jill Zhang, Jiwoo Lee

This work is performed under the auspices of the U.S. DOE by Lawrence Livermore National Laboratory under contract No. DE-AC52-07NA27344.

## What is xCDAT?

* xCDAT is a lightweight extension of xarray for climate data analysis on structured grids 
* Jointly developed by scientists and developers from E3SM and PCMDI
    * Performed for the E3SM and SEATS (Simplifying ESM Analysis Through Standards) projects


## Xarray, the core of xCDAT


Historical Context:

* Xarray is an evolution of an internal tool developed at The Climate Corporation
* Released as open source in May 2014
* NumFOCUS fiscally sponsored project since August 2018

Features:

* “N-D labeled arrays and datasets in Python”
    * Based on NumPy, heavily inspired by Pandas
    * Supports netCDF, Iris, OPeNDAP, Zarr, and more
* Pure Python, built on compiled dependencies (NumPy and Pandas required; netCDF4 and others optional)
* Some features include:
    * File I/O, indexing and selecting, interpolating, grouping, aggregating, parallelism (Dask), plotting (matplotlib wrapper)
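The sketch below shows a few of these features: positional versus label-based indexing, and aggregation over time. It uses a hypothetical 12-month series built from `np.arange`, so the values are illustrative only.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical monthly series: 12 values (0..11) labeled by a time coordinate.
da = xr.DataArray(
    np.arange(12.0),
    dims="time",
    coords={"time": pd.date_range("2000-01-01", periods=12, freq="MS")},
)

first = da.isel(time=0)                         # positional indexing, like NumPy
march = da.sel(time="2000-03-01")               # label-based indexing, like Pandas
q1 = da.sel(time=slice("2000-01", "2000-03"))   # label slices include both ends
quarterly = da.resample(time="QS").mean()       # aggregate to quarter-start means
```

Because selection uses coordinate labels instead of raw array offsets, the same code keeps working if the data are reordered or subset.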
 
<div>
<img src="https://xarray.pydata.org/en/stable/_static/dataset-diagram-logo.png" alt="xarray logo" style="float:left; width:49%;">
   <img src="https://xarray.dev/NumFOCUS_sponsored_project_logo.svg" alt="NumFOCUS logo" style="float:right; width:49%;">
</div>


## The Xarray Data Models in a Nutshell

"Xarray data models are borrowed from the netCDF file format, which provides xarray with a natural and portable serialization format."

* `xarray.DataArray`
  * A class that attaches dimension names, coordinates, and attributes to multi-dimensional arrays
* `xarray.Dataset`
  * A dictionary-like container of DataArrays that maps each variable name to a DataArray
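Here is a minimal sketch that makes the two data models concrete. It builds a `DataArray` and then wraps it in a `Dataset`. The toy grid, values, and attributes are hypothetical. Only the xarray, NumPy, and pandas calls are real.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical toy data: 3 monthly time steps on a coarse 2 x 3 lat/lon grid.
time = pd.date_range("2000-01-01", periods=3, freq="MS")
lat = [-45.0, 45.0]
lon = [0.0, 120.0, 240.0]
values = np.full((3, 2, 3), 288.0, dtype="float32")

# A DataArray attaches dimension names, coordinates, and attributes to a NumPy array.
tas = xr.DataArray(
    values,
    dims=("time", "lat", "lon"),
    coords={"time": time, "lat": lat, "lon": lon},
    attrs={"standard_name": "air_temperature", "units": "K"},
    name="tas",
)

# A Dataset maps variable names to DataArrays that can share dimensions.
ds = xr.Dataset({"tas": tas})
```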


Resources: 
* https://docs.xarray.dev/en/stable/getting-started-guide/why-xarray.html#core-data-structures

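## Let's open a netCDF4 dataset in Xarray!

The file is accessed remotely through an OPeNDAP endpoint (the THREDDS `dodsC` path in the URL). Xarray reads the data lazily, so values are fetched only when they are needed.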

```python
import xarray as xr

filepath = "http://esgf.nci.org.au/thredds/dodsC/master/CMIP6/CMIP/CSIRO/ACCESS-ESM1-5/historical/r10i1p1f1/Amon/tas/gn/v20200605/tas_Amon_ACCESS-ESM1-5_historical_r10i1p1f1_gn_185001-201412.nc"

ds = xr.open_dataset(filepath)
```

```python
print(ds)
```

```
<xarray.Dataset>
Dimensions:    (time: 1980, bnds: 2, lat: 145, lon: 192)
Coordinates:
  * time       (time) datetime64[ns] 1850-01-16T12:00:00 ... 2014-12-16T12:00:00
  * lat        (lat) float64 -90.0 -88.75 -87.5 -86.25 ... 86.25 87.5 88.75 90.0
  * lon        (lon) float64 0.0 1.875 3.75 5.625 ... 352.5 354.4 356.2 358.1
    height     float64 2.0
Dimensions without coordinates: bnds
Data variables:
    time_bnds  (time, bnds) datetime64[ns] 1850-01-01 1850-02-01 ... 2015-01-01
    lat_bnds   (lat, bnds) float64 -90.0 -89.38 -89.38 ... 89.38 89.38 90.0
    lon_bnds   (lon, bnds) float64 -0.9375 0.9375 0.9375 ... 357.2 357.2 359.1
    tas        (time, lat, lon) float32 ...
Attributes: (12/49)
    Conventions:                     CF-1.7 CMIP-6.2
    activity_id:                     CMIP
    branch_method:                   standard
    branch_time_in_child:            0.0
    branch_time_in_parent:           87658.0
    creation_date:                   2020-06-05T04:06:11Z
    ...
```

## The variable `tas` (an `xr.DataArray`) is found in `ds.data_vars`.

```python
print(ds.tas)  # or ds["tas"]
```

```
<xarray.DataArray 'tas' (time: 1980, lat: 145, lon: 192)>
[55123200 values with dtype=float32]
Coordinates:
  * time     (time) datetime64[ns] 1850-01-16T12:00:00 ... 2014-12-16T12:00:00
  * lat      (lat) float64 -90.0 -88.75 -87.5 -86.25 ... 86.25 87.5 88.75 90.0
  * lon      (lon) float64 0.0 1.875 3.75 5.625 7.5 ... 352.5 354.4 356.2 358.1
    height   float64 2.0
Attributes:
    standard_name:  air_temperature
    long_name:      Near-Surface Air Temperature
    comment:        near-surface (usually, 2 meter) air temperature
    units:          K
    cell_methods:   area: time: mean
    cell_measures:  area: areacella
    history:        2020-06-05T04:06:10Z altered by CMOR: Treated scalar dime...
    _ChunkSizes:    [  1 145 192]
```
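If the remote file is unavailable, a small in-memory stand-in can show the same layout. This sketch uses a hypothetical 2 x 3 grid, not the ACCESS-ESM1-5 data. Like the file, it stores the latitude bounds as a data variable over a `bnds` dimension that has no coordinate.

```python
import numpy as np
import xarray as xr

# Hypothetical stand-in for the remote file (no network access needed).
lat = np.array([-45.0, 45.0])
lon = np.array([0.0, 120.0, 240.0])
ds = xr.Dataset(
    data_vars={
        "lat_bnds": (("lat", "bnds"), np.array([[-90.0, 0.0], [0.0, 90.0]])),
        "tas": (("lat", "lon"), np.full((2, 3), 288.0, dtype="float32"), {"units": "K"}),
    },
    coords={"lat": lat, "lon": lon},
)

# Data variables and coordinates live in separate dictionary-like containers.
print(list(ds.data_vars))
print(list(ds.coords))

# Attribute-style and key-style access return the same DataArray.
assert ds.tas.identical(ds["tas"])

# Metadata such as units travels with the variable.
print(ds["tas"].attrs["units"])
```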


Now that you have a general sense of the xarray data models, you can apply xarray's many APIs to these objects to perform analysis work.
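The sketch below applies two built-in xarray operations to a synthetic `tas`-like field:

* A seasonal climatology, computed with `groupby("time.season")`.
* A spatial mean weighted by cos(latitude), computed with `DataArray.weighted`.

The data are random numbers. The cos(latitude) weights are a simple approximation to grid-cell area and are not necessarily how xCDAT computes its weights.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical toy field: 24 monthly steps on a 3 x 4 lat/lon grid.
time = pd.date_range("2000-01-01", periods=24, freq="MS")
lat = np.array([-60.0, 0.0, 60.0])
lon = np.array([0.0, 90.0, 180.0, 270.0])
rng = np.random.default_rng(0)
tas = xr.DataArray(
    280.0 + rng.standard_normal((24, 3, 4)),
    dims=("time", "lat", "lon"),
    coords={"time": time, "lat": lat, "lon": lon},
    name="tas",
)

# Grouping + aggregation: mean over all time steps in each season (DJF, MAM, JJA, SON).
seasonal = tas.groupby("time.season").mean("time")

# Spatial mean weighted by cos(latitude); the result keeps only the time dimension.
weights = np.cos(np.deg2rad(tas.lat))
global_mean = tas.weighted(weights).mean(dim=["lat", "lon"])
```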

## xCDAT adds convenient APIs on top of xarray