<img src="http://xarray.pydata.org/en/stable/_static/dataset-diagram-logo.png" align="right" width="30%">

# Computation with Xarray

In this lesson, we discuss how to do scientific computations with xarray
objects. Our learning goals are as follows. By the end of the lesson, we will be
able to:

- Apply basic arithmetic and numpy functions to xarray DataArrays and Datasets.
- Use Xarray's label-aware reduction operations (e.g. `mean`, `sum`) and weighted
  reductions.
- Apply arbitrary functions to Xarray data via `apply_ufunc`.
- Use Xarray's broadcasting to compute on arrays of different dimensionality.

In [None]:
import numpy as np
import xarray as xr
from matplotlib import pyplot as plt

## Example Dataset

First we load a dataset. We will use the
[NOAA Extended Reconstructed Sea Surface Temperature (ERSST) v5](https://www.ncdc.noaa.gov/data-access/marineocean-data/extended-reconstructed-sea-surface-temperature-ersst-v5)
product, a widely used and trusted gridded compilation of historical data
going back to 1854.

Since the data is provided via an
[OPeNDAP](https://en.wikipedia.org/wiki/OPeNDAP) server, we can load it directly
without downloading anything:


In [None]:
### NOTE: If hundreds of people connect to this server at once and download the same dataset,
###       things might not go so well! Recommended to use the Google Cloud copy instead.

# url = "http://www.esrl.noaa.gov/psd/thredds/dodsC/Datasets/noaa.ersst.v5/sst.mnmean.nc"
# # drop an unnecessary variable which complicates some operations
# ds = xr.open_dataset(url, drop_variables=["time_bnds"])
# # will take a minute or two to complete
# ds = ds.sel(time=slice("1960", "2018")).load()
# ds

In [None]:
import gcsfs

fs = gcsfs.GCSFileSystem(token="anon")
ds = xr.open_zarr(
    fs.get_mapper("gs://pangeo-noaa-ncei/noaa.ersst.v5.zarr"), consolidated=True
).load()
ds

Let's do some basic visualizations of the data, just to make sure it looks
reasonable.


In [None]:
ds.sst[0].plot(vmin=-2, vmax=30)

## Basic Arithmetic

Xarray DataArrays and Datasets work seamlessly with arithmetic operators and
numpy array functions.

For example, imagine we want to convert the temperature (given in Celsius) to
Kelvin:


In [None]:
sst_kelvin = ds.sst + 273.15
sst_kelvin

The dimensions and coordinates were preserved following the operation.
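
To make the point about preserved labels concrete without touching the full dataset, here is a minimal sketch on a tiny synthetic array. The name `toy_sst` and its values are made up for illustration and are not part of the ERSST data.

```python
import numpy as np
import xarray as xr

# Small synthetic stand-in for SST in degrees Celsius (values are made up)
toy_sst = xr.DataArray(
    np.array([[10.0, 12.0, 14.0], [20.0, 22.0, 24.0]]),
    dims=("lat", "lon"),
    coords={"lat": [-10.0, 10.0], "lon": [0.0, 120.0, 240.0]},
    name="sst",
)

# Same conversion as above, applied to the toy array
toy_kelvin = toy_sst + 273.15

# The dimension names and coordinate labels carry over to the result
print(toy_kelvin.dims)
print(toy_kelvin.lat.values)
print(toy_kelvin.lon.values)
```

Because the labels survive, label-based selection such as `toy_kelvin.sel(lat=10.0)` keeps working on the result.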

<div class="alert alert-warning">
    <strong>Warning:</strong> Although many xarray datasets have a <code>units</code> attribute, which is used in plotting,
    Xarray does not inherently understand units. However, work is underway to integrate xarray
    with <a href="https://pint.readthedocs.io/en/0.12/">pint</a>, which will provide full unit-aware operations.
</div>

We can apply more complex functions, including numpy ufuncs, to Xarray objects.
Imagine we wanted to compute the following expression as a function of SST
($\Theta$) in Kelvin:

$$ f(\Theta) =  0.5 \ln(\Theta^2) $$


In [None]:
f = 0.5 * np.log(sst_kelvin**2)
f
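
For positive $\Theta$, the expression above simplifies to $\ln(\Theta)$, which gives a quick sanity check. The sketch below assumes a small made-up array `theta` of temperatures in Kelvin and confirms that the numpy ufunc returns a labeled DataArray with the expected values.

```python
import numpy as np
import xarray as xr

# Made-up temperatures in Kelvin, all strictly positive
theta = xr.DataArray([273.15, 283.15, 303.15], dims="x", coords={"x": [0, 1, 2]})

f_toy = 0.5 * np.log(theta**2)

# The numpy ufunc returns an xarray object, not a bare numpy array
print(type(f_toy).__name__)

# For positive inputs, 0.5 * ln(theta**2) equals ln(theta)
print(np.allclose(f_toy.values, np.log(theta.values)))
```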

## Applying Arbitrary Functions

It's awesome that we can call `np.log(ds)` and have it "just work". However, not
all third-party libraries work this way.

In this example, we will use functions from the
[Gibbs Seawater Toolkit](https://teos-10.github.io/GSW-Python/), a package for
the thermodynamics of seawater. This package provides ufuncs that operate on
numpy arrays.


In [None]:
import gsw

# an example function
# http://www.teos-10.org/pubs/gsw/html/gsw_t90_from_t68.html
?gsw.t90_from_t68

In [None]:
gsw.t90_from_t68(ds.sst)  # -> returns a numpy array

It would be nice to keep our dimensions and coordinates. We can accomplish this
with `xr.apply_ufunc`.


In [None]:
xr.apply_ufunc(gsw.t90_from_t68, ds.sst)
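
The same pattern can be tried without installing `gsw`. The sketch below uses a hypothetical numpy-only function, `to_fahrenheit`, written for this illustration; it deliberately coerces its input to a bare array, which is how many third-party functions behave.

```python
import numpy as np
import xarray as xr


def to_fahrenheit(values):
    """Hypothetical numpy-only function: coerces its input to a bare ndarray."""
    return np.asarray(values) * 9.0 / 5.0 + 32.0


toy = xr.DataArray([0.0, 100.0], dims="x", coords={"x": [10, 20]})

# Calling the function directly drops the labels
raw = to_fahrenheit(toy)

# Wrapping it with apply_ufunc keeps dimensions and coordinates
wrapped = xr.apply_ufunc(to_fahrenheit, toy)

print(type(raw).__name__, type(wrapped).__name__)
print(wrapped.x.values)
```

This is the simplest use of `apply_ufunc`: an elementwise function whose output has the same shape as its input, so no extra arguments are needed.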

<div class="alert alert-info">
    <strong>Note:</strong> <code>apply_ufunc</code> is a powerful and mysterious function.
    It has many options for doing more complicated things.
    Unfortunately, we don't have time to go into more depth here.
    Please consult the <a href="http://xarray.pydata.org/en/latest/generated/xarray.apply_ufunc.html">Xarray docs</a> for more details.
</div>


## Reductions

Just like in numpy, we can reduce xarray DataArrays along any number of axes:


In [None]:
sst = ds.sst
sst.mean(axis=0)

In [None]:
sst.mean(axis=(1, 2))

In [None]:
sst.mean()

However, rather than performing reductions on axes (as in numpy), we can perform
them on dimensions. This turns out to be a huge convenience, particularly in
complex calculations when you can't easily remember which axis corresponds to
which dimension:


In [None]:
sst.mean(dim="time")

All of the standard numpy reductions (e.g. `min`, `max`, `sum`, `std`) are
available.
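
As a small check that axis-based and dimension-based reductions agree, here is a sketch on a synthetic array. The array `toy` and its `(time, lat, lon)` layout are assumptions chosen to mirror the SST data, not taken from it.

```python
import numpy as np
import xarray as xr

# Synthetic array with the same dimension names as the SST data
toy = xr.DataArray(
    np.arange(24.0).reshape(2, 3, 4),
    dims=("time", "lat", "lon"),
)

# Reducing over axis 0 and over the "time" dimension give the same result
by_axis = toy.mean(axis=0)
by_dim = toy.mean(dim="time")
print(by_axis.equals(by_dim))

# Other reductions accept dimension names in the same way
spatial_max = toy.max(dim=("lat", "lon"))
time_std = toy.std(dim="time")
print(spatial_max.dims, time_std.dims)
```

The dimension-based form keeps working if the array is later transposed, whereas the axis-based form would silently reduce over a different dimension.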


### Exercise

Take the mean of `sst` in both longitude and latitude. Make a simple timeseries
plot:


In [None]:
# your code here

## Weighted Reductions

For this example, we will create a “weights” array proportional to the cosine of latitude.
Modulo a normalization, this is the correct area-weighting factor for data on a regular lat-lon grid.


In [None]:
weights = np.cos(np.deg2rad(ds.lat))
weights.dims

Thanks to the automatic broadcasting and alignment discussed earlier, if we multiply this by SST, it “just works,” and the arrays are broadcast properly:

In [None]:
(ds.sst * weights).dims
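
The broadcasting step can also be inspected on a toy array. In the sketch below, `toy` and `toy_weights` are made-up stand-ins for `ds.sst` and `weights`. With plain numpy, the same product would need the weights reshaped by hand (for example with `np.newaxis`); xarray instead matches the arrays on the shared dimension name.

```python
import numpy as np
import xarray as xr

# Toy 3-D field (time, lat, lon) filled with ones; values are arbitrary
toy = xr.DataArray(
    np.ones((2, 3, 4)),
    dims=("time", "lat", "lon"),
    coords={"lat": [-60.0, 0.0, 60.0]},
)

# 1-D weights that only know about the "lat" dimension
toy_weights = np.cos(np.deg2rad(toy.lat))

# Xarray matches the two arrays on the name "lat", not on axis position
product = toy * toy_weights

print(product.dims)
print(product.isel(time=0, lon=0).values)  # one weight per latitude
```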

We could imagine computing the weighted spatial mean of SST manually.

In [None]:
sst_mean = (ds.sst * weights).sum(dim=("lon", "lat")) / weights.sum(dim="lat")
sst_mean.plot()
plt.title("This is wrong!")

That would be wrong, however, because the denominator (`weights.sum(dim='lat')`)
needs to be expanded to include the `lon` dimension and modified to account for
the missing values (land points).
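
A small worked case shows how much the denominator matters. This is a sketch on a made-up 3 x 4 grid in which one `NaN` plays the role of a land point; the names `toy`, `toy_weights`, and `valid_weights` are introduced here for illustration only.

```python
import numpy as np
import xarray as xr

# Toy 3 x 4 lat-lon field; the NaN stands in for a land point
toy = xr.DataArray(
    np.array(
        [
            [1.0, 2.0, 3.0, 4.0],
            [5.0, np.nan, 7.0, 8.0],
            [9.0, 10.0, 11.0, 12.0],
        ]
    ),
    dims=("lat", "lon"),
    coords={"lat": [-60.0, 0.0, 60.0], "lon": [0.0, 90.0, 180.0, 270.0]},
)
toy_weights = np.cos(np.deg2rad(toy.lat))

# Weighted sum of the data; the NaN is skipped by default
numerator = (toy * toy_weights).sum(dim=("lon", "lat"))

# Naive denominator: counts each latitude weight once, ignoring lon and the NaN
naive_mean = numerator / toy_weights.sum(dim="lat")

# Corrected denominator: one weight per valid grid cell
valid_weights = toy_weights.broadcast_like(toy).where(toy.notnull())
manual_mean = numerator / valid_weights.sum(dim=("lon", "lat"))

print(float(naive_mean), float(manual_mean))
```

Here the naive result lies well above every value in the grid, while the corrected result falls inside the range of the data, as a mean should.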

In general, weighted reductions on multidimensional arrays are complicated. To
make it a bit easier, Xarray provides a mechanism for weighted reductions. It
does this by creating a special intermediate `DataArrayWeighted` object, to
which different reduction operations can be applied.


In [None]:
sst_weighted = ds.sst.weighted(weights)
sst_weighted

In [None]:
sst_weighted.mean(dim=("lon", "lat")).plot()
plt.title("Correct Global Mean SST")
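
To see what the weighted object does with missing values, here is a minimal sketch on a made-up 2 x 2 grid with one `NaN`. The array `toy` and its values are assumptions chosen so that the result can be checked by hand.

```python
import numpy as np
import xarray as xr

# Toy 2 x 2 field with one missing value
toy = xr.DataArray(
    [[1.0, np.nan], [3.0, 5.0]],
    dims=("lat", "lon"),
    coords={"lat": [0.0, 60.0]},
)
toy_weights = np.cos(np.deg2rad(toy.lat))  # approximately 1.0 and 0.5

toy_weighted = toy.weighted(toy_weights)

# Total weight counts only the valid cells: 1.0 + 0.5 + 0.5
total_weight = toy_weighted.sum_of_weights(dim=("lat", "lon"))

# Weighted mean: (1.0 * 1 + 0.5 * 3 + 0.5 * 5) / total weight
toy_mean = toy_weighted.mean(dim=("lat", "lon"))

print(float(total_weight), float(toy_mean))
```

The weight of the missing cell is left out of the denominator automatically, which is the bookkeeping the manual calculation got wrong.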