# Xarrays

Xarray provides labeled multi-dimensional arrays ("tensors") with named dimensions, coordinates, and attributes. Its core data structure is ``DataArray``, an N-dimensional array similar to a ``pandas.Series``. The second is ``Dataset``, a multi-dimensional, in-memory array database: a dictionary-like container of ``DataArray`` objects, the equivalent of a ``pandas.DataFrame``.
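As a minimal sketch, a ``DataArray`` can be built directly from a NumPy array, and several arrays that share dimensions can be combined into a ``Dataset``. The values, dimension names, and station labels below are made up for illustration.

```python
import numpy as np
import pandas as pd
import xarray as xr

# A 2-D DataArray with named dimensions, coordinate labels and attributes
temperature = xr.DataArray(
    np.array([[280.0, 281.5, 279.0], [282.0, 283.5, 281.0]]),
    dims=("time", "station"),
    coords={
        "time": pd.date_range("2013-01-01", periods=2, freq="D"),
        "station": ["A", "B", "C"],
    },
    attrs={"units": "K"},
    name="temperature",
)

# A Dataset is a dict-like container of DataArrays that share dimensions;
# variables can also be given as (dims, data) tuples
ds_demo = xr.Dataset(
    {
        "temperature": temperature,
        "pressure": (("time", "station"), np.full((2, 3), 1013.0)),
    }
)

print(ds_demo["temperature"].dims)  # ('time', 'station')
print(list(ds_demo.data_vars))      # ['temperature', 'pressure']
```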

Xarray can read data from netCDF and Zarr files.

The Xarray project provides many useful tutorials. This notebook summarizes the [quick overview tutorial](https://docs.xarray.dev/en/stable/getting-started-guide/quick-overview.html).


In [None]:
import matplotlib.pyplot as plt
import numpy as np
import xarray as xr

%matplotlib inline
%config InlineBackend.figure_format='retina'

In [None]:
ds = xr.tutorial.load_dataset("air_temperature")
ds

In [None]:
ds["air"]

In [None]:
ds.air
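Both notations return the same ``DataArray``. Attribute access is convenient interactively, but dict-style access is the safer default: attribute access cannot be used to assign a new variable, and it is shadowed whenever a variable name collides with a ``Dataset`` method. A small sketch with a deliberately awkward, made-up variable name ``mean``:

```python
import numpy as np
import xarray as xr

# A variable whose name clashes with the Dataset.mean method
ds_demo = xr.Dataset({"mean": ("x", np.array([1.0, 2.0, 3.0]))})

print(type(ds_demo["mean"]))   # dict-style access returns the DataArray
print(callable(ds_demo.mean))  # attribute access returns the bound method instead

# New variables must be assigned dict-style
ds_demo["double"] = ds_demo["mean"] * 2
```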

In [None]:
with xr.set_options(display_style="html"):
    display(ds)

A ``DataArray`` has named dimensions:

In [None]:
ds.air.dims

and its coordinates are stored in ``.coords``:

In [None]:
ds.air.coords

DataArrays can also store arbitrary metadata as attributes in ``.attrs``:

In [None]:
ds.air.attrs

You can add new attributes:

In [None]:
# assign your own attributes!
ds.air.attrs["who_is_awesome"] = "xarray"
ds.air.attrs

Here the underlying data is a NumPy array, available as ``.data``:

In [None]:
print(type(ds.air.data))
print(ds.air.data)
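The axes of ``.data`` follow the order given by ``.dims``, and ``.sizes`` maps each dimension name to its length. A minimal check on a small synthetic array (the dimension names are illustrative):

```python
import numpy as np
import xarray as xr

da = xr.DataArray(np.zeros((4, 3, 2)), dims=("time", "lat", "lon"))

# .sizes maps each dimension name to its length
print(dict(da.sizes))  # {'time': 4, 'lat': 3, 'lon': 2}

# NumPy axis i of .data corresponds to da.dims[i]
axis_of = {name: i for i, name in enumerate(da.dims)}
print(axis_of)  # {'time': 0, 'lat': 1, 'lon': 2}

# .transpose reorders both the dims and the underlying NumPy axes
print(da.transpose("lon", "time", "lat").data.shape)  # (2, 4, 3)
```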


There are two main ways to extract data:

* label-based indexing using ``.sel``

* position-based indexing using ``.isel``

In [None]:
ds.air.isel(time=1).plot(x="lon")
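The cell above only uses ``.isel``. A minimal sketch contrasting the two on a small made-up array: ``.isel`` takes integer positions, while ``.sel`` takes coordinate labels and, with ``method="nearest"``, snaps to the closest label.

```python
import numpy as np
import xarray as xr

da = xr.DataArray(
    np.arange(5.0) * 10,  # values 0, 10, 20, 30, 40
    dims="lat",
    coords={"lat": [15.0, 25.0, 35.0, 45.0, 55.0]},
)

by_position = da.isel(lat=1)                  # second element, whatever its label
by_label = da.sel(lat=25.0)                   # element whose label is exactly 25.0
nearest = da.sel(lat=33.0, method="nearest")  # closest label to 33.0 is 35.0

print(float(by_position), float(by_label), float(nearest))  # 10.0 10.0 20.0
```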

Note that the air temperature is in kelvin. We can convert it to degrees Celsius by subtracting 273.15 and updating the ``units`` attribute.

In [None]:
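# Note: this binds a second name to the same Dataset (no copy), so ds is modified too
ds2 = ds
# Arithmetic drops attributes by default; keep them, then update the units
with xr.set_options(keep_attrs=True):
    ds2["air"] = ds["air"] - 273.15
ds2["air"].attrs["units"] = "degC"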
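Assigning ``ds2 = ds`` does not copy anything: both names refer to the same ``Dataset``, so the conversion above also changes ``ds`` (the later cells of this notebook rely on that). If you want to keep the original untouched, ``.copy(deep=True)`` returns an independent object. A sketch on a small synthetic Dataset (the names ``ds_a``, ``alias`` and ``independent`` are illustrative):

```python
import numpy as np
import xarray as xr

ds_a = xr.Dataset({"air": ("x", np.array([273.15, 283.15]))})

alias = ds_a                        # same object, no copy
independent = ds_a.copy(deep=True)  # separate object with its own data

# Replace the variable in the independent copy only
independent["air"] = independent["air"] - 273.15

print(alias is ds_a)              # True
print(ds_a["air"].values)         # original kelvin values are untouched
print(independent["air"].values)  # converted values (about 0 and 10)
```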

We also want to express longitudes west of Greenwich as negative values, so we subtract 360$^\circ$.

In [None]:
# Shift longitudes from degrees east (200 to 330) to negative values west of Greenwich
ds2.coords["lon"] = ds2.coords["lon"] - 360
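Subtracting 360 works here because every longitude in this dataset lies east of 180$^\circ$. For data spanning the full globe (0 to 360), a common alternative is to wrap the values into [-180, 180) and then re-sort, because wrapping leaves the coordinate out of order. A sketch on a made-up coordinate, using ``assign_coords`` and ``sortby``:

```python
import numpy as np
import xarray as xr

da = xr.DataArray(
    np.arange(4.0),
    dims="lon",
    coords={"lon": [0.0, 90.0, 180.0, 270.0]},
)

# Map 0..360 onto -180..180 (180 becomes -180), then restore ascending order
wrapped = da.assign_coords(lon=((da["lon"] + 180) % 360) - 180).sortby("lon")

# Longitudes are now -180, -90, 0, 90 and the values moved with their labels
print(wrapped["lon"].values)
print(wrapped.values)
```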

Plot the temperature averaged over time:

In [None]:
ds2.air.mean("time").plot()
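Reductions take dimension names rather than axis numbers, and passing several names reduces over all of them: averaging over ``lat`` and ``lon`` leaves a time series. A sketch on synthetic data (this is a plain unweighted mean, used only to illustrate the API):

```python
import numpy as np
import xarray as xr

da = xr.DataArray(
    np.arange(24.0).reshape(4, 2, 3),
    dims=("time", "lat", "lon"),
)

time_mean = da.mean("time")             # leaves (lat, lon)
spatial_mean = da.mean(["lat", "lon"])  # leaves a time series

print(time_mean.dims)     # ('lat', 'lon')
print(spatial_mean.dims)  # ('time',)
```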

In [None]:
# select every time step in May 2013 using a partial date string
ds2.sel(time="2013-05")

Select data between two dates with a ``slice``, which reduces the size of the Dataset:

In [None]:
# demonstrate slicing
ds.sel(time=slice("2013-05", "2013-07"))
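Unlike positional slicing, label-based slices in ``.sel`` include both end points, and a partial date string such as ``"2013-07"`` covers the whole month. A sketch with made-up monthly timestamps:

```python
import numpy as np
import pandas as pd
import xarray as xr

da = xr.DataArray(
    np.arange(12.0),
    dims="time",
    coords={"time": pd.date_range("2013-01-01", periods=12, freq="MS")},
)

# Label-based: May, June and July are all included
by_label = da.sel(time=slice("2013-05", "2013-07"))

# Position-based: the stop index is excluded, as in plain Python
by_position = da.isel(time=slice(4, 6))

print(by_label.sizes["time"], by_position.sizes["time"])  # 3 2
```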

In [None]:
# nearest-neighbour indexing at multiple points (longitudes were shifted by -360 above)
ds.sel(lon=[240.125-360, 234-360], lat=[40.3, 50.3], method="nearest")
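Passing plain lists to ``.sel`` selects the outer product: two latitudes times two longitudes gives four grid points. To pick matched (lat, lon) pairs instead, pass indexers that share a new dimension (called ``points`` here, an arbitrary name). A sketch on a synthetic grid:

```python
import numpy as np
import xarray as xr

grid = xr.DataArray(
    np.arange(12.0).reshape(3, 4),
    dims=("lat", "lon"),
    coords={"lat": [30.0, 40.0, 50.0], "lon": [-130.0, -120.0, -110.0, -100.0]},
)

# Outer (orthogonal) selection: every combination of the listed labels
outer = grid.sel(lat=[40.3, 50.3], lon=[-119.9, -126.0], method="nearest")

# Pointwise (vectorized) selection: the i-th lat is paired with the i-th lon
lats = xr.DataArray([40.3, 50.3], dims="points")
lons = xr.DataArray([-119.9, -126.0], dims="points")
pointwise = grid.sel(lat=lats, lon=lons, method="nearest")

print(outer.shape)      # (2, 2)
print(pointwise.shape)  # (2,)
```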

### High-level computation

* ``groupby``: bin data into groups and reduce each group

* ``resample``: a groupby specialized for time axes; either downsample or upsample your data

* ``rolling``: operate on rolling windows of your data, e.g. a running mean

* ``coarsen``: downsample your data by aggregating blocks of neighbouring values

* ``weighted``: weight your data before reducing
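A minimal sketch of ``rolling``, ``coarsen`` and ``weighted`` on a short made-up series. The values and weights are illustrative only; on a latitude-longitude grid the weights are often the cosine of latitude.

```python
import numpy as np
import xarray as xr

da = xr.DataArray(np.arange(6.0), dims="time")  # 0, 1, 2, 3, 4, 5

# rolling: 3-step running mean; the first two windows are incomplete, so NaN
running = da.rolling(time=3).mean()

# coarsen: average non-overlapping blocks of 2 steps, halving the length
coarse = da.coarsen(time=2).mean()

# weighted: weighted mean along a second, made-up dimension
values = xr.DataArray([0.0, 4.0], dims="lat")
weights = xr.DataArray([1.0, 3.0], dims="lat")
weighted_mean = values.weighted(weights).mean("lat")  # (0*1 + 4*3) / (1 + 3)

print(running.values)
print(coarse.values)
print(float(weighted_mean))  # 3.0
```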

In [None]:
# seasonal groups
ds.groupby("time.season")

In [None]:
# make a seasonal mean
seasonal_mean = ds.groupby("time.season").mean()
seasonal_mean = seasonal_mean.sel(season=["DJF", "MAM", "JJA", "SON"])
seasonal_mean
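Arithmetic between a ``groupby`` object and a reduced result is applied group by group, which is a common way to compute anomalies from a climatology. A sketch with synthetic daily data (the sine-shaped annual cycle and the noise are made up):

```python
import numpy as np
import pandas as pd
import xarray as xr

time = pd.date_range("2013-01-01", "2014-12-31", freq="D")
rng = np.random.default_rng(0)
cycle = 10 * np.sin(2 * np.pi * time.dayofyear.to_numpy() / 365.25)
da = xr.DataArray(
    cycle + rng.normal(size=time.size),
    dims="time",
    coords={"time": time},
)

# Climatology: mean for each calendar month across both years
climatology = da.groupby("time.month").mean("time")

# Anomaly: subtract each month's climatology from the matching time steps
anomaly = da.groupby("time.month") - climatology

print(climatology.sizes["month"])  # 12
```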

In [None]:
# resample to monthly frequency ("MS" labels each month by its first day)
ds.resample(time="MS").mean()
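``resample`` behaves like a ``groupby`` over time bins of a given frequency. A sketch on made-up daily values, downsampling to monthly means with the month-start alias ``"MS"``:

```python
import numpy as np
import pandas as pd
import xarray as xr

time = pd.date_range("2013-01-01", "2013-03-31", freq="D")  # 90 days
da = xr.DataArray(np.arange(time.size, dtype=float), dims="time", coords={"time": time})

# One bin per calendar month; each bin is reduced with .mean()
monthly = da.resample(time="MS").mean()

print(monthly.sizes["time"])  # 3
```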

In [None]:
# facet the seasonal_mean
seasonal_mean.air.plot(col="season")

Datasets can be written to netCDF and Zarr files:

In [None]:
# write to netCDF (%time runs the write once; %timeit would repeat it many times)
%time ds.to_netcdf("my-example-dataset.nc")
!ls -lh my-example-dataset.nc

In [None]:

# write to Zarr (requires the zarr package); mode="w" overwrites an existing store
%time ds.to_zarr(store="./my-example-dataset.zarr", mode="w")
!du -sh ./my-example-dataset.zarr