# Introduction to `cf_xarray`

This notebook is a brief introduction to `cf_xarray`'s current capabilities.


In [None]:
import cf_xarray
import numpy as np
import xarray as xr

xr.set_options(display_style="text")  # work around issue 57

Let's read two datasets.


In [None]:
ds = xr.tutorial.load_dataset("air_temperature")
ds.air.attrs["standard_name"] = "air_temperature"
ds

This one is inspired by POP model output and illustrates how the `coordinates`
attribute is interpreted. It also illustrates one way of tagging curvilinear
grids for convenient use with `cf_xarray`.


In [None]:
pop = xr.Dataset()

# set 2D coordinate variables as latitude, longitude
pop.coords["TLONG"] = (
    ("nlat", "nlon"),
    np.ones((20, 30)),
    {"units": "degrees_east"},
)
pop.coords["TLAT"] = (
    ("nlat", "nlon"),
    2 * np.ones((20, 30)),
    {"units": "degrees_north"},
)
pop.coords["ULONG"] = (
    ("nlat", "nlon"),
    0.5 * np.ones((20, 30)),
    {"units": "degrees_east"},
)
pop.coords["ULAT"] = (
    ("nlat", "nlon"),
    2.5 * np.ones((20, 30)),
    {"units": "degrees_north"},
)

# set dimensions as X, Y
pop["nlon"] = ("nlon", np.arange(pop.sizes["nlon"]), {"axis": "X"})
pop["nlat"] = ("nlat", np.arange(pop.sizes["nlat"]), {"axis": "Y"})

# actual data variables with coordinates attribute set
pop["UVEL"] = (
    ("nlat", "nlon"),
    np.ones((20, 30)) * 15,
    {"coordinates": "ULONG ULAT", "standard_name": "sea_water_x_velocity"},
)
pop["TEMP"] = (
    ("nlat", "nlon"),
    np.ones((20, 30)) * 15,
    {
        "coordinates": "TLONG TLAT",
        "standard_name": "sea_water_potential_temperature",
    },
)
pop

This synthetic dataset has multiple `X` and `Y` coords. An example would be
model output on a staggered grid.


In [None]:
multiple = xr.Dataset()
multiple.coords["x1"] = ("x1", range(30), {"axis": "X"})
multiple.coords["y1"] = ("y1", range(20), {"axis": "Y"})
multiple.coords["x2"] = ("x2", range(10), {"axis": "X"})
multiple.coords["y2"] = ("y2", range(5), {"axis": "Y"})

multiple["v1"] = (("x1", "y1"), np.ones((30, 20)) * 15)
multiple["v2"] = (("x2", "y2"), np.ones((10, 5)) * 15)
multiple

In [None]:
# This dataset has ancillary variables

anc = xr.Dataset()
anc["q"] = (
    ("x", "y"),
    np.random.randn(10, 20),
    dict(
        standard_name="specific_humidity",
        units="g/g",
        ancillary_variables="q_error_limit q_detection_limit",
    ),
)
anc["q_error_limit"] = (
    ("x", "y"),
    np.random.randn(10, 20),
    dict(standard_name="specific_humidity standard_error", units="g/g"),
)
anc["q_detection_limit"] = xr.DataArray(
    1e-3,
    attrs=dict(
        standard_name="specific_humidity detection_minimum", units="g/g"
    ),
)
anc

## What attributes have been discovered?


In [None]:
ds.lon

`ds.lon` has the attribute `axis: X`. This means that `cf_xarray` can identify
the `'X'` axis as being represented by the `lon` variable.

It can also use the `standard_name` and `units` attributes to infer that `lon`
is "Longitude". To see the variable names that `cf_xarray` can infer, use
`.cf.describe()`.


In [None]:
ds.cf.describe()
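The attribute-based detection described above can be sketched in plain Python. This is only an illustration: the `CRITERIA` table and `guess_cf_keys` are hypothetical names, the rules are heavily simplified, and the attribute dictionaries are made up to resemble `ds.lon` and `pop.TLONG`. It is not how `cf_xarray` is implemented.

```python
# Hypothetical, simplified criteria. The real rule table in cf_xarray is larger.
CRITERIA = {
    "X": {"axis": {"X"}},
    "Y": {"axis": {"Y"}},
    "longitude": {
        "standard_name": {"longitude"},
        "units": {"degrees_east", "degree_east"},
    },
    "latitude": {
        "standard_name": {"latitude"},
        "units": {"degrees_north", "degree_north"},
    },
}


def guess_cf_keys(attrs):
    """Return the CF keys for which at least one attribute in ``attrs`` matches."""
    matched = []
    for key, rules in CRITERIA.items():
        if any(attrs.get(name) in allowed for name, allowed in rules.items()):
            matched.append(key)
    return matched


# Illustrative attribute dictionaries (assumed, not read from the datasets)
lon_attrs = {"standard_name": "longitude", "units": "degrees_east", "axis": "X"}
tlong_attrs = {"units": "degrees_east"}

print(guess_cf_keys(lon_attrs))    # ['X', 'longitude']
print(guess_cf_keys(tlong_attrs))  # ['longitude']
```

A variable tagged only with `units`, like `TLONG`, matches `longitude` but no axis, which is in line with what `pop.cf.describe()` reports.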

For `pop`, only `latitude` and `longitude` are detected, not `X` or `Y`. Please
comment here: https://github.com/xarray-contrib/cf-xarray/issues/23 if you have
opinions about this behaviour.


In [None]:
pop.cf.describe()

For `multiple`, multiple `X` and `Y` coordinates are detected.


In [None]:
multiple.cf.describe()

## Feature: Accessing coordinate variables

`.cf` implements `__getitem__` to allow easy access to coordinate and axis
variables.


In [None]:
ds.cf["X"]

Indexing with a scalar key raises an error if the key maps to multiple variable
names.


In [None]:
multiple.cf["X"]

In [None]:
pop.cf["longitude"]

To get back all variables associated with that key, pass a single-element list
instead.


In [None]:
multiple.cf[["X"]]

In [None]:
pop.cf[["longitude"]]
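The difference between scalar and list keys can be illustrated with a toy lookup class. `CFLookup` is a hypothetical stand-in that only returns variable names; the real accessor returns xarray objects and its error message will differ.

```python
class CFLookup:
    """Toy stand-in for the key resolution behind ``.cf[...]`` (names only)."""

    def __init__(self, key_to_names):
        self.key_to_names = key_to_names

    def _resolve(self, key):
        # Keys that are not CF names are treated as plain variable names.
        return self.key_to_names.get(key, [key])

    def __getitem__(self, key):
        if isinstance(key, list):
            # A list key collects every match, so ambiguity is not an error.
            return [name for k in key for name in self._resolve(k)]
        names = self._resolve(key)
        if len(names) != 1:
            raise KeyError(f"{key!r} maps to multiple variables: {names}")
        return names[0]


multiple_keys = CFLookup({"X": ["x1", "x2"], "Y": ["y1", "y2"]})

print(multiple_keys[["X"]])  # ['x1', 'x2']
try:
    multiple_keys["X"]
except KeyError as err:
    print("KeyError:", err)
```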

Indexing a DataArray returns a DataArray.


In [None]:
pop.UVEL.cf["longitude"]

`Dataset.cf[...]` returns a single `DataArray`, parsing the `coordinates`
attribute if present, so we correctly get the `TLONG` variable and not the
`ULONG` variable.


In [None]:
pop.cf["TEMP"]

`Dataset.cf[...]` also interprets the `ancillary_variables` attribute. The
ancillary variables are returned as coordinates of a DataArray.


In [None]:
anc.cf["q"]

## Feature: Accessing variables by standard names


In [None]:
pop.cf[["sea_water_potential_temperature", "UVEL"]]

In [None]:
anc.cf["specific_humidity"]

## Feature: Utility functions

There are some utility functions to allow use by downstream libraries.


In [None]:
pop.cf.keys()

You can test for the presence of these keys.


In [None]:
"sea_water_x_velocity" in pop.cf

You can also get the available axis names


In [None]:
pop.cf.axes

or the available coordinate names. The same works for cell measures
(`.cf.cell_measures`) and standard names (`.cf.standard_names`).


In [None]:
pop.cf.coordinates

**Note:** Although it is possible to assign additional coordinates and cell
measures, `.cf.coordinates` and `.cf.cell_measures` only return a subset of
`("longitude", "latitude", "vertical", "time")` and `("area", "volume")`,
respectively.


## Feature: Rewriting property dictionaries

`cf_xarray` will rewrite the `.sizes` and `.chunks` dictionaries so that one can
index by a special CF axis or coordinate name.


In [None]:
ds.cf.sizes

Note the duplicate entries above:

1. One for `X`, `Y`, `T`
2. and one for `longitude`, `latitude` and `time`.

An error is raised if there are multiple `'X'` variables (for example).


In [None]:
multiple.cf.sizes

In [None]:
multiple.v1.cf.sizes

## Feature: Renaming coordinate variables

`cf_xarray` lets you rename coordinate variables in one dataset to match the
corresponding variables in another dataset. This can only be done when a
one-to-one mapping is possible.

In this example, `TLONG` and `TLAT` are renamed to `lon` and `lat`, i.e. their
counterparts in `ds`. Note that the `coordinates` attribute is changed
accordingly.


In [None]:
pop.cf["TEMP"].cf.rename_like(ds)
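The one-to-one requirement can be made concrete with a small sketch. The helpers `rename_mapping` and `rename_coordinates_attr` are hypothetical, and the key dictionaries are written by hand to mirror `pop.cf["TEMP"]` and `ds`; this shows the idea, not the library's code.

```python
def rename_mapping(ours, theirs):
    """Map our variable names to theirs wherever a CF key matches one-to-one."""
    renames = {}
    for key, our_names in ours.items():
        their_names = theirs.get(key, [])
        # Skip keys that are missing or ambiguous on either side.
        if len(our_names) == 1 and len(their_names) == 1:
            renames[our_names[0]] = their_names[0]
    return renames


def rename_coordinates_attr(value, renames):
    """Apply the same renames to a space-separated ``coordinates`` attribute."""
    return " ".join(renames.get(name, name) for name in value.split())


# Hand-written CF key tables (assumed for illustration)
temp_keys = {"longitude": ["TLONG"], "latitude": ["TLAT"]}
ds_keys = {"longitude": ["lon"], "latitude": ["lat"], "time": ["time"]}

renames = rename_mapping(temp_keys, ds_keys)
print(renames)  # {'TLONG': 'lon', 'TLAT': 'lat'}
print(rename_coordinates_attr("TLONG TLAT", renames))  # lon lat
```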

## Feature: Rewriting arguments

`cf_xarray` can rewrite arguments for a large number of xarray functions. That
is, instead of specifying, say, `dim="lon"`, you can pass `dim="X"` or
`dim="longitude"` and `cf_xarray` will rewrite that to `dim="lon"` based on the
attributes present in the dataset.

Here are a few examples.


### Slicing


In [None]:
ds.air.cf.isel(T=1)

Slicing will expand a single key like `X` to multiple dimensions if those
dimensions are tagged with `axis: X`.


In [None]:
multiple.cf.isel(X=1, Y=1)
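Conceptually, the call above is rewritten before it reaches xarray. The following sketch shows one way such an expansion could work; `expand_kwargs` and the `axes` table are hypothetical and stand in for whatever `cf_xarray` does internally.

```python
def expand_kwargs(kwargs, key_to_dims):
    """Replace each CF key with every dimension it maps to."""
    expanded = {}
    for key, value in kwargs.items():
        # Keys that are not CF names pass through unchanged.
        for dim in key_to_dims.get(key, [key]):
            expanded[dim] = value
    return expanded


# Axis table mirroring the `multiple` dataset (written by hand)
axes = {"X": ["x1", "x2"], "Y": ["y1", "y2"]}

print(expand_kwargs({"X": 1, "Y": 1}, axes))
# {'x1': 1, 'x2': 1, 'y1': 1, 'y2': 1}
```

Under this reading, `multiple.cf.isel(X=1, Y=1)` would behave like `multiple.isel(x1=1, x2=1, y1=1, y2=1)`.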

### Reductions


In [None]:
ds.air.cf.mean("X")

Expanding to multiple dimensions is also supported.


In [None]:
# takes the mean along ["x1", "x2"]
multiple.cf.mean("X")

### Plotting


In [None]:
ds.air.cf.isel(time=1).cf.plot(x="X", y="Y")

In [None]:
ds.air.cf.isel(T=1, Y=[0, 1, 2]).cf.plot(x="longitude", hue="latitude")

`cf_xarray` can also facet plots.


In [None]:
seasonal = (
    ds.air.groupby("time.season")
    .mean()
    .reindex(season=["DJF", "MAM", "JJA", "SON"])
)
seasonal.cf.plot(x="longitude", y="latitude", col="season")

### Resample & groupby


In [None]:
ds.cf.resample(T="D").mean()

`cf_xarray` also understands the "datetime accessor" syntax for groupby.


In [None]:
ds.cf.groupby("T.month").mean("longitude")

### Rolling & coarsen


In [None]:
ds.cf.rolling(X=5).mean()

`coarsen` works, but everything after it will break because of an xarray bug:
https://github.com/pydata/xarray/issues/4120

`ds.isel(lon=slice(50)).cf.coarsen(Y=5, X=10).mean()`


## Feature: Mixing "special names" and variable names


In [None]:
ds.cf.groupby("T.month").mean(["lat", "X"])

## Feature: Weight by Cell Measures

`cf_xarray` can weight by cell measure variables `"area"` and `"volume"` if the
appropriate attribute is set.


In [None]:
# Let's make some weights (not sure if this is right)
ds.coords["cell_area"] = (
    np.cos(ds.air.cf["latitude"] * np.pi / 180)
    * xr.ones_like(ds.air.cf["longitude"])
    * 105e3
    * 110e3
)
# and set proper attributes
ds.air.attrs["cell_measures"] = "area: cell_area"

In [None]:
ds.air.cf.weighted("area").mean(["latitude", "time"]).cf.plot(x="longitude")
ds.air.mean(["lat", "time"]).cf.plot(x="longitude")
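To see why the two lines above can differ, here is a small pure-Python sketch of an area-weighted mean along latitude. The latitudes and temperatures are made-up numbers, `weighted_mean` is a hypothetical helper, and `cos(latitude)` is used as a rough proxy for relative cell area on a regular grid.

```python
import math


def weighted_mean(values, weights):
    """Weighted average: sum(w * v) / sum(w)."""
    total = sum(weights)
    return sum(v * w for v, w in zip(values, weights)) / total


# Made-up values: colder air at high latitudes, where cells are smaller
lats = [0.0, 30.0, 60.0]
temps = [300.0, 290.0, 270.0]
weights = [math.cos(math.radians(lat)) for lat in lats]

plain = sum(temps) / len(temps)
weighted = weighted_mean(temps, weights)
print(plain, weighted)
```

Because the high-latitude value gets the smallest weight, the weighted mean comes out warmer than the plain mean in this example.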

## Feature: Cell boundaries and vertices

`cf_xarray` can infer cell boundaries (for rectilinear grids) and convert
CF-standard bounds variables to vertices.


In [None]:
ds_bnds = ds.cf.add_bounds(["lat", "lon"])
ds_bnds

In [None]:
# We can convert each bounds variable independently with the helper:
import cf_xarray as cfxr

lat_bounds = ds_bnds.cf.get_bounds("latitude")

lat_vertices = cfxr.bounds_to_vertices(lat_bounds, bounds_dim="bounds")
lat_vertices

In [None]:
# Or we can convert _all_ bounds variables on a dataset
ds_crns = ds_bnds.cf.bounds_to_vertices()
ds_crns
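For a one-dimensional rectilinear axis, the relationship between cell centers, bounds, and vertices can be sketched without any libraries. The helpers `infer_bounds` and `bounds_to_vertices_1d` are hypothetical and use a simple midpoint rule with extrapolated end cells; the library's own inference may differ in detail.

```python
def infer_bounds(centers):
    """Midpoint-based bounds for a 1D axis with at least two points.

    The outermost edges are extrapolated by mirroring the nearest midpoint.
    """
    mids = [(a + b) / 2 for a, b in zip(centers, centers[1:])]
    first = centers[0] - (mids[0] - centers[0])
    last = centers[-1] + (centers[-1] - mids[-1])
    edges = [first, *mids, last]
    return [[lo, hi] for lo, hi in zip(edges, edges[1:])]


def bounds_to_vertices_1d(bounds):
    """Collapse contiguous (N, 2) bounds into N + 1 vertices."""
    return [bounds[0][0], *(hi for _, hi in bounds)]


# Descending latitudes with 2.5 degree spacing (illustrative values)
lat = [75.0, 72.5, 70.0]
bounds = infer_bounds(lat)
print(bounds)  # [[76.25, 73.75], [73.75, 71.25], [71.25, 68.75]]
print(bounds_to_vertices_1d(bounds))  # [76.25, 73.75, 71.25, 68.75]
```

The bounds form repeats every interior edge twice, while the vertices form stores each edge once, which is why `N` cells yield `N + 1` vertices.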