<img src="https://docs.xarray.dev/en/stable/_static/dataset-diagram-logo.png" align="right" width="30%">

# Working with labeled data

Learning goals:

- Use different forms of indexing to select data based on position and
  coordinates
- Select datetime ranges

Scientific data is inherently *labeled*. For example, time series data includes timestamps that label individual periods or points in time, spatial data has coordinates (e.g. longitude, latitude, elevation), and model or laboratory experiments are often identified by unique identifiers. In this notebook we'll see that labeled dimensions make code much easier to understand!

In [None]:
import numpy as np
import pandas as pd
import xarray as xr

We'll start by comparing common indexing operations with a `numpy` array and equivalent `xarray` DataArray:

In [None]:
# axis0: x, axis1: y
np_array = np.arange(10).reshape(2, 5)
np_array

In [None]:
da = xr.DataArray(np_array, dims=("x", "y"))
da

## Position-based indexing

### Indexing

Recall that *indexing* selects a single value from an array based on its position:

In [None]:
np_array[0, 3]

In [None]:
da.isel(x=0, y=3)  # or da[{"x": 0, "y": 3}]
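As a minimal sketch of why naming dimensions helps, the cell below rebuilds the same small array and checks that the keyword, dictionary, and purely positional forms all return the same element. The transposed lookup at the end is an extra illustration, not part of the original example.

```python
import numpy as np
import xarray as xr

np_array = np.arange(10).reshape(2, 5)
da = xr.DataArray(np_array, dims=("x", "y"))

by_keyword = da.isel(x=0, y=3)     # dimension names as keywords
by_dict = da[{"x": 0, "y": 3}]     # dictionary form of the same lookup
by_position = da[0, 3]             # numpy-style, relies on axis order

# Because dimensions are named, the lookup still works after the axes
# are reordered; da.T has dims ("y", "x").
after_transpose = da.T.isel(x=0, y=3)

print(by_keyword.item(), by_dict.item(), by_position.item(), after_transpose.item())
```

Only `da[0, 3]` depends on remembering which axis comes first; the named forms do not.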

### Slicing

*Slicing*, in contrast, retrieves a range of values:

In [None]:
np_array[:2, 1:]

In [None]:
da.isel(x=slice(None, 2), y=slice(1, None))
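A short sketch of two related patterns, using the same array as above: dimensions that are not mentioned in `isel` are kept in full, and a list of integers picks out individual positions rather than a contiguous range.

```python
import numpy as np
import xarray as xr

np_array = np.arange(10).reshape(2, 5)
da = xr.DataArray(np_array, dims=("x", "y"))

# Same selection as np_array[:2, 1:]
sliced = da.isel(x=slice(None, 2), y=slice(1, None))

# "x" is not mentioned, so it is kept in full
only_y = da.isel(y=slice(1, None))

# A list of positions selects individual columns
picked = da.isel(y=[0, 2, 4])

print(sliced.shape, only_y.shape, picked.values.tolist())
```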

## Label-based indexing


Remembering the axis order can be challenging even with 2D arrays: is `np_array[0, 3]` the first row and fourth column, *or the first column and fourth row*? Did I store these samples by row or by column when I saved the data? The difficulty compounds with every added dimension. Xarray objects eliminate much of this mental overhead by adding coordinate labels:

In [None]:
arr = xr.DataArray(
    data=np.arange(48).reshape(4, 2, 6),
    dims=("u", "v", "time"),
    coords={
        "u": [-3.2, 2.1, 5.3, 6.5],
        "v": [-1, 2.6],
        # six month-end timestamps; pandas >= 2.2 prefers the alias "ME" over "M"
        "time": pd.date_range("2009-01-05", periods=6, freq="M"),
    },
)
arr

To select data by coordinate **labels** instead of *integer indices* we can use the
same syntax, using `sel` instead of `isel`:


In [None]:
arr.sel(u=5.3, time="2009-04-30")  # or arr.loc[{"u": 5.3, "time": "2009-04-30"}]
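The sketch below rebuilds the array with its month-end timestamps written out explicitly (an assumption made so that the cell does not depend on a particular pandas frequency alias) and compares `sel`, `.loc`, and the position-based `isel` lookup. It also shows what happens when a label is not present in the coordinate.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Month-end timestamps spelled out by hand for this sketch
times = pd.to_datetime(
    ["2009-01-31", "2009-02-28", "2009-03-31", "2009-04-30", "2009-05-31", "2009-06-30"]
)
arr = xr.DataArray(
    data=np.arange(48).reshape(4, 2, 6),
    dims=("u", "v", "time"),
    coords={"u": [-3.2, 2.1, 5.3, 6.5], "v": [-1, 2.6], "time": times},
)

with_sel = arr.sel(u=5.3, time="2009-04-30")
with_loc = arr.loc[{"u": 5.3, "time": "2009-04-30"}]
with_isel = arr.isel(u=2, time=3)  # same element, addressed by position
print(with_sel.equals(with_loc))

# A label that is not in the coordinate raises a KeyError
try:
    arr.sel(u=5)
except KeyError as err:
    print("KeyError:", err)
```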

This requires us to specify exact coordinate values. If we don't have those, we can use the `method` parameter (see the documentation of `DataArray.sel` and `Dataset.sel`):

In [None]:
arr.sel(u=5, time="2009-04-28", method="nearest")
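A small sketch of the inexact-lookup options, restricted to the `u` coordinate for brevity. Besides `"nearest"`, `sel` accepts `"ffill"` and `"bfill"`, and a `tolerance` that limits how far away a match may be. The tolerance of 0.1 is an arbitrary value chosen for illustration.

```python
import numpy as np
import xarray as xr

arr = xr.DataArray(
    data=np.arange(8).reshape(4, 2),
    dims=("u", "v"),
    coords={"u": [-3.2, 2.1, 5.3, 6.5], "v": [-1, 2.6]},
)

nearest = arr.sel(u=5, method="nearest")   # closest label
previous = arr.sel(u=5, method="ffill")    # last label at or before 5
following = arr.sel(u=5, method="bfill")   # first label at or after 5
print(nearest.u.item(), previous.u.item(), following.u.item())

# With a tolerance, a match that is too far away raises a KeyError
try:
    arr.sel(u=5, method="nearest", tolerance=0.1)
except KeyError as err:
    print("KeyError:", err)
```

Note that `method` applies to every coordinate passed in the same `sel` call.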

We can also select multiple values:


In [None]:
arr.sel(u=[-3.2, 6.5], time=slice("2009-02-28", "2009-05-31"))
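One detail worth checking for yourself: label-based slices include both endpoints, whereas position-based slices exclude the stop index. The sketch below uses a cut-down, two-dimensional version of the array, with the timestamps written out by hand, to make the difference visible.

```python
import numpy as np
import pandas as pd
import xarray as xr

times = pd.to_datetime(
    ["2009-01-31", "2009-02-28", "2009-03-31", "2009-04-30", "2009-05-31", "2009-06-30"]
)
arr = xr.DataArray(
    data=np.arange(24).reshape(4, 6),
    dims=("u", "time"),
    coords={"u": [-3.2, 2.1, 5.3, 6.5], "time": times},
)

# Label slice: February through May, both endpoints included
by_label = arr.sel(time=slice("2009-02-28", "2009-05-31"))

# Position slice covering the same entries: the stop index 5 is excluded
by_position = arr.isel(time=slice(1, 5))
print(by_label.sizes["time"], by_position.sizes["time"])

# On a sorted coordinate, slice bounds do not have to be existing labels
between = arr.sel(u=slice(0, 6))
print(between.u.values.tolist())
```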

If instead of selecting data we want to drop it, we can use `drop_sel`:


In [None]:
arr.drop_sel(u=[-3.2, 6.5])
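For completeness, a sketch of the position-based counterpart `drop_isel`, and of the `errors="ignore"` option, which skips labels that are not present instead of raising. The label `99.0` is a made-up value used only to trigger that case.

```python
import numpy as np
import xarray as xr

arr = xr.DataArray(
    data=np.arange(8).reshape(4, 2),
    dims=("u", "v"),
    coords={"u": [-3.2, 2.1, 5.3, 6.5], "v": [-1, 2.6]},
)

dropped_by_label = arr.drop_sel(u=[-3.2, 6.5])
dropped_by_position = arr.drop_isel(u=[0, 3])  # first and last entries along u
print(dropped_by_label.u.values.tolist())

# 99.0 is not a label of u; errors="ignore" leaves the array unchanged
unchanged = arr.drop_sel(u=[99.0], errors="ignore")
print(unchanged.sizes["u"])
```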

### Exercises

Practice the syntax you've learned with the xarray tutorial dataset! 

In [None]:
ds = xr.tutorial.open_dataset("air_temperature")
ds

1. Select the first 30 entries of latitude and 20th to 40th entries of longitude


In [None]:
ds.isel(lat=slice(None, 30), lon=slice(20, 40))

2. Select all data at 75 degrees north between Jan 1, 2013 and Oct 15, 2013


In [None]:
ds.sel(lat=75, time=slice("2013-01-01", "2013-10-15"))
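Datetime selections also accept partial date strings. The sketch below uses a synthetic daily series (a stand-in invented for this example, not the tutorial dataset, which would need to be downloaded) to show a full-date range like the one above alongside month-level strings.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic stand-in: one value per day of 2013
days = pd.date_range("2013-01-01", "2013-12-31", freq="D")
daily = xr.DataArray(np.arange(days.size), dims="time", coords={"time": days})

# Full dates: both endpoints are included
window = daily.sel(time=slice("2013-01-01", "2013-10-15"))

# Partial strings: a month selects every timestamp within that month
march = daily.sel(time="2013-03")
first_half = daily.sel(time=slice("2013-01", "2013-06"))

print(window.sizes["time"], march.sizes["time"], first_half.sizes["time"])
```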

3. Remove all entries at 260 and 270 degrees longitude

In [None]:
ds.drop_sel(lon=[260, 270])