<img src="http://xarray.pydata.org/en/stable/_static/dataset-diagram-logo.png" align="right" width="30%">

# Working with labeled data

Learning goals:

- Use different forms of indexing to select data based on position and
  coordinates
- Select datetime ranges
- Interpolate data to new coordinates

## Named dimensions

As mentioned in the previous session, labeled dimensions make code much easier
to understand. Compare pure `numpy` indexing:


In [None]:
import numpy as np
import pandas as pd
import xarray as xr

np.random.seed(0)

In [None]:
# axis0: x, axis1: y
np_array = np.random.randn(3, 4)
np_array[1, 3]

and slicing:


In [None]:
np_array[:2, 1:]

with indexing by dimension name:


In [None]:
arr = xr.DataArray(np_array, dims=("x", "y"))
arr.isel(x=1, y=3)

This is the same as


In [None]:
arr[{"x": 1, "y": 3}]
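
As a quick check of that equivalence, here is a minimal sketch on an illustrative array (`demo` and its values are assumptions made for this example, not the random data used above). It compares the keyword form, the dictionary form and plain positional indexing:

```python
import numpy as np
import xarray as xr

# Illustrative data with predictable values (not the notebook's random array).
values = np.arange(12).reshape(3, 4)
demo = xr.DataArray(values, dims=("x", "y"))

by_keyword = demo.isel(x=1, y=3)   # dimension names as keyword arguments
by_dict = demo[{"x": 1, "y": 3}]   # dimension names as dictionary keys
by_position = demo[1, 3]           # numpy-style, relies on axis order

print(by_keyword.item(), by_dict.item(), by_position.item())
```

The first two forms keep working if the dimensions are transposed, while the positional form silently selects a different element.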

Python only accepts the `start:stop` slice syntax inside square brackets, so
slices passed as keyword arguments have to be constructed with `slice`:


In [None]:
ds = xr.Dataset(
    {
        "a": (("x", "y"), np.random.randn(3, 4)),
        "b": (("x", "y"), np.random.randn(3, 4)),
    }
)
ds.isel(x=slice(None, 2), y=slice(1, None))
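
The same pattern applies to a `DataArray`. The sketch below, again on an illustrative array named `demo`, checks that the explicit `slice` objects reproduce the `numpy` slice `[:2, 1:]`:

```python
import numpy as np
import xarray as xr

values = np.arange(12).reshape(3, 4)  # illustrative data
demo = xr.DataArray(values, dims=("x", "y"))

# slice(None, 2) stands for ":2" and slice(1, None) stands for "1:".
sliced = demo.isel(x=slice(None, 2), y=slice(1, None))

# The underlying values match the plain numpy slice.
same_as_numpy = np.array_equal(sliced.values, values[:2, 1:])
print(sliced.shape, same_as_numpy)
```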

We can also use these names to peek at the data if the automatic preview is not
enough:


In [None]:
ds.head(x=2, y=3)

See also `tail` and `thin`.
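
A minimal sketch of the three methods side by side, using a small illustrative dataset (`ds_demo` is an assumption made for this example):

```python
import numpy as np
import xarray as xr

# Illustrative dataset with 4 entries along x and 6 along y.
ds_demo = xr.Dataset({"a": (("x", "y"), np.arange(24).reshape(4, 6))})

first = ds_demo.head(x=2, y=3)    # first 2 entries along x, first 3 along y
last = ds_demo.tail(x=2)          # last 2 entries along x, y untouched
every_other = ds_demo.thin(y=2)   # every 2nd entry along y, x untouched

print(dict(first.sizes), dict(last.sizes), dict(every_other.sizes))
```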


## Coordinate labels and label based indexing


xarray objects become much more interesting once we add coordinate labels:


In [None]:
arr = xr.DataArray(
    np.random.randn(4, 6),
    dims=("x", "y"),
    coords={
        "x": [-3.2, 2.1, 5.3, 6.5],
        "y": pd.date_range("2009-01-05", periods=6, freq="M"),
    },
)
arr

To select data by coordinate labels instead of integer indices we can use the
same syntax, using `sel` instead of `isel`:


In [None]:
arr.sel(x=5.3, y="2009-04-30")  # or arr.loc[{"x": 5.3, "y": "2009-04-30"}]
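
The `loc` form mentioned in the comment can be sketched on a small illustrative array (`demo`, with daily dates chosen only for this example). It compares `sel` with the dictionary and positional forms of `loc`:

```python
import numpy as np
import pandas as pd
import xarray as xr

# Illustrative array: two x labels and three daily timestamps along y.
demo = xr.DataArray(
    np.arange(6).reshape(2, 3),
    dims=("x", "y"),
    coords={
        "x": [2.1, 5.3],
        "y": pd.date_range("2009-01-01", periods=3, freq="D"),
    },
)

with_sel = demo.sel(x=5.3, y="2009-01-02")
with_loc_dict = demo.loc[{"x": 5.3, "y": "2009-01-02"}]
with_loc_positional = demo.loc[5.3, "2009-01-02"]  # labels given in axis order

print(with_sel.item(), with_loc_dict.item(), with_loc_positional.item())
```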

By default, `sel` requires us to specify exact coordinate values. If we don't
have those, we can use the `method` parameter (see `DataArray.sel` for
documentation):


In [None]:
arr.sel(x=4, y="2009-04-01", method="nearest")
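
To make the difference concrete, here is a minimal sketch on an illustrative one-dimensional array (`demo` is an assumption made for this example). It shows the exact lookup failing, two inexact lookups, and the optional `tolerance` argument that bounds how far a match may be from the requested value:

```python
import numpy as np
import xarray as xr

# Illustrative array: the values 0.0..3.0 sit at the x labels below.
demo = xr.DataArray(
    np.arange(4.0), dims="x", coords={"x": [-3.2, 2.1, 5.3, 6.5]}
)

# Without `method`, a label that is not in the index raises KeyError.
try:
    demo.sel(x=4)
except KeyError:
    print("x=4 is not a coordinate value")

nearest = demo.sel(x=4, method="nearest")  # closest label to 4
previous = demo.sel(x=4, method="ffill")   # last label at or before 4

# `tolerance` rejects matches that are too far away from the request.
try:
    demo.sel(x=4, method="nearest", tolerance=1.0)
except KeyError:
    print("no coordinate within 1.0 of x=4")

print(float(nearest["x"]), float(previous["x"]))
```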

We can also select multiple values:


In [None]:
arr.sel(x=[-3.2, 6.5], y=slice("2009-02-28", "2009-05-31"))
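
One detail worth keeping in mind is that label-based slices include both endpoints, whereas positional slices exclude the stop index. A minimal sketch on an illustrative array (`demo` is an assumption made for this example):

```python
import numpy as np
import xarray as xr

# Illustrative array with integer labels along x.
demo = xr.DataArray(
    np.arange(5), dims="x", coords={"x": [10, 20, 30, 40, 50]}
)

by_label = demo.sel(x=slice(20, 40))    # label slice: both ends included
by_position = demo.isel(x=slice(1, 3))  # positional slice: stop excluded

print(by_label["x"].values.tolist(), by_position["x"].values.tolist())
```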

If instead of selecting data we want to drop it, we can use `drop_sel`:


In [None]:
arr.drop_sel(x=[-3.2, 5.3])
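
`drop_sel` works on labels. Its positional counterpart is `drop_isel`, which this sketch assumes is available (it exists in reasonably recent xarray releases). The array `demo` is illustrative:

```python
import numpy as np
import xarray as xr

# Illustrative array using the same x labels as above.
demo = xr.DataArray(
    np.arange(4), dims="x", coords={"x": [-3.2, 2.1, 5.3, 6.5]}
)

by_label = demo.drop_sel(x=[-3.2, 5.3])  # drop by coordinate value
by_position = demo.drop_isel(x=[0, 2])   # drop by integer position

print(by_label["x"].values.tolist(), by_label.equals(by_position))
```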

### Exercises


In [None]:
ds = xr.tutorial.open_dataset("air_temperature")
ds

1. Select the first 30 entries of latitude and the 20th to 40th entries of longitude


In [None]:
# your code here

2. Select all data at 75 degrees north and between Jan 1, 2013 and Oct 15, 2013


In [None]:
# your code here

3. Remove all entries at 260 and 270 degrees longitude


In [None]:
# your code here