In [None]:
import xarray as xr
import numpy as np
import pandas as pd

# Introduction

- Why xarray?
  - numpy arrays are not enough
  - names, labels, attributes
  - distinction between dimension coordinates and non-dimension coordinates (this distinction might disappear in the future)
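
As a rough illustration of the points above, compare reducing a plain `numpy` array with reducing a labelled `DataArray`. This is only a sketch: the values, the dimension names `time` and `city`, and the labels are made up for the example.

```python
import numpy as np
import xarray as xr

values = np.array([[10.0, 12.0, 14.0], [20.0, 22.0, 24.0]])

# With numpy alone we have to remember that axis 0 is "time" and axis 1 is "city".
numpy_mean = values.mean(axis=0)

# With xarray the dimensions are named and the positions along them are labelled.
temperature = xr.DataArray(
    values,
    dims=("time", "city"),
    coords={"city": ["Oslo", "Rome", "Lima"]},
    attrs={"units": "degC"},
)
labelled_mean = temperature.mean(dim="time")  # reduce by name instead of axis number
rome = temperature.sel(city="Rome")  # select by label instead of integer position

print(labelled_mean)
print(rome)
```

Both reductions compute the same numbers; the difference is that the `xarray` version states its intent in the code and keeps the `city` labels on the result.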

# Data structures

xarray provides two main types: `DataArray` and `Dataset`. The `DataArray` class attaches dimension names, coordinates and attributes to a multi-dimensional array, while `Dataset` combines multiple arrays.

Both classes are normally created by reading data, but to understand them let's first look at creating them programmatically.

## DataArray

- in-detail description
  - attach labels, name and attribute to array

structure:
- DataArray construction
  - data + dims
  - coords
  - attrs
  - name
 
- data: array-like
- coords: dict of str to array-like / DataArray
- dims: sequence (tuple / list) of hashable (mostly str)
- name: hashable (mostly str)
- attrs: dict (arbitrary dict)

**Todo**: dtypes?

To programmatically create a `DataArray`, we use its constructor:
```python
xr.DataArray(data, coords=None, dims=None, name=None, attrs=None)
```
At a minimum we need `data`, which can be any object that implements the interface of a `numpy` array (`numpy`, `dask`, `sparse` (WIP), `pint` (WIP), etc.). As an example, let's create a `DataArray` with two dimensions from a `numpy` array:

In [None]:
da = xr.DataArray(np.ones((3, 4)), dims=("x", "y"), name="a")
da

**Todo**: explain the HTML/text repr in depth

The representation of the new array (its `repr`) consists of:
- the name of the `DataArray` (`'a'`). If we don't provide a name, this will be omitted.
- the dimensions of the array `(x: 3, y: 4)`: this tells us that the first dimension is named `x` and has a size of `3` while the second dimension is named `y` and has a size of `4`
- a preview of the data
- a list of coordinates
- a list of attributes
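
Each part of the repr can also be read back from the object. A small sketch, rebuilding the same array as above so that the snippet is self-contained:

```python
import numpy as np
import xarray as xr

da = xr.DataArray(np.ones((3, 4)), dims=("x", "y"), name="a")

print(da.name)          # name of the array
print(da.dims)          # dimension names, in order
print(dict(da.sizes))   # mapping of dimension name to length
print(da.values.shape)  # shape of the underlying numpy data
print(list(da.coords))  # names of the coordinates
print(da.attrs)         # attribute dict
```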

Since we didn't provide them, these dimensions don't have coordinates and there are no attributes. If we want to attach coordinates and/or attributes, we can do that with the `coords` and `attrs` parameters:

In [None]:
xr.DataArray(
    np.ones((3, 4)),
    dims=("x", "y"),
    coords={"x": ["a", "b", "c"], "y": np.arange(4), "u": ("x", np.arange(3), {"attr1": 0})},
    attrs={"attribute": "string", "flag": 1},
)

With the values passed to `coords`, we attached coordinate values (labels) to the `x` and `y` dimensions and also created a non-dimension coordinate named `u` using the tuple syntax. That syntax is a shortcut and is roughly equivalent to
```python
data = ("x", np.arange(3), {"attr1": 0})
xr.DataArray(**dict(zip(["dims", "data", "attrs"], data)))
```
so we can also add `attrs` to the coordinate. Note: using `{"y": np.arange(4)}` has the same result as `{"y": ("y", np.arange(4))}`.
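
The equivalence in the note can be checked directly. The following sketch builds the same array twice, once with the shorthand and once with the tuple syntax, and compares the results with `identical`, which also compares names and attributes:

```python
import numpy as np
import xarray as xr

data = np.ones((3, 4))

# shorthand: the coordinate is assumed to lie along the dimension of the same name
short = xr.DataArray(data, dims=("x", "y"), coords={"y": np.arange(4)})

# tuple syntax: (dims, data[, attrs]) names the dimension explicitly
explicit = xr.DataArray(data, dims=("x", "y"), coords={"y": ("y", np.arange(4))})

print(short.identical(explicit))

# the tuple syntax lets us place `u` along `x` even though the names differ
with_u = xr.DataArray(
    data,
    dims=("x", "y"),
    coords={"u": ("x", np.arange(3), {"attr1": 0})},
)
print(with_u.coords["u"].dims, with_u.coords["u"].attrs)
```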

Since `attrs` is a normal Python `dict`, there is no restriction on its keys or values. By convention, however, large arrays should not be stored as attribute values. Instead, use coordinates or a data variable in a `Dataset`.

- construction and repr
  - numeric types (bool, int, float, complex)
  - strings
  - datetime / cftime
  - object
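
As a starting point for the items above, this sketch constructs one small `DataArray` per kind of value and prints the resulting dtype. `cftime` is left out because it is an optional dependency, and the exact string and datetime dtypes depend on the platform and library versions.

```python
import numpy as np
import pandas as pd
import xarray as xr

examples = {
    "bool": xr.DataArray(np.array([True, False]), dims=("x",)),
    "int": xr.DataArray(np.arange(3), dims=("x",)),
    "float": xr.DataArray(np.linspace(0, 1, 3), dims=("x",)),
    "complex": xr.DataArray(np.array([1 + 2j, 3 - 1j]), dims=("x",)),
    "string": xr.DataArray(np.array(["a", "bc"]), dims=("x",)),
    "datetime": xr.DataArray(pd.date_range("2000-01-01", periods=3), dims=("x",)),
    "object": xr.DataArray(np.array([{"k": 1}, None], dtype=object), dims=("x",)),
}

# print the dtype that ends up on each array
for label, arr in examples.items():
    print(f"{label:>8}: {arr.dtype}")
```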

## Dataset

- collection of multiple `DataArray` objects
- construction / repr
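
A minimal sketch of what such a construction could look like, using the same tuple syntax as for coordinates above. The variable names `a` and `b` and the `title` attribute are made up for illustration.

```python
import numpy as np
import xarray as xr

ds = xr.Dataset(
    data_vars={
        "a": (("x", "y"), np.ones((3, 4))),  # two-dimensional variable
        "b": ("x", np.arange(3)),            # one-dimensional variable along x
    },
    coords={"x": ["a", "b", "c"], "y": np.arange(4)},
    attrs={"title": "example dataset"},
)
print(ds)

# each data variable comes back as a DataArray that shares the dataset's coordinates
print(type(ds["a"]).__name__, ds["b"].dims)
```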