# Xhistogram Tutorial

Histograms are the foundation of many forms of data analysis.
The goal of xhistogram is to make it easy to calculate weighted histograms in multiple dimensions over n-dimensional arrays, with control over the axes.
Xhistogram builds on top of xarray, for automatic coordinates and labels, and dask, for parallel scalability.

## Toy Data

We start by showing an example with toy data. First we use xarray to create some random, normally distributed data.

### 1D Histogram

In [None]:
import xarray as xr
import numpy as np
%matplotlib inline

nt, nx = 100, 30
da = xr.DataArray(np.random.randn(nt, nx), dims=['time', 'x'],
                  name='foo') # all inputs need a name
display(da)
da.plot()

By default xhistogram operates on all dimensions of an array, just like numpy. However, it operates on xarray DataArrays, taking labels into account.

In [None]:
from xhistogram.xarray import histogram

bins = np.linspace(-4, 4, 20)
h = histogram(da, bins=[bins])
display(h)
h.plot()

**TODO:**
- `bins` needs to be a list; this is annoying, and it would be good to accept single items.
- The `foo_bin` coordinate is the estimated bin center, not the bounds. We need to add the bounds to the coordinates, but we cannot as long as we are returning a DataArray and not a Dataset.

Both of the above need GitHub issues.
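
For intuition, the all-dimensions case can be cross-checked against plain NumPy. The sketch below is not xhistogram code: it assumes the default behaviour corresponds to `np.histogram` on the flattened values, and it computes bin centers as midpoints of the edges, which is how the `foo_bin` coordinate is described above. The names `data`, `counts`, `edges`, and `centers` are illustrative, and edge-handling details may differ between the two libraries.

```python
import numpy as np

rng = np.random.default_rng(0)
nt, nx = 100, 30
data = rng.standard_normal((nt, nx))   # stand-in for the values of `da`

bins = np.linspace(-4, 4, 20)          # 20 edges define 19 bins

# Histogram over all dimensions: flatten, then count.
counts, edges = np.histogram(data.ravel(), bins=bins)

# Bin centers are the midpoints between consecutive edges.
centers = 0.5 * (edges[:-1] + edges[1:])
```

Note that 20 bin edges produce 19 bins, so `counts` and `centers` both have length 19, while the bounds would need length 20.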

### Histogram over a single axis

In [None]:
h_x = histogram(da, bins=[bins], dim=['time'])
h_x.plot()

**TODO:**
- Relax / explain the requirement that `dim` is always a list

In [None]:
h_x.mean(dim='x').plot()
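
To make the reduction explicit, here is a minimal NumPy sketch of what histogramming over `time` only amounts to: one histogram per `x` position, stacked together. It is an illustration under the assumption that `dim=['time']` reduces just that axis; it is not how xhistogram is implemented, and the variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
nt, nx = 100, 30
data = rng.standard_normal((nt, nx))   # axes: (time, x)
bins = np.linspace(-4, 4, 20)

# One histogram per x position, counting only along the time axis (axis 0).
h_x = np.stack([np.histogram(data[:, i], bins=bins)[0] for i in range(nx)])

# Averaging over x gives the mean count per bin.
h_mean = h_x.mean(axis=0)
```

Here `h_x` keeps the `x` axis and gains a bin axis, so its shape is `(nx, len(bins) - 1)`.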

### Weighted Histogram

Weights can be the same shape as the input:

In [None]:
weights = 0.4 * xr.ones_like(da)
histogram(da, bins=[bins], weights=weights)

Or the weights can have fewer dimensions and rely on xarray broadcasting:

In [None]:
weights = 0.2 * xr.ones_like(da.x)
histogram(da, bins=[bins], weights=weights)
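
The two weighting styles can be sketched in plain NumPy as follows. This is an illustrative equivalent rather than xhistogram's own code: it assumes that weights defined only along `x` are broadcast against the data before counting, and `np.broadcast_to` stands in for xarray's label-based broadcasting.

```python
import numpy as np

rng = np.random.default_rng(0)
nt, nx = 100, 30
data = rng.standard_normal((nt, nx))   # axes: (time, x)
bins = np.linspace(-4, 4, 20)

# Unweighted counts, for comparison.
counts, _ = np.histogram(data.ravel(), bins=bins)

# Weights with the same shape as the input.
w_full = 0.4 * np.ones_like(data)
h_full, _ = np.histogram(data.ravel(), bins=bins, weights=w_full.ravel())

# Weights defined only along x, broadcast along time before counting.
w_x = 0.2 * np.ones(nx)
w_broadcast = np.broadcast_to(w_x, data.shape)
h_broadcast, _ = np.histogram(data.ravel(), bins=bins,
                              weights=w_broadcast.ravel())
```

Because the weights are constant in both cases, each weighted histogram is just the unweighted counts scaled by 0.4 or 0.2.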

### 2D Histogram

Now let's say we have multiple input arrays. We can calculate their joint distribution:

In [None]:
db = xr.DataArray(np.random.randn(nt, nx), dims=['time', 'x'],
                  name='bar') - 2

histogram(da, db, bins=[bins, bins]).plot()
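
For comparison, a joint distribution over all dimensions can be sketched with `np.histogram2d` on the flattened arrays. This assumes the two-input call above corresponds to a 2D histogram of paired values; the arrays `a` and `b` are illustrative stand-ins for `da` and `db`.

```python
import numpy as np

rng = np.random.default_rng(0)
nt, nx = 100, 30
a = rng.standard_normal((nt, nx))        # stand-in for `da`
b = rng.standard_normal((nt, nx)) - 2    # stand-in for `db`, shifted by -2
bins = np.linspace(-4, 4, 20)

# Joint histogram of paired values; the first axis bins `a`, the second bins `b`.
h2d, a_edges, b_edges = np.histogram2d(a.ravel(), b.ravel(), bins=[bins, bins])

# Marginal distribution of `b`, obtained by summing over the `a` bins.
b_marginal = h2d.sum(axis=0)
b_centers = 0.5 * (b_edges[:-1] + b_edges[1:])
```

Since `b` is shifted by -2, its marginal peaks in the negative half of the bin range, and any pairs where `b` falls below -4 are excluded from the counts.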

## Real Data

TODO

## Dask Integration

This should just work, but examples are still needed.