# gridded_data_tutorial
## Notebook 2
Waterhackweek 2020
Steven Pestana (spestana@uw.edu)
***

### xarray

The [xarray](https://xarray.pydata.org/) library lets us read, manipulate, and create **labeled** multi-dimensional arrays and datasets, such as those stored in NetCDF files.



In [53]:
import xarray as xr

---
#### DataArrays
Similar to the `numpy.ndarray` object, the `xarray.DataArray` is a multi-dimensional array, with the addition of labeled dimensions, coordinates, and other metadata. A DataArray contains the following: 
* `values` stores the actual data in a `numpy.ndarray`
* `dims` holds the names for each dimension of the `values` array
* `coords` holds arrays of labels for the points along each dimension
* `attrs` is a [dictionary](https://docs.python.org/3/tutorial/datastructures.html#dictionaries) that can contain additional metadata
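
To make these four parts concrete, here is a minimal sketch that builds a tiny DataArray and inspects each one. The name `tiny`, the dimension names `row` and `column`, and all of the values are made up for illustration only.

```python
import numpy as np
import xarray as xr

# Hypothetical 2x3 array, used only to show the four parts of a DataArray
tiny = xr.DataArray(
    np.array([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]),
    dims=['row', 'column'],
    coords={'row': ['a', 'b'], 'column': [10, 20, 30]},
    attrs={'units': 'none'},
)

print(type(tiny.values))          # the underlying numpy.ndarray
print(tiny.dims)                  # names of the dimensions
print(tiny.coords['row'].values)  # labels along the 'row' dimension
print(tiny.attrs)                 # free-form metadata dictionary
```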

Let's create some fake streamflow data for three locations to see how these different parts work together to form a DataArray.
(I'm going to use a NumPy function to generate some random numbers that are [normally distributed](https://numpy.org/devdocs/reference/random/generated/numpy.random.normal.html).)

In [36]:
import numpy as np
import pandas as pd

In [50]:
# randomly generated annual peak streamflow data for three locations
means = [1000, 1400, 900]
standard_deviations = [75, 25, 50] 
samples = (100, 3)
peak_flows = np.random.normal(means, standard_deviations, samples)
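
The call above relies on NumPy broadcasting: the three means and three standard deviations are matched against the last axis of the requested shape, so each column gets its own distribution. As a rough check, the sketch below draws a larger sample and compares the per-column statistics with the requested ones. The fixed seed, the sample size of 10000, and the name `check` are assumptions made for this illustration.

```python
import numpy as np

np.random.seed(0)  # fixed seed (an assumption) so the check is repeatable

means = [1000, 1400, 900]
standard_deviations = [75, 25, 50]

# A larger sample than the tutorial's, so the column statistics settle down
check = np.random.normal(means, standard_deviations, (10000, 3))

print(check.shape)
print(check.mean(axis=0).round())  # per-column means, close to `means`
print(check.std(axis=0).round())   # per-column spreads, close to `standard_deviations`
```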

`peak_flows` will be the `values` within the DataArray. It is a two-dimensional array, and we've given it a shape of 100x3. The two dimensions will need names (`dims`) and labels (`coords`):

In [51]:
# Name the two dimensions 'time' and 'location'; they correspond to the axes of length 100 and 3, respectively
dimensions = ['time', 'location']

# Create coordinates for each dimension, starting with 100 year-end timestamps beginning in 1920
# (on pandas 2.2 and later, use freq='YE'; the 'Y' alias is deprecated there)
times = pd.date_range('1920', periods=100, freq='1Y')

# Now create the 3 location coordinates
locations = ['stream_gage_1', 'stream_gage_2', 'stream_gage_3']

Finally, we can add additional metadata with a dictionary of attributes:

In [55]:
metadata = {'units': 'cfs'}

Now that we have all the individual components of an xarray DataArray, we can create it:

In [56]:
streamflows = xr.DataArray(peak_flows, coords=[times, locations], dims=dimensions, attrs=metadata)

In [58]:
print(streamflows)

<xarray.DataArray (time: 100, location: 3)>
array([[ 952.0906023 , 1431.06694294,  860.645647  ],
       [1021.89999011, 1370.85189963,  876.60173413],
       [1040.87680834, 1414.85241515,  894.03937292],
       [1031.58207913, 1395.1267001 ,  881.55772242],
       [ 807.59039052, 1380.09095905,  858.24817032],
       [ 997.85024057, 1387.96246905,  885.58570992],
       [1000.57896021, 1408.84022835,  950.00658203],
       [1077.16044773, 1409.30821636,  883.09317131],
       [1070.75946152, 1436.5050066 ,  888.89508896],
       [ 980.30864045, 1420.09782757,  879.12252561],
       [1023.67151065, 1382.50206716,  944.88053445],
       [ 974.92032132, 1393.17320554,  928.04696054],
       [ 950.65899954, 1431.23660939,  863.91453737],
       [1024.17277381, 1375.73180637,  957.19394685],
       [ 764.68262668, 1381.81182503,  871.21660707],
       [ 974.15475533, 1350.93266743,  931.60726393],
       [1089.73269916, 1364.04745516,  854.12311412],
       [1104.34783729, 1408.20070284, 
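
With dimensions and coordinates attached, data can be selected by label instead of by integer position. The sketch below rebuilds a DataArray shaped like `streamflows` so it runs on its own. The name `flows` is hypothetical, and the timestamps are generated with `pd.DateOffset(years=1)` (an assumption chosen to avoid version-specific frequency aliases), so this is one possible approach rather than the notebook's own code.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Rebuild a DataArray shaped like `streamflows` so this block is self-contained
times = pd.date_range('1920-12-31', periods=100, freq=pd.DateOffset(years=1))
locations = ['stream_gage_1', 'stream_gage_2', 'stream_gage_3']
flows = xr.DataArray(
    np.random.normal([1000, 1400, 900], [75, 25, 50], (100, 3)),
    coords=[times, locations],
    dims=['time', 'location'],
    attrs={'units': 'cfs'},
)

# Select one gage by its label rather than by its column index
gage_2 = flows.sel(location='stream_gage_2')
print(gage_2.dims, gage_2.shape)

# Select a range of years with a label-based slice (both ends are included)
twenties = flows.sel(time=slice('1920', '1929'))
print(twenties.sizes['time'])

# Reduce over a named dimension instead of an axis number
print(flows.mean(dim='time').values)
```

Referring to `dim='time'` rather than `axis=0` keeps the code readable even if the order of the dimensions changes later.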

---
#### Datasets

Create another DataArray with annual min/max water temperature...

Then combine and create a Dataset
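
One way these two steps might look is sketched below. The temperature means and spreads, the `degC` units, and the variable names `peak_flow`, `max_water_temp`, and `min_water_temp` are all assumptions made for illustration; the timestamps use `pd.DateOffset(years=1)` to avoid version-specific frequency aliases.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Shared coordinates, so every variable lines up on the same time/location grid
times = pd.date_range('1920-12-31', periods=100, freq=pd.DateOffset(years=1))
locations = ['stream_gage_1', 'stream_gage_2', 'stream_gage_3']
dims = ['time', 'location']

peak_flow = xr.DataArray(
    np.random.normal([1000, 1400, 900], [75, 25, 50], (100, 3)),
    coords=[times, locations], dims=dims, attrs={'units': 'cfs'},
)

# Hypothetical annual water temperatures; the numbers are invented
max_temp = xr.DataArray(
    np.random.normal(18, 2, (100, 3)),
    coords=[times, locations], dims=dims, attrs={'units': 'degC'},
)
min_temp = xr.DataArray(
    np.random.normal(2, 1, (100, 3)),
    coords=[times, locations], dims=dims, attrs={'units': 'degC'},
)

# A Dataset is a dict-like collection of DataArrays that share coordinates
dataset = xr.Dataset({
    'peak_flow': peak_flow,
    'max_water_temp': max_temp,
    'min_water_temp': min_temp,
})

print(dataset)
print(dataset['max_water_temp'].attrs)
```

Each variable keeps its own `attrs`, so the flow and temperature units travel with the data they describe.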

Plot the fake data we just made
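
A possible plotting sketch follows, using xarray's built-in matplotlib wrapper to draw one line per location. It assumes matplotlib is installed; the `Agg` backend and the output file name `peak_flows.png` are choices made so the block runs without a display, and in a notebook the `matplotlib.use` line can be dropped.

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend (an assumption) for headless runs
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import xarray as xr

# Rebuild fake peak-flow data so this block is self-contained
times = pd.date_range('1920-12-31', periods=100, freq=pd.DateOffset(years=1))
locations = ['stream_gage_1', 'stream_gage_2', 'stream_gage_3']
flows = xr.DataArray(
    np.random.normal([1000, 1400, 900], [75, 25, 50], (100, 3)),
    coords=[times, locations],
    dims=['time', 'location'],
    attrs={'units': 'cfs'},
)

fig, ax = plt.subplots(figsize=(8, 4))

# With a 2D DataArray, x='time' puts time on the x-axis and draws one line per location
lines = flows.plot.line(x='time', ax=ax)
ax.set_ylabel('Annual peak streamflow (cfs)')

fig.savefig('peak_flows.png')  # hypothetical output file name
```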