# gridded_data_tutorial
## Notebook 2
Waterhackweek 2020
Steven Pestana (spestana@uw.edu)
***

### xarray

The [xarray](https://xarray.pydata.org/) library allows us to read, manipulate, and create **labeled** multi-dimensional arrays and datasets, such as NetCDF files.



In [1]:
import xarray as xr

---
#### DataArrays
Similar to the `numpy.ndarray` object, the `xarray.DataArray` is a multi-dimensional array with the addition of labeled dimensions, coordinates, and other metadata. A DataArray contains the following:
* `values`: the actual data values, stored in a `numpy.ndarray`
* `dims`: the name of each dimension of the `values` array
* `coords`: arrays of labels for the points along each dimension
* `attrs`: a [dictionary](https://docs.python.org/3/tutorial/datastructures.html#dictionaries) that can hold additional metadata
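To see where each of these four pieces lives, here is a minimal sketch using a tiny 2 x 2 array. The dimension names, labels, and values are placeholders chosen only for illustration.

```python
import numpy as np
import xarray as xr

# A tiny 2 x 2 DataArray, just to show where each component lives
da = xr.DataArray(
    np.array([[1.0, 2.0], [3.0, 4.0]]),                     # values
    dims=["time", "location"],                              # names of the two dimensions
    coords={"time": [2000, 2001], "location": ["a", "b"]},  # labels along each dimension
    attrs={"units": "cfs"},                                 # free-form metadata
)

print(type(da.values))                 # <class 'numpy.ndarray'>
print(da.dims)                         # ('time', 'location')
print(da.coords["location"].values)    # ['a' 'b']
print(da.attrs)                        # {'units': 'cfs'}
```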

Let's create some fake streamflow data for three locations to see how these different parts work together to form a DataArray. I'm going to use a NumPy function to generate some random numbers that are [normally distributed](https://numpy.org/devdocs/reference/random/generated/numpy.random.normal.html).

In [2]:
import numpy as np
import pandas as pd

In [3]:
# randomly generated annual peak streamflow data for three locations
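means = [1000, 1400, 900]            # mean peak flow (cfs) at each location
standard_deviations = [75, 25, 50]   # spread of peak flows at each location
samples = (100, 3)                   # 100 years (rows) x 3 locations (columns)
peak_flows = np.random.normal(means, standard_deviations, samples)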

`peak_flows` will be the `values` within the DataArray. It is a two-dimensional array, and we've given it a shape of 100x3. The two dimensions will need names (`dims`) and labels (`coords`):
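`np.random.normal` broadcasts `means` and `standard_deviations` (each of length 3) across the last axis of the `(100, 3)` output, so each column is drawn with its own mean and spread. The sketch below checks that, swapping in NumPy's newer `np.random.default_rng` Generator with an arbitrary seed (42) so the numbers are reproducible; it is an alternative to, not a reproduction of, the call above.

```python
import numpy as np

means = [1000, 1400, 900]
standard_deviations = [75, 25, 50]

# A seeded Generator makes the "random" data reproducible between runs
rng = np.random.default_rng(seed=42)

# loc and scale have length 3, so they broadcast across the last axis of
# size=(100, 3): every row is one year, every column one gage with its own mean/std
peak_flows = rng.normal(means, standard_deviations, size=(100, 3))

print(peak_flows.shape)                  # (100, 3)
print(peak_flows.mean(axis=0).round())   # column means land close to the requested means
```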

In [4]:
# Name the two dimensions: 'time' (length 100) and 'location' (length 3)
dimensions = ['time', 'location']

# Create the coordinate labels for each dimension, starting with one timestamp per year for 100 years
# ('YS' = year start; the older 'Y'/'1Y' alias is deprecated in recent pandas versions)
times = pd.date_range('1920', periods=100, freq='YS')

# Now create the 3 location coordinates
locations = ['stream_gage_1', 'stream_gage_2', 'stream_gage_3']

Finally, we can add additional metadata with a dictionary of attributes:

In [5]:
metadata = {'units': 'cfs'}
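Because `attrs` is an ordinary Python dictionary, it can be read and extended after the DataArray exists. In this sketch the `long_name` entry is an assumption added for illustration; xarray's plotting functions use `long_name` and `units` from `attrs` when labeling axes.

```python
import numpy as np
import xarray as xr

da = xr.DataArray(np.zeros(3), dims=["location"], attrs={"units": "cfs"})

# attrs is a plain dict, so it can be read and extended in place
da.attrs["long_name"] = "annual peak streamflow"   # hypothetical extra attribute

print(da.attrs["units"])   # cfs
print(da.attrs)            # {'units': 'cfs', 'long_name': 'annual peak streamflow'}
```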

Now that we have all the individual components of an xarray DataArray, we can create it:

In [6]:
streamflows = xr.DataArray(peak_flows, coords=[times, locations], dims=dimensions, attrs=metadata)
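The `coords` argument also accepts a dictionary keyed by dimension name, which makes the pairing between each dimension and its labels explicit. This sketch rebuilds a similar DataArray that way. The seed (0) and the `'YS'` frequency alias for `pd.date_range` (timestamps on January 1 of each year) are assumptions made for this sketch.

```python
import numpy as np
import pandas as pd
import xarray as xr

peak_flows = np.random.default_rng(0).normal([1000, 1400, 900], [75, 25, 50], (100, 3))
times = pd.date_range("1920", periods=100, freq="YS")
locations = ["stream_gage_1", "stream_gage_2", "stream_gage_3"]

# coords as a dict keyed by dimension name; each coordinate's length
# must match the size of its dimension in peak_flows (100 and 3)
streamflows = xr.DataArray(
    peak_flows,
    coords={"time": times, "location": locations},
    dims=["time", "location"],
    attrs={"units": "cfs"},
)
print(streamflows.sizes)
```

Passing a list (as in the cell above) relies on the order of `dims`; the dictionary form ties each label array to its dimension by name.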

In [7]:
print(streamflows)

<xarray.DataArray (time: 100, location: 3)>
array([[1123.19752051, 1414.04594808,  882.44324073],
       [1066.23073385, 1385.66419872,  934.26305981],
       [1031.09640144, 1377.43468574,  943.81916925],
       [1038.67501424, 1423.0442246 ,  929.89624489],
       [ 995.68082827, 1381.67795479,  912.94969588],
       [ 947.69859625, 1373.65120682,  786.83608843],
       [ 961.90041982, 1367.16983394,  930.96421909],
       [ 976.72207517, 1407.72711207,  971.61698371],
       [1110.13231249, 1428.66897028,  934.79684877],
       [ 944.37080057, 1441.71099671,  939.17479451],
       [ 951.56698795, 1424.23373227,  847.67218779],
       [ 992.57462141, 1401.15288982,  855.20574412],
       [ 993.2243375 , 1413.75964515,  897.95681447],
       [1040.10475581, 1383.32203644,  982.93808805],
       [ 966.39113157, 1390.89611373,  945.52019244],
       [1068.77985336, 1373.26651812,  735.70941063],
       [ 924.31771482, 1452.71938383,  978.66752503],
       [ 987.07045217, 1368.6620468 , 
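With labeled dimensions, we can select and reduce data by name instead of by integer position. Here is a minimal sketch that rebuilds a `streamflows`-like array so the cell runs on its own; it is seeded, so its values differ from the printout above.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Rebuild a streamflows-like DataArray so this cell is self-contained
rng = np.random.default_rng(0)
streamflows = xr.DataArray(
    rng.normal([1000, 1400, 900], [75, 25, 50], (100, 3)),
    coords={"time": pd.date_range("1920", periods=100, freq="YS"),
            "location": ["stream_gage_1", "stream_gage_2", "stream_gage_3"]},
    dims=["time", "location"],
    attrs={"units": "cfs"},
)

# Select by label instead of by integer position
gage_2 = streamflows.sel(location="stream_gage_2")        # 1-D, along time only
decade = streamflows.sel(time=slice("1950", "1959"))      # label slice: years 1950 through 1959
first_ten = streamflows.isel(time=slice(0, 10))           # positional selection still works

# Reduce over a named dimension rather than an axis number
mean_per_gage = streamflows.mean(dim="time")

print(gage_2.dims, decade.sizes["time"], first_ten.sizes["time"])
print(mean_per_gage.values.round())
```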

---
#### Datasets

Create another DataArray with annual min/max water temperature...

Then combine the DataArrays into a Dataset.
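The notebook only outlines this step, so the following is one possible sketch. The water temperature numbers are entirely hypothetical (annual minimum around 4 °C, maximum 10 to 15 °C warmer), and storing min and max along an extra `statistic` dimension is an assumed design; two separate variables would work just as well.

```python
import numpy as np
import pandas as pd
import xarray as xr

rng = np.random.default_rng(0)
times = pd.date_range("1920", periods=100, freq="YS")
locations = ["stream_gage_1", "stream_gage_2", "stream_gage_3"]

streamflows = xr.DataArray(
    rng.normal([1000, 1400, 900], [75, 25, 50], (100, 3)),
    coords={"time": times, "location": locations},
    dims=["time", "location"],
    attrs={"units": "cfs"},
)

# HYPOTHETICAL water temperatures (degC): annual minimum and maximum per gage
temp_min = rng.normal(4.0, 1.0, (100, 3))
temp_max = temp_min + rng.uniform(10.0, 15.0, (100, 3))   # keeps max above min

water_temp = xr.DataArray(
    np.stack([temp_min, temp_max], axis=-1),   # shape (100, 3, 2)
    coords={"time": times, "location": locations, "statistic": ["min", "max"]},
    dims=["time", "location", "statistic"],
    attrs={"units": "degC"},
)

# A Dataset holds several DataArrays that share some or all of their dimensions
ds = xr.Dataset({"peak_flow": streamflows, "water_temp": water_temp})
print(ds)
```

Both variables share the `time` and `location` coordinates, so selections such as `ds.sel(location="stream_gage_1")` apply to both at once.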

Plot the fake data we just made.
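One way to plot the fake streamflow data, as a sketch: `DataArray.plot.line` with `x="time"` draws one line per location and labels the axes from the coordinates and `attrs`. The `Agg` backend line is only there so the code runs outside a notebook; in Jupyter you can drop it and the figure appears inline. The `name` and `long_name` values are assumptions added for nicer labels.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so this runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import xarray as xr

rng = np.random.default_rng(0)
streamflows = xr.DataArray(
    rng.normal([1000, 1400, 900], [75, 25, 50], (100, 3)),
    coords={"time": pd.date_range("1920", periods=100, freq="YS"),
            "location": ["stream_gage_1", "stream_gage_2", "stream_gage_3"]},
    dims=["time", "location"],
    name="peak_flow",                                                # hypothetical name
    attrs={"long_name": "annual peak streamflow", "units": "cfs"},  # long_name is hypothetical
)

fig, ax = plt.subplots(figsize=(10, 4))
# x="time" puts years on the x-axis; the remaining dimension (location)
# becomes the hue, giving one line per gage plus a legend
lines = streamflows.plot.line(x="time", ax=ax)
```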