### Tutorials 1 & 2: Understanding DataArray and Dataset objects and their manipulation

Xarray is an open-source Python package for working with labeled multi-dimensional arrays, and it is widely used for analyzing climate datasets. It has two core data structures:

1. DataArray: an N-dimensional array (not limited to 3D) whose dimensions have names. It can also carry coordinate labels along each dimension and metadata attributes such as units or a description of the contents.

2. Dataset: a dict-like container of DataArrays, similar to a pandas DataFrame in which each column is a DataArray.

[Source tutorial (W1D1 Tutorial 1)](https://comptools.climatematch.io/tutorials/W1D1_ClimateSystemOverview/student/W1D1_Tutorial1.html)
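To make the DataFrame analogy concrete, here is a minimal sketch (the variable names and values are made up for illustration) that builds a tiny Dataset and flattens it with `to_dataframe()`. Each variable becomes a column and each combination of coordinate values becomes a row.

```python
import numpy as np
import xarray as xr

# Shared coordinates for two illustrative variables (values are made up).
coords = {"lat": [-45.0, 45.0], "lon": [-30.0, 0.0, 30.0]}

temperature = xr.DataArray(
    283 + np.arange(6.0).reshape(2, 3), dims=["lat", "lon"], coords=coords
)
pressure = xr.DataArray(np.ones((2, 3)), dims=["lat", "lon"], coords=coords)

ds = xr.Dataset({"temperature": temperature, "pressure": pressure})

# One column per variable, one row per (lat, lon) pair.
df = ds.to_dataframe()
print(df.shape)
print(list(df.columns))
```

The analogy is loose: unlike DataFrame columns, the variables in a Dataset keep their N-dimensional shape until they are flattened like this.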

In [1]:
import numpy as np
import pandas as pd
import xarray as xr
import matplotlib.pyplot as plt
import seaborn as sns

#### DataArray object

In [2]:
tempdat = 283 + np.random.randn(3,2,4,2)
# tempdat

In [3]:
temperature = xr.DataArray(tempdat, dims = ["date", "lat", "lon", "height"])
temperature

In [4]:
temperature.shape

(3, 2, 4, 2)

In [5]:
times = pd.date_range("2020-05-22", periods=temperature.shape[0])
lat = np.linspace(-45, 45, temperature.shape[1])
lon = np.linspace(-45, 45, temperature.shape[2])
height = np.linspace(10, 30, temperature.shape[3])

In [6]:
temperature = xr.DataArray(tempdat, coords=[times, lat, lon, height], 
                           dims =["date", "lat", "lon", "height"])
temperature.attrs["units"] = "Kelvin"
temperature.attrs["name"] = "temperature"
temperature
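A side note on metadata: `attrs["name"]` is just a free-form attribute, while a DataArray also has a built-in `name` property that xarray itself uses (for example as the variable name when converting to a Dataset). A small sketch with made-up data, passing both at construction time:

```python
import numpy as np
import pandas as pd
import xarray as xr

data = 283 + np.random.randn(3, 2)
times = pd.date_range("2020-05-22", periods=3)

da = xr.DataArray(
    data,
    dims=["date", "lat"],
    coords={"date": times, "lat": [-45.0, 45.0]},
    name="temperature",         # built-in name, reused by xarray
    attrs={"units": "Kelvin"},  # free-form metadata
)

print(da.name)
print(da.dims)
print(da.attrs)

# The built-in name becomes the variable name in the resulting Dataset.
as_dataset = da.to_dataset()
print(list(as_dataset.data_vars))
```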

#### Manipulating a DataArray does NOT preserve attributes

In [7]:
temperature = temperature * 2  # compare the attrs of the result with the cell above
temperature
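If the attributes should survive, two options I know of are `xr.set_options(keep_attrs=True)` and re-attaching them with `assign_attrs()`. This is a sketch on a toy array, not part of the tutorial, and the default behaviour may differ between xarray versions.

```python
import numpy as np
import xarray as xr

kelvin = xr.DataArray(
    np.array([283.0, 284.5]),
    dims=["x"],
    attrs={"units": "Kelvin", "long_name": "temperature"},
)

# Option 1: carry attributes through operations inside this block.
with xr.set_options(keep_attrs=True):
    shifted = kelvin + 1.0

# Option 2: set the attributes explicitly on the result, which also lets us
# correct metadata that the operation has made stale (here, the units).
celsius = (kelvin - 273.15).assign_attrs(units="degC", long_name="temperature")

print(shifted.attrs)
print(celsius.attrs)
```

Option 2 is usually the safer choice when the operation changes what the numbers mean, since blindly keeping `units` after a conversion would leave wrong metadata behind.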

#### Dataset object
A Dataset is a container for DataArrays that typically share coordinates. Contrary to what the tutorial says, the coordinates do not have to be identical across DataArrays to construct a Dataset. I imagine, though, that a Dataset is of limited use when its DataArrays differ in their coordinates.

In [8]:
temperature.coords

Coordinates:
  * date     (date) datetime64[ns] 2020-05-22 2020-05-23 2020-05-24
  * lat      (lat) float64 -45.0 45.0
  * lon      (lon) float64 -45.0 -15.0 15.0 45.0
  * height   (height) float64 10.0 30.0

In [9]:
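# Reuse temperature's coordinates; dims are passed explicitly so the axis
# order does not depend on how xarray infers it from the coords mapping.
pressure = xr.DataArray(np.random.normal(1, 0.01, size=temperature.shape),
                        coords=temperature.coords, dims=temperature.dims)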
pressure.attrs["units"] = "atm"
pressure.attrs["name"] = "pressure"
pressure

In [10]:
info = xr.Dataset(data_vars = {"Temperature": temperature, "Pressure": pressure})
info

#### Can individual DataArrays in a Dataset have different coordinates and/or dimensions?

In [11]:
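# No dims or coords are given, so xarray names the dimensions dim_0, dim_1, dim_2
humidity = xr.DataArray(np.random.normal(50, 10, size=(3, 2, 3)))
# The assignment works: the Dataset simply gains three new dimensions
info["humidity"] = humidity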

In [12]:
info
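So the answer is yes. As far as I understand, the `xr.Dataset` constructor aligns its inputs with an outer join: variables on the same dimension but with different coordinate labels are padded with NaN, and variables with unrelated dimensions just sit side by side. A toy sketch (names `a`, `b`, `c` are made up):

```python
import numpy as np
import xarray as xr

a = xr.DataArray([1.0, 2.0, 3.0], dims=["x"], coords={"x": [0, 1, 2]})
b = xr.DataArray([10.0, 20.0, 30.0], dims=["x"], coords={"x": [1, 2, 3]})
c = xr.DataArray(np.zeros((2, 2)), dims=["y", "z"])  # unrelated dimensions

ds = xr.Dataset({"a": a, "b": b, "c": c})

print(ds["x"].values)  # union of the two x coordinates
print(ds["a"].values)  # NaN where `a` has no value
print(ds["b"].values)  # NaN where `b` has no value
print(dict(ds.sizes))
```

This is why differing coordinates limit the usefulness of a Dataset: every mismatch turns into missing values after alignment.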

#### Data selection and slicing

NumPy-style positional indexing works, but xarray also offers label-based selection with `sel()` and `.loc[]`, and interpolation between coordinate values with `interp()`.

In [13]:
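# Position 1 along the first dimension ("date"), i.e. 2020-05-23; the trailing
# "height" dimension is not indexed and is kept whole
first_array = temperature[1, :, :]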
first_array
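Positional indexing depends on knowing the axis order. `isel()` does the same integer-based selection but by dimension name, as in this small sketch on a made-up array:

```python
import numpy as np
import xarray as xr

da = xr.DataArray(np.arange(24.0).reshape(3, 2, 4), dims=["date", "lat", "lon"])

by_position = da[1, :, :]  # relies on "date" being the first axis
by_name = da.isel(date=1)  # same selection, independent of axis order

print(by_position.identical(by_name))
print(by_name.dims)

# The named form still works after the axes are reordered.
reordered = da.transpose("lat", "lon", "date")
print(reordered.isel(date=1).equals(by_name))
```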

In [16]:
first_array = temperature.sel(date="2020-05-23")
first_array

In [19]:
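# method="nearest" picks the closest existing label; it does not interpolate
# (interp() is the method that computes values between grid points)
nearest_temp = temperature.sel(date="2020-05-24", method="nearest")
nearest_temp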
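`method="nearest"` is more interesting when the requested label is not on the grid. A sketch with a made-up longitude axis; the `tolerance` argument limits how far away the nearest label may be, and I am assuming a `KeyError` is what gets raised when nothing is close enough.

```python
import numpy as np
import xarray as xr

lon = np.array([-45.0, -15.0, 15.0, 45.0])
da = xr.DataArray(np.arange(4.0), dims=["lon"], coords={"lon": lon})

# lon=10 is not on the grid; "nearest" returns the value stored at lon=15.
nearest = da.sel(lon=10, method="nearest")
print(float(nearest["lon"]), float(nearest))

# With a tolerance of 2 no grid point qualifies, so the lookup fails.
too_far = False
try:
    da.sel(lon=10, method="nearest", tolerance=2)
except KeyError:
    too_far = True
print(too_far)
```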

In [28]:
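# Label-based slices: only lon = -15 falls in [-40, 4], and no lat falls in
# [-20, 10], so the result is empty along lat
temperature.sel(date=slice("2020-05-22", "2020-05-24"),
                lon=slice(-40, 4),
                lat=slice(-20, 10))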
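Two things about label-based slices tripped me up: both endpoints are included, and a range that contains no coordinate label gives an empty result rather than an error. A sketch on the same lat/lon grid as above (the data values are placeholders):

```python
import numpy as np
import xarray as xr

da = xr.DataArray(
    np.zeros((2, 4)),
    dims=["lat", "lon"],
    coords={"lat": [-45.0, 45.0], "lon": [-45.0, -15.0, 15.0, 45.0]},
)

# Both endpoints are included when they are actual coordinate labels.
inclusive = da.sel(lon=slice(-15, 15))
print(inclusive["lon"].values)

# Only lon = -15 lies between -40 and 4.
partial = da.sel(lon=slice(-40, 4))
print(partial["lon"].values)

# No latitude lies between -20 and 10, so the result is empty along lat.
empty = da.sel(lat=slice(-20, 10))
print(empty.sizes["lat"])
```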

In [29]:
# Keyword order does not matter because dimensions are matched by name
temperature.sel(lon=slice(-40, 4),
                date=slice("2020-05-22", "2020-05-24"))

The same selection can be written with `.loc[]`, but without dimension names: the indexers are matched by position, so they must follow the order of the array's dimensions.

In [31]:
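# Dims are (date, lat, lon, height): ":" keeps all of lat so that the
# second slice applies to lon, matching the sel() call above
temperature.loc["2020-05-22":"2020-05-24", :, -40:4]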
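To check that the forms really are equivalent, here is a sketch on a smaller made-up array comparing `sel()`, positional `.loc[]`, and `.loc[]` with a dictionary (which brings the dimension names back).

```python
import numpy as np
import pandas as pd
import xarray as xr

da = xr.DataArray(
    np.random.randn(3, 2, 4),
    dims=["date", "lat", "lon"],
    coords={
        "date": pd.date_range("2020-05-22", periods=3),
        "lat": [-45.0, 45.0],
        "lon": [-45.0, -15.0, 15.0, 45.0],
    },
)

by_sel = da.sel(date=slice("2020-05-22", "2020-05-23"), lon=slice(-40, 40))

# Positional form: one entry per dimension, in the order (date, lat, lon).
by_loc = da.loc["2020-05-22":"2020-05-23", :, -40:40]

# Dictionary form: dimension names are explicit, so order does not matter.
by_loc_dict = da.loc[
    {"lon": slice(-40, 40), "date": slice("2020-05-22", "2020-05-23")}
]

print(by_sel.shape)
print(by_sel.identical(by_loc), by_sel.identical(by_loc_dict))
```

I find `sel()` or the dictionary form easier to read back later, since a forgotten `:` in the positional form silently applies the slice to the wrong dimension.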