# Xarray Basics

This notebook demonstrates some of the basic features and capabilities that Xarray provides out of the box. It also links to documentation and further reading on some of the more advanced features. The notebook may be expanded in the future to provide guidance on those features as well.


## Why Use Xarray?

[From the article](http://xarray.pydata.org/en/stable/why-xarray.html):

> Multi-dimensional (a.k.a. N-dimensional, ND) arrays (sometimes called “tensors”) are an essential part of computational science. They are encountered in a wide range of fields, including physics, astronomy, geoscience, bioinformatics, engineering, finance, and deep learning. In Python, NumPy provides the fundamental data structure and API for working with raw ND arrays. However, real-world datasets are usually more than just raw numbers; they have labels which encode information about how the array values map to locations in space, time, etc.

> Xarray doesn’t just keep track of labels on arrays – it uses them to provide a powerful and concise interface.

Several other libraries provide similar functionality (netCDF4-python, iris, UV-CDAT), but xarray stands out by building on some of the best features of NumPy and pandas to provide multidimensional data objects that are rich in metadata. Xarray is performant, well documented, and widely adopted in many scientific communities. These are a few of the reasons why xarray was chosen for this project.

Other advantages include integration with CF conventions, ease of working with files, automatic time conversion and other built-in methods, and a few built-in plotting routines.
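
To make the idea of a label-based interface concrete, here is a minimal sketch. The variable names and sample values are hypothetical and are not part of this project's data; the point is only that selection and reduction refer to dimension names and coordinate labels rather than integer axes.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical sample data: 3 time steps x 2 heights.
temperature = xr.DataArray(
    data=np.array([[10.0, 8.0], [12.0, 9.0], [14.0, 10.0]]),
    dims=["time", "height"],
    coords={
        "time": pd.date_range("2017-07-01", periods=3, freq="D"),
        "height": [1.0, 2.0],
    },
    attrs={"units": "degC"},
)

# Select by coordinate label rather than by integer position.
at_one_meter = temperature.sel(height=1.0)

# Reduce along a named dimension instead of a numeric axis.
mean_over_time = temperature.mean(dim="time")

print(at_one_meter.values)
print(mean_over_time.values)
```

With plain NumPy the same operations would need the caller to remember which axis is time and which index corresponds to a height of 1.0.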

# Examples

In this next section we will demonstrate / experiment with some of the basic features that Xarray offers.


## 1. Creating a simple dataset with a UTC timestamp saved as a long data type

Creating a dataset from scratch with xarray is pretty easy and can be done in a number of ways. The following example shows just one way of doing it.

In [1]:
import datetime
import numpy as np
import pandas as pd
import xarray as xr
import matplotlib.pyplot as plt

In [2]:
# sample values (times are in seconds since 1970-01-01 UTC)
base_time = 1498867200
times = [1498867200, 1498867260, 1498867320]
heights = [1, 2, 4]

# create dataset
ds_dict = {
    'time'  : xr.DataArray(
                data   = np.array([1498867200, 1498867260, 1498867320], np.int64),
                dims   = ['time'],
                attrs  = {
                    'units'    : 'seconds since 1970-01-01T00:00:00'
                    }
                ),
    'height'  : xr.DataArray(
                data   = np.array([1000, 2000, 4000], np.float32),
                dims   = ['height'],
                attrs  = {
                    'units'    : 'millimeters'
                    }
                ),
    'SWdown': xr.DataArray(
                data   = np.array([[1, 2, 5], [3, 4, 7], [5, -9999, -9999]], np.float32),
                dims   = ['time', 'height'],
                attrs  = {
                    '_FillValue': -9999,
                    'data_type' : 'float',
                    'long_name' : 'Shortwave Downwelling Radiation',
                    'units'     : 'W/m2',
                    'comment'   : 'Short-Wave Downwelling Radiation measured at ground level. Short-wave radiation (visible light) comes from the sun and contains much more energy than Long-wave radiation.'
                    }
                ),
    'LWdown': xr.DataArray(
                data   = np.random.random(3),   # enter data here
                dims   = ['time'],
                attrs  = {
                    '_FillValue': -9999,
                    'data_type' : 'float',
                    'long_name' : 'Longwave Downwelling Radiation',
                    'units'     : 'W/m2'
                    }
                )
            }
ds = xr.Dataset(ds_dict, attrs = {'example_attr': 'this is a global attribute'})

ds
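
As one illustration of the "number of ways" mentioned above, the sketch below builds a similar dataset by passing `(dims, data, attrs)` tuples to `xr.Dataset` instead of constructing each `DataArray` first. The name `ds_alt` is made up for this example, and only a subset of the variables is reproduced.

```python
import numpy as np
import xarray as xr

ds_alt = xr.Dataset(
    data_vars={
        # Each variable is given as a (dims, data, attrs) tuple.
        "SWdown": (
            ["time", "height"],
            np.array([[1, 2, 5], [3, 4, 7], [5, -9999, -9999]], np.float32),
            {"_FillValue": -9999, "units": "W/m2"},
        ),
    },
    coords={
        "time": (
            ["time"],
            np.array([1498867200, 1498867260, 1498867320], np.int64),
            {"units": "seconds since 1970-01-01T00:00:00"},
        ),
        "height": (
            ["height"],
            np.array([1000, 2000, 4000], np.float32),
            {"units": "millimeters"},
        ),
    },
    attrs={"example_attr": "this is a global attribute"},
)

print(ds_alt)
```

The tuple form is shorter, while the explicit `DataArray` form used above can be easier to read when a variable carries many attributes.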

## 2. Converting the dataset to a standard time format

In [3]:
ds['time']

In [4]:
# One way to decode the raw data according to CF conventions
# is the xarray.decode_cf() function, which is simple to use,
# as shown below. Note that the integer times are converted to
# datetime values and that the -9999 values in the SWdown data
# become NaN.
decoded = xr.decode_cf(ds)
decoded
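
The two effects described in the cell above can be checked on a smaller, self-contained dataset. This is a minimal sketch with made-up values; the name `raw` is hypothetical, and the dataset only mimics the `units` and `_FillValue` attributes used earlier.

```python
import numpy as np
import xarray as xr

raw = xr.Dataset(
    data_vars={
        "SWdown": (
            ["time"],
            np.array([1.0, -9999.0, 3.0], np.float32),
            {"_FillValue": np.float32(-9999), "units": "W/m2"},
        ),
    },
    coords={
        "time": (
            ["time"],
            np.array([1498867200, 1498867260, 1498867320], np.int64),
            {"units": "seconds since 1970-01-01T00:00:00"},
        ),
    },
)

decoded = xr.decode_cf(raw)

# Integer seconds since the epoch are decoded to datetime64 values.
print(decoded["time"].values)

# Values equal to _FillValue are replaced with NaN.
print(decoded["SWdown"].values)
```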

In [5]:
# Unfortunately, decode_cf has a downside: values assigned through
# `.data` on the decoded dataset do not persist, which makes it
# difficult to set values. (A likely cause is lazy evaluation of the
# decoded variables, in which case loading the data into memory first
# should help.) Note how the assignment below has no effect: the
# output still shows the original values.
decoded['LWdown'].data[0] = 50
decoded['LWdown'].data

array([0.82981255, 0.3269567 , 0.96875014])
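
A possible workaround, sketched here under the assumption that the behavior above comes from lazy decoding, is to call `.load()` on the decoded dataset before assigning. The dataset below is a made-up stand-in for the one in this notebook.

```python
import numpy as np
import xarray as xr

raw = xr.Dataset(
    data_vars={
        "LWdown": (
            ["time"],
            np.array([0.8, 0.3, 0.9]),
            {"_FillValue": -9999.0, "units": "W/m2"},
        ),
    },
    coords={
        "time": (
            ["time"],
            np.array([1498867200, 1498867260, 1498867320], np.int64),
            {"units": "seconds since 1970-01-01T00:00:00"},
        ),
    },
)

# Load the decoded variables into memory as plain NumPy arrays.
decoded = xr.decode_cf(raw).load()

# The in-place assignment now modifies the array held by the dataset.
decoded["LWdown"].data[0] = 50
print(decoded["LWdown"].data)
```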

In [6]:
# Fortunately, there is a simple alternative to decode_cf.
# The xarray documentation states that it automatically attempts
# to parse datasets according to CF conventions when writing or
# reading a netCDF file. We can simply save the dataset to a
# temporary file and reopen it to have xarray decode it.
ds.to_netcdf("./temp.nc")
new_ds = xr.open_dataset("./temp.nc")
new_ds
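
The round trip above leaves `temp.nc` in the working directory. A variant that cleans up after itself might look like the sketch below; it assumes a netCDF backend (such as netCDF4 or scipy) is installed, and the dataset and variable names are made up for illustration.

```python
import os
import tempfile

import numpy as np
import xarray as xr

raw = xr.Dataset(
    data_vars={
        "SWdown": (
            ["time"],
            np.array([1.0, -9999.0, 3.0], np.float32),
            {"_FillValue": np.float32(-9999), "units": "W/m2"},
        ),
    },
    coords={
        "time": (
            ["time"],
            np.array([1498867200, 1498867260, 1498867320], np.int64),
            {"units": "seconds since 1970-01-01T00:00:00"},
        ),
    },
)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "temp.nc")
    raw.to_netcdf(path)
    # Read everything into memory and close the file before the
    # temporary directory is removed.
    with xr.open_dataset(path) as on_disk:
        reopened = on_disk.load()

print(reopened["time"].values)
print(reopened["SWdown"].values)
```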

In [7]:
# With this new dataset we can set values, add attributes, etc.
new_ds["LWdown"].data[0] = -1.4132
new_ds["LWdown"].attrs['comment'] = "Long-Wave Downwelling Radiation measured at ground level. Long-wave radiation (infrared light) contains much less energy than Short-wave radiation."
new_ds["LWdown"]

In [8]:
# NOT WORKING: currently getting import error(s) for parse_cf
# (the OSError below points to a missing libgeos_c library).
# A third alternative for parsing the raw data according to
# CF conventions is to use MetPy. MetPy uses a dataset accessor
# to register a "metpy" namespace under datasets/dataarrays.
# One of the functions built into this namespace parses the
# dataset according to CF conventions. See below:
import metpy  # Any import of metpy will register the accessors
met_ds = ds.metpy.parse_cf()
met_ds

OSError: dlopen(/Users/levi260/opt/anaconda3/lib/libgeos_c.dylib, 6): image not found

In [11]:
# Note that just by importing metpy in the cell above we gain
# access to a whole suite of attributes and methods under the 
# metpy attribute of a dataset or dataarray:
ds.metpy?

Type:        MetPyDatasetAccessor
String form: <metpy.xarray.MetPyDatasetAccessor object at 0x7fcd8ab87970>
File:        ~/opt/anaconda3/envs/mhkit_act_env/lib/python3.8/site-packages/metpy/xarray.py
Docstring:
Provide custom attributes and methods on XArray Datasets for MetPy functionality.

This accessor provides parsing of CF metadata and unit-/coordinate-type-aware selection.

    >>> import xarray as xr
    >>> from metpy.cbook import get_test_data
    >>> ds = xr.open_dataset(get_test_data('narr_example.nc', False)).metpy.parse_cf()
    >>> print(ds['crs'].item())
    Projection: lambert_conformal_conic


In [7]:
ds = xr.open_dataset('http://thredds.cdip.ucsd.edu/thredds/dodsC/cdip/realtime/248p1_rt.nc')

In [8]:
ds

## Resources

Terminology: http://xarray.pydata.org/en/stable/terminology.html

Data Structures: http://xarray.pydata.org/en/stable/data-structures.html
* DataArray - an array-like structure with dimensions, attributes, data, and a host of methods
* Dataset - a dict-like collection of DataArray objects
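
The two structures described above can be compared with a short sketch. The names `da` and `ds_small` and the sample values are made up for illustration.

```python
import numpy as np
import xarray as xr

# A DataArray bundles data with dimension names and attributes.
da = xr.DataArray(
    np.arange(6).reshape(2, 3),
    dims=["time", "height"],
    attrs={"units": "W/m2"},
    name="SWdown",
)

# A Dataset maps variable names to DataArray objects, much like a dict.
ds_small = xr.Dataset({"SWdown": da, "LWdown": ("time", np.array([0.1, 0.2]))})

print(da.dims, da.shape, da.attrs)
print(list(ds_small.data_vars))
print("SWdown" in ds_small)
print(type(ds_small["LWdown"]).__name__)
```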

Parsing times:
* Standard: http://xarray.pydata.org/en/stable/time-series.html
* Irregular calendars: http://xarray.pydata.org/en/stable/weather-climate.html#non-standard-calendars-and-dates-outside-the-timestamp-valid-range

File I/O:
* Individual files: http://xarray.pydata.org/en/stable/io.html
* Multi-file datasets: http://xarray.pydata.org/en/stable/generated/xarray.open_mfdataset.html

Testing: http://xarray.pydata.org/en/stable/api.html#testing
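
As a brief illustration of the testing helpers linked above, the sketch below compares three levels of strictness. The arrays are made up; `assert_equal` ignores attributes, `assert_allclose` tolerates small numerical differences, and `assert_identical` also requires matching names and attributes.

```python
import numpy as np
import xarray as xr

expected = xr.DataArray(
    np.array([1.0, 2.0, 3.0]), dims=["time"], attrs={"units": "W/m2"}
)

# Same values, but with the attributes removed.
no_attrs = expected.copy()
no_attrs.attrs = {}

# Values shifted by a tiny amount.
nearly = expected + 1e-9

xr.testing.assert_equal(no_attrs, expected)    # passes: attrs are ignored
xr.testing.assert_allclose(nearly, expected)   # passes: within tolerance

identical_failed = False
try:
    xr.testing.assert_identical(no_attrs, expected)
except AssertionError:
    # Raised because the attributes differ.
    identical_failed = True

print(identical_failed)
```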