# Tutorial

Use data from [ERA5-Land](https://www.ecmwf.int/en/era5-land) to examine time scales of variability in surface variables at the grid point closest to HU Beltsville.

This tutorial is part of [zmoon92/hu-pbl-workshop-2020/tree/master/python-tutorial](https://github.com/zmoon92/hu-pbl-workshop-2020/tree/master/python-tutorial).

In [1]:
import calendar

import cartopy.crs as ccrs
import cartopy.feature as cfeature
from ipywidgets import interact
import matplotlib.pyplot as plt
import numpy as np
import pvlib
import xarray as xr

In [2]:
%matplotlib notebook

plt.rcParams.update({
    "figure.autolayout": True,
    "axes.xmargin": 0,
})

# define some defaults
xrp = {
    "size": 5.2,  # height; if we pass it to xarray plot methods, it will create a new fig
    "aspect": 1.6,  # aspect * size = width
}
figsize = (xrp["size"]*xrp["aspect"], xrp["size"])  # width, height

## Pre

### Load the data

According to [the GRUAN page](https://www.gruan.org/network/sites/beltsville), we want the point closest to 39.0542 °N, 76.8775 °W. In ERA5-Land, which has a resolution of 0.1°, this is the grid point at 39.1 °N, 76.9 °W (latitude 39.1, longitude -76.9). I have already extracted a few years of data for that point to the file `era5-land_bel.nc`. This file is available if you are running the notebook on MyBinder for the repo (or have cloned the repo); otherwise, you will need to download it.
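To check which grid point is closest, here is a minimal sketch in plain NumPy. It assumes a regular 0.1° grid aligned with whole tenths of a degree and longitudes in -180 to 180; the grids and the helper `nearest_grid_value` are illustrative, not read from the file. With the full data set in xarray, `ds.sel(latitude=..., longitude=..., method="nearest")` performs the same lookup.

```python
import numpy as np

def nearest_grid_value(x, grid):
    """Return the grid value closest to x (hypothetical helper)."""
    grid = np.asarray(grid)
    return grid[np.argmin(np.abs(grid - x))]

# assumed regular 0.1-degree grids, longitudes in -180..180
lats = np.linspace(-90, 90, 1801)
lons = np.linspace(-180, 180, 3601)

lat_bel, lon_bel = 39.0542, -76.8775  # BEL site from the GRUAN page
print(nearest_grid_value(lat_bel, lats), nearest_grid_value(lon_bel, lons))
```

The site is 0.0458° from 39.1 but 0.0542° from 39.0 in latitude, so it falls (narrowly) in the 39.1 row.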

Examine the data set.

You can interact with its fancy HTML representation, which will show up if you have xarray [v0.15.1+](http://xarray.pydata.org/en/stable/whats-new.html#v0-15-1-23-mar-2020).

In [None]:
ds = xr.open_dataset("era5-land_bel.nc")
ds

Examine a variable.

In [None]:
ds.t2m

### Sample time series plots

In [None]:
ds.u10.plot.line(**xrp);

Notice how xarray does all of the labeling for us, including the units and a descriptive name for the variable being plotted. This is possible because the data set we have loaded follows the [CF Conventions](https://cfconventions.org/), providing `units` and `long_name` attributes for each variable.
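To make the mechanism concrete, here is a small hypothetical helper (`cf_label` is not part of xarray) that builds a label roughly the way xarray does: the `long_name` (falling back to `standard_name`, then the variable name), followed by the units in square brackets.

```python
def cf_label(name, attrs):
    """Hypothetical helper: build an axis label from CF-style attributes.

    Uses `long_name`, else `standard_name`, else the variable name, and appends
    the units in square brackets when a non-empty `units` attribute is present.
    """
    label = attrs.get("long_name") or attrs.get("standard_name") or name
    units = attrs.get("units")
    return f"{label} [{units}]" if units else label

print(cf_label("t2m", {"long_name": "2 metre temperature", "units": "K"}))
print(cf_label("mu", {}))  # no attributes: falls back to the variable name
```

Without these attributes, a plot could only show the bare variable name, which is why CF-compliant files are so convenient here.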

In [None]:
ds.t2m.plot.line(**xrp, lw=0.5, alpha=0.3, label="original hourly data")

# add some moving averages (the time resolution is 1 hour, so 1 day = 24 time steps)
ds.t2m.rolling(time=24, center=True).mean().plot(lw=1.0, alpha=0.5, label="1 day moving average")
ds.t2m.rolling(time=24*30, center=True).mean().plot(lw=1.2, alpha=0.8, label="30 day moving average")
ds.t2m.rolling(time=24*90, center=True).mean().plot(lw=1.8, alpha=1.0, label="90 day moving average")
plt.legend();
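The window length in `rolling` is a number of time steps, not a duration, so with hourly data a 1-day window is 24 steps. A small NumPy sketch on synthetic data (not ERA5) shows why this matters: a centered 24-step mean removes a pure diurnal cycle exactly, while a 12-step mean leaves much of it behind. The helper `centered_moving_average` is illustrative only.

```python
import numpy as np

def centered_moving_average(x, window):
    """Running mean over `window` samples, centered on each point.

    For an even window the extra sample falls on the left. Points where the
    full window does not fit are NaN, as with xarray's default min_periods.
    """
    x = np.asarray(x, dtype=float)
    out = np.full(x.size, np.nan)
    means = np.convolve(x, np.ones(window) / window, mode="valid")  # x.size - window + 1 values
    start = window // 2
    out[start:start + means.size] = means
    return out

# synthetic hourly "temperature": 280 K plus a 5 K diurnal cycle, 10 days long
hours = np.arange(24 * 10)
temp = 280 + 5 * np.sin(2 * np.pi * hours / 24)

ma24 = centered_moving_average(temp, 24)
ma12 = centered_moving_average(temp, 12)
print(np.nanmax(np.abs(ma24 - 280)))  # essentially zero: the diurnal cycle is averaged out
print(np.nanmax(np.abs(ma12 - 280)))  # about 3 K of diurnal signal remains
```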

What is the relationship between the surface sensible heat flux (`sshf`) and the near-surface air temperature (`t2m`)?

In [None]:
ds.plot.scatter(x="sshf", y="t2m", marker=".", alpha=0.5, edgecolors="none", **xrp);

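How about around midday in summer (June to August)? ERA5 times are in UTC, so the 14 to 18 UTC window below corresponds to 10:00 to 14:00 local daylight time (EDT, UTC-4).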

In [None]:
h = ds.time.dt.hour
mo = ds.time.dt.month
ds.where(
    (h >= 14) & (h <= 18) & (mo >= 6) & (mo <= 8)
).plot.scatter(x="sshf", y="t2m", marker=".", alpha=0.5, edgecolors="none", **xrp);
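As a rough check of which part of the solar day an hour window covers, local mean solar time can be approximated from the UTC hour and the longitude (15° of longitude per hour). This sketch ignores the equation of time, which shifts true solar noon by up to about a quarter of an hour; `mean_solar_hour` is a hypothetical helper.

```python
import numpy as np

def mean_solar_hour(utc_hour, lon_deg):
    """Approximate local mean solar time (hours) from UTC hour and longitude (degrees east)."""
    return (np.asarray(utc_hour, dtype=float) + lon_deg / 15.0) % 24

lon_bel = -76.9  # ERA5-Land grid point closest to BEL
print(mean_solar_hour([14, 16, 18], lon_bel))  # roughly 8.9, 10.9 and 12.9 h
print(12 - lon_bel / 15.0)  # UTC hour of mean solar noon, about 17.1
```

By this estimate, the 14 to 18 UTC window runs from mid-morning to shortly after solar noon at this longitude.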

## Tutorial problems

### Compare variables to Sun position

Which variables have the strongest correlation with the Sun position?

We can compute Sun position with [pvlib](https://pvlib-python.readthedocs.io/en/stable/). [Astropy](https://www.astropy.org/) could also be used for this.
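For intuition about what these libraries compute, the solar zenith angle follows from the standard spherical-astronomy relation cos(SZA) = sin(lat) sin(dec) + cos(lat) cos(dec) cos(hour angle). The sketch below uses a common textbook approximation for the solar declination (Cooper's formula) and takes local solar time as input; it is far less accurate than pvlib's default method, which is based on NREL's Solar Position Algorithm, and `approx_solar_zenith` is a hypothetical helper.

```python
import numpy as np

def approx_solar_zenith(day_of_year, solar_hour, lat_deg):
    """Rough solar zenith angle (degrees).

    day_of_year: 1..365; solar_hour: local solar time in hours (12 = solar noon).
    """
    # Cooper's approximation for the declination (radians)
    dec = np.deg2rad(23.45) * np.sin(2 * np.pi * (284 + np.asarray(day_of_year)) / 365)
    hour_angle = np.deg2rad(15.0 * (np.asarray(solar_hour) - 12.0))  # 15 degrees per hour
    lat = np.deg2rad(lat_deg)
    mu = np.sin(lat) * np.sin(dec) + np.cos(lat) * np.cos(dec) * np.cos(hour_angle)
    return np.rad2deg(np.arccos(np.clip(mu, -1.0, 1.0)))

print(approx_solar_zenith(81, 12.0, 39.1))   # near the March equinox: SZA at noon ~ latitude
print(approx_solar_zenith(172, 12.0, 39.1))  # near the June solstice: ~ 39.1 - 23.45
```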

In [None]:
sun_pos = pvlib.solarposition.get_solarposition(ds.time.values, ds.latitude.values, ds.longitude.values)
sun_pos

Add some of the Sun position variables to our data set.

In [None]:
zen_deg = sun_pos["zenith"].values  # solar zenith angle in degrees
ds["sza"] = ("time", zen_deg, {"long_name": "solar zenith angle", "units": "deg"})
ds["mu"] = ("time", np.cos(np.deg2rad(zen_deg)), {"long_name": "cos(sza)", "units": ""})
ds["selev"] = ("time", sun_pos["elevation"].values, {"long_name": "solar elevation angle", "units": "deg"})

In [None]:
day = ds.time.where(ds.selev > 10).dropna("time")

ds.sel(time=day).plot.scatter(x="sza", y="t2m", marker=".", alpha=0.5, edgecolors="none", **xrp);

In [None]:
# Explore the data, both Sun position and ERA5 surface variables
...
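One simple way to start exploring is to compute the Pearson correlation of each variable with `mu` (cos SZA) and sort by magnitude. The sketch below uses synthetic arrays standing in for the data set's variables; the names `a`, `b`, and the helper `rank_by_correlation` are made up for illustration, and with the real data the arrays would come from e.g. `ds[name].values`.

```python
import numpy as np

def rank_by_correlation(target, variables):
    """Sort variables by |Pearson r| with `target`, strongest first.

    `variables` maps a name to a 1-D array with the same length as `target`;
    samples where either array is NaN are skipped.
    """
    target = np.asarray(target, dtype=float)
    results = []
    for name, values in variables.items():
        values = np.asarray(values, dtype=float)
        ok = ~np.isnan(target) & ~np.isnan(values)
        r = np.corrcoef(target[ok], values[ok])[0, 1]
        results.append((name, r))
    return sorted(results, key=lambda item: abs(item[1]), reverse=True)

rng = np.random.default_rng(0)
mu = np.clip(np.cos(np.linspace(0, 20 * np.pi, 2400)), 0, None)  # synthetic cos(SZA), zero at "night"
variables = {
    "a": 300 * mu + rng.normal(0, 10, mu.size),  # strongly tied to the Sun
    "b": rng.normal(0, 1, mu.size),              # unrelated noise
}
ranking = rank_by_correlation(mu, variables)
for name, r in ranking:
    print(f"{name}: r = {r:+.2f}")
```

Note that Pearson r only measures simultaneous linear association; a variable that lags the Sun (as temperature does) can show a weaker correlation than its physical link would suggest.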

### Compare SZA and near-surface temperature

It is well known that the warmest temperatures generally lag the minimum solar zenith angle (SZA), on both seasonal and diurnal time scales.

Find these lags.
* seasonal: for each year, find the lag between the time of the yearly minimum SZA and the time of the yearly maximum temperature
* diurnal: ...

In [None]:
# for each year, calculate the two times and subtract them
...
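A minimal sketch of the seasonal part, using synthetic daily values rather than ERA5: the lag is the time of the response maximum minus the time of the forcing extremum. With real hourly data, `argmax` of raw values is noisy, so smoothing first (for example with the harmonic approach described in the next section) may give a more robust answer. The helper `extremum_lag` is hypothetical.

```python
import numpy as np

def extremum_lag(t, forcing, response, forcing_extremum="min"):
    """Time of the response maximum minus time of the forcing extremum.

    Use forcing_extremum="min" for SZA (smallest angle = most direct Sun).
    Works with numeric times or numpy datetime64 times (giving a timedelta).
    """
    i_forcing = np.argmin(forcing) if forcing_extremum == "min" else np.argmax(forcing)
    i_response = np.argmax(response)
    return t[i_response] - t[i_forcing]

# synthetic year: noon SZA smallest on day 172, temperature peaking 30 days later
day = np.arange(1, 366)
noon_sza = 39.1 - 23.45 * np.cos(2 * np.pi * (day - 172) / 365)
temp = 285 + 10 * np.cos(2 * np.pi * (day - 202) / 365)
print(extremum_lag(day, noon_sza, temp))  # 30
```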

### Remove seasonal cycle

This is often done in statistical analyses so that the seasonal cycle doesn't dominate the signal when looking for relationships.

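A common method is to represent the mean seasonal cycle by its first few Fourier harmonics (e.g., 10). Keeping only the first 10 annual harmonics filters out variability with periods shorter than 365/10 ≈ 36.5 days.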
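As a standalone illustration on a synthetic series (not ERA5), keeping the mean and the first `n_harm` harmonics can be done with NumPy's real FFT. In `np.fft.rfft` output, index 0 is the mean term and index k is the k-th harmonic (k cycles per record), so everything above index `n_harm` is zeroed. The helper `harmonic_smooth` is illustrative.

```python
import numpy as np

def harmonic_smooth(x, n_harm):
    """Keep the mean and the first n_harm harmonics of a (periodic) series x."""
    z = np.fft.rfft(x)
    z[n_harm + 1:] = 0
    return np.fft.irfft(z, n=len(x))  # n= makes odd-length input round-trip correctly

n = 8760  # hours in a non-leap year
t = np.arange(n)
annual = 285 + 10 * np.cos(2 * np.pi * t / n) + 2 * np.sin(4 * np.pi * t / n)
fast = 3 * np.cos(2 * np.pi * 365 * t / n)  # a "diurnal" cycle: 365 cycles per year
smooth = harmonic_smooth(annual + fast, 2)
print(np.max(np.abs(smooth - annual)))  # tiny: the 2-harmonic annual cycle is recovered
```

The truncation assumes the series is periodic over the record, which is reasonable for a mean annual cycle but not for an arbitrary time series with a trend.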

Using the function provided below to estimate the seasonal cycle, examine the seasonal cycles of different variables and the corresponding time series with the seasonal cycle removed.

In [None]:
# calculate t_year (here hour-of-year, cf. day-of-year, which the .dt accessors provide) so we can group by it
# I would like to know a better (more elegant) way to do this...
t_year = ds.time - np.r_[[np.datetime64(f"{x}", "D") for x in ds.time.dt.year.values]]
ds["t_in_year"] = ("time", t_year, {"long_name": "time-in-year"})
# use 2016 (a leap year) so that every hour-of-year, including Feb 29, is present
t_year_leap = ds["t_in_year"].where(ds.time.dt.year == 2016, drop=True).drop_vars(["latitude", "longitude"])
# decDOY instead??
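Regarding the question in the comment above about a more elegant way: one possibility, sketched in plain NumPy, is to truncate each time stamp to the start of its year with `astype("datetime64[Y]")` and subtract. The example uses a few hand-picked times; with the data set, `ds.time.values` would be the input. Dividing by 24 gives a fractional day of year. The helper `hours_into_year` is hypothetical.

```python
import numpy as np

def hours_into_year(times):
    """Hours elapsed since 1 January 00:00 of each time's own year."""
    times = np.asarray(times, dtype="datetime64[ns]")
    year_start = times.astype("datetime64[Y]").astype(times.dtype)  # truncate to the year
    return (times - year_start) / np.timedelta64(1, "h")

times = np.array(["2016-01-01T00", "2016-03-01T05", "2017-01-02T01"], dtype="datetime64[h]")
hoy = hours_into_year(times)
print(hoy)       # values: 0, 1445, 25
print(hoy / 24)  # fractional day of year, starting at 0
```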

In [None]:
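def compute_sc(da, n_harm=10):
    "Compute the mean seasonal cycle of an xr.DataArray, keeping the mean and the first `n_harm` harmonics."
    x = da.groupby(ds.t_in_year).mean().values  # average over years for each hour-in-year
    # np.fft.rfft returns complex coefficients: index 0 is the mean, index k the k-th harmonic
    # (scipy.fftpack.rfft instead packs real and imaginary parts into one real array,
    # so zeroing from index n_harm there keeps only about n_harm/2 harmonics)
    z = np.fft.rfft(x)
    z[n_harm + 1:] = 0
    doy = t_year_leap.dt.days + t_year_leap.dt.seconds / (24*3600)  # fractional day of year
    return xr.DataArray(
        name=f"sc_{da.name}", data=np.fft.irfft(z, n=x.size), dims="doy", coords=[doy.values],
        attrs={"long_name": f"mean seasonal cycle in {da.attrs['long_name']}", "units": da.attrs["units"]},
    )

sc = compute_sc(ds.t2m, 5)  # see how changing the number of harmonics impacts the results

sc.plot(size=3, aspect=1.5);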


t2m = ds.t2m
year = t2m.time.dt.year
# subtract the seasonal cycle year by year (assumes each year starts on 1 Jan 00 UTC);
# non-leap years use only the first x.size hours of sc, so after Feb 28 they are offset by one day
t2m_2 = t2m.groupby(year).map(lambda x: x - sc[0:x.size].values)
t2m_2.attrs = dict(t2m.attrs)  # copy the attrs back, since xarray arithmetic can drop them

t2m_2.plot(size=3, aspect=2.0);

## Extras

### Map plot

We plot data from ERA5 (not ERA5-Land) in a 20° × 20° box around the BEL location. The data are monthly means averaged over 1979–2019, so they show the monthly climatology.
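The file used here already contains the climatology. As a sketch of how such a field is formed (synthetic numbers, assuming a complete monthly series that starts in January), averaging each calendar month across years amounts to reshaping to (years, 12) and averaging over the first axis; with xarray, `da.groupby("time.month").mean()` does the equivalent on a time-indexed array. The helper `monthly_climatology` is hypothetical.

```python
import numpy as np

def monthly_climatology(monthly_values):
    """Mean for each calendar month, for a complete series starting in January."""
    x = np.asarray(monthly_values, dtype=float)
    if x.size % 12 != 0:
        raise ValueError("expected a whole number of years of monthly values")
    return x.reshape(-1, 12).mean(axis=0)  # rows: years, columns: Jan..Dec

years = 41  # e.g. 1979 to 2019
month_index = np.arange(12 * years)
seasonal = 285 - 12 * np.cos(2 * np.pi * (month_index % 12) / 12)  # coldest in January
year_anomaly = np.repeat(np.linspace(-1, 1, years), 12)  # year-to-year variation with zero mean
clim = monthly_climatology(seasonal + year_anomaly)
print(np.round(clim, 2))  # January 273, July 297
```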

In [None]:
ds2 = xr.open_dataset("era5_bel-box_months_1979-2019.nc")
ds2

Take a quick look at the data.

In [None]:
plt.figure(); plt.imshow(ds2.t2m.sel(month=1));

Now plot the data on a map using Cartopy.

In [None]:
variable = "t2m"  # change here to see the maps for another variable

lat = ds2.latitude
da = ds2[variable].where((lat >= 35) & (lat <= 45), drop=True)  # keep only 35 to 45 °N

# use the same contour levels for all months; change `30` to get a different number of levels
vmin, vmax = np.floor(da.min().values), np.ceil(da.max().values)
step = np.ceil((vmax - vmin) / 30)
levels = np.arange(vmin, vmax + step, step)

fig = plt.figure(figsize=figsize)

def plot_month(imonth):
    fig.clear()
    ax = fig.add_subplot(1, 1, 1, projection=ccrs.Mercator())  # add the map axes to `fig` itself, not whichever figure is current

    ax.add_feature(cfeature.LAND)
    ax.add_feature(cfeature.STATES)  # only US, requires sufficiently recent Cartopy

    gl = ax.gridlines(draw_labels=True)
    #gl.xlabels_top = False; gl.ylabels_right = False  # now deprecated
    gl.top_labels = False; gl.right_labels = False  # preferred syntax
    
    data = da.sel(month=imonth)
    data.plot.contourf(x="longitude", y="latitude", levels=levels, ax=ax, transform=ccrs.PlateCarree())
    
    ax.set_title(f"{ax.get_title()} ({calendar.month_name[imonth]})")

    
interact(plot_month, imonth=(1, 12, 1));