# Quality Control

This notebook validates the drought indices dataset produced by the `scripts/process.py` script.

In [36]:
import numpy as np
import xarray as xr
import pandas as pd
from config import INDICES_DIR, DOWNLOAD_DIR, CLIM_DIR
import luts

## Index validation

For each drought index, re-compute a value manually and compare with the indices dataset.

We will be working with three datasets: the climatologies, the downloaded ERA5 data, and the computed indices. Set up connections to each of them.

Indices dataset:

In [3]:
intervals = pd.Index([1, 7, 30, 60, 90, 180, 365], name="interval")
fps = [INDICES_DIR.joinpath(f"nws_drought_indices_{i}day.nc") for i in intervals]
indices_ds = xr.open_mfdataset(fps, combine="nested", concat_dim=[intervals])

Define some fixed variables that will be used throughout:

In [47]:
# most recent day of available data (read from an arbitrary downloaded file)
with xr.open_dataset(DOWNLOAD_DIR.joinpath("total_precipitation_current_month.nc")) as ds:
    ref_date = ds.time.dt.date.values[-1]

Define a function to help with extracting data from grid cells in ERA5 downloads, since we will be doing this for every index:

In [121]:
def extract_era5(varname, time_slice, lat, lon):
    """Open the three ERA5 datasets for a given variable name and extract the
    data from the grid cell nearest to a given point location."""
    da_list = []
    latlon_sel_di = {"latitude": lat, "longitude": lon}
    for fp in DOWNLOAD_DIR.glob(f"{luts.varname_prefix_lu[varname]}*.nc"):
        with xr.open_dataset(
            fp
        ) as ds:
            if "expver" in ds.dims:
                # if expver is present, merge the data from both versions
                # (expver 1 and 5) into a single array and drop the coordinate
                da = xr.merge([
                    ds[varname].sel(
                        latlon_sel_di, method="nearest"
                    ).sel(expver=1).drop_vars("expver"),
                    ds[varname].sel(
                        latlon_sel_di, method="nearest"
                    ).sel(expver=5).drop_vars("expver")
                ])[varname].sel(time=time_slice)

            else:
                da = ds[varname].sel(
                    latlon_sel_di, method="nearest"
                ).sel(time=time_slice)

            da_list.append(da)
            
    out_da = xr.concat(da_list, dim="time").sortby("time")
    return out_da

    
def get_time_slice(ref_date, interval):
    """Return a slice covering the `interval` days ending on (and including) ref_date"""
    start_date = ref_date - pd.to_timedelta(f"{interval - 1} day")
    return slice(str(start_date), str(ref_date))
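
The window arithmetic in `get_time_slice` can be checked by hand. The sketch below uses only the standard library and a hypothetical helper, `window_bounds`, that mirrors the same subtraction; the reference date is made up for illustration. Because both endpoints are included, the start date is `interval - 1` days before the reference date.

```python
import datetime


def window_bounds(ref_date, interval):
    """Hypothetical helper: same arithmetic as get_time_slice, returning dates."""
    start_date = ref_date - datetime.timedelta(days=interval - 1)
    return start_date, ref_date


# Made-up reference date, used only for illustration
start, end = window_bounds(datetime.date(2023, 3, 31), 30)

# Count the days in the window, including both endpoints
n_days = (end - start).days + 1
print(start, end, n_days)
```

Subtracting `interval` rather than `interval - 1` would give a window one day too long.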

Now work through each index and test the existing values against newly processed ones.

#### Total precip

Total precip should be the sum of the precip values over the specified interval.
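
As a rough illustration of that definition, the sketch below sums a synthetic daily series over a 30-day window and converts the result from metres to centimetres. The values and dates are made up and are not ERA5 data; a pandas Series stands in for the extracted grid-cell data.

```python
import pandas as pd

# Synthetic daily precipitation in metres (made-up values, not ERA5 data)
times = pd.date_range("2023-01-01", "2023-03-31", freq="D")
precip_m = pd.Series(0.001, index=times)  # 1 mm on every day

# 30-day window ending on the last day; label-based slices include both endpoints
window = precip_m.loc["2023-03-02":"2023-03-31"]

# Sum over the window, then convert metres to centimetres
total_cm = window.sum() * 100
print(len(window), total_cm)
```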

In [52]:
varname = "tp"
interval = 30
lat, lon = 65, -148
time_slice = get_time_slice(ref_date, interval)

test = indices_ds[varname].sel(interval=interval).sel(
    latitude=lat, longitude=lon, method="nearest"
).compute()
raw = extract_era5(varname, time_slice, lat, lon)

# convert m to cm
assert (raw.sum() * 100).astype("float32") == test
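
Exact equality on `float32` values holds only if the manual computation applies its operations in the same order as the processing script. If that cannot be guaranteed, a tolerance-based comparison is a possible alternative. The sketch below uses synthetic values (not ERA5 data) and an assumed relative tolerance of `1e-5` to compare two orderings of the same sum and unit conversion.

```python
import numpy as np

# Synthetic daily precipitation in metres (made-up values, not ERA5 data)
rng = np.random.default_rng(0)
daily_m = rng.uniform(0, 0.01, size=30).astype("float32")

# Two orderings of the same computation; results may differ in the last bits
sum_then_convert = np.float32(daily_m.sum() * 100)
convert_then_sum = (daily_m * np.float32(100)).sum(dtype="float32")

# Compare within a relative tolerance instead of requiring bitwise equality
matches = bool(np.isclose(sum_then_convert, convert_then_sum, rtol=1e-5))
print(matches)
```

The tolerance here is an assumption; a suitable value depends on the magnitude of the index and the number of values being summed.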