# Process yearly indicators from daily NCAR 12km BCSD dataset

Use this notebook to create a yearly base indicators dataset. It computes the following indicators for each year from daily data:

* `hd`: “Hot day” threshold -- the highest observed daily $T_{max}$ such that there are 5 other observations equal to or greater than this value (i.e., the sixth-highest value of the year).
* `cd`: “Cold day” threshold -- the lowest observed daily $T_{min}$ such that there are 5 other observations equal to or less than this value (i.e., the sixth-lowest value of the year).
* `rx1day`: Maximum 1-day precipitation
* `su`: Summer Days -- Annual number of days with $T_{max}$ above 25 °C
* `dw`: Deep Winter days -- Annual number of days with $T_{min}$ below -30 °C
* `wsdi`: Warm Spell Duration Index -- Annual count of occurrences of at least 5 consecutive days with daily mean T above the 90th percentile of historical values for the date
* `csdi`: Cold Spell Duration Index -- Same as WSDI, but for daily mean T below the 10th percentile
* `rx5day`: Maximum 5-day precipitation
* `r10mm`: Number of heavy precip days -- Annual count of days with precip > 10 mm
* `cwd`: Consecutive wet days -- Annual maximum number of consecutive days with precip > 1 mm
* `cdd`: Consecutive dry days -- Same as CWD, but for days with precip < 1 mm
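
To make the definitions above concrete, here is a minimal pure-Python sketch of two of the patterns they rely on: an order statistic (for `hd` and `cd`) and a longest-run count (for `cwd` and `cdd`). The helper names `nth_extreme` and `longest_run` and the sample values are hypothetical; the functions in `indicators.py` may be implemented quite differently.

```python
def nth_extreme(values, n=6, highest=True):
    """Return the n-th highest (or n-th lowest) value of a sequence."""
    ordered = sorted(values, reverse=highest)
    return ordered[n - 1]


def longest_run(values, predicate):
    """Return the length of the longest run of consecutive values
    for which predicate(value) is true."""
    best = current = 0
    for value in values:
        current = current + 1 if predicate(value) else 0
        best = max(best, current)
    return best


# Made-up daily values standing in for one (very short) year
tmax = [18.0, 21.5, 30.1, 29.4, 27.0, 31.2, 25.5, 28.8, 26.3, 24.9]
precip = [0.0, 2.5, 3.1, 0.4, 12.0, 1.5, 6.0, 0.0, 0.0, 0.2]

hd = nth_extreme(tmax, n=6, highest=True)      # sixth-highest Tmax
su = sum(t > 25 for t in tmax)                 # days with Tmax above 25
cwd = longest_run(precip, lambda p: p > 1.0)   # longest wet run
cdd = longest_run(precip, lambda p: p < 1.0)   # longest dry run
```

With these sample values, exactly five observations are greater than or equal to `hd`, which matches the wording of the "hot day" definition.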

We will use functions from the `indicators.py` script to compute the indicators. Load the libs:

In [5]:
import time
from datetime import datetime
import numpy as np
import tqdm
import xarray as xr

# ignore all-nan slice warnings
import warnings
warnings.filterwarnings('ignore', r'All-NaN (slice|axis) encountered')

from config import *
from indicators import *

We will iterate over models and scenarios, loading and processing the data for each. First, however, load the Daymet data, since it is used to compute the WSDI and CSDI indicators for all models and scenarios. Load the Daymet dataset into a global variable, using the years 1980 through 2009 (`range(1980, 2010)` excludes 2010):

In [93]:
%%time
daymet_ds = xr.open_mfdataset([f"/atlas_scratch/cparr4/ncar_replacement_data/daymet/daymet_met_{year}.nc" for year in range(1980, 2010)])
daymet_ds = daymet_ds.load()
# drop underscore from units in tmin/tmax, for xclim to be happy
daymet_ds["tmin"].attrs["units"] = "degC"
daymet_ds["tmax"].attrs["units"] = "degC"

CPU times: user 39.6 s, sys: 7.99 s, total: 47.6 s
Wall time: 40.2 s
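
The Daymet baseline matters because WSDI and CSDI compare each day against a historical percentile for that date. As a rough illustration only, the sketch below counts spells of at least 5 consecutive days beyond a per-day threshold, following the wording of the indicator list above. The function name `count_spells` and the sample values are hypothetical, and the thresholds are simply given as a list rather than derived from Daymet. The actual computation in `indicators.py` may differ, for instance by counting the days inside spells rather than the spells themselves.

```python
def count_spells(values, thresholds, window=5, above=True):
    """Count runs of at least `window` consecutive days on which the value
    is above (or below) the threshold for that day."""
    spells = 0
    run = 0
    for value, threshold in zip(values, thresholds):
        beyond = value > threshold if above else value < threshold
        if beyond:
            run += 1
            if run == window:
                # count each qualifying run once, when it first reaches `window`
                spells += 1
        else:
            run = 0
    return spells


# Made-up daily mean temperatures and a constant stand-in for the
# per-date 90th and 10th percentile thresholds
tmean = [10, 16, 17, 18, 16, 17, 12, 16, 16, 16, 16, 16, 16, 9]
p90 = [15] * len(tmean)
p10 = [10] * len(tmean)

warm_spells = count_spells(tmean, p90, above=True)
cold_spells = count_spells(tmean, p10, above=False)
```

Here the first warm run lasts exactly 5 days and the second lasts 6, so both count once; the single day below the cold threshold is too short to form a spell.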


Iterate over the projected models and scenarios and compute the indicators:

In [95]:
%%time
results = []
for scenario in scenarios:
    for model in models:
        fps = [ncar_dir.joinpath(f"{model}_{scenario}_BCSD_met_{year}.nc") for year in range(2006, 2100)]
        results.append(run_compute_indicators(fps, scenario, model))

data for rcp45, CCSM4 loaded into memory
rx1day  done, rx5day  done, r10mm  done, cwd  done, cdd  done, hd  done, su  done, wsdi  done, cd  done, dw  done, csdi  done, 
data for rcp45, MRI-CGCM3 loaded into memory
rx1day  done, rx5day  done, r10mm  done, cwd  done, cdd  done, hd  done, su  done, wsdi  done, cd  done, dw  done, csdi  done, 
data for rcp85, CCSM4 loaded into memory
rx1day  done, rx5day  done, r10mm  done, cwd  done, cdd  done, hd  done, su  done, wsdi  done, cd  done, dw  done, csdi  done, 
data for rcp85, MRI-CGCM3 loaded into memory
rx1day  done, rx5day  done, r10mm  done, cwd  done, cdd  done, hd  done, su  done, wsdi  done, cd  done, dw  done, csdi  done, 
CPU times: user 1h 5min 40s, sys: 15min 33s, total: 1h 21min 13s
Wall time: 1h 19min 44s
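
Since this loop runs for over an hour, it can help to confirm the input files exist before starting. The sketch below builds the same per-year file names for every scenario and model combination and lists any that are missing. The helper `build_fps`, the `/tmp/ncar` directory, and the hard-coded model and scenario lists are assumptions for illustration; in the notebook these values come from `config`.

```python
from itertools import product
from pathlib import Path


def build_fps(base_dir, model, scenario, start_year=2006, end_year=2099):
    """Build one file path per year, inclusive of both end points."""
    return [
        Path(base_dir) / f"{model}_{scenario}_BCSD_met_{year}.nc"
        for year in range(start_year, end_year + 1)
    ]


# Stand-ins for the `models` and `scenarios` lists defined in config
example_models = ["CCSM4", "MRI-CGCM3"]
example_scenarios = ["rcp45", "rcp85"]

jobs = {
    (scenario, model): build_fps("/tmp/ncar", model, scenario)
    for scenario, model in product(example_scenarios, example_models)
}

# Report missing inputs up front instead of failing partway through the run
missing = [fp for fps in jobs.values() for fp in fps if not fp.exists()]
print(f"{len(jobs)} jobs, {len(missing)} missing files")
```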


Merge the individual DataArrays into a single dataset:

In [141]:
%time proj_indicators_ds = xr.merge([da for da_list in results for da in da_list])

CPU times: user 18.1 s, sys: 5.23 s, total: 23.4 s
Wall time: 23.4 s


Process applicable indicators for the historical era (using Daymet dataset):

In [136]:
%%time
summary_das = []
# note: WSDI and CSDI have been dropped - no historical data
# TODO: add these back in, but just use the same time period!
for indicator in ["rx1day", "rx5day", "r10mm", "cwd", "cdd", "hd", "su", "cd", "dw"]:
    varname = indicator_varname_lu[indicator]
    summary_das.append(compute_indicator(daymet_ds[varname], indicator, "historical", "daymet"))
    print(indicator, "done", end=", ")

rx1day done, rx5day done, r10mm done, cwd done, cdd done, hd done, su done, cd done, dw done, CPU times: user 1min 48s, sys: 32.6 s, total: 2min 21s
Wall time: 2min 21s


And combine into a Dataset:

In [149]:
%time daymet_indicators_ds = xr.merge(summary_das)

CPU times: user 20.7 ms, sys: 21 µs, total: 20.7 ms
Wall time: 20 ms


Some of the xclim count indicators return 0 rather than NaN for cells with no data. Set those cells to -9999, using the NaN mask from `rx1day` to locate them. Do this for the projected indicators first:

In [143]:
def replace_nan(da):
    # relies on the global `nan_mask`, which must be set before calling
    da.values[nan_mask] = -9999
    da.attrs["_FillValue"] = -9999
    return da

nan_mask = np.isnan(proj_indicators_ds["rx1day"])

for indicator in ["r10mm", "wsdi", "csdi", "cwd", "cdd", "su", "dw"]:
    proj_indicators_ds[indicator] = replace_nan(proj_indicators_ds[indicator]).astype(np.int32)
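
As a small illustration of the masking idea, the NumPy sketch below takes the nodata mask from a float indicator and applies it to a count indicator before casting to `int32`. It uses `np.where`, which returns a new array instead of modifying the input in place. This is an alternative sketch on made-up arrays, not the approach used in the cell above.

```python
import numpy as np

# Made-up 2x2 grids: NaN in rx1day marks a cell with no data,
# while the count indicator reports 0 for that same cell
rx1day = np.array([[12.3, np.nan], [0.0, 8.1]])
r10mm = np.array([[3.0, 0.0], [0.0, 1.0]])

nodata = -9999
mask = np.isnan(rx1day)

# Valid zeros stay as 0; only masked cells become the nodata value
r10mm_filled = np.where(mask, nodata, r10mm).astype(np.int32)
print(r10mm_filled)
```

The cell at `[1, 0]` is a genuine zero count and is left alone, which is why the mask has to come from an indicator that preserves NaN.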

Then the Daymet indicators:

In [155]:
# recompute nan_mask, since the Daymet indicators are a different data cube
nan_mask = np.isnan(daymet_indicators_ds["rx1day"])

for indicator in ["r10mm", "cwd", "cdd", "su", "dw"]:
    # the underlying array is unexpectedly read-only here, so make it writeable first
    daymet_indicators_ds[indicator].values.setflags(write=1)
    daymet_indicators_ds[indicator] = replace_nan(daymet_indicators_ds[indicator]).astype(np.int32)

Then combine the projected and Daymet indicators Datasets together:

In [163]:
indicators_ds = xr.merge([daymet_indicators_ds, proj_indicators_ds])
del indicators_ds.attrs["units"]

Round certain indicators to reasonable precision:

In [170]:
for indicator in ["hd", "cd", "rx1day", "rx5day"]:
    indicators_ds[indicator] = np.round(indicators_ds[indicator], 1)

Add global metadata:

In [171]:
indicators_ds.attrs = {
    "creation_date": datetime.utcnow().strftime('%Y-%m-%dT%H:%M:%SZ')
}
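
`datetime.utcnow()` is deprecated as of Python 3.12. A timezone-aware call that produces the same timestamp format could look like the sketch below; the name `global_attrs` is only illustrative.

```python
from datetime import datetime, timezone

# Timezone-aware current time in UTC, formatted as an ISO 8601 string
creation_date = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")

global_attrs = {"creation_date": creation_date}
print(global_attrs)
```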

Write to disk (might take a couple of minutes):

In [173]:
%time indicators_ds.to_netcdf(indicators_fp)

CPU times: user 88.6 ms, sys: 4.26 s, total: 4.35 s
Wall time: 12.7 s


done!