# Using the `ua-snap/downscale` repo for statistical downscaling of climate data

This notebook provides a step-by-step guide to using the [`downscale`](https://github.com/ua-snap/snap-geo) package to conduct simple delta downscaling of CRU-TS 4.08 data.

This demonstration is run on Chinook because that has become our main processing system.
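
For orientation, the delta method mentioned above can be sketched for a single grid cell with plain NumPy. This is a minimal illustration of the general technique, not the package's implementation: the data are random, the "fine" climatology is invented, and the spatial interpolation of anomalies to the fine grid is omitted entirely.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical coarse series for one grid cell: 3 years x 12 months (mm).
coarse = rng.uniform(10, 100, size=(3, 12))

# Coarse climatology: mean of each calendar month over the reference period
# (here the 3 years stand in for 1961-1990).
coarse_clim = coarse.mean(axis=0)

# Hypothetical high-resolution baseline climatology for one fine-grid pixel.
fine_clim = coarse_clim * 1.2 + 5

# Additive delta: anomalies are differences, added back to the baseline.
anom_add = coarse - coarse_clim
downscaled_add = fine_clim + anom_add

# Multiplicative delta: anomalies are ratios, multiplied by the baseline.
anom_mult = coarse / coarse_clim
downscaled_mult = fine_clim * anom_mult
```

In both variants the downscaled series, averaged over the reference period, reproduces the fine-scale baseline climatology, which is the defining property of the approach.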

## Setup

### Environment

You will want to spin up a compute node and connect this notebook to it, following the instructions in this [README](https://github.com/ua-snap/ardac-toolbox). The Python code in the downscale repo needs to be accessible to this kernel, which can be achieved either by installing `downscale` as a package in the environment or by starting the Jupyter server from within the base directory of the repo. Make sure the packages in the `requirements.txt` file are present in the environment.

### Data

The CRU-TS data is available from the [CRU website](https://crudata.uea.ac.uk/cru/data/hrg/), part of the University of East Anglia. The files of interest have historically looked like `cru_ts<version>.<start year>.<end year>.<variable>.dat.nc.gz`. This repo has been used for the `tmp` and `pre` data. Download and unzip the data. 
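
The naming pattern and the unzip step can be sketched in Python as follows. The helper names `cru_filename` and `gunzip` are hypothetical and only illustrate the pattern described above; downloading is left out because the exact URL layout on the CRU site is not given here.

```python
import gzip
import shutil
from pathlib import Path


def cru_filename(version, start_year, end_year, variable, compressed=True):
    """Build a CRU-TS file name following the historical naming pattern."""
    name = f"cru_ts{version}.{start_year}.{end_year}.{variable}.dat.nc"
    return name + ".gz" if compressed else name


def gunzip(src, dst_dir):
    """Decompress a .gz file into dst_dir and return the path of the result."""
    src = Path(src)
    # Path.stem drops only the final suffix (.gz), leaving the .dat.nc name.
    dst = Path(dst_dir) / src.stem
    with gzip.open(src, "rb") as f_in, open(dst, "wb") as f_out:
        shutil.copyfileobj(f_in, f_out)
    return dst


name = cru_filename("4.08", 1901, 2023, "pre")
```

Calling `gunzip(name, ".")` on a downloaded file would then produce the `.dat.nc` file used in the rest of this notebook.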



Here are the imports we need:

In [1]:
from pathlib import Path
from functools import partial
import numpy as np
import rasterio as rio

from downscale import Baseline, Dataset, DeltaDownscale

Start by setting some variables to specify inputs:  
* path to dataset to downscale:

In [2]:
cru_fp = "/beegfs/CMIP6/kmredilla/cru_ts/cru_ts4.08.1901.2023.pre.dat.nc"

* path to directory containing baseline monthly climatology files to use (PRISM, 1961-1990):

In [3]:
clim_dir = Path("/beegfs/CMIP6/kmredilla/prism/pr")

(Note: the path previously used for this processing with PRISM data was `/workspace/Shared/Tech_Projects/DeltaDownscaling/project_data/climatologies/prism`)

* number of cores to use for parallel processing:

In [4]:
ncpus = 24

* path to directory for output files:

In [5]:
var_id = "pr"
output_dir = Path(f"/beegfs/CMIP6/kmredilla/downscaled/CRU_TS408/{var_id}")
output_dir.mkdir(exist_ok=True, parents=True)

### Processing

Create a `Baseline` object from the baseline files:

In [6]:
baseline = Baseline(sorted(list(clim_dir.glob("*.tif"))))
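
The `sorted()` call above orders the files lexically, which matches calendar order only if the month numbers in the file names are zero-padded. The sketch below shows the difference using hypothetical file names (the real PRISM climatology names may differ), and assumes that the package expects the twelve files in calendar order.

```python
# Hypothetical file names; only the zero-padding of the month field matters.
padded = [f"pr_prism_{m:02d}_1961_1990.tif" for m in range(1, 13)]
unpadded = [f"pr_prism_{m}_1961_1990.tif" for m in range(1, 13)]


def months_in_sorted_order(names):
    """Return the month numbers in the order produced by a lexical sort."""
    # The month is the third underscore-separated field in these names.
    return [int(n.split("_")[2]) for n in sorted(names)]


padded_order = months_in_sorted_order(padded)      # calendar order
unpadded_order = months_in_sorted_order(unpadded)  # not calendar order
```

A quick check such as `len(baseline.filelist) == 12` before continuing is a cheap way to catch a wrong directory or glob pattern.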

Create a historical `Dataset` object from the CRU-TS dataset:

(This step does some grid rearranging in the background, hence the elapsed time. It also prints an unexplained "level" value.)

In [7]:
# not sure if these values should ever change atm.
clim_begin = "01-1961"
clim_end = "12-1990"
# the following variables seem to be determined by the input file(s)
#  but still need to be specified for now
variable = "pre"
model = "ts408"
scenario = "historical"
project = "cru"
units = "mm"
metric = "total"

ds_args = (
    cru_fp,
    variable,
    model,
    scenario,
    project,
    units,
    metric,
)
ds_kwargs = {
    "method": "linear",
    "ncpus": ncpus,
}

# put on one line for timing purposes
%time historical = Dataset(*ds_args, **ds_kwargs)

4
CPU times: user 3.61 s, sys: 3.35 s, total: 6.96 s
Wall time: 12.8 s


Define some more options, and a function for rounding the outputs:

In [8]:
rounder = partial(np.around, decimals=0)
downscaling_operation = "add"
out_varname = var_id
# write anomalies as well
anom = True
# interpolate across NA's using a spline
interp = True
find_bounds = False
fix_clim = False
aoi_mask = None


def round_it(arr):
    return rounder(arr)
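
One detail of the rounding function worth knowing: `np.around` rounds halves to the nearest even value rather than always rounding up, and it keeps the floating-point dtype. A small sketch with made-up values:

```python
from functools import partial

import numpy as np

rounder = partial(np.around, decimals=0)

# Made-up precipitation values (mm), including exact halves.
values = np.array([0.4, 0.5, 1.5, 2.5, 12.7])
rounded = rounder(values)
# Halves go to the nearest even integer: 0.5 -> 0, 1.5 -> 2, 2.5 -> 2.
```

If integer outputs are required, the array would still need an explicit cast after rounding.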

Read in a mask from one of the PRISM climatology files:

In [9]:
with rio.open(baseline.filelist[0]) as src:
    mask = src.read_masks(1)
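
`read_masks` returns a `uint8` array in which 0 marks nodata pixels and 255 marks valid ones, which is why `mask_value` is set to 0 further down. The sketch below uses a small invented array in that form to show what the values mean; how `DeltaDownscale` applies the mask internally is not shown here and the `np.where` step is only an assumption about the general idea.

```python
import numpy as np

# Invented 3x3 mask in the form returned by read_masks:
# 0 = nodata, 255 = valid data.
mask = np.array(
    [[0, 255, 255],
     [0, 255, 255],
     [0, 0, 255]],
    dtype=np.uint8,
)
data = np.arange(9, dtype="float32").reshape(3, 3)

mask_value = 0
# Blank out every pixel flagged as nodata.
masked = np.where(mask == mask_value, np.nan, data)
n_valid = int((mask != mask_value).sum())
```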

Create a `DeltaDownscale` object using the above objects and options:

In [10]:
# For CRU we pass interp=True so that we interpolate across space first when creating the Dataset()
# ^^ note from a script

dd_kwargs = {
    "baseline": baseline,
    "clim_begin": clim_begin,
    "clim_end": clim_end,
    "historical": historical,
    "future": None,
    "downscaling_operation": downscaling_operation,
    "mask": mask,
    "mask_value": 0,
    "ncpus": ncpus,
    "src_crs": {"init": "epsg:4326"},
    "src_nodata": None,
    "dst_nodata": None,
    "post_downscale_function": round_it,
    "varname": out_varname,
    "modelname": None,
    "anom": anom,
    "interp": interp,
    "find_bounds": find_bounds,
    "fix_clim": fix_clim,
    "aoi_mask": aoi_mask,
}

%time cru = DeltaDownscale(**dd_kwargs)

running interpolation across NAs -- base resolution
processing interpolation to convex hull in parallel using 24 cpus.
ds interpolated updated into self.ds
CPU times: user 8.04 s, sys: 8.62 s, total: 16.7 s
Wall time: 2min 19s


Run the downscaling:

In [11]:
%time cru.downscale(output_dir=output_dir)

| 0.50, 0.00, 0.00|
| 0.00,-0.50, 90.00|
| 0.00, 0.00, 1.00|
CPU times: user 2.29 s, sys: 3.27 s, total: 5.56 s
Wall time: 1min 27s


PosixPath('/beegfs/CMIP6/kmredilla/downscaled/CRU_TS408/pr')
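
The three rows printed by `downscale()` above look like an affine transform for a 0.5 degree grid. Assuming that reading is correct, the coefficients map a pixel's column and row index to the coordinates of its upper-left corner, as in this sketch (the 720 x 360 grid size is the usual size of a global 0.5 degree grid and is an assumption here):

```python
# Coefficients copied from the printed matrix.
a, b, c = 0.50, 0.00, 0.00
d, e, f = 0.00, -0.50, 90.00


def pixel_to_coords(col, row):
    """Map a pixel's (col, row) index to the (x, y) of its upper-left corner."""
    x = a * col + b * row + c
    y = d * col + e * row + f
    return x, y


upper_left = pixel_to_coords(0, 0)
# Far corner of an assumed 720-column x 360-row global grid.
lower_right = pixel_to_coords(720, 360)
```

Under these assumptions the grid spans x from 0 to 360 and y from 90 to -90, which would be consistent with longitudes having been rearranged to a 0 to 360 convention, though that is an inference from the printed values only.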

If you are updating a dataset such as the [2km CRU TS precipitation](https://catalog.snap.uaf.edu/geonetwork/srv/eng/catalog.search#/metadata/9eeef879-42ee-4bbe-a54e-435716ad0c90):

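move the `anom` folder out of the output directory (run this from the parent of the `pr` directory, since `output_dir` above points at `pr` itself),

```
cd /beegfs/CMIP6/kmredilla/downscaled/CRU_TS408
mv pr/anom pr_anom
```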

rename the folder according to the existing convention

```
mv pr pr_AK_CAN_2km_CRU_TS408_historical
```

zip it

```
zip -r pr_AK_CAN_2km_CRU_TS408_historical.zip pr_AK_CAN_2km_CRU_TS408_historical
```

and copy to the CKAN directory on poseidon

```
scp pr_AK_CAN_2km_CRU_TS408_historical.zip kmredilla@poseidon.snap.uaf.edu://workspace/CKAN/CKAN_Data/Base/AK_CAN_2km/historical/CRU_TS/Historical_Monthly_and_Derived_Precipitation_Products_2km_CRU_TS/
```
