# Using the `ua-snap/downscale` repo for statistical downscaling of climate data

This notebook provides a step-by-step guide to using the [`downscale`](https://github.com/ua-snap/snap-geo) package to conduct simple delta downscaling of some CRU-TS 4.05 temperature data. Let's go!
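For orientation, the additive delta method can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the `downscale` package's implementation: the function name `delta_downscale_add` and all of its arguments are hypothetical, and simple cell repetition stands in for the spatial interpolation a real workflow would use.

```python
import numpy as np


def delta_downscale_add(coarse, months, in_clim_period, hires_clim, factor):
    """Additive delta downscaling of a (time, y, x) stack of monthly values.

    coarse:         (time, ny, nx) coarse-resolution values
    months:         (time,) month number (1-12) of each time step
    in_clim_period: (time,) boolean, True for steps inside the baseline period
    hires_clim:     (12, ny * factor, nx * factor) high-resolution climatology
    factor:         integer ratio between coarse and fine grid spacing
    """
    out = np.empty((coarse.shape[0],) + hires_clim.shape[1:], dtype=float)
    for m in range(1, 13):
        is_month = months == m
        # coarse climatology for this calendar month over the baseline period
        coarse_clim = coarse[is_month & in_clim_period].mean(axis=0)
        # anomalies (deltas) relative to that climatology
        anomalies = coarse[is_month] - coarse_clim
        # stand-in for spatial interpolation: repeat each coarse cell
        fine = anomalies.repeat(factor, axis=1).repeat(factor, axis=2)
        # add the deltas to the high-resolution baseline for this month
        out[is_month] = fine + hires_clim[m - 1]
    return out


# toy data: two years of monthly values on a 2x2 grid, downscaled to 4x4
months = np.tile(np.arange(1, 13), 2)
coarse = np.zeros((24, 2, 2))
coarse[12:] += 1.0                      # second year is 1 degree warmer
in_clim = np.arange(24) < 12            # first year is the baseline period
hires_clim = np.full((12, 4, 4), 10.0)  # flat 10 degree climatology
result = delta_downscale_add(coarse, months, in_clim, hires_clim, factor=2)
```

In the toy data the first year reproduces the high-resolution climatology and the second year sits one degree above it, which is the behavior the method is meant to preserve.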

### Setup

Start by setting some variables to specify what data should be downscaled:  

* path to dataset to downscale (CRU-TS 4.05, temperature):

In [1]:
!ncdump -h /Data/Base_Data/Climate/World/CRU_grids/CRU_TS405/cru_ts4.05.1901.2020.tmp.dat.nc

netcdf cru_ts4.05.1901.2020.tmp.dat {
dimensions:
	lon = 720 ;
	lat = 360 ;
	time = UNLIMITED ; // (1440 currently)
variables:
	float lon(lon) ;
		lon:long_name = "longitude" ;
		lon:units = "degrees_east" ;
	float lat(lat) ;
		lat:long_name = "latitude" ;
		lat:units = "degrees_north" ;
	float time(time) ;
		time:long_name = "time" ;
		time:units = "days since 1900-1-1" ;
		time:calendar = "gregorian" ;
	float tmp(time, lat, lon) ;
		tmp:long_name = "near-surface temperature" ;
		tmp:units = "degrees Celsius" ;
		tmp:correlation_decay_distance = 1200.f ;
		tmp:_FillValue = 9.96921e+36f ;
		tmp:missing_value = 9.96921e+36f ;
	int stn(time, lat, lon) ;
		stn:description = "number of stations contributing to each datum" ;
		stn:_FillValue = -999 ;
		stn:missing_value = -999 ;

// global attributes:
		:Conventions = "CF-1.4" ;
		:title = "CRU TS4.05 Mean Temperature" ;
		:institution = "Data held at British Atmospheric Data Centre, RAL, UK." ;
		:source = "Run ID = 2103051243. Data genera
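The header above can be sanity-checked with a little arithmetic: 120 years of monthly data should give the 1440 time steps reported, and the `days since 1900-1-1` units can be decoded with the standard library. The sketch below is illustrative only; `decode_time` is a hypothetical helper, and the offset of 380 days is an example value rather than one read from the file.

```python
from datetime import date, timedelta

# the file name spans 1901 through 2020, with one time step per month
first_year, last_year = 1901, 2020
n_steps = (last_year - first_year + 1) * 12

# origin taken from the time:units attribute in the header
origin = date(1900, 1, 1)


def decode_time(days_since_origin):
    """Convert a 'days since 1900-1-1' offset to a calendar date."""
    return origin + timedelta(days=days_since_origin)


# example offset (an assumption, not read from the file)
example_date = decode_time(380)
print(n_steps, example_date)
```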

In [2]:
in_fp = "/Data/Base_Data/Climate/World/CRU_grids/CRU_TS405/cru_ts4.05.1901.2020.tmp.dat.nc"

* path to directory containing baseline monthly climatology files to use (PRISM, 1961-1990):

In [3]:
from pathlib import Path


clim_dir = Path("/workspace/Shared/Tech_Projects/DeltaDownscaling/project_data/climatologies/prism/tas")

In [4]:
!ls /workspace/Shared/Tech_Projects/DeltaDownscaling/project_data/climatologies/prism/tas

tas_mean_C_akcan_prism_01_1961_1990.tif
tas_mean_C_akcan_prism_02_1961_1990.tif
tas_mean_C_akcan_prism_03_1961_1990.tif
tas_mean_C_akcan_prism_04_1961_1990.tif
tas_mean_C_akcan_prism_05_1961_1990.tif
tas_mean_C_akcan_prism_06_1961_1990.tif
tas_mean_C_akcan_prism_07_1961_1990.tif
tas_mean_C_akcan_prism_08_1961_1990.tif
tas_mean_C_akcan_prism_09_1961_1990.tif
tas_mean_C_akcan_prism_10_1961_1990.tif
tas_mean_C_akcan_prism_11_1961_1990.tif
tas_mean_C_akcan_prism_12_1961_1990.tif
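Since the baseline is expected to cover all twelve months, it can be worth confirming that before going further. The following is a small sketch that parses the month out of file names shaped like the ones listed above; `months_in` is a hypothetical helper and the regular expression assumes the `_MM_YYYY_YYYY.tif` suffix seen in this directory.

```python
import re

# matches the trailing "_MM_YYYY_YYYY.tif" part of a climatology file name
pattern = re.compile(r"_(\d{2})_(\d{4})_(\d{4})\.tif$")


def months_in(filenames):
    """Return the sorted month numbers found in an iterable of file names."""
    months = []
    for name in filenames:
        match = pattern.search(name)
        if match:
            months.append(int(match.group(1)))
    return sorted(months)


# names built to mirror the directory listing above
names = [f"tas_mean_C_akcan_prism_{m:02d}_1961_1990.tif" for m in range(1, 13)]
found = months_in(names)
missing = sorted(set(range(1, 13)) - set(found))
```

With the real directory, the same check could be run as `months_in(p.name for p in clim_dir.glob("*.tif"))`.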


* number of cores to use for parallel processing:

In [5]:
# 32 is max on Atlas compute node
ncpus = 32

* path to directory for output files:

In [6]:
output_dir = Path("/atlas_scratch/kmredilla/downscale/CRU_TS405/tas")
output_dir.mkdir(exist_ok=True, parents=True)

### Processing

In [7]:
# this seems to help with errors like this:
#  OpenBLAS blas_thread_init: pthread_create failed for thread 29 of 32: Resource temporarily unavailable
import os
os.environ['OPENBLAS_NUM_THREADS'] = '1'
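Thread-count variables like this one are generally read when the numerical library is first loaded, so the assignment needs to run before `numpy` (or anything that imports it) is imported. A slightly more general sketch is shown below; `limit_blas_threads` is a hypothetical helper, and including the `OMP_NUM_THREADS` and `MKL_NUM_THREADS` variables is an assumption about which backends might be in use.

```python
import os


def limit_blas_threads(n_threads=1):
    """Cap the threads used by common BLAS/OpenMP backends."""
    # variable names for OpenBLAS, OpenMP and MKL respectively
    for name in ("OPENBLAS_NUM_THREADS", "OMP_NUM_THREADS", "MKL_NUM_THREADS"):
        os.environ[name] = str(n_threads)


limit_blas_threads(1)
# import numpy (and packages built on it) only after the limits are in place
```

One plausible reading of the error is that many worker processes each starting a full set of BLAS threads exceeds the node's thread limit, which is why capping each process at one thread helps.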

Create a `Baseline` object from the baseline files:

In [8]:
from downscale import Baseline


# sorting by file name puts the files in month order (01 through 12)
baseline = Baseline(sorted(clim_dir.glob("*.tif")))

Create a historical `Dataset` object from the CRU-TS dataset:

This does some grid rearranging in the background, hence the elapsed time. It also prints an obscure "level" value.

In [9]:
from downscale import Dataset


# not sure at the moment whether these values should ever change
clim_begin = "01-1961"
clim_end = "12-1990"
# the following variables seem to be determined by the input file(s)
#  but still need to be specified for now
variable = "tmp"
model = "ts40"
scenario = "historical"
project = "cru"
units = "C"
metric = "mean"

ds_args = (
    in_fp,
    variable,
    model,
    scenario,
    project,
    units,
    metric,
)
ds_kwargs = {
    "method": "linear",
    "ncpus": ncpus,
}

# put on one line for timing purposes
%time historical = Dataset(*ds_args, **ds_kwargs)

4
CPU times: user 3.54 s, sys: 1.57 s, total: 5.11 s
Wall time: 5.13 s


Define some more options, and a function for rounding the outputs:

In [10]:
import numpy as np
from functools import partial


rounder = partial(np.around, decimals=1)
downscaling_operation = "add"
out_varname = "tas"
# write anomalies as well
anom = True
# interpolate across NA's using a spline
interp = True
find_bounds = False
fix_clim = False
aoi_mask = None
        
def round_it(arr):
    return rounder(arr)
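To see what the rounding function does to an array, here is a tiny standalone sketch. The sample values are made up for illustration; only `partial`, `np.around` and the `decimals=1` setting come from the cell above.

```python
import numpy as np
from functools import partial

# same construction as above: np.around fixed to one decimal place
rounder = partial(np.around, decimals=1)

# made-up temperatures with more precision than we want to store
values = np.array([-12.3456, 0.04, 7.89])
rounded = rounder(values)
print(rounded)
```

Rounding to one decimal keeps the outputs at a precision consistent with the input data and tends to make the written files compress better.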

Read in a mask from one of the PRISM climatology files:

In [11]:
import rasterio as rio


with rio.open(baseline.filelist[0]) as src:
    mask = src.read_masks(1)
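`read_masks` returns a `uint8` array in which 0 marks nodata pixels and 255 marks valid ones, which is presumably why a mask value of 0 is passed along with the mask later on. The sketch below shows that convention on a toy array; it is an illustration of how such a mask could be applied, not a description of what `downscale` does internally.

```python
import numpy as np

# toy mask in the rasterio convention: 0 = nodata, 255 = valid
mask = np.array([[0, 255], [255, 255]], dtype=np.uint8)

# toy data on the same grid
data = np.array([[1.5, 2.5], [3.5, 4.5]])

# blank out every pixel where the mask equals the mask value (0)
masked = np.where(mask == 0, np.nan, data)
n_masked = int(np.isnan(masked).sum())
```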

Create a `DeltaDownscale` object using the above objects and options:

In [12]:
from downscale import DeltaDownscale


# Note carried over from a script: for CRU we pass interp=True
#  so we interpolate across space first when creating the Dataset()

dd_kwargs = {
    "baseline": baseline,
    "clim_begin": clim_begin,
    "clim_end": clim_end,
    "historical": historical,
    "future": None,
    "downscaling_operation": downscaling_operation,
    "mask": mask,
    "mask_value": 0,
    "ncpus": ncpus,
    "src_crs": {"init": "epsg:4326"},
    "src_nodata": None,
    "dst_nodata": None,
    "post_downscale_function": round_it,
    "varname": out_varname,
    "modelname": None,
    "anom": anom,
    "interp": interp,
    "find_bounds": find_bounds,
    "fix_clim": fix_clim,
    "aoi_mask": aoi_mask,
}

%time cru = DeltaDownscale(**dd_kwargs)

running interpolation across NAs -- base resolution
processing interpolation to convex hull in parallel using 32 cpus.
ds interpolated updated into self.ds
CPU times: user 16.6 s, sys: 8.74 s, total: 25.4 s
Wall time: 2min 32s


Run the downscaling:

In [None]:
%time cru.downscale(output_dir=output_dir)