# Restack 20km AK WRF data interactively

This notebook is for ad-hoc execution of any portion of the pipeline in the [restack_20km](./restack_20km.ipynb) notebook. Use it to restack arbitrary chunks of the WRF data, down to a single variable of a single year for a particular WRF group.

## Setup

Execute this cell to set up the environment:

In [1]:
import os
import time
from multiprocessing import Pool
from pathlib import Path
import xarray as xr
from tqdm.notebook import tqdm
# codebase
import luts
import restack_20km as main


# these paths should be constant for any SNAPer running this pipeline
# assumes all folders are created in restack_20km.ipynb
base_dir = Path("/import/SNAP/wrf_data/project_data/wrf_data")
anc_dir = base_dir.joinpath("ancillary")
# monthly WRF file to serve as template
template_fp = anc_dir.joinpath("monthly_PCPT-gfdlh.nc")
# a script for initializing conda on nodes' shells
conda_init_script = anc_dir.joinpath("init_conda.sh")
# WRF geogrid file for correctly projecting data and rotating wind data
geogrid_fp = anc_dir.joinpath("geo_em.d01.nc")
# final output directory for data
output_dir = Path("/import/SNAP/wrf_data/project_data/wrf_data/restacked")
# slurm directory
slurm_dir = base_dir.joinpath("slurm")
slurm_dir.mkdir(exist_ok=True)

# this env var is always defined if the notebook was started with `anaconda-project run`
# (os.environ[...] fails with a clear KeyError if it is not)
project_dir = Path(os.environ["PROJECT_DIR"])
ap_env = project_dir.joinpath("envs/default")
# mp_cp.py is not used on Chinook because $ARCHIVE is not accessible from compute nodes
# cp_script = project_dir.joinpath("restack_20km/mp_cp.py")
restack_script = project_dir.joinpath("restack_20km/restack.py")
forecast_times_script = project_dir.joinpath("restack_20km/forecast_times.py")
luts_fp = project_dir.joinpath("restack_20km/luts.py")
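Because the paths above are hard-coded, a quick existence check can catch a missing ancillary file or script before any jobs are submitted. This is a minimal sketch; the `missing_paths` helper is hypothetical and is not part of `restack_20km`:

```python
from pathlib import Path


def missing_paths(paths):
    """Return the subset of `paths` that do not exist on disk (hypothetical helper)."""
    return [Path(p) for p in paths if not Path(p).exists()]


# possible use in this notebook:
# missing = missing_paths([template_fp, conda_init_script, geogrid_fp, restack_script, luts_fp])
# if missing:
#     raise FileNotFoundError(f"Missing required files: {missing}")
```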

Set user parameters that should stay the same across executions:

In [2]:
# scratch space where data will be copied for performant reading / writing
scratch_dir = Path(input("Scratch directory path:") or "/center1/DYNDOWN/kmredilla/wrf_data")
slurm_email = input("Email address for slurm:")
# conda_init_script = "/home/kmredilla/init_conda.sh"

# where raw wrf outputs will be copied on scratch
raw_scratch_dir = scratch_dir.joinpath("raw")

# where initially restacked data will be stored on scratch space
restack_scratch_dir = scratch_dir.joinpath("restacked")

Scratch directory path: 
Email address for slurm: kmredilla@alaska.edu
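The setup cell assumes the scratch folders were already created in restack_20km.ipynb. If that has not happened for the chosen scratch directory, a sketch like the following (an assumption, not a pipeline step) would create the `raw/` and `restacked/` subfolders used above; `ensure_scratch_dirs` is a hypothetical helper:

```python
from pathlib import Path


def ensure_scratch_dirs(scratch_dir):
    """Create raw/ and restacked/ under `scratch_dir` if they are missing.

    Mirrors raw_scratch_dir and restack_scratch_dir defined above; safe to
    re-run because exist_ok=True.
    """
    scratch_dir = Path(scratch_dir)
    subdirs = [scratch_dir / "raw", scratch_dir / "restacked"]
    for d in subdirs:
        d.mkdir(parents=True, exist_ok=True)
    return subdirs
```

For example, `raw_scratch_dir, restack_scratch_dir = ensure_scratch_dirs(scratch_dir)` would both create and return the two paths.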


## Restack data


### Processing parameters

The following arguments are required for a single job of restacking data for a particular variable (or variables), model, scenario, and year (or years):

* **WRF Group**: Encoded value specifying the WRF group to work on, which is a combination of the model and scenario (or just the model, in the case of ERA-Interim). One of [`erain_hist`, `gfdl_hist`, `ccsm_hist`, `gfdl_rcp85`, `ccsm_rcp85`].
* **Year(s)**: A list of years to work on, entered at the prompt as space-separated integers such as `1979 1980`; leave blank to work on all years available for the given WRF group.
* **Variable name(s)**: Name(s) of the WRF variable(s) to restack, as listed in `luts.varnames` (e.g. `ACSNOW`); leave blank for all variables.
* **Number of cores**: The number of cores to use for parallel tasks, between 2 and 24 (default 24).
* **Chinook partition**: The desired partition on the Chinook cluster (default `t1small`).
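For scripted (non-interactive) use, the same input rules can be written as plain functions. This is a sketch; `parse_years` and `parse_ncpus` are hypothetical helpers, and `valid_years` stands in for `luts.groups[group]["years"]`:

```python
def parse_years(text, valid_years):
    """Parse a space-separated string of years into ints.

    A blank string selects all `valid_years`, matching the 'leave blank for
    all years' behavior of the prompt. Raises ValueError for unavailable years.
    """
    if not text.strip():
        return list(valid_years)
    years = [int(tok) for tok in text.split()]
    invalid = [y for y in years if y not in valid_years]
    if invalid:
        raise ValueError(f"Years not available for this group: {invalid}")
    return years


def parse_ncpus(text, lo=2, hi=24, default=24):
    """Parse the CPU count, falling back to `default` when blank."""
    ncpus = int(text) if text.strip() else default
    if not lo <= ncpus <= hi:
        raise ValueError(f"ncpus must be in {lo}-{hi}, got {ncpus}")
    return ncpus
```

For example, `parse_years("1971", range(1970, 2006))` returns `[1971]`.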

In [3]:
group = ""
while group not in list(luts.groups.keys()):
    group = input(f"WRF group, one of {list(luts.groups.keys())}:")
print(f"WRF group selected: {group}")
wrf_dir = Path(luts.groups[group]["directory"])

years = [1738]
valid_years = f"{luts.groups[group]['years'][0]}-{luts.groups[group]['years'][-1]}"
while not all(int(year) in luts.groups[group]["years"] for year in years):
    # split() ignores repeated spaces; an empty result falls back to all years
    years = input(f"Years, a ' '- separated list of years, in {valid_years}. Leave blank for all years:").split() or luts.groups[group]["years"]
print(f"Years selected: {years}")

varnames = [""]
while not all(varname in luts.varnames for varname in varnames):
    # an empty result falls back to all variables
    varnames = input("Enter name(s) of WRF variable(s) to re-stack (leave blank for all):").split() or luts.varnames

ncpus = 0
while not 2 <= ncpus <= 24:
    ncpus = input("Enter number of CPUs to use (valid range: 2-24; leave blank for 24 cores):") or 24
    ncpus = int(ncpus)

partition = input("Enter name of compute partition to use (leave blank for 't1small'):") or "t1small"

WRF group, one of ['erain_hist', 'gfdl_hist', 'ccsm_hist', 'gfdl_rcp85', 'ccsm_rcp85']: ccsm_hist


WRF group selected: ccsm_hist


Years, a ' '- separated list of years, in 1970-2005. Leave blank for all years: 1971


Years selected: ['1971']


Enter name(s) of WRF variable(s) to re-stack (leave blank for all): ACSNOW
Enter number of CPUs to use (valid range: 2-24; leave blank for 24 cores): 
Enter name of compute partition to use (leave blank for 't1small'): 


Create slurm scripts:

In [6]:
sbatch_fps = []
year_str = main.get_year_fn_str(years)
for varname in varnames:
    # write to .slurm script
    sbatch_fp = slurm_dir.joinpath(f"restack_{group}_{year_str}_{varname}.slurm")
    # filepath for slurm stdout
    sbatch_out_fp = slurm_dir.joinpath(f"restack_{group}_{year_str}_{varname}_%j.out")
    sbatch_head = main.make_sbatch_head(
        slurm_email, partition, conda_init_script, ap_env
    )

    args = {
        "sbatch_fp": sbatch_fp,
        "sbatch_out_fp": sbatch_out_fp,
        "restack_script": restack_script,
        "luts_fp": luts_fp,
        "geogrid_fp": geogrid_fp,
        "anc_dir": anc_dir,
        "restacked_dir": restack_scratch_dir,
        "group": group,
        "fn_str": luts.groups[group]["fn_str"],
        "years": years,
        "varname": varname,
        "ncpus": ncpus,
        "sbatch_head": sbatch_head,
    }

    main.write_sbatch_restack(**args)
    sbatch_fps.append(sbatch_fp)
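Each variable gets its own `.slurm` script, and the stdout path contains slurm's `%j` filename pattern, which sbatch replaces with the job ID. The naming scheme above can be sketched as a standalone function; `slurm_paths` is hypothetical, and `year_str` is assumed to be whatever `main.get_year_fn_str` returns (its exact format isn't shown here):

```python
from pathlib import Path


def slurm_paths(slurm_dir, group, year_str, varname):
    """Return (script path, stdout path) following the naming used above.

    '%j' is left literal: sbatch substitutes the job ID when it creates the
    output file, so each submission writes its own .out file.
    """
    stem = f"restack_{group}_{year_str}_{varname}"
    slurm_dir = Path(slurm_dir)
    return slurm_dir / f"{stem}.slurm", slurm_dir / f"{stem}_%j.out"
```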

Optionally, remove existing slurm output (`.out`) files for these jobs:

In [8]:
for varname in varnames:
    # materialize the glob before deleting the matched files
    for fp in list(slurm_dir.glob(f"*{group}_{year_str}_{varname}_*.out")):
        fp.unlink()
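Because the glob pattern begins with `*`, it can be worth previewing what it matches before deleting anything. A sketch with a hypothetical `remove_slurm_outputs` helper and `dry_run` switch, using the same pattern as the cell above:

```python
from pathlib import Path


def remove_slurm_outputs(slurm_dir, group, year_str, varnames, dry_run=True):
    """Find (and optionally delete) slurm stdout files for the given jobs.

    With dry_run=True the matching files are only returned, not deleted.
    The .slurm scripts themselves never match the '*.out' pattern.
    """
    matches = []
    for varname in varnames:
        matches.extend(sorted(Path(slurm_dir).glob(f"*{group}_{year_str}_{varname}_*.out")))
    if not dry_run:
        for fp in matches:
            fp.unlink()
    return matches
```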

Submit the `.slurm` scripts with `sbatch`:

In [9]:
job_ids = [main.submit_sbatch(fp) for fp in sbatch_fps]
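`main.submit_sbatch` is defined in `restack_20km` and its implementation is not shown here. For reference, one generic way to submit a script and capture its job ID is `sbatch --parsable`, which prints only the job ID (optionally followed by `;cluster`). This is a sketch and not necessarily what `submit_sbatch` does; `submit` and `parse_job_id` are hypothetical names:

```python
import subprocess


def parse_job_id(parsable_output):
    """Extract the job ID from `sbatch --parsable` output ('jobid' or 'jobid;cluster')."""
    return int(parsable_output.strip().split(";")[0])


def submit(sbatch_fp):
    """Submit a .slurm script and return its job ID (requires sbatch on PATH)."""
    result = subprocess.run(
        ["sbatch", "--parsable", str(sbatch_fp)],
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if sbatch rejects the script
    )
    return parse_job_id(result.stdout)
```

After submission, `squeue -u $USER` lists the jobs that are still pending or running.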

## Quality Check

### Ensure that all files open and have consistent header info

Check this using both `xarray` and GDAL bindings.
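A sketch of the `xarray` side of this check: open each restacked file, record its dimension sizes, and report files that differ from the first one. `read_dims` and `inconsistent_files` are hypothetical helpers. The comparison takes plain dicts, so sizes read through GDAL could be passed in the same way. The `ignore` argument is there in case a dimension, for example a time dimension whose length may vary by year, should be excluded; dimension names used with it are assumptions.

```python
def read_dims(fps):
    """Open each file with xarray and record its dimension sizes.

    A file that fails to open raises here, which also covers the
    'all files open' part of the check.
    """
    import xarray as xr

    dims = {}
    for fp in fps:
        with xr.open_dataset(fp) as ds:
            dims[fp] = dict(ds.sizes)
    return dims


def inconsistent_files(dims_by_file, ignore=()):
    """Return files whose dimension sizes differ from the first file's."""
    items = list(dims_by_file.items())
    if not items:
        return []

    def keep(dims):
        # drop dimensions the caller chose not to compare
        return {k: v for k, v in dims.items() if k not in ignore}

    reference = keep(items[0][1])
    return [fp for fp, dims in items[1:] if keep(dims) != reference]
```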

Dimensions are the sam