# Restack hourly 20km WRF outputs

Now that the WRF outputs are available on the scratch filesystem for persistence and fast access, execute the restacking script on all variables of interest.

This is the main work of the pipeline. It applies to a single WRF group (again, "group" means a specific model / scenario combination) for any variables and years specified. "Restacking" means extracting each variable from the hourly WRF files, each of which contains all variables for a single hour, and combining the data into new files grouped by variable and year. The script then assigns useful metadata and restructures the files for greater usability. (This was previously a separate step, but storing essentially duplicate intermediate data was inefficient.)
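
As a toy illustration of the restacking idea (not the pipeline's actual restack script), the sketch below regroups records from hypothetical hourly files, each holding every variable for one timestamp, into per-variable, per-year series. The variable names, values, and the `restack` helper are made up for the example; the real script operates on gridded WRF output files.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical stand-ins for hourly WRF files: one record per timestamp,
# each holding a value for every variable (real files hold gridded arrays).
hourly_files = [
    {"time": datetime(1979, 12, 31, 23), "T2": 250.1, "PCPT": 0.0},
    {"time": datetime(1980, 1, 1, 0), "T2": 249.8, "PCPT": 0.2},
    {"time": datetime(1980, 1, 1, 1), "T2": 249.5, "PCPT": 0.1},
]


def restack(records, varnames):
    """Regroup per-timestamp records into {(varname, year): [(time, value), ...]}."""
    stacked = defaultdict(list)
    # sort by time so each output series is in chronological order
    for rec in sorted(records, key=lambda r: r["time"]):
        for varname in varnames:
            key = (varname, rec["time"].year)
            stacked[key].append((rec["time"], rec[varname]))
    return dict(stacked)


restacked = restack(hourly_files, ["T2", "PCPT"])
```

Each key of `restacked` corresponds to what would be one output file in the real pipeline: a single variable for a single year.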

As mentioned above, this pipeline is currently configured to run the restacking for all potential combinations of variables / years for each group.

Set up the environment:

In [None]:
import os
import time
from multiprocessing import Pool
import tqdm
import numpy as np
import pandas as pd
import xarray as xr
# from tqdm.notebook import tqdm as nb_tqdm
# codebase
from config import *
import luts
import slurm

### 1 - Make forecast time table

Tables of forecast time values and filenames are used for interpolating the "accumulation" variables, such as snow and precipitation. Do this only after an entire year's worth of data has been successfully copied from `$ARCHIVE` to scratch space, because the restacking references this table for filepaths and timestamp information. This step also runs on the compute nodes, which cannot see `$ARCHIVE`.

**Note** - it is currently unknown why an "accumulation fix" is needed at all. There may be some information on this elsewhere.

Create the slurm script for getting the forecast times:

In [32]:
# since this is only done once for a group with all files, only need to specify group (no year(s))
sbatch_fp = slurm_dir.joinpath(f"get_forecast_times_{group}.slurm")
sbatch_out_fp = slurm_dir.joinpath(f"get_forecast_times_{group}_%j.out")
wrf_scratch_dir = raw_scratch_dir.joinpath(group)
sbatch_head = slurm.make_sbatch_head(slurm_email, partition, conda_init_script)
slurm.write_sbatch_forecast_times(sbatch_fp, sbatch_out_fp, wrf_scratch_dir, anc_dir, forecast_times_script, ncpus, sbatch_head)

Forecast times slurm commands written to /center1/DYNDOWN/kmredilla/wrf_data/slurm/get_forecast_times_ccsm_hist.slurm
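
The implementation of `slurm.make_sbatch_head` is not shown in this notebook. For orientation, a header built from an email address, a partition, and a conda init script might look roughly like the output of the sketch below. The function name `sketch_sbatch_head`, the placeholder argument values, and the choice of `#SBATCH` directives are assumptions for illustration, not the codebase's actual template.

```python
def sketch_sbatch_head(slurm_email, partition, conda_init_script, ncpus=1):
    """Return an illustrative sbatch header (hypothetical, not the real template)."""
    lines = [
        "#!/bin/bash",
        "#SBATCH --nodes=1",
        f"#SBATCH --cpus-per-task={ncpus}",
        "#SBATCH --mail-type=FAIL",
        f"#SBATCH --mail-user={slurm_email}",
        f"#SBATCH --partition={partition}",
        # initialize conda so the job can activate the project environment
        f"source {conda_init_script}",
    ]
    return "\n".join(lines) + "\n"


# placeholder values only; substitute the settings from config
head = sketch_sbatch_head(
    "user@example.com", "debug", "/path/to/conda_init.sh", ncpus=4
)
print(head)
```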


Submit the script:

In [34]:
# takes > 30 minutes to run on Chinook
job_id = slurm.submit_sbatch(sbatch_fp)
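
`slurm.submit_sbatch` is defined elsewhere in the codebase and is not shown here. A minimal sketch of such a helper, assuming it shells out to `sbatch` and parses the job ID from the usual `Submitted batch job <id>` message, could look like this (the names `parse_job_id` and `submit_sbatch_sketch` are hypothetical):

```python
import re
import subprocess


def parse_job_id(sbatch_stdout):
    """Extract the numeric job ID from sbatch's 'Submitted batch job <id>' line."""
    match = re.search(r"Submitted batch job (\d+)", sbatch_stdout)
    if match is None:
        raise ValueError(f"could not find a job ID in: {sbatch_stdout!r}")
    return int(match.group(1))


def submit_sbatch_sketch(sbatch_fp):
    """Submit a script with sbatch and return its job ID (requires a Slurm cluster)."""
    result = subprocess.run(
        ["sbatch", str(sbatch_fp)], capture_output=True, text=True, check=True
    )
    return parse_job_id(result.stdout)
```

Keeping the parsing separate from the `subprocess` call makes it possible to check the parsing logic without access to a cluster.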

### 2 - Run the restacking with slurm

Make the slurm scripts for restacking, one per variable, each covering the specified range of years.

In [71]:
varnames = luts.varnames

sbatch_fps = []
year_str = f"{years[0]}-{years[-1]}"
for varname in varnames:
    # write to .slurm script
    sbatch_fp = slurm_dir.joinpath(f"restack_{group}_{year_str}_{varname}.slurm")
    # filepath for slurm stdout
    sbatch_out_fp = slurm_dir.joinpath(f"restack_{group}_{year_str}_{varname}_%j.out")
    sbatch_head = slurm.make_sbatch_head(
        slurm_email, partition, conda_init_script
    )

    args = {
        "sbatch_fp": sbatch_fp,
        "sbatch_out_fp": sbatch_out_fp,
        "restack_script": restack_script,
        "luts_fp": luts_fp,
        "geogrid_fp": geogrid_fp,
        "anc_dir": anc_dir,
        "restacked_dir": restack_scratch_dir,
        "group": group,
        "fn_str": luts.groups[group]["fn_str"],
        "years": years,
        "varname": varname,
        "ncpus": ncpus,
        "sbatch_head": sbatch_head,
    }

    slurm.write_sbatch_restack(**args)
    sbatch_fps.append(sbatch_fp)

Optionally, remove any existing slurm output files:

In [72]:
for varname in varnames:
    # delete old slurm stdout files for this group / year range / variable
    for fp in list(slurm_dir.glob(f"*{group}_{year_str}_{varname}_*.out")):
        fp.unlink()

Submit the `.slurm` scripts with `sbatch`:

In [73]:
job_ids = [slurm.submit_sbatch(fp) for fp in sbatch_fps]

This completes this step of the pipeline. Once all of the slurm jobs have finished, proceed to resampling the restacked files to a daily resolution.
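
One way to wait for the jobs before moving on is to poll `squeue` until none of the submitted job IDs are listed. The sketch below assumes `job_ids` holds integer or string job IDs and that `squeue` is available on the path; `active_job_ids` and `wait_for_jobs` are hypothetical helpers, not part of the `slurm` module used above.

```python
import getpass
import subprocess
import time


def active_job_ids(job_ids, squeue_stdout):
    """Return the subset of job_ids that appear in `squeue -h -o %i` output."""
    listed = {line.strip() for line in squeue_stdout.splitlines() if line.strip()}
    return [job_id for job_id in job_ids if str(job_id) in listed]


def wait_for_jobs(job_ids, poll_seconds=60):
    """Block until none of job_ids are listed by squeue (requires a Slurm cluster)."""
    while True:
        # -h drops the header, -u limits to this user, -o %i prints only job IDs
        result = subprocess.run(
            ["squeue", "-h", "-u", getpass.getuser(), "-o", "%i"],
            capture_output=True,
            text=True,
            check=True,
        )
        if not active_job_ids(job_ids, result.stdout):
            return
        time.sleep(poll_seconds)
```

A job leaving the queue does not mean it succeeded, so it is still worth checking the `.out` files in the slurm directory before resampling.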