# Utilities for restacking pipeline

This notebook collects utility functions that assist with running the pipeline.

Run the cell below to set up the environment before moving on to other parts of the notebook.

In [3]:
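from pathlib import Path
import shutil

# config is expected to supply shared settings used below (e.g. group, raw_scratch_dir, geogrid_fp)
from config import *
import luts
import restack_20km as main

# years and WRF output directory for the group being processed
years = luts.groups[group]["years"]
wrf_dir = Path(luts.groups[group]["directory"])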

### Inspect the filesystem

This section describes the filesystem layout and the system the pipeline runs on.

The WRF outputs of interest from different model / scenario runs may be in separate directories, but the file structure is consistent across all groups: every `hourly` and `daily` directory has annual subdirectories containing the WRF outputs to be restacked:

In [2]:
ls /archive/DYNDOWN/DIONE/pbieniek/ccsm/hist/hourly | head -5

[0m[38;5;27m1970[0m/
[38;5;27m1971[0m/
[38;5;27m1972[0m/
[38;5;27m1973[0m/
[38;5;27m1974[0m/


In [14]:
ls /archive/DYNDOWN/DIONE/pbieniek/ccsm/hist/hourly/ | tail -6

2003/
2004/
2005/
nohup.out
orgdata.sh*

In [18]:
ls /archive/DYNDOWN/DIONE/pbieniek/ccsm/hist/hourly/1979 | head -5

[0m[38;5;34mdailylog.out[0m*
[38;5;34mWRFDS_d01.1979-01-01_00.nc[0m*
[38;5;34mWRFDS_d01.1979-01-01_01.nc[0m*
[38;5;34mWRFDS_d01.1979-01-01_02.nc[0m*
[38;5;34mWRFDS_d01.1979-01-01_03.nc[0m*
ls: write error
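
The hourly file names in the listing above follow the pattern `WRFDS_d01.YYYY-MM-DD_HH.nc`. As a rough sketch, the timestamp could be recovered from a file name as shown here. The helper name `parse_wrf_timestamp` is hypothetical and is not part of the pipeline code; it assumes every hourly file follows the pattern seen above.

```python
from datetime import datetime
from pathlib import Path


def parse_wrf_timestamp(fp):
    """Return the timestamp encoded in an hourly WRF file name.

    Assumes names of the form WRFDS_d01.YYYY-MM-DD_HH.nc, as seen in the
    listing above. Illustrative helper, not part of the pipeline.
    """
    name = Path(fp).name
    # The middle component between the prefix and the ".nc" suffix is the date stamp
    stamp = name.split(".")[1]
    return datetime.strptime(stamp, "%Y-%m-%d_%H")


print(parse_wrf_timestamp("WRFDS_d01.1979-01-01_03.nc"))
```

A helper like this could be used to sort files chronologically or to spot missing hours within a year.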


This structure applies to all outputs and exists for the following model / scenario / year combinations:

* ERA-Interim
    * "historical": 1979-2015
* GFDL-CM3
    * historical: 1970-2006
    * RCP 8.5: 2006-2100
* NCAR-CCSM4
    * historical: 1970-2005
    * RCP 8.5: 2005-2100
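
One way to hold these combinations in code is a small lookup table. The sketch below is only an illustration: the dictionary name, its keys, and the helper `years_for` are hypothetical, and the real `luts.groups` table may be organized differently. The year ranges are copied from the list above and are assumed to be inclusive.

```python
# Hypothetical lookup of (first_year, last_year) per model / scenario,
# copied from the list above. Not the actual contents of luts.groups.
YEAR_RANGES = {
    ("ERA-Interim", "historical"): (1979, 2015),
    ("GFDL-CM3", "historical"): (1970, 2006),
    ("GFDL-CM3", "RCP 8.5"): (2006, 2100),
    ("NCAR-CCSM4", "historical"): (1970, 2005),
    ("NCAR-CCSM4", "RCP 8.5"): (2005, 2100),
}


def years_for(model, scenario):
    """Return the list of years for a model / scenario pair (inclusive range)."""
    first, last = YEAR_RANGES[(model, scenario)]
    return list(range(first, last + 1))


print(len(years_for("NCAR-CCSM4", "historical")))
```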

#### System

This pipeline is being developed on the Chinook cluster:

In [1]:
!uname -a

Linux chinook00.rcs.alaska.edu 2.6.32-754.35.1.el6.61015g0000.x86_64 #1 SMP Mon Dec 21 12:41:07 EST 2020 x86_64 x86_64 x86_64 GNU/Linux


This pipeline uses Slurm to spread processing across multiple cores / compute nodes so that it finishes in a reasonable time.

In [5]:
!sinfo -V

slurm 19.05.7
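
The same check could be made from Python, for example to fail early when Slurm is not available. This is a minimal sketch: both function names are hypothetical, and the parser assumes the version string is purely numeric with dot separators, as in the output above.

```python
import shutil
import subprocess


def parse_slurm_version(text):
    """Parse output such as 'slurm 19.05.7' into a tuple of ints."""
    version = text.strip().split()[-1]
    return tuple(int(part) for part in version.split("."))


def get_slurm_version():
    """Return the Slurm version tuple, or None if sinfo is not on PATH."""
    if shutil.which("sinfo") is None:
        return None
    result = subprocess.run(
        ["sinfo", "-V"], capture_output=True, text=True, check=True
    )
    return parse_slurm_version(result.stdout)


print(parse_slurm_version("slurm 19.05.7"))
```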


### Ensure ancillary WRF geogrid file is present

The restacking relies on a WRF geogrid data file to determine the correct spatial projection information and to rotate wind variables correctly. Make sure that it is present in the `anc_dir` directory:

In [None]:
if not geogrid_fp.exists():
    # The original source of this file is not known. If it is ever deleted from
    #  the location below, it might still be available on Poseidon at
    #  /workspace/Shared/Tech_Projects/wrf_data/project_data/ancillary_wrf_constants/geo_em.d01.nc
    shutil.copy("/import/SNAP/wrf_data/project_data/ancillary_wrf_constants/geo_em.d01.nc", geogrid_fp)
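
A slightly more defensive version of this check could also verify that the source file exists and create the destination directory before copying. The sketch below is one possible approach; `ensure_file` is a hypothetical helper, not a function from the pipeline.

```python
import shutil
from pathlib import Path


def ensure_file(src, dst):
    """Copy src to dst if dst is missing; return True if a copy was made."""
    src, dst = Path(src), Path(dst)
    if dst.exists():
        return False
    if not src.exists():
        raise FileNotFoundError(f"Source file not found: {src}")
    # Create the destination directory if it does not exist yet
    dst.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy(src, dst)
    return True
```

In this notebook it would be called with the source path used above as `src` and `geogrid_fp` as `dst`.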

### Check progress of copying to scratch

Check how many of the expected hourly WRF outputs have been successfully copied from `$ARCHIVE` to scratch space. This can take a while.

**Note** - only do this if the files you are trying to copy have actually been staged. It should still work otherwise, but will take an unknown amount of time.

In [5]:
wrf_fps, existing_scratch_fps = main.check_raw_scratch(wrf_dir, group, years, raw_scratch_dir)
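
The two returned values can then be turned into a simple progress figure. The sketch below assumes that `wrf_fps` and `existing_scratch_fps` are iterables of file paths and that a scratch copy keeps the name of its source file; neither assumption is confirmed here, and `copy_progress` is a hypothetical helper.

```python
from pathlib import Path


def copy_progress(src_fps, scratch_fps):
    """Summarize how many source files have a same-named copy on scratch."""
    expected = {Path(fp).name for fp in src_fps}
    copied = {Path(fp).name for fp in scratch_fps}
    done = len(expected & copied)
    total = len(expected)
    # Avoid dividing by zero when no source files were found
    fraction = done / total if total else 0.0
    return done, total, fraction
```

For example, `done, total, fraction = copy_progress(wrf_fps, existing_scratch_fps)` followed by `print(f"{done} of {total} files copied ({fraction:.1%})")` would report progress in one line.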