# Copy raw WRF outputs to scratch space

If the WRF data are not already present on the filesystem (as was the case when this code was developed), they need to be copied over from tape storage on Chinook (the `$ARCHIVE` filesystem).

This step will copy the annual subdirectories containing the WRF outputs for all specified years to scratch space for efficient reading.

The prerequisite to this step is to "stage" the files that are on tape storage, i.e., read them from tape to a temporary location (the path on the system is retained). Per the README, this can be accomplished with:

```
python stage_hourly.py
```

Run this cell to set up the environment for running the copy:

In [3]:
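from multiprocessing import Pool
import tqdm  # progress bars for the copy loop below
from config import *  # expected to provide group and raw_scratch_dir
import luts
import restack_20km as main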

years = luts.groups[group]["years"]
wrf_dir = luts.groups[group]["directory"]

### 1 - Check that all requested files actually staged

It takes a very long time to stage an entire group's directory of files, and experience has shown that staging can partially fail, or that it can be difficult to tell whether all files are staged. So we first determine which files (years), if any, have not been staged.

Use the `check_staged` function to verify that all files are actually staged and ready to be copied to scratch space:

In [14]:
%time unstaged_fps = main.check_staged(wrf_dir, years)

Requested years: [1970 1971 1972 1973 1974 1975 1976 1977 1978 1979 1980 1981 1982 1983
 1984 1985 1986 1987 1988 1989 1990 1991 1992 1993 1994 1995 1996 1997
 1998 1999 2000 2001 2002 2003 2004 2005 2006]
All files are staged
CPU times: user 119 ms, sys: 90 ms, total: 209 ms
Wall time: 18.2 s


If not all files are staged, you will need to re-execute the above command calling the `stage_hourly.py` script.

### 2 - Copy staged files to `scratch_dir`

Ensure yearly subdirectories are present before starting the copying:

In [18]:
main.make_yearly_scratch_dirs(group, years, raw_scratch_dir)
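
For reference, creating the per-year directories needs little more than `Path.mkdir`. The sketch below is an assumption about what a helper like `make_yearly_scratch_dirs` might do, not its actual implementation; `make_year_dirs` is a hypothetical name, and the `<scratch_dir>/<group>/<year>` layout is inferred from the copy loop in this notebook.

```python
from pathlib import Path


def make_year_dirs(group, years, scratch_dir):
    """Create <scratch_dir>/<group>/<year> for each year (hypothetical helper)."""
    year_dirs = []
    for year in years:
        # str() is needed because the years may be integers
        year_dir = Path(scratch_dir).joinpath(group, str(year))
        # parents=True creates the group directory as needed;
        # exist_ok=True makes the call safe to repeat
        year_dir.mkdir(parents=True, exist_ok=True)
        year_dirs.append(year_dir)
    return year_dirs
```

Because of `exist_ok=True`, running this again before a re-copy does not disturb directories that already hold files.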

Iterate over years and copy the files in parallel with `multiprocessing.Pool`:

In [None]:
ncpus = 20
clobber = "all"


group_dir = raw_scratch_dir.joinpath(group)
for year in tqdm.tqdm(years, total=len(years), desc=f"Copying files for {len(years)} years"):
    src_dir = wrf_dir.joinpath(str(year))
    dst_dir = group_dir.joinpath(str(year))
    # each tuple is (source, destination, clobber); set clobber to False for no-clobber
    args = [(fp, dst_dir.joinpath(fp.name), clobber) for fp in src_dir.glob("*.nc")]

    # a fresh pool per year; list() consumes imap so the progress bar advances per file
    with Pool(ncpus) as pool:
        out = list(tqdm.tqdm(pool.imap(main.sys_copy, args), total=len(args), desc=f"Year: {year}"))

Copying files for 37 years:   0%|          | 0/37 [00:00<?, ?it/s]
Year: 1970:   0%|          | 0/8736 [00:00<?, ?it/s]
Year: 1970:   0%|          | 1/8736 [00:06<14:36:49,  6.02s/it]
Year: 1970:   0%|▏         | 21/8736 [00:12<1:11:38,  2.03it/s]
Year: 1970:   0%|▎         | 41/8736 [00:17<54:50,  2.64it/s]
Year: 1970:   1%|▌         | 61/8736 [00:18<30:46,  4.70it/s]
Year: 1970:   1%|▋         | 81/8736 [00:24<35:36,  4.05it/s]
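
To make the pattern in the copy cell concrete, here is a minimal, self-contained sketch of a worker that accepts a `(src, dst, clobber)` tuple, plus a pooled driver. The names `copy_one` and `copy_dir` are hypothetical, and the clobber handling (any truthy value overwrites) is an assumption; `main.sys_copy` may behave differently. The sketch uses `multiprocessing.pool.ThreadPool`, which exposes the same `imap` interface as `Pool`, so it also runs when the worker is defined directly in a notebook cell.

```python
import shutil
from multiprocessing.pool import ThreadPool
from pathlib import Path


def copy_one(args):
    """Copy a single file; args is a (src, dst, clobber) tuple (hypothetical worker)."""
    src, dst, clobber = args
    if dst.exists() and not clobber:
        # no-clobber: leave the existing destination file untouched
        return dst, "skipped"
    shutil.copyfile(src, dst)
    return dst, "copied"


def copy_dir(src_dir, dst_dir, clobber=True, nworkers=4):
    """Copy all *.nc files from src_dir to dst_dir using a pool of workers."""
    src_dir, dst_dir = Path(src_dir), Path(dst_dir)
    dst_dir.mkdir(parents=True, exist_ok=True)
    args = [(fp, dst_dir / fp.name, clobber) for fp in sorted(src_dir.glob("*.nc"))]
    # ThreadPool mirrors the Pool API (imap, context manager)
    with ThreadPool(nworkers) as pool:
        # imap yields results in input order; list() consumes them inside the with block
        return list(pool.imap(copy_one, args))
```

Returning a status per file, rather than nothing, gives a cheap way to count how many files were actually written versus skipped on a rerun.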

All files should now be present on scratch space.

The cell below provides a quick check, based on file size, for any files that did not copy properly:

In [7]:
flag_fps = []

for year in years:
    # joinpath needs string components, so convert the year as in the copy loop
    year_scratch_dir = raw_scratch_dir.joinpath(group, str(year))
    flag_fps.extend(check_scratch_file_sizes(year_scratch_dir, ncpus=20))
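
One plausible way to flag files by size is to compare each file against the most common size in its directory. The sketch below illustrates that idea only; it is not the implementation of `check_scratch_file_sizes`, the name `flag_odd_sizes` is hypothetical, and it rests on the assumption that all complete files in a year directory have the same size (which would not hold for, e.g., compressed files of varying size).

```python
from collections import Counter
from pathlib import Path


def flag_odd_sizes(directory, pattern="*.nc"):
    """Return files whose size differs from the most common size in the directory."""
    sizes = {fp: fp.stat().st_size for fp in Path(directory).glob(pattern)}
    if not sizes:
        return []
    # the modal size stands in for the "expected" size of a complete file (assumption)
    expected_size, _ = Counter(sizes.values()).most_common(1)[0]
    return sorted(fp for fp, size in sizes.items() if size != expected_size)
```

Note that a check like this cannot detect files that are missing entirely; those would need a comparison against the source listing.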

Then, re-copy any files flagged by that check:

In [None]:
main.recopy_raw_scratch_files(flag_fps, wrf_dir)
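
As an illustration of what such a re-copy involves, the sketch below maps each flagged scratch path back to its source using the year directory name and the file name, then copies it again. This is a guess at the general approach, not the code of `recopy_raw_scratch_files`; `recopy_flagged` is a hypothetical name, and the `<src_root>/<year>/<name>` layout is assumed from the copy loop in this notebook.

```python
import shutil
from pathlib import Path


def recopy_flagged(flagged_fps, src_root):
    """Re-copy flagged scratch files from <src_root>/<year>/<name> (assumed layout)."""
    recopied = []
    for dst in map(Path, flagged_fps):
        # the year is taken from the name of the file's parent directory
        src = Path(src_root).joinpath(dst.parent.name, dst.name)
        if not src.exists():
            # nothing to copy from; leave this file for manual follow-up
            continue
        shutil.copyfile(src, dst)
        recopied.append(dst)
    return recopied
```

Comparing the returned list with the input shows which flagged files could not be recovered from the source.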

**Note** - if a large number of files are flagged, it might be more efficient to rerun the batch copy in section 2 above.