# Quality control for regridding efforts

Use this notebook to check the quality of the regridded data.

In [38]:
import re
from multiprocessing import Pool
import numpy as np
import xarray as xr
from config import *
from regrid import rename_file

### Check for completeness of regridded files

Get a list of filepaths for all regridded files:

In [3]:
regrid_fps = list(regrid_dir.glob("*/*/*/*/*.nc"))

Check that all files to be regridded (which are those listed in the batch regrid files) are found in the regrid directory on scratch space.

First, get all of the source filenames from the batch files:

In [22]:
src_fns = []
for fp in regrid_batch_dir.glob("*.txt"):
    with open(fp) as f:
        src_fns.extend([line.split("/")[-1].replace("\n", "") for line in f.readlines()])

Since we renamed the files by replacing the grid type component of the original filename with "regrid", the two sets of filenames must be standardized before they can be compared. Do this by dropping "regrid" from the regridded filenames and dropping the grid type component from the source filenames:

In [23]:
rep = {"_gr_": "_", "_gr1_": "_", "_gn_": "_"}

src_fns = set([rename_file(fn, rep) for fn in src_fns])
regrid_fns = set([fp.name.replace("_regrid_", "_") for fp in regrid_fps])
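
To illustrate the normalization, here is a minimal sketch using made-up CMIP6-style filenames. `strip_grid_type` is a hypothetical stand-in for `rename_file` (whose implementation lives in `regrid.py` and is not shown in this notebook); it assumes that each key of `rep` is simply replaced by its value.

```python
def strip_grid_type(fn, rep):
    """Hypothetical stand-in for rename_file: apply each substring replacement in rep."""
    for old, new in rep.items():
        fn = fn.replace(old, new)
    return fn


rep = {"_gr_": "_", "_gr1_": "_", "_gn_": "_"}

# made-up filenames, for illustration only
src_example = "tas_day_CESM2_ssp585_r1i1p1f1_gn_20150101-20641231.nc"
regrid_example = "tas_day_CESM2_ssp585_r1i1p1f1_regrid_20150101-20641231.nc"

# both names should reduce to the same key once the grid component is dropped
src_key = strip_grid_type(src_example, rep)
regrid_key = regrid_example.replace("_regrid_", "_")
```

With both names reduced to the same key, set operations can be used to compare the source and regridded collections directly.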

Now, the source files which are not found in the regridding output directory can be isolated, and the number of them should be equal to the difference in number of files between source and completed files:

In [26]:
missing_fns = list(src_fns - regrid_fns)
len(missing_fns) == (len(src_fns) - len(regrid_fns)) == 0

True
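
Note that the count comparison is only informative when every regridded filename also appears among the source filenames. The toy example below (made-up filenames) sketches a case where the counts match by coincidence, and shows that inspecting the set difference in both directions catches it.

```python
src = {"a.nc", "b.nc", "c.nc"}
done = {"a.nc", "b.nc", "x.nc"}  # "x.nc" has no source counterpart

missing = src - done      # source files that were not regridded
unexpected = done - src   # regridded files with no matching source file

# the counts of the two sets are equal even though one file is missing
same_count = len(src) == len(done)
# the count-based check from above flags the inconsistency here
count_check = len(missing) == (len(src) - len(done))
```

Asserting that both `missing` and `unexpected` are empty may therefore be a more direct test than comparing lengths.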

Sometimes the processing code would create files and then fail before writing them completely. Check for files smaller than 1 MB:

In [27]:
%%time
from multiprocessing import Pool


def is_smol_file(fp):
    """Return fp if it is smaller than 1 MB (too small for a regridded CMIP6 file), else None."""
    # 1 MB is taken as 1e6 bytes
    if fp.stat().st_size < 1e6:
        return fp
    return None
    
with Pool(20) as pool:
    smol_fps = pool.map(is_smol_file, regrid_fps)
    
smol_fps = [fp for fp in smol_fps if fp is not None]

assert len(smol_fps) == 0

CPU times: user 87.2 ms, sys: 123 ms, total: 210 ms
Wall time: 35.8 s
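
For a quick spot check on a smaller set of files, the same test can be run serially. This is a minimal sketch: `find_small_files` and `ONE_MB` are names introduced here for illustration, and the temporary files only stand in for real regridded output.

```python
import tempfile
from pathlib import Path

ONE_MB = 1_000_000  # bytes; same threshold as used above


def find_small_files(fps, min_bytes=ONE_MB):
    """Return the paths whose size on disk is below min_bytes."""
    return [fp for fp in fps if fp.stat().st_size < min_bytes]


with tempfile.TemporaryDirectory() as tmp:
    small = Path(tmp) / "truncated.nc"
    big = Path(tmp) / "complete.nc"
    small.write_bytes(b"\0" * 10)            # mimics a partially written file
    big.write_bytes(b"\0" * (ONE_MB + 1))    # just over the threshold
    flagged = [fp.name for fp in find_small_files([small, big])]
```

On a large collection stored on a shared filesystem, the pooled version above is likely the better choice, since each `stat` call can be slow.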


### Verify regridding

Verify that regridded files all have the target grid.

Load the target grid:

In [28]:
dst_ds = xr.open_dataset(target_grid_fp)

Define a function to check that the lat and lon arrays of the target grid match those of a given regridded filepath:

In [56]:
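def verify_latlon(regrid_fp, target_lat_arr, target_lon_arr):
    """Assert that the lat/lon coordinates of a regridded file match the target grid."""
    # open the regridded file; the context manager closes it once the checks are done
    with xr.open_dataset(regrid_fp) as regrid_ds:
        assert np.all(regrid_ds["lat"].values == target_lat_arr)
        assert np.all(regrid_ds["lon"].values == target_lon_arr)

    return regrid_fp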

Run the check for all regridded files:

In [54]:
%%time
target_lat_arr = dst_ds["lat"].values
target_lon_arr = dst_ds["lon"].values

args = [(fp, target_lat_arr, target_lon_arr) for fp in regrid_fps]

with Pool(20) as pool:
    _ = pool.starmap(verify_latlon, args)

CPU times: user 467 ms, sys: 163 ms, total: 630 ms
Wall time: 849 ms
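
The comparison itself can be sketched on synthetic arrays. The helper below, `coords_match`, is a hypothetical name introduced for illustration; it checks shapes explicitly before comparing values, and optionally allows an absolute tolerance in case coordinates differ only by floating-point noise. Whether a tolerance is appropriate for these files is an assumption that would need to be confirmed against the regridding output.

```python
import numpy as np


def coords_match(actual, target, atol=0.0):
    """Return True if two coordinate arrays have the same shape and values.

    With atol=0.0 the comparison is exact; otherwise values may differ by up to atol.
    """
    actual = np.asarray(actual)
    target = np.asarray(target)
    if actual.shape != target.shape:
        return False
    if atol == 0.0:
        return bool(np.array_equal(actual, target))
    return bool(np.allclose(actual, target, rtol=0.0, atol=atol))


# synthetic 1-degree latitude axis, for illustration only
target_lat = np.linspace(-90, 90, 181)

same = coords_match(target_lat.copy(), target_lat)
shorter = coords_match(target_lat[:-1], target_lat)
noisy = coords_match(target_lat + 1e-9, target_lat)
noisy_tol = coords_match(target_lat + 1e-9, target_lat, atol=1e-6)
```

Here only `same` and `noisy_tol` evaluate to `True`: the truncated axis fails the shape check, and the perturbed axis fails the exact comparison.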
