# Production comparison: NCAR CCSM4 Historical

This notebook evaluates the results of comparing the newly restacked files with the existing production files.

It is meant to serve as a historical record. It will stop working as the files it references are moved.

Set up the environment:

In [1]:
```python
import importlib.util
import os
import sys
from pathlib import Path

import numpy as np
import pandas as pd
import xarray as xr


# Load the config and luts modules from the project directory by absolute path.
# os.environ raises a KeyError naming PROJECT_DIR if it is unset, rather than
# the less obvious TypeError that Path(None) would raise.
project_dir = Path(os.environ["PROJECT_DIR"])


def load_module(path):
    """Import a Python source file directly from its path.

    See https://docs.python.org/3/library/importlib.html#importing-a-source-file-directly
    """
    module_name = path.stem  # e.g. "luts" for luts.py
    spec = importlib.util.spec_from_file_location(module_name, path)
    module_obj = importlib.util.module_from_spec(spec)
    sys.modules[module_name] = module_obj
    spec.loader.exec_module(module_obj)

    return module_obj


luts = load_module(project_dir.joinpath("restack_20km/luts.py"))
config = load_module(project_dir.joinpath("restack_20km/config.py"))
```

## Hourly data

Load the results from comparing hourly data:

In [2]:
```python
hourly_fp = config.anc_dir.joinpath(
    "production_data_comparisons",
    f"prod_comparison_{luts.groups[config.group]['fn_str']}_hourly.csv"
)
hourly_df = pd.read_csv(hourly_fp)
```
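The `query` calls below compare columns such as `time_result` and `arr_result` against `False`, which only behaves as intended if those columns are loaded as booleans. `pd.read_csv` converts the literal strings `True` and `False` to a `bool` dtype automatically. A minimal sketch with a made-up two-row CSV (the rows are hypothetical; only the column names come from this notebook):

```python
import io

import pandas as pd

# Hypothetical comparison results; only the column names mirror the real CSV
csv_text = """varname,time_result,arr_result,prod_exists
u10,False,True,True
t2,True,True,True
"""

df = pd.read_csv(io.StringIO(csv_text))

# read_csv recognizes True/False and stores these columns as bool
print(df.dtypes["time_result"])  # bool

# So query() against False selects the failed timestamp comparisons
print(df.query("time_result == False")["varname"].tolist())  # ['u10']
```

If a column contains blank entries, pandas cannot store it as a pure `bool` column, so a quick look at `df.dtypes` after loading is a cheap safeguard.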

### Timestamp mismatches

Look at the files where the timestamp comparison failed.

First, check that all of these mismatches involve the newly rotated wind data created in 2021. If they are all wind variables, the time mismatches can be ignored: the timestamps of those wind data were labeled incorrectly (they should not include leap days).
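To illustrate why leap days break a timestamp comparison, this sketch builds one leap year of hourly timestamps on the standard calendar and then drops 29 February, mimicking a 365-day ("noleap") calendar. The year 1972 is chosen only because it is a leap year in the historical period; this is an illustration, not the comparison script that produced `time_result`.

```python
import pandas as pd

# One leap year of hourly timestamps on the standard (Gregorian) calendar
standard = pd.date_range(
    "1972-01-01 00:00", "1972-12-31 23:00", freq=pd.Timedelta(hours=1)
)

# Drop 29 February to mimic a 365-day ("noleap") calendar
is_leap_day = (standard.month == 2) & (standard.day == 29)
noleap = standard[~is_leap_day]

print(len(standard), len(noleap))  # 8784 8760

# The two time axes differ by 24 hourly steps, so a strict equality check fails
print(standard.equals(noleap))  # False
```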

In [13]:
```python
wind_varnames = ["u", "u10", "ubot", "v", "v10", "vbot"]
time_mismatch_vars = np.unique(hourly_df.query("time_result == False")["varname"])
assert np.all([varname in wind_varnames for varname in time_mismatch_vars])
```
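The same check can be phrased as a set difference, which also names any unexpected variables with timestamp mismatches instead of only failing. A minimal sketch; `unexpected_time_mismatches` is a hypothetical helper, and the example rows are made up:

```python
import pandas as pd

WIND_VARNAMES = {"u", "u10", "ubot", "v", "v10", "vbot"}


def unexpected_time_mismatches(df, allowed):
    """Return variables with failed time comparisons that are not in `allowed`."""
    mismatched = set(df.loc[~df["time_result"], "varname"])
    return mismatched - set(allowed)


# Hypothetical results: only wind variables fail the timestamp comparison
example = pd.DataFrame(
    {
        "varname": ["u10", "v10", "t2"],
        "time_result": [False, False, True],
    }
)

print(unexpected_time_mismatches(example, WIND_VARNAMES))  # set()
```

An empty set corresponds to the assertion above passing; a non-empty set lists the variables that need a closer look.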

Assert that all of the data comparisons passed, i.e. that `arr_result` is `True` for every file:

In [16]:
```python
assert np.all(hourly_df["arr_result"])
```

So the NCAR CCSM4 historical hourly data pass the comparison with the production data, and these files are safe to copy to `base_dir` and remove from scratch space.
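This notebook only reads the `arr_result` flags; the script that produced them is not shown here. As a hypothetical sketch of how such a flag could be computed for one file, assuming both versions hold float arrays with missing cells stored as `NaN`, `np.allclose` with `equal_nan=True` treats `NaN`s in the same position as equal and tolerates tiny floating-point differences:

```python
import numpy as np


def arrays_match(new, prod, rtol=1e-5, atol=1e-8):
    """Hypothetical per-file check: same shape and values equal within tolerance."""
    new = np.asarray(new)
    prod = np.asarray(prod)
    if new.shape != prod.shape:
        return False
    # equal_nan=True counts NaN in the same position in both arrays as a match
    return bool(np.allclose(new, prod, rtol=rtol, atol=atol, equal_nan=True))


prod = np.array([[1.0, np.nan], [2.0, 3.0]])
print(arrays_match(prod + 1e-9, prod))  # True
print(arrays_match(prod + 0.1, prod))  # False
```

The default tolerances match NumPy's own `allclose` defaults; an exact comparison (`np.array_equal(..., equal_nan=True)`) would be the stricter alternative.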

## Daily data

Load the results from comparing daily data:

In [17]:
```python
daily_fp = config.anc_dir.joinpath(
    "production_data_comparisons",
    f"prod_comparison_{luts.groups[config.group]['fn_str']}_daily.csv"
)
daily_df = pd.read_csv(daily_fp)
```

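All of the datetime mismatches appear to be for the `pcpc` variable. They occur because `pcpc` was not produced for some models and is not hosted on AWS for any model, so these files have no production counterpart. We ignore them, after confirming that none of the files with a time mismatch has a production version: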

In [18]:
```python
assert np.all(~daily_df.query("time_result == False")["prod_exists"])
```

Next, make sure there were no array mismatches in the daily data by asserting that no file for a variable other than `pcpc` failed the data comparison:

In [10]:
```python
assert len(daily_df.query("arr_result == False & varname != 'pcpc'")) == 0
```

As a double check, the files that failed the data comparison should all be files for which there is no production version:

In [19]:
```python
assert np.all(~daily_df.query("arr_result == False")["prod_exists"])
```
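The three daily checks above can also be collected into one function that returns the offending rows for each check, rather than stopping at the first failed assertion. A sketch assuming the `varname`, `time_result`, `arr_result`, and `prod_exists` columns used above; `daily_check_failures` is a hypothetical helper and the example rows are made up:

```python
import pandas as pd


def daily_check_failures(df, ignored_var="pcpc"):
    """Return a dict mapping each check name to the rows that violate it."""
    return {
        # time mismatches should only occur where no production file exists
        "time_mismatch_with_prod": df.query(
            "(time_result == False) & (prod_exists == True)"
        ),
        # array mismatches should only occur for the ignored variable
        "arr_mismatch_other_var": df.query(
            "(arr_result == False) & (varname != @ignored_var)"
        ),
        # array mismatches should only occur where no production file exists
        "arr_mismatch_with_prod": df.query(
            "(arr_result == False) & (prod_exists == True)"
        ),
    }


# Hypothetical daily results: pcpc has no production file, t2 matches
example = pd.DataFrame(
    {
        "varname": ["pcpc", "t2"],
        "time_result": [False, True],
        "arr_result": [False, True],
        "prod_exists": [False, True],
    }
)

failures = daily_check_failures(example)
print({name: len(rows) for name, rows in failures.items()})
```

All three counts being zero corresponds to the assertions above passing; any non-empty entry shows exactly which files broke that check.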

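And that's it: the daily and hourly NCAR CCSM4 historical data pass the comparison with the production data, apart from the known leap-day timestamp labels on the wind variables and the `pcpc` files, which have no production counterpart.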

In [23]:
```
ls $SCRATCH_DIR/restacked
```

daily/  hourly/

In [34]:
```
ls -lh /import/SNAP/wrf_data/project_data/wrf_data/restacked/daily/pcpc
```


total 371M
-rw------- 1 kmredilla dyndown 9.5M Oct 19 15:26 pcpc_daily_wrf_NCAR-CCSM4_historical_1970.nc
-rw------- 1 kmredilla dyndown 9.0M Oct 19 15:26 pcpc_daily_wrf_NCAR-CCSM4_historical_1971.nc
-rw------- 1 kmredilla dyndown 9.6M Oct 19 15:26 pcpc_daily_wrf_NCAR-CCSM4_historical_1972.nc
-rw------- 1 kmredilla dyndown 9.4M Oct 19 15:26 pcpc_daily_wrf_NCAR-CCSM4_historical_1973.nc
-rw------- 1 kmredilla dyndown  12M Oct 19 15:26 pcpc_daily_wrf_NCAR-CCSM4_historical_1974.nc
-rw------- 1 kmredilla dyndown 9.1M Oct 19 15:26 pcpc_daily_wrf_NCAR-CCSM4_historical_1975.nc
-rw------- 1 kmredilla dyndown  10M Oct 19 15:26 pcpc_daily_wrf_NCAR-CCSM4_historical_1976.nc
-rw------- 1 kmredilla dyndown  12M Oct 19 15:26 pcpc_daily_wrf_NCAR-CCSM4_historical_1977.nc
-rw------- 1 kmredilla dyndown  12M Oct 19 15:26 pcpc_daily_wrf_NCAR-CCSM4_historical_1978.nc
-rw------- 1 kmredilla dyndown 9.7M Oct 19 15:26 pcpc_daily_wrf_NCAR-CCSM4_historical_1979.nc
-rw------- 1 kmredilla dyndown  11M Oct 19 15:26 