The purpose of this notebook is to explore a mismatch between the 4-day precipitation sums ("durations") that I calculate myself and those calculated by the pipeline. I expected that hourly WRF precipitation data queried from the GFDL-CM3 historical run (year 1970) and summed over 4-day periods would equal the data produced by the "durations" step of the pipeline, saved in `/workspace/Shared/Tech_Projects/DOT/project_data/durations/pcpt_4d_sum_wrf_GFDL-CM3_historical.nc`. This notebook exposes that mismatch.
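Written out, the expectation is the following. The notation is introduced here for illustration only, and it assumes that the hourly series $p_t$ at the point of interest is gap-free and that both binnings start at the same time step $t = 0$, so each 4-day bin $k$ covers 96 consecutive hours:

$$S_k^{\text{raw}} = \sum_{h=0}^{95} p_{96k+h}, \qquad S_k^{\text{raw}} \overset{?}{=} S_k^{\text{dur}} \quad \text{for every bin } k,$$

where $S_k^{\text{dur}}$ is the value stored in the durations file for bin $k$.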

In [1]:
import os, time
import numpy as np
import pandas
import xarray as xr
from pyproj import Transformer

# directories
data_dir = "/workspace/Shared/Tech_Projects/DOT/project_data"
wrf_dir = os.path.join(data_dir, "wrf_pcpt")
wrf_fp = os.path.join(wrf_dir, "pcpt", "pcpt_hourly_wrf_GFDL-CM3_{}_{}.nc")

# Point of Interest - WGS84 coordinates from example of invalid bounds
wgs84_coords = (-147.96, 64.66)
# WRF CRS
wrf_crs = '+units=m +proj=stere +lat_ts=64.0 +lon_0=-152.0 +lat_0=90.0 +x_0=0 +y_0=0 +a=6370000 +b=6370000'
transformer = Transformer.from_crs("EPSG:4326", wrf_crs, always_xy=True)
# WGS84 coordinates transformed to WRF CRS
wrf_coords = transformer.transform(*wgs84_coords)

# open durations file, query at POI
dur_ds = xr.open_dataset(os.path.join(wrf_dir, "durations", "pcpt_4d_sum_wrf_GFDL-CM3_historical.nc"))
dur_sel_da = dur_ds.sel(xc=wrf_coords[0], yc=wrf_coords[1], method="nearest")
pcpt_sum_dur = dur_sel_da.pcpt.values

# open raw WRF file, bin by 4d period and sum
raw_ds = xr.open_dataset(wrf_fp.format("historical", "1970"))
raw_sel = raw_ds.sel(xc=wrf_coords[0], yc=wrf_coords[1], method="nearest")
pcpt_sum_raw_da = raw_sel.resample(time="4D").sum()
pcpt_sum_raw = pcpt_sum_raw_da.pcpt.values
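As a rough sketch of what the `resample(time="4D").sum()` step does to an hourly series, the hypothetical helper `four_day_sums` below sums consecutive 96-hour blocks with plain NumPy. It assumes gap-free hourly data, and its `offset_hours` parameter (also hypothetical) shifts where the bin edges fall. The point it illustrates is that the same hourly values can produce different per-bin sums when the bin edges move.

```python
import numpy as np

HOURS_PER_BIN = 4 * 24  # 96 hourly time steps per 4-day bin


def four_day_sums(hourly, offset_hours=0):
    """Sum a gap-free hourly series into consecutive 96-hour bins.

    Hypothetical helper for illustration only. The first `offset_hours`
    values are dropped so the bin edges can be shifted; a trailing partial
    bin is summed over whatever hours remain.
    """
    values = np.asarray(hourly, dtype=float)[offset_hours:]
    n_bins = -(-values.size // HOURS_PER_BIN)  # ceiling division
    padded = np.zeros(n_bins * HOURS_PER_BIN)  # zero-pad the partial last bin
    padded[: values.size] = values
    return padded.reshape(n_bins, HOURS_PER_BIN).sum(axis=1)


# Synthetic example: 12 days of hourly data containing a single 10-hour
# event (value 1.0 per hour) that straddles the hour-96 bin boundary.
hourly = np.zeros(12 * 24)
hourly[90:100] = 1.0

print(four_day_sums(hourly).tolist())                   # → [6.0, 4.0, 0.0]
print(four_day_sums(hourly, offset_hours=24).tolist())  # → [10.0, 0.0, 0.0]
```

Both calls total 10.0; only the split across bins changes with where the edges fall.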



Both the time periods and the selected grid-cell coordinates match:

#### Durations data (pipeline output):

In [2]:
print("time sample:", dur_sel_da.time.values[:5])
print("xc, yc:", dur_sel_da.xc.values, ",", dur_sel_da.yc.values)

time sample: ['1970-01-02T00:00:00.000000000' '1970-01-06T00:00:00.000000000'
 '1970-01-10T00:00:00.000000000' '1970-01-14T00:00:00.000000000'
 '1970-01-18T00:00:00.000000000']
xc, yc: 190000.0 , -2702425.477371664


#### Calculated data (from raw WRF):

In [3]:
print("time sample:", pcpt_sum_raw_da.time.values[:5])
print("xc, yc:", pcpt_sum_raw_da.xc.values, ",", pcpt_sum_raw_da.yc.values)

time sample: ['1970-01-02T00:00:00.000000000' '1970-01-06T00:00:00.000000000'
 '1970-01-10T00:00:00.000000000' '1970-01-14T00:00:00.000000000'
 '1970-01-18T00:00:00.000000000']
xc, yc: 190000.0 , -2702425.477371664
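Both selections land on the same grid cell because `.sel(..., method="nearest")` picks the existing coordinate label closest to the requested value instead of interpolating. A minimal 1-D sketch of that lookup, using a hypothetical helper `nearest_index` and a made-up 20 km grid (not the real WRF grid):

```python
import numpy as np


def nearest_index(coord_values, target):
    """Return the index of the coordinate value closest to `target`.

    Hypothetical 1-D analogue of `.sel(..., method="nearest")` along a
    single dimension: it chooses an existing label rather than interpolating.
    """
    coord_values = np.asarray(coord_values, dtype=float)
    return int(np.argmin(np.abs(coord_values - target)))


# Hypothetical grid with 20 km spacing, for illustration only
# (not the actual WRF xc coordinate).
xc = np.arange(10_000.0, 410_000.0, 20_000.0)

i = nearest_index(xc, 193_000.0)
print(i, xc[i])  # → 9 190000.0
```

Applied to `xc` and `yc` separately, the same lookup on the same grid returns the same cell, so as long as both files share one grid the printed `xc, yc` pairs should agree.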


However, the precipitation sums themselves do not match:

In [4]:
print("Sample of precip sums from pipeline:", pcpt_sum_dur[:5])
print("Sample of precip sums calculated from raw WRF:", pcpt_sum_raw[:5])

Sample of precip sums from pipeline: [ 3.782     12.1015     5.26      17.661499   0.9404998]
Sample of precip sums calculated from raw WRF: [ 3.278      7.8299994 10.0355     6.111     11.969    ]
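One cheap follow-up check uses only the five printed values. If the pipeline and the raw-data sums were built from the same hourly values but with different bin edges, the per-bin differences would be nonzero while the running totals could still agree at a boundary that both binnings share. The sketch below computes both; it is a diagnostic idea, not an explanation of the mismatch.

```python
import numpy as np

# Sample values copied from the printed output above (first five 4-day bins).
pipeline = np.array([3.782, 12.1015, 5.26, 17.661499, 0.9404998])
raw = np.array([3.278, 7.8299994, 10.0355, 6.111, 11.969])

# Per-bin differences: nonzero wherever the two series disagree.
diff = pipeline - raw

# Running totals: these can agree at a shared bin boundary even when
# individual bins do not, e.g. if the bin edges are offset.
cum_pipeline = np.cumsum(pipeline)
cum_raw = np.cumsum(raw)
agree = np.isclose(cum_pipeline, cum_raw, atol=1e-3)

print(np.round(diff, 4).tolist())
print(agree.tolist())  # → [False, False, True, False, False]
```

On these five samples the running totals coincide only after the third bin. Five values are too few to draw a conclusion, so the same comparison would need to be repeated on all overlapping bins of `pcpt_sum_dur` and `pcpt_sum_raw`.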
