# Invalid confidence interval diagnosis

The purpose of this notebook is to aid in diagnosing the cause of invalid bounds on the confidence intervals for some precipitation estimates. 

**The issue**: lower bound $\geq$ upper bound

## example

An example of this issue can be seen in the bounds for the **4d duration**, **1000yr interval** for **2020-2049 GFDL-CM3** output at latitude **67.61** and longitude **-156.75**, or X-coordinate of -117682.86362522288 and Y-coordinate of 1963676.5978819495 in EPSG:3338. The **estimate is 4.74**, with a **lower bound of 7.81** and an **upper bound of 2.68**.
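The check below is a minimal sketch of a validity test for one set of bounds, using the example values above. The name `classify_bounds` is hypothetical and not part of the existing workflow. Besides the inverted case described here, it also flags bounds that are ordered but do not contain the estimate.

```python
def classify_bounds(lower, pf, upper):
    """Classify one (lower, estimate, upper) triple.

    Hypothetical helper for illustration: returns "inverted" when the lower
    bound is at or above the upper bound (the issue described above),
    "not bracketing" when the bounds are ordered but do not contain the
    estimate, and "valid" otherwise.
    """
    if lower >= upper:
        return "inverted"
    if not (lower <= pf <= upper):
        return "not bracketing"
    return "valid"


# example: 4d duration, 1000yr interval, 2020-2049 GFDL-CM3
print(classify_bounds(lower=7.81, pf=4.74, upper=2.68))  # inverted
```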

## where: deltas or warping?

Does this problem appear to be present before or after the warping step? 

Since the deltas for the confidence bounds are stored as ratios of future bounds to historical bounds, it is not necessarily incorrect for an upper bound delta to be less than a lower bound delta. The issue only appears after the deltas have been applied (multiplied) to the baseline values.
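To make this concrete, write $L$ and $U$ for the baseline lower and upper bounds and $\delta_L$, $\delta_U$ for their deltas (notation introduced here for illustration). For positive values, the combined bounds are inverted exactly when the ratio of the deltas reaches the ratio of the baseline bounds:

$$
L\,\delta_L \geq U\,\delta_U \iff \frac{\delta_L}{\delta_U} \geq \frac{U}{L}, \qquad L,\ U,\ \delta_L,\ \delta_U > 0
$$

So a lower bound delta larger than the upper bound delta only causes inverted bounds when the baseline interval is too narrow, i.e. when $U/L$ does not exceed $\delta_L/\delta_U$.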

### testing
1. A preliminary step is to check that the deltas make sense, using the data underlying the delta calculation: the historical and future WRF precipitation estimates from the "intervals" step.

```python
# setup
import os

import numpy as np
import rasterio as rio
import xarray as xr
from pyproj import Transformer

# directories
data_dir = "/workspace/Shared/Tech_Projects/DOT/project_data"
wrf_dir = os.path.join(data_dir, "wrf_pcpt")

# WGS84 (lon, lat) coordinates from the example of invalid bounds
wgs84_coords = (-156.75, 67.61)
# WRF CRS (polar stereographic, spherical earth)
wrf_crs = "+units=m +proj=stere +lat_ts=64.0 +lon_0=-152.0 +lat_0=90.0 +x_0=0 +y_0=0 +a=6370000 +b=6370000"
# always_xy=True keeps (lon, lat) -> (x, y) axis order
transformer = Transformer.from_crs("EPSG:4326", wrf_crs, always_xy=True)
wrf_coords = transformer.transform(*wgs84_coords)
```

```python
# open deltas, historical, and future
deltas_ds = xr.open_dataset(os.path.join(wrf_dir, "deltas", "pcpt_GFDL-CM3_sum_wrf_4d_2020-2049_deltas.nc"))
hist_ds = xr.open_dataset(os.path.join(wrf_dir, "output_interval_durations", "pcpt_GFDL-CM3_historical_sum_wrf_4d_1979-2015_intervals.nc"))
future_ds = xr.open_dataset(os.path.join(wrf_dir, "output_interval_durations", "pcpt_GFDL-CM3_rcp85_sum_wrf_4d_2020-2049_intervals.nc"))
```

```python
# query datasets at point
deltas_sel = deltas_ds.sel(xc=wrf_coords[0], yc=wrf_coords[1], method="nearest")
hist_sel = hist_ds.sel(xc=wrf_coords[0], yc=wrf_coords[1], method="nearest")
future_sel = future_ds.sel(xc=wrf_coords[0], yc=wrf_coords[1], method="nearest")
```

Check that the product of the historical values and the deltas equals the future values, to a precision of 5 decimal places.

```python
# compare all values rounded to five decimal places
def arrs_equal(a1, a2):
    return np.all(np.round(a1, 5) == np.round(a2, 5))

# test multiplication against future
du, dpf, dl = deltas_sel["pf-upper"].values, deltas_sel["pf"].values, deltas_sel["pf-lower"].values
hu, hpf, hl = hist_sel["pf-upper"].values, hist_sel["pf"].values, hist_sel["pf-lower"].values
fu, fpf, fl = future_sel["pf-upper"].values, future_sel["pf"].values, future_sel["pf-lower"].values
upper_equal = arrs_equal(du * hu, fu)
pf_equal = arrs_equal(dpf * hpf, fpf)
lower_equal = arrs_equal(dl * hl, fl)

print("all upper bounds equal:", upper_equal)
print("all pf estimates equal:", pf_equal)
print("all lower bounds equal:", lower_equal)
```

```
all upper bounds equal: True
all pf estimates equal: True
all lower bounds equal: True
```


As expected, the deltas for the example problem appear valid, in the sense that multiplying them by the historical values recovers the future values.
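Rounding both arrays before comparing can report a mismatch when two nearly identical values fall on opposite sides of a rounding boundary (e.g. 0.123454999 and 0.123455001). A tolerance-based comparison avoids this. Below is a minimal stdlib sketch; the name `all_close` and the absolute tolerance of 1e-5 are assumptions chosen to mirror the five-decimal check. With NumPy arrays, `np.allclose(a1, a2, rtol=0, atol=1e-5)` performs the same test.

```python
import math


def all_close(a1, a2, abs_tol=1e-5):
    """Return True if every pair of values differs by at most abs_tol.

    Hypothetical helper: a tolerance-based alternative to comparing
    values rounded to five decimal places.
    """
    a1, a2 = list(a1), list(a2)
    if len(a1) != len(a2):
        return False
    return all(math.isclose(x, y, rel_tol=0.0, abs_tol=abs_tol) for x, y in zip(a1, a2))


# two values that straddle a rounding boundary
x, y = 0.123454999, 0.123455001
print(round(x, 5) == round(y, 5))  # False
print(all_close([x], [y]))  # True
```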

Inspection of the historical and future estimates partly illuminates the underlying issue:

```python
# indexing -1 for 1000yr RI
i = -1
print("Historical upper bound:", hu[i], "| Future upper bound:", fu[i])
print("Historical estimate:", hpf[i], "| Future estimate:", fpf[i])
print("Historical lower bound:", hl[i], "| Future lower bound:", fl[i])
```

```
Historical upper bound: 32.664143 | Future upper bound: 11.907405
Historical estimate: 9.411026 | Future estimate: 5.703586
Historical lower bound: 3.1065607 | Future lower bound: 3.2861311
```


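The uncertainty decreases drastically from the historical period to the future period: the interval width (upper minus lower bound) shrinks from about 29.56 to about 8.62. The bounds also move in different directions: the pf estimate and the upper bound decrease, while the lower bound increases slightly. This is evidence that changes in the confidence bounds do not necessarily occur in the same direction as changes in the pf estimate, which may be useful for informing a different method of quantifying uncertainty with the delta method.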
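As an arithmetic check, the printed historical and future values reproduce the deltas for this cell and quantify how much the interval narrows. A minimal sketch using the values exactly as printed (so results match only to the displayed precision):

```python
# historical and future WRF values as printed (4d duration, 1000yr RI)
hist = {"lower": 3.1065607, "pf": 9.411026, "upper": 32.664143}
future = {"lower": 3.2861311, "pf": 5.703586, "upper": 11.907405}

# deltas are ratios of future to historical values
deltas = {k: future[k] / hist[k] for k in hist}
for k, d in deltas.items():
    print(f"{k} delta: {d:.3f}")  # lower 1.058, pf 0.606, upper 0.365

# interval width (upper minus lower) in each period
hist_width = hist["upper"] - hist["lower"]
future_width = future["upper"] - future["lower"]
print(f"historical width: {hist_width:.2f}, future width: {future_width:.2f}")  # 29.56, 8.62
```

The pf delta is below 1 while the lower bound delta is above 1, which is the opposite-direction change described above.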

2. Next, applying these deltas to the values of the corresponding Atlas 14 grid cell will show whether the issue is already present before the warping step.

```python
# open rasters
a14_upper = rio.open(os.path.join(data_dir, "NOAA_Atlas14", "raw", "warped", "ak1000yr04dau_ams.tif"))
a14_pf = rio.open(os.path.join(data_dir, "NOAA_Atlas14", "raw", "warped", "ak1000yr04da_ams.tif"))
a14_lower = rio.open(os.path.join(data_dir, "NOAA_Atlas14", "raw", "warped", "ak1000yr04dal_ams.tif"))
```

```python
# get row/column of WGS84 example coordinates
# transform to 3338
transformer = Transformer.from_crs(4326, 3338, always_xy=True)
a14_coords = transformer.transform(*wgs84_coords)
a14_rc = a14_pf.index(*a14_coords)
```

```python
from rasterio.windows import Window

# get combined values
# window covering only the example cell: Window(col_off, row_off, width, height)
window = Window(a14_rc[1], a14_rc[0], 1, 1)
# read a14 values; divide by 1000 (grids presumably stored in thousandths of an inch)
ub = a14_upper.read(1, window=window) / 1000
pf = a14_pf.read(1, window=window) / 1000
lb = a14_lower.read(1, window=window) / 1000
# combine with the 1000yr deltas
ub_c = np.round(ub * du[i], 3)
pf_c = np.round(pf * dpf[i], 3)
lb_c = np.round(lb * dl[i], 3)
```

```python
print("Atlas 14 lower bound, pf estimate, and upper bound:", lb, pf, ub)
print("Corresponding WRF-derived deltas: ", round(dl[i], 3), round(dpf[i], 3), round(du[i], 3))
print("Combined values:", lb_c, pf_c, ub_c)
```

```
Atlas 14 lower bound, pf estimate, and upper bound: [[7.239]] [[9.843]] [[13.542]]
Corresponding WRF-derived deltas:  1.058 0.606 0.365
Combined values: [[7.657]] [[5.965]] [[4.937]]
```


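This confirms that the problem is already present once the deltas are applied, before the warping step: the combined lower bound (7.657) exceeds the combined upper bound (4.937).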
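Multiplying positive bounds $L$ and $U$ by deltas $\delta_L$ and $\delta_U$ inverts them exactly when $\delta_L/\delta_U \geq U/L$ (notation introduced here for illustration). The sketch below applies that condition to the printed values: the wide WRF historical interval absorbs the delta ratio, but the narrower Atlas 14 interval does not. It uses the rounded deltas as printed, so it is an approximate check.

```python
# rounded deltas printed above (lower, upper)
d_lower, d_upper = 1.058, 0.365
delta_ratio = d_lower / d_upper

# baseline (lower, upper) bounds printed above
baselines = {
    "WRF historical": (3.1065607, 32.664143),
    "Atlas 14": (7.239, 13.542),
}

for name, (lower, upper) in baselines.items():
    # positive bounds become inverted exactly when delta_ratio >= upper / lower
    inverted = delta_ratio >= upper / lower
    print(f"{name}: U/L = {upper / lower:.2f}, delta ratio = {delta_ratio:.2f}, inverted = {inverted}")
# WRF historical: U/L = 10.51, delta ratio = 2.90, inverted = False
# Atlas 14: U/L = 1.87, delta ratio = 2.90, inverted = True
```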

## Possible solutions

### 1. compute deltas from pf estimate - confidence bound differences

One possible method is to store the deltas for the confidence bounds as ratios of the future to the historical differences between the pf estimate and each bound. Would that return valid estimates in this case?
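Written out with notation introduced here ($H$ and $F$ for historical and future WRF values, a hat for the pf estimate, subscripts $U$ and $L$ for the upper and lower bounds), the difference-based deltas are:

$$
\delta_U' = \frac{\hat{F} - F_U}{\hat{H} - H_U}, \qquad \delta_L' = \frac{\hat{F} - F_L}{\hat{H} - H_L}
$$

Both ratios are positive whenever the WRF bounds bracket the WRF estimate in both periods, since numerator and denominator then share the same sign.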

```python
# new deltas for upper and lower bounds: ratios of the future to historical
# differences between the pf estimate and each bound
hu_1, fu_1 = hpf[i] - hu[i], fpf[i] - fu[i]
hl_1, fl_1 = hpf[i] - hl[i], fpf[i] - fl[i]
du_1 = fu_1 / hu_1
dl_1 = fl_1 / hl_1
# multiply the Atlas 14 bounds by the new deltas
ub_1 = np.round(ub * du_1, 3)
lb_1 = np.round(lb * dl_1, 3)

print("Combined values, possible solution 1:", lb_1, pf_c, ub_1)
```

```
Combined values, possible solution 1: [[2.776]] [[5.965]] [[3.613]]
```


**FAILS**

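This method does not guarantee sensible confidence bounds: multiplying the Atlas 14 bounds by these difference-based deltas places both bounds below the pf estimate (2.776 and 3.613 vs. 5.965). The resulting confidence bounds are still at the mercy of drastic changes in uncertainty between the historical and future periods.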

### 2. to be continued...