# Profiling for dates associated with Nome storms

This notebook is for visualizing and analysing the results from profiling the analog forecast method on dates when Nome, AK was hit by a storm (dates provided by collaborators).


The dates provided will be referred to as the "dates of interest." We applied the skill profiling framework to the third and fifth days preceding the dates of interest.

In [1]:
import os
from pathlib import Path
import pandas as pd
import xarray as xr
import numpy as np
# local
import luts
from config import data_dir
import analog_forecast as af

Concatenate the results into single tables:

In [2]:
analog_df = pd.concat([pd.read_csv(fp) for fp in Path("results").glob("*.csv") if "naive" not in fp.name])
naive_df = pd.concat([pd.read_csv(fp) for fp in Path("results").glob("*naive.csv")])

## QC

First, some validation to make sure we have things straight. Let's take the first row and a row from the end of the table and manually check each step of the algorithm.

In [9]:
row = analog_df.iloc[0]

Inspect the row, then load the ERA5 data that we will use to search and generate forecasts:

In [10]:
row

variable                      t2m
spatial_domain             alaska
anomaly_search               True
reference_date         2004-10-08
forecast_day_number             1
forecast_error              4.851
Name: 0, dtype: object

In [11]:
varname = row["variable"]
ref_date = row["reference_date"]
ds = xr.load_dataset(data_dir.joinpath(luts.varnames_lu[varname]["anom_filename"]))

Subset to the spatial domain:

In [12]:
spatial_domain = row["spatial_domain"]
bbox = luts.spatial_domains[spatial_domain]["bbox"]
sub_da = ds[varname].sel(latitude=slice(bbox[3], bbox[1]), longitude=slice(bbox[0], bbox[2]))
print("Original shape:", ds[varname].shape)
print("Shape after spatial subset:", sub_da.shape)

Original shape: (23011, 361, 1440)
Shape after spatial subset: (23011, 129, 221)
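The order of the slice bounds above matters. The following is a minimal sketch, assuming the bounding box is ordered (west, south, east, north) and that latitude is stored from north to south, as the `slice(bbox[3], bbox[1])` call implies. The grid, the `bbox` values, and the variable names here are made up for illustration and are not taken from `luts`.

```python
import numpy as np
import xarray as xr

# Coarse synthetic grid with latitude descending (north to south)
lat = np.arange(90, -1, -10)   # 90, 80, ..., 0
lon = np.arange(0, 360, 30)    # 0, 30, ..., 330
da = xr.DataArray(
    np.zeros((3, lat.size, lon.size)),
    coords={"time": np.arange(3), "latitude": lat, "longitude": lon},
    dims=("time", "latitude", "longitude"),
)

# Hypothetical bounding box: (west, south, east, north)
bbox = (150, 40, 240, 70)

# Descending latitude: slice from north (bbox[3]) to south (bbox[1])
sub = da.sel(latitude=slice(bbox[3], bbox[1]), longitude=slice(bbox[0], bbox[2]))

# Reversing the latitude bounds selects nothing on a descending axis
empty = da.sel(latitude=slice(bbox[1], bbox[3]))

print("Subset shape:", sub.shape)
print("Reversed-bounds latitude size:", empty.sizes["latitude"])
```

A latitude dimension of size zero after subsetting is a quick sign that the bounds were passed in the wrong order for the dataset's axis direction.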


Compute RMSE for all time slices before and after the reference date and forecast window:

In [13]:
%%time
analogs = af.find_analogs(sub_da, ref_date)

CPU times: user 3.1 s, sys: 5.63 s, total: 8.72 s
Wall time: 8.73 s
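The internals of `af.find_analogs` are not shown in this notebook. As a rough sketch of the idea described above, assuming the search ranks every candidate day by its RMSE against the reference day and skips a window of days around the reference date, the logic could look like this on a plain NumPy array. The function `find_analogs_sketch`, its `n_analogs` and `buffer` parameters, and the synthetic data are all hypothetical.

```python
import numpy as np


def find_analogs_sketch(data, ref_idx, n_analogs=5, buffer=3):
    """Hypothetical stand-in for an analog search (not af.find_analogs).

    data: array of shape (time, lat, lon)
    ref_idx: index of the reference day along the time axis
    Returns the indices and RMSE scores of the n_analogs closest days.
    """
    # RMSE between the reference field and every time slice
    diff = data - data[ref_idx]
    rmse = np.sqrt((diff ** 2).mean(axis=(1, 2)))

    # Exclude the reference day and the days immediately around it
    lo = max(ref_idx - buffer, 0)
    hi = min(ref_idx + buffer + 1, data.shape[0])
    rmse[lo:hi] = np.inf

    # Keep the n_analogs smallest scores, best first
    order = np.argsort(rmse)[:n_analogs]
    return order, rmse[order]


rng = np.random.default_rng(0)
data = rng.normal(size=(200, 4, 5))
# Plant a near-copy of day 100 at day 20 so the best analog is known
data[20] = data[100] + 0.001

idx, scores = find_analogs_sketch(data, ref_idx=100)
print(idx, scores)
```

Planting a known near-duplicate, as done here, is one simple way to sanity check a search routine before running it on the full ERA5 arrays.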


Load the raw value version for generating and checking the forecast:

In [14]:
%%time
raw_ds = xr.load_dataset(data_dir.joinpath(luts.varnames_lu[varname]["filename"]))

CPU times: user 56.3 s, sys: 50.8 s, total: 1min 47s
Wall time: 1min 52s


Subset the raw data spatially and compute the forecast as the mean of the arrays for day t+1 for each of the analogs:

In [15]:
raw_sub_da = raw_ds[varname].sel(latitude=slice(bbox[3], bbox[1]), longitude=slice(bbox[0], bbox[2]))
forecast = (raw_sub_da.sel(time=analogs.time.values + pd.to_timedelta(1, "d")).values).mean(axis=0)

Compute the RMSE between forecast and the date after the reference date, and cross check with the results:

In [22]:
test_rmse = np.sqrt(
    ((raw_sub_da.sel(
        time=pd.to_datetime(ref_date + " 12:00:00") + pd.to_timedelta(1, "d")
    ) - forecast) ** 2).mean()
).round(3)

assert test_rmse == row["forecast_error"].astype(np.float32)
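Exact equality between rounded floating point values can be brittle, particularly when `float32` and `float64` values are mixed. The following is a minimal sketch on synthetic data of the same forecast and RMSE computation, compared with a tolerance instead. The analog indices, array sizes, and the `stored` value are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
data = rng.normal(size=(50, 3, 4)).astype(np.float32)  # (time, lat, lon)

analog_idx = np.array([5, 12, 30])  # hypothetical analog days
ref_idx = 40                        # hypothetical reference day
offset = 1                          # forecast day number

# Forecast: mean of the fields `offset` days after each analog
forecast = data[analog_idx + offset].mean(axis=0)

# Error: RMSE between the forecast and what actually followed the reference day
observed = data[ref_idx + offset]
rmse = float(np.sqrt(((observed - forecast) ** 2).mean()))

# Stand-in for a value read back from a results table rounded to 3 decimals
stored = round(rmse, 3)

# Compare within half a unit of the last stored decimal place
matches = bool(np.isclose(rmse, stored, atol=5e-4))
print(rmse, stored, matches)
```

With a tolerance of half the last stored decimal place, the check accepts any value that would round to the stored number, regardless of the dtype used to compute it.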

Make sure memory is freed up for loading different datasets:

In [71]:
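import gc

# Drop each large object independently so a missing name does not skip the rest
for name in ("ds", "raw_sub_da", "raw_ds"):
    globals().pop(name, None)
gc.collect()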

15

Do the same as above for another row with a different variable, spatial domain, and forecast day number:

In [4]:
%%time
# ensure it is a different variable
row = analog_df.query("variable == 'sst'").iloc[-1]
varname = row["variable"]
ref_date = row["reference_date"]
spatial_domain = row["spatial_domain"]
bbox = luts.spatial_domains[spatial_domain]["bbox"]
ds = xr.load_dataset(data_dir.joinpath(luts.varnames_lu[varname]["filename"]))
sub_da = ds[varname].sel(latitude=slice(bbox[3], bbox[1]), longitude=slice(bbox[0], bbox[2]))
print("Original shape:", ds[varname].shape)
print("Shape after spatial subset:", sub_da.shape)
analogs = af.find_analogs(sub_da, ref_date)
date_offset = row["forecast_day_number"]
forecast = (sub_da.sel(time=analogs.time.values + pd.to_timedelta(date_offset, "d")).values).mean(axis=0)
test_rmse = np.sqrt(
    ((sub_da.sel(
        time=pd.to_datetime(ref_date + " 12:00:00") + pd.to_timedelta(date_offset, "d")
    ) - forecast) ** 2).mean()
).round(3)

assert test_rmse == row["forecast_error"].astype(np.float32)

Original shape: (23011, 361, 1440)
Shape after spatial subset: (23011, 361, 240)


AssertionError: 

## Save tables

Save the combined tables for collaborators to have a look at.

Make a table with the forecast date included and limit to those particular dates provided by partners.

In [30]:
ref_dates = ["2004-10-11", "2004-10-18", "2005-09-22", "2013-11-06", "2004-05-09", "2015-11-09", "2015-11-23"]

analog_df["forecast_date"] = (pd.to_datetime(analog_df["reference_date"]) + pd.to_timedelta(analog_df["forecast_day_number"], unit="d"))
analog_df.query("forecast_date in @ref_dates & forecast_day_number in [3, 5]")

Unnamed: 0,variable,spatial_domain,anomaly_search,reference_date,forecast_day_number,forecast_error,forecast_date
2,t2m,alaska,True,2004-10-08,3,6.213,2004-10-11
16,t2m,alaska,True,2004-10-15,3,4.711,2004-10-18
30,t2m,alaska,True,2005-09-19,3,3.814,2005-09-22
44,t2m,alaska,True,2013-11-03,3,5.250,2013-11-06
58,t2m,alaska,True,2004-05-06,3,4.350,2004-05-09
...,...,...,...,...,...,...,...
718,t2m,north_pacific,False,2005-09-17,5,2.281,2005-09-22
732,t2m,north_pacific,False,2013-11-01,5,3.781,2013-11-06
746,t2m,north_pacific,False,2004-05-04,5,2.078,2004-05-09
760,t2m,north_pacific,False,2015-11-04,5,3.106,2015-11-09
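To illustrate the filter used above, here is a minimal sketch on a four-row table with a single date of interest. The rows are made up for illustration, and `isin` is used in place of `query` so that the comparison is explicitly between datetime values.

```python
import pandas as pd

# Tiny made-up table with the same two columns used above
df = pd.DataFrame({
    "reference_date": ["2004-10-08", "2004-10-08", "2004-10-06", "2004-10-01"],
    "forecast_day_number": [1, 3, 5, 3],
})

# One date of interest, parsed to a datetime so dtypes match
storm_dates = pd.to_datetime(["2004-10-11"])

# Forecast date = reference date + forecast day number
df["forecast_date"] = pd.to_datetime(df["reference_date"]) + pd.to_timedelta(
    df["forecast_day_number"], unit="D"
)

# Keep rows whose forecast lands on a date of interest at a 3 or 5 day lead
keep = df["forecast_date"].isin(storm_dates) & df["forecast_day_number"].isin([3, 5])
subset = df[keep]
print(subset)
```

Only the rows with reference dates three and five days before 2004-10-11 survive the filter, which mirrors how the table above pairs each date of interest with its day 3 and day 5 forecasts.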
