## Verification in Python ##
**Tyler Wixtrom**<br>
*Texas Tech University*<br>
tyler.wixtrom@ttu.edu<br>

Unidata Users Workshop<br>
*June 25-28, 2018<br>
Boulder, CO*

The final type of visualization we will explore is ensemble verification. In this example, we will apply one of the simplest verification techniques, the Root Mean Square Error (RMSE), to compare observed precipitation with both the ensemble-mean precipitation and the individual member forecasts.


In [None]:
from datetime import datetime
from netCDF4 import Dataset, num2date

import numpy as np
import matplotlib.pyplot as plt

### Ensemble Precipitation Plume ###
We will start with the precipitation plume that was created in a previous example.

In [None]:
data = Dataset('../2015020112/wrfprst_d01_2015020112_mem1.nc')
lat = data.variables['lat'][0, :]
lon = data.variables['lon'][0, :]
vtimes = num2date(data.variables['valid_time'][:], data.variables['valid_time'].units)

In [None]:
def lat_lon_2D_index(y, x, lat1, lon1):
    """
    This function calculates the distance from a desired lat/lon point
    to each element of a 2D array of lat/lon values, typically from model output,
    and determines the index value corresponding to the nearest lat/lon grid point.
    y = 2D latitude array (degrees)
    x = 2D longitude array (degrees)
    lat1 = latitude of the point (single value, degrees)
    lon1 = longitude of the point (single value, degrees)
    Returns the (row, column) index of the nearest grid point.
    Haversine equations for the great-circle distance from
    http://andrew.hedges.name/experiments/haversine/
    """
    R = 6373.*1000.  # Earth's Radius in meters
    rad = np.pi/180.
    x1 = np.ones(x.shape)*lon1
    y1 = np.ones(y.shape)*lat1
    dlon = np.abs(x-x1)
    dlat = np.abs(y-y1)
    a = (np.sin(rad*dlat/2.))**2 + np.cos(rad*y1) * np.cos(rad*y) * (np.sin(rad*dlon/2.))**2
    c = 2 * np.arctan2( np.sqrt(a), np.sqrt(1-a))
    d = R * c
    return np.unravel_index(d.argmin(), d.shape)
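To check the nearest-point logic without the WRF output files, the sketch below applies the same haversine approach to a small synthetic 1° grid built with `np.meshgrid`. The grid, the helper names `haversine_distance` and `nearest_grid_index`, and the 6371 km mean Earth radius (the function above uses 6373 km) are illustrative assumptions, not part of the original workflow.

```python
import numpy as np

EARTH_RADIUS_M = 6371.0e3  # assumed mean Earth radius in meters (illustrative)


def haversine_distance(lat2d, lon2d, lat0, lon0):
    """Great-circle distance (m) from (lat0, lon0) to every point of a 2D grid."""
    phi, phi0 = np.radians(lat2d), np.radians(lat0)
    dphi = np.radians(lat2d - lat0)
    dlmb = np.radians(lon2d - lon0)
    a = np.sin(dphi / 2) ** 2 + np.cos(phi0) * np.cos(phi) * np.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * np.arctan2(np.sqrt(a), np.sqrt(1 - a))


def nearest_grid_index(lat2d, lon2d, lat0, lon0):
    """(row, col) index of the grid point closest to (lat0, lon0)."""
    d = haversine_distance(lat2d, lon2d, lat0, lon0)
    return np.unravel_index(d.argmin(), d.shape)


# Hypothetical 1-degree grid covering 40-45 N, 90-80 W
lats = np.arange(40.0, 46.0)    # 6 latitudes
lons = np.arange(-90.0, -79.0)  # 11 longitudes
lon2d, lat2d = np.meshgrid(lons, lats)  # both shaped (6, 11)

# Nearest grid point to KLAN (42.78 N, 84.59 W)
row, col = nearest_grid_index(lat2d, lon2d, 42.78, -84.59)
print(row, col, lat2d[row, col], lon2d[row, col])  # 3 5 43.0 -85.0
```

Because $\sin^2$ is an even function, the absolute values taken on `dlon` and `dlat` in `lat_lon_2D_index` are harmless but not required, and the Earth radius only scales the distances, so it never changes which grid point is nearest.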

In [None]:
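# Grid indices of the point nearest KLAN (42.78 N, 84.59 W)
idx = lat_lon_2D_index(lat, lon, 42.78, -84.59)

# Accumulated precipitation at KLAN for each of the 20 members,
# excluding the last 8 forecast times
pcp = {}
for i in range(1, 21):
    with Dataset(f'../2015020112/wrfprst_d01_2015020112_mem{i}.nc') as data:
        pcp[f'mem{i}'] = data.variables['tot_pcp'][:-8, idx[0], idx[1]].data

# Ensemble mean at each forecast time
mean_pcp = np.mean(list(pcp.values()), axis=0)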
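Storing the members in a dictionary works, but stacking them into a single 2D array (members × forecast times) makes the ensemble mean a one-line NumPy reduction along the member axis. The sketch below uses three short made-up member series instead of the WRF output; the names `member_series`, `ens`, and `ens_mean` are hypothetical.

```python
import numpy as np

# Hypothetical accumulated-precipitation series (mm) for three members
# at five forecast times; in the notebook these would be pcp['mem1'], ...
member_series = {
    'mem1': np.array([0.0, 1.0, 2.0, 4.0, 5.0]),
    'mem2': np.array([0.0, 2.0, 3.0, 5.0, 7.0]),
    'mem3': np.array([0.0, 0.0, 1.0, 3.0, 3.0]),
}

# Stack into shape (n_members, n_times); row order follows dict insertion order
ens = np.stack(list(member_series.values()))
print(ens.shape)  # (3, 5)

# Average across members (axis 0) to get the ensemble mean at each time
ens_mean = ens.mean(axis=0)
print(ens_mean)  # [0. 1. 2. 4. 5.]
```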

In [None]:
fig = plt.figure(1, figsize=(17., 10.))
for i in range(1, 21):
    plt.plot(vtimes[:-8], pcp['mem'+str(i)], label='mem'+str(i))
plt.plot(vtimes[:-8], mean_pcp, label='Mean', color='k', linewidth=4)
plt.xlim(datetime(2015, 2, 1, 12), datetime(2015, 2, 2, 12))
plt.ylabel('Accumulated Precipitation (mm)')
plt.title('Ensemble Precipitation Plume for Lansing Capital Region International Airport (KLAN)')
plt.grid()
plt.legend()
plt.show()

### Observed Precipitation Data ###
To verify the ensemble precipitation, we will use archived surface ASOS observations from the KLAN station provided by the [Iowa State University Iowa Environmental Mesonet](https://mesonet.agron.iastate.edu/archive/) archive. This data is in .csv format, so we will use the [Pandas](https://pandas.pydata.org/pandas-docs/stable/index.html) package to open and reformat it.
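The reformatting turns hourly observed totals into running accumulations that can be compared with the forecast accumulations. The sketch below reproduces that logic on a small in-memory CSV read through `io.StringIO`; the three-column layout mirrors the `names` used in the next cell, but the station rows, timestamps, and values are made up.

```python
import io

import pandas as pd

# Made-up hourly observations in a three-column layout like klan_pcp.csv
csv_text = """station,valid,precip
LAN,2015-02-01 12:53,0.0
LAN,2015-02-01 13:53,0.5
LAN,2015-02-01 14:53,1.0
LAN,2015-02-01 15:53,0.0
LAN,2015-02-01 16:53,0.25
LAN,2015-02-01 17:53,0.25
"""

obs_demo = pd.read_csv(io.StringIO(csv_text),
                       names=['Station', 'Time', 'Precipitation'], skiprows=1)

# Prepend 0 so the running total is zero at the start of the period,
# accumulate the hourly totals, then keep every third hour
hourly = pd.concat((pd.Series([0.0]), obs_demo['Precipitation']), ignore_index=True)
accum = hourly.cumsum()
accum_03h = accum.iloc[::3]
print(accum_03h.tolist())  # [0.0, 1.5, 2.0]
```

`ignore_index=True` avoids the duplicate index label 0 that the prepended value would otherwise share with the first observation, and `.iloc` makes the every-third-row selection explicitly positional. For a 24-hour period of hourly reports, the same steps give nine accumulations, at hours 0, 3, ..., 24.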

In [None]:
import pandas as pd
obs = pd.read_csv('../2015020112/klan_pcp.csv', names=['Station', 'Time', 'Precipitation'], skiprows=1)
obs

In [None]:
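# Prepend a zero so the accumulation is 0 at the initialization time,
# accumulate the hourly totals, then keep every third value (3-hourly)
obs_pcp = pd.concat((pd.Series([0.0]), obs['Precipitation']), ignore_index=True)
obs_pcp_sum = obs_pcp.cumsum()
obs_pcp_sum_03h = obs_pcp_sum.iloc[::3]
obs_pcp_sum_03h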

With the observed precipitation data opened and reformatted, let's plot it alongside the ensemble members and the ensemble mean to compare each forecast with the observations.

In [None]:
fig = plt.figure(1, figsize=(17., 10.))
for i in range(1, 21):
    plt.plot(vtimes[:-8], pcp['mem'+str(i)], label='mem'+str(i))
plt.plot(vtimes[:-8], mean_pcp, label='Mean', color='k', linewidth=4)
plt.plot(vtimes[:-8], obs_pcp_sum_03h, label='Observed', color='tab:red', linewidth=4)
plt.xlim(datetime(2015, 2, 1, 12), datetime(2015, 2, 2, 12))
plt.ylabel('Accumulated Precipitation (mm)')
plt.title('Ensemble Precipitation Plume for Lansing Capital Region International Airport (KLAN)')
plt.grid()
plt.legend()
plt.show()

### Forecast Error ###
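We can start to quantify the quality of the ensemble forecast by calculating the forecast error of each member at each forecast time:
\begin{equation}
\mathrm{Error} = F - A
\end{equation}
where $F$ is the forecast accumulated precipitation and $A$ is the observed accumulation. Positive errors indicate overforecasting; negative errors indicate underforecasting.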
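The error can be computed for every member at once with NumPy broadcasting: subtracting a 1D array of observed accumulations (length n_times) from a 2D member array (n_members × n_times) subtracts the observations from each row. The arrays below are made up for illustration.

```python
import numpy as np

# Hypothetical forecast accumulations (mm): 3 members x 4 forecast times
forecasts = np.array([
    [0.0, 2.0, 4.0, 6.0],
    [0.0, 1.0, 2.0, 3.0],
    [0.0, 0.5, 1.0, 1.0],
])
# Hypothetical observed accumulations (mm) at the same 4 times
observed = np.array([0.0, 1.0, 2.0, 2.0])

# Error = F - A, broadcast across the member axis
member_error = forecasts - observed
# Error of the ensemble mean forecast
ens_mean_error = forecasts.mean(axis=0) - observed

print(member_error)
print(ens_mean_error)
```

Because averaging is linear, the error of the ensemble mean equals the average of the member errors at each time.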

In [None]:
error = {}
for i in range(1, 21):
    error['mem'+str(i)] = pcp['mem'+str(i)] - obs_pcp_sum_03h
    
mean_error = mean_pcp - obs_pcp_sum_03h

Now we can plot the time series of the error for each member and for the ensemble mean.

In [None]:
fig = plt.figure(1, figsize=(17., 10.))
for i in range(1, 21):
    plt.plot(vtimes[:-8], error['mem'+str(i)], label='mem'+str(i))
plt.plot(vtimes[:-8], mean_error, label='Mean', color='k', linewidth=4)
plt.xlim(datetime(2015, 2, 1, 12), datetime(2015, 2, 2, 12))
plt.ylabel('Accumulated Precipitation Error (mm)')
plt.title('Ensemble Precipitation Error for Lansing Capital Region International Airport (KLAN)')
plt.grid()
plt.legend()
plt.show()

Clearly the ensemble is overforecasting precipitation for this location and event. We can summarize the error of each member and of the ensemble mean with a single number, the Root Mean Square Error, and compare them.

### Root Mean Square Error (RMSE) ###
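[The Root Mean Square Error](http://statweb.stanford.edu/~susan/courses/s60/split/node60.html) is the square root of the mean of the squared differences between the forecast and observed values:
\begin{equation}
\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(F_i - A_i\right)^2}
\end{equation}
where $N$ is the number of forecast times. Because the errors are squared before averaging, positive and negative errors cannot cancel, and large errors are weighted more heavily than small ones. The RMSE can be computed for both the individual members and the ensemble mean.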
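A small worked example shows why the squaring matters: two errors of equal size and opposite sign average to a mean error of zero, while the RMSE still reports their magnitude. The arrays are made up, and the `rmse` helper here mirrors the one defined in the next cell so that this snippet runs on its own.

```python
import numpy as np


def rmse(predictions, targets):
    """Square root of the mean squared difference."""
    return np.sqrt(((predictions - targets) ** 2).mean())


forecast = np.array([2.0, 0.0, 4.0])  # hypothetical forecasts (mm)
observed = np.array([0.0, 2.0, 4.0])  # hypothetical observations (mm)

errors = forecast - observed        # [ 2. -2.  0.]
print(errors.mean())                # 0.0 -> positive and negative errors cancel
print(rmse(forecast, observed))     # sqrt(8/3), about 1.633
```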

In [None]:
def rmse(predictions, targets):
    """Root mean square error between forecast and observed array-likes of equal length."""
    return np.sqrt(((predictions - targets) ** 2).mean())

Typically, verification statistics are computed for a single lead time over many locations and forecast cases. In this example, we instead compute the RMSE over all forecast times of this single event at KLAN, measuring how well each member and the ensemble mean forecast the observed accumulation shown above.
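For the more typical setup, forecasts and observations from many cases (dates or stations) can be arranged as 2D arrays of shape (n_cases × n_lead_times); averaging the squared error along the case axis then gives one RMSE per lead time. This sketch uses random, made-up data with an error that grows with lead time; it is not part of the original analysis.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

n_cases, n_leads = 50, 8  # hypothetical: 50 forecast cases, 8 lead times

# Made-up observations, and forecasts whose error grows with lead time
observed = rng.gamma(shape=2.0, scale=2.0, size=(n_cases, n_leads))
error_scale = np.linspace(0.5, 2.0, n_leads)  # one noise level per lead time
forecast = observed + rng.normal(0.0, error_scale, size=(n_cases, n_leads))

# RMSE at each lead time, pooled over cases (axis 0)
rmse_by_lead = np.sqrt(((forecast - observed) ** 2).mean(axis=0))
print(rmse_by_lead.round(2))
```

Using `axis=1` instead would give one RMSE per case, pooled over lead times, which is closer to what this notebook computes for its single event.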

In [None]:
# RMSE of the ensemble mean and of each member against the observed accumulations
mean_rmse = rmse(mean_pcp, obs_pcp_sum_03h)
ens_rmse = {}
for mem in pcp.keys():
    ens_rmse[mem] = rmse(pcp[mem], obs_pcp_sum_03h)

In [None]:
fig = plt.figure(1, figsize=(17., 10.))
# Bars: ensemble mean first, then each member
members = ['Mean', *ens_rmse.keys()]
rmse_values = [mean_rmse, *ens_rmse.values()]
x_pos = np.arange(len(members))

plt.bar(x_pos, rmse_values, align='center', alpha=0.8, color='tab:blue')
plt.xticks(x_pos, members)
plt.ylabel('RMSE (mm)')
plt.title('RMSE of Accumulated Precipitation for KLAN 12 UTC 1 Feb 2015 - 12 UTC 2 Feb 2015')
plt.grid()
plt.show()