## Ensemble Visualization and Verification in Python 
**Tyler Wixtrom**<br>
Texas Tech University<br>
tyler.wixtrom@ttu.edu<br>
<br>
**Unidata Users Workshop**<br>
25-28 June 2018<br>
Boulder, CO

# Overview

1. Plots
    * Spaghetti Plots and Postage Stamps
    * Paintball Plots
    * Probability Plots
2. Verification
    * RMSE
3. Examples

<h1><center>Examples and Data Repository</center></h1>
<br>
<br>
<center>
<a href="https://github.com/tjwixtrom/workshop2018">https://github.com/tjwixtrom/workshop2018</a></center>

<h1><center>Spaghetti Plots and Postage Stamps</center></h1>

Spaghetti plots and postage stamp plots are both concise,
simple methods of viewing all members of an ensemble in a single figure.

Spaghetti plots are generally used for contoured
fields (e.g. 500 hPa geopotential height) while postage stamps
are more commonly reserved for shaded fields (e.g. simulated
reflectivity).

<center>
![spaghetti](spaghetti.png "Example 500 hPa Spaghetti Plot")
</center>
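A minimal matplotlib sketch of a spaghetti plot like the one above, assuming the members are stacked in a NumPy array of shape (members, lat, lon). The function name `spaghetti_plot`, the synthetic 500-hPa height field, and the 5760 m contour are hypothetical choices for illustration, not code from the workshop repository.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np


def spaghetti_plot(lons, lats, members, level, ax=None):
    """Overlay a single contour (``level``) from every ensemble member on one axis."""
    if ax is None:
        _, ax = plt.subplots(figsize=(8, 5))
    for i, field in enumerate(members):
        # One line per member; matplotlib's default color cycle repeats after 10 colors
        ax.contour(lons, lats, field, levels=[level], colors=f"C{i % 10}", linewidths=1.2)
    ax.set_xlabel("Longitude")
    ax.set_ylabel("Latitude")
    ax.set_title(f"{level} m contour, {len(members)} members")
    return ax


# Hypothetical 10-member ensemble of 500-hPa heights (m): a poleward-decreasing
# gradient plus a wave whose phase is perturbed differently in each member
lons = np.linspace(-120, -70, 101)
lats = np.linspace(25, 50, 51)
lon2d, lat2d = np.meshgrid(lons, lats)
rng = np.random.default_rng(0)
members = np.stack([
    5900 - 10 * (lat2d - 25) + 60 * np.sin(np.radians(4 * lon2d + rng.normal(0, 20)))
    for _ in range(10)
])
ax = spaghetti_plot(lons, lats, members, level=5760)
```

Drawing only one contour level per member is what keeps a spaghetti plot readable; adding more levels quickly produces the clutter listed under the disadvantages below.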

<center>
![](postage_stamp.png)
</center>
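Postage stamps can be sketched in a similar way, with one small shaded panel per member. This is an illustration only: `postage_stamps`, the Gaussian "storm" data, and the 5 dBZ shading interval are hypothetical. The one deliberate design choice is that every panel shares the same `levels`, so a given color means the same value in every member.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; no display needed
import matplotlib.pyplot as plt
import numpy as np


def postage_stamps(x, y, members, levels, ncols=4, units="dBZ"):
    """Draw each ensemble member in its own panel with shared shading levels."""
    n = len(members)
    nrows = int(np.ceil(n / ncols))
    fig, axes = plt.subplots(nrows, ncols, figsize=(3 * ncols, 2.5 * nrows),
                             sharex=True, sharey=True, squeeze=False)
    filled = None
    for i, ax in enumerate(axes.flat):
        if i >= n:
            ax.set_visible(False)  # hide leftover panels in the last row
            continue
        # Identical levels in every panel so colors are comparable across members
        filled = ax.contourf(x, y, members[i], levels=levels, cmap="viridis")
        ax.set_title(f"Member {i + 1}", fontsize=9)
    fig.colorbar(filled, ax=axes.ravel().tolist(), label=units)
    return fig, axes


# Hypothetical 6-member ensemble: one Gaussian "storm" whose position varies by member
x = np.linspace(0, 100, 101)
y = np.linspace(0, 80, 81)
x2d, y2d = np.meshgrid(x, y)
rng = np.random.default_rng(1)
members = np.stack([
    55 * np.exp(-((x2d - cx) ** 2 + (y2d - cy) ** 2) / (2 * 10.0 ** 2))
    for cx, cy in rng.uniform([30, 25], [70, 55], size=(6, 2))
])
fig, axes = postage_stamps(x, y, members, levels=np.arange(5, 60, 5))
```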

Advantages
* Displays information from each individual member
* Full location and magnitude of each field is plotted
* Simple to interpret
<br>
<br>

Disadvantages
* Can be very busy
* Becomes increasingly hard to interpret as spread increases
* Easy to miss small differences among members (esp. postage stamps)

<h1><center>Paintball Plots</center></h1>

Paintball plots show the individual grid points at which each member meets or exceeds a specified threshold (e.g. simulated reflectivity $\geq$ 25 dBZ), color-coded by ensemble member.

<center>
![](paintballs.png)
</center>
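A minimal sketch of the paintball approach described above: each member's exceedance mask is drawn as dots in that member's color. The function `paintball_plot` and the tiny 4 x 4 example are hypothetical; a real reflectivity ensemble would simply replace `members`.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; no display needed
import matplotlib.pyplot as plt
import numpy as np


def paintball_plot(lon2d, lat2d, members, threshold, ax=None):
    """Scatter the grid points where each member meets or exceeds ``threshold``.

    Returns the axis and the number of exceeding points per member.
    """
    if ax is None:
        _, ax = plt.subplots(figsize=(8, 5))
    counts = []
    for i, field in enumerate(members):
        hit = np.asarray(field) >= threshold  # boolean exceedance mask for this member
        counts.append(int(hit.sum()))
        # Default color cycle: colors repeat for ensembles with more than 10 members
        ax.scatter(lon2d[hit], lat2d[hit], s=4, color=f"C{i % 10}",
                   label=f"Member {i + 1}")
    ax.legend(markerscale=3, fontsize=7, loc="upper right")
    ax.set_title(f"Points >= {threshold}")
    return ax, counts


# Tiny hypothetical 3-member example on a 4 x 4 grid
lon2d, lat2d = np.meshgrid(np.arange(4.0), np.arange(4.0))
members = np.zeros((3, 4, 4))
members[0, 0, 0] = 30.0     # member 1: a single point above 25 dBZ
members[1, :2, :2] = 26.0   # member 2: a 2 x 2 block above 25 dBZ
members[2] = 10.0           # member 3: nothing above 25 dBZ
ax, counts = paintball_plot(lon2d, lat2d, members, threshold=25.0)
print(counts)  # [1, 4, 0]
```

Because only a yes/no mask is drawn, the 26 dBZ and 30 dBZ points look identical, which is exactly the loss of intensity information noted in the disadvantages below.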

Advantages
* Easy to quickly see spatial variations among individual member solutions
* Plot remains relatively uncluttered, even for ensembles with many members
<br>
<br>

Disadvantages
* Very little information regarding variations in intensity
* Thresholds are arbitrary and may not be best for all cases

<h1><center>Probability Plots</center></h1>

If an ensemble contains a sufficient number of members and
sufficient spread, the probability of an event occurring can be
calculated from the individual members. At each grid point,
this is done with the following method (Schwartz and Sobash,
2017):

# Ensemble Probability
1. Define the Binary Probability (BP) for each member $j$ at each point $i$ for field $f$ and threshold $q$
<br>
<br>
\begin{equation}
    BP(q)_{ij} = \left\{
        \begin{array}{ll}
            1 & \quad f_{ij} \geq q \\
            0 & \quad f_{ij} < q
        \end{array}
    \right.
   \end{equation} 

2. Average the binary probabilities of the $N$ ensemble members at each point to obtain the Ensemble Probability (EP)
<br>
<br>
\begin{equation}
     EP(q)_{i} = \frac{1}{N}\sum^{N}_{j=1}BP(q)_{ij}
\end{equation}

<center>
![probabilities](probs.png "Example Ensemble Probability Plot")
</center>
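The two steps above map directly onto NumPy array operations. This sketch assumes the members are stacked along array axis 0, so axis 0 plays the role of the member index $j$ and the remaining axes locate the grid point $i$; `binary_probability` and `ensemble_probability` are hypothetical helper names.

```python
import numpy as np


def binary_probability(members, q):
    """BP(q): 1 where a member's field meets or exceeds q, else 0.

    members has shape (N, ny, nx); axis 0 is the member index j,
    the remaining axes locate the grid point i.
    """
    return (np.asarray(members) >= q).astype(float)


def ensemble_probability(members, q):
    """EP(q)_i: mean of BP over the N members (axis 0) at each grid point."""
    return binary_probability(members, q).mean(axis=0)


# Hypothetical 4-member ensemble on a 1 x 3 grid, threshold q = 25
members = np.array([
    [[10.0, 30.0, 40.0]],
    [[20.0, 26.0, 50.0]],
    [[5.0, 24.0, 45.0]],
    [[30.0, 10.0, 60.0]],
])
ep = ensemble_probability(members, q=25.0)  # EP of 0.25, 0.5 and 1.0 at the three points
```

With $N = 4$ members, EP can only take the values 0, 0.25, 0.5, 0.75, and 1, which is one reason a sufficient number of members is needed before the probabilities are meaningful.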

<h1><center>Neighborhood Ensemble Probability</center></h1>

Since the ensemble probability is valid only at a single point
and does not account for small spatial differences among member
solutions, the Neighborhood Ensemble Probability (NEP) can be
used instead. The NEP averages the EP over all points within a
specified radius of each point (Schwartz and Sobash, 2017).

To calculate the NEP for a given field and threshold $q$:
* Begin by calculating the EP at each point
* Find the $N_b$ points within the specified radius of point $i$ and average their EP values
<br>
<br>
\begin{equation}
      NEP(q)_{i} = \frac{1}{N_b}\sum^{N_b}_{k=1}EP(q)_{k}
\end{equation}

<center>
![](neighbor_probs.png)
</center>
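A dependency-free sketch of the NEP formula, assuming the radius is given in grid lengths (a physical radius would first be divided by the grid spacing) and that points near the domain edge average only over the neighbors that exist, so $N_b$ is smaller there. `neighborhood_probability` is a hypothetical name; a library convolution or filtering routine could replace the explicit loops.

```python
import numpy as np


def neighborhood_probability(ep, radius):
    """NEP(q): mean of EP over all grid points within ``radius`` grid lengths.

    Near the domain edge only neighbors inside the grid are averaged,
    so N_b is smaller there.
    """
    ep = np.asarray(ep, dtype=float)
    ny, nx = ep.shape
    r = int(np.floor(radius))
    total = np.zeros_like(ep)   # running sum of EP over each point's neighbors
    count = np.zeros_like(ep)   # N_b at each point
    for dy in range(-r, r + 1):
        for dx in range(-r, r + 1):
            # Skip offsets outside the circle or larger than the grid itself
            if dy * dy + dx * dx > radius * radius or abs(dy) >= ny or abs(dx) >= nx:
                continue
            # Point i sits in the "dst" slice; its neighbor (i + offset) in the "src" slice
            dst = (slice(max(0, -dy), ny - max(0, dy)), slice(max(0, -dx), nx - max(0, dx)))
            src = (slice(max(0, dy), ny - max(0, -dy)), slice(max(0, dx), nx - max(0, -dx)))
            total[dst] += ep[src]
            count[dst] += 1
    return total / count


# Hypothetical EP field: 100% probability at one point, 0% elsewhere
ep = np.zeros((5, 5))
ep[2, 2] = 1.0
nep = neighborhood_probability(ep, radius=1)
# The 1.0 is spread over the center and its 4 nearest neighbors (0.2 each)
```

Looping over neighborhood offsets rather than over grid points keeps each step a vectorized NumPy slice addition. The example also shows how the NEP smooths a sharp point probability into a broader, lower-valued area.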

Advantages
* Probabilistic forecasts of specific events
* Quickly shows ensemble confidence
* Relatively simple to interpret
<br>
<br>

Disadvantages
* Probabilities are often too high (overconfident) because ensemble spread is too low
* Assumes that all member solutions are equally likely
* Limited information regarding member differences in intensity and location

<h1><center>Plots Summary</center></h1>

1. Spaghetti Plots and Postage Stamps
    * Best for viewing individual members, ensemble spread, and the range of solutions
2. Paintball Plots
    * Best for quickly viewing spatial differences among member solutions
3. Probability Plots
    * Best for showing the probability of event occurrence and ensemble confidence

<h1><center>Verification</center></h1>

Many verification metrics are available, and they generally fall into three categories:

* Grid-point (Wolff et al., 2014)
* Object-based (Davis et al., 2006)
* Neighborhood approaches (Schwartz and Sobash, 2017)
<br>

One very simple, commonly used metric is the Root Mean
Square Error (RMSE).

<h1><center>Root Mean Square Error</center></h1>

The Root Mean Square Error is defined as:
<br>
<br>
\begin{equation}
    RMSE = \sqrt{\frac{1}{N}\sum_{i=1}^{N}(F_i - O_i)^2}
\end{equation}
<br>
* Measure of mean magnitude of forecast error
* Same units as forecast value
* Simple to compute and interpret
* Useful for comparing two models, multiple members, etc.
* Holmes (2000)
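The RMSE formula is a one-liner in NumPy. In this hypothetical sketch, `forecast` may hold several members stacked along axis 0, and the `axis` argument then returns one RMSE per member, which is one way to compare members as suggested above. The function name and the data are illustrative assumptions.

```python
import numpy as np


def rmse(forecast, observed, axis=None):
    """Root mean square error of forecast F against observations O, in F's units.

    axis=None pools all points; pass an axis to keep, e.g., one value per member.
    """
    diff = np.asarray(forecast, dtype=float) - np.asarray(observed, dtype=float)
    return np.sqrt(np.mean(diff ** 2, axis=axis))


# Hypothetical 2-member ensemble forecast at 4 observation sites
obs = np.array([0.0, 0.0, 0.0, 0.0])
members = np.array([
    [1.0, -1.0, 1.0, -1.0],   # member 1: errors of magnitude 1 everywhere
    [2.0, 2.0, -2.0, 2.0],    # member 2: errors of magnitude 2 everywhere
])
per_member = rmse(members, obs, axis=1)   # obs broadcasts across the member axis
print(per_member)  # [1. 2.]

# RMSE of the ensemble-mean forecast, pooled over all sites
mean_rmse = rmse(members.mean(axis=0), obs)
```

Because the errors are squared before averaging, RMSE weights large misses more heavily than small ones, and the signs of the errors cancel nowhere.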

<h1><center>Examples</center></h1>

1. Open terminal
2. Navigate to `workshop2018` folder
3. `source activate workshop` on macOS and Linux, or `activate workshop` on Windows (newer conda versions use `conda activate workshop` on all platforms)
4. Type `jupyter notebook`

# References

<p style="margin-left: 40px; text-indent: -40px;"> Davis, C., B. Brown, and R. Bullock, 2006: Object-Based Verification of Precipitation Forecasts. Part I: Methodology and Application to Mesoscale Rain Areas. *Mon. Weather Rev.*, **134 (7)**, 1772-1784.</p>

<p style="margin-left: 40px; text-indent: -40px;"> Holmes, S., 2000: RMS Error. URL statweb.stanford.edu/~susan/courses/s60/split/node60.html.</p>

<p style="margin-left:40px; text-indent: -40px;"> Schwartz, C. S., and R. A. Sobash, 2017: Generating Probabilistic Forecasts from Convection-Allowing Ensembles Using Neighborhood Approaches: A Review and Recommendations. *Mon. Weather Rev.*, **145 (9)**, 3397-3418.</p>

<p style="margin-left: 40px; text-indent: -40px;"> Wolff, J. K., M. Harrold, T. Fowler, J. H. Gotway, L. Nance, and B. G. Brown, 2014: Beyond the Basics: Evaluating Model-Based Precipitation Forecasts Using Traditional, Spatial, and Object-Based Methods. *Weather Forecast.*, **29 (6)**, 1451-1472.</p>