# Gulf Coast Weather Analysis

## The Data
The data consists of daily observations provided by NOAA's National Centers for Environmental Information (NCEI). The catalog lists 177 weather stations along the Gulf Coast of Alabama and Florida, of which no more than 102 were active in any given year.

**NCEI homepage**: https://www.ncei.noaa.gov/  
**Data root**: ftp://ftp.ncdc.noaa.gov/pub/data/ghcn/daily/  
**Data description**: ftp://ftp.ncdc.noaa.gov/pub/data/ghcn/daily/readme.txt  
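
The loading code for this analysis is not shown here. As a rough sketch, assuming the fixed-width `.dly` layout documented in the readme above (an 11-character station ID, year, month, element, then 31 eight-character day slots with `-9999` marking a missing value), one record could be parsed like this. The function name `parse_dly_line` is made up for illustration.

```python
MISSING = -9999  # sentinel for a missing daily value in a .dly record


def parse_dly_line(line):
    """Parse one fixed-width .dly record.

    Returns (station, year, month, element, values), where values has
    31 entries and missing days are represented as None.
    """
    station = line[0:11]
    year = int(line[11:15])
    month = int(line[15:17])
    element = line[17:21]
    values = []
    for day in range(31):
        start = 21 + day * 8  # 5-char value followed by 3 one-char flags
        raw = line[start:start + 5].strip()
        value = int(raw) if raw else MISSING
        values.append(None if value == MISSING else value)
    return station, year, month, element, values
```

Keeping missing days as `None` (rather than the raw sentinel) avoids accidentally averaging `-9999` into a statistic later on.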

### Measurements 
<table width='100%'>
<tr>
<td valign='top' align='left' style='vertical-align:top;text-align:left;'>
<h3> daily maximum temperature </h3> 
  <ul>
  <li>abbreviation: TMAX</li> 
  <li>units: tenths of degrees C</li> 
  <li>mean: 253.426915636</li> 
  <li>std: 71.8990185977</li> 
  </ul>
<h3> daily minimum temperature </h3> 
  <ul>
  <li>abbreviation: TMIN</li>
  <li>units: tenths of degrees C</li>
  <li>mean: 134.153128411</li>
  <li>std: 81.144135703</li>
</ul>
<h3> daily average temperature</h3> 
  <ul>
  <li>abbreviation: TOBS</li>
  <li>units: tenths of degrees C</li>
  <li>mean: 179.088370362</li>
  <li>std: 82.3701917823</li>
  </ul>
</td>
<td valign='top' align='left' style='vertical-align:top;text-align:left;'>
<h3>daily precipitation</h3>
<ul>
  <li>abbreviation: PRCP</li>
  <li>units: tenths of mm</li>
  <li>mean: 36.390542953</li>
  <li>std: 97.5160490716</li>
</ul>
<h3> daily snowfall</h3>
<ul>
  <li>abbreviation: SNOW</li>
  <li>units: mm</li>
  <li>mean: 0.0156396580793</li>
  <li>std: 1.29291515637</li>
</ul>
<h3> daily snow depth</h3>
  <ul>
  <li>abbreviation: SNWD</li>
  <li>units: mm</li>
  <li>mean: 0.0111986744941</li>
  <li>std: 2.31504805586</li>
  </ul>
</td>
</tr>
</table>
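
Because the temperature statistics are stored in tenths of degrees C, they are easy to misread. A small sketch of the conversion, using the means from the table above (the helper names are illustrative, not part of the original analysis):

```python
def tenths_to_celsius(value):
    """Convert a temperature in tenths of degrees C to degrees C."""
    return value / 10.0


def celsius_to_fahrenheit(celsius):
    """Convert degrees C to degrees F."""
    return celsius * 9.0 / 5.0 + 32.0


# Means copied from the Measurements table.
means_tenths = {
    "TMAX": 253.426915636,
    "TMIN": 134.153128411,
    "TOBS": 179.088370362,
}

for name, raw in means_tenths.items():
    celsius = tenths_to_celsius(raw)
    print(f"{name}: {celsius:.1f} C ({celsius_to_fahrenheit(celsius):.1f} F)")
```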

<img src="img/stations.png">
*The size of the station marker indicates the number of observations available*

### Historical

Data ranges from 1890 to 2012. The number of active stations in a given year falls into four distinct time periods:

- between 1890 and the early 1950s there was a slow, steady increase of active stations
- the beginning of the Cold War sees a marked spike that lasts until the early 1960s
- the 1970s and 80s maintain the infrastructure
- in the 1990s a steady increase begins again, potentially motivated by the expanding internet. In 2008, the year a new presidential administration was elected, the number of active rain gauge stations jumps significantly.

<table>
    <tr>
        <td>
            <img src="img/stations_by_year.png" />
        </td>
        <td>
            <img src="img/years_by_stations.png"/>
        </td>
    </tr>
    <tr>
    <td>
            <img src="img/stations_by_measurement.png" />
        </td>
        <td>
            <img src="img/obs_by_measurement.png" />
        </td>
    </tr>
</table>

### Missing Data

The percentage of missing observations dropped sharply starting in 1930. Completeness then remained fairly consistent until around 2000, when missing data began to increase again. Large numbers of new stations coming online, new sensor technology, and the growing number of less consistent rain gauges could account for the increase in missing observations.

<table>
<tr>
<td>
<img src="img/nan_by_year.png">
</td>
<td>
<img src="img/obs_by_year.png">
</td>
</tr>
</table>
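
As a minimal sketch of how a per-year missing rate like the one plotted above could be computed, assume each record is a `(year, values)` pair in which a missing day is `None` or NaN. The record shape and the function name are assumptions for illustration, not the code used to produce the plots.

```python
import math
from collections import defaultdict


def missing_fraction_by_year(records):
    """Return {year: fraction of daily values that are missing}.

    records: iterable of (year, values); a value counts as missing
    when it is None or NaN.
    """
    missing = defaultdict(int)
    total = defaultdict(int)
    for year, values in records:
        for value in values:
            total[year] += 1
            if value is None or math.isnan(value):
                missing[year] += 1
    return {year: missing[year] / total[year] for year in sorted(total)}
```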


In aggregate, small variations in completeness occur near the beginning and end of the year. Overall, missing observations are a small factor in this data set and are ignored for all calculations below.
<table>
    <tr>
        <td> <img src="img/obs_by_day_1.png">
        </td>
    </tr>
    <tr>
        <td> <img src="img/obs_by_day_2.png">
        </td>
    </tr>
</table>

## Temperature

The aggregated mean temperature along the Gulf Coast follows an intuitive Southern pattern of highly variable winters and consistently hot summers. A PCA decomposition of the minimum, observed average, and maximum temperatures produces similar top three eigenvectors, with some subtle differences.

<table>
<tr>
<td colspan=3><img src='img/temp_mean.png'></td></tr>
<tr>
<td colspan=3><img src='img/temp_eigen.png' /></td></tr>
</table>

In all three of the examples above, a negative contribution from a principal component corresponds to an increase in temperature, whereas a positive contribution corresponds to a decrease. The first component in each follows the mean. The second and third components highlight the change in seasons. We can see that spring typically begins promptly in April and that the length of fall depends on principal components 2 and 3.
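
The decomposition code itself is not included in this write-up. The following is only a sketch of one way to do it with NumPy, assuming each row of `X` is one station-year of daily values and that missing days are NaN and simply skipped when estimating the mean and covariance. The function name `pca_ignore_nan` and the pairwise-covariance approach are assumptions, not necessarily what was used here.

```python
import numpy as np


def pca_ignore_nan(X, k=3):
    """Return (mean, top-k eigenvalues, top-k eigenvectors) of X.

    X: 2-D array, one row per station-year, NaN for missing days.
    The covariance of each pair of days is estimated only from rows
    where both days were observed.
    """
    X = np.asarray(X, dtype=float)
    mean = np.nanmean(X, axis=0)
    centered = X - mean
    observed = ~np.isnan(centered)
    filled = np.where(observed, centered, 0.0)  # NaN contributes nothing
    counts = observed.astype(float).T @ observed.astype(float)
    cov = (filled.T @ filled) / np.maximum(counts, 1.0)
    eigvals, eigvecs = np.linalg.eigh(cov)  # ascending eigenvalues
    order = np.argsort(eigvals)[::-1][:k]   # keep the k largest
    return mean, eigvals[order], eigvecs[:, order]
```

Note that the sign of each eigenvector returned by `np.linalg.eigh` is arbitrary, so whether a negative coefficient means warmer or colder depends on the sign of the vector as plotted.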

Below, each station's average PC1 coefficient and PC2 coefficient for TMAX is mapped. Black indicates positive, red indicates negative, and the radius indicates the magnitude in either direction. PC1 is weakest at the coast and strongest inland, while PC2 is strongest on the coast and weakest inland.

Referring back to the plot of TMAX's eigenvectors and following PC2, we can see that coastal stations are likely to have a mild fall, a cold but quick winter, and a summer with temperatures approaching inland measurements.

Attributing components to physical factors isn't quite so cut and dried, however. Another potential pattern these coefficients match is station density. Urban versus rural siting could also influence the measurements.

<table>
<tr>
<td><img src='img/temp_tmax_coeff1.png' /></td>
</tr>
<tr>
<td><img src='img/temp_tmax_coeff2.png' /></td>

</tr>
</table>
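
For readers who want to reproduce maps like these, here is a hedged sketch of the per-station averaging step. It assumes each row is a `(station_id, values)` pair for one station-year, that `mean` and `eigvec` come from an earlier PCA step, and that missing days are `None` and skipped. The names are illustrative only.

```python
from collections import defaultdict


def mean_coefficient_by_station(rows, mean, eigvec):
    """Average the projection onto one eigenvector for each station.

    rows: iterable of (station_id, values), one per station-year.
    mean, eigvec: sequences with the same length as values.
    Missing days (None) are left out of the dot product.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for station, values in rows:
        coeff = sum(
            (v - m) * e
            for v, m, e in zip(values, mean, eigvec)
            if v is not None
        )
        sums[station] += coeff
        counts[station] += 1
    return {station: sums[station] / counts[station] for station in sums}
```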

Regardless of the physical underpinnings, the above pattern held strong for TMIN, but when TOBS was reviewed the same description of components didn't fit.

<img src='img/temp_historical_mean.png' />

Looking at the mean annual temperatures across the data set, there is an obvious change in the relationship between TOBS and TMIN/TMAX starting in the 1980s. Since TOBS was described by the mean better than TMIN and TMAX (see Variance Explained plots below), it is possible PC1 for TOBS indicates some other pattern. Since no geographical patterns were found, looking at how average TOBS PC1 coefficients change over time might highlight features of the observations.  

<table>
<tr>
<td></td>
</tr><tr>
<td><img src='img/temp_tobs_coeff1.png' /></td>
</tr>
</table>

Indeed, there is a significant downward shift starting in 1986. Checking equipment history on the NCEI website shows that new MMTS sensors were widely installed in October 1986 ([example](https://www.ncdc.noaa.gov/cdo-web/datasets/GHCND/stations/GHCND:USC00012675/detail)). Further research turns up discussion in the meteorological community about data consistency and about correcting TMIN and TMAX values for changing measurement techniques ([ex1](http://journals.ametsoc.org/doi/full/10.1175/1520-0426%282004%29021%3C1590%3AATCBTM%3E2.0.CO%3B2), [ex2](https://wattsupwiththat.com/2014/06/28/the-scientific-method-is-at-work-on-the-ushcn-temperature-data-set/), [ex3](https://fallmeeting.agu.org/2015/files/2015/12/Press-Release-NEW-STUDY-OF-NOAA-USHCN.pdf)), which explains why TMIN and TMAX do not follow the same trend.

When the above boxplot is considered with the Stations per Year breakdown, the four historical periods of active stations are evident. Because PC1 is the strongest eigenvector, these findings highlight the importance of consistent measurements: when instrumentation changes dominate the leading component, less obvious patterns, such as proximity to water or cities, can get lost.
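
One simple way to put a number on a shift like the one in 1986 is to compare the mean yearly coefficient before and after a candidate break year. This is only a sketch: it assumes a `{year: average PC1 coefficient}` mapping is already available, and the function name is hypothetical.

```python
from statistics import mean


def shift_at(coeff_by_year, break_year):
    """Mean coefficient from break_year onward minus the mean before it.

    coeff_by_year: mapping of year -> average coefficient for that year.
    A negative result indicates a downward shift at break_year.
    """
    before = [c for year, c in coeff_by_year.items() if year < break_year]
    after = [c for year, c in coeff_by_year.items() if year >= break_year]
    return mean(after) - mean(before)
```

Calling `shift_at(coeff_by_year, 1986)` would then summarize the size of the step seen in the boxplot, although it says nothing about whether the change is statistically significant.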


<table>
<tr><td colspan=3><img src='img/temp_var_explained.png' /></td></tr>
<tr>
<td><img src='img/temp_tmin_res3_cdf.png' /></td>
<td><img src='img/temp_tobs_res3_cdf.png' /></td>
<td><img src='img/temp_tmax_res3_cdf.png' /></td>
</tr>
</table>

Unsurprisingly, TOBS, a measured average, is fit better by the mean than TMIN or TMAX, and is reconstructed more accurately by the top 3 eigenvectors.
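
For reference, the eigenvalue part of a variance-explained curve can be sketched in a few lines. This assumes the eigenvalues of the covariance matrix are already available and covers only the fraction of variance around the mean captured by the leading eigenvectors; it does not reproduce how well the mean itself fits, which the plots above also show.

```python
from itertools import accumulate


def cumulative_variance_explained(eigenvalues):
    """Return the cumulative fraction of variance explained.

    Entry i is the share of total variance captured by the i + 1
    largest eigenvalues.
    """
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    return [partial / total for partial in accumulate(ordered)]
```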

## Precipitation

Precipitation for the area stayed fairly consistent throughout the year, with small dips in the spring and fall. Compared with the temperature measurements, precipitation was described less accurately by the top three eigenvectors, and even by the top 100, which reflects the high variability of rain.
<table>
<tr>
    <td colspan=2><img src='img/prcp_mean.png'></td>
</tr>
<tr>
    <td colspan=2><img src='img/prcp_eigen.png'></td>
</tr>
<tr>
    <td><img src='img/prcp_var_explained.png'></td>
    <td><img src='img/prcp_res3_cdf.png'></td>
</tr>
</table>

Plotting the average annual rainfall for each station shows no geographic influence on rainfall amounts.
<img src='img/prcp_by_map.png'>

## Snow

Although rare, snow does happen in Alabama. The plot below shows the total snow observed by all stations. Larger blocks within a stack indicate more snow at a single station, while many blocks within a stack indicate many stations recording snow. Old-timers still tell stories of [the New Year's Eve storm of 1963](https://en.wikipedia.org/wiki/New_Year%27s_Eve_1963_snowstorm) and [the Storm of the Century in 1993](https://en.wikipedia.org/wiki/1993_Storm_of_the_Century).

<img src='img/snow_by_year.png' />
 
I noticed, though, that I couldn't find any records of the apocalyptic storm the data shows in 1969. The anomaly traced back to two stations that appeared to be going through diagnostics in October 1969. Below, with the inaccurate data removed, the snowfall history is easier to read.
<table>
<tr>
<td>USW00003852</td>
<td style="text-align:left">USW00003850</td>
</tr>
<tr>
<td>
September 30, 1969: NaN<br>
October 01, 1969: NaN<br>
October 02, 1969: 25.0 mm<br>
October 03, 1969: 76.0 mm<br>
October 04, 1969: 178.0 mm<br>
October 05, 1969: NaN<br>
October 06, 1969: NaN<br>
October 07, 1969: NaN<br>
October 08, 1969: NaN<br>
October 09, 1969: NaN<br>
October 10, 1969: NaN<br>
October 11, 1969: NaN<br>
October 12, 1969: NaN<br>
October 13, 1969: NaN<br>
October 14, 1969: NaN<br>
October 15, 1969: 102.0 mm<br>
October 16, 1969: NaN<br>
October 17, 1969: 127.0 mm<br>
October 18, 1969: 178.0 mm<br>
October 19, 1969: NaN<br>
October 20, 1969: NaN<br>
October 21, 1969: 76.0 mm<br>
October 22, 1969: 25.0 mm<br>
October 23, 1969: 178.0 mm<br>
October 24, 1969: 152.0 mm<br>
October 25, 1969: 152.0 mm<br>
October 26, 1969: 152.0 mm<br>
October 27, 1969: 51.0 mm<br>
October 28, 1969: 178.0 mm<br>
October 29, 1969: 152.0 mm<br>
October 30, 1969: NaN<br>
</td>
<td style="text-align:left">
September 30, 1969: NaN<br>
October 01, 1969: NaN<br>
October 02, 1969: 102.0 mm<br>
October 03, 1969: 102.0 mm<br>
October 04, 1969: 178.0 mm<br>
October 05, 1969: NaN<br>
October 06, 1969: 178.0 mm<br>
October 07, 1969: NaN<br>
October 08, 1969: 203.0 mm<br>
October 09, 1969: NaN<br>
October 10, 1969: NaN<br>
October 11, 1969: NaN<br>
October 12, 1969: NaN<br>
October 13, 1969: NaN<br>
October 14, 1969: NaN<br>
October 15, 1969: NaN<br>
October 16, 1969: NaN<br>
October 17, 1969: 203.0 mm<br>
October 18, 1969: 203.0 mm<br>
October 19, 1969: NaN<br>
October 20, 1969: 51.0 mm<br>
October 21, 1969: 102.0 mm<br>
October 22, 1969: 152.0 mm<br>
October 23, 1969: 178.0 mm<br>
October 24, 1969: 178.0 mm<br>
October 25, 1969: 102.0 mm<br>
October 26, 1969: 178.0 mm<br>
October 27, 1969: 51.0 mm<br>
October 28, 1969: 178.0 mm<br>
October 29, 1969: 178.0 mm<br>
October 30, 1969: 203.0 mm<br>
</td>
</tr>
</table>


<img src='img/snow_by_year_clean.png' />
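
The cleanup step can be sketched as a simple filter. This assumes snow records are `(station, year, month, day, snow_mm)` tuples, which is an illustrative shape rather than the format actually used, and it drops everything the two stations listed above reported in October 1969.

```python
# Stations whose October 1969 snow readings look like diagnostics.
SUSPECT_STATIONS = {"USW00003852", "USW00003850"}


def is_suspect(station, year, month):
    """True for the October 1969 readings from the two suspect stations."""
    return station in SUSPECT_STATIONS and year == 1969 and month == 10


def drop_suspect_snow(records):
    """Return records with the suspect station-months removed.

    records: iterable of (station, year, month, day, snow_mm) tuples.
    """
    return [r for r in records if not is_suspect(r[0], r[1], r[2])]
```

Filtering by station and month, rather than by a snowfall threshold, leaves genuine large storms such as the 1963 and 1993 events untouched.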

## Future Work

Applying PCA decomposition to weather data provides a lot of descriptive power. Further experimentation will follow Dr. Bob Livezey's best practices described in [his 2005 talk](http://www.cpc.ncep.noaa.gov/products/outreach/proceedings/cdw30_proceedings/Livezey_PCA_PSU.ppt).