# Alabama Weather Analysis

This report presents a historical analysis of weather patterns in a region that covers part of the state of Alabama, in and around the city of Montgomery (longitude 88.5172°W to 85.55°W, latitude 31.5333°N to 32.6131°N).

The data used comes from [NOAA](https://www.ncdc.noaa.gov/).

The focus was on six different measurements listed below:
* **TMIN, TMAX:** the daily minimum and maximum temperature.
* **TOBS:** the temperature at the time of observation, used in this report as the daily average temperature.
* **PRCP:** Daily precipitation (in mm)
* **SNOW:** Daily snowfall (in mm)
* **SNWD:** The depth of accumulated snow (in mm)

## Sanity-check: comparison with outside sources

<p>We start by comparing some of the general statistics with graphs obtained from <a href="http://www.usclimatedata.com/climate/alabama/united-states/3170" target="_blank">US Climate Data</a>. The graph below shows the daily minimum and maximum temperatures for each month, as well as the total precipitation for each month.</p>

<p>&nbsp;</p>

<p><img alt="Climate_Montgomery_-_Alabama_and_Weather_averages_Montgomery.jpg" src="img/act_alabama.png" /></p>

<p>&nbsp;</p>

<p> We can see that the observed temperature in our data also follows the same trend. The Y-axis in this graph measures the temperature in Centigrade. </p>

<p><img alt="TOBS.png" src="img/tobs.png" /></p>

<p>We see that the daily minimum and maximum temperatures from US Climate Data agree with the ones in our data, once we convert Fahrenheit to Centigrade and scale our Y-axis values by a factor of 10.</p>

<p>&nbsp;</p>

<p><img alt="TMIN,TMAX.png" src="img/tmin_tmax.png" style="height:300px; width:800px" /></p>
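<p>The unit handling behind this comparison can be sketched as follows. This is a minimal illustration, not the code used for the report: the function names are hypothetical, and it assumes that our raw Y-axis values are in tenths of a degree Centigrade (the factor of 10 mentioned above).</p>

```python
def fahrenheit_to_celsius(deg_f):
    """Convert a temperature from Fahrenheit to Centigrade."""
    return (deg_f - 32.0) * 5.0 / 9.0


def tenths_to_degrees(raw_value):
    """Convert a raw value in tenths of a degree to whole degrees.

    Assumption: the raw data is stored in tenths of a degree Centigrade.
    """
    return raw_value / 10.0


# A reference value of 77 F and a raw value of 250 describe the same temperature.
reference_c = fahrenheit_to_celsius(77.0)
ours_c = tenths_to_degrees(250)
print(reference_c, ours_c)
```

<p>Both numbers must be brought to the same unit before the two graphs can be compared month by month.</p>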

<p>To compare the precipitation we need to convert millimeters per day to inches per month. According to our analysis the average rainfall is 4.00 mm/day (after scaling the Y-axis values by a factor of 10), which translates to about 4.76 inches per month. This agrees well with the US Climate Data figures. We can also see from our data that the mean precipitation increases in March and dips in October.</p>

<p>&nbsp;<img alt="PRCP.png" src="img/prcp.png" style="height:450px; width:600px" /></p>

<p>Since temperatures in this region very rarely drop below zero, there is hardly any snowfall.</p>

<p><img alt="SNOW,SNWD.png" src="img/snow_snwd.png" style="height:300px; width:800px" /></p>

## PCA Analysis

<p>For each of the six measurements, we compute the percentage of the variance explained as a function of the number of eigen-vectors used.</p>

### Percentage of variance explained.
![VarExplained1.png](img/VarExplained1.png)

<p>We see that the top 5 eigen-vectors explain 22% of variance for TMIN, 45% for TOBS and 19% for TMAX.</p>

![VarExplained2.png](img/VarExplained2.png)

<p>We see that the top 5 eigen-vectors explain 58% of variance for SNOW, 95% for SNWD and 7% for PRCP. Since a few eigen-vectors explain a very high percentage of the variance for SNWD, we analyze snow depth first. However, we do not hold out much hope of discovering interesting patterns, as most values are close to zero. We then analyze the precipitation and temperature data to see if we can uncover any interesting patterns.</p>
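<p>The percentage of variance explained by the top <code>k</code> eigen-vectors is the sum of the <code>k</code> largest eigenvalues divided by the sum of all eigenvalues. The sketch below shows that calculation on made-up eigenvalues. The function name and the numbers are hypothetical and are not taken from the analysis.</p>

```python
from itertools import accumulate


def cumulative_variance_explained(eigenvalues):
    """Return the percentage of variance explained by the top 1, 2, ... eigen-vectors."""
    ordered = sorted(eigenvalues, reverse=True)  # largest eigenvalue first
    total = sum(ordered)
    return [100.0 * partial / total for partial in accumulate(ordered)]


# Toy eigenvalues, for illustration only.
toy_percentages = cumulative_variance_explained([4.0, 3.0, 2.0, 1.0])
print(toy_percentages)
```

<p>Reading the fifth entry of such a list for each measurement gives the kind of "top 5" figures quoted above.</p>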

## Analysis of snow depth

We choose to analyze the eigen-decomposition for snow-depth because the first 5 eigen-vectors explain 95% of the variance.

First, we graph the mean and the top 3 eigen-vectors.

![SNWD_mean_eigs.png](img/snwd_mean_eigs.png)

<p>We observe that the snow depth is low because snowfall is low, as mentioned before. However, the mean captures the fact that it usually snows a bit from November to the end of March.</p>
<p>The first eigen-vector seems to capture snow in November-December, the second eigen-vector is a spike that indicates snow at the beginning of the year, and the third eigen-vector captures snow in March.</p>

## Analysis of daily average temperature

We choose to analyze the eigen-decomposition for observed temperature next because the first 5 eigen-vectors explain more than 40% of the variance.

First, we graph the mean and the top 3 eigen-vectors.

![TOBS_mean_eigs.png](img/tobs_mean_eigs.png)

<p>We see that the average daily temperature varies from 8 degrees Centigrade in the winter to 25 degrees Centigrade in the summer.</p>
<p>The first eigen-vector is fairly stable. The second eigen-vector captures lower temperatures in January-February and higher temperatures in November-December, while the third eigen-vector captures roughly the opposite trend. Together they capture the variation of the observed temperature relative to the mean.</p>

## Example of reconstruction

<p>We take the entries with the most positive and the most negative values of coefficient 2 and plot their reconstructions below.</p>

![recon_tobs_1.png](img/recon_tobs_1.png)
![recon_tobs_2.png](img/recon_tobs_2.png)

<p>Large positive values correspond to observed temperatures higher than the mean, while large negative values correspond to observed temperatures lower than the mean.</p>
<p>The cumulative distribution function of coefficient 2 is consistent with this: about half of the values are negative while the rest are positive.</p>

![cum_coeff2_tobs.png](img/cum_coeff2_tobs.png)
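<p>A reconstruction of this kind adds a weighted sum of eigen-vectors to the mean, with the coefficients as weights. The sketch below shows the idea on a toy three-day "year". The function names, the toy vectors and the assumption that the eigen-vectors are orthonormal are illustrative and not taken from the analysis code.</p>

```python
def project(series, mean, eigvecs):
    """Return one coefficient per eigen-vector for a mean-centered series.

    Assumption: the eigen-vectors are orthonormal.
    """
    centered = [x - m for x, m in zip(series, mean)]
    return [sum(c * v for c, v in zip(centered, vec)) for vec in eigvecs]


def reconstruct(mean, eigvecs, coeffs):
    """Return mean + sum over k of coeffs[k] * eigvecs[k]."""
    recon = list(mean)
    for vec, coeff in zip(eigvecs, coeffs):
        for i, v in enumerate(vec):
            recon[i] += coeff * v
    return recon


# Toy data: a three-day series and two orthonormal eigen-vectors.
toy_mean = [10.0, 20.0, 30.0]
toy_eigvecs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
toy_series = [12.0, 17.0, 30.0]

toy_coeffs = project(toy_series, toy_mean, toy_eigvecs)
toy_recon = reconstruct(toy_mean, toy_eigvecs, toy_coeffs)
print(toy_coeffs, toy_recon)
```

<p>In the toy example the first coefficient is positive and pushes the reconstruction above the mean, while the second is negative and pulls it below.</p>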

## The variation in the observed mean temperature is mostly due to year-to-year variation

<p>We measure this using the percentage by which the RMS of the coefficient is reduced when we subtract from each station/year entry the average-per-station or the average-per-year, respectively.</p>

**coefficient 2**  
total RMS                    = 192.931081902  
RMS removing mean-by-station = 189.319527345, fraction explained = 1.87%  
RMS removing mean-by-year    = 104.605404872, fraction explained = 45.8%

<p>We see that the variation by year explains far more of coefficient 2 (45.8%) than the variation by station (1.87%).</p>
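<p>The sketch below illustrates this kind of calculation on a toy table of coefficients keyed by (station, year), and then recomputes the percentages from the RMS values reported above. The helper names and the toy table are hypothetical. The formula <code>100 * (1 - residual RMS / total RMS)</code> is inferred from the reported numbers, which it reproduces.</p>

```python
import math
from collections import defaultdict


def rms(values):
    """Root mean square of a collection of numbers."""
    values = list(values)
    return math.sqrt(sum(v * v for v in values) / len(values))


def remove_group_mean(table, key_index):
    """Subtract the per-group mean; key_index 0 groups by station, 1 by year."""
    groups = defaultdict(list)
    for key, value in table.items():
        groups[key[key_index]].append(value)
    means = {group: sum(vals) / len(vals) for group, vals in groups.items()}
    return {key: value - means[key[key_index]] for key, value in table.items()}


def percent_explained(total_rms, residual_rms):
    """Percentage by which the RMS is reduced."""
    return 100.0 * (1.0 - residual_rms / total_rms)


# Toy table: the coefficient depends only on the year, not on the station.
toy = {("A", 2000): 1.0, ("A", 2001): 3.0, ("B", 2000): 1.0, ("B", 2001): 3.0}
total = rms(toy.values())
by_station = percent_explained(total, rms(remove_group_mean(toy, 0).values()))
by_year = percent_explained(total, rms(remove_group_mean(toy, 1).values()))

# The same formula applied to the RMS values reported above.
reported_station = percent_explained(192.931081902, 189.319527345)
reported_year = percent_explained(192.931081902, 104.605404872)
print(by_station, by_year, reported_station, reported_year)
```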

## Analysis of precipitation

<p>Here, we do not show the analysis using principal components and eigen-vectors, as the top eigen-vectors explain little of the variance.</p>

<p>This is evident from the map of the region below, in which the principal component coefficients for precipitation at each station are color-coded. We see that there is hardly any variation in these coefficients. The radius of each circle represents the number of stations in the region.</p>

![map_prcp.png](img/map_prcp.png)

<p>As we see from the graph below, in our region it rains on about one fourth of the days.</p>

![cdf_prcp.png](img/cdf_prcp.png)

<p>For each pair of stations in our region, we calculate the normalized log probability that it rains on the same days at the two stations. The distribution of the per-day significance obtained is shown below.</p>

![pnorm_prcp.png](img/pnorm_prcp.png)
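<p>The exact normalization is not spelled out here, so the sketch below is one plausible reading rather than the method actually used. It assumes that the significance is the log of the probability, under independence, of seeing at least as many shared rainy days as observed (a hypergeometric upper tail), divided by the number of days. The function name and the toy records are hypothetical.</p>

```python
import math


def normalized_log_prob(rain_a, rain_b):
    """Per-day log probability of the observed rainy-day overlap under independence.

    rain_a, rain_b: equal-length lists of booleans, True when it rained that day.
    Assumption: overlap is scored with a hypergeometric upper tail.
    """
    n = len(rain_a)
    k_a = sum(rain_a)
    k_b = sum(rain_b)
    both = sum(1 for a, b in zip(rain_a, rain_b) if a and b)
    # Count the placements of station B's rainy days that share at least
    # `both` days with station A's rainy days.
    tail = sum(
        math.comb(k_a, j) * math.comb(n - k_a, k_b - j)
        for j in range(both, min(k_a, k_b) + 1)
    )
    total = math.comb(n, k_b)
    return (math.log(tail) - math.log(total)) / n


# Toy records of eight days each.
station_1 = [True, False, True, False, False, False, False, False]
station_2 = [True, True, False, False, False, False, False, False]

score_same = normalized_log_prob(station_1, station_1)
score_partial = normalized_log_prob(station_1, station_2)
print(score_same, score_partial)
```

<p>Under this reading, a more negative value means that the overlap in rainy days is less likely to be a coincidence.</p>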

<p>We can see from the matrix representation of the pairwise probabilities that there is some correlation between the stations numbered one through eighteen.</p>

![mat_prcp.png](img/mat_prcp.png)

<p>We perform SVD and reorder the rows in order to dig into this further. The top 5-6 eigen-vectors explain most of the variance in the pairwise probabilities, as seen from the graph below.</p>

![var_exp_prcp_pair.png](img/var_exp_prcp_pair.png)

<p>We obtain the 18 correlated stations from our analysis and map them to analyze their spatial relationship. We see from the map that the correlated stations lie in the western part of the region or in its northeastern part (which is at a higher elevation).</p>

![corr_prcp_map.png](img/corr_prcp_map.png)