# Weather Analysis on Florida's Western Seaboard

This report presents a historical analysis of weather patterns in an area that roughly overlaps the western coastline of Florida.

The data we use comes from [NOAA](https://www.ncdc.noaa.gov/). Specifically, it was downloaded from this [FTP site](ftp://ftp.ncdc.noaa.gov/pub/data/ghcn/daily/).

We focused on six measurements:
* **TMIN, TMAX:** The daily minimum and maximum temperature.
* **TOBS:** The average temperature for each day.
* **PRCP:** Daily precipitation (in mm).
* **SNOW:** Daily snowfall (in mm).
* **SNWD:** The depth of accumulated snow.

## Area Under Consideration

Our data includes measurements from the year 1891 through 2012. The data was collected from 213 weather stations shown in the following diagram. 

<p><img alt="Map-Stations.png" src="report_figures/Map-Stations.png" width = 300 /></p>

## Sanity Check - Comparison with outside sources

<p>We start by comparing some general statistics from our data with graphs obtained from <a href="http://www.usclimatedata.com/climate/tampa/florida/united-states/usfl0481" target="_blank">US Climate Data</a>. The graph below shows the daily minimum and maximum temperatures for each month, as well as the total precipitation for each month.</p>

<p>&nbsp;</p>

<p><img alt="Climate_Boston_-_Massachusetts_and_Weather_averages_Boston.jpg" src="report_figures/sitetemp.png" width=500 /></p>

<p>&nbsp;</p>


<p>We see that the minimum and maximum daily temperatures agree with the ones we got from our data, once we convert Fahrenheit to Centigrade.</p>
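
The unit conversion behind this comparison is the standard formula C = (F - 32) × 5/9. A minimal sketch follows; the function name and the sample values are assumptions made for illustration and are not readings from the website or from our data.

```python
def fahrenheit_to_celsius(temp_f: float) -> float:
    """Convert a temperature from degrees Fahrenheit to degrees Centigrade."""
    return (temp_f - 32.0) * 5.0 / 9.0


# Hypothetical monthly highs in Fahrenheit, for illustration only.
example_highs_f = [70.0, 80.0, 90.0]
example_highs_c = [round(fahrenheit_to_celsius(t), 1) for t in example_highs_f]
print(example_highs_c)
```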

<p>&nbsp;</p>

<p><img alt="TMIN,TMAX.png" src="report_figures/tempminmax.png" style="height:300px; width:800px" /></p>

<p>To compare precipitation, we look at the rainfall trend in our data and on the website. In both cases, rainfall peaks in the months of July to September.</p>

<p>&nbsp;<img alt="PRCP.png" src="report_figures/PRCPmeanstd.png" style="height:450px; width:600px" /></p>

<p>As expected, there is almost no snow in Florida. The figures below show the snow depth and snowfall through the year. The mean snowfall and depth are almost zero, and the nonzero parts may be attributed to noise in the data or to an extremely anomalous year. <b> We will therefore disregard snow in our analysis. </b></p>

<p>&nbsp;<img alt="PRCP.png" src="report_figures/snow.png" style="height:300px; width:800px" /></p>

## PCA analysis

For each of the six measurements, we compute the percentage of the variance explained as a function of the number of eigen-vectors used.
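
As a rough illustration of this computation, the sketch below derives the cumulative fraction of variance explained from the eigenvalues of a covariance matrix using NumPy. It is not necessarily the code used for this report: the function name and the synthetic data are assumptions, and the sketch assumes a complete matrix with no missing days, which the real station records are unlikely to satisfy.

```python
import numpy as np


def cumulative_variance_explained(data: np.ndarray) -> np.ndarray:
    """Cumulative fraction of variance explained by the top-k eigen-vectors.

    data: one row per station-year, one column per day of the year.
    """
    # np.cov subtracts the column means itself; rowvar=False treats columns as variables.
    cov = np.cov(data, rowvar=False)
    # eigvalsh returns the eigenvalues of a symmetric matrix in ascending order.
    eigvals = np.linalg.eigvalsh(cov)[::-1]
    # Guard against tiny negative values caused by round-off.
    eigvals = np.clip(eigvals, 0.0, None)
    return np.cumsum(eigvals) / eigvals.sum()


# Synthetic stand-in for the real measurements (an assumption for illustration).
rng = np.random.default_rng(0)
days = np.arange(365)
seasonal = np.sin(2 * np.pi * days / 365)
data = (
    20.0
    + rng.normal(size=(200, 1)) * 3.0       # per-row offset from the mean
    + rng.normal(size=(200, 1)) * seasonal  # per-row seasonal amplitude
    + rng.normal(size=(200, 365)) * 0.5     # day-to-day noise
)
explained = cumulative_variance_explained(data)
print(explained[:5])  # fraction explained by the top 1, 2, ..., 5 eigen-vectors
```

Plotting `explained` against the number of eigen-vectors gives a curve of the same kind as the figures in this section.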

### Percentage of variance explained.
![VarExplained1.png](report_figures/VarExplained1.png)

We see that the top 5 eigen-vectors explain 35% of the variance for TMIN, 52% for TOBS, and 26% for TMAX.

We conclude that of the three, TOBS is best explained by the top 5 eigenvectors. This is especially true for the first eigen-vector which, by itself, explains 41% of the variance.

![VarExplained2.png](report_figures/VarExplained2.png)

The top 5 eigenvectors explain 8% of the variance for PRCP and 100% for SNOW and SNWD. The value for PRCP is very small, and for SNOW and SNWD there is no point in further analysis because, as noted in the sanity check, there is almost no snow in Florida.

Based on the above explained variances, we will focus our analysis on TOBS. 

## Analysis of Average Temperature

We choose to analyze the eigen-decomposition for Average Daily Temperature because the first 4 eigen-vectors explain 50% of the variance.

First, we graph the mean and the top 4 eigen-vectors.

We observe that the average daily temperature is about 27 degrees Centigrade from May through October, peaking in July. For the rest of the year, the average temperature varies, reaching a minimum of about 15 degrees in December/January.

<p><img src="report_figures/TOBSeig2.png" width = 800></p>

Next we interpret the eigen-functions. It can be seen that eig1 is always positive, while the other eigen-functions take negative and positive values depending on the time of the year. This leads to different interpretations of the significance of these eigen-functions.

They can be interpreted as follows:
* **eig1:** More or less uniform throughout the year, taking only positive values. This signifies how far above the mean the temperature in that year was.
* **eig2:** Higher winter temperatures in Jan-March.
* **eig3:** Higher winter temperatures in Nov-Dec and lower in Jan-Feb.
* **eig4:** Early onset of summer and late arrival of winter, since this peaks in March-April and November-December.


### Examples of reconstructions

#### Coeff1
Coeff1: most positive
![SNWD_grid_Pos_coeff1.png](report_figures/c0pos.png)
Coeff1: most negative
![SNWD_grid_neg_coeff1.png](report_figures/c0neg.png)

Large positive values of coeff1 correspond to higher than average temperatures, while large negative values of coeff1 correspond to lower than average temperatures.

#### Coeff2
Coeff2: most positive
![SNWD_grid_Pos_coeff2.png](report_figures/c1pos.png)
Coeff2: most negative
![SNWD_grid_neg_coeff2.png](report_figures/c1neg.png)

Large positive values of coeff2 correspond to higher than normal temperatures at the beginning of the year, while large negative values correspond to lower than normal temperatures at the beginning of the year.

#### Coeff3
Coeff3: most positive
![SNWD_grid_Pos_coeff3.png](report_figures/c2pos.png)
Coeff3: most negative
![SNWD_grid_neg_coeff3.png](report_figures/c2neg.png)

Large positive values of coeff3 correspond to higher than normal temperatures at the end of the year and lower than normal temperatures at the beginning. The opposite holds for large negative values.
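
The reconstructions in this section follow the usual PCA recipe: project the centered series on the top eigen-vectors to get the coefficients, then add the weighted eigen-vectors back to the mean. The sketch below illustrates that recipe with NumPy. The function name, the flat mean, and the random orthonormal basis are all assumptions standing in for the real TOBS mean and eigen-vectors.

```python
import numpy as np


def project_and_reconstruct(x, mean, eigvecs, k):
    """Project x - mean on the top-k eigen-vectors and rebuild an approximation.

    eigvecs: array of shape (n_days, n_components) with orthonormal columns.
    Returns (coefficients, reconstruction).
    """
    top = eigvecs[:, :k]
    coeffs = top.T @ (x - mean)
    return coeffs, mean + top @ coeffs


rng = np.random.default_rng(1)
n_days = 365
# Hypothetical orthonormal basis (via QR), standing in for the real eigen-vectors.
eigvecs, _ = np.linalg.qr(rng.normal(size=(n_days, 3)))
mean = np.full(n_days, 22.0)  # hypothetical flat mean temperature

# A synthetic year built from the first two basis vectors only.
x = mean + 30.0 * eigvecs[:, 0] - 10.0 * eigvecs[:, 1]

coeffs, approx = project_and_reconstruct(x, mean, eigvecs, k=3)
coeffs1, approx1 = project_and_reconstruct(x, mean, eigvecs, k=1)
print(np.round(coeffs, 6))          # coefficients on the three basis vectors
print(np.linalg.norm(x - approx1))  # residual left after using one vector only
```

Using more eigen-vectors can only shrink the residual, which is why the reconstructions improve as coefficients are added.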



## Spatio-Temporal dependence of the data.

We now compare the importance of location-to-location variation with that of year-by-year variation.

Each is measured by the fraction by which the mean square (MS) is reduced when we subtract from each station/year entry the average-per-station or the average-per-year, respectively. Here are the results:

**coeff_1**  
total MS                   =  373185.752627

MS removing mean-by-station=  163620.320812, percentage explained = 0.57

MS removing mean-by-year   =  328127.494148, percentage explained = 0.121

**coeff_2**  
total MS                   =  38309.0684317 

MS removing mean-by-station=  33601.4251072, percentage explained = 0.123

MS removing mean-by-year   =  17243.8836198, percentage explained = 0.55 

**coeff_3**  
total MS                   =  28415.5317557

MS removing mean-by-station=  22994.3261225, percentage explained = 0.191

MS removing mean-by-year   =  15364.8806355, percentage explained = 0.46


We can see that for the first coefficient, the variation by station explains more of the variance than the variation by year, while for the second and third coefficients, the variation by year explains more of the variance.
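
The fractions above come from comparing the mean square before and after removing a group mean. The sketch below illustrates the calculation in plain Python; it is not necessarily the code used for this report. The helper names `mean_square` and `fraction_explained` and the four toy records are assumptions made for the example. The last lines redo the arithmetic for coeff_2 from the reported MS values.

```python
from collections import defaultdict


def mean_square(values):
    """Mean of the squared values."""
    values = list(values)
    return sum(v * v for v in values) / len(values)


def fraction_explained(records, key_index):
    """Fraction by which the mean square drops after removing group means.

    records: list of (station, year, coefficient) tuples.
    key_index: 0 to remove the mean-by-station, 1 to remove the mean-by-year.
    """
    groups = defaultdict(list)
    for rec in records:
        groups[rec[key_index]].append(rec[2])
    group_mean = {key: sum(vals) / len(vals) for key, vals in groups.items()}
    total_ms = mean_square(rec[2] for rec in records)
    residual_ms = mean_square(rec[2] - group_mean[rec[key_index]] for rec in records)
    return 1.0 - residual_ms / total_ms


# Toy records (hypothetical): the values differ mostly between stations.
records = [
    ("A", 2000, 11.0), ("A", 2001, 9.0),
    ("B", 2000, -9.0), ("B", 2001, -11.0),
]
by_station = fraction_explained(records, key_index=0)
by_year = fraction_explained(records, key_index=1)
print(by_station, by_year)

# Same arithmetic applied to the MS values reported for coeff_2.
coeff2_by_station = 1.0 - 33601.4251072 / 38309.0684317
coeff2_by_year = 1.0 - 17243.8836198 / 38309.0684317
print(round(coeff2_by_station, 3), round(coeff2_by_year, 3))
```

In the toy data almost all of the mean square disappears once the station means are removed, which is the pattern coeff_1 shows to a lesser degree.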

**Explanation**

As seen in the analysis above, eig1 gives aggregate information about whether the temperature in a year was higher or lower than usual.

On the other hand, eig2 and eig3 give temporal information about which times of the year are warmer or cooler than usual.

Therefore, eig1 has a larger spatial dependence, while eig2 and eig3 have a larger temporal dependence.

## Visualizing the spatial dependence of coeff_1

We saw above that eig1 has high spatial dependence. In the plot below, we plot the absolute value of coeff_1 for each weather station at its location.
Red points correspond to negative values of coeff_1 and blue to positive.

It can be seen that coeff_1 tends to be more positive the farther west we go. This implies that regions in the west tend to have temperatures above the overall average, while temperatures tend to be lower in the east.
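
One way to check such an east-west trend numerically is the correlation between station longitude and coeff_1. The sketch below is only an illustration of that idea: the `pearson` helper and the five (longitude, coeff_1) pairs are hypothetical and are not taken from our stations. Longitudes are in degrees east, so a more negative longitude means farther west, and a westward increase in coeff_1 shows up as a negative correlation.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Hypothetical (longitude, average coeff_1) pairs, for illustration only.
stations = [
    (-82.8, 400.0),
    (-82.5, 250.0),
    (-82.0, 50.0),
    (-81.6, -150.0),
    (-81.2, -300.0),
]
longitudes = [lon for lon, _ in stations]
coeff_1 = [c for _, c in stations]

r = pearson(longitudes, coeff_1)
print(round(r, 3))  # negative when coeff_1 grows toward the west
```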

<p><img src="report_figures/colplot.png"></p>