# Florida Weather Analysis

This report analyzes historical weather patterns in an area that approximately overlaps the state of Florida. Our data covers 213 locations in Florida, including cities such as Starke, Venice, and Brandon.

The data we use here comes from [NOAA](https://www.ncdc.noaa.gov/). Specifically, it was downloaded from the NOAA [FTP site](ftp://ftp.ncdc.noaa.gov/pub/data/ghcn/daily/).

We focused on six measurements:
* **TMIN:** The daily minimum temperature.
* **TMAX:** The daily maximum temperature.
* **TOBS:** The average temperature for each day.
* **PRCP:** Daily precipitation (in mm).
* **SNOW:** Daily snowfall (in mm).
* **SNWD:** The depth of accumulated snow.

## 1. Sanity-check: comparison with outside sources

<p>We start by comparing some of the general statistics with graphs that we obtained from a site called <a href="http://www.usclimatedata.com/climate/starke/florida/united-states/usfl0466" target="_blank">US Climate Data</a>. The graph below shows the daily minimum and maximum temperatures for each month, as well as the total precipitation for each month. We use a typical city, Starke, as an example.</p>

<p>&nbsp;</p>

<p><img alt="Climate_Florida_Starke.png" src="hw5_figures/Climate_Florida_Starke.png" style="height:400px" /></p>

<p>&nbsp;</p>

<p>We see that the minimum and maximum daily temperatures agree with the ones we got from our data, once we convert Fahrenheit to Celsius.</p>

<p>&nbsp;</p>

<p><img alt="TMIN,TMAX.png" src="hw5_figures/TMIN,TMAX.png" style="height:300px; width:800px" /></p>

<p>To compare the precipitation we need to convert millimeters per day to inches per month. According to our analysis the average rainfall is 3.50 mm/day (a little more from June to October and a little less in other months), which translates to about 4.13 inches per month. According to US Climate Data the average rainfall is closer to 4.2 inches per month. Overall, the two sources clearly agree that average precipitation is roughly constant throughout the year.</p>

<p>&nbsp;<img alt="PRCP.png" src="hw5_figures/PRCP.png" style="height:350px;" /></p>
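
The unit conversion used in the precipitation comparison can be checked with a few lines of Python. This is a minimal sketch: the function name and the nominal 30-day month are assumptions made for illustration, chosen because they reproduce the 4.13 figure quoted above.

```python
MM_PER_INCH = 25.4
DAYS_PER_MONTH = 30  # assumption: a nominal 30-day month


def mm_per_day_to_inches_per_month(mm_per_day, days=DAYS_PER_MONTH):
    """Convert an average daily rainfall in mm/day to inches/month."""
    return mm_per_day * days / MM_PER_INCH


# Average rainfall reported in this analysis: 3.50 mm/day
print(round(mm_per_day_to_inches_per_month(3.50), 2))  # 4.13
```

If the average month length of 365.25 / 12 days is used instead of 30, the same daily figure comes out at about 4.19 inches per month, which is even closer to the US Climate Data value.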


## 2. PCA analysis

For each of the six measurements, we compute the percentage of the variance explained as a function of the number of eigen-vectors used.
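
The quantity plotted in the figures of this section can be sketched as follows. This is an illustrative NumPy version, not the code used to produce the figures: the function name is hypothetical, the input is synthetic, and it assumes a matrix with one row per station-year and no missing values.

```python
import numpy as np


def percent_variance_explained(X):
    """Cumulative % of variance explained by the top-k eigenvectors.

    X is a (n_samples, n_days) array; a hypothetical stand-in for the
    station-year rows, assumed here to contain no missing values.
    """
    cov = np.cov(X, rowvar=False)            # columns are the variables (days)
    eigvals = np.linalg.eigvalsh(cov)[::-1]  # eigenvalues, largest first
    eigvals = np.clip(eigvals, 0.0, None)    # guard against tiny negative values
    return 100.0 * np.cumsum(eigvals) / eigvals.sum()


# Synthetic data: one dominant seasonal shape plus a little noise
rng = np.random.default_rng(0)
t = np.linspace(0.0, 2.0 * np.pi, 20)
X = np.outer(rng.normal(size=200), np.sin(t)) + 0.1 * rng.normal(size=(200, 20))

curve = percent_variance_explained(X)
print(curve[:5])  # percent explained by the top 1..5 eigenvectors
```

Entry `k - 1` of the returned array is the percentage explained by the top `k` eigenvectors, so reading off index 4 corresponds to the "top 5" figures quoted in this section.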

### Percentage of variance explained.
![VarExplained1.png](hw5_figures/VarExplained1.png)
<p>&nbsp;</p>
We see that the top 5 eigen-vectors explain 35% of the variance for TMIN, 52% for TOBS and 26% for TMAX.

We conclude that of the three, TOBS is best explained by the top 5 eigenvectors. This is especially true for the first eigen-vector which, by itself, explains 42% of the variance. We will do a further PCA analysis for TOBS below.

![VarExplained2.png](hw5_figures/VarExplained2.png)

The top 5 eigenvectors explain 8% of the variance for PRCP, which is very low. On the other hand, the top 5 eigenvectors explain 100% of the variance for SNOW and SNWD. This means that these top 5 eigenvectors capture all of the variation in the snow signals.

We also notice that two eigenvectors are almost enough to capture 100% of the variance for SNWD, while SNOW needs 5 eigenvectors, which means that SNWD is a little less noisy than SNOW. This is because SNWD is a decaying integral of SNOW and, as such, varies less between days and between the same date in different years.
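
To make the "decaying integral" idea concrete, here is a toy model in which each day's depth is a fraction of the previous day's depth plus the new snowfall. The recurrence, the function name, and the decay factor of 0.8 are assumptions chosen for illustration; they are not fitted to the data.

```python
import numpy as np


def snow_depth(snowfall, decay=0.8):
    """Toy model: depth[t] = decay * depth[t-1] + snowfall[t].

    The decay factor is a hypothetical value, not estimated from the data.
    """
    depth = np.zeros(len(snowfall), dtype=float)
    level = 0.0
    for t, s in enumerate(snowfall):
        level = decay * level + s
        depth[t] = level
    return depth


# A single snowfall event on day 1, followed by dry days
snowfall = np.array([0.0, 10.0, 0.0, 0.0, 0.0, 0.0])
depth = snow_depth(snowfall)
print(depth)  # depth jumps on day 1 and then decays geometrically
```

In this sketch a one-day spike in SNOW becomes a slowly fading curve in SNWD, which is one way to see why the depth signal is smoother from day to day.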

Based on that we will dig deeper into the PCA analysis for snow-depth below.

### Analysis of snow depth (SNWD)

We choose to analyze the eigen-decomposition for snow-depth because the first 2 eigen-vectors explain nearly 100% of the variance.

First, we graph the mean and the top 2 eigen-vectors.

We observe that snowfall is very rare in Florida. There may be some snowfall during the winter months, namely December and January, and a few snowfalls in early April. For most of the year, there is no snow.

![SNWD_mean_eigs.png](hw5_figures/SNWD_mean_eigs.png)

Next we interpret the eigen-functions. 

The first eigen-function **eig1** has a shape very similar to the mean function. The main difference is that the eigen-function is close to zero during April while the mean is not. The interpretation of this shape is that eig1 represents the overall amount of snow above/below the mean, without changing the distribution over time. As for **eig2**, it is zero for most of the year and takes negative values only during April.

### Analysis of average temperature (TOBS)

Since Florida is a very warm state, we are more interested in its temperature than in its snow. Therefore, we perform a PCA analysis of the average temperature below.


First, we graph the mean and the top 3 eigen-vectors.

We observe that Florida is warm almost all year round, reaching its highest temperatures from July to September and its lowest from December to January.

![TOBS_mean_eigs.png](hw5_figures/TOBS_mean_eigs.png)

Next we interpret the eigen-functions. 

As in our snow-depth analysis above, the first eigen-function **eig1** has a shape similar to the mean function. The main difference is that the eigen-function stays close to zero all year round, while the mean rises in the summer and falls in the winter. The interpretation of this shape is that eig1 represents the overall average temperature above/below the mean, without changing the distribution over time. **eig2**, in contrast, oscillates more than eig1 throughout the year.

### Examples of reconstruction of TOBS

As examples, we reconstruct 9 selected TOBS series below. These data were collected at different locations in different years.

![TOBS_Example_Year](hw5_figures/TOBS_Example_Year.png)

The residuals corresponding to each TOBS series are shown below. For example, the first one (location USC00080535 in year 1985) has the residuals: mean=0.08, r1=0.47, r2=0.38, r3=0.38.

![TOBS_Example_Resi](hw5_figures/TOBS_Example_Resi.png)
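
One way residuals of this kind can be computed is sketched below. The normalization is an assumption, not taken from the analysis code: the mean residual is expressed as a fraction of the squared norm of the raw series, and each `r_k` as a fraction of the residual that remains after subtracting the mean. That choice is at least consistent with the example above, where r1 is larger than the mean residual. The function name and the synthetic data are hypothetical, and the eigenvectors are assumed to be orthonormal with no missing values in the series.

```python
import numpy as np


def residual_fractions(x, mean, eigvecs):
    """Residuals after removing the mean and then each eigenvector in turn.

    x, mean: arrays of shape (n_days,)
    eigvecs: array of shape (n_days, k) with orthonormal columns
    Returns (res_mean, [r_1, ..., r_k]) under the assumed normalization:
    res_mean relative to ||x||^2, r_j relative to ||x - mean||^2.
    """
    centered = x - mean
    res_mean = np.sum(centered ** 2) / np.sum(x ** 2)
    residual = centered.copy()
    fractions = []
    for j in range(eigvecs.shape[1]):
        v = eigvecs[:, j]
        residual = residual - np.dot(residual, v) * v  # remove projection on v
        fractions.append(np.sum(residual ** 2) / np.sum(centered ** 2))
    return res_mean, fractions


# Synthetic example: a seasonal mean plus two eigen-directions and noise
rng = np.random.default_rng(1)
n_days = 365
Q, _ = np.linalg.qr(rng.normal(size=(n_days, 3)))  # orthonormal columns
mean = 200.0 + 50.0 * np.sin(np.linspace(0.0, 2.0 * np.pi, n_days))
x = mean + 30.0 * Q[:, 0] + 10.0 * Q[:, 1] + rng.normal(size=n_days)

res_mean, (r1, r2, r3) = residual_fractions(x, mean, Q)
print(res_mean, r1, r2, r3)
```

Under this normalization the `r_k` values can only stay the same or decrease as more eigenvectors are used, which matches the pattern r1 ≥ r2 ≥ r3 in the example above.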



## 3. Variation of the average temperature is mostly due to location-to-location variation
We now estimate the relative importance of location-to-location variation relative to year-by-year variation.

These are measured by the fraction by which the variance is reduced when we subtract from each station/year entry the average-per-station or the average-per-year, respectively. Here are the results:

| Coefficient | Total MS | MS removing mean-by-station | Fraction explained by station (%) | MS removing mean-by-year | Fraction explained by year (%) |
|---|---|---|---|---|---|
| coeff_1 | 373185.752627 | 163620.320812 | 56.16 | 328127.494148 | 12.07 |
| coeff_2 | 38309.0684317 | 33601.4251072 | 12.29 | 17243.8836198 | 54.99 |
| coeff_3 | 28415.5317557 | 22994.3261225 | 19.08 | 15364.8806355 | 45.93 |

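For coeff_1, variation by location explains more of the variance than variation by year (56.16% versus 12.07%). For coeff_2 and coeff_3 the pattern is reversed: variation by year explains more (54.99% and 45.93%) than variation by location (12.29% and 19.08%).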
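
The "fraction explained" figures above follow from the mean-square (MS) values by a simple ratio, and the whole procedure can be sketched in a few lines. This is an illustrative version under stated assumptions: the function names are hypothetical, the table is synthetic, and the layout (one row per station, one column per year, NaN for missing entries) is assumed rather than taken from the analysis code.

```python
import numpy as np


def percent_reduction(total_ms, remaining_ms):
    """Percent of the mean square removed by subtracting a set of means."""
    return 100.0 * (1.0 - remaining_ms / total_ms)


def explained_by_station_and_year(table):
    """table: (n_stations, n_years) values of one coefficient, NaN if missing."""
    total_ms = np.nanmean(table ** 2)
    minus_station = table - np.nanmean(table, axis=1, keepdims=True)
    minus_year = table - np.nanmean(table, axis=0, keepdims=True)
    by_station = percent_reduction(total_ms, np.nanmean(minus_station ** 2))
    by_year = percent_reduction(total_ms, np.nanmean(minus_year ** 2))
    return by_station, by_year


# Check against the reported coeff_1 values
print(round(percent_reduction(373185.752627, 163620.320812), 2))  # 56.16

# Synthetic table dominated by station-to-station differences
rng = np.random.default_rng(2)
station_effect = np.array([-10.0, 0.0, 10.0, 20.0])
table = station_effect[:, None] + rng.normal(size=(4, 6))
table[1, 3] = np.nan  # one missing station/year entry
by_station, by_year = explained_by_station_and_year(table)
print(by_station, by_year)
```

In the synthetic table the station means carry almost all of the signal, so removing them reduces the mean square far more than removing the yearly means, the same pattern reported for coeff_1.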