# Utah Weather Analysis

This report presents a historical analysis of weather patterns in a region that roughly covers northern Utah, including Logan, Ogden, and Salt Lake City.

The data we use here comes from [NOAA](https://www.ncdc.noaa.gov/). Specifically, it was downloaded from this [FTP site](ftp://ftp.ncdc.noaa.gov/pub/data/ghcn/daily/).

We focused on six measurements:
* **TMIN, TMAX:** the daily minimum and maximum temperature.
* **TOBS:** The temperature at the time of observation on each day.
* **PRCP:** Daily precipitation (in mm)
* **SNOW:** Daily snowfall (in mm)
* **SNWD:** The depth of accumulated snow.



## Sanity-check: comparison with outside sources

<p>We start by comparing some of the general statistics with graphs that we obtained from <a href="http://www.usclimatedata.com/climate/logan/utah/united-states/usut0147" target="_blank">US Climate Data</a>. The graph below shows the daily minimum and maximum temperatures for each month, as well as the total precipitation for each month.</p>

<p>&nbsp;</p>

<p><img alt="Climate_Logan_Outside.jpg" src="r_figures/Climate_Logan_Outside.jpg" /></p>

<p>&nbsp;</p>

##### Min and Max Daily Temperature

<p>We see that the min and max daily temperatures from US Climate Data agree with the ones we got from our data, once we convert Fahrenheit to Celsius. According to US Climate Data, the min temperature ranges from 10 to 50 °F, which is about -12 to 10 °C, and the max temperature ranges from 30 to 90 °F, which is about -1 to 32 °C. According to our analysis, the min temperature ranges from -10 to 12 °C and the max temperature from 0 to 30 °C. The seasonal trend is also very similar.</p>
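
The Fahrenheit-to-Celsius conversion used in this comparison can be reproduced with a few lines of Python. This is only a sketch: the function name is ours, and the endpoints are the approximate values quoted above, not values read from the data files.

```python
def fahrenheit_to_celsius(deg_f):
    """Convert a temperature from degrees Fahrenheit to degrees Celsius."""
    return (deg_f - 32.0) * 5.0 / 9.0


# Approximate ranges quoted in the text for the US Climate Data chart.
quoted_ranges_f = {"min": (10, 50), "max": (30, 90)}

for label, (low_f, high_f) in quoted_ranges_f.items():
    low_c = fahrenheit_to_celsius(low_f)
    high_c = fahrenheit_to_celsius(high_f)
    print(f"{label}: {low_f} to {high_f} F is {low_c:.1f} to {high_c:.1f} C")
```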

<p>&nbsp;</p>

<p><img alt="4.4TMIN,TMAX.png" src="r_figures/4.4TMIN,TMAX.png" style="height:300px; width:800px" /></p>


##### Daily Precipitation
<p>To compare the precipitation we need to convert millimeters per day to inches per month. According to our analysis the average precipitation is 2 mm/day, which translates to about 2.36 inches per month (2 × 30 / 25.4). According to US Climate Data the average is closer to 2 inches per month. We also see that there is less precipitation in June, July and August, which agrees with the outside data.</p>
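
The unit conversion above is simple enough to check directly. The sketch below assumes a nominal 30-day month, which is our assumption; the function and constant names are illustrative.

```python
MM_PER_INCH = 25.4
DAYS_PER_MONTH = 30  # assumption: a nominal 30-day month


def mm_per_day_to_inches_per_month(mm_per_day, days=DAYS_PER_MONTH):
    """Convert an average daily precipitation in mm to inches per month."""
    return mm_per_day * days / MM_PER_INCH


print(round(mm_per_day_to_inches_per_month(2.0), 2))  # 2.36
```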

<p>&nbsp;<img alt="4.4PRCP.png" src="r_figures/4.4PRCP.png" style="height:450px; width:600px" /></p>



## PCA analysis

For each of the six measurements, we compute the percentage of the variance explained as a function of the number of eigenvectors used.
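
As a rough illustration of this computation: given the eigenvalues of the covariance matrix, the percentage of variance explained by the top k eigenvectors is the sum of the k largest eigenvalues divided by the sum of all of them. The sketch below uses made-up eigenvalues, not values from our data, and the function name is ours.

```python
from itertools import accumulate


def percent_variance_explained(eigenvalues):
    """Cumulative percentage of variance explained by the top-k eigenvectors.

    Entry k-1 of the result is the percentage explained by the k
    eigenvectors with the largest eigenvalues.
    """
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    return [100.0 * partial / total for partial in accumulate(ordered)]


# Made-up eigenvalues, for illustration only.
example_eigenvalues = [1.0, 4.0, 2.0, 3.0]
print(percent_variance_explained(example_eigenvalues))
```

The last entry is always 100, and a curve that rises quickly (as for SNWD below) means a few eigenvectors capture most of the signal.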

### Percentage of variance explained.
![VarExplained1.png](r_figures/4.4VarExplained1.png)
We see that the top 5 eigenvectors explain 37% of the variance for TMIN, 57% for TOBS and 36% for TMAX.

We conclude that of the three, TOBS is best explained by the top 5 eigenvectors. This is especially true for the first eigenvector which, by itself, explains 47% of the variance.

![VarExplained2.png](r_figures/4.4VarExplained2.png)

The top 5 eigenvectors explain 13% of the variance for PRCP and 11% for SNOW. Both are low values. On the other hand, the top 5 eigenvectors explain 87% of the variance for SNWD. This means that these top 5 eigenvectors capture most of the variation in the snow-depth signal. Based on that, we will dig deeper into the PCA analysis for snow-depth.

It makes sense that SNWD would be less noisy than SNOW. That is because SNWD is a decaying integral of SNOW and, as such, varies less between days and between the same date in different years.
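
To make the "decaying integral" intuition concrete, here is a toy sketch in which depth keeps a fixed fraction of the previous day's depth and adds the new snowfall. The retention factor and the snowfall series are hypothetical and chosen only for illustration; this is not a model fitted to our data, and it only illustrates the day-to-day part of the claim.

```python
def accumulate_depth(daily_snow, retention=0.9):
    """Leaky integrator: keep a fraction of yesterday's depth, add today's snow."""
    depth = 0.0
    series = []
    for snow in daily_snow:
        depth = retention * depth + snow
        series.append(depth)
    return series


def relative_daily_change(series):
    """Mean absolute day-to-day change, as a fraction of the series maximum."""
    steps = [abs(b - a) for a, b in zip(series, series[1:])]
    return sum(steps) / len(steps) / max(series)


# Hypothetical snowfall: isolated storms separated by dry days.
snow = [0, 40, 0, 0, 25, 0, 0, 0, 30, 0, 0, 0]
depth = accumulate_depth(snow)

print(relative_daily_change(snow))   # spiky signal: large relative jumps
print(relative_daily_change(depth))  # integrated signal: smaller relative jumps
```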

## Analysis of snow depth

We choose to analyze the eigen-decomposition for snow-depth because the first 3 eigen-vectors explain 80% of the variance.

First, we graph the mean and the top 3 eigen-vectors.

We observe that the snow season is from November to the end of April, where the middle of February marks the peak of the snow-depth.
![SNWD_mean_eigs.png](r_figures/4.5SNWD_mean_eigs.png)

Next we interpret the eigen-functions. The first eigen-function (eig1) has a shape very similar to that of the mean function, up to multiplication by a negative coefficient. We interpret this shape as follows: eig1 represents the overall amount of snow above or below the mean, without changing the distribution over time.

**eig2 and eig3** are similar in the following way: both oscillate between positive and negative values. In other words, they correspond to changing the distribution of the snow depth over the winter months, but they don't change the total much.

They can be interpreted as follows:
* **eig2:** less snow in Nov - Feb, more snow in mid-Feb - April.
* **eig3:** slightly more snow in Nov - Jan, less snow in Feb, more snow in March.


### Examples of reconstructions
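
The reconstructions in this section approximate each station-year as the mean plus a weighted sum of the top eigenvectors, where each weight (coeff1, coeff2, coeff3) is the projection of the mean-subtracted signal on the corresponding eigenvector. The sketch below shows the idea on a toy 4-day "season" with two hand-picked orthonormal vectors; all names and numbers are illustrative and are not taken from our data.

```python
def dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def project(signal, mean, eigvecs):
    """Coefficients of (signal - mean) along each orthonormal eigenvector."""
    centered = [s - m for s, m in zip(signal, mean)]
    return [dot(centered, e) for e in eigvecs]


def reconstruct(mean, eigvecs, coeffs):
    """Approximate the signal as mean + sum_k coeffs[k] * eigvecs[k]."""
    out = list(mean)
    for c, e in zip(coeffs, eigvecs):
        out = [o + c * x for o, x in zip(out, e)]
    return out


# Toy data: a 4-day "season" and two orthonormal vectors chosen by hand.
mean = [1.0, 3.0, 3.0, 1.0]
eigvecs = [
    [0.5, 0.5, 0.5, 0.5],    # raises or lowers the whole season
    [0.5, 0.5, -0.5, -0.5],  # shifts snow between early and late season
]
signal = [2.0, 5.0, 3.0, 2.0]

coeffs = project(signal, mean, eigvecs)
approx = reconstruct(mean, eigvecs, coeffs)
print(coeffs)  # [2.0, 1.0]
print(approx)  # [2.5, 4.5, 3.5, 1.5]
```

With only two vectors the reconstruction is an approximation of the signal; adding more eigenvectors reduces the residual.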

#### Coeff1
Coeff1: most positive 12
![4.5c1_pos12.png](r_figures/4.5c1_pos12.png)
Coeff1: most negative 12
![4.5c1_neg12.png](r_figures/4.5c1_neg12.png)
Large positive values of coeff1 correspond to more than average snow. Low values correspond to less than average snow.

#### Coeff2
Coeff2: most positive 12
![4.5c2_pos12.png](r_figures/4.5c2_pos12.png)
Coeff2: most negative 12
![4.5c2_neg12.png](r_figures/4.5c2_neg12.png)

Large positive values of coeff2 correspond to a late snow season (most of the snow comes after mid-Feb). Negative values of coeff2 correspond to an early snow season (most of the snow comes before Feb).

#### Coeff3
Coeff3: most positive 12
![4.5c3_pos12.png](r_figures/4.5c3_pos12.png)
Coeff3: most negative 12
![4.5c3_neg12.png](r_figures/4.5c3_neg12.png)

Large positive values of coeff3 correspond to a snow season with two spikes: one at the end of Jan, the other in March. Negative values of coeff3 correspond to a season with a single peak at the end of Feb.



## The variation in the timing of snow is mostly due to year-to-year variation
In the previous section we saw how Coeff1, which corresponds to the total amount of snow, varies with location. We now estimate the importance of location-to-location variation relative to year-to-year variation.

These are measured using the fraction by which the variance is reduced when we subtract from each station/year entry the average-per-station or the average-per-year, respectively. Here are the results:

| Coefficient | Total MS | MS removing mean-by-station | Fraction explained by station (%) | MS removing mean-by-year | Fraction explained by year (%) |
|---|---|---|---|---|---|
| coeff_1 | 2379089.0691 | 1361932.2784 | 42.7540441387 | 1396429.89226 | 41.3040095724 |
| coeff_2 | 362293.993716 | 330969.764203 | 8.64608027064 | 119749.511364 | 66.9468681676 |
| coeff_3 | 357987.817788 | 297067.574754 | 17.0174067404 | 155071.1418 | 56.6825645749 |


For coeff_1, which has to do with the total snowfall, the station and the year explain about the same share of the variance (43% and 41%). For coeff_2 and coeff_3, which as we saw above have to do with the timing of snowfall, the year explains far more than the station: the station explains 9-17% of the variance, while the year explains 57-67%.
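
The fraction-explained values reported above can be computed along the following lines. This is a minimal sketch under our own assumptions: "MS" is taken to be the mean of the squared coefficient values, each record is a hypothetical `(station, year, coeff)` tuple, and the toy records are invented so that the coefficient depends only on the year.

```python
from collections import defaultdict


def mean_square(values):
    """Mean of the squared values."""
    return sum(v * v for v in values) / len(values)


def percent_explained(records, key_index):
    """Percent reduction in mean square after subtracting group means.

    records: list of (station, year, coeff) tuples (hypothetical layout).
    key_index: 0 to group by station, 1 to group by year.
    """
    groups = defaultdict(list)
    for rec in records:
        groups[rec[key_index]].append(rec[2])
    group_mean = {key: sum(vals) / len(vals) for key, vals in groups.items()}

    total_ms = mean_square([rec[2] for rec in records])
    residual_ms = mean_square(
        [rec[2] - group_mean[rec[key_index]] for rec in records]
    )
    return 100.0 * (1.0 - residual_ms / total_ms)


# Toy records in which the coefficient depends only on the year.
records = [
    ("A", 2000, 1.0), ("B", 2000, 1.0),
    ("A", 2001, -1.0), ("B", 2001, -1.0),
]
print(percent_explained(records, 0))  # by station: 0.0
print(percent_explained(records, 1))  # by year: 100.0

# Arithmetic check against the reported coeff_1 station values.
print(round(100.0 * (1.0 - 1361932.2784 / 2379089.0691), 2))  # 42.75
```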