# Colorado Weather Analysis

This is a report on the historical analysis of weather patterns in an area that approximately overlaps the area of the state of Colorado.

<p><img alt="Colorado_map.png" src="hw5_figures/Colorado_map.png" width="50%" height="50%" /></p>

The data we use here comes from [NOAA](https://www.ncdc.noaa.gov/). Specifically, it was downloaded from this [FTP site](ftp://ftp.ncdc.noaa.gov/pub/data/ghcn/daily/).

We focused on six measurements:
* **TMIN, TMAX:** the daily minimum and maximum temperature.
* **TOBS:** The temperature at the time of observation.
* **PRCP:** Daily precipitation (in mm)
* **SNOW:** Daily snowfall (in mm)
* **SNWD:** The depth of accumulated snow.

## Sanity-check: comparison with outside sources

<p>We start by comparing some of the general statistics with graphs that we obtained from a site called <a>US Climate Data</a>. The graph below shows the daily minimum and maximum temperatures for each month, as well as the total precipitation for each month.</p>



<p>&nbsp;</p>

<p><img alt="Denver_Climate.png" src="hw5_figures/Denver_Climate_centigrade.png" width="50%" height="50%" /></p>

<p>&nbsp;</p>

<p>We see that the min and max daily&nbsp;temperature agree with the ones we got from our data, once we translate Fahrenheit to Centigrade.</p>

<p>&nbsp;</p>

<p><img alt="TMIN_TMAX.png" src="hw5_figures/TMIN_TMAX.png" style="height:300px; width:800px" /></p>

<p>According to our analysis the average rainfall is 1.5 mm/day, which translates to about 45 mm&nbsp;per month. According to US Climate Data the average rainfall is closer to 50 mm per month, so the two estimates agree to within about 10%. The data also show that precipitation varies a lot throughout the year.</p>

<p>&nbsp;<img alt="PRCP_Mean.png" src="hw5_figures/PRCP_Mean.png" style="height:350px; width:400px" /></p>
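
The unit conversions behind this sanity check can be sketched in a few lines. This is only an illustration: the 30-day month is an assumption made for the arithmetic, and `fahrenheit_to_celsius` is a hypothetical helper name, not part of the original analysis code.

```python
# Convert the average daily precipitation into an approximate monthly total
# and compare it with the external figure quoted in the text.
daily_mm = 1.5        # mean daily precipitation from our data (mm/day)
days_per_month = 30   # assumption: a nominal 30-day month
reference_mm = 50.0   # approximate monthly figure from US Climate Data

monthly_mm = daily_mm * days_per_month
relative_gap = abs(reference_mm - monthly_mm) / reference_mm


def fahrenheit_to_celsius(temp_f):
    """Convert a temperature from degrees Fahrenheit to degrees Celsius."""
    return (temp_f - 32.0) * 5.0 / 9.0


print(f"monthly total: {monthly_mm:.1f} mm, relative gap: {relative_gap:.0%}")
```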


## PCA analysis

For each of the six measurements, we compute the percentage of the variance explained as a function of the number of eigenvectors used.
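
As a rough sketch of how such a curve can be computed, the snippet below builds the covariance matrix of a data matrix, takes its eigenvalues, and accumulates them. It assumes a matrix with one row per station-year and no missing values, which is a simplification; the function name `explained_variance_curve` and the synthetic data are illustrative and not taken from the original analysis.

```python
import numpy as np


def explained_variance_curve(X):
    """Cumulative fraction of variance explained by the top-k eigenvectors.

    X has shape (n_samples, n_days); each row is one station-year vector.
    Assumes X contains no missing values.
    """
    centered = X - X.mean(axis=0)
    cov = centered.T @ centered / X.shape[0]
    # eigvalsh returns eigenvalues in ascending order; reverse to descending.
    eigvals = np.linalg.eigvalsh(cov)[::-1]
    eigvals = np.clip(eigvals, 0.0, None)  # guard against tiny negative values
    return np.cumsum(eigvals) / eigvals.sum()


# Synthetic stand-in data: one shared seasonal pattern plus noise.
rng = np.random.default_rng(0)
days = np.arange(365)
season = np.sin(2 * np.pi * days / 365)
X = rng.normal(size=(200, 1)) * season + 0.3 * rng.normal(size=(200, 365))

curve = explained_variance_curve(X)
print(f"top 1: {curve[0]:.2f}, top 5: {curve[4]:.2f}")
```

Here `curve[k - 1]` is the fraction of variance explained by the top `k` eigenvectors, which is the quantity plotted in the figures of this section.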

### Percentage of variance explained.
![PCA_temp.png](hw5_figures/PCA_Temp.png)
We see that the top 5 eigenvectors explain 52% of the variance for TMIN, 66% for TOBS and 55% for TMAX. In each case the first eigenvector alone accounts for a large share of that variance, as shown in the graphs.

We conclude that, of the three, TOBS is best explained by the top 5 eigenvectors. This is especially true for the first eigenvector which, by itself, explains 61% of the variance.

![PCA_Snow.png](hw5_figures/PCA_Snow.png)

The top 5 eigenvectors explain 12% of the variance for PRCP and 15% for SNOW. Both are low values. On the other hand, the top 5 eigenvectors explain 90% of the variance for SNWD.
The first eigenvector alone explains 80% of the variance in snow depth. Based on that, we will dig deeper into the PCA analysis for snow depth.

It makes sense that SNWD would be less noisy than SNOW. That is because SNWD is a decaying integral of SNOW and, as such, varies less between days and between the same date in different years.
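
A toy simulation can illustrate the "decaying integral" idea. The model below is an assumption made purely for illustration (snow depth as yesterday's depth times a hypothetical decay factor, plus today's snowfall); neither the decay factor nor the synthetic snowfall is fitted to the NOAA data.

```python
import numpy as np


def lag1_autocorr(x):
    """Correlation between a series and itself shifted by one day."""
    x = x - x.mean()
    return float(np.dot(x[:-1], x[1:]) / np.dot(x, x))


rng = np.random.default_rng(1)
n_days = 2000

# Hypothetical daily snowfall (mm): mostly zero, with occasional storms.
storm = rng.random(n_days) < 0.15
snowfall = rng.exponential(scale=20.0, size=n_days) * storm

decay = 0.95  # hypothetical fraction of the snow pack that survives each day
depth = np.zeros(n_days)
for t in range(1, n_days):
    depth[t] = decay * depth[t - 1] + snowfall[t]

print("snowfall lag-1 autocorrelation:", round(lag1_autocorr(snowfall), 2))
print("depth    lag-1 autocorrelation:", round(lag1_autocorr(depth), 2))
```

In this toy model consecutive days of depth are strongly correlated while consecutive days of snowfall are not, which is the sense in which the accumulated quantity is smoother than its input.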

## Analysis of snow depth

We choose to analyze the eigen-decomposition for snow depth because the first eigenvector explains 80% of the variance and the first five explain 90% of the variance, which makes a low-dimensional analysis credible.

First, we graph the mean and the top 5 eigen-vectors.

We observe that the snow season runs from November to late May, with the snow depth peaking in the middle of February. Another observation is that there is no snow between June and October.
![SNWD_Mean_Eigs_new.png](hw5_figures/SNWD_Mean_Eigs_new.png)

Next we interpret the eigen-functions. The first eigen-function (eig1) has a shape very similar to the mean function. The only difference is that its peak is shifted by about half a month. The interpretation of this shape is that eig1 represents the overall amount of snow above/below the mean, but without changing the distribution over time.

**eig2, eig3, eig4 and eig5** are similar in the following way. They all oscillate between positive and negative values. In other words, they correspond to changing the distribution of the snow depth over the winter months, but they don't change the total (much).

They can be interpreted as follows:
* **eig2:** less snow between April and July, more snow between January and March
* **eig3:** more snow between January and February and between May and July, less snow between March and May
* **eig4:** less snow between October and December
* **eig5:** more snow in January and between April and May, less snow between July and August



### Examples of reconstructions (SNWD with 3 Eigen vectors)

Note that the coefficient names are 0-indexed, and only the first 3 eigenvectors are considered, since these already explain most of the variance.
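
For reference, a reconstruction from the top-k eigenvectors amounts to projecting the mean-subtracted vector onto those eigenvectors and adding the projection back to the mean. The sketch below assumes complete (no missing values) yearly vectors and uses a synthetic orthonormal basis in place of the real SNWD eigenvectors; `reconstruct` is a hypothetical helper name.

```python
import numpy as np


def reconstruct(x, mean, eigvecs, k=3):
    """Approximate x by the mean plus its projection on the top-k eigenvectors.

    eigvecs has shape (n_days, n_components) with orthonormal columns.
    Returns (coefficients, reconstruction); coefficients[0] is coeff0.
    """
    V = eigvecs[:, :k]
    coeffs = V.T @ (x - mean)
    return coeffs, mean + V @ coeffs


rng = np.random.default_rng(2)
n_days = 365

# Synthetic orthonormal basis standing in for the SNWD eigenvectors.
eigvecs, _ = np.linalg.qr(rng.normal(size=(n_days, 5)))
mean = np.zeros(n_days)

# Build a vector that lies exactly in the span of the first 3 eigenvectors.
true_coeffs = np.array([4.0, -2.0, 1.0])
x = mean + eigvecs[:, :3] @ true_coeffs

coeffs, approx = reconstruct(x, mean, eigvecs, k=3)
```

Because `x` was built from the first three basis vectors, `coeffs` recovers `true_coeffs` and `approx` matches `x`; on real data the reconstruction is only approximate.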

#### Coeff0
Coeff0: most positive
![c0_snwd_pos.png](hw5_figures/c0_snwd_pos.png)
Coeff0: most negative
![c0_snwd_neg.png](hw5_figures/c0_snwd_neg.png)
Large positive values of coeff0 correspond to more than average snow. Low values correspond to less than average snow. According to the graph below, 38% of the instances have a negative coeff0 and the rest have a positive one.

<p><img alt="c0_snwd_inst.png" src="hw5_figures/c0_snwd_inst.png" width="30%" height="30%" /></p>

#### Coeff1
Coeff1: most positive
![c1_snwd_pos.png](hw5_figures/c1_snwd_pos.png)
Coeff1: most negative
![c1_snwd_neg.png](hw5_figures/c1_snwd_neg.png)

Large positive values of coeff1 correspond to a shorter snow season, between December and March. Negative values of coeff1 correspond to a longer snow season, between December and June. According to the graph below, 25% of the instances have a negative coeff1 and the rest have a positive one.

<p><img alt="c1_snwd_inst.png" src="hw5_figures/c1_snwd_inst.png" width="30%" height="30%" /></p>

#### Coeff2
Coeff2: most positive
![c2_snwd_pos.png](hw5_figures/c2_snwd_pos.png)
Coeff2: most negative
![c2_snwd_neg.png](hw5_figures/c2_snwd_neg.png)

Large positive values of coeff2 correspond to a snow season with two spikes: one in February and the other in June. Negative values of coeff2 correspond to a season with a single peak at the end of March. According to the graph below, 50% of the instances have a negative coeff2 and the rest have a positive one.

<p><img alt="c2_snwd_inst.png" src="hw5_figures/c2_snwd_inst.png" width="30%" height="30%" /></p>
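
Assuming the percentages quoted for each coefficient are the share of instances whose coefficient is negative, read off a cumulative distribution plot, they could be computed as in the sketch below. The function names and the example values are made up for illustration.

```python
def fraction_negative(coeffs):
    """Share of instances whose coefficient is below zero."""
    coeffs = list(coeffs)
    return sum(1 for c in coeffs if c < 0) / len(coeffs)


def empirical_cdf(coeffs):
    """Sorted values and the cumulative fraction of instances at or below each."""
    xs = sorted(coeffs)
    ps = [(i + 1) / len(xs) for i in range(len(xs))]
    return xs, ps


# Made-up coefficients for eight station-years.
example = [7.0, -3.0, 0.5, 12.0, -1.0, 2.0, 9.0, 4.0]

share = fraction_negative(example)
xs, ps = empirical_cdf(example)
print(f"{share:.0%} of the instances have a negative coefficient")
```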




## Variation in snow depth is explained more by location than by year
In the previous section we see the variation of coeff0, which corresponds to the total amount of snow, with respect to location. We now estimate the importance of location-to-location variation relative to year-by-year variation.

These are measured by the fraction by which the RMS is reduced when we subtract from each station/year entry the average-per-station or the average-per-year, respectively. Here are the results:

| Coefficient | Total RMS | RMS removing mean-by-station | Fraction explained (%) | RMS removing mean-by-year | Fraction explained (%) |
|---|---|---|---|---|---|
| coeff_0 | 4716.23111173 | 1770.67051939 | 62.45 | 3822.05449028 | 18.96 |
| coeff_1 | 1476.9291627 | 925.75164404 | 37.32 | 1248.39893937 | 15.47 |
| coeff_2 | 1091.96967689 | 850.223596609 | 22.14 | 954.761614852 | 12.57 |


We see that, for all three coefficients, the variation by station explains more than the variation by year.
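
The reported fractions are consistent with `100 * (1 - RMS_after / RMS_total)`; for coeff_0, for example, `1 - 1770.67 / 4716.23` is about 0.6245. A minimal sketch of that computation follows, assuming the coefficients are arranged in a station-by-year table with `NaN` for missing entries. The table layout, the helper names and the example values are assumptions for illustration, not the original code.

```python
import numpy as np


def rms(a):
    """Root mean square of all non-missing entries."""
    return float(np.sqrt(np.nanmean(np.square(a))))


def rms_report(table):
    """table: 2-D array, rows = stations, columns = years, NaN = missing."""
    total = rms(table)
    # Subtract each station's mean (row-wise) or each year's mean (column-wise).
    no_station = table - np.nanmean(table, axis=1, keepdims=True)
    no_year = table - np.nanmean(table, axis=0, keepdims=True)
    return {
        "total": total,
        "by_station": 100.0 * (1.0 - rms(no_station) / total),
        "by_year": 100.0 * (1.0 - rms(no_year) / total),
    }


# Made-up coefficients for three stations over three years.
table = np.array([
    [10.0, 12.0, 11.0],
    [-4.0, -6.0, -5.0],
    [1.0, np.nan, 2.0],
])

report = rms_report(table)
print({name: round(value, 2) for name, value in report.items()})
```

In this made-up table the stations differ far more from each other than the years do, so removing the per-station mean reduces the RMS much more than removing the per-year mean.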

## Analysis of TOBS

We choose to analyze the eigen-decomposition for TOBS (temperature at the time of observation) because the first eigenvector explains 60% of the variance and the first three explain 65% of the variance, which makes a low-dimensional analysis credible.

First, we graph the mean and the top 5 eigen-vectors.

We observe that the period with highest TOBS is between June and August. From November to March, TOBS is very low.
![TOBS_Mean_Eigs.png](hw5_figures/TOBS_Mean_Eigs.png)

### Reconstructions (TOBS with 3 Eigen vectors)

The reconstructions do not teach us much about the coefficients and how they vary throughout the year, as is evident from the pictures below, which show coeff_0 in decreasing order. One possible reason is that the target graph is itself not very consistent throughout the year: it fluctuates a lot on a daily basis. For this reason, it would not be wise to expect credible data analysis results here.

**coeff_0**

coeff0: most positive
![c0_tobs_pos.png](hw5_figures/c0_tobs_pos.png)

coeff0: most negative
![c0_tobs_neg.png](hw5_figures/c0_tobs_neg.png)

Large positive values of coeff0 correspond to higher than average TOBS. Low values correspond to lower than average TOBS.

Similar reconstructions can be done for coeff1 and coeff2 but, for the reasons mentioned earlier, we shouldn't draw hard conclusions from them.
