# Colorado Weather Analysis

This report presents a historical analysis of weather patterns in an area within the state of Colorado.

The data we use here comes from [NOAA](https://www.ncdc.noaa.gov/). Specifically, it was downloaded from this [FTP site](ftp://ftp.ncdc.noaa.gov/pub/data/ghcn/daily/).

We focused on six measurements:
* **TMIN, TMAX:** The daily minimum and maximum temperature in degrees Celsius.
* **TOBS:** The temperature at the time of observation for each day, in degrees Celsius.
* **PRCP:** Daily precipitation in mm.
* **SNOW:** Daily snowfall in mm.
* **SNWD:** The depth of accumulated snow in mm.
* **Elevation:** Altitude of the weather station in meters.

## Sanity-check: comparison with outside sources

<p>We start by comparing some of the general statistics with graphs that we obtained from a site called <a href="http://www.usclimatedata.com/climate/boston/massachusetts/united-states/usma0046" target="_blank">US Climate Data</a>. The graphs below show the daily minimum and maximum temperatures for each month, as well as the total precipitation for each month, for four cities in Colorado: Aspen, Grand Junction, Denver, and Vail.</p>

<p>&nbsp;</p>




<p><img alt="Colorado_Weather_Averages.png" src="g_figures/Colorado_Weather_Averages.png" /></p>

<p>We see that the minimum and maximum daily&nbsp;temperatures agree with the ones we got from our data, once we convert Fahrenheit to Celsius.</p>
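
The unit conversion used for this comparison can be sketched as follows. The sample values are hypothetical and serve only to illustrate the formula; they are not read from the US Climate Data graphs.

```python
def fahrenheit_to_celsius(temp_f):
    """Convert a temperature from degrees Fahrenheit to degrees Celsius."""
    return (temp_f - 32.0) * 5.0 / 9.0


# Hypothetical monthly temperatures in Fahrenheit, for illustration only.
example_temps_f = [32.0, 50.0, 86.0]
example_temps_c = [round(fahrenheit_to_celsius(t), 1) for t in example_temps_f]
print(example_temps_c)  # [0.0, 10.0, 30.0]
```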

<p><img alt="TMIN,TMAX.png" src="g_figures/TMIN,TMAX_STD.png" /></p>

<p>&nbsp;<img alt="TOBS,PRCP.png" src="g_figures/TOBS,PRCP_STD.png" /></p>

<p>We can see that the average precipitation is also in agreement between our data and the US Climate Data graphs. Rainfall picks up during the spring and fall seasons while falling off in summer and winter. According to our data, our average annual precipitation is 12.54 mm, which is also in agreement with both graphs.</p>




## PCA analysis

For each of the six measurements, we compute the percentage of the variance explained as a function of the number of eigenvectors used.
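
As a rough illustration of this computation, the sketch below derives the cumulative fraction of variance explained from the eigenvalues of a covariance matrix. It uses synthetic data and assumes a complete matrix with one row per station-year and one column per day of the year; the function name `variance_explained` is hypothetical, and missing values in the real data would need separate handling.

```python
import numpy as np


def variance_explained(data):
    """Cumulative fraction of variance explained by the top-k eigenvectors.

    data: array of shape (n_samples, n_days) with no missing values
    (an assumption made for this sketch).
    """
    centered = data - data.mean(axis=0)
    cov = centered.T @ centered / data.shape[0]
    # eigvalsh returns eigenvalues in ascending order; reverse to descending.
    eigvals = np.linalg.eigvalsh(cov)[::-1]
    # Clip tiny negative values caused by floating-point error.
    eigvals = np.clip(eigvals, 0.0, None)
    return np.cumsum(eigvals) / eigvals.sum()


# Synthetic data: one seasonal shape with a random amplitude, plus noise.
rng = np.random.default_rng(0)
days = np.arange(365)
season = np.sin(2 * np.pi * days / 365)
toy = rng.normal(size=(200, 1)) * season + 0.1 * rng.normal(size=(200, 365))

cum = variance_explained(toy)
print(cum[:5])  # fraction explained by the top 1..5 eigenvectors
```

In this toy example a single eigenvector captures almost all of the variance because the data was built from one seasonal shape; real measurements are far less clean.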

### Percentage of variance explained.
<p>&nbsp;<img alt="VarExplained1.png" src="g_figures/VarExplained1.png" /></p>

We see that the top 5 eigen-vectors explain over 50% of variance for TMIN, 60% for TOBS and 50% for TMAX.

We conclude that of the three, TOBS is best explained by the top 5 eigenvectors. This is especially true for the first eigen-vector which, by itself, explains 60% of the variance.

<p>&nbsp;<img alt="VarExplained2.png" src="g_figures/VarExplained2.png" /></p>

The top 5 eigenvectors explain 14% of the variance for SNOW and 12% for PRCP. Both are low values. On the other hand the top 5 eigenvectors explain almost 90% of the variance for SNWD. This means that these top 5 eigenvectors capture most of the variation in the snow signals. Based on that we will dig deeper into the PCA analysis for snow-depth.

It makes sense that SNWD would be less noisy than SNOW. That is because SNWD is a decaying integral of SNOW and, as such, varies less between days and between the same date in different years.
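
To make the idea of a decaying integral concrete, here is a minimal sketch in which snow depth is modeled as yesterday's depth multiplied by a decay factor, plus today's snowfall. The constant decay factor and the sample snowfall values are assumptions chosen for illustration; real melting depends on temperature and other conditions.

```python
def decaying_integral(daily_snow, decay=0.95):
    """Accumulate daily snowfall while a fixed fraction melts away each day.

    decay is a hypothetical constant retention factor per day.
    """
    depth = []
    current = 0.0
    for snowfall in daily_snow:
        current = decay * current + snowfall
        depth.append(current)
    return depth


# Hypothetical snowfall in mm: two isolated snow days.
snow = [0, 100, 0, 0, 50, 0, 0, 0]
depth = decaying_integral(snow, decay=0.9)
print([round(d, 1) for d in depth])
```

The snowfall series jumps between zero and large values, while the accumulated depth changes gradually after each snow day, which is the smoothing effect described above.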

## Analysis of snow depth

We choose to analyze the eigen-decomposition for snow-depth because the first 4 eigen-vectors explain over 80% of the variance.

First, we graph the mean and the top 4 eigen-vectors.

We observe that the snow season runs from mid-November to the middle or end of June, with the snow depth peaking in the middle of February.
![SNWD_mean_eigs.png](g_figures/Snwd_mean_eigs.png)

Next we interpret the eigen-functions. The first eigen-function (eig1) has a shape very similar to the mean function. The main difference is that the eigen-function peaks in March and its slopes are much less steep than those of the mean. The interpretation of this shape is that eig1 represents the overall amount of snow above/below the mean, but without changing the distribution over time.

**Eig2, Eig3 and Eig4** all represent the changes in Eig1 throughout the year, with Eig2 and Eig3 showing the general trends in the first half of the year and Eig4 showing the latter half.


They can be interpreted as follows:
* **Eig2:** Resembles a derivative of Eig1. In March, Eig2 dips below zero, which is mirrored in Eig1 hitting the peak of its curve and slowly trending downwards in the following months.
* **Eig3:** Acts as a derivative of Eig2, which can be seen especially from late March through July. Eig2 reaches the bottom of its curve as Eig3 crosses from negative to positive in May, but then starts decreasing from June until it zeroes out in July.
* **Eig4:** Does not seem to carry much weight at the beginning of the year but mirrors the inverse of the mean snow depth from October through the end of the year.


### Examples of reconstructions
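
Each reconstruction in this section adds a weighted sum of the top eigenvectors to the mean, where the weights are the coefficients coeff1 through coeff4. The sketch below shows one way to compute the coefficients and the residual for a single year of data. The function name `reconstruct`, the residual definition (fraction of variance left unexplained), and the synthetic inputs are assumptions for illustration; missing days are not handled here.

```python
import numpy as np


def reconstruct(x, mean, eigvecs, k):
    """Project x onto the top-k eigenvectors and rebuild it.

    eigvecs: array of shape (n_days, n_components) with orthonormal columns.
    Returns the coefficients, the reconstruction, and the fraction of the
    variance around the mean that the reconstruction leaves unexplained.
    """
    top = eigvecs[:, :k]
    coeffs = top.T @ (x - mean)
    approx = mean + top @ coeffs
    residual = np.sum((x - approx) ** 2) / np.sum((x - mean) ** 2)
    return coeffs, approx, residual


# Synthetic example: an orthonormal basis and a year built from two vectors.
rng = np.random.default_rng(0)
n_days = 365
basis, _ = np.linalg.qr(rng.normal(size=(n_days, 4)))
mean = np.full(n_days, 100.0)
x = mean + 3.0 * basis[:, 0] - 2.0 * basis[:, 1]

coeffs, approx, residual = reconstruct(x, mean, basis, k=2)
_, _, residual_k1 = reconstruct(x, mean, basis, k=1)
print(coeffs, residual, residual_k1)
```

With two eigenvectors the toy signal is recovered almost exactly, while one eigenvector leaves 4/13 of the variance unexplained, since the squared coefficients are 9 and 4.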

#### Coeff1
Coeff1: most positive
![SNWD_grid_Pos_coeff1.png](g_figures/SNWD_grid_Pos_coeff1.png)
Coeff1: most negative
![SNWD_grid_neg_coeff1.png](g_figures/SNWD_grid_neg_coeff1.png)
Large positive values of coeff1 correspond to more than average snow. Low values correspond to less than average snow.

#### Coeff2
Coeff2: most positive
![SNWD_grid_Pos_coeff2.png](g_figures/SNWD_grid_Pos_coeff2.png)
Coeff2: most negative
![SNWD_grid_neg_coeff2.png](g_figures/SNWD_grid_neg_coeff2.png)

Large negative values of coeff2 correspond to a late snow season (most of the snowfall is after mid-February). Positive values of coeff2 correspond to an early snow season (most of the snow is before mid-February).

#### Coeff3
Coeff3: most positive
![SNWD_grid_Pos_coeff3.png](g_figures/SNWD_grid_Pos_coeff3.png)
Coeff3: most negative
![SNWD_grid_neg_coeff3.png](g_figures/SNWD_grid_neg_coeff3.png)

Large positive values of coeff3 correspond to a snow season with two spikes: one in the middle of February, the other at the end of the year. Negative values of coeff3 correspond to a season with a single peak in the middle of February and less snow depth at the end of the year.

#### Coeff4
Coeff4: most positive
![SNWD_grid_Pos_coeff4.png](g_figures/SNWD_grid_Pos_coeff4.png)
Coeff4: most negative
![SNWD_grid_neg_coeff4.png](g_figures/SNWD_grid_neg_coeff4.png)

Large positive values of coeff4 correspond to years where the snow depth at the beginning of the year is larger than the snow depth at the end of the year. Negative values of coeff4 correspond to years where the snow depth at the beginning of the year is not greater than the snow depth at the end of the year.

### Geographical distribution of first 4 coefficients. 

![SNWD_coeff_locs.png](g_figures/SNWD_coeff_locs.png)

### Plotting the cumulative distribution functions of each coefficient and residual
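
For reference, an empirical cumulative distribution function of a set of coefficient or residual values can be computed as in the sketch below. The function name `empirical_cdf` and the sample values are hypothetical.

```python
import numpy as np


def empirical_cdf(values):
    """Return sorted values and the fraction of samples at or below each."""
    x = np.sort(np.asarray(values, dtype=float))
    p = np.arange(1, len(x) + 1) / len(x)
    return x, p


# Hypothetical coefficient values, for illustration only.
x, p = empirical_cdf([3.0, 1.0, 2.0, 4.0])
print(x)  # [1. 2. 3. 4.]
print(p)  # [0.25 0.5  0.75 1.  ]
```

Plotting `p` against `x` as a step function gives a curve of the kind shown in the figures of this section.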

<table>
<tr>
<td> <img src = "g_figures/SNWD_CDF_coeff1.png" alt="SNWD_CDF_coeff1.png" style = "width: 400px;"/> </td> 
<td> <img src = "g_figures/SNWD_CDF_res1.png" alt="SNWD_CDF_res1.png" style = "width: 400px;"/> </td>
</tr>
<tr>
<td> <img src = "g_figures/SNWD_CDF_coeff2.png" alt="SNWD_CDF_coeff2.png" style = "width: 400px;"/> </td> 
<td> <img src = "g_figures/SNWD_CDF_res2.png" alt="SNWD_CDF_res2.png" style = "width: 400px;"/> </td>
</tr>
<tr>
<td> <img src = "g_figures/SNWD_CDF_coeff3.png" alt="SNWD_CDF_coeff3.png" style = "width: 400px;"/> </td> 
<td> <img src = "g_figures/SNWD_CDF_res3.png" alt="SNWD_CDF_res3.png" style = "width: 400px;"/> </td>
</tr>
<tr>
<td> <img src = "g_figures/SNWD_CDF_coeff4.png" alt="SNWD_CDF_coeff4.png" style = "width: 400px;"/> </td> 
<td> <img src = "g_figures/SNWD_CDF_res4.png" alt="SNWD_CDF_res4.png" style = "width: 400px;"/> </td>
</tr>
</table>

## Analysis of correlation between precipitation across locations


We can see that the mean precipitation is largest in late spring and late summer with the minimum being in the middle of summer and during peak winter.
![PRCP, Mean_Eigs.png](g_figures/PRCP, Mean_Eigs.png)



As we saw above, the top 5 eigenvectors explain only 12% of the variance for precipitation data.

![VarExplained_PRCP.png](g_figures/VarExplained_PRCP.png)


In fact, we need over 150 eigenvectors to explain 80% of the variance.
![PCRP_Eigs_cumulative.png](g_figures/PCRP_Eigs_cumulative.png)
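
A count like this can be read off the cumulative eigenvalue curve. The sketch below, with a hypothetical helper `components_needed` and made-up eigenvalues, finds the smallest number of eigenvectors whose cumulative share of the variance reaches a threshold.

```python
import numpy as np


def components_needed(eigvals, threshold=0.8):
    """Smallest k such that the top-k eigenvalues reach the threshold share."""
    eigvals = np.sort(np.asarray(eigvals, dtype=float))[::-1]
    cum = np.cumsum(eigvals) / eigvals.sum()
    # searchsorted gives the first index where cum >= threshold.
    return int(np.searchsorted(cum, threshold) + 1)


# A steep spectrum needs few components; a flat one needs many.
steep = components_needed([4.0, 3.0, 2.0, 1.0], threshold=0.8)
flat = components_needed(np.ones(10), threshold=0.75)
print(steep, flat)  # 3 8
```

A flat spectrum, as in the second example, is the situation observed for precipitation: no small set of eigenvectors dominates.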

We can also see that the data is rather noisy, so it would be difficult to reconstruct the rain data from its eigenvectors and eigenvalues.

We confirm this by showing the reconstruction plots for the most positive and most negative values of Coeff_1.

Most Positive
![PRCP_grid_Pos_coeff1.png](g_figures/PRCP_grid_Pos_coeff1.png)

Most Negative
![PRCP_grid_neg_coeff1.png](g_figures/PRCP_grid_neg_coeff1.png)

The reconstructed graphs are very far off the targets.



### Correlation matrix


A correlation matrix of the readings shows how each measurement correlates with every other measurement.


![corr_matrix.png](g_figures/corr_matrix.png)

As expected, there is a large inverse correlation between the temperature readings (TMIN, TOBS, TMAX) and the snow measurements (SNOW, SNWD), especially during the winter months. Also note that while precipitation is mostly negatively correlated with temperature, it is slightly positively correlated with TMIN. Determining whether this is due to noise or is an actual phenomenon would require further statistical analysis, which could be a project for later.
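
A correlation matrix of this kind can be computed with `numpy.corrcoef`, as in the sketch below. The three series are synthetic and only mimic the qualitative relationship between temperature and snow depth; how the readings were aligned and aggregated for the figure above is not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500

# Synthetic readings: snow depth shrinks as temperature rises (assumption).
tmax = rng.normal(10.0, 8.0, size=n)
snwd = np.clip(200.0 - 15.0 * tmax + rng.normal(0.0, 30.0, size=n), 0.0, None)
prcp = rng.exponential(2.0, size=n)

names = ["TMAX", "SNWD", "PRCP"]
# Each row passed to corrcoef is treated as one variable.
corr = np.corrcoef(np.vstack([tmax, snwd, prcp]))

for name, row in zip(names, corr):
    print(name, np.round(row, 2))
```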


## Elevation/Longitude/Latitude/Year vs Measurements

### Plotting Elevation vs measurements

<table>
<tr>
<td> <img src = "g_figures/elevation_vs_PRCP.png" alt="elevation_vs_PRCP.png" style = "width: 400px;"/> </td> 
<td> <img src = "g_figures/elevation_vs_SNOW.png" alt="elevation_vs_SNOW.png" style = "width: 400px;"/> </td>
<td> <img src = "g_figures/elevation_vs_SNWD.png" alt="elevation_vs_SNWD.png" style = "width: 400px;"/> </td>
</tr>
<tr>
<td> <img src = "g_figures/elevation_vs_TMAX.png" alt="elevation_vs_TMAX.png" style = "width: 400px;"/> </td> 
<td> <img src = "g_figures/elevation_vs_TMIN.png" alt="elevation_vs_TMIN.png" style = "width: 400px;"/> </td>
<td> <img src = "g_figures/elevation_vs_TOBS.png" alt="elevation_vs_TOBS.png" style = "width: 400px;"/> </td>
</tr>
</table>


<p>As elevation increases, we can see that PRCP and SNOW both trend upwards. In addition, there is a clear correlation between elevation and temperature. This is well known as the <a href="https://en.wikipedia.org/wiki/Lapse_rate" target="_blank">lapse rate</a>. However, while SNOW increases steadily with elevation, SNWD is very spread out. This is interesting because one would expect more snow depth at higher elevations, since higher elevations remain colder longer.</p>
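
The relationship between elevation and temperature can be quantified with a straight-line fit, as in the following sketch. The station data is synthetic: it is generated with a slope of 6.5 degrees Celsius per kilometer, the value used in the standard atmosphere, and that figure is an assumption for the toy data rather than a result from this report.

```python
import numpy as np

rng = np.random.default_rng(2)

# Synthetic stations: elevation in meters, mean temperature in Celsius.
elevation_m = rng.uniform(1500.0, 3500.0, size=300)
temp_c = 15.0 - 0.0065 * elevation_m + rng.normal(0.0, 1.0, size=300)

# polyfit with deg=1 returns the slope first, then the intercept.
slope, intercept = np.polyfit(elevation_m, temp_c, deg=1)
lapse_rate_c_per_km = -slope * 1000.0
print(round(lapse_rate_c_per_km, 2))
```

Applying the same fit to the real elevation and temperature readings would give an estimate of the lapse rate for this region.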

To take a closer look at the SNOW and SNWD data, we take 12 different weather stations that exhibit high SNWD.

We also plot the corresponding weather stations' SNOW data.


<table>
<tr>
<td> <img src = "g_figures/Top_SNWD_years.png" alt="Top_SNWD_years.png" style = "width: 500px;"/> </td> 
<td> <img src = "g_figures/Top_SNWD_years_corresp_SNOW.png" alt="Top_SNWD_years_corresp_SNOW.png" style = "width: 500px;"/> </td>
</tr>
</table>


Interestingly, despite the high SNWD, these stations all exhibit relatively average to low SNOW. This means that despite the relatively little snow that these weather stations get, they retain their snow depth very well. 

A simple explanation could be that higher elevations stay colder longer than lower elevations, which delays the melting of snow.




![Top_SNWD_years_corres_elev.png](g_figures/Top_SNWD_years_corresp_elev.png)

<p>However, some research has shown that <a href="http://www.the-cryosphere.net/8/2381/2014/tc-8-2381-2014.pdf" target="_blank"> elevation and snow depth are only positively correlated until roughly 2500-2900 meters</a> and thereafter become negatively correlated due to lack of moisture.
</p>


Another possible explanation is the cardinal orientation of the hillside where the weather station is located. In the northern hemisphere, south-facing hillsides do not retain snow as well as north-facing ones because of the position of the sun and the tilt of the earth's axis.

<table>
<tr>
<td> <img src = "g_figures/north_south_slopes.jpg" alt="north_south_slopes.jpg" style = "width: 500px;"/> </td> </tr>
</table>

In either case, it would be beneficial to look further into this data.

### Plotting Latitude and Longitude vs measurements



<table>
<tr>
<td> <img src = "g_figures/latitude_vs_PRCP.png" alt="latitude_vs_PRCP.png" style = "width: 400px;"/> </td> 
<td> <img src = "g_figures/latitude_vs_SNOW.png" alt="latitude_vs_SNOW.png" style = "width: 400px;"/> </td>
<td> <img src = "g_figures/latitude_vs_SNWD.png" alt="latitude_vs_SNWD.png" style = "width: 400px;"/> </td>
</tr>
<tr>
<td> <img src = "g_figures/latitude_vs_TMAX.png" alt="latitude_vs_TMAX.png" style = "width: 400px;"/> </td> 
<td> <img src = "g_figures/latitude_vs_TMIN.png" alt="latitude_vs_TMIN.png" style = "width: 400px;"/> </td>
<td> <img src = "g_figures/latitude_vs_TOBS.png" alt="latitude_vs_TOBS.png" style = "width: 400px;"/> </td>
</tr>
<tr>
<td> <img src = "g_figures/longitude_vs_PRCP.png" alt="longitude_vs_PRCP.png" style = "width: 400px;"/> </td> 
<td> <img src = "g_figures/longitude_vs_SNOW.png" alt="longitude_vs_SNOW.png" style = "width: 400px;"/> </td>
<td> <img src = "g_figures/longitude_vs_SNWD.png" alt="longitude_vs_SNWD.png" style = "width: 400px;"/> </td>
</tr>
<tr>
<td> <img src = "g_figures/longitude_vs_TMAX.png" alt="longitude_vs_TMAX.png" style = "width: 400px;"/> </td> 
<td> <img src = "g_figures/longitude_vs_TMIN.png" alt="longitude_vs_TMIN.png" style = "width: 400px;"/> </td>
<td> <img src = "g_figures/longitude_vs_TOBS.png" alt="longitude_vs_TOBS.png" style = "width: 400px;"/> </td>
</tr>
</table>

At first glance, our data does not show much correlation between latitude/longitude and any of the measurements. This is understandable since the area that our data spans is a very small part of Colorado.

However, for each measurement the latitude and longitude graphs do seem to follow similar patterns, most notably the two TMIN graphs. Since our area spans many mountainous regions, it would be interesting to see if the combination of latitude/longitude plots could be used to map out hot or snowy "zones" of the mountains.


### Plotting Year vs measurements as line graph

<table>
<tr>
<td> <img src = "g_figures/year_vs_PRCP.png" alt="year_vs_PRCP.png" style = "width: 400px;"/> </td> 
<td> <img src = "g_figures/year_vs_SNOW.png" alt="year_vs_SNOW.png" style = "width: 400px;"/> </td>
<td> <img src = "g_figures/year_vs_SNWD.png" alt="year_vs_SNWD.png" style = "width: 400px;"/> </td>
</tr>
<tr>
<td> <img src = "g_figures/year_vs_TMAX.png" alt="year_vs_TMAX.png" style = "width: 400px;"/> </td> 
<td> <img src = "g_figures/year_vs_TMIN.png" alt="year_vs_TMIN.png" style = "width: 400px;"/> </td>
<td> <img src = "g_figures/year_vs_TOBS.png" alt="year_vs_TOBS.png" style = "width: 400px;"/> </td>
</tr>
</table>

<p>We initially see a lot of large spikes in our data, particularly in the early and later years. The early spikes likely result from very little data being collected initially, which yields readings far from the actual averages. Through the early 1900s, more and more measurements are taken and we can see more consistent readings. However, the deployment of the <a href="https://en.wikipedia.org/wiki/Automated_airport_weather_station#Automated_Surface_Observing_System_.28ASOS.29" target="_blank">Automated Surface Observing System (ASOS)</a> in 1991 dramatically increased the number of measurements taken, which accounts for much of the drop in temperature averages in later years (more automated readings are taken during winter seasons, when temperatures are low).


</p>

The scatterplots below show that the number of measurements taken increases significantly over time. This is especially noticeable in the year_vs_SNWD plot, where there are hardly any measurements before 1950, which causes the large spike in the data.
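
One way to check how much data supports each yearly average is to report the number of readings alongside the mean, as in this sketch. The helper `yearly_mean_and_count` and the sample records are hypothetical.

```python
from collections import defaultdict


def yearly_mean_and_count(records):
    """Map each year to (mean value, number of readings).

    records: iterable of (year, value) pairs.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for year, value in records:
        sums[year] += value
        counts[year] += 1
    return {year: (sums[year] / counts[year], counts[year]) for year in sorted(sums)}


# Hypothetical readings: one early year with a single extreme value.
records = [(1900, 30.0), (1950, 10.0), (1950, 12.0), (1950, 14.0)]
stats = yearly_mean_and_count(records)
print(stats)  # {1900: (30.0, 1), 1950: (12.0, 3)}
```

A yearly mean backed by a single reading, like the 1900 entry here, can sit far from the typical value, which matches the spikes seen in the sparsely sampled years.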

### Plotting Year vs measurements

<table>
<tr>
<td> <img src = "g_figures/year_vs_PRCP_scatter.png" alt="year_vs_PRCP_scatter.png" style = "width: 400px;"/> </td> 
<td> <img src = "g_figures/year_vs_SNOW_scatter.png" alt="year_vs_SNOW_scatter.png" style = "width: 400px;"/> </td>
<td> <img src = "g_figures/year_vs_SNWD_scatter.png" alt="year_vs_SNWD_scatter.png" style = "width: 400px;"/> </td>
</tr>
<tr>
<td> <img src = "g_figures/year_vs_TMAX_scatter.png" alt="year_vs_TMAX_scatter.png" style = "width: 400px;"/> </td> 
<td> <img src = "g_figures/year_vs_TMIN_scatter.png" alt="year_vs_TMIN_scatter.png" style = "width: 400px;"/> </td>
<td> <img src = "g_figures/year_vs_TOBS_scatter.png" alt="year_vs_TOBS_scatter.png" style = "width: 400px;"/> </td>
</tr>
</table>

## A quick look at the places with the highest measurements

<table>

<tr>
   <th align = "left">Highest TMAX</th>
   <th align = "left">Highest SNWD</th>
</tr>

<tr>
<td> <img src = "g_figures/Map_Top_TMAX.png" alt="Highest TMAX" style = "width: 500px;"/> </td> 
<td> <img src = "g_figures/Map_Top_SNWD.png" alt="Highest SNWD" style = "width: 500px;"/> </td>
</tr>

<tr>
   <th align = "left">Highest SNOW</th>
   <th align = "left">Highest PRCP</th>
</tr>

<tr>
<td> <img src = "g_figures/Map_Top_SNOW.png" alt="Highest SNOW" style = "width: 500px;"/> </td> 
<td> <img src = "g_figures/Map_Top_PRCP.png" alt="Highest PRCP" style = "width: 500px;"/> </td>
</tr>
</table>