# Weather Analysis for Utah and Nevada

## Introduction

This report presents a historical analysis of weather patterns in a region that roughly overlaps the states of Utah and Nevada. The region includes several forest parks in the mountains.

The data comes from [NOAA](https://www.ncdc.noaa.gov/). Specifically, it was downloaded from this [FTP site](ftp://ftp.ncdc.noaa.gov/pub/data/ghcn/daily/).

We focused on six measurements:
* **TMIN, TMAX:** the daily minimum and maximum temperature.
* **TOBS:** The temperature at the time of observation, which we treat as the average temperature for each day.
* **PRCP:** Daily precipitation (in mm)
* **SNOW:** Daily snowfall (in mm)
* **SNWD:** The depth of accumulated snow.

This report contains five sections. In the first section, we briefly introduce the analysis area and the measurements. In the second section, we sanity-check the provided data. In the third section, we briefly apply PCA to these measurements. In the fourth and fifth sections, we analyze the principal components of TOBS and SNWD in detail.

## Sanity-check: comparison with outside sources

<p>We start by comparing some of the general statistics with graphs that we obtained from a site called <a href="http://www.usclimatedata.com/climate/orderville/utah/united-states/usut0190" target="_blank">US Climate Data</a>. The graph below shows the daily minimum and maximum temperatures for each month, as well as the total precipitation for each month. We chose one city in this area, Orderville, to represent the climate of the region.</p>

<p>&nbsp;</p>

<p><img alt="Climate_Utah.jpg" src="pics/Climate_Utah_O.jpg" style="height:450px; width:600px"/></p>

<p>&nbsp;</p>

<p>The US Climate Data values are in degrees Celsius, and our data is also in degrees Celsius after dividing by 10, so we can compare them directly. According to US Climate Data, the highest maximum temperature in this area is a little above 30 degrees Celsius in July, and the highest minimum temperature is about 12 degrees Celsius in August, which agrees with our records. In addition, the temperature trends of the two data sources are similar. We therefore consider our temperature data trustworthy.</p>
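
The unit conversion mentioned above can be sketched as follows. This is a minimal illustration, not the code used for the report: the function name `tenths_to_celsius` and the sample readings are hypothetical.

```python
import numpy as np

def tenths_to_celsius(raw):
    """Convert readings stored in tenths of a degree Celsius to degrees Celsius."""
    return np.asarray(raw, dtype=float) / 10.0

# Hypothetical raw daily TMAX readings, in tenths of a degree Celsius.
raw_tmax = [312, 305, 298]
tmax_celsius = tenths_to_celsius(raw_tmax)
print(tmax_celsius)
```

After this conversion the values are on the same scale as the US Climate Data graph.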

<p>&nbsp;</p>

<p><img alt="tmintmax.png" src="pics/tmintmax.png" style="height:300px; width:800px" /></p>

<p>To compare the precipitation&nbsp;we need to convert millimeters per day to millimeters per month. In our data, the average rainfall is about 1 mm/day, or about 30 mm/month, which agrees with the data from US Climate Data. Both data sources also clearly show more rain in winter and less rain in summer. The details differ somewhat, but because rainfall varies greatly in mountain areas, the difference is acceptable.</p>
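
The conversion from a daily mean to a monthly total is simple arithmetic. The sketch below uses the standard-library `calendar` module to get the number of days in a month; the function name `monthly_total_mm` and the chosen year are assumptions made for illustration.

```python
import calendar

def monthly_total_mm(mean_mm_per_day, year, month):
    """Scale a mean daily precipitation (mm/day) to a monthly total (mm/month)."""
    days_in_month = calendar.monthrange(year, month)[1]
    return mean_mm_per_day * days_in_month

# A mean of 1 mm/day over January (31 days) and February of a non-leap year (28 days).
january_total = monthly_total_mm(1.0, 2015, 1)
february_total = monthly_total_mm(1.0, 2015, 2)
print(january_total, february_total)
```

A mean of about 1 mm/day therefore corresponds to roughly 28 to 31 mm/month, which is the "about 30 mm/month" figure used in the comparison.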

<p>&nbsp;<img alt="PRCP.png" src="pics/prcp.png" style="height:450px; width:600px" /></p>


## PCA analysis

For each of the six measurements, we compute the percentage of the variance explained as a function of the number of eigen-vectors used.
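
As a rough sketch of this computation, the cumulative fraction of variance explained can be derived from the eigenvalues of the covariance matrix. The example below runs on synthetic data with no missing values, which is an assumption: the real station data contains gaps that need separate handling. The function name `cumulative_variance_explained` is hypothetical.

```python
import numpy as np

def cumulative_variance_explained(data):
    """Return the cumulative fraction of variance explained by the top-k eigen-vectors.

    data: array of shape (n_samples, n_features) without missing values.
    """
    centered = data - data.mean(axis=0)
    cov = centered.T @ centered / data.shape[0]
    # eigvalsh returns eigenvalues in ascending order; reverse to descending.
    eigvals = np.linalg.eigvalsh(cov)[::-1]
    eigvals = np.clip(eigvals, 0.0, None)  # remove tiny negative round-off values
    return np.cumsum(eigvals) / eigvals.sum()

# Synthetic "station-years": a seasonal curve with a random amplitude plus daily noise.
rng = np.random.default_rng(0)
days = np.arange(365)
seasonal = np.cos(2 * np.pi * (days - 200) / 365)
amplitudes = rng.normal(10.0, 3.0, size=(200, 1))
data = amplitudes * seasonal + rng.normal(0.0, 2.0, size=(200, 365))

explained = cumulative_variance_explained(data)
print(explained[:5])  # fraction explained by the top 1..5 eigen-vectors
```

Plotting `explained` against the number of eigen-vectors gives a curve of the same kind as the figures in this section.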

![eigTMINTOBSTMAX.png](pics/eigTMINTOBSTMAX.png)
We see that the top 5 eigen-vectors explain 48% of the variance for TMIN, 62% for TOBS and 51% for TMAX.

We conclude that of the three, TOBS is best explained by the top 5 eigen-vectors. This is especially true for the first eigen-vector which, by itself, explains 55% of the variance. In other words, a few top eigen-vectors capture a large part of the variance for these three measurements. We will therefore carry out a more detailed PCA of TOBS later.

![eigSNOWSNWDPRCP.png](pics/eigSNOWSNWDPRCP.png)

The top 5 eigenvectors explain 13% of the variance for PRCP and 11% for SNOW. Both are low values. On the other hand, the top 5 eigenvectors explain about 90% of the variance for SNWD. This means that these top 5 eigenvectors capture most of the variation in the snow-depth signals. Based on that, we will dig deeper into the PCA analysis for snow-depth.

It makes sense that SNWD would be less noisy than SNOW. That is because SNWD is a decaying integral of SNOW and, as such, varies less between days and between the same date in different years.
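
The "decaying integral" argument can be illustrated with a small simulation. Everything below is an assumption made for the illustration: the snowfall series is synthetic, and the decay factor of 0.95 per day is an arbitrary stand-in for melting and settling.

```python
import numpy as np

def lag1_autocorrelation(x):
    """Correlation between a series and itself shifted by one day."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    return float(np.dot(x[:-1], x[1:]) / np.dot(x, x))

rng = np.random.default_rng(1)
n_days = 2000

# Synthetic snowfall: a storm on about 10% of days, with random intensity (mm).
storm_days = rng.random(n_days) < 0.1
snowfall = np.where(storm_days, rng.exponential(50.0, n_days), 0.0)

# Snow depth as a decaying running sum of snowfall (assumed decay factor).
decay = 0.95
depth = np.zeros(n_days)
depth[0] = snowfall[0]
for t in range(1, n_days):
    depth[t] = decay * depth[t - 1] + snowfall[t]

snow_corr = lag1_autocorrelation(snowfall)
depth_corr = lag1_autocorrelation(depth)
print(snow_corr, depth_corr)
```

In this toy model the depth on consecutive days is strongly correlated while the snowfall is not, which is the sense in which SNWD is the smoother signal.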

## Analysis of TOBS

Because the first 4 eigen-vectors explain most (over 60%) of the variance in average temperature, we apply an eigen-decomposition to TOBS to reveal the structure behind the data.

First, we show the mean and the first 4 eigen-vectors for TOBS in the following graphs.

From the following figure, we can see that the average temperature in this area follows the usual seasonal pattern: high in July and low in January. The highest average temperature is a little above 20 degrees Celsius and the lowest is below 0 degrees Celsius.
![meaneigTOBS.png](pics/meaneigTOBS.png)

Next, we look at the eigen-functions in more detail:

* **eig0:** eig0 is a convex curve whose shape resembles that of the mean but with a gentler slope. That is, the first eigen-function captures the basic component of the temperature.
* **eig1:** eig1 is high in winter and low in summer. It corresponds to a warmer winter and a colder summer.
* **eig2:** eig2 is high in the first two seasons and low in the last two. It corresponds to a warmer spring and summer and a colder autumn and winter.
* **eig3:** eig3 has a deep valley in spring and a peak in winter. It corresponds to unusual weather that is cold in spring and warm in winter.


### Examples of reconstructions for TOBS

We filtered the data to exclude instances with large residuals. This way, the results better show the properties of the different eigen-functions.
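
The sketch below shows one way a reconstruction, its coefficients, and a residual-based filter could fit together. It is not the code behind the figures: the residual definition (remaining squared norm divided by the squared norm after removing the mean), the threshold of 0.4, and the names `decompose` and `keep` are all assumptions.

```python
import numpy as np

def decompose(x, mean, eigvecs):
    """Project x onto orthonormal eigen-vectors.

    eigvecs: array of shape (n_features, k) with orthonormal columns.
    Returns the k coefficients and, for each j = 1..k, the fraction of the
    squared norm of (x - mean) left unexplained by the first j eigen-vectors.
    """
    centered = x - mean
    coeffs = eigvecs.T @ centered
    total = np.dot(centered, centered)
    residuals = []
    for j in range(1, eigvecs.shape[1] + 1):
        diff = centered - eigvecs[:, :j] @ coeffs[:j]
        residuals.append(np.dot(diff, diff) / total)
    return coeffs, np.array(residuals)

# Toy setup: two orthonormal "eigen-vectors" built from seasonal curves.
n = 365
days = np.arange(n)
basis = np.stack([np.cos(2 * np.pi * days / n), np.sin(2 * np.pi * days / n)], axis=1)
eigvecs, _ = np.linalg.qr(basis)  # orthonormalize the columns
mean = np.zeros(n)

x = 3.0 * eigvecs[:, 0] - 1.0 * eigvecs[:, 1]
coeffs, residuals = decompose(x, mean, eigvecs)

# Hypothetical filter: keep the instance only if the final residual is small.
keep = residuals[-1] < 0.4
print(coeffs, residuals, keep)
```

With a filter of this kind, the "most positive" and "most negative" examples are chosen only among instances that the top eigen-vectors reconstruct reasonably well.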

#### Coeff0
Coeff0: most positive
![tobsc0_1.png](pics/tobsc0_1.png)
Coeff0: most negative
![tobsc0_2.png](pics/tobsc0_2.png)

eig0 is the basic component of the temperature. Thus, a large value of coeff0 means the temperature is higher than average, and a small value of coeff0 means the temperature is lower than average.

#### Coeff1
Coeff1: most positive
![tobsc1_1.png](pics/tobsc1_1.png)
Coeff1: most negative
![tobsc1_2.png](pics/tobsc1_2.png)

Positive coeff1 typically means that there are several warm days in the first three months, and negative coeff1 means that the summer is extremely hot.


#### Coeff2
Coeff2: most positive
![tobsc2_1.png](pics/tobsc2_1.png)
Coeff2: most negative
![tobsc2_2.png](pics/tobsc2_2.png)

Positive coeff2 typically means that the first six months are warmer than the last six months.

#### Coeff3
Coeff3: most positive
![tobsc3_1.png](pics/tobsc3_1.png)
Coeff3: most negative
![tobsc3_2.png](pics/tobsc3_2.png)

A large negative coeff3 indicates an extremely cold winter.

### Location-to-location variation and year-to-year variation for TOBS

In the previous part, we showed the connection between the eigen-functions and the average temperature. In this part, we estimate the relative importance of location-to-location variation and year-to-year variation.
These are measured by the fraction by which the variance is reduced when we subtract from each station/year entry the average-per-year or the average-per-station, respectively. Here are the results:

#### Coeff0
total MS = 577819.02<br>
MS removing mean-by-station= 216427.31, fraction explained=62.5<br>
MS removing mean-by-year = 498007.02, fraction explained=13.8

#### Coeff1
total MS = 42547.89<br>
MS removing mean-by-station= 29498.86, fraction explained=30.7<br>
MS removing mean-by-year = 23926.85, fraction explained=43.8

#### Coeff2
total MS = 33457.27<br>
MS removing mean-by-station= 31344.08, fraction explained=6.3<br>
MS removing mean-by-year = 10722.28, fraction explained=68.0

#### Coeff3
total MS = 23294.97<br>
MS removing mean-by-station= 22076.45, fraction explained=5.2<br>
MS removing mean-by-year = 4408.80, fraction explained=81.0
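
The "fraction explained" values can be reproduced from the mean squares as 100 × (1 − MS after removal / total MS). The sketch below checks this against the Coeff0 values reported above, and then shows how the mean squares themselves could be obtained from a station-by-year table. The table layout (rows are stations, columns are years, NaN for missing entries) and the function names are assumptions.

```python
import numpy as np

def fraction_explained(total_ms, remaining_ms):
    """Percentage of the total mean square removed by subtracting a set of means."""
    return 100.0 * (1.0 - remaining_ms / total_ms)

# Coeff0 of TOBS, using the mean squares reported above.
print(round(fraction_explained(577819.02, 216427.31), 1))  # 62.5 (mean-by-station)
print(round(fraction_explained(577819.02, 498007.02), 1))  # 13.8 (mean-by-year)

def mean_square(table):
    """Mean of the squared entries, ignoring missing values."""
    return float(np.nanmean(np.square(table)))

def fractions_from_table(table):
    """table: 2-D array, rows = stations, columns = years."""
    total = mean_square(table)
    without_station = mean_square(table - np.nanmean(table, axis=1, keepdims=True))
    without_year = mean_square(table - np.nanmean(table, axis=0, keepdims=True))
    return (fraction_explained(total, without_station),
            fraction_explained(total, without_year))

# Toy table: two stations that differ a lot, two years that differ a little.
toy = np.array([[11.0, 9.0],
                [-9.0, -11.0]])
by_station, by_year = fractions_from_table(toy)
print(by_station, by_year)
```

In the toy table almost all of the mean square is removed by the station means, which is the same pattern seen for Coeff0.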

From these data, we see that the first coefficient (coeff_0) varies greatly from station to station but is almost the same across years. The next three coefficients, however, vary mainly from year to year rather than from station to station. This makes sense. The first eigen-function is the main component of the average temperature. Because climate in mountain areas changes greatly over short distances, the typical average temperature should differ substantially between stations at different locations, while varying only slightly between years. The next three eigen-functions mainly represent unusual weather. An unusual event, such as a cold front or a warm air stream, may affect the whole area but does not occur every year.

### Cumulative distribution for TOBS

In this part, we analyze the cumulative distributions of the coefficients and residuals for TOBS. We can see directly from the figures that, although most instances have res less than 1, a considerable share of instances have res larger than 1. This means that the reconstruction is not good enough for some instances. There are also more instances with coeff near zero than with large coefficients, as expected for roughly normally distributed data.
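
A cumulative distribution of this kind can be built by sorting the values and assigning each one its rank divided by the sample size. The sketch below is a minimal version with made-up residual values; `empirical_cdf` is a hypothetical helper, not a function from the original analysis.

```python
import numpy as np

def empirical_cdf(values):
    """Return sorted values x and the fraction of samples at or below each x."""
    x = np.sort(np.asarray(values, dtype=float))
    y = np.arange(1, x.size + 1) / x.size
    return x, y

# Made-up residual values for four instances.
x, y = empirical_cdf([0.9, 0.2, 1.3, 0.5])

# Fraction of instances with a residual strictly below 1.
frac_below_one = np.searchsorted(x, 1.0, side="left") / x.size
print(x, y, frac_below_one)
```

Plotting `y` against `x` as a step function gives the cumulative distribution, and `frac_below_one` is the height of that curve just before 1.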

#### Coeff0 and Res0
![tobsc0.png](pics/tobsc0.png)
![tobsr0.png](pics/tobsr0.png)

#### Coeff1 and Res1
![tobsc1.png](pics/tobsc1.png)
![tobsr1.png](pics/tobsr1.png)

#### Coeff2 and Res2
![tobsc2.png](pics/tobsc2.png)
![tobsr2.png](pics/tobsr2.png)

#### Coeff3 and Res3
![tobsc3.png](pics/tobsc3.png)
![tobsr3.png](pics/tobsr3.png)


### Best reconstruction for TOBS

![tobsbc.png](pics/tobsbc.png)

### Show TOBS in map

The following figure shows Coeff0 of TOBS (which stands for the typical average temperature) on the map. The radius of each circle represents the number of records and the color represents the value. We can see from the figure that stations in the same forest park tend to have similar colors, while stations in different parks can differ.

![tobs.png](pics/tobs.png)

## Analysis of SNWD

We choose to analyze the eigen-decomposition for snow-depth because the first 4 eigen-vectors explain 87% of the variance.

First, we graph the mean and the top 4 eigen-vectors.

We observe that the snow season runs from November to June, with the snow depth peaking in mid-February.
![meaneigSNWD.png](pics/meaneigSNWD.png)

Next we interpret the eigen-functions. The first eigen-function (eig0) has a shape very similar to the mean function, but it is negative (and we can see that most of its coefficients are negative). The main difference is that the eigen-function has a steeper slope than the mean. The interpretation of this shape is that eig0 represents the overall amount of snow above/below the mean, but without changing the distribution over time.

**eig1, eig2 and eig3** are similar in the following way. They all oscillate between positive and negative values. In other words, they correspond to changing the distribution of the snow depth over the winter months, but they don't change the total (much).

They can be interpreted as follows:
* **eig1:** less snow from April to summer, less snow after October.
* **eig2:** more snow at the end of the year, less snow in March.
* **eig3:** more snow in April and May, less snow at the end of February and in December.


### Examples of reconstructions for SNWD

#### Coeff0
Coeff0: most positive
![snowc0_1.png](pics/snowc0_1.png)
Coeff0: most negative
![snowc0_2.png](pics/snowc0_2.png)
Large (absolute) negative values of coeff0 correspond to more than average snow. Large positive values correspond to less than average snow. This differs from the example from class because our first eigen-function is negative, so all the properties are reversed.

#### Coeff1
Coeff1: most positive
![snowc1_1.png](pics/snowc1_1.png)
Coeff1: most negative
![snowc1_2.png](pics/snowc1_2.png)
Large positive values of coeff1 correspond to more than average snow. Low values correspond to less than average snow. Large positive values of coeff2 correspond to a late snow season.

#### Coeff2
Coeff2: most positive
![snowc2_1.png](pics/snowc2_1.png)
Coeff2: most negative
![snowc2_2.png](pics/snowc2_2.png)

Large positive values of coeff2 correspond to more snow in the winter.

#### Coeff3
Coeff3: most positive
![snowc3_1.png](pics/snowc3_1.png)
Coeff3: most negative
![snowc3_2.png](pics/snowc3_2.png)

Large positive values of coeff3 correspond to more snow in April and May. It means the snow season is longer than average.



### Location-to-location variation and year-to-year variation for SNWD

In the previous section we saw how the coefficients relate to the amount of snow. We now estimate the relative importance of location-to-location variation and year-to-year variation.
These are measured by the fraction by which the variance is reduced when we subtract from each station/year entry the average-per-year or the average-per-station, respectively. Here are the results:

#### Coeff0
total MS = 9109409.61<br>
MS removing mean-by-station= 2063631.97, fraction explained=77.3<br>
MS removing mean-by-year = 6202867.93, fraction explained=31.9

#### Coeff1
total MS = 1213821.01<br>
MS removing mean-by-station= 720944.13, fraction explained=40.6<br>
MS removing mean-by-year = 679848.79, fraction explained=44.0

#### Coeff2
total MS = 726612.86<br>
MS removing mean-by-station= 549688.05, fraction explained=24.3<br>
MS removing mean-by-year = 437566.48, fraction explained=39.8

#### Coeff3
total MS = 406484.22<br>
MS removing mean-by-station= 344546.79, fraction explained=15.2<br>
MS removing mean-by-year = 242599.55, fraction explained=40.3

From these data, we can say that only the first coefficient varies mainly from station to station, while the next three vary mainly from year to year, similar to the analysis of average temperature in the previous section. The explanation should be similar too. Coeff0 stands for the overall amount of snow, and thus varies between places. Coeff1, Coeff2 and Coeff3 all correspond to unusual weather, which may vary between years.

### Cumulative distribution for SNWD

In this part, we plot the cumulative distributions of coeff and res for SNWD. Compared with TOBS, the reconstruction is better for SNWD, since only a few instances have res larger than 1.

#### Coeff0 and Res0
![snwdc0.png](pics/snwdc0.png)
![snwdr0.png](pics/snwdr0.png)

#### Coeff1 and Res1
![snwdc1.png](pics/snwdc1.png)
![snwdr1.png](pics/snwdr1.png)

#### Coeff2 and Res2
![snwdc2.png](pics/snwdc2.png)
![snwdr2.png](pics/snwdr2.png)

#### Coeff3 and Res3
![snwdc3.png](pics/snwdc3.png)
![snwdr3.png](pics/snwdr3.png)


### Best reconstruction for SNWD

![snwdbc.png](pics/snwdbc.png)

### Show SNWD in map

The following figure shows Coeff0 of SNWD (which stands for the typical snow depth) on the map. The radius of each circle represents the number of records and the color represents the value. We can see from the figure that stations in the same forest park tend to have similar colors, while stations in different parks can differ.

![snwd.png](pics/snwd.png)