# Weather Analysis of New Mexico's Central Region

This report analyzes the climate patterns in an area that roughly overlaps the central region of New Mexico. The dataset comes from [NOAA](https://www.ncdc.noaa.gov/) and contains climate data from 1854 to 2012 collected by 206 stations in that area.

Six measurements are considered in this report:
* **TMIN, TMAX:** Daily minimum and maximum temperature (in &deg;C)
* **TOBS:** Average temperature for each day (in &deg;C)
* **PRCP:** Daily Precipitation (in mm)
* **SNOW:** Daily snowfall (in mm)
* **SNWD:** The depth of accumulated snow (in mm)

## Sanity Check

As a sanity check, we first compare our data with an outside source: a graph obtained from <a href="http://www.usclimatedata.com/climate/albuquerque/new-mexico/united-states/usnm0370" target="_blank">US Climate Data</a>. The graph below shows the min and max daily temperature as well as the monthly precipitation in Albuquerque, which is the largest city in central New Mexico.

<img alt="Albuquerque_Climate_Graph.png" src="report_figures/Albuquerque_Climate_Graph.png" style="width:50%"/>

We see that the min daily temperature in Albuquerque ranges from about 25&deg;F to 65&deg;F (-3.9&deg;C to 18.3&deg;C) and the max daily temperature from about 45&deg;F to 90&deg;F (7.2&deg;C to 32.2&deg;C). These ranges do not exactly match the means of TMIN and TMAX in our data.

<img alt="T_MIN,T_MAX.png" src="report_figures/T_MIN,T_MAX.png" style="width:80%"/>

However, our data does agree with <a href="http://www.usclimatedata.com/climate/albuquerque/new-mexico/united-states/usnm0370" target="_blank">US Climate Data</a> in the sense that the max daily temperature occurs in July and August, and the minimum occurs in December and January. There is also a clear agreement that the difference between max and min temperatures stays roughly constant throughout the year.

The monthly precipitation in Albuquerque is roughly constant (about 0.5 inch/month or 0.42 mm/day) in most months, except from July to October, when it reaches its yearly maximum (about 1.5 inch/month or 1.27 mm/day). This generally agrees with our data, except that the maximum value in our data is somewhat greater (around 2.5 mm/day).
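
The unit conversions quoted in this sanity check can be reproduced with a few lines of Python. This is a minimal sketch: the function names are ours, and the 30-day month is an assumption used to turn a monthly total into an average daily rate.

```python
def fahrenheit_to_celsius(temp_f: float) -> float:
    """Convert a temperature from degrees Fahrenheit to degrees Celsius."""
    return (temp_f - 32.0) * 5.0 / 9.0


def inches_per_month_to_mm_per_day(inches: float, days_in_month: float = 30.0) -> float:
    """Convert a monthly precipitation total in inches to an average mm/day.

    The 30-day default is an approximation (an assumption of this sketch).
    """
    mm_per_inch = 25.4
    return inches * mm_per_inch / days_in_month


# Temperature bounds read off the Albuquerque graph.
for temp_f in (25, 65, 45, 90):
    print(f"{temp_f} F = {fahrenheit_to_celsius(temp_f):.1f} C")

# Monthly precipitation levels read off the Albuquerque graph.
for inches in (0.5, 1.5):
    print(f"{inches} in/month = {inches_per_month_to_mm_per_day(inches):.2f} mm/day")
```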

<img alt="PRCP.png" src="report_figures/PRCP.png" style="width:40%"/>

## Overall PCA Analysis
We next apply PCA to each of the six measurements to see if there is any hidden weather pattern.

### Percentage of Variance Explained
To begin with, we compute the percentage of the variance explained as a function of the number of eigenvectors used.

<img alt="VarExplained1.png" src="report_figures/VarExplained1.png" style="width:90%"/>

<img alt="VarExplained2.png" src="report_figures/VarExplained2.png" style="width:90%"/>

The plots show that the variance explained by the top 5 eigenvectors is 39% for TMIN, 58% for TOBS, 43% for TMAX, 10% for SNOW, 85% for SNWD and 7% for PRCP.
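
As a rough illustration of how such a curve can be computed, the sketch below uses NumPy on synthetic data. The function name `variance_explained`, the made-up seasonal signal, and the assumption that there are no missing values are ours; they are not taken from the original analysis.

```python
import numpy as np


def variance_explained(data: np.ndarray) -> np.ndarray:
    """Cumulative fraction of variance explained by the top-k eigenvectors.

    data has shape (n_samples, n_days); each row is one station-year.
    Assumes no missing values (an assumption of this sketch).
    """
    centered = data - data.mean(axis=0)
    cov = centered.T @ centered / data.shape[0]
    # eigvalsh returns eigenvalues in ascending order; reverse to descending.
    eigvals = np.linalg.eigvalsh(cov)[::-1]
    eigvals = np.clip(eigvals, 0.0, None)  # guard against tiny negative round-off
    return np.cumsum(eigvals) / eigvals.sum()


# Synthetic stand-in for one measurement: a seasonal cycle whose amplitude
# varies between samples, plus noise. All numbers here are made up.
rng = np.random.default_rng(0)
days = np.arange(365)
seasonal = np.cos(2 * np.pi * days / 365)
amplitude = rng.normal(10.0, 3.0, size=(200, 1))
data = amplitude * seasonal + rng.normal(0.0, 1.0, size=(200, 365))

cumulative = variance_explained(data)
print(f"Top 5 eigenvectors explain {100 * cumulative[4]:.1f}% of the variance")
```

Entry `k - 1` of the returned array is the fraction explained by the top `k` eigenvectors, so `cumulative[4]` corresponds to the "top 5" figures quoted above.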

Clearly, SNWD is explained the best by the top 5 eigenvectors, especially given that its first eigenvector alone explains 65% of the variance. TOBS comes next: its first eigenvector explains more than 50% of the variance. The rest of the measurements are much noisier.

This makes sense: TOBS, as the average temperature, should be more stable than TMIN and TMAX, and SNWD, as an accumulation of SNOW, should vary less between the same date in different years.

We therefore focus on SNWD and TOBS in the following sections.

## Analysis of Snow Depth

### Top 3 Eigenvectors

The figure below illustrates the mean and top 3 eigenvectors of SNWD. From the mean, we see that the snow season in this area is from mid-October to mid-May. The peak of snow depth occurs in January, which happens to be the coldest month in that area.

<img alt="SNWD_mean_eigs.png" src="report_figures/SNWD_mean_eigs.png" style="width:60%"/>

The first eigenvector looks like a negated version of the mean: it is negative during the snow season and close to zero in the rest of the months. Its minimum, however, occurs in March rather than January. We conclude that **eig1** represents the overall amount of snow being less or more than the average, without changing the general distribution much.

**eig2** and **eig3** both wiggle between positive and negative values, which leads to a change in the distribution of snow over time. We interpret **eig2** as less snow in January and February and more snow in March and April, which actually delays the peak of snow depth in a year.

**eig3** acts somewhat similarly to **eig2**: it too implies more snow in March and April and less snow in January and February. What distinguishes it is that it significantly increases the snow depth in December. Distributions with a greater coefficient of **eig3** are likely to have a secondary peak in December.

### Examples of Reconstructions

Here we give some typical examples of reconstructions using the top 3 eigenvectors.
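
A reconstruction of this kind is the mean plus a weighted sum of the top eigenvectors, where the weights are the coefficients coeff1, coeff2 and coeff3. The sketch below shows the idea with NumPy on random data; the `reconstruct` helper and the 12-dimensional toy data are assumptions for illustration, not the code used to produce the figures.

```python
import numpy as np


def reconstruct(sample, mean, eigvecs, k=3):
    """Project a sample onto the top-k eigenvectors and rebuild it.

    eigvecs has orthonormal columns sorted by decreasing eigenvalue.
    Returns the k coefficients and the approximation mean + sum(coeff_i * eig_i).
    """
    top = eigvecs[:, :k]
    coeffs = top.T @ (sample - mean)
    approx = mean + top @ coeffs
    return coeffs, approx


# Toy data standing in for station-year vectors (made up for this sketch).
rng = np.random.default_rng(1)
data = rng.normal(size=(100, 12)) @ rng.normal(size=(12, 12))
mean = data.mean(axis=0)

# Eigen-decomposition of the covariance, sorted by decreasing eigenvalue.
eigvals, eigvecs = np.linalg.eigh(np.cov(data, rowvar=False))
eigvecs = eigvecs[:, np.argsort(eigvals)[::-1]]

coeffs, approx = reconstruct(data[0], mean, eigvecs, k=3)
residual = np.linalg.norm(data[0] - approx)
print("coefficients:", np.round(coeffs, 2))
print("residual norm after 3 eigenvectors:", round(float(residual), 3))
```

Note that the sign of each eigenvector is arbitrary, so the sign of a coefficient only has meaning relative to the eigenvector it multiplies.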

#### Coeff1

Most positive:

<img alt="SNWD_grid_pos_coeff1.png" src="report_figures/SNWD_grid_pos_coeff1.png" style="width:90%"/>

Most negative:

<img alt="SNWD_grid_neg_coeff1.png" src="report_figures/SNWD_grid_neg_coeff1.png" style="width:90%"/>

We see that a more negative coeff1 clearly corresponds to a larger overall amount of snow and vice versa.

#### Coeff2

Most positive:

<img alt="SNWD_grid_pos_coeff2.png" src="report_figures/SNWD_grid_pos_coeff2.png" style="width:90%"/>

Most negative:

<img alt="SNWD_grid_neg_coeff2.png" src="report_figures/SNWD_grid_neg_coeff2.png" style="width:90%"/>

These examples show that a positive coeff2 indeed implies a late snow season. In contrast, the distributions with the most negative coeff2 all have little snow in March and April.

#### Coeff3

Most positive:

<img alt="SNWD_grid_pos_coeff3.png" src="report_figures/SNWD_grid_pos_coeff3.png" style="width:90%"/>

Most negative:

<img alt="SNWD_grid_neg_coeff3.png" src="report_figures/SNWD_grid_neg_coeff3.png" style="width:90%"/>

The figures above support our interpretation of **eig3**. Distributions with the most positive coeff3 turn out to be multimodal: they have one peak in March and another in December.

### Distribution of Coefficients

Our next step is to figure out the hidden relationship between the coefficients and the geographical location.

#### Coeff1

coeff1 indicates the overall amount of snow in a year. Its cumulative distribution is shown in the figure below. We see that more than 90% of the data has a coeff1 greater than -1000, which means most of the area actually has very little snow depth.
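
A statement such as "more than 90% of the data has a coeff1 greater than -1000" is a reading of the empirical CDF at a threshold. The following sketch shows that reading with the standard library only; the coefficient values are invented for the example and the helper name `empirical_cdf` is ours.

```python
from bisect import bisect_right


def empirical_cdf(values, threshold):
    """Fraction of values that are less than or equal to threshold."""
    ordered = sorted(values)
    return bisect_right(ordered, threshold) / len(ordered)


# Hypothetical coeff1 values, NOT taken from the dataset.
example_coeffs = [-2500, -1200, -800, -300, -100, -50, 0, 20, 40, 60]

# Fraction strictly greater than -1000 is one minus the CDF at -1000.
fraction_above = 1.0 - empirical_cdf(example_coeffs, -1000)
print(f"fraction with coeff1 > -1000: {fraction_above:.2f}")
```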

<img alt="SNWD_coeff_1_CDF.png" src="report_figures/SNWD_coeff_1_CDF.png" style="width:40%"/>

The only 3 stations with a coeff1 less than -1000 are all at high elevation, which is consistent with a negative coeff1 indicating more snow.

<img alt="SNWD_coeff1_elevation.png" src="report_figures/SNWD_coeff1_elevation.png" style="width:40%"/>

#### Coeff2 & Coeff3

The values of coeff2 and coeff3 are both concentrated near zero. As the cumulative distributions show, about half of the coefficients are negative and half are positive.

<img alt="SNWD_coeff_2_CDF.png" src="report_figures/SNWD_coeff_2_CDF.png" style="width:40%"/>
<img alt="SNWD_coeff_3_CDF.png" src="report_figures/SNWD_coeff_3_CDF.png" style="width:40%"/>

The geographical distributions of coeff2 and coeff3 are given below. Each station on the map is shown as a circle whose color indicates its average coefficient over the years. A red circle corresponds to a positive coefficient, while a blue one corresponds to a negative value.
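
The per-station value behind each circle is an average of that station's coefficients over all of its years, followed by a sign-based color choice. A minimal sketch of that aggregation is given here; the record layout, station names, and values are hypothetical, and the actual maps may use a continuous color scale rather than two colors.

```python
from collections import defaultdict


def mean_coeff_by_station(records):
    """Average a coefficient over years for each station.

    records is an iterable of (station_id, year, coeff) tuples
    (a hypothetical layout chosen for this sketch).
    """
    totals = defaultdict(lambda: [0.0, 0])  # station -> [sum, count]
    for station, _year, coeff in records:
        totals[station][0] += coeff
        totals[station][1] += 1
    return {station: total / count for station, (total, count) in totals.items()}


def circle_color(mean_coeff):
    """Red for a positive average coefficient, blue otherwise."""
    return "red" if mean_coeff > 0 else "blue"


# Made-up records for two made-up stations.
records = [
    ("STATION_A", 1990, 10.0),
    ("STATION_A", 1991, -4.0),
    ("STATION_B", 1990, -6.0),
    ("STATION_B", 1991, -2.0),
]

means = mean_coeff_by_station(records)
for station, value in sorted(means.items()):
    print(station, value, circle_color(value))
```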

<img alt="geo_distribution_coeff2.png" src="report_figures/geo_distribution_coeff2.png" style="width:80%"/>

<center>Geographical Distribution of Coeff2</center>

<img alt="geo_distribution_coeff3.png" src="report_figures/geo_distribution_coeff3.png" style="width:80%"/>

<center>Geographical Distribution of Coeff3</center>

Recall that **eig2** and **eig3** respectively represent the delay of the snow season and the secondary peak of snow depth in December. We find that the stations with an early snow season and those with a secondary peak of snow depth in December both lie around the forest areas. The underlying reason, however, is beyond the scope of this report.

## Analysis of Average Daily Temperature

### Top 3 Eigenvectors

We then analyze the average daily temperature in a similar way. As we can see, the mean of TOBS is unimodal over the year. Its maximum occurs in July, while January and December mark its minimum.

<img alt="TOBS_mean_eigs.png" src="report_figures/TOBS_mean_eigs.png" style="width:60%"/>

The first eigenvector, which stays roughly constant throughout the year, is interpreted as the overall temperature being above or below the average.

**eig2** and **eig3** oscillate between positive and negative values to change the distribution of TOBS in a year. We interpret them as follows:

**eig2**: lower temperature from March to October and higher temperature in the rest of the year, which makes the temperature more uniform over the year.

**eig3**: colder from July to December and warmer from January to May, which makes the temperature peak occur earlier.

### Examples of Reconstructions

We further justify our interpretation by providing the following examples.

#### Coeff1

Greatest:

<img alt="TOBS_grid_pos_coeff1.png" src="report_figures/TOBS_grid_pos_coeff1.png" style="width:90%"/>

Least:

<img alt="TOBS_grid_neg_coeff1.png" src="report_figures/TOBS_grid_neg_coeff1.png" style="width:90%"/>

#### Coeff2

Most positive:

<img alt="TOBS_grid_pos_coeff2.png" src="report_figures/TOBS_grid_pos_coeff2.png" style="width:90%"/>

Most negative:

<img alt="TOBS_grid_neg_coeff2.png" src="report_figures/TOBS_grid_neg_coeff2.png" style="width:90%"/>

#### Coeff3

Most positive:

<img alt="TOBS_grid_pos_coeff3.png" src="report_figures/TOBS_grid_pos_coeff3.png" style="width:90%"/>

Most negative:

<img alt="TOBS_grid_neg_coeff3.png" src="report_figures/TOBS_grid_neg_coeff3.png" style="width:90%"/>

### Distribution of Coefficients



#### Coeff1

coeff1 represents the overall temperature relative to the average. We find that it is distributed quite uniformly over its range.

<img alt="TOBS_coeff_1_CDF.png" src="report_figures/TOBS_coeff_1_CDF.png" style="width:40%"/>

It also shows an obvious correlation with the elevation of the station, which makes sense as higher elevation usually implies lower temperature.
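
One way to quantify such a relationship is the Pearson correlation coefficient between station elevation and coeff1. The sketch below implements it with the standard library; the elevations and coefficients are invented for illustration, and the sign of the result depends on the arbitrary sign of the eigenvector.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Hypothetical elevations (m) and coeff1 values, NOT taken from the dataset.
elevations = [1200.0, 1500.0, 1800.0, 2100.0, 2600.0]
coeff1_values = [350.0, 180.0, -20.0, -150.0, -400.0]

r = pearson(elevations, coeff1_values)
print(f"correlation between elevation and coeff1: {r:.3f}")
```

A value close to -1 or +1 indicates a strong linear relationship; a value near 0 indicates none.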

<img alt="TOBS_coeff1_elevation.png" src="report_figures/TOBS_coeff1_elevation.png" style="width:40%"/>

#### Coeff2

coeff2 is also distributed roughly uniformly, as its cumulative distribution shows.

<img alt="TOBS_coeff_2_CDF.png" src="report_figures/TOBS_coeff_2_CDF.png" style="width:40%"/>

The geographical distribution of coeff2 is illustrated below, where each station is represented by a circle. A blue circle indicates a small value of coeff2 and a red circle marks a large one.

<img alt="TOBS_geo_distribution_coeff2.png" src="report_figures/TOBS_geo_distribution_coeff2.png" style="width:80%"/>

<center>Geographical Distribution of Coeff2</center>

Recall that **eig2** makes the temperature more uniform over the year. We see that the greatest coeff2 (the red circle) is located inside Albuquerque, the largest city in this area. A possible explanation is that the denser population there leads to less variation in temperature.