# Bridgeport, CA Weather Analysis

This is a report on the historical analysis of weather patterns in the area around Bridgeport, California.

The data we use here comes from [NOAA](https://www.ncdc.noaa.gov/). Specifically, it was downloaded from this [FTP site](ftp://ftp.ncdc.noaa.gov/pub/data/ghcn/daily/).

We focused on six measurements:
* **TMIN, TMAX:** The daily minimum and maximum temperature.
* **TOBS:** The temperature at the time of observation on each day.
* **PRCP:** Daily precipitation (in mm)
* **SNOW:** Daily snowfall (in mm)
* **SNWD:** The depth of accumulated snow.

## Sanity-check: comparison with outside sources

<p>We start by comparing some of the general statistics with graphs that we obtained from a site called <a href="http://www.usclimatedata.com/climate/bridgeport/california/united-states/usca0129" target="_blank">US Climate Data</a>. The graph below shows the daily minimum and maximum temperatures for each month, as well as the total precipitation for each month.</p>

<p>&nbsp;</p>

<p><img alt="Climate_Bridgeport_-_California_and_Weather_averages_Bridgeport.png" src="report_figures/Climate_Bridgeport_-_California_and_Weather_averages_Bridgeport.png" /></p>

<p>&nbsp;</p>

<p>We see that the min and max daily temperature curves from US Climate Data match well with the TMIN (mean-std) and TMAX (mean) curves from our data.</p>




<p><img alt="TMIN,TMAX.png" src="report_figures/TMIN,TMAX.png" style="height:300px; width:800px" /></p>

<p>&nbsp;</p>

<p>To compare the precipitation we need to convert millimeters/day to inches/month. According to our analysis, the average rainfall ranges from a low of about 0.20 mm/day to a high of about 3.00 mm/day, which translates to about 0.24 and 3.54 inches per month. According to US Climate Data, the lowest and highest average rainfall are closer to 0.2 and 1.7 inches per month. However, both sources agree that average precipitation is lowest in June and July (summer) and highest from December to February (winter).</p>

<p>&nbsp;</p>

<p>&nbsp;<img alt="prcp_mean_std.png" src="report_figures/prcp_mean_std.png" style="height:450px; width:600px" /></p>
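
The unit conversion used in the precipitation comparison can be sketched as follows. This is a minimal illustration that assumes a nominal 30-day month; the function name and constants are hypothetical and not taken from the original analysis code.

```python
MM_PER_INCH = 25.4
DAYS_PER_MONTH = 30  # assumption: a nominal 30-day month


def mm_per_day_to_inches_per_month(mm_per_day, days=DAYS_PER_MONTH):
    """Convert a mean daily precipitation rate (mm/day) to inches per month."""
    return mm_per_day * days / MM_PER_INCH


# The lowest and highest average daily rainfall quoted in the report.
low = mm_per_day_to_inches_per_month(0.20)
high = mm_per_day_to_inches_per_month(3.00)
print(f"{low:.2f} in/month, {high:.2f} in/month")  # 0.24 in/month, 3.54 in/month
```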

# PCA analysis

For each of the six measurements, we compute the percentage of the variance explained as a function of the number of eigenvectors used.

From the figures, we see that the top 5 eigenvectors explain about 60%, 70%, 55%, 14%, 85% and 17.5% of the variance for TMIN, TOBS, TMAX, SNOW, SNWD and PRCP respectively. For SNWD, the top 5 eigenvectors explain as much as 85% of the variance, and the first eigenvector alone explains about 70%. This makes SNWD the most suitable measurement for further PCA analysis.


![VarExplained1.png](report_figures/VarExplained1.png)

![VarExplained2.png](report_figures/VarExplained2.png)
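
As a rough sketch of how curves like these can be computed, the cumulative fraction of variance explained by the top-k eigenvectors is the running sum of the sorted eigenvalues divided by their total. The eigenvalues below are hypothetical and only illustrate the calculation; they are not taken from the report's data.

```python
from itertools import accumulate


def cumulative_variance_explained(eigenvalues):
    """Cumulative fraction of variance explained by the top-k eigenvectors."""
    ordered = sorted(eigenvalues, reverse=True)  # largest eigenvalue first
    total = sum(ordered)
    return [running / total for running in accumulate(ordered)]


# Hypothetical eigenvalues of a covariance matrix, for illustration only.
eigenvalues = [7.0, 0.8, 0.4, 0.2, 0.1, 1.5]
explained = cumulative_variance_explained(eigenvalues)
for k, fraction in enumerate(explained, start=1):
    print(f"top {k}: {fraction:.0%}")
```

With these made-up values the first eigenvector alone accounts for 70% of the variance, which is the kind of profile described for SNWD.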

### Analysis of snow depth

We choose to analyze the eigen-decomposition for snow depth because the first 3 eigenvectors explain 80% of the variance.

The mean and the top 3 eigenvectors are plotted in the figure below.

We see that the first eigenvector is very similar to the mean, which explains why it alone accounts for about 70% of the variance. The second and third eigenvectors mostly determine the timing of the snow season.

The snow season in this region starts in November, peaks in February and lasts all the way to May.
![SNWD_mean_eigs.png](report_figures/SNWD_mean_eigs.png)

# Examples of reconstructions

#### Coeff1
Most positive
![SNWD_grid_Pos_coeff1.png](report_figures/SNWD_grid_Pos_coeff1.png)
Most negative
![SNWD_grid_neg_coeff1.png](report_figures/SNWD_grid_neg_coeff1.png)

Large positive values of coeff1 correspond to more than average snow. Low values correspond to less than average snow.

#### Coeff2
Most positive
![SNWD_grid_Pos_coeff2.png](report_figures/SNWD_grid_Pos_coeff2.png)
Most negative
![SNWD_grid_Neg_coeff2.png](report_figures/SNWD_grid_Neg_coeff2.png)

Large positive values of coeff2 correspond to a late snow season in April, whereas negative values of coeff2 correspond to an early snow season starting in January.

#### Coeff3
Most positive
![SNWD_grid_Pos_coeff3.png](report_figures/SNWD_grid_Pos_coeff3.png)
Most negative
![SNWD_grid_Neg_coeff3.png](report_figures/SNWD_grid_Neg_coeff3.png)

Large positive values of coeff3 correspond to less snow in April but more snow from November to January. In other words, positive values of coeff3 indicate an early snow season. Conversely, negative values of coeff3 correspond to more snow in April and less snow from November to January, which indicates a late snow season.
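
The coefficients discussed above are the projections of a mean-removed station-year onto the eigenvectors, and a reconstruction is the mean plus the weighted sum of those eigenvectors. The sketch below illustrates this with a toy 4-day "year" and two orthonormal vectors. All names and values are hypothetical, and missing values, which real SNWD records contain, are not handled.

```python
def dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def project(x, mean, eigvecs):
    """Coefficients of the mean-removed vector x on each orthonormal eigenvector."""
    centered = [xi - mi for xi, mi in zip(x, mean)]
    return [dot(centered, v) for v in eigvecs]


def reconstruct(mean, eigvecs, coeffs):
    """Mean plus the coefficient-weighted sum of the eigenvectors."""
    out = list(mean)
    for c, v in zip(coeffs, eigvecs):
        out = [o + c * vi for o, vi in zip(out, v)]
    return out


# Toy data: a 4-day "year" and two orthonormal vectors (illustration only).
mean = [1.0, 2.0, 2.0, 1.0]
eigvecs = [[0.5, 0.5, 0.5, 0.5], [0.5, 0.5, -0.5, -0.5]]
x = [3.0, 4.0, 2.0, 1.0]

coeffs = project(x, mean, eigvecs)
approx = reconstruct(mean, eigvecs, coeffs)
print(coeffs, approx)
```

In this toy case the mean-removed vector lies entirely in the span of the two vectors, so the reconstruction is exact. With real data the top few eigenvectors give only an approximation.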

# The variation in the timing of snow is mostly due to year-to-year variation

###### coeff_1 
total RMS                   =  3524.55297988<br />
RMS removing mean-by-station=  1746.91151663<br />
RMS removing mean-by-year   =  2867.27157028<br />

###### coeff_2
total RMS                   =  1174.86213425<br />
RMS removing mean-by-station=  978.857625043<br />
RMS removing mean-by-year   =  882.700935227<br />

###### coeff_3
total RMS                   =  952.099218524<br />
RMS removing mean-by-station=  910.749759066<br />
RMS removing mean-by-year   =  748.783391738<br />

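In the previous section we found that coeff_1 is related to the overall amount of snow, which varies mainly with location. The RMS values confirm this: removing the mean-by-station cuts the RMS of coeff_1 roughly in half (from about 3525 to about 1747), while removing the mean-by-year reduces it by only about 19%.
Coeff_2 and coeff_3, by contrast, are related to the timing of the snow season, as noted in the previous section. For these coefficients, removing the mean-by-year reduces the RMS more (by about 25% and 21%) than removing the mean-by-station (by about 17% and 4%).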
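
The comparison above can be sketched as follows: compute the RMS of a coefficient over all station-years, then recompute it after subtracting each station's mean or each year's mean. This is a minimal illustration on hypothetical `(station, year, coefficient)` records; the report does not show how its RMS values were computed, so the details here are assumptions.

```python
from collections import defaultdict
from math import sqrt


def rms(values):
    """Root mean square of a sequence of numbers."""
    return sqrt(sum(v * v for v in values) / len(values))


def remove_group_mean(records, key_index):
    """Subtract from each coefficient the mean of its group (station or year)."""
    groups = defaultdict(list)
    for rec in records:
        groups[rec[key_index]].append(rec[2])
    means = {key: sum(vals) / len(vals) for key, vals in groups.items()}
    return [rec[2] - means[rec[key_index]] for rec in records]


# Hypothetical (station, year, coefficient) records, for illustration only.
records = [
    ("A", 2000, 10.0), ("A", 2001, 12.0),
    ("B", 2000, -10.0), ("B", 2001, -12.0),
]

total = rms([rec[2] for rec in records])
by_station = rms(remove_group_mean(records, 0))
by_year = rms(remove_group_mean(records, 1))
print(total, by_station, by_year)
```

In this toy example the stations differ strongly while the years do not, so removing the mean-by-station leaves a much smaller RMS than removing the mean-by-year, which is the pattern reported for coeff_1.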

### Residual Analysis (PRCP)
Here we use principal components to analyze the PRCP measurements.
![PRCP_VarExplained1.png](report_figures/PRCP_VarExplained1.png)
![PRCP_mean_eigs.png](report_figures/PRCP_mean_eigs.png)
![PRCP_res_3.png](report_figures/PRCP_res_3.png)


The figures above show the percentage of variance explained as a function of the number of eigenvectors included, and the top three eigenvectors plotted alongside the mean for comparison.

total RMS                   =  82.72441947<br />
RMS removing mean-by-station=  78.8430007305<br />
RMS removing mean-by-day    =  44.2530387151<br />

![CDF_daily_rainfall.png](report_figures/CDF_daily_rainfall.png)

The CDF shows that it rains on about half of the days in this region.
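
A statement like this can be read off an empirical CDF of daily rainfall: the CDF value at zero is the fraction of dry days. The sketch below uses a short hypothetical list of daily PRCP values and treats any value above zero as a rainy day; both the data and the threshold are assumptions.

```python
def empirical_cdf(values):
    """Return the sorted values and the fraction of observations <= each one."""
    ordered = sorted(values)
    n = len(ordered)
    return ordered, [(i + 1) / n for i in range(n)]


def fraction_rainy(values, threshold=0.0):
    """Fraction of days with precipitation strictly above the threshold."""
    return sum(1 for v in values if v > threshold) / len(values)


# Hypothetical daily precipitation values, for illustration only.
daily_prcp = [0.0, 0.0, 1.2, 0.0, 5.4, 0.3, 0.0, 2.1]
xs, cdf = empirical_cdf(daily_prcp)
rainy = fraction_rainy(daily_prcp)
print(rainy)
```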


Next we use a p-value analysis to examine the correlation between the stations in the region. In the histogram, 7 bars have a significance below 0.05, which indicates that 7 pairs of stations are strongly correlated. The P_norm figure supports this.


![P_norm_hist.png](report_figures/P_norm_hist.png)

![P_norm.png](report_figures/P_norm.png)
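
The report does not spell out which test produced these p-values. One possible approach, shown here only as a sketch, is an exact hypergeometric tail test: given how many days each of two stations recorded rain, it returns the probability of seeing at least the observed number of shared rainy days if the two stations were independent. The function name and the numbers are hypothetical, and `math.comb` requires Python 3.8 or later.

```python
from math import comb


def cooccurrence_p_value(n_days, rainy_a, rainy_b, both):
    """P(overlap >= both) if the rainy days of the two stations were independent."""
    upper = min(rainy_a, rainy_b)
    total = comb(n_days, rainy_b)  # ways to place station B's rainy days
    tail = sum(
        comb(rainy_a, k) * comb(n_days - rainy_a, rainy_b - k)
        for k in range(both, upper + 1)
    )
    return tail / total


# Hypothetical counts: 10 days, each station rainy on 5, all 5 shared.
p = cooccurrence_p_value(n_days=10, rainy_a=5, rainy_b=5, both=5)
print(p < 0.05)
```

A small p-value means the overlap is unlikely to arise by chance, which is the sense in which a pair of stations would be called strongly correlated.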

A group of highly correlated stations is:

![correlated_group.png](report_figures/correlated_group.png)

The next figure shows how well 4 eigenvectors can reconstruct the data.

![reconstruct_residue.png](report_figures/reconstruct_residue.png)

Lastly, reordering the rows and columns of the matrix according to one of the eigenvectors makes the grouping more evident, as shown in the following figure.

![re_order_matrix.png](report_figures/re_order_matrix.png)
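
The reordering step can be sketched as follows: sort the station indices by their entries in the chosen eigenvector, then apply the same permutation to both the rows and the columns of the matrix. The 3x3 matrix and the vector below are hypothetical and only illustrate the mechanics.

```python
def reorder_by_vector(matrix, vector):
    """Permute rows and columns of a square matrix by ascending entries of vector."""
    order = sorted(range(len(vector)), key=lambda i: vector[i])
    reordered = [[matrix[i][j] for j in order] for i in order]
    return order, reordered


# Hypothetical station-by-station matrix and eigenvector, for illustration only.
matrix = [
    [1.0, 0.1, 0.9],
    [0.1, 1.0, 0.2],
    [0.9, 0.2, 1.0],
]
vector = [0.6, -0.7, 0.5]

order, reordered = reorder_by_vector(matrix, vector)
print(order)
for row in reordered:
    print(row)
```

After the permutation, stations 2 and 0, which have similar eigenvector entries and a high mutual value, sit next to each other, so their block becomes visible along the diagonal.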

The correlations are plotted on the map below.

![map.png](report_figures/map.png)