# PCA with EC-ERA5

In the [first notebook](https://github.com/tingsyo/taiwan_weather_types/blob/main/notebook/01_PCA_based_Clustering.ipynb) we performed clustering on PCA results derived from the NCEP-CFSR dataset. Here we repeat a similar analysis with the EC-ERA5 dataset.

The ERA5 dataset was preprocessed into single-variable files in netCDF4 format (`.nc`). The data domain covers East Asia (10-50°N, 100-140°E) on a 161 × 161 grid (0.25° spacing). The PCA was performed with the scripts [`ipca_era5.py`](https://github.com/tingsyo/taiwan_weather_types/blob/main/utils/ipca_era5.py) and [`run_ipca_era5.sh`](https://github.com/tingsyo/taiwan_weather_types/blob/main/utils/run_era5_ipca.sh).
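For readers unfamiliar with incremental PCA, here is a minimal sketch of the general technique on synthetic data. It is not the content of `ipca_era5.py`: the random array, the reduced grid size, `n_components`, and `batch_size` are all assumptions chosen to keep the example small and fast.

```python
import numpy as np
from sklearn.decomposition import IncrementalPCA

# Hypothetical stand-in for one ERA5 variable with shape (time, lat, lon).
# The real grid is (161, 161); a smaller grid keeps this sketch fast.
rng = np.random.default_rng(0)
n_time, n_lat, n_lon = 200, 21, 21
field = rng.normal(size=(n_time, n_lat, n_lon))

# Each time step is one sample; every grid point is one feature.
X = field.reshape(n_time, -1)

n_components = 20  # assumption for this sketch
batch_size = 50    # each batch must hold at least n_components samples

# Fit the model one batch at a time, so the full record never has to
# be decomposed in a single step.
ipca = IncrementalPCA(n_components=n_components)
for start in range(0, n_time, batch_size):
    ipca.partial_fit(X[start:start + batch_size])

# Project every time step onto the fitted components.
scores = ipca.transform(X)
print(scores.shape)
```

A fitted model of this kind can be saved with `joblib.dump` and reloaded later with `joblib.load`, which is how the models are read in the cells below.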

In what follows we only load and inspect the saved PCA results.

In [1]:
import joblib
import numpy as np
from sklearn.decomposition import PCA, IncrementalPCA
import pandas as pd
# Parameters: model directory and the variable/pressure-level layers
MODEL_PATH = '../data/pca_era5'
LAYERS = ['q925','t925','u925','v925','q800','t800','u800','v800','q700','t700','u700','v700','h500','u200','v200']
ts = pd.read_csv('../data/era5_timestamp.csv')
ts.head()

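| Unnamed: 0 | timestamp  | year | month | day | hour |
|-----------:|-----------:|-----:|------:|----:|-----:|
| 0          | 1979010100 | 1979 | 1     | 1   | 0    |
| 1          | 1979010200 | 1979 | 1     | 2   | 0    |
| 2          | 1979010300 | 1979 | 1     | 3   | 0    |
| 3          | 1979010400 | 1979 | 1     | 4   | 0    |
| 4          | 1979010500 | 1979 | 1     | 5   | 0    |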
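The `timestamp` column is an integer in `YYYYMMDDHH` form. If proper datetimes are needed later (for example, to group results by season), one possible conversion is sketched below. The small frame is rebuilt from the rows shown above so that the snippet runs on its own; the `datetime` column name is an assumption.

```python
import pandas as pd

# Rebuild the first rows shown above so the snippet is self-contained.
ts = pd.DataFrame({
    'timestamp': [1979010100, 1979010200, 1979010300, 1979010400, 1979010500],
})

# Parse the YYYYMMDDHH integers into pandas datetimes.
ts['datetime'] = pd.to_datetime(ts['timestamp'].astype(str), format='%Y%m%d%H')
print(ts)
```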


In [3]:
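# Load the fitted PCA model for the h500 layer
h500 = joblib.load(f'{MODEL_PATH}/h500.pca.mod')
# Cumulative fraction of variance explained by the leading components
print(np.cumsum(h500.explained_variance_ratio_))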

[0.84469425 0.89534185 0.92990369 0.94918399 0.96028474 0.96989363
 0.97481601 0.97900806 0.98285091 0.98571433 0.98758365 0.98892287
 0.99025766 0.99133559 0.99219327 0.99293092 0.99355441 0.99411804
 0.99456543 0.99494337 0.99527978 0.99558428 0.99588309 0.99615418
 0.99640423 0.99660879 0.99680068 0.99696565 0.99712413 0.99728123
 0.99742019 0.99754613 0.9976523  0.99775401 0.99784742 0.99793727
 0.99802466 0.99810715 0.99818696 0.99825575 0.99832236 0.99838066
 0.99843456 0.99848791 0.99853715 0.99858533 0.99863121 0.99867442
 0.99871592 0.99875551]
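The printed array has 50 entries, and the first component alone accounts for about 84% of the variance. A common follow-up is to ask how many components are needed to reach a given share of the variance. The sketch below answers that with `np.searchsorted`; the helper name `n_components_for` is hypothetical, and the array holds the first 13 values copied from the output above.

```python
import numpy as np

# First 13 cumulative explained-variance ratios, copied from the output above.
cum = np.array([
    0.84469425, 0.89534185, 0.92990369, 0.94918399, 0.96028474,
    0.96989363, 0.97481601, 0.97900806, 0.98285091, 0.98571433,
    0.98758365, 0.98892287, 0.99025766,
])

def n_components_for(cum_ratio, threshold):
    """Smallest number of leading components whose cumulative ratio >= threshold."""
    idx = int(np.searchsorted(cum_ratio, threshold))
    if idx == len(cum_ratio):
        raise ValueError('threshold is not reached by the available components')
    return idx + 1

n_95 = n_components_for(cum, 0.95)
n_99 = n_components_for(cum, 0.99)
print(n_95, n_99)  # 5 13
```

With the full model, the same helper can be applied to `np.cumsum(h500.explained_variance_ratio_)` directly.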
