When two features are positively correlated, their covariance is greater than zero; when they are negatively correlated, it is less than zero. If there is no linear relationship between them, the covariance is equal (or very close) to zero.
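
To make the sign rule concrete, here is a minimal sketch on synthetic data. The arrays `x`, `y_pos`, `y_neg`, `y_ind` and the helper `cov_xy` are made up for illustration and are not part of the dataset used later in this article.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=5000)
noise = rng.normal(size=5000)

y_pos = 2.0 * x + 0.5 * noise    # tends to move with x
y_neg = -2.0 * x + 0.5 * noise   # tends to move against x
y_ind = rng.normal(size=5000)    # generated independently of x


def cov_xy(a, b):
    """Sample covariance of two 1-D arrays (off-diagonal entry of np.cov)."""
    return np.cov(a, b)[0, 1]


cov_pos = cov_xy(x, y_pos)
cov_neg = cov_xy(x, y_neg)
cov_ind = cov_xy(x, y_ind)

print("positively related:", cov_pos)
print("negatively related:", cov_neg)
print("unrelated:         ", cov_ind)
```

On a finite sample the covariance of unrelated features is rarely exactly zero; it only shrinks toward zero as the sample grows.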

As you can see, the covariance matrix defines both the spread (variance) and the orientation (covariance) of our data. Two further elements can be associated with this matrix: a representative vector and a number that indicates its magnitude. The vector points in the direction of the largest spread of the data, and the number equals the spread (variance) along that direction. These two elements are, respectively, an eigenvector and an eigenvalue. Let’s visualize them:


![](https://miro.medium.com/v2/resize:fit:720/format:webp/1*PimvwRx26O9SmJxq4CHfOA.png)

The direction in green is the eigenvector; its corresponding value, called the eigenvalue, describes the magnitude of the spread along that direction.

![](https://miro.medium.com/v2/resize:fit:720/format:webp/1*jhaTRyyEl-EUVLLupwPGsQ.png)

![](https://miro.medium.com/v2/resize:fit:720/format:webp/1*NlAfmaMOVwIvQyu9aFKSKA.png)
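
The relationship pictured above can also be checked numerically. The following is a rough sketch on hypothetical 2-D data (the point cloud, the 30 degree rotation, and all variable names are assumptions chosen for illustration): it computes the covariance matrix, takes its eigen-decomposition with `np.linalg.eigh`, and compares the largest eigenvalue with the variance of the data projected onto the matching eigenvector.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical cloud: stretched along the x-axis, then rotated by 30 degrees.
theta = np.deg2rad(30)
rotation = np.array([[np.cos(theta), -np.sin(theta)],
                     [np.sin(theta),  np.cos(theta)]])
points = rng.normal(size=(2000, 2)) * np.array([3.0, 0.5])  # std 3.0 and 0.5
data = points @ rotation.T

# 2x2 covariance matrix of the two features (columns).
cov = np.cov(data, rowvar=False)

# eigh is meant for symmetric matrices and returns eigenvalues in ascending order.
eigenvalues, eigenvectors = np.linalg.eigh(cov)
top_value = eigenvalues[-1]        # largest eigenvalue
top_vector = eigenvectors[:, -1]   # its eigenvector (unit length)

# Variance of the data after projecting it onto the top eigenvector.
projected_variance = np.var(data @ top_vector, ddof=1)

print("top eigenvector:  ", top_vector)
print("top eigenvalue:   ", top_value)
print("variance along it:", projected_variance)
```

The eigenvector should line up (up to sign) with the direction in which the cloud was stretched, and the two printed numbers should agree up to floating-point error.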

In [3]:
import pandas as pd
import plotly.express as pe

from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

In [4]:
df = pd.read_csv("/home/harshit/Desktop/IntroductionToML/Dataset/AAPL.csv", index_col="Date", parse_dates=True)
df

| Date | Open | High | Low | Close | Adj Close | Volume |
|---|---|---|---|---|---|---|
| 2016-04-11 | 27.242500 | 27.652500 | 27.207500 | 27.254999 | 25.350115 | 117630000 |
| 2016-04-12 | 27.334999 | 27.625000 | 27.165001 | 27.610001 | 25.680305 | 108929200 |
| 2016-04-13 | 27.700001 | 28.084999 | 27.700001 | 28.010000 | 26.052345 | 133029200 |
| 2016-04-14 | 27.905001 | 28.097500 | 27.832500 | 28.025000 | 26.066298 | 101895600 |
| 2016-04-15 | 28.027500 | 28.075001 | 27.432501 | 27.462500 | 25.543112 | 187756000 |
| ... | ... | ... | ... | ... | ... | ... |
| 2021-04-01 | 123.660004 | 124.180000 | 122.489998 | 123.000000 | 123.000000 | 74957400 |
| 2021-04-05 | 123.870003 | 126.160004 | 123.070000 | 125.900002 | 125.900002 | 88651200 |
| 2021-04-06 | 126.500000 | 127.129997 | 125.650002 | 126.209999 | 126.209999 | 80171300 |
| 2021-04-07 | 125.830002 | 127.919998 | 125.139999 | 127.900002 | 127.900002 | 83466700 |


In [5]:
df.dropna(inplace=True)

In [8]:
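# Standardize every column to zero mean and unit variance,
# so that the large raw Volume values do not dominate the PCA.
sc = StandardScaler()
df[["Open", "High", "Low", "Close", "Adj Close", "Volume"]] = sc.fit_transform(df[["Open", "High", "Low", "Close", "Adj Close", "Volume"]])
df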

| Date | Open | High | Low | Close | Adj Close | Volume |
|---|---|---|---|---|---|---|
| 2016-04-11 | -0.985547 | -0.978638 | -0.983646 | -0.987387 | -0.996965 | -0.208524 |
| 2016-04-12 | -0.982498 | -0.979532 | -0.985072 | -0.975671 | -0.986238 | -0.356665 |
| 2016-04-13 | -0.970466 | -0.964583 | -0.967118 | -0.962469 | -0.974152 | 0.053664 |
| 2016-04-14 | -0.963708 | -0.964177 | -0.962671 | -0.961974 | -0.973698 | -0.476419 |
| 2016-04-15 | -0.959670 | -0.964908 | -0.976095 | -0.980539 | -0.990695 | 0.985448 |
| ... | ... | ... | ... | ... | ... | ... |
| 2021-04-01 | 2.192872 | 2.158211 | 2.213956 | 2.172644 | 2.175448 | -0.935072 |
| 2021-04-05 | 2.199795 | 2.222555 | 2.233421 | 2.268358 | 2.269662 | -0.701920 |
| 2021-04-06 | 2.286493 | 2.254077 | 2.320003 | 2.278589 | 2.279733 | -0.846299 |
| 2021-04-07 | 2.264407 | 2.279749 | 2.302888 | 2.334367 | 2.334637 | -0.790192 |
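
For readers wondering what the scaler did to the numbers, here is a small sketch. It assumes a toy frame named `toy` with a few rounded values in the style of the rows shown earlier (the frame and the variable names are illustrative only) and checks that `StandardScaler` matches the by-hand formula: subtract the column mean, then divide by the column's population standard deviation.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Hypothetical toy frame standing in for two of the columns.
toy = pd.DataFrame({
    "Close": [27.25, 27.61, 28.01, 28.03, 27.46],
    "Volume": [117_630_000, 108_929_200, 133_029_200, 101_895_600, 187_756_000],
})

scaled = StandardScaler().fit_transform(toy)

# Same computation by hand. StandardScaler uses the population
# standard deviation, hence ddof=0 rather than pandas' default ddof=1.
manual = (toy - toy.mean()) / toy.std(ddof=0)

print("matches manual formula:", np.allclose(scaled, manual.to_numpy()))
print("column means after scaling:", scaled.mean(axis=0))
print("column stds after scaling: ", scaled.std(axis=0))
```

After scaling, both columns live on the same scale, which is why prices around 27 and volumes above 100 million end up as comparable values in the table above.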


In [13]:
results = []
# Fit PCA with 1 to 5 components and display the projected data for each setting.
for dimension in range(1, 6):
    model = PCA(n_components=dimension)
    pca_features = model.fit_transform(df[["Open", "High", "Low", "Close", "Adj Close", "Volume"]])

    col_names = [f"PCA_{count}" for count in range(1, dimension + 1)]
    result = pd.DataFrame(pca_features, columns=col_names)
    display(result)
    # Fraction of the total variance captured by each retained component.
    ans = model.explained_variance_ratio_
    results.append(ans)


|  | PCA_1 |
|---|---|
| 0 | -2.207812 |
| 1 | -2.199009 |
| 2 | -2.163286 |
| 3 | -2.163250 |
| 4 | -2.168289 |
| ... | ... |
| 1253 | 4.870385 |
| 1254 | 4.998351 |
| 1255 | 5.097492 |
| 1256 | 5.141531 |


|  | PCA_1 | PCA_2 |
|---|---|---|
| 0 | -2.207812 | -0.185235 |
| 1 | -2.199009 | -0.333465 |
| 2 | -2.163286 | 0.076449 |
| 3 | -2.163250 | -0.453653 |
| 4 | -2.168289 | 1.008364 |
| ... | ... | ... |
| 1253 | 4.870385 | -0.986797 |
| 1254 | 4.998351 | -0.754671 |
| 1255 | 5.097492 | -0.900518 |
| 1256 | 5.141531 | -0.844557 |


|  | PCA_1 | PCA_2 | PCA_3 |
|---|---|---|---|
| 0 | -2.207812 | -0.185235 | 0.008558 |
| 1 | -2.199009 | -0.333465 | -0.000425 |
| 2 | -2.163286 | 0.076449 | -0.000499 |
| 3 | -2.163250 | -0.453653 | 0.005104 |
| 4 | -2.168289 | 1.008364 | 0.021255 |
| ... | ... | ... | ... |
| 1253 | 4.870385 | -0.986797 | 0.015783 |
| 1254 | 4.998351 | -0.754671 | -0.058666 |
| 1255 | 5.097492 | -0.900518 | 0.006147 |
| 1256 | 5.141531 | -0.844557 | -0.060260 |


|  | PCA_1 | PCA_2 | PCA_3 | PCA_4 |
|---|---|---|---|---|
| 0 | -2.207812 | -0.185235 | 0.008558 | 0.003286 |
| 1 | -2.199009 | -0.333465 | -0.000425 | 0.007959 |
| 2 | -2.163286 | 0.076449 | -0.000499 | -0.000210 |
| 3 | -2.163250 | -0.453653 | 0.005104 | 0.003584 |
| 4 | -2.168289 | 1.008364 | 0.021255 | -0.006340 |
| ... | ... | ... | ... | ... |
| 1253 | 4.870385 | -0.986797 | 0.015783 | -0.027653 |
| 1254 | 4.998351 | -0.754671 | -0.058666 | 0.005971 |
| 1255 | 5.097492 | -0.900518 | 0.006147 | -0.035718 |
| 1256 | 5.141531 | -0.844557 | -0.060260 | -0.000878 |


|  | PCA_1 | PCA_2 | PCA_3 | PCA_4 | PCA_5 |
|---|---|---|---|---|---|
| 0 | -2.207812 | -0.185235 | 0.008558 | 0.003286 | 0.009655 |
| 1 | -2.199009 | -0.333465 | -0.000425 | 0.007959 | 0.002348 |
| 2 | -2.163286 | 0.076449 | -0.000499 | -0.000210 | 0.005171 |
| 3 | -2.163250 | -0.453653 | 0.005104 | 0.003584 | 0.004160 |
| 4 | -2.168289 | 1.008364 | 0.021255 | -0.006340 | 0.001114 |
| ... | ... | ... | ... | ... | ... |
| 1253 | 4.870385 | -0.986797 | 0.015783 | -0.027653 | -0.005931 |
| 1254 | 4.998351 | -0.754671 | -0.058666 | 0.005971 | -0.002456 |
| 1255 | 5.097492 | -0.900518 | 0.006147 | -0.035718 | -0.005359 |
| 1256 | 5.141531 | -0.844557 | -0.060260 | -0.000878 | -0.005872 |
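
The loop collects `explained_variance_ratio_` for every setting, and one common way to read those numbers is as a cumulative sum. The sketch below shows the idea on synthetic data, since the CSV file is not available here: the array `X` is an assumed stand-in (five nearly identical "price" columns plus one unrelated "volume" column), and names such as `full`, `cumulative`, and `n_keep` are illustrative. The figures it prints therefore say nothing about the actual AAPL results.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Synthetic stand-in for the stock table: five nearly identical "price"
# columns driven by one random walk, plus an unrelated "volume" column.
n_rows = 500
price = np.cumsum(rng.normal(size=n_rows))
columns = [price + rng.normal(scale=0.05, size=n_rows) for _ in range(5)]
columns.append(rng.normal(size=n_rows))
X = StandardScaler().fit_transform(np.column_stack(columns))

# A single fit that keeps every component gives the whole spectrum at once.
full = PCA(n_components=X.shape[1]).fit(X)
ratios = full.explained_variance_ratio_
cumulative = np.cumsum(ratios)

for k, (ratio, total) in enumerate(zip(ratios, cumulative), start=1):
    print(f"PCA_{k}: ratio={ratio:.4f}  cumulative={total:.4f}")

# Smallest number of components whose cumulative ratio reaches 95%.
n_keep = int(np.searchsorted(cumulative, 0.95) + 1)
print("components needed for 95% of the variance:", n_keep)
```

Because the ratios are always measured against the total variance of all features, a single full fit is usually enough to choose the number of components; refitting for each value of `n_components`, as in the loop above, is mainly useful when you also want to inspect the projected data.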
