# Chapter 10

# Question 8

Manually calculating the proportion of variance explained (PVE)

In [1]:
import statsmodels.api as sm
import numpy as np
import sklearn.decomposition
import sklearn.preprocessing

On the `USArrests` data, calculate PVE in two ways:

In [2]:
usarrests = sm.datasets.get_rdataset("USArrests", "datasets").data
usarrests.head()

| | Murder | Assault | UrbanPop | Rape |
|---|---|---|---|---|
| Alabama | 13.2 | 236 | 58 | 21.2 |
| Alaska | 10.0 | 263 | 48 | 44.5 |
| Arizona | 8.1 | 294 | 80 | 31.0 |
| Arkansas | 8.8 | 190 | 50 | 19.5 |
| California | 9.0 | 276 | 91 | 40.6 |


In [3]:
X = sklearn.preprocessing.StandardScaler().fit_transform(usarrests)

### (a) Using the `sdev` output of the `prcomp()` function

`sdev` holds the standard deviation of each principal component. Squaring it gives the variance explained by each component, and dividing by the sum over all components gives the PVE. In scikit-learn, `PCA.explained_variance_ratio_` reports this normalised quantity directly.
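The squaring and normalising step can also be sketched with NumPy alone. This is a minimal illustration, not the notebook's own method: `X_demo` is a hypothetical random matrix standing in for the scaled `USArrests` data, and the component standard deviations are taken from the singular values using the `n - 1` divisor that R's `prcomp()` uses.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for the scaled data: 50 observations, 4 variables.
X_demo = rng.normal(size=(50, 4))
X_demo = (X_demo - X_demo.mean(axis=0)) / X_demo.std(axis=0)

# For centred data, the standard deviation of component m is
# s_m / sqrt(n - 1), where s_m is the m-th singular value.
n = X_demo.shape[0]
singular_values = np.linalg.svd(X_demo, compute_uv=False)
sdev = singular_values / np.sqrt(n - 1)

# Square to get variances, then normalise so the entries sum to 1.
variances = sdev ** 2
pve_demo = variances / variances.sum()
print(pve_demo)
```

The divisor (`n - 1` or `n`) cancels in the ratio, so the PVE does not depend on which convention is used for the variance.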


In [4]:
pca = sklearn.decomposition.PCA()
pca.fit(X)
print(pca.explained_variance_ratio_)

[0.62006039 0.24744129 0.0891408  0.04335752]


### (b) Using Equation 10.8 directly.

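Equation 10.8 gives the PVE of the $m$th principal component, where $\phi_{jm}$ is the loading of variable $j$ on component $m$ and the variables are assumed to be centred: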

$$ \textrm{PVE}_{m} =  \frac{ \sum_{i=1}^n \left( \sum_{j=1}^p \phi_{jm} x_{ij} \right)^2 }{  \sum_{i=1}^n \sum_{j=1}^p x_{ij}^2 } $$

In [5]:
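# Total sum of squares. StandardScaler gives each column a sum of squares of n,
# so this equals n * p, the number of elements in X.
denominator = np.sum(np.square(X))

# Scores z_im = sum_j phi_jm * x_ij; sum their squares over the observations.
numerator = np.sum(np.square(X @ pca.components_.T), axis=0)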

pve = numerator/denominator
print(pve)

[0.62006039 0.24744129 0.0891408  0.04335752]
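The agreement between (a) and (b) is not a coincidence: the sum of squared scores for component $m$ equals the squared $m$th singular value of the centred data. A small NumPy-only check of this, again on a hypothetical random matrix (`X_check`) rather than the `USArrests` data, might look like the following.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical stand-in data with correlated columns, then standardised.
X_check = rng.normal(size=(50, 4)) @ rng.normal(size=(4, 4))
X_check = (X_check - X_check.mean(axis=0)) / X_check.std(axis=0)

# Rows of Vt are the loading vectors phi_m; s holds the singular values.
_, s, Vt = np.linalg.svd(X_check, full_matrices=False)

# (a) PVE from the component variances (proportional to s**2).
pve_a = s ** 2 / np.sum(s ** 2)

# (b) PVE from Equation 10.8: squared scores over total sum of squares.
scores = X_check @ Vt.T
pve_b = np.sum(scores ** 2, axis=0) / np.sum(X_check ** 2)

# The two routes should agree up to floating-point error.
print(np.allclose(pve_a, pve_b))
```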
