Principal Component Analysis (PCA)
==================================

Introduction
------------

After alignment, the dimensionality of the system must be reduced, and one way
to accomplish this is principal component analysis (PCA). We read in the
trajectory and perform PCA on the coordinates, then determine how many
components to pass on to QAA. Typically, the retained components should capture
most of the variance in the data (i.e. $\geq 95\%$). The required number can be
read from a plot of the explained variance ratio. Additionally, the program
displays the number of components needed to capture various percentages
($85\% \leq x \leq 95\%$ in increments of 5%) along with the percentage of
information captured by 50 or 100 components.
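
To make the decomposition concrete, here is a minimal NumPy sketch of PCA on
flattened Cartesian coordinates. It is not the `qaa pca` implementation; the
function name `pca_svd`, the array shape `(n_frames, n_atoms, 3)`, and the
synthetic random data are assumptions made purely for illustration.

```python
import numpy as np


def pca_svd(coords):
    """PCA of a (n_frames, n_atoms, 3) coordinate array via SVD.

    Hypothetical helper for illustration; not part of the qaa package.
    """
    n_frames = coords.shape[0]
    flat = coords.reshape(n_frames, -1)      # one row of 3N values per frame
    centered = flat - flat.mean(axis=0)      # subtract the mean structure
    u, s, vt = np.linalg.svd(centered, full_matrices=False)
    variance = s**2 / (n_frames - 1)         # variance along each component
    ratio = variance / variance.sum()        # explained variance ratio
    projection = u * s                       # frames projected onto the PCs
    return projection, vt, ratio


# Synthetic stand-in for an aligned trajectory (assumption: 200 frames, 10 atoms).
rng = np.random.default_rng(0)
coords = rng.normal(size=(200, 10, 3))
projection, components, ratio = pca_svd(coords)
```

The rows of `components` are the principal axes, `projection` holds the
per-frame scores, and `ratio` is sorted from the largest to the smallest
contribution, so its running sum shows how much variance the leading
components capture.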

Perform PCA
-----------

In [None]:
!qaa pca -s output/average.pdb -f output/align.nc -o output \
    -l output/pca.log -v

Visualize the data
------------------

We can create plots of the data using `qaa plot`.

In [None]:
!qaa plot -i output/projection.csv -o output/projection.png \
    -l output/proj-plot.log --pca -v

<img src="output/projection.png" alt="PCA projection plot">

Or, in a Jupyter notebook, we can use the following code to analyze the data
interactively.

In [None]:
%matplotlib notebook
import holoviews as hv
import numpy as np
import pandas as pd
from holoviews import opts

hv.extension("plotly")

### Explained variance ratio

In [None]:
data = pd.read_csv("output/explained_variance_ratio.csv", header=0)

evr = hv.Curve(data, kdims="Component", vdims="Percentage of Explained Variance")
evr.opts(opts.Curve(color="black", line_width=1.5))
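
The number of components needed for a given threshold can also be read off
numerically from the cumulative sum of the ratios. The sketch below uses a small
made-up table in place of `output/explained_variance_ratio.csv`; it assumes the
column names used above and that the values are stored as fractions between 0
and 1 rather than percentages. The helper `components_needed` is hypothetical.

```python
import numpy as np
import pandas as pd


def components_needed(ratios, threshold):
    """Return the smallest number of leading components whose cumulative
    explained variance ratio reaches `threshold`, or None if it never does.

    Hypothetical helper; assumes `ratios` are fractions sorted by component.
    """
    cumulative = np.cumsum(np.asarray(ratios, dtype=float))
    reached = cumulative >= threshold
    if not reached.any():
        return None
    return int(np.argmax(reached)) + 1       # argmax finds the first True


# Made-up values standing in for the CSV contents (assumption).
example = pd.DataFrame(
    {
        "Component": [1, 2, 3, 4, 5, 6],
        "Percentage of Explained Variance": [0.55, 0.27, 0.09, 0.05, 0.03, 0.01],
    }
)
ratios = example["Percentage of Explained Variance"]
needed = {t: components_needed(ratios, t) for t in (0.85, 0.90, 0.95)}
```

If the file stores percentages (0 to 100) instead, the thresholds would need to
be scaled accordingly before calling the helper.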

### PCA projections

In [None]:
projection = pd.read_csv("output/projection.csv", header=0)
dataset = hv.Dataset(projection)
scatter = hv.Scatter3D(dataset, kdims=["PC1", "PC2", "PC3"])
scatter.opts(title="First 3 PCs")