# Principal Component Analysis

#### *variationalform* <https://variationalform.github.io/>

#### *Just Enough: progress at pace*

<https://variationalform.github.io/>

<https://github.com/variationalform>

Simon Shaw
<https://www.brunel.ac.uk/people/simon-shaw>.


<table>
<tr>
<td>
<img src="https://mirrors.creativecommons.org/presskit/icons/cc.svg?ref=chooser-v1" style="height:18px"/>
<img src="https://mirrors.creativecommons.org/presskit/icons/by.svg?ref=chooser-v1" style="height:18px"/>
<img src="https://mirrors.creativecommons.org/presskit/icons/sa.svg?ref=chooser-v1" style="height:18px"/>
</td>
<td>

<p>
This work is licensed under CC BY-SA 4.0 (Attribution-ShareAlike 4.0 International)

<p>
Visit <a href="http://creativecommons.org/licenses/by-sa/4.0/">http://creativecommons.org/licenses/by-sa/4.0/</a> to see the terms.
</td>
</tr>
</table>

<table>
<tr>
<td>This document uses python</td>
<td>
<img src="https://www.python.org/static/community_logos/python-logo-master-v3-TM.png" style="height:30px"/>
</td>
<td>and also makes use of LaTeX </td>
<td>
<img src="https://upload.wikimedia.org/wikipedia/commons/thumb/9/92/LaTeX_logo.svg/320px-LaTeX_logo.svg.png" style="height:30px"/>
</td>
<td>in Markdown</td> 
<td>
<img src="https://github.com/adam-p/markdown-here/raw/master/src/common/images/icon48.png" style="height:30px"/>
</td>
</tr>
</table>

## What this is about:

The connection between SVD, the **Singular Value Decomposition**, and PCA,
**Principal Component Analysis**.

As usual our emphasis will be on *doing* rather than *proving*:
*just enough: progress at pace*


## Assigned Reading

For this worksheet we recommend Chapters 4 and 10 of [MML],
Chapter 10 of [MLFCES], and Chapter 5.3 of [IPDS].

- MML: Mathematics for Machine Learning, by Marc Peter Deisenroth, A. Aldo Faisal, and Cheng Soon Ong.
  Cambridge University Press. <https://mml-book.github.io>.
- MLFCES: Machine Learning: A First Course for Engineers and Scientists, by Andreas Lindholm,
  Niklas Wahlström, Fredrik Lindsten, Thomas B. Schön. Cambridge University Press. 
  <http://smlbook.org>.
- IPDS: Introduction to Probability for Data Science, by Stanley H. Chan,
  <https://probability4datascience.com>

These can be accessed legally and without cost.

There are also these useful references for coding:

- PT: `python`: <https://docs.python.org/3/tutorial>
- NP: `numpy`: <https://numpy.org/doc/stable/user/quickstart.html>
- MPL: `matplotlib`: <https://matplotlib.org>

## Review


We have already seen these two matrix factorizations:

- Eigenvalue decomposition
- SVD, the **Singular Value Decomposition**

Let's review them...

## Eigen-systems of Symmetric Matrices

Given a real, square and symmetric $n$-row by $n$-column matrix,
$\boldsymbol{A}\in\mathbb{R}^{n\times n}$ with $\boldsymbol{A}^T=\boldsymbol{A}$,
the eigenvalue problem is that of finding scalar eigenvalues $\lambda$
and non-zero $n$-dimensional eigenvectors $\boldsymbol{v}$ such that

$$
\boldsymbol{A}\boldsymbol{v}=\lambda\boldsymbol{v}    
\quad\Longrightarrow\quad
\boldsymbol{A}\boldsymbol{V}=\boldsymbol{V}\boldsymbol{D}  
\quad\Longrightarrow\quad
\boldsymbol{A} = 
\sum_{k=1}^n
\lambda_k\boldsymbol{v}_k\boldsymbol{v}_k^T.
$$

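Here the columns of $\boldsymbol{V}$ are the eigenvectors $\boldsymbol{v}_k$,
$\boldsymbol{D}=\text{diag}(\lambda_1,\ldots,\lambda_n)$, and the final sum
assumes that the $\boldsymbol{v}_k$ have been chosen to be orthonormal.

Because $\boldsymbol{A}$ is symmetric, the eigensystem is **real**.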

We have the *Spectral Theorem* - see [MML, Theorem 4.15]

> **Spectral Theorem (for matrices)**
> If $\boldsymbol{A}$ is real and symmetric then its eigenvalues are
> all real and its eigenvector matrix $\boldsymbol{V}$ can be taken
> as *orthogonal* so that $\boldsymbol{V}^{-1}=\boldsymbol{V}^T$.

Hence...

$$
\boldsymbol{A}=\boldsymbol{V}\boldsymbol{D}\boldsymbol{V}^T  
$$
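As a quick numerical check of the Spectral Theorem, here is a minimal sketch using `numpy.linalg.eigh`, which is intended for symmetric matrices. The matrix `A` below is an arbitrary example chosen for illustration, and the variable names are our own.

```python
import numpy as np

# A small symmetric matrix, chosen purely for illustration.
A = np.array([[2.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 4.0]])

# eigh assumes a symmetric input: it returns real eigenvalues (in ascending
# order) and orthonormal eigenvectors as the columns of V.
lam, V = np.linalg.eigh(A)
D = np.diag(lam)

# V is orthogonal, so V^T V = I and V^{-1} = V^T.
print(np.allclose(V.T @ V, np.eye(3)))

# The factorization A = V D V^T.
print(np.allclose(A, V @ D @ V.T))

# Equivalently, A is a sum of rank-one terms lambda_k v_k v_k^T.
A_sum = sum(lam[k] * np.outer(V[:, k], V[:, k]) for k in range(3))
print(np.allclose(A, A_sum))
```

Each check should print `True`, up to floating point rounding.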


## The SVD: Singular Value Decomposition

Given a real $m$-row by $n$-column matrix,
$\boldsymbol{B}\in\mathbb{R}^{m\times n}$, its SVD is

$$
\boldsymbol{B} = \boldsymbol{U}\boldsymbol{\Sigma}\boldsymbol{V}^T
=\sum_{j=1}^{p} \sigma_j \boldsymbol{u}_j\boldsymbol{v}_j^T
$$

where: for the left singular vectors: $\boldsymbol{U}\in\mathbb{R}^{m\times m}$;
for the singular values: $\boldsymbol{\Sigma}\in\mathbb{R}^{m\times n}$;
and, for the right singular vectors, $\boldsymbol{V}\in\mathbb{R}^{n\times n}$.
Here $p=\min\{m,n\}$.

Note that $\boldsymbol{\Sigma}=\text{diag}(\sigma_1,\ldots,\sigma_p) + \mathit{zeros}$
(a diagonal block padded with zero rows or columns to make it $m\times n$),
and we can always arrange that $\sigma_1\ge\sigma_2\ge\cdots\ge\sigma_p\ge 0$.

As $\boldsymbol{B}$ is real,
$\boldsymbol{U}$ and $\boldsymbol{V}$ are real and *orthogonal*.

If $\sigma_r\ne 0$ and $\sigma_j= 0$ for all $j>r$ then
$r$ is the rank of $\boldsymbol{B}$.
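The shapes and the rank statement can be checked with `numpy.linalg.svd`. This is only a sketch: the matrix `B` is an arbitrary example (built so that its rank is 2), and the tolerance used to decide which singular values count as zero is our own assumption. Note that `numpy` returns the singular values as a 1D array in descending order, and returns $\boldsymbol{V}^T$ rather than $\boldsymbol{V}$.

```python
import numpy as np

# m = 4 rows, n = 3 columns. Consecutive rows differ by (3, 3, 3),
# so this example has rank 2.
B = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0],
              [7.0, 8.0, 9.0],
              [10.0, 11.0, 12.0]])
m, n = B.shape
p = min(m, n)

# Full SVD: U is m x m, Vt (= V^T) is n x n, s holds the p singular values.
U, s, Vt = np.linalg.svd(B)

# Build the m x n matrix Sigma = diag(s) padded with zeros.
Sigma = np.zeros((m, n))
Sigma[:p, :p] = np.diag(s)

print(U.shape, Sigma.shape, Vt.shape)
print(np.allclose(B, U @ Sigma @ Vt))

# The same matrix as a sum of p rank-one terms sigma_j u_j v_j^T.
B_sum = sum(s[j] * np.outer(U[:, j], Vt[j, :]) for j in range(p))
print(np.allclose(B, B_sum))

# Rank = number of singular values that are non-zero, here judged against
# an assumed relative tolerance because of floating point rounding.
r = int(np.sum(s > 1e-10 * s[0]))
print(r, np.linalg.matrix_rank(B))
```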



## How are these factorizations connected?

On the face of it they are very different. The first applies only to 
square symmetric matrices, while the second applies also to
rectangular, and hence (why?) non-symmetric, matrices.

But... Look at this... Given the SVD 
$\boldsymbol{B} = \boldsymbol{U}\boldsymbol{\Sigma}\boldsymbol{V}^T$
we have,

$$
\boldsymbol{B}^T\boldsymbol{B}
= \Big(\boldsymbol{U}\boldsymbol{\Sigma}\boldsymbol{V}^T\Big)^T
\boldsymbol{U}\boldsymbol{\Sigma}\boldsymbol{V}^T
$$

and remembering that, in general,
$(\boldsymbol{K}\boldsymbol{L})^T = \boldsymbol{L}^T\boldsymbol{K}^T$
(this could be called *taking the transpose through*), we can write,

$$
\boldsymbol{B}^T\boldsymbol{B}
= \boldsymbol{V}\boldsymbol{\Sigma}^T\boldsymbol{U}^T
\boldsymbol{U}\boldsymbol{\Sigma}\boldsymbol{V}^T
= \boldsymbol{V}\boldsymbol{\Sigma}^T\boldsymbol{\Sigma}\boldsymbol{V}^T
$$
because $\boldsymbol{U}^T\boldsymbol{U}=\boldsymbol{I}$ (orthogonal).

Similarly, because also
$\boldsymbol{V}^T\boldsymbol{V}=\boldsymbol{I}$ (orthogonal),

$$
\boldsymbol{B}\boldsymbol{B}^T
= 
\boldsymbol{U}\boldsymbol{\Sigma}\boldsymbol{V}^T
\Big(\boldsymbol{U}\boldsymbol{\Sigma}\boldsymbol{V}^T\Big)^T
= 
\boldsymbol{U}\boldsymbol{\Sigma}\boldsymbol{V}^T
\boldsymbol{V}\boldsymbol{\Sigma}^T\boldsymbol{U}^T
= 
\boldsymbol{U}\boldsymbol{\Sigma}
\boldsymbol{\Sigma}^T\boldsymbol{U}^T.
$$

Do you recognise these?

We have just shown that,

$$
\boldsymbol{B}^T\boldsymbol{B}
= \boldsymbol{V}\boldsymbol{\Sigma}^T\boldsymbol{\Sigma}\boldsymbol{V}^T
\qquad\text{ and }\qquad
\boldsymbol{B}\boldsymbol{B}^T
= 
\boldsymbol{U}\boldsymbol{\Sigma}
\boldsymbol{\Sigma}^T\boldsymbol{U}^T.
$$

Familiar? Think about $\boldsymbol{A}=\boldsymbol{V}\boldsymbol{D}\boldsymbol{V}^T$.

- Put $\boldsymbol{A} = \boldsymbol{B}^T\boldsymbol{B}$ (symmetric) and 
$\boldsymbol{D} = \boldsymbol{\Sigma}^T\boldsymbol{\Sigma}$. Then,

$$
\boldsymbol{B}^T\boldsymbol{B}
= \boldsymbol{V}\boldsymbol{\Sigma}^T\boldsymbol{\Sigma}\boldsymbol{V}^T
\qquad\text{becomes}\qquad
\boldsymbol{A}=\boldsymbol{V}\boldsymbol{D}\boldsymbol{V}^T.
$$

- Put $\boldsymbol{A} = \boldsymbol{B}\boldsymbol{B}^T$ (symmetric) and 
$\boldsymbol{D} = \boldsymbol{\Sigma}\boldsymbol{\Sigma}^T$. Then,

$$
\boldsymbol{B}\boldsymbol{B}^T
= 
\boldsymbol{U}\boldsymbol{\Sigma}
\boldsymbol{\Sigma}^T\boldsymbol{U}^T
\qquad\text{becomes}\qquad
\boldsymbol{A}=\boldsymbol{U}\boldsymbol{D}\boldsymbol{U}^T.
$$

- The columns of $\boldsymbol{V}$, the right singular vectors in the SVD, are the eigenvectors of 
$\boldsymbol{B}^T\boldsymbol{B}$.

- The columns of $\boldsymbol{U}$, the left singular vectors in the SVD, are the eigenvectors of 
$\boldsymbol{B}\boldsymbol{B}^T$.

- In both cases the singular values in $\boldsymbol{\Sigma}$ are the non-negative square
roots of the eigenvalues of $\boldsymbol{B}^T\boldsymbol{B}$
and $\boldsymbol{B}\boldsymbol{B}^T$ (the larger of these two matrices has extra eigenvalues, all of them zero).

- **NOTE:** $\boldsymbol{B}^T\boldsymbol{B}$ and $\boldsymbol{B}\boldsymbol{B}^T$
have the same non-zero eigenvalues (same rank).
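The bullet points above can be checked numerically. The following is a minimal sketch with a random $5\times 3$ matrix (the seed, the shape and the variable names are arbitrary choices of ours). Eigenvectors are only determined up to sign, so the comparison with the right singular vectors is made up to sign.

```python
import numpy as np

rng = np.random.default_rng(0)
B = rng.standard_normal((5, 3))          # m = 5, n = 3

U, s, Vt = np.linalg.svd(B)              # s is in descending order

# Eigen-decompositions of the two symmetric products.
lam_BtB, W = np.linalg.eigh(B.T @ B)     # 3 eigenvalues, ascending
lam_BBt, _ = np.linalg.eigh(B @ B.T)     # 5 eigenvalues, ascending

# Reorder to descending so that they line up with the singular values.
lam_BtB, W = lam_BtB[::-1], W[:, ::-1]
lam_BBt = lam_BBt[::-1]

# Eigenvalues of B^T B are the squared singular values.
print(np.allclose(lam_BtB, s**2))

# B B^T has the same non-zero eigenvalues, plus m - n = 2 zeros.
print(np.allclose(lam_BBt[:3], s**2))
print(np.allclose(lam_BBt[3:], 0.0))

# Eigenvectors of B^T B match the right singular vectors up to sign:
# W^T V is diagonal with entries +1 or -1.
print(np.allclose(np.abs(W.T @ Vt.T), np.eye(3)))
```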


## Why does this matter?
 
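Our data, $\boldsymbol{X}$, is organized into rows of feature values with one observation per row 
and one feature per column. With $N$ observations and $n$ features, and with
$\boldsymbol{X}_j$ denoting the column that holds feature $j$, we write this as

$$
\boldsymbol{X} = \Big(
\boldsymbol{X}_0, \boldsymbol{X}_1, \cdots, \boldsymbol{X}_{n-1}
\Big)
\in\mathbb{R}^{N\times n}.
$$

If $n=4$ (four features)...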

... we recall that the **covariance matrix** takes this form:

$$
\boldsymbol{M} = 
\left(\begin{array}{llll}
\mathrm{Var}(X_0)  &  \mathrm{Cov}(X_0,X_1)  &  \mathrm{Cov}(X_0,X_2)  &  \mathrm{Cov}(X_0,X_3) \\
\mathrm{Cov}(X_1,X_0)  &  \mathrm{Var}(X_1)  &  \mathrm{Cov}(X_1,X_2)  &  \mathrm{Cov}(X_1,X_3) \\
\mathrm{Cov}(X_2,X_0)  &  \mathrm{Cov}(X_2,X_1)  &  \mathrm{Var}(X_2)  &  \mathrm{Cov}(X_2,X_3) \\
\mathrm{Cov}(X_3,X_0)  &  \mathrm{Cov}(X_3,X_1)  &  \mathrm{Cov}(X_3,X_2)  &  \mathrm{Var}(X_3) \\
\end{array}\right)
$$

because $\mathrm{Cov}(X,X)=\mathrm{Var}(X)$. Since $\mathrm{Cov}(X,Y)=\mathrm{Cov}(Y,X)$, this matrix is **symmetric**
and so has real eigenvalues.

We have seen that if the data are already centred then,

$$
(N-1)\boldsymbol{M} = 
\left(\begin{array}{llll}
\boldsymbol{X}_0\cdot\boldsymbol{X}_0 & \boldsymbol{X}_0\cdot\boldsymbol{X}_1 &
\boldsymbol{X}_0\cdot\boldsymbol{X}_2 & \boldsymbol{X}_0\cdot\boldsymbol{X}_3
\\
\boldsymbol{X}_1\cdot\boldsymbol{X}_0 & \boldsymbol{X}_1\cdot\boldsymbol{X}_1 &
\boldsymbol{X}_1\cdot\boldsymbol{X}_2 & \boldsymbol{X}_1\cdot\boldsymbol{X}_3
\\
\boldsymbol{X}_2\cdot\boldsymbol{X}_0 & \boldsymbol{X}_2\cdot\boldsymbol{X}_1 &
\boldsymbol{X}_2\cdot\boldsymbol{X}_2 & \boldsymbol{X}_2\cdot\boldsymbol{X}_3
\\
\boldsymbol{X}_3\cdot\boldsymbol{X}_0 & \boldsymbol{X}_3\cdot\boldsymbol{X}_1 &
\boldsymbol{X}_3\cdot\boldsymbol{X}_2 & \boldsymbol{X}_3\cdot\boldsymbol{X}_3
\\
\end{array}\right)
=
\left(\begin{array}{l}
\boldsymbol{X}_0^T
\\
\boldsymbol{X}_1^T
\\
\boldsymbol{X}_2^T
\\
\boldsymbol{X}_3^T
\\
\end{array}\right)
\left(\begin{array}{llll}
\boldsymbol{X}_0
&
\boldsymbol{X}_1
&
\boldsymbol{X}_2
&
\boldsymbol{X}_3
\\
\end{array}\right)
$$

and, hence (in general), the (sample) covariance matrix is

$$
\boldsymbol{M} = 
\frac{1}{(N-1)}\boldsymbol{X}^T\boldsymbol{X}.
$$
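The formula for $\boldsymbol{M}$ can be compared with `numpy.cov`. This is a sketch on synthetic data: the sizes, the seed and the variable names are our own assumptions, and `N` is taken to be the number of observations (rows), as in the $(N-1)$ factor. The last check ties this back to the SVD: for centred data the eigenvalues of $\boldsymbol{M}$ are $\sigma_j^2/(N-1)$, where $\sigma_j$ are the singular values of $\boldsymbol{X}$.

```python
import numpy as np

rng = np.random.default_rng(1)
N, n = 50, 4                              # N observations, n features (assumed sizes)

# Synthetic data with correlated columns, then centred column by column.
X = rng.standard_normal((N, n)) @ rng.standard_normal((n, n))
X = X - X.mean(axis=0)

# Sample covariance matrix from the formula M = X^T X / (N - 1).
M = X.T @ X / (N - 1)

# np.cov with rowvar=False treats columns as variables and divides by N - 1.
print(np.allclose(M, np.cov(X, rowvar=False)))

# M is symmetric.
print(np.allclose(M, M.T))

# Eigenvalues of M (sorted descending) against the singular values of X.
s = np.linalg.svd(X, compute_uv=False)
lam = np.linalg.eigvalsh(M)[::-1]
print(np.allclose(lam, s**2 / (N - 1)))
```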


# I N C O M P L E T E

There is more to come - this document will be replaced with an update in due course.

### Review

We covered *just enough*, to make *progress at pace*. We looked at

- How the SVD and eigenvalue decomposition are related.
- How this becomes relevant to the data covariance matrix.
- How this can be used.

Now we can start putting all of this material to work.

## Technical Notes, Production and Archiving

Ignore the material below. What follows is not relevant to the material being taught.

#### Production Workflow

- Finalise the notebook material above
- Clear and fresh run of entire notebook
- Create html slide show:
  - `jupyter nbconvert --to slides 10_pca.ipynb `
- Set `OUTPUTTING=1` below
- Comment out the display of web-sourced diagrams
- Clear and fresh run of entire notebook
- Comment back in the display of web-sourced diagrams
- Clear all cell output
- Set `OUTPUTTING=0` below
- Save
- git add, commit and push to FML
- copy PDF, HTML etc to web site
  - git add, commit and push
- rebuild binder

Some of this originated from

<https://stackoverflow.com/questions/38540326/save-html-of-a-jupyter-notebook-from-within-the-notebook>

These lines create a backup of the notebook. They can be ignored.

At some point this would be better as a bash script outside of the notebook.

In [None]:
%%bash
NBROOTNAME=10_pca
OUTPUTTING=0

if [ $OUTPUTTING -eq 1 ]; then
  jupyter nbconvert --to html $NBROOTNAME.ipynb
  cp $NBROOTNAME.html ../backups/$(date +"%m_%d_%Y-%H%M%S")_$NBROOTNAME.html
  mv -f $NBROOTNAME.html ./formats/html/

  jupyter nbconvert --to pdf $NBROOTNAME.ipynb
  cp $NBROOTNAME.pdf ../backups/$(date +"%m_%d_%Y-%H%M%S")_$NBROOTNAME.pdf
  mv -f $NBROOTNAME.pdf ./formats/pdf/

  jupyter nbconvert --to script $NBROOTNAME.ipynb
  cp $NBROOTNAME.py ../backups/$(date +"%m_%d_%Y-%H%M%S")_$NBROOTNAME.py
  mv -f $NBROOTNAME.py ./formats/py/
else
  echo 'Not Generating html, pdf and py output versions'
fi