# T10 Dimensionality reduction

## Principal component analysis

We generate an artificial dataset consisting of three variables. Each variable evolves as a function of time, which runs from 0 to 10 s. 

We will perform a Principal Component Analysis (PCA) on this dataset. The aim is to find two latent variables that account for the dynamics of the original three variables. 


Let's start by importing the necessary libraries and creating the variables. 

In [11]:
import numpy as np
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA

%matplotlib notebook

# generating three time-dependent variables 
t = np.arange(1000)/100. # time array 
x1 = np.sin(t) + t/1. + np.random.rand(len(t))/2.
x2 = np.sin(t) - t/1. + np.random.rand(len(t))/2.
x3 = t/1. + np.random.rand(len(t))/2.


#### Visualize the raw data

As usual, let's start by plotting the data. Simply display the three variables as a function of time. 

In [12]:
# your code goes here
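
One possible way to draw this plot is sketched below. The block regenerates the data so that it runs on its own; the seeded generator `rng` is an assumption added here for reproducibility and is not part of the setup cell, and the axis labels are only suggestions.

```python
import numpy as np
import matplotlib.pyplot as plt

# Recreate the three variables from the setup cell. The fixed seed is an
# assumption of this sketch, used only to make the result reproducible.
rng = np.random.default_rng(0)
t = np.arange(1000) / 100.0  # time array, 0 to 9.99 s
x1 = np.sin(t) + t + rng.random(len(t)) / 2.0
x2 = np.sin(t) - t + rng.random(len(t)) / 2.0
x3 = t + rng.random(len(t)) / 2.0

# One line per variable on a shared time axis
fig, ax = plt.subplots()
for x, label in zip((x1, x2, x3), ("x1", "x2", "x3")):
    ax.plot(t, x, label=label)
ax.set_xlabel("time (s)")
ax.set_ylabel("value")
ax.legend()
plt.show()
```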

#### Visualize the raw data in 3D

Let's also look at the three variables in 3-dimensional space, where x1, x2 and x3 are used as x, y and z. You can use the `plot3D()` method of a matplotlib 3D axes, which takes `x, y` and `z` as input arguments. Note that time is not explicitly plotted in this depiction. It is implicit in the trajectory of the data. 

What do you observe when rotating the axes (use the interactive mode enabled by `%matplotlib notebook`)? 

In [15]:
from mpl_toolkits import mplot3d
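
A minimal sketch of the 3D trajectory plot follows. It assumes a reasonably recent matplotlib, where `fig.add_subplot(projection="3d")` creates the 3D axes; the seeded `rng` is again an assumption added so the block is reproducible and self-contained.

```python
import numpy as np
import matplotlib.pyplot as plt

# Recreate the data from the setup cell (fixed seed is an assumption)
rng = np.random.default_rng(0)
t = np.arange(1000) / 100.0
x1 = np.sin(t) + t + rng.random(len(t)) / 2.0
x2 = np.sin(t) - t + rng.random(len(t)) / 2.0
x3 = t + rng.random(len(t)) / 2.0

# Create a 3D axes and draw the trajectory; time is implicit along the curve
fig = plt.figure()
ax = fig.add_subplot(projection="3d")
ax.plot3D(x1, x2, x3)
ax.set_xlabel("x1")
ax.set_ylabel("x2")
ax.set_zlabel("x3")
plt.show()
```

When rotating the axes, it is worth checking whether the trajectory looks nearly flat from some viewing angle, since that would hint at a lower-dimensional structure.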




#### Check the correlation structure between variables

PCA removes correlations between variables. Let's see whether correlations between the three variables exist. In other words, let's calculate the covariance matrix. Use the NumPy function `np.cov()` with the three variables as input. By default, `np.cov()` treats each row of its input as one variable. 

In [7]:
# your code goes here
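
The covariance matrix could be computed as in the sketch below. The names `C` and `R` are hypothetical, and the seeded `rng` is an assumption added for reproducibility. The correlation matrix from `np.corrcoef()` is included only as an optional, scale-free companion to the covariance.

```python
import numpy as np

# Recreate the data from the setup cell (fixed seed is an assumption)
rng = np.random.default_rng(0)
t = np.arange(1000) / 100.0
x1 = np.sin(t) + t + rng.random(len(t)) / 2.0
x2 = np.sin(t) - t + rng.random(len(t)) / 2.0
x3 = t + rng.random(len(t)) / 2.0

# Stack the variables as rows, because np.cov treats each row as a variable
data = np.vstack((x1, x2, x3))
C = np.cov(data)       # 3x3 covariance matrix
R = np.corrcoef(data)  # 3x3 correlation matrix (normalized covariance)

print("covariance matrix:\n", C)
print("correlation matrix:\n", R)
```

Non-zero off-diagonal entries indicate that the variables are correlated, which is the structure PCA is meant to remove.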

#### Calculate the PCA 

Let's now move on to the PCA itself. We will use scikit-learn, specifically the `PCA` class in `sklearn.decomposition`. We will look at the PCA components (via `pca.components_`), the explained variance (via `pca.explained_variance_`) and the covariance matrix from the PCA (via `pca.get_covariance()`).  

In [8]:
# perform PCA analysis 
X = np.column_stack((x1,x2,x3)) # first we have to concatenate the three variables 
pca = PCA(n_components=3)
pca.fit(X)


PCA(copy=True, iterated_power='auto', n_components=3, random_state=None,
  svd_solver='auto', tol=0.0, whiten=False)
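
The fitted attributes mentioned above can be inspected as in the following sketch. The data are regenerated with a seeded `rng` (an assumption, for reproducibility) so the block runs on its own; `pca.explained_variance_ratio_` is shown in addition because it expresses the same information as fractions of the total variance.

```python
import numpy as np
from sklearn.decomposition import PCA

# Recreate the data from the setup cell (fixed seed is an assumption)
rng = np.random.default_rng(0)
t = np.arange(1000) / 100.0
x1 = np.sin(t) + t + rng.random(len(t)) / 2.0
x2 = np.sin(t) - t + rng.random(len(t)) / 2.0
x3 = t + rng.random(len(t)) / 2.0

X = np.column_stack((x1, x2, x3))  # shape (n_samples, n_features)
pca = PCA(n_components=3).fit(X)

# Each row of components_ is one principal axis in (x1, x2, x3) coordinates
print("components:\n", pca.components_)
# Variance of the data along each principal axis, sorted in decreasing order
print("explained variance:", pca.explained_variance_)
print("explained variance ratio:", pca.explained_variance_ratio_)
# Covariance of the data as modelled by the fitted PCA
print("covariance from PCA:\n", pca.get_covariance())
```

With all three components kept, the matrix returned by `pca.get_covariance()` should agree with `np.cov(X, rowvar=False)`.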

#### Visualize principal components in 3D

Let's plot the first two basis vectors in the 3-dimensional space. What is the spatial relation between the three variables and the eigenvectors? 


In [16]:
comp = pca.components_


#ax.plot([0,3*comp[0,0]],[0,3*comp[0,1]],[0,3*comp[0,2]])
#ax.plot([0,3*comp[1,0]],[0,3*comp[1,1]],[0,3*comp[1,2]])
plt.show()
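
One way to overlay the first two principal axes on the data is sketched below. Unlike the commented-out lines in the cell above, which draw the vectors from the origin with a fixed length of 3, this sketch anchors them at the data mean (`pca.mean_`) and scales them by three standard deviations along each axis; both choices are assumptions made for readability of the plot. The seeded `rng` is likewise an assumption.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA

# Recreate the data from the setup cell (fixed seed is an assumption)
rng = np.random.default_rng(0)
t = np.arange(1000) / 100.0
x1 = np.sin(t) + t + rng.random(len(t)) / 2.0
x2 = np.sin(t) - t + rng.random(len(t)) / 2.0
x3 = t + rng.random(len(t)) / 2.0

X = np.column_stack((x1, x2, x3))
pca = PCA(n_components=3).fit(X)

fig = plt.figure()
ax = fig.add_subplot(projection="3d")
ax.plot3D(x1, x2, x3, alpha=0.4, label="data")

# Draw the first two principal axes as line segments starting at the mean
origin = pca.mean_
for i in range(2):
    length = 3 * np.sqrt(pca.explained_variance_[i])
    tip = origin + length * pca.components_[i]
    # zip pairs up the x, y and z coordinates of the two end points
    ax.plot3D(*zip(origin, tip), linewidth=3, label=f"PC{i + 1}")

ax.set_xlabel("x1")
ax.set_ylabel("x2")
ax.set_zlabel("x3")
ax.legend()
plt.show()
```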

#### Transform the original data into the PCA space

Next, let's convert the data to the new space spanned by the principal components by using `pca.transform([original data])`. Plot the first two latent variables. 

In [20]:
# transform the variables into the principal component space 

# plot new variables in the PCA space 
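
A possible version of this step is sketched below. The name `Z` for the transformed data is hypothetical, and plotting the latent variables against time is one choice among several (plotting them against each other would also work). The seeded `rng` is an assumption added for reproducibility.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA

# Recreate the data from the setup cell (fixed seed is an assumption)
rng = np.random.default_rng(0)
t = np.arange(1000) / 100.0
x1 = np.sin(t) + t + rng.random(len(t)) / 2.0
x2 = np.sin(t) - t + rng.random(len(t)) / 2.0
x3 = t + rng.random(len(t)) / 2.0

X = np.column_stack((x1, x2, x3))
pca = PCA(n_components=3).fit(X)

# Project the mean-centred data onto the principal axes
Z = pca.transform(X)  # shape (n_samples, n_components)

# Plot the first two latent variables as a function of time
fig, ax = plt.subplots()
ax.plot(t, Z[:, 0], label="PC1")
ax.plot(t, Z[:, 1], label="PC2")
ax.set_xlabel("time (s)")
ax.set_ylabel("latent variable")
ax.legend()
plt.show()
```

The columns of `Z` are centred and mutually uncorrelated, and their variances match `pca.explained_variance_`.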


#### Perform inverse transformation

Finally, let's look at the inverse transformation, i.e., the transformation from the PCA space back to the original coordinate system. Perform the inverse transformation for the three variables. Plot the inversely transformed variables in the original space and compare them with the original variables. 

*Hint:* The function `pca.inverse_transform()` performs the inverse transformation and takes the variables in the PCA space as input. 

In [21]:
# reverse transformation 
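
The round trip could look like the sketch below. The names `Z`, `X_rec` and `X_rec2` are hypothetical, and the seeded `rng` is an assumption. In addition to the full inverse transformation, the sketch zeroes the third latent variable before inverting, which is one way to see how well two latent variables describe the data.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA

# Recreate the data from the setup cell (fixed seed is an assumption)
rng = np.random.default_rng(0)
t = np.arange(1000) / 100.0
x1 = np.sin(t) + t + rng.random(len(t)) / 2.0
x2 = np.sin(t) - t + rng.random(len(t)) / 2.0
x3 = t + rng.random(len(t)) / 2.0

X = np.column_stack((x1, x2, x3))
pca = PCA(n_components=3).fit(X)
Z = pca.transform(X)

# Inverse transformation using all three components
X_rec = pca.inverse_transform(Z)

# Inverse transformation using only the first two latent variables
Z2 = Z.copy()
Z2[:, 2] = 0.0
X_rec2 = pca.inverse_transform(Z2)

# Compare the originals (solid) with the two-component version (dashed)
fig, ax = plt.subplots()
for i in range(3):
    ax.plot(t, X[:, i], label=f"x{i + 1} original")
    ax.plot(t, X_rec2[:, i], "--", label=f"x{i + 1} from 2 PCs")
ax.set_xlabel("time (s)")
ax.legend()
plt.show()
```

With all components kept, `X_rec` should reproduce `X` up to floating-point error; the two-component version differs only by the variance carried by the third component.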


Try the inverse transformation without taking into account the first principal component. What does this mean? 

In [22]:
# reconstruct original data without 1st principal component
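
A sketch of this last step: the first latent variable is set to zero before applying `pca.inverse_transform()`. The names `Z_no1` and `X_no1` are hypothetical, and the seeded `rng` is an assumption added for reproducibility.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA

# Recreate the data from the setup cell (fixed seed is an assumption)
rng = np.random.default_rng(0)
t = np.arange(1000) / 100.0
x1 = np.sin(t) + t + rng.random(len(t)) / 2.0
x2 = np.sin(t) - t + rng.random(len(t)) / 2.0
x3 = t + rng.random(len(t)) / 2.0

X = np.column_stack((x1, x2, x3))
pca = PCA(n_components=3).fit(X)
Z = pca.transform(X)

# Discard the first principal component by zeroing its latent variable
Z_no1 = Z.copy()
Z_no1[:, 0] = 0.0
X_no1 = pca.inverse_transform(Z_no1)

# Plot what remains of each variable once the dominant component is removed
fig, ax = plt.subplots()
for i in range(3):
    ax.plot(t, X_no1[:, i], label=f"x{i + 1} without PC1")
ax.set_xlabel("time (s)")
ax.legend()
plt.show()
```

For this dataset, the first component is expected to capture mainly the linear trend shared by the three variables, so what remains should be dominated by the oscillation and the noise. The total variance of `X_no1` equals the sum of the second and third explained variances.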


## The end