
# Principal Component Analysis

In [None]:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import random as rd

## 1. Data



- the data are given by the file data-pca.txt
- the data consist of a set of points $\{ (x_i, y_i) \}_{i=1}^{n}$, where $z_i = (x_i, y_i)$ denotes a 2-dimensional point in Cartesian coordinates



load the data from the file

In [None]:
path = '/content/drive/My Drive/ML_Assignment/data/data-pca.txt'
data = np.loadtxt(path, delimiter=',')
x = data[:,0]
y = data[:,1]

Plot the original data points

In [None]:
fig_1 = plt.figure(figsize = (6,6))
plt.scatter(x, y, c='r', marker = '+') 
plt.title('original data points')
plt.show()
fig_1.savefig('original data points.png')

## 2. Normalization

- the data are normalized (z-scored) to have mean 0 and standard deviation 1
- $x_i \leftarrow \frac{x_i - \mu_x}{\sigma_x}$ and $y_i \leftarrow \frac{y_i - \mu_y}{\sigma_y}$


> * $\mu_x$ denotes the mean of $x$
> * $\sigma_x$ denotes the standard deviation of $x$
> * $\mu_y$ denotes the mean of $y$
> * $\sigma_y$ denotes the standard deviation of $y$









define a function to normalize the input data points $x$ and $y$

In [None]:
def normalize_data(x, y):

    xn = (x - x.mean(axis=0)) / x.std(axis=0) # normalize x. the mean of xn is zero and the standard deviation of xn is one #
    yn = (y - y.mean(axis=0)) / y.std(axis=0) # normalize y. the mean of yn is zero and the standard deviation of yn is one #

    return xn, yn
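A quick sanity check of the z-scoring formula, on a hypothetical four-value sample rather than the assignment data. The helper `z_score` is illustrative only; it applies the same formula as `normalize_data`. Note that `np.std` uses the population standard deviation (`ddof=0`) by default, i.e. it divides by $n$, which is consistent with the $\frac{1}{n}$ covariance used in Section 3.

```python
import numpy as np

def z_score(v):
    """Hypothetical helper: (v - mean) / std, with np.std's default population std (ddof=0)."""
    v = np.asarray(v, dtype=float)
    return (v - v.mean()) / v.std()

# Hypothetical sample: its mean is 5 and its population std is sqrt(5).
x_demo = np.array([2.0, 4.0, 6.0, 8.0])
xn_demo = z_score(x_demo)

# After z-scoring, the mean is 0 and the std is 1 (up to floating-point error).
print(xn_demo.mean(), xn_demo.std())
```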

Plot the normalized data points

In [None]:
xn, yn = normalize_data(x, y)

In [None]:
fig_2 = plt.figure(figsize = (6,6))
plt.scatter(xn, yn, c='r', marker = '+') 
plt.title('data normalized by z-scoring')
plt.axis([-3, 3, -3, 3])
plt.show()
fig_2.savefig('normalized data points.png')

## 3. Covariance Matrix




- compute the covariance matrix
- $\Sigma = \frac{1}{n} \sum_{i = 1}^n z_i z_i^T = \frac{1}{n} Z^T Z$ (no mean is subtracted because the normalized data already have zero mean)

> * $n$ denotes the number of data points

> * $Z = \begin{bmatrix} z_1^T \\ \vdots \\ z_n^T \end{bmatrix}$
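The sum-of-outer-products form and the matrix form $\frac{1}{n} Z^T Z$ are the same quantity. The sketch below checks this on hypothetical random data (not `data-pca.txt`) and also shows that `np.cov` only matches the $\frac{1}{n}$ definition when called with `bias=True`; its default divides by $n - 1$.

```python
import numpy as np

# Hypothetical z-scored data: n = 5 points, one row z_i^T per point.
rng = np.random.default_rng(0)
raw = rng.normal(size=(5, 2))
Z_demo = (raw - raw.mean(axis=0)) / raw.std(axis=0)

n = Z_demo.shape[0]
# (1/n) * sum_i z_i z_i^T
sigma_sum = sum(np.outer(z, z) for z in Z_demo) / n
# (1/n) * Z^T Z
sigma_mat = Z_demo.T @ Z_demo / n
# np.cov treats rows as variables (hence the transpose); bias=True divides by n instead of n - 1
sigma_np = np.cov(Z_demo.T, bias=True)

print(sigma_mat)
```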





define a function to compute the covariance matrix of the data

In [None]:
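def compute_covariance(z):
    # compute the covariance matrix (1/n) Z^T Z of the normalized (zero-mean) data #
    # note: np.cov(z.T) would divide by n - 1 instead of n
    covar = z.T @ z / z.shape[0]
    return covar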

In [None]:
Z = np.c_[xn,yn]
covariance = compute_covariance(Z)  # returns a 2x2 matrix

## 4. Principal Components

- compute the eigenvalues and the eigenvectors of the covariance matrix

define a function to compute the principal directions from the covariance matrix

In [None]:
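def compute_principal_direction(covariance):

    # compute the principal directions from the covariance matrix #
    # eigh is intended for symmetric matrices: it returns real eigenvalues (ascending)
    # and orthonormal eigenvectors stored as the columns of e_vector
    e_value, e_vector = np.linalg.eigh(covariance)

    return e_value, e_vector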

In [None]:
e_value, e_vector = compute_principal_direction(covariance)

In [None]:
print(e_vector)
print(e_value)

In [None]:
# sort the eigenvalues (and their eigenvectors) in descending order
idx = np.flip(e_value.argsort())
e_value = e_value[idx]
e_vector = e_vector[:, idx]
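The eigen-decomposition and the descending sort can be checked on a small hand-made matrix. The sketch assumes a hypothetical covariance of z-scored data with correlation 0.8; its eigenvalues are $1 \pm 0.8$, and its eigenvectors lie along the diagonals $(1, 1)$ and $(1, -1)$.

```python
import numpy as np

# Hypothetical covariance of z-scored data: unit diagonal, correlation 0.8.
sigma = np.array([[1.0, 0.8],
                  [0.8, 1.0]])

# eigh: real eigenvalues in ascending order, orthonormal eigenvectors as columns.
vals, vecs = np.linalg.eigh(sigma)

# Reorder so that column 0 is the first (largest-variance) principal direction.
order = np.argsort(vals)[::-1]
vals, vecs = vals[order], vecs[:, order]

# Every column satisfies sigma @ v = lambda * v.
for k in range(2):
    assert np.allclose(sigma @ vecs[:, k], vals[k] * vecs[:, k])
print(vals)
```

The sign of each eigenvector is arbitrary (both $v$ and $-v$ are valid), so a plotted arrow may point either way along the same axis.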

Plot the principal axes

In [None]:
fig_3 = plt.figure(figsize = (6,6))
plt.scatter(xn, yn, c='r', marker = '+')
# with scale_units='xy', an arrow is |vector| / scale data units long:
# the unit first eigenvector is drawn 2.5 long (scale=0.4), the second 0.5 long (scale=2)
plt.quiver(0, 0, e_vector[0, 0], e_vector[1, 0], color='g', angles='xy', scale_units='xy', scale=0.4)  # first principal vector
plt.quiver(0, 0, e_vector[0, 1], e_vector[1, 1], color='b', angles='xy', scale_units='xy', scale=2)    # second principal vector
plt.title('principal direction')
plt.axis([-3, 3, -3, 3])
plt.show()
fig_3.savefig('principal direction.png')

Plot the first principal axis

In [None]:
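fig_4 = plt.figure(figsize = (6,6))
plt.scatter(xn, yn, c='r', marker = '+')
t = np.linspace(-5, 5, 2)                                # two end points are enough for a straight line
plt.plot(t * e_vector[0, 0], t * e_vector[1, 0], 'g-')  # line through the origin along the first eigenvector
plt.title('first principal axis')
plt.axis([-3, 3, -3, 3])
plt.show()
fig_4.savefig('first principal axis.png')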

define a function to compute the projection of the data point onto the principal axis
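The orthogonal projection of a point $z$ onto the line through the origin along a unit vector $u$ is $(z \cdot u)\,u$, and the residual $z - (z \cdot u)\,u$ is perpendicular to $u$. A minimal sketch of that formula, using a hypothetical helper `project_onto_axis` and made-up points rather than the assignment data:

```python
import numpy as np

def project_onto_axis(points, axis):
    """Hypothetical helper: orthogonal projection of each row of `points` onto the line along `axis`."""
    u = np.asarray(axis, dtype=float)
    u = u / np.linalg.norm(u)                     # make sure the direction is a unit vector
    coef = np.asarray(points, dtype=float) @ u    # scalar coordinate z_i . u of each point
    return np.multiply.outer(coef, u)             # (z_i . u) u, back in 2-D coordinates

# Hypothetical points and axis direction.
pts = np.array([[2.0, 0.0], [1.0, 3.0]])
proj = project_onto_axis(pts, [1.0, 1.0])
print(proj)

# The residual z_i - proj_i is perpendicular to the axis, so this dot product is zero.
print((pts - proj) @ np.array([1.0, 1.0]))
```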

In [None]:
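# the point obtained by projecting orthogonally onto the axis
def compute_projection(point, axis):

    # compute the orthogonal projection of point(s) onto the line through the origin along axis #
    axis = axis / np.linalg.norm(axis)            # unit direction of the axis
    coef = np.dot(point, axis)                    # scalar coordinate of each point along the axis
    projection = np.multiply.outer(coef, axis)    # projected point(s) in 2-D coordinates

    return projection

In [None]:
projection_1 = compute_projection(Z, e_vector[:, 0])  # projections onto the first principal axis

In [None]:
fig_5 = plt.figure(figsize = (6,6))
plt.scatter(xn, yn, c='r', marker = '+')
t = np.linspace(-5, 5, 2)
plt.plot(t * e_vector[0, 0], t * e_vector[1, 0], 'g-')                   # first principal axis
plt.scatter(projection_1[:, 0], projection_1[:, 1], c='b', marker='.')  # projected points
plt.title('projection onto the first principal axis')
plt.axis([-3, 3, -3, 3])
plt.show()
fig_5.savefig('projection onto the first principal axis.png')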

In [None]:
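# distance between points in Euclidean coordinates
def compute_distance(point1, point2):

    # compute the Euclidean distance between point1 and point2 (row-wise for arrays of points) #
    distance = np.linalg.norm(np.asarray(point1) - np.asarray(point2), axis=-1)

    return distance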

# Output

## 1. Plot the original data points [1pt]

In [None]:
fig_1

## 2. Plot the normalized data points [1pt]

- $z = \frac{z - \mu}{\sigma}$
- $\mu$ denotes the mean and $\sigma$ denotes the standard deviation

In [None]:
fig_2

## 3. Plot the principal axes [2pt]

- plot the normalized data points
- plot the first principal vector
- plot the second principal vector

In [None]:
fig_3
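One optional way to choose the lengths of the two principal vectors is to scale each unit eigenvector by $\sqrt{\lambda_k}$, the standard deviation of the data along that direction; the ratio $\lambda_k / \sum_j \lambda_j$ is the fraction of total variance that direction explains. This is a design choice, not something the assignment prescribes, and the eigenvalues/eigenvectors below are hypothetical.

```python
import numpy as np

# Hypothetical eigenvalues (descending) and orthonormal eigenvectors (columns).
vals = np.array([1.8, 0.2])
vecs = np.array([[1.0, 1.0],
                 [1.0, -1.0]]) / np.sqrt(2)

# Fraction of the total variance carried by each principal direction.
explained = vals / vals.sum()

# Column k of `arrows` is eigenvector k scaled to length sqrt(lambda_k).
arrows = vecs * np.sqrt(vals)
print(explained)
```

Each column could then be drawn with `plt.quiver(0, 0, arrows[0, k], arrows[1, k], angles='xy', scale_units='xy', scale=1)`, so that the arrow length in data units equals $\sqrt{\lambda_k}$.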

## 4. Plot the first principal axis [3pt]

- plot the normalized data points
- plot the first principal axis

In [None]:
fig_4

## 5. Plot the projection of the normalized data points onto the first principal axis [4pt]

- plot the normalized data points
- plot the first principal axis
- plot the projected points from the normalized data points onto the first principal axis

In [None]:
fig_5

## 6. Plot the lines between the normalized data points and their projection points on the first principal axis [3pt]

- plot the normalized data points
- plot the first principal axis
- plot the projected points from the normalized data points onto the first principal axis
- plot the lines that connect the normalized data points to their projection points on the first principal axis

In [None]:
fig_6
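The connecting lines are just segments from each point $z_i$ to its projection $p_i$, and each segment's length is the Euclidean distance from $z_i$ to the axis. A sketch with a hypothetical helper `projection_segments`, tried on made-up points; the segment array has shape `(n, 2, 2)`, the layout matplotlib's `LineCollection` accepts.

```python
import numpy as np

def projection_segments(points, axis):
    """Hypothetical helper returning (projections, segments, lengths) for one axis."""
    z = np.asarray(points, dtype=float)
    u = np.asarray(axis, dtype=float)
    u = u / np.linalg.norm(u)
    p = np.multiply.outer(z @ u, u)           # orthogonal projections onto the axis
    segments = np.stack([z, p], axis=1)       # segments[i] = [z_i, p_i], shape (n, 2, 2)
    lengths = np.linalg.norm(z - p, axis=1)   # Euclidean distance from each point to the axis
    return p, segments, lengths

# Hypothetical points; the last one already lies on the axis.
pts = np.array([[2.0, 0.0], [1.0, 3.0], [-1.0, -1.0]])
p, segs, d = projection_segments(pts, [1.0, 1.0])
print(d)
```

With the notebook's data this might be called as `projection_segments(Z, e_vector[:, 0])`, after which `from matplotlib.collections import LineCollection` and `plt.gca().add_collection(LineCollection(segs, colors='gray'))` would draw all the connecting lines in one call.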

## 7. Plot the second principal axis [3pt]

- plot the normalized data points
- plot the second principal axis

In [None]:
fig_7

## 8. Plot the projection of the normalized data points onto the second principal axis [4pt]

- plot the normalized data points
- plot the second principal axis
- plot the projected points from the normalized data points onto the second principal axis

In [None]:
fig_8
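Because the two principal directions are orthonormal, in 2-D the projection onto the second axis is exactly what remains after projecting onto the first: $p^{(1)}_i + p^{(2)}_i = z_i$. A sketch verifying this with hypothetical directions and points:

```python
import numpy as np

def project(points, u):
    # orthogonal projection of each row of `points` onto the line along unit vector u
    return np.multiply.outer(points @ u, u)

# Hypothetical orthonormal principal directions (columns) and points.
U = np.array([[1.0, 1.0],
              [1.0, -1.0]]) / np.sqrt(2)
pts = np.array([[2.0, 0.0], [1.0, 3.0], [0.5, -2.0]])

p1 = project(pts, U[:, 0])   # onto the first principal axis
p2 = project(pts, U[:, 1])   # onto the second principal axis

# The two projections add back up to the original points.
print(np.allclose(p1 + p2, pts))
```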

## 9. Plot the lines between the normalized data points and their projection points on the second principal axis [3pt]

- plot the normalized data points
- plot the second principal axis
- plot the projected points from the normalized data points onto the second principal axis
- plot the lines that connect the normalized data points to their projection points on the second principal axis

In [None]:
fig_9
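The lengths of the connecting lines have a neat interpretation: with $\Sigma = \frac{1}{n} Z^T Z$, the mean squared distance from the points to the first principal axis equals the second eigenvalue $\lambda_2$, and the mean squared distance to the second axis equals $\lambda_1$. This is why the first axis is the best-fitting line. A sketch on hypothetical correlated data, z-scored in the same way as Section 2:

```python
import numpy as np

# Hypothetical correlated data, z-scored as in Section 2.
rng = np.random.default_rng(1)
raw = rng.multivariate_normal([0.0, 0.0], [[2.0, 1.2], [1.2, 1.0]], size=200)
Zs = (raw - raw.mean(axis=0)) / raw.std(axis=0)

sigma = Zs.T @ Zs / len(Zs)                  # (1/n) Z^T Z
vals, vecs = np.linalg.eigh(sigma)
order = np.argsort(vals)[::-1]               # descending eigenvalues
vals, vecs = vals[order], vecs[:, order]

def mean_sq_distance(points, u):
    # mean squared length of the point-to-projection lines for the axis along unit vector u
    resid = points - np.multiply.outer(points @ u, u)
    return np.mean(np.sum(resid ** 2, axis=1))

print(mean_sq_distance(Zs, vecs[:, 0]), vals[1])   # first axis vs. lambda_2
print(mean_sq_distance(Zs, vecs[:, 1]), vals[0])   # second axis vs. lambda_1
```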