## Principal Component Analysis (PCA)

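Purpose: dimensionality reduction.
Put simply, PCA maps the data from the original feature space into a new, lower-dimensional feature space.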

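1. Normalize the data so that every feature has mean $\mu_j = 0$ (dividing by the standard deviation $s_j$ also gives each feature unit variance)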

$\displaystyle x_j^{(i)} = \frac {x_j^{(i)} - \mu_j}{s_j}$

2. Compute the covariance matrix of the normalized data

$\displaystyle \Sigma = \frac{1}{m}\sum\limits_{i=1}^{m}(x^{(i)})(x^{(i)})^T = \frac{1}{m} \cdot X^TX$

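3. Apply SVD to the covariance matrix to obtain its eigenvectors and eigenvalues (because $\Sigma$ is symmetric and positive semi-definite, its singular vectors and singular values coincide with its eigenvectors and eigenvalues)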

$\displaystyle (U,S,V^T) = SVD(\Sigma)$

4. Take the first $k$ columns of $U$ (the top $k$ singular vectors) to build a reduced matrix

$\displaystyle U_{reduce} = (u^{(1)},u^{(2)},\cdots,u^{(k)})$

5. Project each sample onto the reduced basis to obtain its new $k$-dimensional feature vector

$\displaystyle z^{(i)}=U_{reduce}^T \cdot x^{(i)}$
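
Steps 2 and 3 can be sanity-checked numerically. The sketch below is only an illustration under stated assumptions: the toy data, the random seed, and the variable names (`Xn`, `Sigma`, `Vt`) are hypothetical and not part of the original notebook. It checks that $\frac{1}{m}X^TX$ agrees with `np.cov` on standardized data, and that the SVD of $\Sigma$ yields the same values as an eigendecomposition.

```python
import numpy as np

# Hypothetical toy data: m = 200 samples (rows), n = 3 correlated features.
rng = np.random.default_rng(0)
mixing = np.array([[2.0, 0.5, 0.0],
                   [0.0, 1.0, 0.3],
                   [0.0, 0.0, 0.2]])
X = rng.normal(size=(200, 3)) @ mixing

# Step 1: standardize each feature (column).
Xn = (X - X.mean(axis=0)) / X.std(axis=0)
m = Xn.shape[0]

# Step 2: covariance matrix of the zero-mean data.
Sigma = Xn.T @ Xn / m
# np.cov with bias=True also divides by m, so the two should agree.
assert np.allclose(Sigma, np.cov(Xn, rowvar=False, bias=True))

# Step 3: SVD of the symmetric positive semi-definite matrix Sigma.
# Note that np.linalg.svd returns V already transposed.
U, S, Vt = np.linalg.svd(Sigma)

# The singular values match the eigenvalues (eigh returns them in ascending order).
eigvals = np.linalg.eigh(Sigma)[0]
assert np.allclose(S, eigvals[::-1])

# Each column of U is an eigenvector: Sigma @ u_j equals s_j * u_j.
assert np.allclose(Sigma @ U, U * S)
```

Because the singular values come back sorted from largest to smallest, taking the first $k$ columns of $U$ in step 4 selects the directions of largest variance.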

In [1]:
import numpy as np

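def normalize(X):
    """Standardize each feature (column) of X to zero mean and unit variance."""
    mean = np.mean(X, axis=0)   # per-feature mean, mu_j
    std = np.std(X, axis=0)     # per-feature standard deviation, s_j (must be nonzero)
    return (X - mean) / std     # builds a new array; X itself is not modified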
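
A quick check of what the normalization step produces, as a minimal sketch on a hypothetical 4 x 2 toy matrix (the data is made up for illustration). It repeats the same arithmetic inline so that it runs on its own.

```python
import numpy as np

# Hypothetical toy matrix: 4 samples (rows), 2 features (columns).
X = np.array([[1.0, 10.0],
              [2.0, 20.0],
              [3.0, 30.0],
              [4.0, 40.0]])

# Same arithmetic as normalize(): subtract the column mean, divide by the column std.
normed = (X - X.mean(axis=0)) / X.std(axis=0)

print(normed.mean(axis=0))  # approximately 0 for every feature
print(normed.std(axis=0))   # approximately 1 for every feature
```

Two details are worth keeping in mind: `np.std` uses the population formula (`ddof=0`) by default, and a constant column has a standard deviation of 0, so dividing by it yields `nan` or `inf` values.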

In [2]:
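def PCA(X, k=1):
    """Project the rows of X onto the top-k principal components."""
    m, n = X.shape
    normed = normalize(X)
    # Covariance matrix (n x n). Use @ for the matrix product:
    # on NumPy arrays, * multiplies element-wise.
    Coef = normed.T @ normed / m
    # np.linalg.svd returns V already transposed, so V here holds V^T.
    U, S, V = np.linalg.svd(Coef)
    UReduce = U[:, 0:k]     # first k columns of U (n x k)
    Z = normed @ UReduce    # projected data (m x k)
    return normed, Z, U, S, V, UReduce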

In [3]:
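def recover(UReduce, Z):
    # Map Z back to the normalized feature space (m x n).
    # The result approximates `normed`, not the raw X.
    return Z @ UReduce.T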
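
With samples stored as rows, the per-sample projection $z^{(i)} = U_{reduce}^T \cdot x^{(i)}$ becomes $Z = X U_{reduce}$ in matrix form, and recovery becomes $X_{approx} = Z U_{reduce}^T$. The sketch below illustrates that round trip for different values of $k$. It is self-contained and built on assumptions: the toy data, the seed, and the names `Xn`, `U_reduce`, and `errors` are hypothetical. The third feature is constructed as the sum of the first two, so the data effectively spans only two dimensions.

```python
import numpy as np

# Hypothetical toy data: 100 samples, 3 features, the third being the sum
# of the first two, so two components should be enough to describe it.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
X[:, 2] = X[:, 0] + X[:, 1]

# Steps 1 to 3: standardize, build the covariance matrix, decompose it.
Xn = (X - X.mean(axis=0)) / X.std(axis=0)
m, n = Xn.shape
U, S, Vt = np.linalg.svd(Xn.T @ Xn / m)

errors = []
for k in range(1, n + 1):
    U_reduce = U[:, :k]           # n x k
    Z = Xn @ U_reduce             # project: m x k
    X_approx = Z @ U_reduce.T     # recover: m x n, in normalized units
    errors.append(np.mean((Xn - X_approx) ** 2))

print(errors)  # mean squared reconstruction error for k = 1, 2, 3
```

The error should shrink as $k$ grows and be close to zero from $k = 2$ onward for this data. For each $k$, the mean squared error equals the sum of the discarded singular values divided by $n$, which is one way to judge how much information a given $k$ keeps.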