Many machine learning problems involve thousands or even millions of features per training instance. So many features not only make training extremely slow but also make it much harder to find a good solution. This problem is called **the curse of dimensionality**.

# Main content
- the curse of dimensionality
- how high-dimensional space behaves
- projection and manifold learning
- PCA, Kernel PCA, LLE.

# The Curse of Dimensionality
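High-dimensional datasets are very sparse: most training instances are likely to be far away from each other, so a new instance will probably be far from any training instance, which makes predictions much less reliable than in lower dimensions. In short, the more dimensions the training set has, the greater the risk of overfitting it. In theory, one solution to the curse of dimensionality could be to increase the size of the training set until it reaches a sufficient density of training instances. In practice, however, the number of training instances required to reach a given density grows exponentially with the number of dimensions, so this is rarely feasible.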
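To make the sparsity argument concrete, here is a small Monte Carlo sketch (an illustration, not taken from the original text) that estimates the average distance between two random points in a unit hypercube. The helper name `mean_pairwise_distance` and the sample sizes are arbitrary choices.

```python
import numpy as np

def mean_pairwise_distance(n_dims, n_pairs=2_000, seed=42):
    """Estimate the average distance between two random points in a unit hypercube."""
    rng = np.random.default_rng(seed)
    a = rng.random((n_pairs, n_dims))  # n_pairs random points
    b = rng.random((n_pairs, n_dims))  # another n_pairs random points
    return np.linalg.norm(a - b, axis=1).mean()

for n_dims in (2, 3, 100, 1_000):
    print(f"{n_dims:>5} dims: {mean_pairwise_distance(n_dims):.2f}")
```

The average distance is about 0.52 in 2D and grows roughly like the square root of the number of dimensions, so randomly placed instances drift further and further apart as dimensions are added.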

# Main Approaches for Dimensionality Reduction
## Projection
In most real-world problems, training instances are not spread out uniformly across all dimensions: many features are almost constant, while others are highly correlated. As a result, all training instances lie within (or close to) a much lower-dimensional subspace of the high-dimensional space. The figures below show an example.

![8](images/8-2.png)

![8](images/8-3.png)

![8](images/8-4.png)
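A minimal NumPy sketch of projection, assuming a synthetic 3D dataset and a plane chosen by hand (the basis `B` is hypothetical; PCA, covered below, is the usual way to find such a plane from the data). Each instance is replaced by its two coordinates on the plane, and mapping back to 3D shows how little is lost when the data lies close to that plane.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical plane in 3D, given by two orthonormal basis vectors (one per row).
B = np.array([[1.0, 0.0, 0.0],
              [0.0, 1.0, 1.0]])
B[1] /= np.linalg.norm(B[1])

# 200 synthetic instances: random coordinates on the plane plus small off-plane noise.
Z = rng.normal(size=(200, 2))
X = Z @ B + 0.01 * rng.normal(size=(200, 3))

# Projection: each 3D instance is replaced by its 2 coordinates on the plane.
X_2d = X @ B.T
# Mapping back to 3D; the error is only the small off-plane noise component.
X_back = X_2d @ B

print(X_2d.shape)                # (200, 2)
print(np.abs(X - X_back).max())  # small, on the order of the noise level
```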

## Manifold Learning

![8](images/8-5.png)

![8](images/8-6.png)
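Manifold learning algorithms model the manifold on which the training instances lie, relying on the assumption that most real-world high-dimensional datasets lie close to a much lower-dimensional manifold. The Swiss roll is the classic example: a 2D sheet bent and rolled up in 3D. The sketch below uses Scikit-Learn's `make_swiss_roll` and `LocallyLinearEmbedding` (LLE, listed in the main content above); the parameter values are arbitrary choices, not tuned settings.

```python
from sklearn.datasets import make_swiss_roll
from sklearn.manifold import LocallyLinearEmbedding

# Swiss roll: a 2D sheet rolled up in 3D; t is each point's position along the roll.
X, t = make_swiss_roll(n_samples=1000, noise=0.2, random_state=42)

# A plain projection (dropping the third axis) puts different layers of the roll on top of each other.
X_squashed = X[:, :2]

# LLE tries to "unroll" the manifold by preserving each point's local neighborhood.
lle = LocallyLinearEmbedding(n_components=2, n_neighbors=10, random_state=42)
X_unrolled = lle.fit_transform(X)

print(X_squashed.shape, X_unrolled.shape)  # (1000, 2) (1000, 2)
```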

# PCA
Principal Component Analysis (PCA) is by far the most popular dimensionality reduction algorithm. First, it identifies the hyperplane that lies closest to the data, and then it projects the data onto it.

## Preserving the Variance
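Before projecting the training set onto a lower-dimensional hyperplane, you first need to choose the right hyperplane. It seems reasonable to select the axis that preserves the maximum amount of variance, as it will most likely lose less information than the other projections. Equivalently, it is the axis that minimizes the mean squared distance between the original dataset and its projection onto that axis.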

![8](images/8-7.png)
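To see the variance-preserving choice in action, the sketch below (synthetic data; `projected_variance` is a hypothetical helper) projects a 2D dataset onto unit vectors at every whole-degree angle and reports which axis keeps the most variance. Up to the 1-degree grid, that axis is the direction of largest variance, and the variance it keeps matches the largest eigenvalue of the data's covariance matrix.

```python
import numpy as np

rng = np.random.default_rng(42)
# Hypothetical 2D dataset, stretched much more along one direction than the other.
X = rng.normal(size=(500, 2)) @ np.array([[3.0, 1.5], [-0.3, 0.6]])
X_centered = X - X.mean(axis=0)

def projected_variance(X_centered, angle):
    """Variance of the data after projecting it onto the unit vector at `angle` (radians)."""
    u = np.array([np.cos(angle), np.sin(angle)])
    return (X_centered @ u).var()

# Try every axis direction in 1-degree steps and keep the one with the largest variance.
angles = np.radians(np.arange(180))
variances = np.array([projected_variance(X_centered, a) for a in angles])
best = angles[variances.argmax()]
print(f"best axis: {np.degrees(best):.0f} degrees, variance {variances.max():.2f}")
```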

## Principal Components
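**PCA identifies the axis that accounts for the largest amount of variance in the training set.** It then finds a second axis, orthogonal to the first one, that accounts for the largest amount of remaining variance, then a third axis orthogonal to both, and so on, finding as many axes as there are dimensions in the dataset.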

The unit vector that defines the $i^{th}$ axis is called the $i^{th}$ **principal component (PC)**. In Figure 8-7, the first PC is $c_1$ and the second PC is $c_2$.

### How to find the PCs of a training set?
There is a standard matrix factorization technique called **Singular Value Decomposition (SVD)** that can decompose the training set matrix **X** into the matrix multiplication of three matrices $U \cdot \Sigma \cdot V^T$, where the columns of $V$ (equivalently, the rows of $V^T$) are the unit vectors that define all the principal components we are looking for.

![8](images/e8-1.png)

The following code uses NumPy's `svd()` function to obtain all the principal components of the training set, then extracts the first two PCs:
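```python
import numpy as np

# X is the (m x n) training set matrix, assumed to be defined already
X_centered = X - X.mean(axis=0)       # PCA assumes centered data
U, s, Vt = np.linalg.svd(X_centered)  # NumPy returns V^T, not V
c1 = Vt.T[:, 0]                       # first principal component
c2 = Vt.T[:, 1]                       # second principal component
```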

**PCA assumes that the dataset is centered around the origin, so don't forget to center the data first.**
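The snippet above assumes `X` already exists. As a self-contained check, the sketch below builds a small synthetic dataset (its shape and values are arbitrary assumptions) and verifies two facts stated above: multiplying $U \cdot \Sigma \cdot V^T$ back together recovers the centered data, and the principal components are orthogonal unit vectors. `full_matrices=False` requests the compact SVD so that `np.diag(s)` has compatible shapes.

```python
import numpy as np

rng = np.random.default_rng(42)
# Hypothetical 3D training set: 60 instances lying close to a 2D plane.
X = rng.normal(size=(60, 2)) @ np.array([[1.0, 0.2, 0.5], [0.3, 1.0, -0.4]])
X += 0.05 * rng.normal(size=(60, 3))

X_centered = X - X.mean(axis=0)
# Compact SVD: U is (60, 3), s is (3,), Vt is (3, 3).
U, s, Vt = np.linalg.svd(X_centered, full_matrices=False)

# Rebuilding U . Sigma . V^T gives back the centered data.
X_rebuilt = U @ np.diag(s) @ Vt
print(np.allclose(X_rebuilt, X_centered))  # True

c1, c2 = Vt[0], Vt[1]  # rows of V^T = columns of V
print(np.dot(c1, c2))  # approximately 0: the PCs are orthogonal
```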

## Projecting Down to d Dimensions
Once you have identified all the principal components, you can reduce the dimensionality of the dataset down to d dimensions by projecting it onto the hyperplane defined by the first d principal components. Selecting this hyperplane ensures that the projection will preserve as much variance as possible.

To project the training set down to d dimensions, multiply it by $W_d$, the matrix containing the first d principal components (i.e., the matrix composed of the first d columns of $V$).

![8](images/e8-2.png)
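A minimal sketch of this projection step, assuming a small synthetic dataset (regenerated here so the block runs on its own). It also checks how much variance the projection keeps: since the variance along the $i^{th}$ PC is proportional to $s_i^2$, the kept fraction should equal $(s_1^2 + s_2^2) / \sum_i s_i^2$.

```python
import numpy as np

rng = np.random.default_rng(42)
# Hypothetical 3D training set lying close to a 2D plane.
X = rng.normal(size=(60, 2)) @ np.array([[1.0, 0.2, 0.5], [0.3, 1.0, -0.4]])
X += 0.05 * rng.normal(size=(60, 3))

X_centered = X - X.mean(axis=0)
U, s, Vt = np.linalg.svd(X_centered, full_matrices=False)

d = 2
W_d = Vt.T[:, :d]            # first d columns of V, shape (3, d)
X_d_proj = X_centered @ W_d  # projected training set, shape (60, d)

# Fraction of the total variance preserved by the projection.
kept = X_d_proj.var(axis=0).sum() / X_centered.var(axis=0).sum()
print(X_d_proj.shape)  # (60, 2)
print(kept)
```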

## Using Scikit-Learn
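Scikit-Learn's `PCA` class automatically takes care of centering the data:

```python
from sklearn.decomposition import PCA

# X is the training set, assumed to be defined already
pca = PCA(n_components=2)
X2D = pca.fit_transform(X)  # project onto the first 2 PCs

# components_ holds one PC per row, so this equals pca.components_[0]
first_PC = pca.components_.T[:, 0]
```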
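As a sanity check, the sketch below compares Scikit-Learn's result with the manual SVD route on a synthetic dataset (an assumption for illustration). `svd_solver="full"` asks `PCA` for an exact SVD. Each principal component is only defined up to its sign, so the comparison uses absolute values.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(42)
# Hypothetical 3D training set lying close to a 2D plane.
X = rng.normal(size=(60, 2)) @ np.array([[1.0, 0.2, 0.5], [0.3, 1.0, -0.4]])
X += 0.05 * rng.normal(size=(60, 3))

# Manual route: center, SVD, project onto the first 2 PCs.
X_centered = X - X.mean(axis=0)
U, s, Vt = np.linalg.svd(X_centered, full_matrices=False)
X2D_manual = X_centered @ Vt[:2].T

# Scikit-Learn route: PCA centers X itself.
pca = PCA(n_components=2, svd_solver="full")
X2D = pca.fit_transform(X)

# Signs of individual PCs may differ, so compare absolute values.
print(np.allclose(np.abs(pca.components_), np.abs(Vt[:2])))  # True
print(np.allclose(np.abs(X2D), np.abs(X2D_manual)))          # True
```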

## Explained Variance Ratio