# Lesson 3: A Practical Introduction to Principal Component Analysis (PCA)

## An Introduction to Principal Component Analysis (PCA)
Let's dive into **Principal Component Analysis (PCA)**, a technique often used in machine learning to reduce the number of dimensions in a dataset while keeping as much of its variance as possible. PCA transforms a dataset with many correlated features into a new set of uncorrelated features, called principal components, ordered by how much variance each one captures. Think of it as organizing a messy room and putting everything in clear, separate bins.

## Make A Simple Dataset
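To start using PCA, we'll create a simple 3D dataset of 200 points. Multiplying independent normal samples by a random 3x3 matrix mixes them, so the three features end up correlated: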

```python
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D  # only needed to register the '3d' projection on Matplotlib < 3.2

np.random.seed(0)
# Create 200 correlated 3D points: random 3x3 mixing matrix times normal samples
X = np.dot(np.random.random(size=(3, 3)), np.random.normal(size=(3, 200))).T
# Plotting the dataset
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
ax.scatter(X[:,0], X[:,1], X[:,2])
plt.title("Scatter Plot of Original Dataset")
plt.show()
```

## Standardizing the Dataset
Before applying PCA, we need to standardize our dataset. This gives every feature a mean of 0 and a standard deviation of 1, so that no feature dominates the analysis simply because it is measured on a larger scale:

```python
# Calculate the per-feature mean and standard deviation (axis=0 works column by column)
X_mean = np.mean(X, axis=0)
X_std = np.std(X, axis=0)
# Standardize: subtract the mean and divide by the standard deviation
X = (X - X_mean) / X_std
```

The above code calculates each feature's mean (`np.mean`) and standard deviation (`np.std`), then subtracts the mean from every point and divides the result by the standard deviation.
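
To confirm that standardization did what we expect, we can check the per-feature mean and standard deviation afterwards. The snippet below is a minimal sketch that rebuilds the lesson's dataset so it runs on its own; the names `X_raw` and `X_scaled` are introduced here only for illustration.

```python
import numpy as np

# Rebuild the lesson's dataset so this snippet is self-contained
np.random.seed(0)
X_raw = np.dot(np.random.random(size=(3, 3)), np.random.normal(size=(3, 200))).T

# Standardize: subtract the per-feature mean, divide by the per-feature std
X_scaled = (X_raw - X_raw.mean(axis=0)) / X_raw.std(axis=0)

# Each column should now have a mean close to 0 and a standard deviation close to 1
print("means:", np.round(X_scaled.mean(axis=0), 6))
print("stds: ", np.round(X_scaled.std(axis=0), 6))
```

The means will not be exactly zero because of floating-point rounding, which is why the check rounds the values before printing.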

## Covariance Matrix
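Next, we calculate the **covariance matrix**, which shows how each pair of features varies together. Because the data is standardized, its entries are very close to the correlation coefficients between the features: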

```python
# Calculate Covariance Matrix
cov_matrix = np.cov(X.T)
```

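By default, `np.cov` expects each row to be a variable and each column an observation, so we pass the transpose `X.T`. The result is a 3x3 symmetric matrix.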
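
To see what `np.cov` is doing, we can compute the same matrix by hand. This is a minimal sketch that rebuilds the dataset so it runs on its own; `X_raw`, `X_scaled`, `cov_manual`, and `cov_numpy` are names chosen for this example only.

```python
import numpy as np

# Rebuild and standardize the lesson's dataset
np.random.seed(0)
X_raw = np.dot(np.random.random(size=(3, 3)), np.random.normal(size=(3, 200))).T
X_scaled = (X_raw - X_raw.mean(axis=0)) / X_raw.std(axis=0)

n_samples = X_scaled.shape[0]

# The data is already centered, so the sample covariance is X^T X / (n - 1)
cov_manual = X_scaled.T @ X_scaled / (n_samples - 1)
cov_numpy = np.cov(X_scaled.T)

print("manual and np.cov agree:", np.allclose(cov_manual, cov_numpy))

# np.std divides by n while np.cov divides by n - 1,
# so the diagonal is n / (n - 1) rather than exactly 1
print("diagonal:", np.diag(cov_numpy))
```

The small difference on the diagonal comes only from the two functions using different default divisors; it does not change the eigenvectors.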

## Eigendecomposition
The covariance matrix is then decomposed into **eigenvectors** and **eigenvalues**. This process helps us understand the direction and magnitude of the data's spread:

```python
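# Decompose the symmetric covariance matrix into eigenvalues and eigenvectors
# (eigh is designed for symmetric matrices and always returns real values)
eigenvalues, eigenvectors = np.linalg.eigh(cov_matrix)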
```

Each eigenvector (a column of `eigenvectors`) is a direction in feature space, and its eigenvalue is the variance of the data along that direction.
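
We can check these properties numerically. The sketch below rebuilds the dataset so it runs on its own and uses `np.linalg.eigh`, NumPy's routine for symmetric matrices; the variable names `X_raw` and `X_scaled` are introduced for this example only.

```python
import numpy as np

# Rebuild and standardize the lesson's dataset
np.random.seed(0)
X_raw = np.dot(np.random.random(size=(3, 3)), np.random.normal(size=(3, 200))).T
X_scaled = (X_raw - X_raw.mean(axis=0)) / X_raw.std(axis=0)

cov_matrix = np.cov(X_scaled.T)
eigenvalues, eigenvectors = np.linalg.eigh(cov_matrix)

# Defining property: cov_matrix @ v equals eigenvalue * v for every eigenvector v
for i in range(len(eigenvalues)):
    v = eigenvectors[:, i]
    print(f"pair {i} satisfies C v = lambda v:",
          np.allclose(cov_matrix @ v, eigenvalues[i] * v))

# The eigenvectors of a symmetric matrix are orthonormal
print("orthonormal:", np.allclose(eigenvectors.T @ eigenvectors, np.eye(3)))

# The eigenvalues add up to the total variance (the trace of the covariance matrix)
print("sum equals trace:", np.isclose(eigenvalues.sum(), np.trace(cov_matrix)))
```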

## Sorting Eigenvalues and Eigenvectors
We sort the eigenvalues and their corresponding eigenvectors in descending order. The eigenvectors with the largest eigenvalues are the principal components that capture the most variance:

```python
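# Pair each eigenvalue with its eigenvector (a column of `eigenvectors`)
eigen_pairs = [(np.abs(eigenvalues[i]), eigenvectors[:, i]) for i in range(len(eigenvalues))]
# Sort by eigenvalue only; comparing whole tuples would try to compare arrays on ties
eigen_pairs.sort(key=lambda pair: pair[0], reverse=True)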
```
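
A common way to decide how many components to keep is to look at the share of the total variance each sorted eigenvalue accounts for. The following is a minimal sketch of that calculation; it rebuilds the dataset so it runs on its own, and `explained_ratio` and `cumulative_ratio` are names chosen for this example.

```python
import numpy as np

# Rebuild and standardize the lesson's dataset
np.random.seed(0)
X_raw = np.dot(np.random.random(size=(3, 3)), np.random.normal(size=(3, 200))).T
X_scaled = (X_raw - X_raw.mean(axis=0)) / X_raw.std(axis=0)

# Eigenvalues of the covariance matrix, sorted from largest to smallest
eigenvalues = np.linalg.eigvalsh(np.cov(X_scaled.T))
sorted_eigenvalues = np.sort(eigenvalues)[::-1]

# Fraction of the total variance captured by each component, and the running total
explained_ratio = sorted_eigenvalues / sorted_eigenvalues.sum()
cumulative_ratio = np.cumsum(explained_ratio)

for i, (ratio, total) in enumerate(zip(explained_ratio, cumulative_ratio), start=1):
    print(f"PC{i}: {ratio:.3f} of the variance, {total:.3f} cumulative")
```

The cumulative value for the second component tells us how much of the variance survives when we keep two dimensions.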

## Projecting the Original Dataset
Now that we have sorted the eigenvalues, we can select the top `k` eigenvectors to form the projection matrix. Here we choose `k = 2`, so multiplying the standardized dataset by this 3x2 matrix maps every point from three dimensions to two:

```python
# Build the 3x2 projection matrix from the top two eigenvectors
W = np.hstack((eigen_pairs[0][1].reshape(3,1), eigen_pairs[1][1].reshape(3,1)))
# Project the standardized dataset onto the two principal components
X_pca = X.dot(W)
```
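
One property worth checking is that the projected features are uncorrelated and that their variances equal the two largest eigenvalues. The sketch below verifies this; it rebuilds the dataset so it runs on its own and, as an assumption of this example, selects the top eigenvectors with `np.argsort` instead of the list of pairs used above.

```python
import numpy as np

# Rebuild and standardize the lesson's dataset
np.random.seed(0)
X_raw = np.dot(np.random.random(size=(3, 3)), np.random.normal(size=(3, 200))).T
X_scaled = (X_raw - X_raw.mean(axis=0)) / X_raw.std(axis=0)

cov_matrix = np.cov(X_scaled.T)
eigenvalues, eigenvectors = np.linalg.eigh(cov_matrix)

# Indices of the eigenvalues from largest to smallest
order = np.argsort(eigenvalues)[::-1]

# Projection matrix: the two eigenvectors with the largest eigenvalues, as columns
W = eigenvectors[:, order[:2]]
X_pca = X_scaled @ W
print("projected shape:", X_pca.shape)

# Covariance of the projected data: diagonal, with the top two eigenvalues on it
cov_pca = np.cov(X_pca.T)
print(np.round(cov_pca, 6))
```

The off-diagonal entries come out as zero (up to rounding), which is the sense in which principal components "do not relate to each other".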

## Visualizing Results
Finally, we plot the projected dataset. PCA has reduced it from three dimensions to two while keeping the two directions along which the data varies most:

```python
plt.figure()
plt.scatter(X_pca[:, 0], X_pca[:, 1])
plt.title("Scatter Plot of Transformed Dataset Using PCA")
plt.show()
```

The horizontal axis is the first principal component and the vertical axis is the second, so the graph shows the data along its two directions of greatest variance.
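
A plot gives a visual impression, but we can also measure how much was lost. One way, sketched below under the same setup as the lesson, is to map the 2D points back into 3D and compare them with the standardized data. The names `X_restored` and `lost_variance` are introduced for this example only.

```python
import numpy as np

# Rebuild and standardize the lesson's dataset
np.random.seed(0)
X_raw = np.dot(np.random.random(size=(3, 3)), np.random.normal(size=(3, 200))).T
X_scaled = (X_raw - X_raw.mean(axis=0)) / X_raw.std(axis=0)

eigenvalues, eigenvectors = np.linalg.eigh(np.cov(X_scaled.T))
order = np.argsort(eigenvalues)[::-1]
W = eigenvectors[:, order[:2]]

# Project to 2D, then map back to 3D using the transpose of the projection matrix
X_pca = X_scaled @ W
X_restored = X_pca @ W.T

# Variance lost in the round trip (same n - 1 divisor that np.cov uses)
n_samples = X_scaled.shape[0]
lost_variance = np.sum((X_scaled - X_restored) ** 2) / (n_samples - 1)

print("lost variance:       ", lost_variance)
print("discarded eigenvalue:", eigenvalues[order[2]])
```

The two printed numbers match: the variance we give up is exactly the eigenvalue of the component we dropped.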

## Wrapping Up
Well done! You've just learned about **Principal Component Analysis (PCA)**, a powerful technique for reducing the dimensions of a dataset while preserving as much of its variance as possible. Now it's time for you to practice! Remember, practice is the key to mastering any new concept. Keep learning!

