## Dimensionality Reduction

Dimensionality reduction is a process used to reduce the number of input variables in a dataset while retaining as much relevant information as possible. It transforms high-dimensional data into a lower-dimensional space that is easier to process and interpret, while aiming to keep the important patterns and relationships intact.

### Why Do We Need Dimensionality Reduction?

1. **Curse of Dimensionality**:
   With high-dimensional data, certain algorithms (like k-NN, clustering, or regression models) can become less effective because distances between data points become less informative: as the number of dimensions grows, the distance from a point to its nearest neighbor approaches its distance to the farthest one. At the same time, the volume of the data space grows exponentially with the number of dimensions, so a fixed number of data points becomes increasingly sparse. Together, these effects are known as the "curse of dimensionality."

2. **Computational Efficiency**:
   High-dimensional datasets require more computational power for both processing and storage. By reducing the number of dimensions, we reduce the computational costs associated with training machine learning models, making algorithms faster.

3. **Overfitting Prevention**:
   High-dimensional data often contain irrelevant or noisy features that do not contribute meaningfully to the prediction task. These features can lead to overfitting, where the model becomes too specific to the training data and fails to generalize to new data. Reducing dimensions helps mitigate this risk.

4. **Visualization**:
   Visualization of high-dimensional data is difficult. By reducing dimensions to two or three, it becomes easier to visualize and understand the data, revealing underlying patterns or clusters.
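One way to see the distance problem from point 1 is to measure the *relative contrast* between the nearest and farthest neighbor of a random query point as the dimension grows. The sketch below assumes points drawn uniformly from the unit hypercube; the helper name `relative_contrast` is hypothetical, chosen only for this illustration.

```python
import numpy as np


def relative_contrast(n_points: int, dim: int, seed: int = 0) -> float:
    """Return (max_dist - min_dist) / min_dist from one random query point
    to n_points random points drawn uniformly from [0, 1]^dim (illustrative)."""
    rng = np.random.default_rng(seed)
    points = rng.random((n_points, dim))
    query = rng.random(dim)
    dists = np.linalg.norm(points - query, axis=1)  # Euclidean distances
    return (dists.max() - dists.min()) / dists.min()


# As the dimension grows, the nearest and farthest points end up at similar
# distances, so "nearest neighbor" carries less and less information.
for dim in (2, 10, 100, 1000):
    print(f"dim={dim:>4}: relative contrast = {relative_contrast(500, dim):.3f}")
```

The exact numbers depend on the random seed; the trend (contrast shrinking toward zero) is the point.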

### Motivation Behind Dimensionality Reduction

1. **Data Simplification**:
   Often, not all features in a dataset are equally important. Many features might be redundant or highly correlated. Dimensionality reduction helps simplify data by removing such redundancies and preserving only the essential features, which improves the interpretability of models.

2. **Improved Model Performance**:
   With fewer dimensions, models may become more robust and generalizable. Simplified models can also improve accuracy, as irrelevant or noisy features are removed from consideration, focusing on the most informative aspects of the data.

3. **Easier Data Storage and Transmission**:
   Lower-dimensional data is smaller in size, which reduces storage requirements and makes data transmission faster and easier, particularly when working with large datasets or streaming data.

### Common Dimensionality Reduction Techniques

1. **Principal Component Analysis (PCA)**:
   PCA transforms the data into new features called "principal components," which are linear combinations of the original features. These components are chosen to maximize variance, preserving the most information in the data while reducing dimensions.

2. **Linear Discriminant Analysis (LDA)**:
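   LDA is a supervised dimensionality reduction technique: it uses the class labels to find linear combinations of features (the linear discriminants) that maximize the separation between classes relative to the spread within each class. With C classes, LDA can produce at most C - 1 components.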

3. **t-SNE (t-Distributed Stochastic Neighbor Embedding)**:
   t-SNE is a nonlinear dimensionality reduction technique that visualizes high-dimensional data by mapping it to a lower-dimensional space (usually 2D or 3D), while preserving the local relationships between data points.

4. **Autoencoders**:
   In deep learning, autoencoders are a type of neural network used to learn compressed representations of input data, often used for dimensionality reduction in complex, nonlinear datasets.
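As a rough illustration of how the first three techniques are typically invoked, the sketch below uses scikit-learn (an assumption; the text does not name a library) on its bundled Iris dataset (150 samples, 4 features, 3 classes). Only LDA uses the labels `y`. Autoencoders are omitted because they require a deep-learning framework.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.manifold import TSNE
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)          # X: (150, 4), y: class labels 0..2
X_std = StandardScaler().fit_transform(X)  # put features on a common scale

# PCA: unsupervised, linear, keeps the directions of largest variance
X_pca = PCA(n_components=2).fit_transform(X_std)

# LDA: supervised, linear; at most (n_classes - 1) = 2 components here
X_lda = LinearDiscriminantAnalysis(n_components=2).fit_transform(X_std, y)

# t-SNE: unsupervised, nonlinear, mainly for 2D/3D visualization;
# scikit-learn's TSNE offers fit_transform() but no transform() for new points
X_tsne = TSNE(n_components=2, random_state=0).fit_transform(X_std)

print(X_pca.shape, X_lda.shape, X_tsne.shape)
```

All three return a 150 x 2 embedding, but only PCA and LDA learn a linear mapping that can be reapplied to unseen data.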

### Conclusion

Dimensionality reduction helps manage the challenges of high-dimensional data by reducing noise, improving computational efficiency, reducing the risk of overfitting, and facilitating data visualization. It is often a valuable step when working with large datasets, helping models remain accurate and interpretable.

## Principal Component Analysis (PCA)

**Principal Component Analysis (PCA)** is a dimensionality reduction technique used to transform high-dimensional data into a lower-dimensional space while preserving as much variance (information) as possible. It is commonly used for data compression, noise reduction, and as a preprocessing step for machine learning algorithms.

The core idea of PCA is to identify the directions (principal components) in which the data varies the most and project the data onto these new directions. These directions are orthogonal (perpendicular) to each other and capture the maximum variance in the dataset.

### How PCA Works (Step-by-Step)

1. **Standardization**: 
   Since PCA is affected by the scale of the features, it's important to standardize the dataset (mean = 0, variance = 1).

2. **Covariance Matrix Calculation**: 
   The covariance matrix of the data is calculated to understand how the variables are related to each other.

3. **Eigenvalues and Eigenvectors**:
   The covariance matrix is decomposed into its **eigenvectors** and **eigenvalues**. Eigenvectors represent the directions of the new feature space, and eigenvalues represent the magnitude of variance along these directions.

4. **Principal Components**: 
   The eigenvectors corresponding to the largest eigenvalues are selected as the principal components. These are the new axes onto which the data is projected.

5. **Projection**: 
   The data is projected onto the selected principal components to obtain a lower-dimensional representation while retaining the maximum amount of variance.
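The five steps above map almost line for line onto NumPy. This is a minimal sketch, not a production implementation: it assumes no feature is constant (a zero standard deviation would cause a division by zero) and divides the covariance by n, the convention used in the worked example below. The function name `pca` is hypothetical.

```python
import numpy as np


def pca(X: np.ndarray, k: int) -> tuple[np.ndarray, np.ndarray]:
    """Project X (n_samples x n_features) onto its top-k principal components.

    Returns (projected data of shape (n_samples, k), all eigenvalues in descending order).
    """
    # Step 1: standardize each feature to mean 0, variance 1 (population std)
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    # Step 2: covariance matrix of the standardized features (divide by n)
    C = np.cov(Z, rowvar=False, bias=True)
    # Step 3: eigendecomposition; eigh is meant for symmetric matrices and
    # returns eigenvalues in ascending order
    eigvals, eigvecs = np.linalg.eigh(C)
    # Step 4: reorder by descending variance and keep the top-k eigenvectors (columns)
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    W = eigvecs[:, :k]
    # Step 5: project the standardized data onto the selected components
    return Z @ W, eigvals


# Usage: a 3-feature dataset reduced to 2 components
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
X[:, 2] = X[:, 0] + 0.1 * rng.normal(size=100)  # third feature nearly duplicates the first
projected, eigvals = pca(X, k=2)
print(projected.shape)  # (100, 2)
print(eigvals)          # descending; they sum to 3, the number of standardized features
```

Because the third feature is almost a copy of the first, the smallest eigenvalue comes out close to zero, which is exactly the redundancy PCA is meant to remove.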

### PCA Example

Let's consider a simple example where we have a dataset with two features, `X1` and `X2`.

```plaintext
X1  X2
2   4
3   5
4   6
5   7
6   8
```

#### Step 1: Standardization

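Standardize the data so that both features have a mean of 0 and variance of 1. X1 has mean 4 and X2 has mean 6; both have a population standard deviation of √2 ≈ 1.41, so, for example, the first value of X1 becomes (2 - 4)/1.41 ≈ -1.41.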

```plaintext
Standardized data:
X1'  X2'
-1.41  -1.41
-0.71  -0.71
0.00   0.00
0.71   0.71
1.41   1.41
```

#### Step 2: Covariance Matrix

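Next, we calculate the covariance matrix of the standardized data to understand how the variables are related. Here the sums of products are divided by n = 5, matching the population standard deviation used in Step 1. Because both features are standardized, this is also their correlation matrix: the off-diagonal 1 means X1 and X2 are perfectly correlated.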

```plaintext
Covariance Matrix:
[1  1]
[1  1]
```
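The standardized values and the covariance matrix above can be reproduced with NumPy. One detail matters: the matrix above uses the population convention (dividing by n = 5). NumPy's `np.cov` divides by n - 1 by default, which would give 1.25 in every entry; passing `bias=True` switches it to dividing by n.

```python
import numpy as np

X = np.array([[2, 4], [3, 5], [4, 6], [5, 7], [6, 8]], dtype=float)

# Step 1: standardize with the population standard deviation (ddof=0)
Z = (X - X.mean(axis=0)) / X.std(axis=0)
print(Z.round(2))

# Step 2: covariance matrix of the standardized data
cov_n = np.cov(Z, rowvar=False, bias=True)  # divide by n: approximately [[1, 1], [1, 1]]
cov_n1 = np.cov(Z, rowvar=False)            # divide by n - 1: approximately [[1.25, 1.25], [1.25, 1.25]]
print(cov_n)
print(cov_n1)
```

Either convention yields the same eigenvectors; only the eigenvalues scale by n/(n - 1).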

#### Step 3: Eigenvectors and Eigenvalues

We compute the eigenvalues and eigenvectors of the covariance matrix. These tell us the directions and the amount of variance in those directions.

```plaintext
Eigenvalues: 2, 0
Eigenvectors: [0.71, 0.71], [-0.71, 0.71]
```

Here, the first eigenvector `[0.71, 0.71]` corresponds to the larger eigenvalue of `2`, meaning this is the principal component that captures the most variance in the data.

#### Step 4: Projection

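Finally, we project the data onto the new axis (the principal component). Each projected value is the dot product of a standardized row with `[0.71, 0.71]`; for the first row, 0.71 × (-1.41) + 0.71 × (-1.41) ≈ -2.00 (exactly -2, since the rounded values stand for 1/√2 and -√2).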

```plaintext
Projected Data:
PC1
-2.00
-1.00
0.00
1.00
2.00
```

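The data has now been reduced to a single dimension (`PC1`) while retaining all of the variance: the second eigenvalue is 0, so `PC2` carries no information (X1 and X2 are perfectly correlated). With real data, the discarded components usually have small but nonzero eigenvalues, so the reduced data retains most, but not all, of the variance.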
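The eigendecomposition and projection in Steps 3 and 4 can be reproduced with NumPy. The sketch below starts from the covariance matrix of Step 2 and uses `np.linalg.eigh`, which is intended for symmetric matrices and returns eigenvalues in ascending order, so they are re-sorted to put the largest first. The sign of an eigenvector is arbitrary, so a library may return `[-0.71, -0.71]` and a mirrored `PC1`; the variance captured is the same either way.

```python
import numpy as np

# Standardized data from Step 1 (the two columns are identical)
z = np.array([-np.sqrt(2), -np.sqrt(2) / 2, 0.0, np.sqrt(2) / 2, np.sqrt(2)])
Z = np.column_stack([z, z])
C = np.array([[1.0, 1.0], [1.0, 1.0]])  # covariance matrix from Step 2

# Step 3: eigendecomposition, then sort by descending eigenvalue
eigvals, eigvecs = np.linalg.eigh(C)
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Share of total variance per component: eigenvalue / sum of eigenvalues
explained = eigvals / eigvals.sum()

# Step 4: project onto the first principal component
pc1 = Z @ eigvecs[:, 0]

print(eigvals)    # approximately [2, 0]
print(explained)  # approximately [1, 0]
print(pc1)        # approximately [-2, -1, 0, 1, 2] (or its negation)
```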

### Eigenvalues and Eigenvectors in PCA

- **Eigenvalues**: Eigenvalues tell us how much variance there is along each principal component. A higher eigenvalue indicates that the principal component captures more variance in the data.
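- **Eigenvectors**: Eigenvectors represent the directions (or axes) of the principal components. They are orthogonal to each other; the first points in the direction of maximum variance, and each subsequent one points in the direction of maximum remaining variance orthogonal to the previous ones.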

In PCA, the eigenvector with the largest eigenvalue is the first principal component, and it explains the most variance in the data. The eigenvector with the second largest eigenvalue is the second principal component, and so on.
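Both properties can be checked numerically: the eigenvectors of a covariance matrix form an orthonormal set, and the variance of the data projected onto an eigenvector equals its eigenvalue, provided the same normalization (here n - 1) is used for both. The sketch below uses randomly generated correlated data purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(42)
# Illustrative data: 200 samples of 3 correlated features
mixing = np.array([[2.0, 0.5, 0.0],
                   [0.0, 1.0, 0.3],
                   [0.0, 0.0, 0.2]])
X = rng.normal(size=(200, 3)) @ mixing
Xc = X - X.mean(axis=0)                 # center the data

C = np.cov(Xc, rowvar=False)            # sample covariance (divides by n - 1)
eigvals, eigvecs = np.linalg.eigh(C)    # ascending eigenvalues
order = np.argsort(eigvals)[::-1]       # first principal component first
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Eigenvectors are orthonormal: V^T V is the identity matrix
print(np.allclose(eigvecs.T @ eigvecs, np.eye(3)))

# Variance along each eigenvector equals the matching eigenvalue
projected_var = (Xc @ eigvecs).var(axis=0, ddof=1)
print(np.allclose(projected_var, eigvals))
```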

### Interpretation of PCA

- **Dimensionality Reduction**: PCA allows us to reduce the number of dimensions in the dataset by selecting only the top `k` principal components that capture most of the variance. For example, in a 3D dataset, you might reduce it to 2D by selecting the two components with the highest eigenvalues.
  
- **Variance Explained**: The proportion of the total variance explained by each principal component can be computed as the ratio of the corresponding eigenvalue to the sum of all eigenvalues. This is often visualized using a **scree plot**, which shows the explained variance for each component.
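The explained-variance ratio, and a common rule for choosing `k` (keep the smallest number of components whose cumulative ratio reaches a threshold), can be computed directly from the eigenvalues. The helper name `choose_k` and the thresholds are illustrative assumptions, not a fixed rule; the example eigenvalues are the ones computed in the notebook cells at the end of this section.

```python
import numpy as np


def choose_k(eigvals: np.ndarray, threshold: float = 0.95) -> int:
    """Smallest k whose top-k components explain at least `threshold` of the variance."""
    ratios = np.sort(eigvals)[::-1] / np.sum(eigvals)  # explained-variance ratio per component
    cumulative = np.cumsum(ratios)
    k = int(np.searchsorted(cumulative, threshold) + 1)
    return min(k, len(eigvals))  # guard against round-off when threshold is 1.0


eigvals = np.array([2.26066017, 0.13933983])
print(eigvals / eigvals.sum())            # approximately [0.942, 0.058]
print(choose_k(eigvals, threshold=0.90))  # 1
print(choose_k(eigvals, threshold=0.95))  # 2
```

Plotting the per-component ratios against the component index gives the scree plot mentioned above.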

### Applications of PCA

1. **Data Compression**: PCA is used to reduce the size of datasets while preserving important patterns and structures.
2. **Noise Reduction**: By keeping only the principal components that capture significant variance, PCA helps filter out noise in the data.
3. **Visualization**: PCA is often used for visualizing high-dimensional data in two or three dimensions.
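Noise reduction with PCA usually means projecting onto the top components and mapping the result back to the original feature space. The sketch below uses scikit-learn's `PCA.inverse_transform` on synthetic data (an assumption for illustration): a signal lying along one direction in 20 dimensions plus Gaussian noise. Keeping one component discards most of the noise, which is spread across all 20 directions.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
n_samples, n_features = 200, 20

# Synthetic clean signal that lies along a single direction (rank 1)
direction = rng.normal(size=n_features)
direction /= np.linalg.norm(direction)
clean = rng.normal(scale=3.0, size=(n_samples, 1)) * direction
noisy = clean + rng.normal(scale=0.5, size=(n_samples, n_features))

# Keep only the top component, then map back to the 20-dimensional space
pca = PCA(n_components=1)
denoised = pca.inverse_transform(pca.fit_transform(noisy))

mse_noisy = np.mean((noisy - clean) ** 2)
mse_denoised = np.mean((denoised - clean) ** 2)
print(f"MSE before: {mse_noisy:.3f}, after: {mse_denoised:.3f}")
```

In practice the number of components to keep is not known in advance; the explained-variance ratio is one common guide.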

### Conclusion

PCA is a powerful tool for reducing the dimensionality of large datasets while retaining essential information. It helps alleviate the "curse of dimensionality" and makes data easier to visualize and work with in machine learning models. The eigenvectors and eigenvalues computed during PCA provide the principal components and their importance in explaining the variance in the data.

```python
import numpy as np
import pandas as pd

data = np.array([[3, 7],
                 [-4, -6],
                 [1, -1],
                 [7, 8],
                 [-4, -1],
                 [-3, -7]])

dataframe = pd.DataFrame(data, columns=['x1', 'x2'])
```

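Standardized form: a feature is standardized when it has mean 0 and standard deviation 1. (This is sometimes called "standard normal form", but standardizing only shifts and rescales the data; it does not make it normally distributed.) The raw data below already has mean 0, but its standard deviations are 4.47 and 6.32, so it is not yet standardized.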

```python
dataframe
```

```plaintext
   x1  x2
0   3   7
1  -4  -6
2   1  -1
3   7   8
4  -4  -1
5  -3  -7
```

```python
dataframe.describe()
```

```plaintext
        x1        x2
count   6.0       6.0
mean    0.0       0.0
std     4.472136  6.324555
min    -4.0      -7.0
25%    -3.75     -4.75
50%    -1.0      -1.0
75%     2.5       5.0
max     7.0       8.0
```


```python
from sklearn.preprocessing import StandardScaler

# StandardScaler subtracts each column's mean and divides by its
# population standard deviation (ddof=0)
scaler = StandardScaler()
dataScaled = scaler.fit_transform(dataframe)
type(dataScaled)
```

```plaintext
numpy.ndarray
```

```python
# Wrap the scaled array back into a DataFrame and summarize it
dataframe = pd.DataFrame(dataScaled, columns=['x1', 'x2'])
dataframe.describe()
```

```plaintext
        x1            x2
count   6.0           6.0
mean    1.850372e-17  0.0
std     1.095445      1.095445
min    -0.9797959    -1.212436
25%    -0.9185587    -0.822724
50%    -0.244949     -0.173205
75%     0.6123724     0.866025
max     1.714643      1.385641
```


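1. Check whether the data is standardized. After `StandardScaler`, both columns have mean 0 (the `1.850372e-17` is floating-point round-off) and a population standard deviation of 1. `describe()` reports `1.095445` only because pandas uses the sample standard deviation (dividing by n - 1): √(6/5) ≈ 1.095445. Since the data is now standardized, no further scaling is needed before computing the covariance matrix.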

2. Compute the covariance matrix of the two features (x1, x2).

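```python
# Approach 1: np.cov treats each argument as a variable and,
# by default, divides by n - 1 (sample covariance)
c1 = dataframe.x1
c2 = dataframe.x2
np.cov(c1, c2)
```

```plaintext
array([[1.2       , 1.06066017],
       [1.06066017, 1.2       ]])
```

```python
dataframe.shape
```

```plaintext
(6, 2)
```

```python
# Approach 2: for mean-centered columns, X^T X / (n - 1) is the covariance
# matrix; with n = 6 rows the divisor is 5
n = dataframe.shape[0]
covarianceMatrix = dataframe.T @ dataframe / (n - 1)
covarianceMatrix
```

```plaintext
     x1       x2
x1   1.2      1.06066
x2   1.06066  1.2
```

```python
# Approach 3: only the off-diagonal entry, cov(x1, x2) = sum(x1 * x2) / (n - 1),
# valid because both columns have mean 0
np.sum(c1 * c2) / (n - 1)
```

```plaintext
1.0606601717798212
```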

3. Compute the eigenvalues and eigenvectors of the covariance matrix above.

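```python
# np.linalg.eig accepts the DataFrame and returns NumPy arrays. For symmetric
# matrices such as a covariance matrix, np.linalg.eigh is the more specialized
# choice; eig does not guarantee any ordering of the eigenvalues.
eigenValues, eigenVectors = np.linalg.eig(covarianceMatrix)
```

```python
# One eigenvalue per feature (two here); each eigenvalue is the variance
# of the data along its eigenvector
eigenValues
```

```plaintext
array([2.26066017, 0.13933983])
```

```python
# The eigenvectors are the *columns* of this matrix
eigenVectors
```

```plaintext
array([[ 0.70710678, -0.70710678],
       [ 0.70710678,  0.70710678]])
```

```python
# First eigenvector (largest eigenvalue): the direction of PC1
eigenVectors[:, 0]
```

```plaintext
array([0.70710678, 0.70710678])
```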

```python
dataframe
```

```plaintext
         x1         x2
0   0.734847   1.212436
1  -0.979796  -1.03923
2   0.244949  -0.173205
3   1.714643   1.385641
4  -0.979796  -0.173205
5  -0.734847  -1.212436
```

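```python
# PC1: projection onto the eigenvector with the largest eigenvalue;
# it captures the most variance (information) of the two original features
PC1 = dataframe @ eigenVectors[:, 0]
PC1
```

```plaintext
0    1.376937
1   -1.427667
2    0.050731
3    2.192231
4   -0.815295
5   -1.376937
dtype: float64
```

```python
# PC2: projection onto the second eigenvector; its variance is the smaller
# eigenvalue (about 0.139), so it carries little information
PC2 = dataframe @ eigenVectors[:, 1]
PC2
```

```plaintext
0    0.337706
1   -0.042027
2   -0.295680
3   -0.232640
4    0.570346
5   -0.337706
dtype: float64
```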
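As a cross-check, scikit-learn's `PCA` applied to the same standardized data should reproduce these hand-computed results. Its `explained_variance_` uses the n - 1 convention, so it should match `eigenValues` above, and its scores should match `PC1` and `PC2` up to the sign of each component (eigenvector signs are arbitrary). The block repeats the data setup so that it runs on its own.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

data = np.array([[3, 7], [-4, -6], [1, -1], [7, 8], [-4, -1], [-3, -7]])
dataScaled = StandardScaler().fit_transform(data)

pca = PCA(n_components=2)
scores = pca.fit_transform(dataScaled)  # columns correspond to PC1 and PC2

print(pca.explained_variance_)          # approximately [2.26066017, 0.13933983]
print(pca.explained_variance_ratio_)    # approximately [0.942, 0.058]
print(np.abs(pca.components_))          # every entry approximately 0.7071

# Manual projection with sorted eigenvectors, compared up to sign
C = np.cov(dataScaled, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(C)
eigvecs = eigvecs[:, np.argsort(eigvals)[::-1]]
manual = dataScaled @ eigvecs
print(np.allclose(np.abs(scores), np.abs(manual)))  # True
```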
