# Dimensionality Reduction 

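Dimensionality reduction helps identify patterns in data based on the correlation between features. PCA, the first technique covered here, is an unsupervised linear transformation technique used for feature extraction, exploratory data analysis (EDA), and denoising signals such as stock market trading data.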

`Types`:

- Principal component analysis (PCA) for unsupervised data compression
- Linear discriminant analysis (LDA) as a supervised dimensionality reduction technique for maximizing class separability
- Nonlinear dimensionality reduction via kernel principal component analysis (KPCA)
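
The three techniques listed above can be sketched side by side with scikit-learn. This is a minimal illustration, not the notebook's own code: it assumes scikit-learn's bundled `load_wine` copy of the Wine data rather than the CSV file, and the RBF kernel and `gamma=0.1` for KPCA are arbitrary illustrative choices.

```python
from sklearn.datasets import load_wine
from sklearn.decomposition import PCA, KernelPCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.preprocessing import StandardScaler

# Assumption: the bundled Wine data stands in for the CSV file used in the notebook
X_demo, y_demo = load_wine(return_X_y=True)
X_demo = StandardScaler().fit_transform(X_demo)

# PCA: unsupervised and linear; the class labels are never used
X_pca = PCA(n_components=2).fit_transform(X_demo)

# LDA: supervised and linear; needs the labels and yields at most (n_classes - 1) components
X_lda = LinearDiscriminantAnalysis(n_components=2).fit_transform(X_demo, y_demo)

# KPCA: unsupervised and nonlinear; kernel and gamma are illustrative, not tuned
X_kpca = KernelPCA(n_components=2, kernel="rbf", gamma=0.1).fit_transform(X_demo)

print(X_pca.shape, X_lda.shape, X_kpca.shape)
```

Each call reduces the 13 original features to 2 new ones; only LDA takes the labels as input.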

`Note`: Standardize the features before applying these techniques. PCA in particular is sensitive to feature scale, so features measured on larger scales would otherwise dominate the components.
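
To see why scaling matters, the sketch below compares the share of variance captured by the first principal component with and without standardization. It assumes scikit-learn's bundled `load_wine` data as a stand-in for the CSV file; the variable names are illustrative only.

```python
from sklearn.datasets import load_wine
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Assumption: bundled Wine data used in place of the notebook's CSV file
X_raw, _ = load_wine(return_X_y=True)

# Without scaling, the feature with the largest numeric range dominates the first component
raw_ratio = PCA(n_components=1).fit(X_raw).explained_variance_ratio_[0]

# After standardization, every feature contributes on the same scale
X_std = StandardScaler().fit_transform(X_raw)
std_ratio = PCA(n_components=1).fit(X_std).explained_variance_ratio_[0]

print(f"first component, raw features:          {raw_ratio:.3f}")
print(f"first component, standardized features: {std_ratio:.3f}")
```

On the raw data almost all of the variance lands in a single component, which reflects measurement units rather than structure in the data.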




![image.png](attachment:image.png)

#### Example of PCA

In [1]:
%reload_ext autoreload
%autoreload 2
%matplotlib inline

In [2]:
import numpy as np 
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt


from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split

sns.set_style()

In [3]:
# Path to the Wine data file; column names are supplied separately below
file = '../data/wine/wine.data'

names = ['Class label', 'Alcohol', 'Malic acid', 'Ash', 'Alcalinity of ash', 'Magnesium', 'Total phenols',
         'Flavanoids', 'Nonflavanoid phenols', 'Proanthocyanins', 'Color intensity', 'Hue',
         'OD280/OD315 of diluted wines', 'Proline']

# Assign the column names on load, then lowercase them
df_wine = pd.read_csv(file, names=names)
df_wine.columns = df_wine.columns.str.lower()

In [4]:
df_wine.head()

| | class label | alcohol | malic acid | ash | alcalinity of ash | magnesium | total phenols | flavanoids | nonflavanoid phenols | proanthocyanins | color intensity | hue | od280/od315 of diluted wines | proline |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 14.23 | 1.71 | 2.43 | 15.6 | 127 | 2.8 | 3.06 | 0.28 | 2.29 | 5.64 | 1.04 | 3.92 | 1065 |
| 1 | 1 | 13.2 | 1.78 | 2.14 | 11.2 | 100 | 2.65 | 2.76 | 0.26 | 1.28 | 4.38 | 1.05 | 3.4 | 1050 |
| 2 | 1 | 13.16 | 2.36 | 2.67 | 18.6 | 101 | 2.8 | 3.24 | 0.3 | 2.81 | 5.68 | 1.03 | 3.17 | 1185 |
| 3 | 1 | 14.37 | 1.95 | 2.5 | 16.8 | 113 | 3.85 | 3.49 | 0.24 | 2.18 | 7.8 | 0.86 | 3.45 | 1480 |
| 4 | 1 | 13.24 | 2.59 | 2.87 | 21.0 | 118 | 2.8 | 2.69 | 0.39 | 1.82 | 4.32 | 1.04 | 2.93 | 735 |


In [5]:
# Separate the features from the class labels (first column), then make a stratified 70/30 train/test split
X, y = df_wine.iloc[:, 1:].values, df_wine.iloc[:, 0].values

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0, test_size=0.3, stratify=y)
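
The `stratify=y` argument keeps the class proportions roughly equal across the training and test sets. A minimal check of that behavior, assuming scikit-learn's bundled `load_wine` data (whose labels are 0, 1, 2 rather than the file's class labels) and a hypothetical helper `class_shares`:

```python
import numpy as np
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split

# Assumption: bundled Wine data used in place of the notebook's CSV file
X_all, y_all = load_wine(return_X_y=True)
_, _, y_tr, y_te = train_test_split(
    X_all, y_all, random_state=0, test_size=0.3, stratify=y_all
)

def class_shares(labels):
    """Return the fraction of samples belonging to each class (hypothetical helper)."""
    _, counts = np.unique(labels, return_counts=True)
    return counts / counts.sum()

# The three rows should be nearly identical when the split is stratified
print("full :", class_shares(y_all).round(3))
print("train:", class_shares(y_tr).round(3))
print("test :", class_shares(y_te).round(3))
```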

In [6]:
# Standardize features: fit the scaler on the training set only, then reuse it on the test set
std_sc = StandardScaler()
X_train_scaled = std_sc.fit_transform(X_train)
X_test_scaled = std_sc.transform(X_test)
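
A quick sanity check on the standardization step: after `fit_transform`, each training feature should have mean 0 and standard deviation 1, while the test set, transformed with the training statistics, will only be close to that. The sketch assumes scikit-learn's bundled `load_wine` data and uses its own illustrative variable names.

```python
import numpy as np
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Assumption: bundled Wine data used in place of the notebook's CSV file
X_all, y_all = load_wine(return_X_y=True)
X_tr, X_te, _, _ = train_test_split(
    X_all, y_all, random_state=0, test_size=0.3, stratify=y_all
)

scaler = StandardScaler()
X_tr_s = scaler.fit_transform(X_tr)  # learns the mean and std from the training data
X_te_s = scaler.transform(X_te)      # reuses the training mean and std

print("train mean:", X_tr_s.mean(axis=0).round(2))
print("train std :", X_tr_s.std(axis=0).round(2))
print("test mean :", X_te_s.mean(axis=0).round(2))
```

Fitting the scaler on the training data alone keeps information from the test set out of the preprocessing step.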

In [7]:
# Calculating the  