# Dimensionality Reduction with PCA and LDA for SVM Classification
### Mini Project for Machine Learning
---
**Objective:** Compare the performance of an SVM classifier before and after applying two dimensionality reduction techniques, PCA and LDA.

**Dataset:** `sklearn.datasets.load_digits`

**Tools Used:** Python, scikit-learn, NumPy, matplotlib, seaborn

---

In [None]:

# PROJECT: Dimensionality Reduction with PCA and LDA for SVM Classification

# Uncomment if needed:
# !pip install scikit-learn matplotlib numpy seaborn

import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn import datasets
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis as LDA
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report

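# Load the 8x8 handwritten digits dataset (64 pixel features, 10 classes)
digits = datasets.load_digits()
X, y = digits.data, digits.target
print(f"Dataset shape: {X.shape}")

# Hold out 30% of the samples for testing
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Standardize features; fit the scaler on the training split only to avoid leakage
scaler = StandardScaler()
X_train_std = scaler.fit_transform(X_train)
X_test_std = scaler.transform(X_test)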

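# Project onto the first 20 principal components (fitted on the training data only)
pca = PCA(n_components=20)
X_train_pca = pca.fit_transform(X_train_std)
X_test_pca = pca.transform(X_test_std)

# Total fraction of variance retained by the 20 components
print(f"Cumulative explained variance (PCA, 20 components): {np.sum(pca.explained_variance_ratio_):.2f}")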

# Baseline: RBF-kernel SVM on all 64 standardized features
svm_orig = SVC(kernel='rbf', gamma='scale')
svm_orig.fit(X_train_std, y_train)
y_pred_orig = svm_orig.predict(X_test_std)
acc_orig = accuracy_score(y_test, y_pred_orig)
print(f"Accuracy (without PCA): {acc_orig:.4f}")

# Same SVM configuration, trained on the 20 PCA components
svm_pca = SVC(kernel='rbf', gamma='scale')
svm_pca.fit(X_train_pca, y_train)
y_pred_pca = svm_pca.predict(X_test_pca)
acc_pca = accuracy_score(y_test, y_pred_pca)
print(f"Accuracy (with PCA=20): {acc_pca:.4f}")

# LDA is supervised (it uses y_train) and yields at most n_classes - 1 = 9 components
lda = LDA(n_components=9)
X_train_lda = lda.fit_transform(X_train_std, y_train)
X_test_lda = lda.transform(X_test_std)

# Same SVM configuration, trained on the 9 LDA components
svm_lda = SVC(kernel='rbf', gamma='scale')
svm_lda.fit(X_train_lda, y_train)
y_pred_lda = svm_lda.predict(X_test_lda)
acc_lda = accuracy_score(y_test, y_pred_lda)
print(f"Accuracy (with LDA): {acc_lda:.4f}")
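
The 20 PCA components used above are chosen by hand. One common alternative, sketched here, is to pick the smallest number of components whose cumulative explained variance reaches a target. The 95% target and the names `target` and `n_components_target` are assumptions for illustration and are not part of the original project.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Same data, split, and scaling as in the cell above
X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
X_train_std = StandardScaler().fit_transform(X_train)

# Fit PCA with all components to see the full variance spectrum
pca_full = PCA().fit(X_train_std)
cum_var = np.cumsum(pca_full.explained_variance_ratio_)

# Assumed target: keep enough components to retain 95% of the variance
target = 0.95
# searchsorted returns the first index where cum_var >= target; +1 converts it to a count
n_components_target = int(np.searchsorted(cum_var, target) + 1)

print(f"Components needed for {target:.0%} variance: {n_components_target}")
print(f"Variance retained by 20 components: {cum_var[19]:.3f}")
```

scikit-learn can also make this selection itself: passing a float between 0 and 1 as `n_components` (for example `PCA(n_components=0.95)`) asks PCA to keep enough components to explain that fraction of the variance.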


In [None]:

print("\n--- Classification Reports ---")
print("Original Data:\n", classification_report(y_test, y_pred_orig))
print("PCA Data:\n", classification_report(y_test, y_pred_pca))
print("LDA Data:\n", classification_report(y_test, y_pred_lda))
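
The three full reports are long to compare by eye. As a rough sketch, `classification_report(..., output_dict=True)` returns the same numbers as a dictionary, so the headline metrics can be collected into one small table. To keep the block self-contained, the three models are rebuilt here as pipelines. That restructuring is an assumption of this sketch rather than the original cell layout, and the names `models` and `summary` are illustrative.

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# One pipeline per feature representation; each ends in the same RBF SVM
models = {
    "Original": make_pipeline(StandardScaler(), SVC(kernel="rbf", gamma="scale")),
    "PCA (20)": make_pipeline(StandardScaler(), PCA(n_components=20),
                              SVC(kernel="rbf", gamma="scale")),
    "LDA (9)": make_pipeline(StandardScaler(), LinearDiscriminantAnalysis(n_components=9),
                             SVC(kernel="rbf", gamma="scale")),
}

summary = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    # output_dict=True returns nested dicts instead of a formatted string
    report = classification_report(y_test, model.predict(X_test), output_dict=True)
    summary[name] = {
        "accuracy": report["accuracy"],
        "macro_f1": report["macro avg"]["f1-score"],
    }

print(f"{'Model':<10} {'Accuracy':>9} {'Macro F1':>9}")
for name, row in summary.items():
    print(f"{name:<10} {row['accuracy']:>9.4f} {row['macro_f1']:>9.4f}")
```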


In [None]:

# Refit PCA with 2 components, for visualization only
pca_2d = PCA(n_components=2)
X_proj = pca_2d.fit_transform(X_train_std)
plt.figure(figsize=(7, 6))
# Color each training sample by its digit label
sns.scatterplot(x=X_proj[:, 0], y=X_proj[:, 1], hue=y_train, palette='tab10', s=20, alpha=0.7)
plt.title("2D PCA Projection of Digits Data")
plt.xlabel("Principal Component 1")
plt.ylabel("Principal Component 2")
# Suppress the legend (colors correspond to the digit classes 0-9)
plt.legend([], [], frameon=False)
plt.show()


In [None]:

# Confusion matrix for the PCA + SVM model: rows are true digits, columns are predicted digits
cm = confusion_matrix(y_test, y_pred_pca)
plt.figure(figsize=(6, 5))
sns.heatmap(cm, annot=True, fmt='d', cmap='Blues')
plt.title("Confusion Matrix (SVM + PCA)")
plt.xlabel("Predicted")
plt.ylabel("True")
plt.show()
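
The heatmap shows every cell, but the few off-diagonal entries are the interesting part. The sketch below lists the most frequent confusions from the same kind of matrix. It retrains the PCA + SVM model inside a pipeline so that it runs on its own; `top_k`, `errors`, and `top_confusions` are illustrative names, not part of the original notebook.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Scale, reduce to 20 principal components, then classify with an RBF SVM
model = make_pipeline(StandardScaler(), PCA(n_components=20),
                      SVC(kernel="rbf", gamma="scale"))
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
cm = confusion_matrix(y_test, y_pred)

# Zero the diagonal so only misclassifications remain
errors = cm.copy()
np.fill_diagonal(errors, 0)

# Indices of the top_k largest off-diagonal cells, largest first
top_k = 3
flat_idx = np.argsort(errors, axis=None)[::-1][:top_k]
rows, cols = np.unravel_index(flat_idx, errors.shape)
top_confusions = [(int(t), int(p), int(errors[t, p])) for t, p in zip(rows, cols)]

for true_digit, pred_digit, count in top_confusions:
    print(f"true {true_digit} predicted as {pred_digit}: {count} time(s)")
```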


## Summary
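- PCA reduced the feature count from 64 to 20, and the SVM trained on those components reached ~97% test accuracy.
- LDA produced 9 components, the maximum for 10 classes (n_classes - 1), and gave ~96% test accuracy.
- Both reductions shrink the SVM's input substantially (64 features down to 20 or 9), which lowers computation while accuracy stays close to that of the full-feature model.
- PCA and LDA can therefore simplify a dataset before an SVM is applied, at little cost in accuracy.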
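
The notebook does not time anything, so the efficiency point above is not measured directly. A rough way to check it is sketched below: time SVM fitting plus prediction on each representation. The helper `time_svm` and the `results` dictionary are hypothetical names introduced for this sketch. On a dataset this small the timings are tiny and noisy, so treat them as indicative only.

```python
import time

from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

scaler = StandardScaler()
X_train_std = scaler.fit_transform(X_train)
X_test_std = scaler.transform(X_test)

pca = PCA(n_components=20).fit(X_train_std)
lda = LinearDiscriminantAnalysis(n_components=9).fit(X_train_std, y_train)

# Each entry maps a label to its (train, test) feature matrices
feature_sets = {
    "Original": (X_train_std, X_test_std),
    "PCA (20)": (pca.transform(X_train_std), pca.transform(X_test_std)),
    "LDA (9)": (lda.transform(X_train_std), lda.transform(X_test_std)),
}


def time_svm(X_tr, X_te):
    """Return (seconds for fit + predict, test accuracy) for an RBF SVM."""
    svm = SVC(kernel="rbf", gamma="scale")
    start = time.perf_counter()
    svm.fit(X_tr, y_train)
    y_pred = svm.predict(X_te)
    elapsed = time.perf_counter() - start
    return elapsed, accuracy_score(y_test, y_pred)


results = {}
for name, (X_tr, X_te) in feature_sets.items():
    seconds, acc = time_svm(X_tr, X_te)
    results[name] = {"n_features": X_tr.shape[1], "seconds": seconds, "accuracy": acc}
    print(f"{name:<10} features={X_tr.shape[1]:>2}  time={seconds:.4f}s  accuracy={acc:.4f}")
```

Only the SVM step is timed here. The cost of fitting PCA or LDA is excluded, so a full comparison would add it back in.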

**Applications:** Image recognition, bioinformatics, text classification, and real-time systems where computational cost matters.