# Hands-on 2
### The Olivetti faces dataset
This dataset contains a set of face images taken at AT&T Laboratories Cambridge. There are ten different images of each of 40 distinct subjects. Each image is 64×64 pixels, so a flattened image has 4096 features.

|Attribute|Value|
|:------|:-|
|Classes|40|
|Samples total|400|
|Dimensionality|4096|
|Features|Real, between 0 and 1|

In [None]:
# Initialization
%matplotlib inline
from warnings import filterwarnings
filterwarnings("ignore")

In [None]:
# Import necessary libraries
import numpy as np
import matplotlib.pyplot as plt
from random import randint
from sklearn.datasets import fetch_olivetti_faces
from sklearn.metrics import classification_report, ConfusionMatrixDisplay
from sklearn.model_selection import train_test_split as split, GridSearchCV
from sklearn.preprocessing import StandardScaler, MinMaxScaler, RobustScaler
from sklearn.decomposition import PCA
from sklearn.pipeline import Pipeline
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.neural_network import MLPClassifier

# Load the Olivetti faces dataset
data = fetch_olivetti_faces()
X = data.images  # Images data
y = data.target  # Target labels

# Reshape the images data to 2D array (flatten the images)
n_samples, h, w = X.shape
X = X.reshape(n_samples, h * w)

# Function to plot a random sample of face images with their labels
def plot_faces(original, n=10):
    # Pick n random image indices (repeats are possible)
    rn = [randint(0, len(original) - 1) for _ in range(n)]
    plt.figure(figsize=(20, 3))
    for i in range(n):
        # One face per column, titled with its subject label
        plt.subplot(1, n, i + 1)
        plt.imshow(original[rn[i]].reshape((h, w)), cmap='gray')
        plt.title(y[rn[i]])
        plt.axis('off')
    plt.show()

# Plot some of the faces
plot_faces(X, n=10)

To do: 
  - Split the dataset into training (70%) and testing (30%) sets.
  - Train a kNN model (using default settings).
  - Evaluate its performance using the testing set and print the score.
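
One possible way to approach the three steps above is sketched below. The helper name `knn_baseline`, the `stratify=y` option, and the fixed `random_state` are assumptions added for reproducibility, not part of the task. The block runs on a small synthetic stand-in (`X_demo`, `y_demo`) so that it works on its own; in the notebook you would pass the `X` and `y` loaded earlier.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier


def knn_baseline(X, y, test_size=0.3, seed=0):
    """Hold out 30% of the data, fit a default kNN, return the model and test accuracy."""
    # stratify=y and random_state are assumptions, not required by the exercise
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, stratify=y, random_state=seed
    )
    knn = KNeighborsClassifier()  # default settings (n_neighbors=5)
    knn.fit(X_train, y_train)
    return knn, knn.score(X_test, y_test)


# Synthetic stand-in with the same layout as the faces data: 40 classes x 10 samples
rng = np.random.default_rng(0)
centers = rng.normal(size=(40, 64))
X_demo = np.repeat(centers, 10, axis=0) + 0.1 * rng.normal(size=(400, 64))
y_demo = np.repeat(np.arange(40), 10)

knn, score = knn_baseline(X_demo, y_demo)
print(f"kNN test accuracy: {score:.3f}")
```

For a classifier, `score` returns the mean accuracy on the data passed to it, so the printed value is the fraction of test images assigned to the correct subject.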

To do: 
  - Construct a pipeline with two steps: dimensionality reduction using PCA ('dr') and classification ('clf').
  - Use grid search (GridSearchCV) to find the best option for each step, as follows:
    - dimensionality reduction with PCA: 50 to 200 principal components, in steps of 50.
    - classification: kNN, logistic regression, decision tree, random forest, support vector machine, and multilayer perceptron.
    - Use default settings for all the classifiers.
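
The pipeline and search space described above could be set up roughly as follows. This is a sketch, not the reference solution: the helper name `build_search` and its `cv` argument are assumptions, and the classifier placed in the initial pipeline is only a placeholder that the grid replaces. The smoke run at the end uses synthetic data and a reduced grid so that the block finishes quickly.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, ParameterGrid
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier


def build_search(n_components=(50, 100, 150, 200), cv=5):
    """Return an unfitted GridSearchCV over PCA sizes and six default classifiers."""
    # Step names 'dr' and 'clf' come from the task statement
    pipe = Pipeline([("dr", PCA()), ("clf", KNeighborsClassifier())])
    param_grid = {
        "dr__n_components": list(n_components),
        # Swapping the whole 'clf' step lets the grid compare model families
        "clf": [
            KNeighborsClassifier(),
            LogisticRegression(),
            DecisionTreeClassifier(),
            RandomForestClassifier(),
            SVC(),
            MLPClassifier(),
        ],
    }
    return GridSearchCV(pipe, param_grid, cv=cv)


search = build_search()
# 4 PCA sizes x 6 classifiers = 24 candidate pipelines
print(len(ParameterGrid(search.param_grid)), "candidate pipelines")

# Smoke run on a small synthetic stand-in with a reduced grid (assumption, for speed)
rng = np.random.default_rng(0)
centers = rng.normal(size=(4, 64))
X_demo = np.repeat(centers, 30, axis=0) + 0.5 * rng.normal(size=(120, 64))
y_demo = np.repeat(np.arange(4), 30)
small = build_search(n_components=(5, 10), cv=3)
small.fit(X_demo, y_demo)
print(small.best_params_)
```

One detail to keep in mind: PCA cannot extract more components than the number of samples it is fitted on. With 280 training images and 5-fold cross-validation, each fold is fitted on 224 images, so 200 components is still feasible; a smaller training set or fewer folds may not be.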

To do: 
- Print the best parameters.
- Store the best model as 'model'.
- Evaluate the best model using the testing set and print the score.
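
A minimal sketch of these steps, assuming a fitted `GridSearchCV` object is available. The helper name `evaluate_best` is hypothetical, and the tiny search on synthetic data below only stands in for the full search so that the block runs by itself.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline


def evaluate_best(search, X_test, y_test):
    """Print the winning parameters and return (best model, test accuracy)."""
    print("Best parameters:", search.best_params_)
    # best_estimator_ is the best pipeline refitted on all training data
    # (available because refit=True is the GridSearchCV default)
    model = search.best_estimator_
    score = model.score(X_test, y_test)
    print(f"Test accuracy: {score:.3f}")
    return model, score


# Synthetic stand-in data and a reduced search (assumptions, for a quick run)
rng = np.random.default_rng(0)
centers = rng.normal(size=(4, 64))
X_demo = np.repeat(centers, 30, axis=0) + 0.5 * rng.normal(size=(120, 64))
y_demo = np.repeat(np.arange(4), 30)
X_train, X_test, y_train, y_test = train_test_split(
    X_demo, y_demo, test_size=0.3, stratify=y_demo, random_state=0
)

pipe = Pipeline([("dr", PCA()), ("clf", KNeighborsClassifier())])
search = GridSearchCV(pipe, {"dr__n_components": [5, 10]}, cv=3)
search.fit(X_train, y_train)

model, score = evaluate_best(search, X_test, y_test)
```

Scoring on the held-out test set, rather than reading `search.best_score_`, gives an estimate that was not used to choose the parameters.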

To do:
- Print the classification report for the best model.
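
The report could be produced along these lines. In this sketch a plain PCA + kNN pipeline fitted on synthetic data stands in for the best model, and `zero_division=0` is an optional assumption that avoids warnings for classes that never get predicted, which can happen when each subject has only a few test images.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline

# Synthetic stand-in data (assumption): 4 classes x 30 samples, 64 features
rng = np.random.default_rng(0)
centers = rng.normal(size=(4, 64))
X_demo = np.repeat(centers, 30, axis=0) + 0.5 * rng.normal(size=(120, 64))
y_demo = np.repeat(np.arange(4), 30)
X_train, X_test, y_train, y_test = train_test_split(
    X_demo, y_demo, test_size=0.3, stratify=y_demo, random_state=0
)

# Stand-in for the best model selected by the grid search
model = Pipeline([("dr", PCA(n_components=10)), ("clf", KNeighborsClassifier())])
model.fit(X_train, y_train)

# Per-class precision, recall, F1 and support, plus overall averages
y_pred = model.predict(X_test)
report = classification_report(y_test, y_pred, zero_division=0)
print(report)
```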