<a href="https://colab.research.google.com/github/ugonna05/005-email-spam-classifier-python/blob/main/hyperparameter_comparison.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Hyperparameter Tuning

**Basically, a hyperparameter is anything in machine learning or deep learning whose value or configuration you choose before training begins and which remains unchanged when training ends.**

Here are some common examples:

* Train-test split ratio
* Learning rate in optimization algorithms (e.g., gradient descent)
* Choice of optimization algorithm (e.g., gradient descent, stochastic gradient descent, or Adam optimizer)
* Choice of activation function in a neural network (NN) layer (e.g., Sigmoid, ReLU, Tanh)
* The choice of cost or loss function the model will use
* Number of hidden layers in an NN
* Number of activation units in each layer
* The dropout rate in an NN (dropout probability)
* Number of iterations (epochs) in training an NN
* Number of clusters in a clustering task
* Kernel or filter size in convolutional layers
* Pooling size
* Batch size
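
To make the distinction concrete, here is a minimal sketch that contrasts hyperparameters with learned parameters. It assumes scikit-learn's `LogisticRegression` purely for illustration (it is not one of the models tuned in this notebook); the names `X_demo`, `y_demo` and `model` are hypothetical.

```python
from sklearn import datasets
from sklearn.linear_model import LogisticRegression

X_demo, y_demo = datasets.load_digits(return_X_y=True)

# C and max_iter are hyperparameters: chosen here, before training starts.
model = LogisticRegression(C=0.5, max_iter=200)
model.fit(X_demo, y_demo)  # may emit a convergence warning on unscaled pixels

# Training does not touch the hyperparameters ...
print(model.get_params()['C'])
# ... while coef_ holds parameters learned from the data (one weight per class per pixel).
print(model.coef_.shape)
```

The value of `C` is the same before and after `fit`, whereas `coef_` only exists once training has run.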



# Hyperparameter Optimization (HPO) of Machine Learning Models

**Sample code for classification problems**

**Dataset used:**
The scikit-learn handwritten digits dataset (`load_digits`), a small MNIST-like dataset

**Machine learning algorithms used:**
Random Forest (RF), Support Vector Machine (SVM), K-Nearest Neighbors (KNN), Artificial Neural Network (ANN)

**HPO algorithms used:**
Grid search, random search, Hyperband, Bayesian Optimization with Gaussian Processes (BO-GP), Bayesian Optimization with Tree-structured Parzen Estimator (BO-TPE), particle swarm optimization (PSO), genetic algorithm (GA)

**Performance metric:**
Classification accuracy
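
As a small illustration of the metric, classification accuracy is the fraction of predictions that match the true labels. The labels below are made up for the example.

```python
from sklearn.metrics import accuracy_score

y_true = [0, 1, 2, 2]  # hypothetical true labels
y_pred = [0, 1, 1, 2]  # hypothetical predictions: 3 of 4 match
acc = accuracy_score(y_true, y_pred)
print(acc)
```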

In [None]:
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor
from sklearn.metrics import classification_report, confusion_matrix, accuracy_score
from sklearn.neighbors import KNeighborsClassifier, KNeighborsRegressor
from sklearn.svm import SVC, SVR
from sklearn import datasets
import scipy.stats as stats

# Load the Digits Dataset
The MNIST database (Modified National Institute of Standards and Technology database) is a large database of handwritten digits that is commonly used for training various image processing systems. It has a training set of 60,000 examples and a test set of 10,000 examples, and it is a subset of a larger set available from NIST.
The digits have been size-normalized and centered in a fixed-size image.

Note that `datasets.load_digits()`, used in this notebook, loads scikit-learn's much smaller MNIST-like digits dataset (1,797 images of 8x8 pixels) rather than the full MNIST database.

You can read more about the data and its variables in the dataset description (`DESCR`).
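
A quick way to check what is actually loaded is to inspect the shapes and labels. This is a minimal sketch; the variable name `digits` is only used for this illustration.

```python
from sklearn import datasets

digits = datasets.load_digits()
print(digits.data.shape)            # feature matrix: one flattened image per row
print(digits.images.shape)          # the same pixels kept as 2-D images
print(sorted(set(digits.target)))   # the distinct class labels
```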

In [None]:
d=datasets.load_digits()
x=d.data
y=d.target

In [None]:
datasets.load_digits()

{'DESCR': ".. _digits_dataset:\n\nOptical recognition of handwritten digits dataset\n--------------------------------------------------\n\n**Data Set Characteristics:**\n\n    :Number of Instances: 1797\n    :Number of Attributes: 64\n    :Attribute Information: 8x8 image of integer pixels in the range 0..16.\n    :Missing Attribute Values: None\n    :Creator: E. Alpaydin (alpaydin '@' boun.edu.tr)\n    :Date: July; 1998\n\nThis is a copy of the test set of the UCI ML hand-written digits datasets\nhttps://archive.ics.uci.edu/ml/datasets/Optical+Recognition+of+Handwritten+Digits\n\nThe data set contains images of hand-written digits: 10 classes where\neach class refers to a digit.\n\nPreprocessing programs made available by NIST were used to extract\nnormalized bitmaps of handwritten digits from a preprinted form. From a\ntotal of 43 people, 30 contributed to the training set and different 13\nto the test set. 32x32 bitmaps are divided into nonoverlapping blocks of\n4x4 and the number o

**Baseline Machine Learning Models: Classifiers with Default Hyperparameters**

In [None]:
# Random forest classifier
clf = RandomForestClassifier()
clf.fit(x,y)
scores= cross_val_score(clf, x, y, cv=3, scoring='accuracy')
print('accuracy:'+ str(scores.mean()))

accuracy:0.9376739009460211


In [None]:
#   SVM
clf = SVC(gamma='scale')
clf.fit(x,y)
scores= cross_val_score(clf, x, y, cv=3, scoring='accuracy')
print('accuracy:'+ str(scores.mean()))

accuracy:0.9699499165275459


In [None]:
# KNN
clf = KNeighborsClassifier()
clf.fit(x,y)
scores= cross_val_score(clf, x, y, cv=3, scoring='accuracy')
print('accuracy:'+ str(scores.mean()))

accuracy:0.9627156371730662


In [None]:
# ANN
from keras.models import Sequential
from keras.layers import Dense
from sklearn.model_selection import GridSearchCV
from keras.wrappers.scikit_learn import KerasClassifier
from keras.callbacks import EarlyStopping

def ANN(optimizer='sgd', neurons=32, batch_size=32, epochs=20, activation='relu', patience=3, loss='categorical_crossentropy'):
  # two hidden layers of `neurons` units and a 10-way softmax output (one unit per digit)
  model = Sequential()
  model.add(Dense(neurons, input_shape=(x.shape[1],), activation=activation))
  model.add(Dense(neurons, activation=activation))
  model.add(Dense(10, activation='softmax'))
  model.compile(optimizer=optimizer, loss=loss)
  early_stopping = EarlyStopping(monitor='loss', patience=patience)  # stop when the training loss stops improving
  # NOTE: the model is fitted here on the full dataset (x, y) before the wrapper
  # fits it again on each training fold, so the validation folds are seen during
  # training and the cross-validated accuracy reported for the ANN is optimistic.
  history = model.fit(x, pd.get_dummies(y).values,  # one-hot encode the labels
                      batch_size=batch_size,
                      epochs=epochs,
                      callbacks=[early_stopping],
                      verbose=0)  # set verbose to 1 to show the training process
  return model


In [None]:
clf = KerasClassifier(build_fn=ANN, verbose=1)
scores= cross_val_score(clf, x, y, cv=3, scoring='accuracy')
print('accuracy:'+ str(scores.mean()))



accuracy:0.9821925431274346


# HPO Algorithm 1: Grid Search
Exhaustively search all the given hyperparameter configurations.

**Advantages:**

* Simple implementation.

**Disadvantages:**

* Time-consuming.
* Only efficient with categorical HPs.
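
To see why grid search becomes time-consuming, it helps to count the configurations it has to evaluate. The sketch below uses scikit-learn's `ParameterGrid`, which enumerates the Cartesian product of the listed values; the dictionary mirrors the random forest search space used in this notebook, and the names `rf_grid`, `combinations` and `n_fits` are only for illustration.

```python
from sklearn.model_selection import ParameterGrid

rf_grid = {
    'n_estimators': [10, 20, 30],
    'max_depth': [15, 20, 30, 50],
    'criterion': ['gini', 'entropy'],
}

# every combination of the listed values is one candidate configuration
combinations = list(ParameterGrid(rf_grid))
n_fits = len(combinations) * 3  # one model fit per candidate per CV fold (cv=3)

print(len(combinations), n_fits)
print(combinations[0])
```

Each extra hyperparameter multiplies the number of candidates, which is why the grid grows so quickly.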






In [None]:
# Random forest
from sklearn.model_selection import GridSearchCV
# define the hyperparameter configuration space
params = {
    'n_estimators': [10, 20, 30],
    # 'max_features': ['sqrt', 0.5],
    'max_depth': [15, 20, 30, 50],
    # 'min_samples_leaf': [1,2,4,8]
    #"bootstrap":[True,False],
    "criterion":['gini', 'entropy']
}

clf= RandomForestClassifier(random_state=0)
grid=GridSearchCV(clf, params, cv=3, scoring='accuracy')
grid.fit(x, y)
print(grid.best_params_)
print('Accuracy:'+str(grid.best_score_))

{'criterion': 'gini', 'max_depth': 15, 'n_estimators': 30}
Accuracy:0.9360044518642182


In [None]:
# SVM
from sklearn.model_selection import GridSearchCV
# define the hyperparameter configuration space
params = {
    'C': [1, 10, 100],
    'kernel':['linear', 'poly', 'rbf', 'sigmoid']
}

clf= SVC(gamma='scale')
grid=GridSearchCV(clf, params, cv=3, scoring='accuracy')
grid.fit(x, y)
print(grid.best_params_)
print('Accuracy:'+str(grid.best_score_))

{'C': 10, 'kernel': 'rbf'}
Accuracy:0.9738452977184195


In [None]:
#KNN
from sklearn.model_selection import GridSearchCV
# define the hyperparameter configuration space
params = {
  'n_neighbors':[2, 3, 5, 10, 15, 20],
}

clf= KNeighborsClassifier()
grid=GridSearchCV(clf, params, cv=3, scoring='accuracy')
grid.fit(x, y)
print(grid.best_params_)
print('Accuracy:'+str(grid.best_score_))

{'n_neighbors': 3}
Accuracy:0.9682804674457429


In [None]:
#ANN
from sklearn.model_selection import GridSearchCV
# define the hyperparameter configuration space
params = {
    'optimizer':['adam', 'rmsprop', 'sgd'],
    'activation':['relu', 'tanh'],
    'batch_size':[16, 32],
    'neurons':[16,32],
    'epochs':[20,50],
    'patience':[2,5]
}

clf= KerasClassifier(build_fn=ANN, verbose=0)
grid=GridSearchCV(clf, params, cv=3, scoring='accuracy')
grid.fit(x, y)
print(grid.best_params_)
print('Accuracy:'+str(grid.best_score_))

{'activation': 'relu', 'batch_size': 16, 'epochs': 50, 'neurons': 32, 'optimizer': 'sgd', 'patience': 5}
Accuracy:0.9988870339454646


# HPO Algorithm 2: Random Search

Randomly search hyperparameter combinations in the search space.

**Advantages:**

* More efficient than grid search (GS).
* Enables parallelization.

**Disadvantages:**

* Does not consider previous results.
* Not efficient with conditional HPs.
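
The sampling step can be sketched with scikit-learn's `ParameterSampler`, which draws a fixed number of configurations from lists and `scipy.stats` distributions. The search space mirrors the SVM space used in this notebook; `svm_space`, `samples` and the `random_state` value are illustrative choices.

```python
from scipy.stats import uniform
from sklearn.model_selection import ParameterSampler

svm_space = {
    'C': uniform(0, 50),                             # continuous: uniform on [0, 50]
    'kernel': ['linear', 'poly', 'rbf', 'sigmoid'],  # categorical: picked uniformly
}

# draw 5 independent configurations; no draw depends on earlier results
samples = list(ParameterSampler(svm_space, n_iter=5, random_state=0))
for sample in samples:
    print(sample)
```

Because every draw is independent, the candidates can be evaluated in parallel, but nothing learned from one evaluation guides the next.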





In [None]:
# Random Forest
from scipy.stats import randint as sp_randint
from random import randrange as sp_randrange
from sklearn.model_selection import RandomizedSearchCV

#define the hyperparameter configuration space
params = {
    'n_estimators': sp_randint(10,100),
    'max_features': sp_randint(1, 64),
    'max_depth': sp_randint(5,50),
    'min_samples_split': sp_randint(2, 11),
    'min_samples_leaf': sp_randint(1,11),
    'criterion': ['gini', 'entropy']
}

n_iter_search=20 # number of iterations is set to 20, you can increase it if time permits
clf = RandomForestClassifier(random_state=0)
Random = RandomizedSearchCV(clf, param_distributions=params, n_iter=n_iter_search,cv=3,scoring='accuracy' )
Random.fit(x, y)
print(Random.best_params_)
print('Accuracy:'+str(Random.best_score_))

{'criterion': 'entropy', 'max_depth': 35, 'max_features': 36, 'min_samples_leaf': 1, 'min_samples_split': 4, 'n_estimators': 47}
Accuracy:0.9176405119643851


In [None]:
# SVM
from scipy.stats import randint as sp_randint
from sklearn.model_selection import RandomizedSearchCV

#define the hyperparameter configuration space
params = {
    'C': stats.uniform(0,50),
    'kernel':['linear', 'poly', 'rbf', 'sigmoid']
}

n_iter_search=20 # number of iterations is set to 20, you can increase it if time permits
clf = SVC(gamma='scale')
Random = RandomizedSearchCV(clf, param_distributions=params, n_iter=n_iter_search,cv=3,scoring='accuracy' )
Random.fit(x, y)
print(Random.best_params_)
print('Accuracy:'+str(Random.best_score_))

{'C': 30.854054917308055, 'kernel': 'rbf'}
Accuracy:0.9738452977184195


In [None]:
#KNN
from scipy.stats import randint as sp_randint
from sklearn.model_selection import RandomizedSearchCV

#define the hyperparameter configuration space
params = {
    'n_neighbors': range(1, 20),
}

n_iter_search=20 # number of iterations is set to 20, you can increase it if time permits
clf = KNeighborsClassifier()
Random = RandomizedSearchCV(clf, param_distributions=params, n_iter=n_iter_search,cv=3,scoring='accuracy' )
Random.fit(x, y)
print(Random.best_params_)
print('Accuracy:'+str(Random.best_score_))



{'n_neighbors': 3}
Accuracy:0.9682804674457429


In [None]:
# ANN
from scipy.stats import randint as sp_randint
from random import randrange as sp_randrange
from sklearn.model_selection import RandomizedSearchCV

#define the hyperparameter configuration space
params = {
    'optimizer': ['adam','rmsprop','sgd'],
    'activation': ['relu','tanh'],
    'batch_size': [16,32,64],
    'neurons':sp_randint(10,100),
    'epochs':[20,50],
    #'epochs':[20,50,100,200],
    'patience':sp_randint(3,20)

}

n_iter_search=10 # number of iterations is set to 10, you can increase it if time permits
clf = KerasClassifier(build_fn=ANN, verbose=0)
Random = RandomizedSearchCV(clf, param_distributions=params, n_iter=n_iter_search,cv=3,scoring='accuracy' )
Random.fit(x, y)
print(Random.best_params_)
print('Accuracy:'+str(Random.best_score_))



# HPO Algorithm 3: Hyperband
Generate small-sized subsets and allocate budgets to each hyperparameter combination based on its performance.

**Advantages:**
* Enables parallelization.

**Disadvantages:**
* Not efficient with conditional HPs.
* Requires subsets with small budgets to be representative.
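
The budget allocation can be sketched in plain Python. The function below follows the bracket schedule from the published Hyperband algorithm (successive halving with reduction factor `eta`); it only computes how many configurations get how much budget and does not train anything. The function name `hyperband_schedule` is hypothetical, and a third-party `HyperbandSearchCV` implementation may differ in details.

```python
def hyperband_schedule(max_iter, eta=3):
    """Return, per bracket, the list of (n_configs, budget_per_config) rounds."""
    # largest s with eta**s <= max_iter, found with integer arithmetic
    s_max = 0
    while eta ** (s_max + 1) <= max_iter:
        s_max += 1

    brackets = []
    for s in range(s_max, -1, -1):
        # initial number of configurations (ceiling division) and initial budget
        n = -(-((s_max + 1) * eta ** s) // (s + 1))
        r = max_iter / eta ** s
        # successive halving: keep the best 1/eta of the configurations
        # and give each survivor eta times more budget
        rounds = [(n // eta ** i, r * eta ** i) for i in range(s + 1)]
        brackets.append(rounds)
    return brackets


schedule = hyperband_schedule(max_iter=81, eta=3)
for rounds in schedule:
    print(rounds)
```

The first bracket starts many configurations on a tiny budget, while the last bracket runs a few configurations at full budget, which is why small budgets need to be representative of full training.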

In [None]:
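# Random Forest
# NOTE: `hyperband` is a third-party module (not part of scikit-learn); it must be
# installed or placed next to this notebook, otherwise the import fails.
from hyperband import HyperbandSearchCV
from scipy.stats import randint as sp_randint
from random import randrange as sp_randrange

# define the hyperparameter configuration space

params = {
    'n_estimators': sp_randint(10, 100),
    'max_features': sp_randint(1, 64),
    'max_depth': sp_randint(5, 50),
    'min_samples_split': sp_randint(2, 11),
    'min_samples_leaf': sp_randint(1, 11),
    'criterion': ['gini', 'entropy']
}

clf = RandomForestClassifier(random_state=0)
# Assumes the scikit-hyperband style API (keyword `param_distributions`, attribute `best_params_`).
# The resource must be a parameter of the estimator: the number of trees for a random forest.
# The sampled 'n_estimators' is expected to be replaced by the budget Hyperband allocates.
hyper = HyperbandSearchCV(clf, param_distributions=params, cv=3, min_iter=1, max_iter=20, scoring='accuracy', resource_param='n_estimators')
hyper.fit(x, y)
print(hyper.best_params_)
print("Accuracy:"+ str(hyper.best_score_))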

ModuleNotFoundError: ignored

In [None]:
# SVM

from hyperband import HyperbandSearchCV
from scipy.stats import randint as sp_randint
from random import randrange as sp_randrange

# define the hyperparameter configuration space

params = {
    'C': stats.uniform(0, 50),
    'kernel':['linear', 'poly', 'rbf', 'sigmoid']
}

clf = SVC(gamma='scale')
# Assumes the scikit-hyperband style API (keyword `param_distributions`, attribute `best_params_`).
# SVC has no 'n_neighbors' parameter; using 'C' as the resource is an assumption made here,
# in which case the sampled 'C' is expected to be replaced by the allocated budget.
hyper = HyperbandSearchCV(clf, param_distributions=params, cv=3, min_iter=1, max_iter=20, scoring='accuracy', resource_param='C')
hyper.fit(x, y)
print(hyper.best_params_)
print("Accuracy:"+ str(hyper.best_score_))

In [None]:
# KNN

from hyperband import HyperbandSearchCV
from scipy.stats import randint as sp_randint
from random import randrange as sp_randrange

# define the hyperparameter configuration space
# (range(1, 20) matches the KNN space used for random search; the original range(1, 2) held a single value)

params = {
    'n_neighbors': range(1, 20),
}

clf = KNeighborsClassifier()
# Assumes the scikit-hyperband style API (keyword `param_distributions`, attribute `best_params_`).
hyper = HyperbandSearchCV(clf, param_distributions=params, cv=3, min_iter=1, max_iter=20, scoring='accuracy', resource_param='n_neighbors')
hyper.fit(x, y)
print(hyper.best_params_)
print("Accuracy:"+ str(hyper.best_score_))

In [None]:
# ANN
from hyperband import HyperbandSearchCV
from scipy.stats import randint as sp_randint
from random import randrange as sp_randrange

# define the hyperparameter configuration space

params = {
    'optimizer': ['adam','rmsprop','sgd'],
    'activation': ['relu','tanh'],
    'batch_size': [16,32,64],
    'neurons':sp_randint(10,100),
    'epochs':[20,50],
    #'epochs':[20,50,100,200],
    'patience':sp_randint(3,20)
}

clf = KerasClassifier(build_fn=ANN, epochs=20, verbose=0)
# Assumes the scikit-hyperband style API (keyword `param_distributions`, attribute `best_params_`).
# For a neural network the natural budget is the number of training epochs;
# the sampled 'epochs' is expected to be replaced by the budget Hyperband allocates.
hyper = HyperbandSearchCV(clf, param_distributions=params, cv=3, min_iter=1, max_iter=20, scoring='accuracy', resource_param='epochs')
hyper.fit(x, y)
print(hyper.best_params_)
print("Accuracy:"+ str(hyper.best_score_))

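In [None]:
# This appears to be the contents of the hyperband package's __init__.py.
# The relative import only resolves inside that package, so running it
# directly in a notebook cell raises an ImportError.
from .search import HyperbandSearchCV

__all__ = ['HyperbandSearchCV']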

ImportError: ignored

# HPO Algorithm 4: BO-GP
Bayesian Optimization with Gaussian Process (BO-GP)

**Advantages:**
* Fast convergence speed for continuous HPs.

**Disadvantages:**

* Poor capacity for parallelization.
* Not efficient with conditional HPs.
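
The idea behind BO-GP can be sketched with scikit-learn alone: fit a Gaussian process to the scores observed so far, then evaluate next the candidate with the highest expected improvement. This is a simplified illustration rather than what any particular library does internally. It tunes only `log10(C)` of an RBF SVM on the digits data; the search range, the Matern kernel, the number of iterations, and all names (`objective`, `expected_improvement`, `tried`, `scores`) are assumptions made for the example.

```python
import numpy as np
from scipy.stats import norm
from sklearn import datasets
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X_bo, y_bo = datasets.load_digits(return_X_y=True)


def objective(log_c):
    """3-fold CV accuracy of an RBF SVM with C = 10**log_c."""
    clf = SVC(C=10.0 ** log_c, gamma='scale')
    return cross_val_score(clf, X_bo, y_bo, cv=3, scoring='accuracy').mean()


def expected_improvement(mu, sigma, best, xi=0.01):
    """Expected improvement over `best` for a maximization problem."""
    sigma = np.maximum(sigma, 1e-9)  # avoid division by zero
    z = (mu - best - xi) / sigma
    return (mu - best - xi) * norm.cdf(z) + sigma * norm.pdf(z)


rng = np.random.RandomState(0)
candidates = np.linspace(-2.0, 3.0, 101).reshape(-1, 1)  # grid over log10(C)

# start from 3 random evaluations
tried = [float(v) for v in rng.uniform(-2.0, 3.0, size=3)]
scores = [objective(v) for v in tried]

for _ in range(7):
    # surrogate model: GP fitted to all (log10(C), accuracy) pairs seen so far
    gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), alpha=1e-6,
                                  normalize_y=True, random_state=0)
    gp.fit(np.array(tried).reshape(-1, 1), np.array(scores))
    mu, sigma = gp.predict(candidates, return_std=True)

    # acquisition step: pick the candidate with the largest expected improvement
    ei = expected_improvement(mu, sigma, max(scores))
    next_log_c = float(candidates[np.argmax(ei), 0])
    tried.append(next_log_c)
    scores.append(objective(next_log_c))

best_idx = int(np.argmax(scores))
print('best C:', 10.0 ** tried[best_idx])
print('Accuracy:', scores[best_idx])
```

Each new evaluation depends on the surrogate fitted to all previous ones, which is what makes the procedure sample-efficient and also hard to parallelize.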

**Using skopt.BayesSearchCV**

In [None]:
# Random forest
from skopt import Optimizer
from skopt import BayesSearchCV
from skopt.space import Real, Categorical, Integer

ModuleNotFoundError: ignored

In [None]:
# the skopt module is distributed on PyPI under the name scikit-optimize
!pip install scikit-optimize
