**20. Hyperparameter tuning using Optuna**

**`References:`**
* Optuna paper: https://arxiv.org/pdf/1907.10902
* Tree-structured Parzen Estimator (TPE): http://arxiv.org/pdf/2304.11127

**`Tools in hyperparameter tuning:`**
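* Grid Search Cross Validation (CV) [defines a grid over the hyperparameters to be tuned and evaluates every combination] (COSTLY: the number of evaluations multiplies with every hyperparameter added)
* Random Search Cross Validation (CV) [evaluates a fixed number of randomly sampled combinations from the grid] (MAY MISS THE BEST VALUE)
* Bayesian Search [uses the results of earlier evaluations to decide which combination to evaluate next]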
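A minimal stdlib-only sketch of the cost difference, assuming a hypothetical 4 x 4 grid over `n_estimators` and `max_depth` (the same values as the grid-search cell later on) and an arbitrary budget of 6 random draws. No model is trained; the sketch only counts how many combinations each strategy evaluates.

```python
import itertools
import random

# Hypothetical grid over two hyperparameters (same values as the later grid-search cell)
grid = {
    "n_estimators": [50, 100, 150, 200],
    "max_depth": [5, 10, 15, 20],
}

# Grid search: every combination is evaluated, so the cost is the product of the grid sizes
grid_candidates = [dict(zip(grid, values)) for values in itertools.product(*grid.values())]
print(len(grid_candidates))  # 16

# Random search: a fixed budget of combinations drawn from the same grid;
# the remaining combinations (possibly including the best one) are never evaluated
rng = random.Random(0)
random_candidates = rng.sample(grid_candidates, k=6)
print(len(random_candidates))  # 6
```

Adding a third hyperparameter with 4 values would raise the grid cost to 64 evaluations, while the random-search budget stays whatever you choose.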

**`Key terms in Optuna`**
1. Study:<br> A study in Optuna is an optimization session that encompasses multiple trials. It is essentially a collection of trials aimed at optimizing the objective function. You can think of a study as the overall experiment or search process. Example: A study to find the best hyperparameters for an XGBoost model. 
   
2. Trial:<br> A trial is a single iteration of the optimization process where a specific set of hyperparameters is evaluated. Each trial runs the objective function once with a distinct set of hyperparameters. Example: One trial could include training a model with a learning rate of 0.01 and a max depth of 5.
   
3. Trial Parameters:<br> These are the specific hyperparameter values chosen during a trial. Each trial evaluates its own combination of hyperparameters to see how they affect the objective function. Example: In one trial the learning rate may be 0.001 and the batch_size 32, while in another the learning rate could be 0.01 and the batch_size 64.
   
4. Objective function:<br> The function to be optimized (minimized or maximized) during the hyperparameter search. It receives a `trial` object, uses it to sample the hyperparameters, and returns a value (such as accuracy, loss, or any other metric) that Optuna tries to optimize. Example: In a classification task, the objective function could return the cross-entropy loss, which Optuna seeks to minimize.
   
5. Sampler:<br> An algorithm that suggests which hyperparameters should be evaluated next. Optuna uses the Tree-structured Parzen Estimator (TPE) by default, but it also supports other sampling methods such as random search, grid search, or custom samplers. Example: TPE suggests promising areas of the hyperparameter space, focusing on regions that are likely to yield better results.

In [1]:
# important imports
import optuna 
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
import pandas as pd

# Load the Pima Indians Diabetes dataset (a copy of the UCI dataset hosted on GitHub)
url = "https://raw.githubusercontent.com/jbrownlee/Datasets/master/pima-indians-diabetes.data.csv"
columns = ['Pregnancies', 'Glucose', 'BloodPressure', 'SkinThickness', 'Insulin', 'BMI',
           'DiabetesPedigreeFunction', 'Age', 'Outcome']

df = pd.read_csv(url, names=columns)
df.head()

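|   | Pregnancies | Glucose | BloodPressure | SkinThickness | Insulin | BMI | DiabetesPedigreeFunction | Age | Outcome |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 6 | 148 | 72 | 35 | 0 | 33.6 | 0.627 | 50 | 1 |
| 1 | 1 | 85 | 66 | 29 | 0 | 26.6 | 0.351 | 31 | 0 |
| 2 | 8 | 183 | 64 | 0 | 0 | 23.3 | 0.672 | 32 | 1 |
| 3 | 1 | 89 | 66 | 23 | 94 | 28.1 | 0.167 | 21 | 0 |
| 4 | 0 | 137 | 40 | 35 | 168 | 43.1 | 2.288 | 33 | 1 |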


In [2]:
# handling missing values (represented as zeros)
import numpy as np

# replace zero values with NaN in columns where zero is not a valid value
cols_with_missing_value = ['Glucose','BloodPressure','SkinThickness','Insulin','BMI']
df[cols_with_missing_value] = df[cols_with_missing_value].replace(0,np.nan)

# impute the missing values with the mean of the respective column
df.fillna(df.mean(),inplace=True)

# check if there are any remaining missing values
print(df.isnull().sum())

Pregnancies                 0
Glucose                     0
BloodPressure               0
SkinThickness               0
Insulin                     0
BMI                         0
DiabetesPedigreeFunction    0
Age                         0
Outcome                     0
dtype: int64
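The zero-to-NaN step only touches the listed columns, so legitimate zeros elsewhere (such as `Pregnancies`) survive. A minimal sketch on a tiny hypothetical frame (`toy` and its values are made up for illustration) makes this visible:

```python
import numpy as np
import pandas as pd

# Tiny hypothetical frame: a 0 in 'Glucose' means "not measured",
# while a 0 in 'Pregnancies' is a legitimate value
toy = pd.DataFrame({"Pregnancies": [0, 2, 4], "Glucose": [100, 0, 120]})

cols_with_missing_value = ["Glucose"]
toy[cols_with_missing_value] = toy[cols_with_missing_value].replace(0, np.nan)

# The column mean skips NaN, so the gap is filled with (100 + 120) / 2 = 110
toy = toy.fillna(toy.mean())
print(toy)
```

One caveat about the cell above: the means are computed on the full dataset before the train/test split, so a little test-set information reaches the training features. Computing the means on the training rows only and reusing them for the test rows would avoid that.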


In [3]:
# Split into features (X) and target (y)
X = df.drop('Outcome', axis=1)
y = df['Outcome']

# Split data into training and test sets (70% train, 30% test)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Scale the features: not required for the tree-based random forest, but important for the SVM tuned in section 20.02
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# Check the shape of the data
print(f'Training set shape: {X_train.shape}')
print(f'Test set shape: {X_test.shape}')

Training set shape: (537, 8)
Test set shape: (231, 8)


In [12]:
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# define objective function
def objective(trial: optuna.trial.Trial):
    # suggest values for the hyperparameters to be tuned
    n_estimators = trial.suggest_int('n_estimators',50,200)
    max_depth = trial.suggest_int('max_depth',3,20)

    # create the random forest classifier with suggested hyperparams
    model = RandomForestClassifier(
        n_estimators=n_estimators,
        max_depth=max_depth,
        random_state=14
    )

    # perform 3-fold cross validation and calculate accuracy
    score = cross_val_score(model,X_train,y_train,cv = 3, scoring='accuracy').mean()
    return score

In [13]:
# create a study object and optimize the objective function
study = optuna.create_study(direction='maximize', sampler = optuna.samplers.TPESampler())
# aim to maximize accuracy
study.optimize(objective,n_trials=50) # runs 50 trials to find the best hyperparameters

[I 2025-05-28 20:36:21,262] A new study created in memory with name: no-name-c9a4642d-832b-4117-ad09-9fadaa997b3b
[I 2025-05-28 20:36:21,832] Trial 0 finished with value: 0.750465549348231 and parameters: {'n_estimators': 148, 'max_depth': 19}. Best is trial 0 with value: 0.750465549348231.
[I 2025-05-28 20:36:22,120] Trial 1 finished with value: 0.7597765363128491 and parameters: {'n_estimators': 78, 'max_depth': 8}. Best is trial 1 with value: 0.7597765363128491.
[I 2025-05-28 20:36:22,333] Trial 2 finished with value: 0.7709497206703911 and parameters: {'n_estimators': 57, 'max_depth': 9}. Best is trial 2 with value: 0.7709497206703911.
[I 2025-05-28 20:36:22,604] Trial 3 finished with value: 0.7579143389199255 and parameters: {'n_estimators': 74, 'max_depth': 10}. Best is trial 2 with value: 0.7709497206703911.
[I 2025-05-28 20:36:22,833] Trial 4 finished with value: 0.7579143389199254 and parameters: {'n_estimators': 65, 'max_depth': 18}. Best is trial 2 with value: 0.770949720670

In [None]:
print(f'Best trial accuracy: {study.best_trial.value}')
print(f'Best hyperparams: {study.best_trial.params}')

In [None]:
from sklearn.metrics import accuracy_score

# Train a RandomForestClassifier using the best hyperparameters from Optuna
best_model = RandomForestClassifier(**study.best_trial.params, random_state=14)  # same seed as in the objective

# Fit the model to the training data
best_model.fit(X_train, y_train)

# Make predictions on the test set
y_pred = best_model.predict(X_test)

# Calculate the accuracy on the test set
test_accuracy = accuracy_score(y_test, y_pred)

# Print the test accuracy
print(f'Test Accuracy with best hyperparameters: {test_accuracy:.2f}')


**`Other Samplers in Optuna`**
* Random Search (`optuna.samplers.RandomSampler`)
* Grid Search (`optuna.samplers.GridSampler`)
* ....

In [None]:
# create a study using Random Search
study = optuna.create_study(direction='maximize', sampler = optuna.samplers.RandomSampler())
# aim to maximize accuracy
study.optimize(objective,n_trials=50) # runs 50 trials to find the best hyperparameters

In [None]:
print(f'Best trial accuracy: {study.best_trial.value}')
print(f'Best hyperparams: {study.best_trial.params}')

In [None]:
# grid search
search_space = {
    'n_estimators': [50,100,150,200],
    'max_depth': [5,10,15,20]
}


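# create a study using Grid Search
study = optuna.create_study(direction='maximize', sampler = optuna.samplers.GridSampler(search_space))
# aim to maximize accuracy; n_trials can be omitted because GridSampler
# stops the study once all 4 x 4 = 16 combinations have been evaluated
study.optimize(objective)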

In [None]:
print(f'Best trial accuracy: {study.best_trial.value}')
print(f'Best hyperparams: {study.best_trial.params}')

**20.01. Optuna Visualizations**

In [5]:
from optuna.visualization import plot_optimization_history,plot_parallel_coordinate,plot_slice,plot_contour,plot_param_importances

In [6]:
# create a study object and optimize the objective function
study = optuna.create_study(direction='maximize', sampler = optuna.samplers.TPESampler())
# aim to maximize accuracy
study.optimize(objective,n_trials=50) # runs 50 trials to find the best hyperparameters

[I 2025-05-28 15:34:56,777] A new study created in memory with name: no-name-db1eba07-72ba-4991-89e6-6f96574973b7
[I 2025-05-28 15:34:57,448] Trial 0 finished with value: 0.7541899441340782 and parameters: {'n_estimators': 195, 'max_depth': 13}. Best is trial 0 with value: 0.7541899441340782.
[I 2025-05-28 15:34:57,669] Trial 1 finished with value: 0.7579143389199255 and parameters: {'n_estimators': 65, 'max_depth': 20}. Best is trial 1 with value: 0.7579143389199255.
[I 2025-05-28 15:34:58,301] Trial 2 finished with value: 0.7411545623836125 and parameters: {'n_estimators': 196, 'max_depth': 15}. Best is trial 1 with value: 0.7579143389199255.
[I 2025-05-28 15:34:58,633] Trial 3 finished with value: 0.7672253258845437 and parameters: {'n_estimators': 92, 'max_depth': 9}. Best is trial 3 with value: 0.7672253258845437.
[I 2025-05-28 15:34:59,256] Trial 4 finished with value: 0.7616387337057727 and parameters: {'n_estimators': 170, 'max_depth': 20}. Best is trial 3 with value: 0.7672253

In [7]:
# 1. Optimization history
plot_optimization_history(study).show()

In [8]:
# 2. Parallel coordinate plot
plot_parallel_coordinate(study).show()

In [9]:
# 3. slice plot
plot_slice(study).show()

In [10]:
# 4. Contour plot
plot_contour(study).show()

In [11]:
# 5. importance plot
plot_param_importances(study).show()

**20.02. Define by run**

* Define-by-run: the search space is constructed while the objective function runs, so it can change from trial to trial (dynamic search spaces).
  * This lets a single study decide which algorithm is best and what its best hyperparameters are.
  * Here the algorithm itself is tuned as a hyperparameter [SVM, XGBoost, Random Forest, Linear Regression].
  * Depending on the algorithm selected, a different search space is used [SS1, SS2, SS3, SS4].

In [20]:
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.svm import SVC

# define the objective function
def objective(trial: optuna.trial.Trial):
    # choose the algorithm to tune
    classifier_name = trial.suggest_categorical('classifier',['SVM','RandomForest','GradientBoosting'])
    random_seed = 14
    if classifier_name == 'SVM':
        # SVM hyperparams
        c = trial.suggest_float('C',0.1,100,log=True)
        kernel = trial.suggest_categorical('kernel', ['linear','rbf','poly','sigmoid'])
        gamma = trial.suggest_categorical('gamma',['scale','auto'])
        model = SVC(C = c, kernel = kernel, gamma = gamma, random_state=random_seed)

    elif classifier_name == 'RandomForest':
        n_estimators = trial.suggest_int('n_estimators',50,300)
        max_depth = trial.suggest_int('max_depth',3,20)
        min_samples_split = trial.suggest_int('min_samples_split',2,10)
        min_samples_leaf = trial.suggest_int('min_samples_leaf',1,10)
        bootstrap = trial.suggest_categorical('bootstrap',[True,False])
        model = RandomForestClassifier(
            n_estimators = n_estimators,
            max_depth = max_depth,
            min_samples_split = min_samples_split,
            min_samples_leaf = min_samples_leaf,
            bootstrap = bootstrap,
            random_state = random_seed
        )
    elif classifier_name == 'GradientBoosting':
        # gradient boosting hyperparameters
        n_estimators = trial.suggest_int('n_estimators',50,300)
        learning_rate = trial.suggest_float('learning_rate',0.01,0.3,log = True)
        max_depth = trial.suggest_int('max_depth',3,20)
        min_samples_split = trial.suggest_int('min_samples_split',2,10)
        min_samples_leaf = trial.suggest_int('min_samples_leaf',1,10)

        model = GradientBoostingClassifier(
            n_estimators = n_estimators,
            learning_rate = learning_rate,
            max_depth = max_depth,
            min_samples_split = min_samples_split,
            min_samples_leaf = min_samples_leaf,
            random_state = random_seed
        )
    
    score = cross_val_score(model, X_train, y_train, cv = 3, scoring = 'accuracy').mean()
    return score

In [21]:
# create a study and optimize it (default sampler = TPE)
study = optuna.create_study(direction='maximize',sampler = optuna.samplers.TPESampler())
study.optimize(objective,n_trials=100)

[I 2025-05-28 20:54:40,651] A new study created in memory with name: no-name-fdbb65bd-c956-471f-911e-c9f4096f3045
[I 2025-05-28 20:54:40,891] Trial 0 finished with value: 0.7709497206703911 and parameters: {'classifier': 'RandomForest', 'n_estimators': 66, 'max_depth': 7, 'min_samples_split': 9, 'min_samples_leaf': 3, 'bootstrap': True}. Best is trial 0 with value: 0.7709497206703911.
[I 2025-05-28 20:54:40,918] Trial 1 finished with value: 0.7746741154562384 and parameters: {'classifier': 'SVM', 'C': 0.9655703798413977, 'kernel': 'rbf', 'gamma': 'auto'}. Best is trial 1 with value: 0.7746741154562384.
[I 2025-05-28 20:54:42,930] Trial 2 finished with value: 0.7337057728119181 and parameters: {'classifier': 'GradientBoosting', 'n_estimators': 135, 'learning_rate': 0.07951992615888773, 'max_depth': 16, 'min_samples_split': 4, 'min_samples_leaf': 3}. Best is trial 1 with value: 0.7746741154562384.
[I 2025-05-28 20:54:46,640] Trial 3 finished with value: 0.7299813780260708 and parameters:

In [23]:
# retrieve the best trial
best_trial = study.best_trial
print(f'Best trial parameters:{best_trial.params}')
print(f'Best trial accuracy:{best_trial.value}')

Best trial parameters:{'classifier': 'SVM', 'C': 0.14410069729785183, 'kernel': 'linear', 'gamma': 'auto'}
Best trial accuracy:0.7895716945996275
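`best_trial.params` mixes the algorithm choice with that algorithm's hyperparameters, so it cannot be passed to an estimator directly. A minimal sketch of rebuilding the winning model, assuming a hypothetical helper `build_model` and reusing the seed 14 from the objective; it relies on every suggested name matching the estimator's keyword argument, which holds for the objective above.

```python
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.svm import SVC

# Maps the 'classifier' choice to its estimator class (names as in the objective above)
ESTIMATORS = {
    "SVM": SVC,
    "RandomForest": RandomForestClassifier,
    "GradientBoosting": GradientBoostingClassifier,
}

def build_model(best_params: dict, seed: int = 14):
    """Hypothetical helper: rebuild the estimator described by a define-by-run params dict."""
    params = dict(best_params)        # copy so the study's dict is left untouched
    name = params.pop("classifier")   # the algorithm choice is not an estimator argument
    return ESTIMATORS[name](**params, random_state=seed)

# Parameters as printed by the cell above
model = build_model({"classifier": "SVM", "C": 0.14410069729785183,
                     "kernel": "linear", "gamma": "auto"})
print(model)
```

In the notebook, `build_model(best_trial.params).fit(X_train, y_train)` followed by `accuracy_score(y_test, ...)` on its predictions would give the test accuracy of the winning configuration, as was done for the random forest earlier.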


In [25]:
study.trials_dataframe()

|   | number | value | datetime_start | datetime_complete | duration | params_C | params_bootstrap | params_classifier | params_gamma | params_kernel | params_learning_rate | params_max_depth | params_min_samples_leaf | params_min_samples_split | params_n_estimators | state |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 0.770950 | 2025-05-28 20:54:40.652981 | 2025-05-28 20:54:40.891661 | 0 days 00:00:00.238680 |  | True | RandomForest |  |  |  | 7.0 | 3.0 | 9.0 | 66.0 | COMPLETE |
| 1 | 1 | 0.774674 | 2025-05-28 20:54:40.892829 | 2025-05-28 20:54:40.918074 | 0 days 00:00:00.025245 | 0.965570 |  | SVM | auto | rbf |  |  |  |  |  | COMPLETE |
| 2 | 2 | 0.733706 | 2025-05-28 20:54:40.920076 | 2025-05-28 20:54:42.930597 | 0 days 00:00:02.010521 |  |  | GradientBoosting |  |  | 0.079520 | 16.0 | 3.0 | 4.0 | 135.0 | COMPLETE |
| 3 | 3 | 0.729981 | 2025-05-28 20:54:42.931928 | 2025-05-28 20:54:46.639761 | 0 days 00:00:03.707833 |  |  | GradientBoosting |  |  | 0.024362 | 13.0 | 1.0 | 9.0 | 298.0 | COMPLETE |
| 4 | 4 | 0.769088 | 2025-05-28 20:54:46.641307 | 2025-05-28 20:54:46.850180 | 0 days 00:00:00.208873 |  | True | RandomForest |  |  |  | 18.0 | 7.0 | 7.0 | 65.0 | COMPLETE |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 95 | 95 | 0.789572 | 2025-05-28 20:55:04.542892 | 2025-05-28 20:55:04.564169 | 0 days 00:00:00.021277 | 0.131780 |  | SVM | auto | linear |  |  |  |  |  | COMPLETE |
| 96 | 96 | 0.785847 | 2025-05-28 20:55:04.565737 | 2025-05-28 20:55:04.589950 | 0 days 00:00:00.024213 | 0.248177 |  | SVM | auto | linear |  |  |  |  |  | COMPLETE |
| 97 | 97 | 0.729981 | 2025-05-28 20:55:04.590842 | 2025-05-28 20:55:04.616605 | 0 days 00:00:00.025763 | 0.353488 |  | SVM | auto | poly |  |  |  |  |  | COMPLETE |
| 98 | 98 | 0.785847 | 2025-05-28 20:55:04.617745 | 2025-05-28 20:55:04.640903 | 0 days 00:00:00.023158 | 0.195122 |  | SVM | auto | linear |  |  |  |  |  | COMPLETE |


In [26]:
study.trials_dataframe()['params_classifier'].value_counts()

params_classifier
SVM                 79
RandomForest        11
GradientBoosting    10
Name: count, dtype: int64

In [27]:
study.trials_dataframe().groupby('params_classifier')['value'].mean()

params_classifier
GradientBoosting    0.747300
RandomForest        0.765871
SVM                 0.775806
Name: value, dtype: float64
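Averages alone can mislead here: TPE spent 79 of the 100 trials on SVM, and each mean also includes the poor configurations tried along the way. A sketch that reports count, mean and best value per algorithm in one call, on a small hypothetical frame (made-up values) with the same column names as `study.trials_dataframe()`:

```python
import pandas as pd

# Hypothetical trials table with the same column names as study.trials_dataframe()
trials = pd.DataFrame({
    "params_classifier": ["SVM", "SVM", "SVM", "RandomForest", "RandomForest"],
    "value": [0.70, 0.78, 0.80, 0.76, 0.77],
})

# count: how many trials the sampler spent on each algorithm
# mean:  average over all sampled configurations (includes poor ones)
# max:   best configuration found for that algorithm
summary = trials.groupby("params_classifier")["value"].agg(["count", "mean", "max"])
print(summary)
```

On the real study this would be `study.trials_dataframe().groupby('params_classifier')['value'].agg(['count', 'mean', 'max'])`.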

**Other features of Optuna**
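* Distributed optimization: several processes or machines can work on the same study by sharing a relational-database storage (for example SQLite, MySQL or PostgreSQL).
* Integrations with libraries such as scikit-learn, Keras, PyTorch and MLflow.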