# **Ex 10 Implementation of Hyperparameter Optimization Techniques: Manual Search, Random Search, Grid Search, Halving Grid Search, Randomized Search, Automated Hyperparameter Tuning, Bayesian Optimization**

1. Import Required Libraries

Loads libraries for:

Data handling (numpy, pandas)

Modeling (RandomForestClassifier)

Hyperparameter tuning (GridSearchCV, RandomizedSearchCV, HalvingGridSearchCV). BayesSearchCV is imported later, in section 8. TPOT is installed below, but TPOTClassifier is not used in the cells that follow.

Model evaluation (accuracy_score)

warnings.filterwarnings('ignore'): hides warning messages (especially from TPOT).

enable_halving_search_cv: must be imported before HalvingGridSearchCV, because successive halving is still an experimental scikit-learn feature.

In [None]:
import warnings

import numpy as np
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split, GridSearchCV, RandomizedSearchCV
from sklearn.experimental import enable_halving_search_cv  # noqa: must come before the next import
from sklearn.model_selection import HalvingGridSearchCV
from sklearn.metrics import accuracy_score

warnings.filterwarnings('ignore')  # hide library warnings, as described above


In [None]:
!pip install scikit-optimize tpot
!pip install --upgrade tpot




2. Load iris dataset and split the dataset

Loads the Iris dataset, a classic multi-class classification dataset (3 flower species).

X: Features (sepal/petal length/width)

y: Target labels (species)

Splits the dataset into:

80% training

20% testing

Creates an empty results dictionary that collects the test accuracy of each technique.

In [None]:
# Load data
iris = load_iris()
X = iris.data
y = iris.target

# Train-test split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

results = {}  # technique name -> test accuracy


3. Manual Search (Baseline)

You manually choose a set of hyperparameters based on intuition or trial and error.

Sets fixed values for n_estimators and max_depth manually.

Trains and evaluates the model.

Used as a baseline for comparison.

In [None]:
model = RandomForestClassifier(n_estimators=100, max_depth=3, random_state=42)
model.fit(X_train, y_train)
acc_manual = accuracy_score(y_test, model.predict(X_test))
results['Manual Search'] = acc_manual
print("Manual Search Accuracy:", acc_manual)


Manual Search Accuracy: 1.0


4. Random Search

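Samples hyperparameter combinations at random from a list or distribution instead of walking a full grid. The cell below draws a single random combination; a fuller random search repeats the draw several times and keeps the best result.

Imports Python’s built-in random module.

This module allows you to randomly select values from a list.

n_estimators is a key hyperparameter for a Random Forest model.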

It defines the number of decision trees in the forest.

The first random.choice() call picks one value from the list [50, 100, 150].

If random.choice() selects 100, the forest will consist of 100 trees.

max_depth controls the maximum depth of each decision tree.

A tree's depth is the number of splits along the longest path from the root to a leaf.

None means the tree is allowed to grow until all leaves are pure or contain fewer samples than min_samples_split.

If random.choice() selects 3, trees will only grow to depth 3.

If None, the trees can grow very deep (risk of overfitting).

Instantiates a RandomForestClassifier using the randomly selected values for:

n_estimators

max_depth

random_state=42 makes the forest itself reproducible for a given pair of settings. The random.choice() draws are not seeded, so the selected settings (and therefore the result) can still differ between runs.

Trains (fits) the Random Forest model using the training dataset (X_train, y_train).

The model learns patterns from the data by constructing multiple decision trees, each trained on a bootstrap sample of the training data.

Predicts the class labels for the test dataset (X_test) using the trained model.

In scikit-learn, each tree outputs class probabilities, the forest averages them, and the class with the highest mean probability is the final prediction (a probability-weighted vote rather than a plain majority vote).

Compares the actual labels (y_test) with the predicted labels.

Calculates the accuracy: the proportion of correct predictions over total predictions.

Prints the accuracy of the model using the randomly selected parameters.
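The averaging step described above can be checked directly. The following is a minimal sketch, assuming a small forest fitted on the full Iris data; the names X_demo, y_demo, forest, mean_proba and manual_pred are illustrative and not part of the notebook.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

X_demo, y_demo = load_iris(return_X_y=True)

# Small forest purely for illustration (settings are assumptions).
forest = RandomForestClassifier(n_estimators=25, max_depth=3, random_state=42)
forest.fit(X_demo, y_demo)

# Sum the per-tree class probabilities, then divide by the number of trees.
tree_proba_sum = sum(tree.predict_proba(X_demo) for tree in forest.estimators_)
mean_proba = tree_proba_sum / len(forest.estimators_)

# The predicted class is the one with the highest mean probability.
manual_pred = forest.classes_[np.argmax(mean_proba, axis=1)]

print("Probabilities match:", np.allclose(mean_proba, forest.predict_proba(X_demo)))
print("Predictions match:", np.array_equal(manual_pred, forest.predict(X_demo)))
```

Both comparisons are against the forest's own predict_proba and predict, so the sketch only confirms how the ensemble combines its trees; it says nothing about accuracy.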






In [None]:
import random

n_estimators = random.choice([50, 100, 150])
max_depth = random.choice([2, 3, 5, None])

model = RandomForestClassifier(n_estimators=n_estimators, max_depth=max_depth, random_state=42)
model.fit(X_train, y_train)
acc_random = accuracy_score(y_test, model.predict(X_test))
results['Random Search'] = acc_random
print("Random Search Accuracy:", acc_random)


Random Search Accuracy: 1.0
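The cell above evaluates one random draw. A random search in the usual sense repeats the draw and keeps the best-scoring settings. The sketch below is one way to do that, assuming a budget of 5 trials and scoring by 3-fold cross-validation on the training split; n_trials, rng, best_score and best_params are illustrative names, not part of the notebook.

```python
import random

from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score, train_test_split

X_all, y_all = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X_all, y_all, test_size=0.2, random_state=42)

rng = random.Random(0)  # seeded so the sampled settings repeat across runs
n_trials = 5            # illustrative budget (an assumption)

best_score, best_params = -1.0, None
for _ in range(n_trials):
    # Draw one value per hyperparameter, as in the cell above.
    params = {
        "n_estimators": rng.choice([50, 100, 150]),
        "max_depth": rng.choice([2, 3, 5, None]),
    }
    clf = RandomForestClassifier(random_state=42, **params)
    # Score on the training split only, so the test set stays untouched.
    score = cross_val_score(clf, X_tr, y_tr, cv=3).mean()
    if score > best_score:
        best_score, best_params = score, params

print("Best sampled params:", best_params)
print("Mean CV accuracy:", round(best_score, 3))
```

Selecting on cross-validation scores rather than test accuracy keeps the test set as an unbiased final check.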


5. Grid Search (Exhaustive)

Tries all combinations of the given hyperparameter values.

Cross-validates (3-fold CV) each combination.

Returns the best combination based on performance.

Time-consuming for large parameter sets but very thorough.

This is a dictionary of hyperparameters we want to explore for the model.

Each key is a hyperparameter of RandomForestClassifier.

Each value is a list of possible values to test.

GridSearchCV is a scikit-learn class for exhaustive search over hyperparameter combinations.

RandomForestClassifier(random_state=42) → The model we want to tune.

param_grid → The dictionary of hyperparameters to try.

cv=3 → Use 3-fold cross-validation:

The training data (X_train, y_train) is split into 3 parts.

Each model is trained on 2 parts and validated on the 3rd.

This is repeated 3 times (rotating the validation part).

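The scores are averaged, which gives a more stable performance estimate than a single train/validation split and lowers the risk of picking hyperparameters that only suit one split.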
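The rotation described above can be written out by hand. This is a minimal sketch, assuming the full Iris data and a 50-tree forest for illustration; for a classifier, an integer cv in scikit-learn means stratified folds without shuffling, which is why StratifiedKFold is used here. The names base_model, folds and fold_scores are illustrative.

```python
import numpy as np
from sklearn.base import clone
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

X_demo, y_demo = load_iris(return_X_y=True)
base_model = RandomForestClassifier(n_estimators=50, random_state=42)

# Three stratified folds: each fold keeps the class proportions.
folds = StratifiedKFold(n_splits=3)

fold_scores = []
for train_idx, val_idx in folds.split(X_demo, y_demo):
    model = clone(base_model)                        # fresh, unfitted copy
    model.fit(X_demo[train_idx], y_demo[train_idx])  # train on 2 parts
    fold_scores.append(model.score(X_demo[val_idx], y_demo[val_idx]))  # validate on the 3rd

print("Per-fold accuracy:", np.round(fold_scores, 3))
print("Mean accuracy:", round(float(np.mean(fold_scores)), 3))

# The helper below should reproduce the same three scores.
reference = cross_val_score(base_model, X_demo, y_demo, cv=3)
```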

Trains the model using all 8 hyperparameter combinations.

For each combination:

Runs 3-fold CV on X_train

Measures performance (accuracy by default for classifiers)

Chooses the best combination based on CV score.
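The count of 8 combinations can be verified by expanding the grid with scikit-learn's ParameterGrid. The sketch below repeats the grid from this section under the illustrative name demo_grid and works out how many model fits the search needs, assuming the default refit=True.

```python
from sklearn.model_selection import ParameterGrid

demo_grid = {
    "n_estimators": [50, 100],
    "max_depth": [3, None],
    "criterion": ["gini", "entropy"],
}

# Expand the grid into every combination: 2 x 2 x 2 settings.
combinations = list(ParameterGrid(demo_grid))
for combo in combinations:
    print(combo)

n_folds = 3
total_cv_fits = len(combinations) * n_folds  # one fit per combination per fold
total_fits = total_cv_fits + 1               # plus one final refit on all training data
print("Combinations:", len(combinations))
print("Cross-validation fits:", total_cv_fits)
print("Total fits including refit:", total_fits)
```

The fit count grows multiplicatively with every added hyperparameter value, which is why exhaustive search becomes slow on larger grids.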

.best_params_ gives you the hyperparameter values that performed best during CV.

Example Output:
Grid Search Best Params: {'criterion': 'gini', 'max_depth': None, 'n_estimators': 50}

Makes predictions on the test set using the best model found (refit on the full training set) and calculates accuracy.


In [None]:
param_grid = {
    'n_estimators': [50, 100],
    'max_depth': [3, None],
    'criterion': ['gini', 'entropy']
}

grid_search = GridSearchCV(RandomForestClassifier(random_state=42), param_grid, cv=3)
grid_search.fit(X_train, y_train)
print("Grid Search Best Params:", grid_search.best_params_)
acc_gridsearch= accuracy_score(y_test, grid_search.predict(X_test))
results['Grid Search'] = acc_gridsearch
print("Grid Search Accuracy:", acc_gridsearch)


Grid Search Best Params: {'criterion': 'gini', 'max_depth': None, 'n_estimators': 50}
Grid Search Accuracy: 1.0


6. Halving Grid Search

HalvingGridSearchCV is a budget-aware variant of GridSearchCV. Instead of evaluating every combination on the full training data, it:

Starts with many candidates trained on fewer resources (e.g., samples).

In each iteration, it eliminates the worst-performing combinations and allocates more resources to the better ones.

This "successive halving" makes it much faster and scalable, especially with large datasets or expensive models.

param_grid → The same dictionary of hyperparameter values defined for Grid Search.

cv=3 → Uses 3-fold cross-validation:

Splits the training data into 3 parts.

Trains on 2 parts, validates on the remaining one, and rotates.

Starts training the model using the successive halving algorithm:

Initially trains all combinations on a small subset of data.

Keeps the best-performing candidates. With the default factor=3, roughly the top third survives each round rather than exactly half.

In the next round, only trains those selected models on more data.

Repeats until the best model is found.

The savings grow with the size of the parameter grid and the dataset; on a dataset as small as Iris they are modest.

best_params_ holds the combination that scored best in the final round, where the surviving candidates are evaluated with the most resources.

Predicts the labels for the test dataset using the best model found.

Calculates accuracy by comparing predicted labels with actual y_test.

In [None]:
halving_search = HalvingGridSearchCV(RandomForestClassifier(random_state=42), param_grid, cv=3)
halving_search.fit(X_train, y_train)
print("Halving Grid Best Params:", halving_search.best_params_)
acc_halvinggrid = accuracy_score(y_test, halving_search.predict(X_test))
results['Halving Grid Search'] = acc_halvinggrid
print("Halving Grid Search Accuracy:", acc_halvinggrid)


Halving Grid Best Params: {'criterion': 'entropy', 'max_depth': 3, 'n_estimators': 100}
Halving Grid Search Accuracy: 1.0
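The elimination schedule can be read from the fitted search object through its n_candidates_ and n_resources_ attributes. The following is a minimal sketch, assuming the same grid and split as above under illustrative names (demo_grid, halving_demo); factor=3 is the scikit-learn default and is written out only for clarity.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.experimental import enable_halving_search_cv  # noqa: F401
from sklearn.model_selection import HalvingGridSearchCV, train_test_split

X_all, y_all = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X_all, y_all, test_size=0.2, random_state=42)

demo_grid = {
    "n_estimators": [50, 100],
    "max_depth": [3, None],
    "criterion": ["gini", "entropy"],
}

halving_demo = HalvingGridSearchCV(
    RandomForestClassifier(random_state=42),
    demo_grid,
    cv=3,
    factor=3,         # keep about 1/3 of the candidates per round
    random_state=42,  # fixes the subsampling of training rows
)
halving_demo.fit(X_tr, y_tr)

# One line per round: how many candidates ran, and on how many samples.
for i, (n_cand, n_res) in enumerate(
    zip(halving_demo.n_candidates_, halving_demo.n_resources_)
):
    print(f"iteration {i}: {n_cand} candidates, {n_res} samples each")
```

Each later round should show fewer candidates and more samples per candidate, which is the trade the method makes.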


7. Randomized Search


RandomizedSearchCV is a technique for hyperparameter optimization. Instead of trying every possible combination like in GridSearchCV, it:

Randomly samples a fixed number of parameter combinations.

This makes it faster and more efficient—especially when the parameter space is large.

Imports the randint function from scipy.stats, which is used to specify a range of integer values for random sampling.

Unlike lists in GridSearchCV, here we define distributions from which values will be randomly picked.

param_dist is a dictionary of parameter distributions.

Instead of providing fixed values, we're giving ranges:

n_estimators: Integers from 10 to 199

max_depth: Integers from 1 to 9

Each combination in the search will randomly pick a value from each of these ranges.
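The ranges above follow from scipy's randint excluding its upper bound. The sketch below draws 10 combinations with scikit-learn's ParameterSampler, the helper RandomizedSearchCV relies on for sampling; demo_dist and sampled are illustrative names, and the seed is an assumption added for repeatability.

```python
from scipy.stats import randint
from sklearn.model_selection import ParameterSampler

demo_dist = {
    "n_estimators": randint(10, 200),  # upper bound excluded: 10..199
    "max_depth": randint(1, 10),       # upper bound excluded: 1..9
}

# Draw 10 parameter combinations from the distributions.
sampled = list(ParameterSampler(demo_dist, n_iter=10, random_state=0))
for params in sampled:
    print(params)

print("n_estimators range seen:",
      min(p["n_estimators"] for p in sampled),
      "to",
      max(p["n_estimators"] for p in sampled))
```

Passing random_state to RandomizedSearchCV in the same way makes its sampled combinations repeatable between runs.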

RandomForestClassifier(random_state=42): The model you want to tune.

param_distributions=param_dist: The distributions you want to sample from.

n_iter=10: Evaluate 10 randomly sampled combinations.

cv=3: Perform 3-fold cross-validation for each random combination.

The search begins:

Randomly samples 10 combinations from param_dist.

For each one, performs 3-fold cross-validation.

Computes the average performance metric (accuracy by default).

Finds and stores the best performing model and parameters.

Displays the best parameter combination found from the 10 random tries.

Uses the best model found to:

Predict labels on the unseen test set.

Calculate the accuracy by comparing predictions to y_test.

In [None]:
from scipy.stats import randint

param_dist = {
    'n_estimators': randint(10, 200),
    'max_depth': randint(1, 10)
}

random_search = RandomizedSearchCV(RandomForestClassifier(random_state=42), param_distributions=param_dist, n_iter=10, cv=3)
random_search.fit(X_train, y_train)
print("Randomized Search Best Params:", random_search.best_params_)
acc_randomizedsearch = accuracy_score(y_test, random_search.predict(X_test))
results['Randomized Search'] = acc_randomizedsearch
print("Randomized Search Accuracy:", acc_randomizedsearch)



Randomized Search Best Params: {'max_depth': 6, 'n_estimators': 104}
Randomized Search Accuracy: 1.0


8. Bayesian Optimization (with BayesSearchCV)

Bayesian Optimization builds a probabilistic model of the objective function and uses it to select the most promising hyperparameters to evaluate next, aiming to minimize the number of iterations while still finding the best results.

It balances:

Exploration (trying new combinations) and

Exploitation (focusing on promising areas).

Imports BayesSearchCV, the scikit-optimize tool that wraps around scikit-learn estimators to perform Bayesian hyperparameter optimization.

You're specifying the range for each hyperparameter to search:

'n_estimators': Number of trees in the forest → from 10 to 200

'max_depth': Maximum depth of each tree → from 1 to 10

These are integer ranges with both bounds included (unlike scipy's randint, whose upper bound is excluded), and BayesSearchCV chooses which values to try next based on the results so far.

RandomForestClassifier(random_state=42): The model to optimize.

search_spaces=search_space: The parameter ranges defined above.

n_iter=20: Perform 20 iterations (try 20 parameter combinations).

cv=3: Use 3-fold cross-validation for each combination.

During these 20 iterations, the algorithm uses a probabilistic model (like Gaussian Processes) to predict which parameter combination will likely perform best and then evaluates it.

The fit() method starts the optimization process:

Selects initial hyperparameters (randomly or via prior)

Builds a surrogate model (a probability model of the objective function)

Selects the next point with an acquisition function such as expected improvement

Updates the model iteratively
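The four steps above can be sketched without scikit-optimize. The code below is not how BayesSearchCV is implemented; it is a simplified illustration that swaps in scikit-learn's GaussianProcessRegressor as the surrogate and a hand-written expected improvement over a fixed pool of 40 random candidates. The pool size, the budget of 3 initial points plus 5 iterations, the xi value and all names are assumptions.

```python
import numpy as np
from scipy.stats import norm
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern
from sklearn.model_selection import cross_val_score

X_demo, y_demo = load_iris(return_X_y=True)
rng = np.random.default_rng(0)

# Candidate pool of (n_estimators, max_depth) pairs, bounds inclusive.
low, high = np.array([10, 1]), np.array([200, 10])
candidates = np.column_stack([
    rng.integers(10, 201, size=40),
    rng.integers(1, 11, size=40),
])
scaled = (candidates - low) / (high - low)  # rescale to [0, 1] for the surrogate


def objective(params):
    """Mean 3-fold CV accuracy for one (n_estimators, max_depth) pair."""
    clf = RandomForestClassifier(
        n_estimators=int(params[0]), max_depth=int(params[1]), random_state=42
    )
    return cross_val_score(clf, X_demo, y_demo, cv=3).mean()


def expected_improvement(mu, sigma, best, xi=0.01):
    """Expected improvement over `best` for a maximisation problem."""
    improvement = mu - best - xi
    with np.errstate(divide="ignore", invalid="ignore"):
        z = improvement / sigma
        ei = improvement * norm.cdf(z) + sigma * norm.pdf(z)
    ei[sigma == 0.0] = 0.0  # no uncertainty means no expected gain
    return ei


# Step 1: evaluate a few randomly chosen starting points.
evaluated = [int(i) for i in rng.choice(len(candidates), size=3, replace=False)]
scores = [objective(candidates[i]) for i in evaluated]

for _ in range(5):
    # Step 2: fit the surrogate to everything evaluated so far.
    surrogate = GaussianProcessRegressor(
        kernel=Matern(nu=2.5), alpha=1e-6, normalize_y=True, random_state=0
    )
    surrogate.fit(scaled[evaluated], scores)

    # Step 3: score every candidate by expected improvement, skipping seen ones.
    mu, sigma = surrogate.predict(scaled, return_std=True)
    ei = expected_improvement(mu, sigma, max(scores))
    ei[evaluated] = -np.inf
    next_idx = int(np.argmax(ei))

    # Step 4: evaluate the chosen point and add it to the history.
    evaluated.append(next_idx)
    scores.append(objective(candidates[next_idx]))

best_score = max(scores)
best_params = candidates[evaluated[int(np.argmax(scores))]]
print("Best (n_estimators, max_depth):", tuple(int(v) for v in best_params))
print("Mean CV accuracy:", round(best_score, 3))
```

Restricting the search to a fixed candidate pool keeps the acquisition step to a single argmax; a full implementation would instead optimise the acquisition function over the whole search space.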

Displays the best parameter combination discovered during the search.

Uses the best model found to:

Predict outcomes on the test set

Measure and print the accuracy

In [None]:
from skopt import BayesSearchCV

search_space = {
    'n_estimators': (10, 200),
    'max_depth': (1, 10)
}

opt = BayesSearchCV(RandomForestClassifier(random_state=42), search_spaces=search_space, n_iter=20, cv=3)
opt.fit(X_train, y_train)
print("Bayes Search Best Params:", opt.best_params_)
acc_bayesiansearch = accuracy_score(y_test, opt.predict(X_test))
results['Bayesian Optimization'] = acc_bayesiansearch
print("Bayesian Optimization Accuracy:", acc_bayesiansearch)


Bayes Search Best Params: OrderedDict([('max_depth', 5), ('n_estimators', 50)])
Bayesian Optimization Accuracy: 1.0


9. Comparative analysis of optimization techniques

In [None]:

print("\n Accuracy Comparison of Hyperparameter Optimization Techniques:\n")
df_results = pd.DataFrame(list(results.items()), columns=['Technique', 'Accuracy'])
print(df_results.sort_values(by='Accuracy', ascending=False).to_string(index=False))


 Accuracy Comparison of Hyperparameter Optimization Techniques:

            Technique  Accuracy
        Manual Search       1.0
        Random Search       1.0
          Grid Search       1.0
  Halving Grid Search       1.0
    Randomized Search       1.0
Bayesian Optimization       1.0
