**Q. What are hyperparameters in machine learning, and how do you optimize them?**

**Ans.**

**Definition of hyperparameters:**

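Hyperparameters are settings that control how a machine learning model learns. They are chosen by the practitioner rather than learned from the data.

They are set before training begins and can strongly affect the model's performance. For a random forest, examples include the number of trees (`n_estimators`) and the maximum tree depth (`max_depth`).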
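
To make the distinction concrete, here is a minimal sketch in plain Python (not part of the scikit-learn workflow below). The function `fit_slope` and the toy data are made up for illustration: `learning_rate` and `n_epochs` are hyperparameters fixed before training, while the slope `w` is the parameter learned from the data.

```python
def fit_slope(xs, ys, learning_rate, n_epochs):
    """Fit y ~ w * x by gradient descent on the mean squared error."""
    w = 0.0  # learned parameter: updated from the data during training
    n = len(xs)
    for _ in range(n_epochs):
        # Gradient of mean((w*x - y)^2) with respect to w
        grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / n
        w -= learning_rate * grad
    return w


# Toy data (hypothetical) generated with a true slope of 2
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]

# Same data, same model, different hyperparameter values
w_good = fit_slope(xs, ys, learning_rate=0.05, n_epochs=200)
w_slow = fit_slope(xs, ys, learning_rate=0.0001, n_epochs=200)

print(f"learning_rate=0.05   -> w = {w_good:.3f}")
print(f"learning_rate=0.0001 -> w = {w_slow:.3f}")
```

With the larger learning rate the slope converges to about 2. With the very small one it is still far from 2 after the same number of epochs, which is why hyperparameter values are worth tuning.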

**Techniques to optimize hyperparameters:**

- **Grid Search:** Exhaustively evaluates every combination in a predefined grid of hyperparameter values.
- **Random Search:** Evaluates a fixed number of configurations sampled at random from specified ranges or distributions.
- **Bayesian Optimization:** Fits a probabilistic model to the scores of past trials and uses it to choose the next configuration to try.
- **Gradient-Based Optimization:** Treats hyperparameters as continuous variables and optimizes them with gradient descent, which requires the validation objective to be differentiable with respect to them.
- **Genetic Algorithms:** Mimic natural selection (selection, crossover, mutation) to evolve better hyperparameter configurations over generations.
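
The first two techniques in the list can be sketched in a few lines of plain Python. This is only an illustration: `validation_score` is a hypothetical stand-in for a cross-validated score, and the hyperparameter names and ranges are assumptions rather than anything from a real model.

```python
import itertools
import random


def validation_score(learning_rate, depth):
    """Hypothetical score surface; it peaks at learning_rate=0.1, depth=6."""
    return 1.0 - (learning_rate - 0.1) ** 2 - 0.01 * (depth - 6) ** 2


# Grid search: evaluate every combination of the listed values
lr_grid = [0.01, 0.1, 1.0]
depth_grid = [2, 4, 6, 8]
grid_candidates = list(itertools.product(lr_grid, depth_grid))
grid_best = max(grid_candidates, key=lambda c: validation_score(*c))

# Random search: evaluate a fixed budget of randomly sampled configurations
rng = random.Random(0)  # seeded for reproducibility
random_candidates = [
    (rng.uniform(0.01, 1.0), rng.randint(2, 8)) for _ in range(12)
]
random_best = max(random_candidates, key=lambda c: validation_score(*c))

print("Grid search best:  ", grid_best)
print("Random search best:", random_best)
```

Both searches spend the same budget here (12 evaluations). Grid search can only return values that appear in its lists, while random search can land anywhere in the sampled ranges.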

### Load the Dataset
We'll use the Breast Cancer dataset from `scikit-learn`.

In [1]:
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

In [2]:
# Load the dataset
data = load_breast_cancer()
X = data.data
y = data.target

In [3]:
# Train-test split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

### Perform Grid Search
Search for the best hyperparameters using a predefined grid.

In [4]:
from sklearn.model_selection import GridSearchCV

# Define hyperparameter grid
param_grid = {
    'n_estimators': [50, 100, 150],
    'max_depth': [None, 10, 20, 30],
    'min_samples_split': [2, 5, 10],
    'min_samples_leaf': [1, 2, 4]
}
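
Before running an exhaustive search it helps to know how many models it will train. The sketch below counts the combinations in the grid above using only the standard library. The `cv_folds = 5` value is an assumption that mirrors the `cv=5` setting used in this walkthrough.

```python
from itertools import product

# Same grid as above, repeated here so the snippet is self-contained
param_grid = {
    'n_estimators': [50, 100, 150],
    'max_depth': [None, 10, 20, 30],
    'min_samples_split': [2, 5, 10],
    'min_samples_leaf': [1, 2, 4],
}
cv_folds = 5  # assumption: 5-fold cross-validation

# Every combination of one value per hyperparameter
candidates = list(product(*param_grid.values()))
n_candidates = len(candidates)   # 3 * 4 * 3 * 3 = 108
n_fits = n_candidates * cv_folds  # 108 * 5 = 540

print(f"{n_candidates} candidates x {cv_folds} folds = {n_fits} model fits")
```

Each extra value added to any one list multiplies the total, so grids grow quickly. With scikit-learn's default `refit=True`, one more fit is performed afterwards to train the best configuration on the whole training set.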

In [5]:
# Grid search with cross-validation
grid_search = GridSearchCV(
    estimator=RandomForestClassifier(random_state=42),
    param_grid=param_grid,
    cv=5,
    scoring="accuracy",
    n_jobs=-1
)

In [6]:
grid_search.fit(X_train, y_train)

In [7]:
# Best hyperparameters and their mean cross-validated accuracy
print("Best Hyperparameters (Grid Search):", grid_search.best_params_)
print("Best Score (Grid Search):", grid_search.best_score_)

Best Hyperparameters (Grid Search): {'max_depth': None, 'min_samples_leaf': 1, 'min_samples_split': 2, 'n_estimators': 150}
Best Score (Grid Search): 0.9626373626373625


### Perform Random Search
Search for the best hyperparameters using a randomized approach.

In [8]:
from sklearn.model_selection import RandomizedSearchCV
import numpy as np

# Define hyperparameter distribution
param_dist = {
    'n_estimators': np.arange(50, 200, 10),
    'max_depth': [None] + list(np.arange(10, 50, 10)),
    'min_samples_split': np.arange(2, 11),
    'min_samples_leaf': np.arange(1, 5)
}

# Randomized search with cross-validation
random_search = RandomizedSearchCV(
    estimator=RandomForestClassifier(random_state=42),
    param_distributions=param_dist,
    n_iter=50,
    cv=5,
    scoring="accuracy",
    n_jobs=-1,
    random_state=42
)

random_search.fit(X_train, y_train)
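
For comparison with the grid, the following sketch estimates how much of this larger search space the 50 sampled configurations cover. It mirrors the `np.arange` ranges above with plain `range` objects so it runs without NumPy. `n_iter = 50` and `cv_folds = 5` are taken from the settings above.

```python
import math

# Same value ranges as param_dist, written with range() instead of np.arange()
search_space = {
    'n_estimators': list(range(50, 200, 10)),       # 50, 60, ..., 190
    'max_depth': [None] + list(range(10, 50, 10)),  # None, 10, 20, 30, 40
    'min_samples_split': list(range(2, 11)),        # 2, ..., 10
    'min_samples_leaf': list(range(1, 5)),          # 1, ..., 4
}
n_iter = 50
cv_folds = 5

space_size = math.prod(len(values) for values in search_space.values())
n_fits = n_iter * cv_folds
coverage = n_iter / space_size

print(f"Search space: {space_size} combinations")  # 15 * 5 * 9 * 4 = 2700
print(f"Model fits:   {n_fits}")                   # 50 * 5 = 250
print(f"Coverage:     {coverage:.1%}")
```

Random search here trains fewer models than the 108-candidate grid while drawing from a space of 2,700 combinations. The cost is set by `n_iter`, not by the size of the space.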

In [9]:
# Best hyperparameters and their mean cross-validated accuracy
print("\nBest Hyperparameters (Random Search):", random_search.best_params_)
print("Best Score (Random Search):", random_search.best_score_)


Best Hyperparameters (Random Search): {'n_estimators': np.int64(150), 'min_samples_split': np.int64(3), 'min_samples_leaf': np.int64(1), 'max_depth': np.int64(10)}
Best Score (Random Search): 0.9648351648351647


### Evaluate the best model
Take the best model from random search, which had the slightly higher cross-validation score, and evaluate it on the held-out test set.

In [10]:
# Best model from random search; with the default refit=True it has
# already been retrained on the full training set, so no extra fit is needed
best_model = random_search.best_estimator_

# Evaluate on test set
y_pred = best_model.predict(X_test)
test_accuracy = accuracy_score(y_test, y_pred)
print(f"\nTest Accuracy of Best Model: {test_accuracy:.4f}")


Test Accuracy of Best Model: 0.9649
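
As a sanity check on the metric, here is a minimal plain-Python sketch of what an accuracy score computes: the fraction of predictions that match the true labels. The helper `accuracy` and the example labels are hypothetical and are not scikit-learn's implementation.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that exactly match the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


# Toy labels (hypothetical): 4 of 5 predictions are correct
y_true = [1, 0, 1, 1, 0]
y_pred = [1, 0, 0, 1, 0]
print(accuracy(y_true, y_pred))  # 0.8
```

Assuming the dataset's 569 samples and the 20% split used above, the test set should hold 114 samples, so the reported 0.9649 is consistent with 110 correct predictions (110 / 114 ≈ 0.9649).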
