# Hyperparameter Tuning

Hyperparameter tuning is the process of finding the combination of hyperparameters that gives the best validation performance for a given model.

## Types

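1. Grid Search
   - Exhaustive search over every combination of hyperparameter values in a predefined grid.
2. Random Search
   - Randomly sample combinations of hyperparameters from given distributions.
3. Bayesian Optimization
   - Fit a surrogate model of the objective function and use it to choose the next hyperparameters to evaluate.
4. Gradient-based Optimization
   - Use gradient descent on the objective function with respect to the hyperparameters, which requires them to be continuous and the objective to be differentiable.
5. Genetic Algorithm
   - Evolve a population of hyperparameter combinations through selection, crossover, and mutation.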
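
To make the difference between the first two strategies concrete, here is a minimal standard-library sketch. The search space and the helper names `grid_candidates` and `random_candidates` are hypothetical and exist only for illustration; they are not part of scikit-learn.

```python
import itertools
import random

# Hypothetical search space, used only for illustration.
search_space = {
    "n_estimators": [50, 100, 200],
    "max_depth": [4, 6, 8],
    "criterion": ["gini", "entropy"],
}


def grid_candidates(space):
    """Grid search: return every combination of values in the search space."""
    names = list(space)
    return [dict(zip(names, values)) for values in itertools.product(*space.values())]


def random_candidates(space, n_iter, seed=0):
    """Random search: draw n_iter combinations, sampling each hyperparameter independently."""
    rng = random.Random(seed)
    return [
        {name: rng.choice(values) for name, values in space.items()}
        for _ in range(n_iter)
    ]


all_combinations = grid_candidates(search_space)
sampled = random_candidates(search_space, n_iter=5)

print(len(all_combinations))  # 3 * 3 * 2 = 18
print(len(sampled))           # 5
```

Grid search cost grows multiplicatively with every added hyperparameter, while random search cost is fixed by `n_iter`. This sketch samples with replacement, so the same combination may be drawn more than once.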

# Cross Validation

Cross validation is a technique for estimating how well a model generalizes to unseen data. The data is split into k folds; the model is trained on k - 1 folds and evaluated on the remaining fold, once per fold, and the k scores are averaged.
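
One common form is k-fold cross validation. The sketch below shows only the index bookkeeping, under the assumption of contiguous, unshuffled folds; the helper `k_fold_indices` is hypothetical and is not how scikit-learn implements its splitters.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) for each of k contiguous folds."""
    indices = list(range(n_samples))
    # Spread the remainder over the first folds so fold sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size


folds = list(k_fold_indices(n_samples=10, k=3))
for train, test in folds:
    print("train:", train, "test:", test)
```

Every sample lands in exactly one test fold, so each observation is used for evaluation once. In practice the data is usually shuffled or stratified by class before splitting, which this sketch leaves out.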

In [6]:
# import libraries
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split, GridSearchCV, RandomizedSearchCV
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report

In [2]:
# load the data
from sklearn.datasets import load_iris

iris = load_iris()
X = iris.data
y = iris.target

In [3]:
# define the model
model = RandomForestClassifier()

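# create the parameter grid
# NOTE: 'auto' is omitted from max_features. scikit-learn 1.3 removed it
# (for classifiers it was an alias for 'sqrt'). The recorded output below
# comes from a run that still included it, hence 252 candidates and 420 failed fits.
param_grid = {
    'n_estimators': [50, 100, 200, 300, 400, 500],
    'max_features': ['sqrt', 'log2'],
    'max_depth': [4, 5, 6, 7, 8, 9, 10],
    'criterion': ['gini', 'entropy']
}

# set up the grid search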
grid = GridSearchCV(
    estimator=model,
    param_grid=param_grid,
    cv=5,
    scoring='accuracy',
    verbose=1,
    n_jobs=-1
)

# fit the model
grid.fit(X, y)

# print the best estimator (its repr shows only non-default parameters)
print(f"Best Parameters: {grid.best_estimator_}")

Fitting 5 folds for each of 252 candidates, totalling 1260 fits


420 fits failed out of a total of 1260.
The score on these train-test partitions for these parameters will be set to nan.
If these failures are not expected, you can try to debug them by setting error_score='raise'.

Below are more details about the failures:
--------------------------------------------------------------------------------
171 fits failed with the following error:
Traceback (most recent call last):
  File "c:\Users\khan\miniconda3\envs\python_ml\Lib\site-packages\sklearn\model_selection\_validation.py", line 895, in _fit_and_score
    estimator.fit(X_train, y_train, **fit_params)
  File "c:\Users\khan\miniconda3\envs\python_ml\Lib\site-packages\sklearn\base.py", line 1467, in wrapper
    estimator._validate_params()
  File "c:\Users\khan\miniconda3\envs\python_ml\Lib\site-packages\sklearn\base.py", line 666, in _validate_params
    validate_parameter_constraints(
  File "c:\Users\khan\miniconda3\envs\python_ml\Lib\site-packages\sklearn\utils\_param_validation.py", line 

Best Parameters: RandomForestClassifier(max_depth=4, n_estimators=300)
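
The warning in the output above suggests setting `error_score='raise'` to debug failed fits. The sketch below shows that option on a deliberately small grid. The grid values, the name `small_grid`, and `random_state=0` are assumptions chosen to keep the run short and repeatable; they are not the settings used in the cell above. One third of the fits failing (420 of 1260) most likely traces back to the `'auto'` entry in `max_features`, which recent scikit-learn releases reject, so it is left out here.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

X, y = load_iris(return_X_y=True)

# Deliberately small grid (an assumption, chosen to keep the run short).
small_grid = {
    "n_estimators": [10, 50],
    "max_features": ["sqrt", "log2"],
    "max_depth": [4, 8],
}

search = GridSearchCV(
    estimator=RandomForestClassifier(random_state=0),
    param_grid=small_grid,
    cv=5,
    scoring="accuracy",
    error_score="raise",  # raise on the first failing fit instead of recording nan
)
search.fit(X, y)

# best_params_ holds only the tuned values; best_score_ is the mean CV accuracy.
print(search.best_params_)
print(round(search.best_score_, 3))
```

With `error_score='raise'`, an invalid value in the grid stops the search with a full traceback at the first failing fit, which is usually easier to read than a summary of hundreds of failures.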


In [10]:
# define the model
model = RandomForestClassifier()

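# create the parameter grid
# NOTE: 'auto' is omitted from max_features. scikit-learn 1.3 removed it
# (for classifiers it was an alias for 'sqrt'). The recorded output below
# comes from a run that still included it, hence the 15 failed fits.
param_grid = {
    'n_estimators': [50, 100, 200, 300, 400, 500],
    'max_features': ['sqrt', 'log2'],
    # 'max_depth': [4, 5, 6, 7, 8, 9, 10],
    # 'criterion': ['gini', 'entropy']
}

# set up the randomized search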
grid = RandomizedSearchCV(
    estimator=model,
    param_distributions=param_grid,
    n_iter=10,  # number of sampled combinations (10 is also the default)
    cv=5,
    scoring='accuracy',
    verbose=1,
    n_jobs=-1
)

# fit the model
grid.fit(X, y)

# print the best estimator (its repr shows only non-default parameters)
print(f"Best Parameters: {grid.best_estimator_}")

Fitting 5 folds for each of 10 candidates, totalling 50 fits


15 fits failed out of a total of 50.
The score on these train-test partitions for these parameters will be set to nan.
If these failures are not expected, you can try to debug them by setting error_score='raise'.

Below are more details about the failures:
--------------------------------------------------------------------------------
10 fits failed with the following error:
Traceback (most recent call last):
  File "c:\Users\khan\miniconda3\envs\python_ml\Lib\site-packages\sklearn\model_selection\_validation.py", line 895, in _fit_and_score
    estimator.fit(X_train, y_train, **fit_params)
  File "c:\Users\khan\miniconda3\envs\python_ml\Lib\site-packages\sklearn\base.py", line 1467, in wrapper
    estimator._validate_params()
  File "c:\Users\khan\miniconda3\envs\python_ml\Lib\site-packages\sklearn\base.py", line 666, in _validate_params
    validate_parameter_constraints(
  File "c:\Users\khan\miniconda3\envs\python_ml\Lib\site-packages\sklearn\utils\_param_validation.py", line 95, 

Best Parameters: RandomForestClassifier(n_estimators=400)
