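# Hyperparameter Tuning
Hyperparameter tuning is the process of finding the combination of hyperparameter values that gives the best model performance. Hyperparameters are settings chosen before training (for example, the number of trees in a random forest) rather than values learned from the data.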

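## Examples
- Grid search: exhaustively evaluate every combination of the specified parameter values
- Random search: evaluate a fixed number of randomly sampled parameter combinations
- Bayesian optimization: build a probabilistic model of the objective function and use it to choose which combination to try next
- Gradient-based optimization: use gradient descent to find a minimum of the objective function, which requires the objective to be differentiable with respect to the hyperparameters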
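
The difference between the first two methods can be made concrete by listing the candidates each one would evaluate. The sketch below uses scikit-learn's `ParameterGrid` and `ParameterSampler`; the search space is a hypothetical one chosen only for illustration, and `n_iter=5` and `random_state=0` are arbitrary assumptions.

```python
from sklearn.model_selection import ParameterGrid, ParameterSampler

# Hypothetical search space, chosen only for illustration.
param_grid = {
    "n_estimators": [50, 100, 200],
    "max_depth": [4, 6, 8],
    "criterion": ["gini", "entropy"],
}

# Grid search evaluates every combination: 3 * 3 * 2 = 18 candidates.
all_candidates = list(ParameterGrid(param_grid))
print(len(all_candidates))

# Random search evaluates only a fixed number of sampled combinations.
sampled_candidates = list(ParameterSampler(param_grid, n_iter=5, random_state=0))
print(len(sampled_candidates))
for params in sampled_candidates:
    print(params)
```

The grid grows multiplicatively with every parameter added, while the cost of random search is set directly by `n_iter`.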



## Cross Validation
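Cross-validation is a technique for estimating how well a model generalizes to new, unseen data. The data is split into k folds; the model is trained on k-1 folds and evaluated on the remaining fold, and the process is repeated so that each fold serves once as the validation set.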
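
As a minimal sketch of this idea, the snippet below runs 5-fold cross-validation with `cross_val_score` on the iris dataset. The choices of `n_estimators=50` and `random_state=0` are assumptions made only to keep the example fast and repeatable.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)

# random_state is fixed here only to make the sketch repeatable.
model = RandomForestClassifier(n_estimators=50, random_state=0)

# cv=5 splits the data into 5 folds; each fold serves once as the validation set.
scores = cross_val_score(model, X, y, cv=5, scoring="accuracy")
print(scores)         # one accuracy value per fold
print(scores.mean())  # average accuracy across the folds
```

For a classifier and an integer `cv`, scikit-learn uses stratified folds, so each fold keeps roughly the same class proportions as the full dataset.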

In [1]:
# import libraries
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report

In [3]:
# load the dataset
from sklearn.datasets import load_iris
iris = load_iris()
X = iris.data
y = iris.target
iris.target

array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
       2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
       2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2])

In [9]:
# define the model
model = RandomForestClassifier()

# create the parameter grid
param_grid = {
    'n_estimators': [50, 100, 200, 300, 400, 500],
    'max_depth': [4, 5, 6, 7, 8, 9, 10],
    'criterion': ['gini', 'entropy'],
    'bootstrap': [True, False],
}
# note: param_grid was reduced so that the exhaustive search stays manageable

# set up the grid search: 5-fold cross-validation, scored by accuracy, using all CPU cores
grid = GridSearchCV(
    estimator=model,
    param_grid=param_grid,
    cv=5,
    scoring='accuracy',
    verbose=1,
    n_jobs=-1,
)

# fit one model per parameter combination and fold
grid.fit(X, y)

# print the best parameters
print(grid.best_params_)


Fitting 5 folds for each of 168 candidates, totalling 840 fits
{'bootstrap': True, 'criterion': 'gini', 'max_depth': 4, 'n_estimators': 50}
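
The search above is fitted on the full dataset, so its cross-validated score is the only performance estimate available. One common alternative, sketched here under assumptions, is to hold out a test set first, tune on the training portion, and then score the refit best model on the held-out data. The 80/20 split, the `random_state` values, and the small grid are illustrative choices, not part of the original run.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, classification_report
from sklearn.model_selection import GridSearchCV, train_test_split

X, y = load_iris(return_X_y=True)

# Hold out 20% of the data; test_size and random_state are illustrative choices.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# A deliberately small grid so the sketch runs quickly.
small_grid = {"n_estimators": [50, 100], "max_depth": [4, 6]}

search = GridSearchCV(
    RandomForestClassifier(random_state=42),
    param_grid=small_grid,
    cv=5,
    scoring="accuracy",
)
search.fit(X_train, y_train)

print(search.best_params_)
print(search.best_score_)  # mean cross-validated accuracy of the best candidate

# With the default refit=True, predict() uses the best estimator refit on all of X_train.
y_pred = search.predict(X_test)
print(accuracy_score(y_test, y_pred))
print(classification_report(y_test, y_pred))
```

Because the test rows never take part in the search, the final accuracy is not influenced by the choice of hyperparameters.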


## Random Search CV

In [14]:
from sklearn.model_selection import RandomizedSearchCV

# define the model
model = RandomForestClassifier()

# create the parameter space to sample from
param_distributions = {
    'n_estimators': [50, 100, 200, 300, 400, 500],
    'max_depth': [4, 5, 6, 7, 8, 9, 10],
    'criterion': ['gini', 'entropy'],
    'bootstrap': [True, False],
}
# note: the parameter space was reduced so that the search stays manageable

# set up the randomized search: n_iter=20 sampled candidates, each scored with 5-fold cross-validation
random_search = RandomizedSearchCV(
    estimator=model,
    param_distributions=param_distributions,
    n_iter=20,
    cv=5,
    scoring='accuracy',
    verbose=1,
    n_jobs=-1,
)

# fit one model per sampled candidate and fold
random_search.fit(X, y)

# print the best parameters
print(random_search.best_params_)


Fitting 5 folds for each of 20 candidates, totalling 100 fits
{'n_estimators': 200, 'max_depth': 7, 'criterion': 'entropy', 'bootstrap': True}
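
`RandomizedSearchCV` also accepts distributions instead of fixed lists, so candidates are not limited to a handful of preselected values. The sketch below assumes SciPy is installed and uses `scipy.stats.randint`; the ranges, `n_iter=5`, and the `random_state` values are illustrative assumptions.

```python
from scipy.stats import randint
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV

X, y = load_iris(return_X_y=True)

# randint(low, high) samples integers from low (inclusive) to high (exclusive).
param_distributions = {
    "n_estimators": randint(50, 201),
    "max_depth": randint(4, 11),
    "criterion": ["gini", "entropy"],
}

search = RandomizedSearchCV(
    RandomForestClassifier(random_state=0),
    param_distributions=param_distributions,
    n_iter=5,        # number of sampled candidates
    cv=5,
    scoring="accuracy",
    random_state=0,  # makes the sampled candidates repeatable
)
search.fit(X, y)

print(search.best_params_)
print(len(search.cv_results_["params"]))  # one entry per sampled candidate
```

Fixing `random_state` on the search makes the sampled candidates repeatable between runs, which the cells above do not do.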
