# Parameter tuning

<a href="https://colab.research.google.com/github/thomasjpfan/ml-workshop-intermediate-1-of-2/blob/master/notebooks/02-parameter-tuning.ipynb"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open in Colab" title="Open and Execute in Google Colaboratory"></a>

In [None]:
# Install dependencies for google colab
import sys
if 'google.colab' in sys.modules:
    %pip install -r https://raw.githubusercontent.com/thomasjpfan/ml-workshop-intermediate-1-of-2/master/requirements.txt

In [None]:
import sklearn
assert sklearn.__version__.startswith("1.0"), "Please install scikit-learn 1.0"

In [None]:
import seaborn as sns
sns.set_theme(context="notebook", font_scale=1.4,
              rc={"figure.figsize": [10, 6]})

First, let's load the digits dataset.

In [None]:
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split

digits = load_digits()
X, y = digits.data, digits.target

X_train, X_test, y_train, y_test = train_test_split(
    X, y, random_state=42, stratify=y
)

In [None]:
X[0]

In [None]:
import matplotlib.pyplot as plt

In [None]:
fig, axes = plt.subplots(4, 4)
for i, ax in zip(range(16), axes.ravel()):
    ax.imshow(X[i].reshape(8, 8), cmap="gray_r")
    ax.set(xticks=(), yticks=(), title=y[i])
plt.tight_layout()

Next, set up a grid search over the hyperparameters of a random forest classifier.

In [None]:
from sklearn.model_selection import GridSearchCV
from sklearn.ensemble import RandomForestClassifier
import numpy as np

In [None]:
params = {
    'max_depth': [2, 4, 8, 12, 16],
    'max_features': [4, 8, 16, 32]
}
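
As a rough check on the cost of this search, the grid above can be expanded with `ParameterGrid` from `sklearn.model_selection`. The sketch below assumes the default 5-fold cross-validation that `GridSearchCV` applies when `cv` is not set; the names `n_candidates`, `n_folds`, and `n_fits` are only illustrative.

```python
from sklearn.model_selection import ParameterGrid

params = {
    "max_depth": [2, 4, 8, 12, 16],
    "max_features": [4, 8, 16, 32],
}

# Every combination of the listed values becomes one candidate.
candidates = list(ParameterGrid(params))
n_candidates = len(candidates)

# Assumption: cv is left at its default (5 folds).
n_folds = 5
n_fits = n_candidates * n_folds  # excludes the final refit on the full training set

print(f"{n_candidates} candidates x {n_folds} folds = {n_fits} fits")
```

Adding a value to either list multiplies the number of fits, which is why grids are usually kept coarse.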

In [None]:
grid_search = GridSearchCV(
    RandomForestClassifier(random_state=42), param_grid=params,
    verbose=1,
    n_jobs=8)  # Update to the number of physical cpu cores

In [None]:
grid_search.fit(X_train, y_train)

In [None]:
grid_search.best_score_

In [None]:
grid_search.best_params_

In [None]:
grid_search.score(X_test, y_test)

In [None]:
import pandas as pd
cv_df = pd.DataFrame(grid_search.cv_results_)

In [None]:
res = (cv_df.pivot(index='param_max_depth', columns='param_max_features', values='mean_test_score')
            .rename_axis(index='max_depth', columns='max_features'))

In [None]:
_ = sns.heatmap(res, cmap='viridis')
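
Alongside the heatmap, it can help to read `cv_results_` as a ranked table. The following is a minimal, self-contained sketch: it deliberately uses a much smaller grid, `n_estimators=10`, and `cv=3` so that it runs quickly, and the names `small_search` and `ranked` are hypothetical, not part of the workshop code.

```python
import pandas as pd
from sklearn.datasets import load_digits
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, random_state=42, stratify=y
)

# Reduced search for illustration only (assumed values, chosen for speed).
small_search = GridSearchCV(
    RandomForestClassifier(n_estimators=10, random_state=42),
    param_grid={"max_depth": [2, 8], "max_features": [4, 8]},
    cv=3,
)
small_search.fit(X_train, y_train)

# One row per candidate, best (rank 1) first.
columns = [
    "param_max_depth",
    "param_max_features",
    "mean_test_score",
    "std_test_score",
    "rank_test_score",
]
ranked = pd.DataFrame(small_search.cv_results_).sort_values("rank_test_score")[columns]
print(ranked)
```

The `std_test_score` column shows how much the score varies across folds, which is useful when the top candidates have nearly identical mean scores.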

## Exercise 1

1. Use a `RandomizedSearchCV` with the following parameter distribution:

```python
from scipy.stats import randint

param_dist = {
    "max_features": randint(1, 11),
    "min_samples_split": randint(2, 11)
}
```

Set `random_state=0` to get reproducible results and `verbose=1` to show the progress.

2. What were the best hyper-parameters found by the random search?
3. Evaluate the model on the test set. 
4. **Extra**: Use a grid search with `SVC` from `sklearn.svm` to search over the kernel options: `linear`, `poly`, `rbf`, `sigmoid`. Which kernel performed the best? Evaluate it on the test set. Does this model perform better than the random forest?
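
To see what the distributions in step 1 produce before running the search, one option is to draw a few candidates with `ParameterSampler` from `sklearn.model_selection`. This is only a sketch of the sampling step, not the exercise solution; `n_iter=3` and the name `sampled` are arbitrary choices for illustration.

```python
from scipy.stats import randint
from sklearn.model_selection import ParameterSampler

param_dist = {
    "max_features": randint(1, 11),
    "min_samples_split": randint(2, 11),
}

# randint(low, high) draws integers from low up to high - 1,
# so max_features ranges over 1..10 and min_samples_split over 2..10.
sampled = list(ParameterSampler(param_dist, n_iter=3, random_state=0))

for candidate in sampled:
    print(candidate)
```

Each printed dictionary is one hyperparameter setting of the kind a randomized search would evaluate with cross-validation.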

**If you are running locally**, you can uncomment the line in the following cell to load the solution. On **Google Colab**, [see solution here](https://github.com/thomasjpfan/ml-workshop-intermediate-1-of-2/blob/master/notebooks/solutions/02-ex01-solutions.py).

In [None]:
# %load solutions/02-ex01-solutions.py