<center> <img src="res/ds3000.png"> </center>

<center> <h1> Week 12 - Day 1</h1> </center>

<center> <h2> Part 1: Grid Search</h2></center>

## Outline
1. <a href='#1'>Grid Search for Hyperparameter Tuning</a>
2. <a href='#2'>Multiple Parameters in Grid Search</a>

<a id="1"></a>

## 1. Grid Search for Hyperparameter Tuning
* Finding the values of a model's important parameters that give the best generalization performance is tricky
* Grid search tries every combination of the candidate values specified for the parameters of interest
* Uses k-fold cross-validation behind the scenes to evaluate the performance of the model for each combination of parameters
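
The loop that grid search automates can be sketched by hand. This is a minimal illustration, not how `GridSearchCV` is implemented internally. It assumes a small synthetic dataset from `make_regression` (chosen only so the example runs quickly) and scores each candidate `C` with 5-fold cross-validation via `cross_val_score`.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVR

# small synthetic regression problem (illustrative only, not the course data)
X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)

candidate_C = [0.01, 1, 100]

# mean 5-fold cross-validation score (R-squared for regressors) for each C
mean_scores = []
for C in candidate_C:
    scores = cross_val_score(SVR(gamma="scale", C=C), X, y, cv=5)
    mean_scores.append(scores.mean())

# keep the candidate with the highest mean score
best_C = candidate_C[int(np.argmax(mean_scores))]
print("Mean CV scores: ", dict(zip(candidate_C, mean_scores)))
print("Best C: ", best_C)
```

With more than one parameter, the loop would have to be nested once per parameter, which is the bookkeeping `GridSearchCV` takes over.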

In [1]:
from sklearn.model_selection import GridSearchCV

In [2]:
import pandas as pd

from sklearn.datasets import fetch_california_housing
california = fetch_california_housing()  # Bunch object

df = pd.DataFrame(california.data, columns=california.feature_names)
df["Value"] = california.target

features = df.drop("Value", axis=1)
target = df["Value"]

In [3]:
from sklearn.svm import SVR
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score
from sklearn.metrics import mean_squared_error

#split data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(features, target, random_state=3000)


model = SVR(gamma="scale", C=1).fit(X=X_train, y=y_train)


print("R-squared value for training set: ", r2_score(y_train, model.predict(X_train)))
print("R-squared value for testing set: ", r2_score(y_test, model.predict(X_test)))

R-squared value for training set:  -0.0234649714509787
R-squared value for testing set:  -0.03447975020106897


* To use the GridSearchCV class, we need to define a dictionary that stores the parameters we want to search over
    * Keys: the parameters we want to adjust
    * Values: the parameter settings we want to try out
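
Each key in the dictionary has to match a parameter name that the estimator accepts. One way to check this before running a search, sketched here as an optional sanity check rather than a required step, is to compare the keys against `get_params()`. The variable names `valid_names` and `unknown` are illustrative.

```python
from sklearn.svm import SVR

param_grid = {"C": [0.001, 0.01, 0.1, 1, 10, 100]}

# every key in the grid should be a parameter name the estimator accepts
valid_names = SVR().get_params().keys()
unknown = [name for name in param_grid if name not in valid_names]

print("Unknown parameter names: ", unknown)
```

An empty list means every key is a valid parameter name for `SVR`.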

In [4]:
param_grid = {"C":[0.001, 0.01, 0.1, 1, 10, 100]}

In [6]:
from sklearn.svm import SVR
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score
from sklearn.metrics import mean_squared_error

from sklearn.model_selection import GridSearchCV
grid_search = GridSearchCV(SVR(gamma="scale"), param_grid, cv=5)

#split data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(features, target, random_state=3000)

#fit the grid search object on the training data (CV will be performed on this)
grid_search.fit(X=X_train, y=y_train)

#the performance of the best found parameters on the test set
print("Test set score: ", grid_search.score(X_test, y_test))

#result of grid search
print("Best parameters: ", grid_search.best_params_)

#mean cross-validation score of the best parameter combination
print("Best cross-validation score: ", grid_search.best_score_)


<a id="2"></a>

## 2. Multiple Parameters in Grid Search
* The parameter grid (dictionary) can hold multiple parameters; grid search then evaluates every combination of their values
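
To see how quickly the number of combinations grows, the grid can be expanded with scikit-learn's `ParameterGrid`. The sketch below uses a two-parameter grid for a decision tree and assumes 5-fold cross-validation when counting model fits.

```python
from sklearn.model_selection import ParameterGrid

param_grid = {"max_depth": [1, 10, 100], "min_samples_split": [2, 10, 100]}

# expand the dictionary into one dict per parameter combination
combinations = list(ParameterGrid(param_grid))
for params in combinations:
    print(params)

cv = 5  # number of cross-validation folds assumed for the count
print("Combinations: ", len(combinations))
print("Model fits during cross-validation: ", len(combinations) * cv)
```

Three values for each of two parameters give 3 x 3 = 9 combinations, so 5-fold cross-validation trains 45 models. With the default `refit=True`, `GridSearchCV` then trains one more model on the whole training set using the best combination.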

In [None]:
import pandas as pd
from sklearn.datasets import load_digits

#load the digits dataset
digits = load_digits()

df = pd.DataFrame(digits.data)
df["target"] = digits.target

features = df.drop("target", axis = 1)
target = df["target"]

In [None]:
param_grid = {"max_depth":[1, 10, 100], "min_samples_split": [2, 10, 100]}

In [None]:
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split


from sklearn.model_selection import GridSearchCV
grid_search = GridSearchCV(DecisionTreeClassifier(), param_grid, cv=5)

#split data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(features, target, random_state=3000)

#fit the grid search object on the training data (CV will be performed on this)
grid_search.fit(X=X_train, y=y_train)


#the performance of the best found parameters on the test set
print("Test set score: ", grid_search.score(X_test, y_test))


#result of grid search
print("Best parameters: ", grid_search.best_params_)

#mean cross-validation score of the best parameter combination (computed on the training data)
print("Best cross-validation score: ", grid_search.best_score_)
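
Beyond the single best result, the fitted search object stores the score of every combination in `cv_results_`. The sketch below repeats the digits search in a self-contained form and loads those results into a DataFrame. Setting `random_state=0` on the tree is an assumption added here so the table is reproducible; it is not part of the cell above.

```python
import pandas as pd
from sklearn.datasets import load_digits
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeClassifier

digits = load_digits()
X_train, X_test, y_train, y_test = train_test_split(
    digits.data, digits.target, random_state=3000
)

param_grid = {"max_depth": [1, 10, 100], "min_samples_split": [2, 10, 100]}

# random_state=0 is an assumption added here so the table is reproducible
grid_search = GridSearchCV(DecisionTreeClassifier(random_state=0), param_grid, cv=5)
grid_search.fit(X_train, y_train)

# one row per parameter combination, best-ranked first
results = pd.DataFrame(grid_search.cv_results_)
columns = ["param_max_depth", "param_min_samples_split",
           "mean_test_score", "rank_test_score"]
print(results[columns].sort_values("rank_test_score"))
```

The row with `rank_test_score` equal to 1 corresponds to `best_params_`, and its `mean_test_score` is the value reported as `best_score_`.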