<span style='color:Black'>
<div class="alert alert-info">
    <h1 align="center"> Grid Search </h1> 
    <h3 align="center" style='color:black'> International Graduate School of Artificial Intelligence - YunTech </h3>
    <h5 align="center">  </h5>
</div>
</span>

#### <span style='color:Black'> **Introduction** </span>

- A model hyperparameter is a setting that is external to the model: it is chosen before training, and its value cannot be estimated from the data.

- Examples: k in k-Nearest Neighbors, or the number of hidden layers in a neural network.

<span style='color:Red'> In contrast, a parameter is an internal characteristic of the model, and its value is estimated from the data. Example: the beta coefficients of linear or logistic regression. </span>
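The distinction can be illustrated with logistic regression, the example named above. This is a minimal sketch, assuming scikit-learn's `LogisticRegression` on the iris data; the values `C=1.0` and `max_iter=1000` are illustrative choices, not recommendations.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)

# C (regularisation strength) is a hyperparameter: we choose it before fitting.
model = LogisticRegression(C=1.0, max_iter=1000)
model.fit(X, y)

# coef_ holds the parameters: they are estimated from the data during fit().
print("hyperparameter C:", model.get_params()["C"])
print("learned coefficient matrix shape:", model.coef_.shape)  # (classes, features)
```

Changing `C` changes how the model is trained; the coefficients in `coef_` are whatever the training procedure then finds.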

### <span style='color:green'> Grid search evaluates every combination of candidate hyperparameter values and selects the combination that gives the most ‘accurate’ predictions, as measured by cross-validation.</span>
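To make "every combination" concrete, the sketch below enumerates a small grid with scikit-learn's `ParameterGrid`. The grid itself is a hypothetical example chosen for brevity, not the grid used later in this notebook.

```python
from sklearn.model_selection import ParameterGrid

# Hypothetical grid: 3 values of n_neighbors x 2 weighting schemes
small_grid = {"n_neighbors": [3, 5, 7], "weights": ["uniform", "distance"]}

# ParameterGrid expands the dict into one dict per combination
candidates = list(ParameterGrid(small_grid))

print(len(candidates))  # 3 * 2 = 6 combinations
for params in candidates:
    print(params)
```

The number of candidates is the product of the list lengths, so the cost of a grid search grows quickly as hyperparameters are added.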

### Grid search in sklearn

https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.GridSearchCV.html

In [3]:
from sklearn import svm, datasets
import pandas as pd
import numpy  as np
from sklearn.model_selection import GridSearchCV

#ML
from sklearn.neighbors import KNeighborsClassifier

#import libraries for model validation
from sklearn.model_selection import LeaveOneOut
from sklearn.model_selection import LeavePOut
from sklearn.model_selection import train_test_split
from sklearn.model_selection import KFold
from sklearn.model_selection import RepeatedKFold
from sklearn.model_selection import StratifiedKFold
from sklearn.model_selection import ShuffleSplit
from sklearn.model_selection import StratifiedShuffleSplit
from sklearn.model_selection import cross_val_score

In [4]:
# Load the iris dataset bundled with scikit-learn (returns a Bunch object, not a DataFrame)
from sklearn import datasets
df_training = datasets.load_iris()

In [5]:
# Refer to the dataset loaded above by a shorter name
iris = df_training

In [6]:
iris.data[:3]

array([[5.1, 3.5, 1.4, 0.2],
       [4.9, 3. , 1.4, 0.2],
       [4.7, 3.2, 1.3, 0.2]])

In [7]:
iris.target[:3]

array([0, 0, 0])

In [8]:
iris.target_names

array(['setosa', 'versicolor', 'virginica'], dtype='<U10')

In [9]:
print(iris.DESCR)

.. _iris_dataset:

Iris plants dataset
--------------------

**Data Set Characteristics:**

    :Number of Instances: 150 (50 in each of three classes)
    :Number of Attributes: 4 numeric, predictive attributes and the class
    :Attribute Information:
        - sepal length in cm
        - sepal width in cm
        - petal length in cm
        - petal width in cm
        - class:
                - Iris-Setosa
                - Iris-Versicolour
                - Iris-Virginica
                
    :Summary Statistics:

                    Min  Max   Mean    SD   Class Correlation
    sepal length:   4.3  7.9   5.84   0.83    0.7826
    sepal width:    2.0  4.4   3.05   0.43   -0.4194
    petal length:   1.0  6.9   3.76   1.76    0.9490  (high!)
    petal width:    0.1  2.5   1.20   0.76    0.9565  (high!)

    :Missing Attribute Values: None
    :Class Distribution: 33.3% for each of 3 classes.
    :Creator: R.A. Fisher
    :Donor: Michael Marshall (MARSHALL%PLU@io.arc.nasa.gov)
    :

In [10]:
x = iris.data
y = iris.target

#### KNN in Sklearn

https://scikit-learn.org/stable/modules/generated/sklearn.neighbors.KNeighborsClassifier.html

In [11]:
# Instantiate the KNN classifier with its default settings:
# {'metric': 'minkowski', 'n_neighbors': 5, 'p': 2, 'weights': 'uniform'}
clf = KNeighborsClassifier()

In [12]:
# Get the KNN hyperparameters and their current values
clf.get_params()

{'algorithm': 'auto',
 'leaf_size': 30,
 'metric': 'minkowski',
 'metric_params': None,
 'n_jobs': None,
 'n_neighbors': 5,
 'p': 2,
 'weights': 'uniform'}

In [13]:
# get_params() returns a dict; its keys are the names you can use in param_grid
clf.get_params().keys()

dict_keys(['algorithm', 'leaf_size', 'metric', 'metric_params', 'n_jobs', 'n_neighbors', 'p', 'weights'])
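Because grid keys must match these names exactly (they are case-sensitive), it can help to check a grid against `get_params()` before running the search. A minimal sketch, in which `candidate_grid` and its deliberately misspelled `Leaf_size` key are hypothetical:

```python
from sklearn.neighbors import KNeighborsClassifier

# Names accepted by the estimator, as reported by get_params()
valid_names = set(KNeighborsClassifier().get_params())

# Hypothetical grid for illustration; "Leaf_size" is deliberately misspelled
candidate_grid = {"n_neighbors": [3, 5], "Leaf_size": [15, 30]}

# Keys the estimator would not recognise
unknown = [name for name in candidate_grid if name not in valid_names]
print(unknown)  # ['Leaf_size']
```

An unrecognised key is normally rejected with an error once the search is fitted, so a check like this only surfaces the problem earlier.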

In [19]:
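# Candidate values for each hyperparameter
n_neighbors = [3, 5, 7, 9, 11, 13, 15, 19, 23, 29]
algos = ['ball_tree', 'kd_tree']      # defined but not used in the grid below
dist_metric = ['minkowski']
p_root = [1, 2, 3, 4, 5]              # Minkowski power: 1 = Manhattan, 2 = Euclidean
weights = ['uniform', 'distance']
leaf_size = [15, 30, 40, 50, 60, 70]  # defined but not used in the grid below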

In [20]:
# Define the parameter grid; keys must match the KNeighborsClassifier argument names
parameters = dict(
    n_neighbors=n_neighbors,
    metric=dist_metric,
    p=p_root,
    weights=weights,
    # leaf_size=leaf_size,  # left out of this search
)

print(parameters)

{'n_neighbors': [3, 5, 7, 9, 11, 13, 15, 19, 23, 29], 'metric': ['minkowski'], 'p': [1, 2, 3, 4, 5], 'weights': ['uniform', 'distance']}


In [21]:
# define splits 
n_splits = 5

kf = KFold(n_splits = n_splits, shuffle=True, random_state=100)
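To see what this splitter does, the sketch below applies the same `KFold` settings to 150 placeholder rows (the size of the iris dataset) and records the size of each train/validation split. The array `X_demo` is a stand-in, not the real features.

```python
import numpy as np
from sklearn.model_selection import KFold

# Stand-in for a dataset with 150 samples (same size as iris)
X_demo = np.arange(150).reshape(-1, 1)

kf_demo = KFold(n_splits=5, shuffle=True, random_state=100)

# Each split yields index arrays for the training and validation parts
sizes = [(len(train_idx), len(val_idx)) for train_idx, val_idx in kf_demo.split(X_demo)]
print(sizes)  # five splits of 120 training and 30 validation samples

# Every sample is used for validation exactly once across the five folds
all_val = np.concatenate([val_idx for _, val_idx in kf_demo.split(X_demo)])
print(sorted(all_val.tolist()) == list(range(150)))
```

Note that plain `KFold` does not preserve class proportions in each fold; `StratifiedKFold`, imported earlier, is the variant that does.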

In [22]:
# Instantiate GridSearchCV: conceptually a for loop over every parameter combination, each scored by cross-validation
grid = GridSearchCV(estimator = clf,
                    param_grid = parameters,
                    scoring = 'accuracy', 
                    cv=kf,
                    verbose=2)
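The "for loop" that `GridSearchCV` stands for can be written out by hand. The following is a rough sketch, assuming a reduced hypothetical grid (`small_grid`) to keep it short; it is not the library's internal implementation.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import KFold, ParameterGrid, cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
kf = KFold(n_splits=5, shuffle=True, random_state=100)

# Hypothetical reduced grid, for illustration only
small_grid = {"n_neighbors": [3, 5, 7], "weights": ["uniform", "distance"]}

best_score, best_params = -1.0, None
for params in ParameterGrid(small_grid):
    # Build a fresh classifier for this combination of hyperparameters
    model = KNeighborsClassifier(**params)
    # Mean accuracy over the 5 folds is the score for this combination
    mean_acc = cross_val_score(model, X, y, cv=kf, scoring="accuracy").mean()
    if mean_acc > best_score:
        best_score, best_params = mean_acc, params

print(best_params, best_score)
```

Beyond this loop, `GridSearchCV` by default also refits the best combination on the full dataset (`refit=True`), which is what makes `best_estimator_` available after fitting.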


In [23]:
# Run the search: every candidate is fitted and scored on every fold
grid.fit(x,y)

Fitting 5 folds for each of 100 candidates, totalling 500 fits
[CV] END metric=minkowski, n_neighbors=3, p=1, weights=uniform; total time=   0.0s
[CV] END metric=minkowski, n_neighbors=3, p=1, weights=uniform; total time=   0.0s
[CV] END metric=minkowski, n_neighbors=3, p=1, weights=uniform; total time=   0.0s
[CV] END metric=minkowski, n_neighbors=3, p=1, weights=uniform; total time=   0.0s
[CV] END metric=minkowski, n_neighbors=3, p=1, weights=uniform; total time=   0.0s
[CV] END metric=minkowski, n_neighbors=3, p=1, weights=distance; total time=   0.0s
[CV] END metric=minkowski, n_neighbors=3, p=1, weights=distance; total time=   0.0s
[CV] END metric=minkowski, n_neighbors=3, p=1, weights=distance; total time=   0.0s
[CV] END metric=minkowski, n_neighbors=3, p=1, weights=distance; total time=   0.0s
[CV] END metric=minkowski, n_neighbors=3, p=1, weights=distance; total time=   0.0s
[CV] END metric=minkowski, n_neighbors=3, p=2, weights=uniform; total time=   0.0s
[CV] END metric=min

In [24]:
#What are the best parameters that you got?
print('Estimator: \n',   grid.best_estimator_)
print('Best params : \n', grid.best_params_)
print(grid.classes_)
print(grid.best_score_)

Estimator: 
 KNeighborsClassifier(n_neighbors=13, p=3)
Best params : 
 {'metric': 'minkowski', 'n_neighbors': 13, 'p': 3, 'weights': 'uniform'}
[0 1 2]
0.9866666666666667
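The best score alone hides how close the other candidates were. One way to compare them is to load the `cv_results_` attribute into a pandas DataFrame. The sketch below assumes a smaller hypothetical grid so that it runs quickly; the same pattern should apply to the `grid` object fitted above.

```python
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, KFold
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
kf = KFold(n_splits=5, shuffle=True, random_state=100)

# Hypothetical reduced grid, for illustration only
search = GridSearchCV(
    estimator=KNeighborsClassifier(),
    param_grid={"n_neighbors": [3, 5, 7, 9], "weights": ["uniform", "distance"]},
    scoring="accuracy",
    cv=kf,
)
search.fit(X, y)

# One row per candidate; rank_test_score == 1 marks the best mean score
results = pd.DataFrame(search.cv_results_)
columns = ["params", "mean_test_score", "std_test_score", "rank_test_score"]
print(results.sort_values("rank_test_score")[columns].head())
```

Keep in mind that `best_score_` is the mean cross-validated score of the winning candidate, measured on the same folds used to pick it, so it tends to be somewhat optimistic compared with performance on data held out from the search.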


<div style="font-size: 1em; margin: 1em 1em 1em 1em; border: 1px solid #86989B; background-color: #8fffff;padding: 1em 1em 1em 1em; ">
<div align="center">
<img src='imgs/icon5.png'  width='10%'>

<h2 style="text-align: center;color: Darkgreen"> Explainable AI</h2>


</div>