# I. Introduction to Scikit-learn

### Step 1. Import the necessary libraries

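```python
import matplotlib.pyplot as plt  # plotting
import numpy as np               # array utilities
%matplotlib inline
# (IPython magic: render matplotlib figures inside the notebook)
```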

### Step 2. Import one of the toy datasets (digits) from Scikit-learn
This is a copy of the test set of the UCI ML hand-written digits dataset: https://archive.ics.uci.edu/ml/datasets/Optical+Recognition+of+Handwritten+Digits

The data set contains images of hand-written digits: 10 classes where each class refers to a digit.

Each data point is an 8x8 image of a digit. The dataset has 10 classes with roughly 180 samples per class, 1797 samples in total, 64 dimensions (one per pixel), and integer feature values from 0 to 16.
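A quick way to confirm the properties above is to load the dataset with `sklearn.datasets.load_digits` and inspect it. This is a minimal sketch; the variable names are illustrative.

```python
# Load the digits toy dataset bundled with scikit-learn and check its properties.
import numpy as np
from sklearn.datasets import load_digits

digits = load_digits()

print(digits.data.shape)    # (1797, 64): one flattened 8x8 image per row
print(digits.images.shape)  # (1797, 8, 8): the same samples kept as 8x8 arrays
print(np.unique(digits.target))              # the 10 class labels
print(digits.data.min(), digits.data.max())  # range of the pixel intensities

# Number of samples in each class (roughly 180 each)
counts = np.bincount(digits.target)
print(counts)
```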


After importing this dataset, split it into training and test sets. You may check the shape of the `data` and `target` attributes of the dataset, and you may also want to print a few samples.
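One way to do this is with `sklearn.model_selection.train_test_split`. The 25% test size, `stratify=y`, and `random_state=0` below are illustrative choices, not requirements of the exercise.

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split

digits = load_digits()
X, y = digits.data, digits.target

# Hold out 25% of the samples as a test set. stratify=y keeps the class
# proportions similar in both parts; random_state makes the split repeatable.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

print(X_train.shape, X_test.shape)
print(y_train.shape, y_test.shape)

# Print the first few samples as 8x8 grids together with their labels.
for image, label in zip(digits.images[:3], digits.target[:3]):
    print("label:", label)
    print(image.astype(int))
```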

# II. Sklearn API for model training
-------------------
### Step 1. Import your model class

As an example, let us use `LinearSVC`, a linear support vector classifier. This classifier is imported from the `sklearn.svm` module, which includes Support Vector Machine algorithms.
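Importing gives you the estimator class, not a trained model. A minimal check, for illustration only:

```python
# LinearSVC is a class; an instance is created (and trained) in later steps.
from sklearn.svm import LinearSVC

print(LinearSVC)
# Every scikit-learn classifier exposes fit() and predict() methods.
print(hasattr(LinearSVC, "fit"), hasattr(LinearSVC, "predict"))
```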

### Step 2. Instantiate an object and set the parameters
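Hyperparameters are passed to the constructor. The values below are assumptions for illustration: `C=1.0` is the default regularization setting, and `dual=False` selects the primal solver, which the scikit-learn documentation recommends when there are more samples than features (1797 > 64 here).

```python
from sklearn.svm import LinearSVC

# C is the inverse regularization strength; dual=False picks the primal solver.
model = LinearSVC(C=1.0, dual=False)

# get_params() lists every hyperparameter, including untouched defaults.
print(model.get_params())

# Parameters can also be changed after construction with set_params().
model.set_params(C=0.5)
print(model.C)
```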

### Step 3. Fit the model
When fitting the model, use only the training set.
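A sketch of the fitting step, reusing the illustrative split and parameters from the earlier steps (so the cell also runs on its own):

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

model = LinearSVC(C=1.0, dual=False)
# fit() learns the weights from the training set only and returns the estimator.
model.fit(X_train, y_train)

# Learned attributes end with an underscore: one weight vector per class
# (one-vs-rest) over the 64 pixel features.
print(model.coef_.shape)
print(model.classes_)
```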

### Step 4. Predict and Evaluate
Use the test set for this purpose, for now.
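Prediction and evaluation on the held-out test set might look like this. Accuracy and the confusion matrix from `sklearn.metrics` are one reasonable choice of metrics; the split is the same illustrative one as above.

```python
from sklearn.datasets import load_digits
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)
model = LinearSVC(C=1.0, dual=False).fit(X_train, y_train)

# Predict labels for samples the model did not see during training.
y_pred = model.predict(X_test)

acc = accuracy_score(y_test, y_pred)
print("accuracy:", acc)
# For classifiers, score() computes the same accuracy in one call.
print("score():", model.score(X_test, y_test))

# Rows are true digits, columns are predicted digits.
cm = confusion_matrix(y_test, y_pred)
print(cm)
```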

### Step 5. Try another algorithm
Try `RandomForestClassifier` this time, and import it from the `sklearn.ensemble` module.
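The workflow is identical; only the estimator class changes. `n_estimators=100` and `random_state=0` are illustrative settings.

```python
from sklearn.datasets import load_digits
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# n_estimators is the number of trees; random_state fixes the randomness
# used for bootstrapping and feature sampling.
forest = RandomForestClassifier(n_estimators=100, random_state=0)
forest.fit(X_train, y_train)

forest_acc = forest.score(X_test, y_test)
print("test accuracy:", forest_acc)
```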

# III. Cross-validation

Learning the parameters of a prediction function and testing it on the same data is a methodological mistake: a model that would just repeat the labels of the samples that it has just seen would have a perfect score but would fail to predict anything useful on yet-unseen data. This situation is called overfitting. To avoid it, it is common practice when performing a (supervised) machine learning experiment to hold out part of the available data as a test set X_test, y_test. 
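Comparing training and test accuracy makes the problem visible. This sketch uses an unpruned `DecisionTreeClassifier`, chosen here only because it can nearly memorize its training labels; it is not part of the original exercise.

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# A tree grown without depth limits fits the training data almost perfectly.
tree = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)

train_acc = tree.score(X_train, y_train)  # data the model has already seen
test_acc = tree.score(X_test, y_test)     # held-out data
print(f"train accuracy: {train_acc:.3f}, test accuracy: {test_acc:.3f}")
```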

When evaluating different settings (“hyperparameters”) for estimators, there is still a risk of overfitting on the test set because the parameters can be tweaked until the estimator performs optimally. This way, knowledge about the test set can “leak” into the model and evaluation metrics no longer report on generalization performance. To solve this problem, yet another part of the dataset can be held out as a so-called “validation set”: training proceeds on the training set, after which evaluation is done on the validation set, and when the experiment seems to be successful, final evaluation can be done on the test set.
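A train/validation/test workflow can be built from two calls to `train_test_split`. The split fractions and the candidate values of `C` are assumptions for illustration.

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC

X, y = load_digits(return_X_y=True)

# First carve off the final test set, then split the rest into train/validation.
X_rest, X_test, y_rest, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)
X_train, X_val, y_train, y_val = train_test_split(
    X_rest, y_rest, test_size=0.25, stratify=y_rest, random_state=0
)

# Choose the hyperparameter C using the validation set only.
val_scores = {}
for C in [0.01, 0.1, 1.0]:
    candidate = LinearSVC(C=C, dual=False).fit(X_train, y_train)
    val_scores[C] = candidate.score(X_val, y_val)
best_C = max(val_scores, key=val_scores.get)

# Refit on train+validation with the chosen C, then use the test set once.
final_model = LinearSVC(C=best_C, dual=False).fit(X_rest, y_rest)
test_acc = final_model.score(X_test, y_test)
print("validation scores:", val_scores)
print("best C:", best_C, "test accuracy:", test_acc)
```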

However, by partitioning the available data into three sets, we drastically reduce the number of samples which can be used for learning the model, and the results can depend on a particular random choice for the pair of (train, validation) sets.
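The dependence on one particular random split can be checked by repeating the same train/validation evaluation with different seeds. This is a minimal sketch; five seeds and a 25% validation fraction are arbitrary choices.

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC

X, y = load_digits(return_X_y=True)

# Same model, same data, five different random train/validation splits.
scores = []
for seed in range(5):
    X_tr, X_val, y_tr, y_val = train_test_split(
        X, y, test_size=0.25, random_state=seed
    )
    model = LinearSVC(C=1.0, dual=False).fit(X_tr, y_tr)
    scores.append(model.score(X_val, y_val))

print([round(s, 3) for s in scores])
print("spread between best and worst split:", max(scores) - min(scores))
```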

A solution to this problem is a procedure called cross-validation (CV for short). A test set should still be held out for final evaluation, but the validation set is no longer needed when doing CV. In the basic approach, called k-fold CV, the training set is split into k smaller sets (other variants exist, but they generally follow the same principles). The following procedure is followed for each of the k “folds”:

A model is trained using k-1 of the folds as training data; the resulting model is validated on the remaining part of the data (i.e., it is used as a test set to compute a performance measure such as accuracy).
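The k-fold procedure can be written out explicitly with `sklearn.model_selection.KFold`. This sketch assumes k=5 with shuffling; `clone` gives each fold a fresh, unfitted copy of the model.

```python
import numpy as np
from sklearn.base import clone
from sklearn.datasets import load_digits
from sklearn.model_selection import KFold, train_test_split
from sklearn.svm import LinearSVC

X, y = load_digits(return_X_y=True)
# The test set is still held out; CV only uses the training part.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

model = LinearSVC(C=1.0, dual=False)
k = 5
kf = KFold(n_splits=k, shuffle=True, random_state=0)

fold_scores = []
for train_idx, val_idx in kf.split(X_train):
    # Train a fresh copy of the model on k-1 folds ...
    fold_model = clone(model).fit(X_train[train_idx], y_train[train_idx])
    # ... and validate it on the remaining fold.
    fold_scores.append(fold_model.score(X_train[val_idx], y_train[val_idx]))

print("per-fold accuracy:", np.round(fold_scores, 3))
# The reported CV performance is the average over the k folds.
print("mean accuracy:", np.mean(fold_scores))
```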

The performance measure reported by k-fold cross-validation is then the average of the values computed in the loop. This approach can be computationally expensive, but does not waste too much data (as is the case when fixing an arbitrary validation set), which is a major advantage in problems such as inverse inference where the number of samples is very small.
(Check https://scikit-learn.org/stable/modules/cross_validation.html#cross-validation)
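In practice the loop is usually delegated to `sklearn.model_selection.cross_val_score`. With an integer `cv` and a classifier, scikit-learn uses stratified folds. The 5 folds here are an illustrative choice.

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.svm import LinearSVC

X, y = load_digits(return_X_y=True)
# Keep a test set aside for the final evaluation; CV runs on the training part.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# cv=5 runs the k-fold train/validate loop internally and returns one score per fold.
scores = cross_val_score(LinearSVC(C=1.0, dual=False), X_train, y_train, cv=5)
print(scores)
print(f"mean accuracy: {scores.mean():.3f} (+/- {scores.std():.3f})")
```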

# IV. Grid Searches
Exhaustive search over specified parameter values for an estimator.

Important members are `fit` and `predict`.

`GridSearchCV` implements a `fit` and a `score` method. It also implements `predict`, `predict_proba`, `decision_function`, `transform` and `inverse_transform` if they are implemented in the estimator used.

The parameters of the estimator used to apply these methods are optimized by cross-validated grid-search over a parameter grid.

(See https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.GridSearchCV.html)
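A parameter grid is a dictionary mapping parameter names to candidate values, and the search tries every combination. `sklearn.model_selection.ParameterGrid` enumerates those combinations; the particular values below are hypothetical.

```python
from sklearn.model_selection import ParameterGrid

# Hypothetical grid for LinearSVC: 3 values of C times 2 values of fit_intercept.
param_grid = {"C": [0.01, 0.1, 1.0], "fit_intercept": [True, False]}

combinations = list(ParameterGrid(param_grid))
for params in combinations:
    print(params)

# With 5-fold CV, a grid search over this grid would fit 6 * 5 = 30 models.
print(len(combinations), "combinations")
```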

### Grid search with built-in cross-validation

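A `GridSearchCV` object behaves just like a normal classifier: after calling `fit` on the training data, it can `predict` or `score` new data using the best parameters it found.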
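A sketch of a grid search over `C` for `LinearSVC`, assuming the same illustrative split as earlier. With the default `refit=True`, the best setting is refitted on the whole training set, so the search object can then predict directly.

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import LinearSVC

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

param_grid = {"C": [0.01, 0.1, 1.0]}
# Each candidate C is scored with 5-fold CV on the training set only.
search = GridSearchCV(LinearSVC(dual=False), param_grid, cv=5)
search.fit(X_train, y_train)

print("best params:", search.best_params_)
print("best CV accuracy:", search.best_score_)

# The fitted search object is used like any other classifier.
y_pred = search.predict(X_test)
search_acc = search.score(X_test, y_test)
print("test accuracy:", search_acc)
```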