# Model Evaluation and Hyperparameters

Hyperparameters are "higher-level" free parameters that are fixed before training, for example:
- Depth (number of hidden layers)
- Width (number of hidden neurons in a hidden layer)
- Activation function (choice of nonlinearity in non-input nodes)
- Regularization parameter (way to trade off simplicity vs. fit to the data)
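
As a rough illustration of where these hyperparameters live in code, the sketch below maps each item in the list onto a constructor argument of scikit-learn's `MLPClassifier`. The specific values (two hidden layers of 16 neurons, ReLU, `alpha=1e-3`) are arbitrary assumptions chosen for the example, not recommendations.

```python
from sklearn.neural_network import MLPClassifier

# Hyperparameters are set here, before training.
# The network weights (the ordinary free parameters) are only learned by fit().
clf = MLPClassifier(
    hidden_layer_sizes=(16, 16),  # depth: 2 hidden layers, width: 16 neurons each
    activation="relu",            # choice of nonlinearity in the hidden nodes
    alpha=1e-3,                   # L2 regularization strength (simplicity vs. fit)
    max_iter=500,
    random_state=0,
)

depth = len(clf.hidden_layer_sizes)
width = clf.hidden_layer_sizes[0]
print("depth:", depth, "width:", width,
      "activation:", clf.activation, "alpha:", clf.alpha)
```

Each distinct combination of these values defines a different candidate model, and the validation methods described in these notes are ways of choosing among such candidates.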

Recall: a predictor is obtained by training the free parameters of the chosen model on the available annotated data.


## How to Choose a Model

To choose a model, we need a way to evaluate whether a candidate model is good, i.e. whether it performs well not only in training but also in real practice.

Validation methods (validation set):
- Holdout validation
- Cross-validation
- Leave-one-out validation

### Holdout Validation
- Randomly choose 30% of the data to form a validation set
- The remaining data forms the training set
- Estimate the test performance of every candidate model on the validation set
    - Regression: Compute the cost function (such as MSE) on the validation set (instead of the training set)
    - Classification: Compute the 0-1 error metric:
    $ \frac{\text{number of wrong predictions}}{\text{number of predictions}} = 1 - \text{Accuracy} $
- Choose the model with the lowest validation error (such as the lowest MSE)
- Re-train the chosen model on the joined training and validation sets to obtain the predictor
- Estimate the future performance of the obtained predictor on the test set
- Deploy the predictor
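
The holdout procedure above can be sketched as follows. This is a minimal example under assumptions that are not in the notes: the Iris dataset, a 20% test set put aside before the holdout split, and three `SVC` candidates that differ only in `C`.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# Assumption: put a test set aside first; it is used only once, at the very end.
X_rest, X_test, y_rest, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

# Holdout split: 30% of the remaining data forms the validation set.
X_train, X_val, y_train, y_val = train_test_split(
    X_rest, y_rest, test_size=0.3, random_state=0)

# Candidate models (illustrative choice: SVCs with different C).
candidates = {f"SVC(C={C})": SVC(C=C) for C in (0.1, 1, 10)}

val_errors = {}
for name, model in candidates.items():
    model.fit(X_train, y_train)
    # 0-1 error = wrong predictions / predictions = 1 - accuracy
    val_errors[name] = 1.0 - model.score(X_val, y_val)

# Choose the candidate with the lowest validation error.
best_name = min(val_errors, key=val_errors.get)

# Re-train the chosen model on training + validation data, then test once.
predictor = candidates[best_name].fit(X_rest, y_rest)
test_error = 1.0 - predictor.score(X_test, y_test)

print("Validation errors:", val_errors)
print("Chosen model:", best_name)
print("Test 0-1 error:", test_error)
```

Because the validation error is computed on a single random 30% split, it can vary noticeably with the random seed on a small dataset, which is one motivation for cross-validation.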

### k-Fold Cross-Validation

<pre>
Full Dataset
│
├──> Split once --> Train+Validation Set (80%)       Test Set (20%)
                     │
                     └──> K-Fold Cross-Validation (e.g., k=5)
                          ┌────┬────┬────┬────┬────┐
                          │ F1 │ F2 │ F3 │ F4 │ F5 │   <- folds
                          └────┴────┴────┴────┴────┘
     Iteration 1: Train on F2–F5, Validate on F1
     Iteration 2: Train on F1, F3–F5, Validate on F2
     ...
     Iteration 5: Train on F1–F4, Validate on F5
</pre>

After CV:
- Pick the best hyperparameters, then retrain on the full 80%
- Evaluate once on the 20% test set to obtain the final model performance

<img src="images/k-fold.png" width="450" />

- Split the training set randomly into k (equal-sized) disjoint sets
- Use k - 1 of those together for training
- Use the remaining one for validation
- Rotate the roles of the k sets and repeat k times, so that each set is used for validation exactly once
- Average the performances on the k validation sets
  - Take the mean of all k errors, e.g. for k = 3: $MSE_{3\text{-fold}} = 2.05$
- Repeat for all candidate models
- Choose the model with the smallest average k-fold cross-validation error (3-fold in the example above)
- Re-train the chosen model on the joined training and validation data to obtain the predictor
- Estimate the future performance of the obtained predictor on the test set
- Deploy the predictor in the real world
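
To make the individual steps visible, the sketch below writes out the k-fold loop by hand instead of delegating to a helper. The dataset (`load_diabetes`), the `Ridge` candidates, and the `alpha` values are assumptions chosen for illustration; k = 3 is used to match the 3-fold example in the list above.

```python
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import KFold, train_test_split

X, y = load_diabetes(return_X_y=True)

# Keep a test set aside; cross-validation only sees the remaining 80%.
X_trainval, X_test, y_trainval, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

# k = 3 disjoint folds, assigned randomly but reproducibly.
kfold = KFold(n_splits=3, shuffle=True, random_state=0)

def cv_mse(alpha):
    """Average validation MSE of Ridge(alpha) over the k folds."""
    fold_errors = []
    for train_idx, val_idx in kfold.split(X_trainval):
        model = Ridge(alpha=alpha)
        model.fit(X_trainval[train_idx], y_trainval[train_idx])
        predictions = model.predict(X_trainval[val_idx])
        fold_errors.append(mean_squared_error(y_trainval[val_idx], predictions))
    return float(np.mean(fold_errors))

# Repeat for all candidate models (here: different regularization strengths).
alphas = (0.01, 0.1, 1.0, 10.0)
cv_errors = {alpha: cv_mse(alpha) for alpha in alphas}

# Choose the candidate with the smallest average 3-fold error.
best_alpha = min(cv_errors, key=cv_errors.get)

# Re-train on all training + validation data, then evaluate once on the test set.
predictor = Ridge(alpha=best_alpha).fit(X_trainval, y_trainval)
test_mse = mean_squared_error(y_test, predictor.predict(X_test))

print("3-fold CV MSE per alpha:", cv_errors)
print("Chosen alpha:", best_alpha)
print("Test MSE:", test_mse)
```

Fixing `random_state` in `KFold` means every candidate is scored on the same three splits, so the averaged errors are directly comparable.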


```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.svm import SVC

# Load data
X, y = load_iris(return_X_y=True)

# 1. Split into training+validation and test
X_trainval, X_test, y_trainval, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# 2. Define model and hyperparameter grid
model = SVC()
param_grid = {
    'C': [0.1, 1, 10],
    'kernel': ['linear', 'rbf']
}

# 3. GridSearchCV performs k-fold CV (default k=5)
grid_search = GridSearchCV(model, param_grid, cv=5)
grid_search.fit(X_trainval, y_trainval)

# 4. Best hyperparameters
print("Best hyperparameters:", grid_search.best_params_)

# 5. Final test set evaluation
best_model = grid_search.best_estimator_
test_accuracy = best_model.score(X_test, y_test)
print("Test accuracy:", test_accuracy)
```

Output:

```
Best hyperparameters: {'C': 1, 'kernel': 'linear'}
Test accuracy: 1.0
```



### Leave-one-out Validation

- Leave a single example for validation, and train on all the rest of the annotated data
- For a total of N examples, repeat this N times, each time leaving out a single example
- Take the average of the validation errors as measured on the left-out points
- Same as N-fold cross-validation where N is the number of labelled points
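
The equivalence stated in the last bullet can be checked with a short sketch. The choice of dataset (Iris) and model (`KNeighborsClassifier` with 5 neighbours) is an assumption made for illustration; any classifier would do.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import KFold, LeaveOneOut, cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
N = len(X)
model = KNeighborsClassifier(n_neighbors=5)

# Leave-one-out: N rounds, each validating on a single held-out example.
# Each score is the accuracy on one point, so it is either 0.0 or 1.0.
loo_scores = cross_val_score(model, X, y, cv=LeaveOneOut())
loo_error = 1.0 - loo_scores.mean()

# N-fold cross-validation with N = number of labelled points.
nfold_scores = cross_val_score(model, X, y, cv=KFold(n_splits=N))
nfold_error = 1.0 - nfold_scores.mean()

print("Number of validation rounds:", len(loo_scores))
print("Leave-one-out 0-1 error:", loo_error)
print("N-fold 0-1 error:", nfold_error)
```

Leave-one-out trains N models, so it is mainly practical for small datasets or models that are cheap to fit.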

<img src="images/validation-methods.png" width="500" />


