# Principles Of Model Selection

## Bias-Variance Tradeoff

On one hand, simplicity is generalisable and robust, and on the other hand, some problems are inherently complex in nature. There is a trade-off between the two, which is known as the bias-variance tradeoff in machine learning. 

![117.png](attachment:4aa3435a-3df5-4b73-8049-f31adba1519f.png)


**Bias and Variance**

We considered the example of a model memorising the entire training data set. If you change the data set slightly, this model will also need to change drastically. The model is, therefore, **unstable and sensitive to changes in training data**, and this is called **high variance**.



The ‘variance’ of a model is the **variance in its output** on some test data with respect to changes in the training data. In other words, variance here refers to the **degree of change in the model itself** with **respect to changes in the training data**.



**Bias** quantifies how **accurate the model is** likely to be on future (test) data. Extremely simple models are likely to fail in predicting complex real-world phenomena. Simplicity has its own disadvantages.



Imagine solving digital image processing problems using simple linear regression when much more complex models such as neural networks are typically successful for such problems. We say that a linear model has a high bias because it is too simple to learn the complexity involved in the task.



Ideally, we want to reduce both bias and variance because the expected total error of a model is the sum of the errors in bias and variance, as shown in the figure given below.

![](https://d35ev2v1xsdze0.cloudfront.net/f8ea2788-76ac-4870-bbba-88d09735169c-spxpefsr.png)
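
The decomposition behind this statement can be written out explicitly. The notation here is an assumption and is not taken from the session: $y = f(x) + \varepsilon$ is the true relationship with zero-mean noise of variance $\sigma^2$, $\hat{f}$ is the model learnt from a randomly drawn training set, and the expectation is taken over training sets and noise. Under squared-error loss, the standard result is:

$$
\mathbb{E}\big[(y - \hat{f}(x))^2\big]
= \underbrace{\big(\mathbb{E}[\hat{f}(x)] - f(x)\big)^2}_{\text{bias}^2}
+ \underbrace{\mathbb{E}\big[(\hat{f}(x) - \mathbb{E}[\hat{f}(x)])^2\big]}_{\text{variance}}
+ \sigma^2
$$

Note that the bias enters as a squared term, and that $\sigma^2$ is irreducible noise that no choice of model can remove.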

In practice, however, we often cannot have a model with both a low bias and a low variance. As the model complexity increases, the bias reduces, whereas the variance increases; hence the trade-off.
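
The trade-off can be observed in a small simulation. The following is a minimal sketch under stated assumptions, not part of the original session: the ground-truth function `true_f`, the noise level, the test point `x0 = 0.9` and the two toy models (a constant predictor and a nearest-point 'memoriser') are all hypothetical choices made for illustration. It repeatedly draws new training sets and estimates the squared bias and the variance of each model's prediction at one test point.

```python
import random
import statistics

def true_f(x):
    # Hypothetical ground-truth function, chosen only for illustration.
    return 4 * x * x

def sample_training_set(rng, n=20):
    # Draw n noisy observations of true_f on [0, 1).
    xs = [rng.random() for _ in range(n)]
    ys = [true_f(x) + rng.gauss(0, 1) for x in xs]
    return xs, ys

def predict_constant(xs, ys, x0):
    # Very simple model: ignore x0 and always predict the training mean.
    return statistics.mean(ys)

def predict_memoriser(xs, ys, x0):
    # Very complex model: return the label of the nearest memorised point.
    nearest = min(range(len(xs)), key=lambda i: abs(xs[i] - x0))
    return ys[nearest]

def bias_sq_and_variance(predict, x0=0.9, trials=500, seed=0):
    # Retrain on a fresh training set each trial and record the prediction at x0.
    rng = random.Random(seed)
    preds = [predict(*sample_training_set(rng), x0) for _ in range(trials)]
    bias_sq = (statistics.mean(preds) - true_f(x0)) ** 2
    variance = statistics.pvariance(preds)
    return bias_sq, variance

simple_bias_sq, simple_var = bias_sq_and_variance(predict_constant)
complex_bias_sq, complex_var = bias_sq_and_variance(predict_memoriser)
print(f"constant : bias^2={simple_bias_sq:.2f}, variance={simple_var:.2f}")
print(f"memoriser: bias^2={complex_bias_sq:.2f}, variance={complex_var:.2f}")
```

In this setup, the constant model should show the larger squared bias and the memoriser the larger variance, which mirrors the trade-off described above.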



Recall that in the competitive exam analogy, the first person learns using a much more complex mental model than the second one.

![119.png](attachment:0602c5bc-7d10-471b-8c59-e6e2d6cf3d69.png)


## Regularization

Having established that we need to find the correct balance between model bias and variance or between simplicity and complexity, we need tools that can reduce or increase the complexity. 

**Regularization is the process of deliberately simplifying models to achieve the correct balance between keeping the model simple and not making it too naive**. Recall that a few objective ways of measuring simplicity are as follows: choice of simpler functions, fewer model parameters and usage of lower-degree polynomials.

![120.png](attachment:b67e8c22-2e45-48ed-8081-a442844cc683.png)

![121.png](attachment:8206be10-2f36-45f1-960d-a3a73ac88e9c.png)



# Summary

In this session, you learnt about the most fundamental principles of machine learning that you should now be able to apply while building models. The most important points to re-iterate are as follows:

**Occam's Razor**

- A model should be as simple as necessary but not simpler than that.
- When in doubt, choose a simpler model.
- Advantages of simplicity are generalisability, robustness, requirement of a few assumptions and less data required for learning.

**Bias-Variance Tradeoff**

- Bias measures how accurately a model can describe the actual task at hand.
- Variance measures how flexible the model is with respect to changes in the training data.
- As complexity increases, bias reduces and variance increases, and we aim to find the optimal point where the total model error is the least.

**Overfitting**

- A model memorises the data rather than intelligently learning the underlying trends in it.
- This is because it is possible to memorise data, and this is a problem because the real test happens on unseen, real-world data.

**Note:** The lecture notes for both sessions of **Model Selection** are provided in the last session of this module.

# Model Evaluation

In this session, we will discuss two evaluation strategies corresponding to the cases wherein we have abundant and limited (or little) training data. 


This session will cover the following:

- Meaning and use of hyperparameters
- Meaning and use of validation data
- Cross-validation
- Hold-Out Strategy

## Regularization and Hyperparameters


**Regularization** discourages the model from becoming highly complex even if it explains the (training) observations better. In the previous session, you were introduced to this term that is used to find the optimal point between extreme complexity and simplicity. In this context, we will discuss the use of the hyperparameters of a model.



**Note:** Regularization will be covered in depth in the optional module on Advanced Regression. This session intends to introduce regularization as a concept.



**Hyperparameters** are parameters that we pass on to the learning algorithm to control the complexity of the final model. They are choices that the algorithm designer makes to ‘tune’ the behaviour of the learning algorithm. Therefore, the **choice of hyperparameters has a lot of bearing on the final model** produced by the learning algorithm.



Hyperparameters are part of most learning algorithms that are used for training and regularization. In linear regression, hyperparameters are used to regularize models so that they do not become more complex than they should be. In the next video, you will learn about this in detail.


- Hyperparameters are used to 'fine-tune' or regularize the model to keep it optimally complex.
- The learning algorithm is given the hyperparameters as the input, and it returns the model parameters as the output.
- Hyperparameters are not part of the final model output.
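
The distinction in the points above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the method used in the course notebook: the function `fit_ridge_slope`, the penalty strength `alpha` and the toy data are hypothetical. The learning algorithm fits a no-intercept line `y = w * x` while penalising large values of `w`; `alpha` is the hyperparameter passed in, and the slope `w` is the model parameter that comes out.

```python
def fit_ridge_slope(xs, ys, alpha):
    """Learning algorithm: hyperparameter `alpha` in, model parameter (slope) out.

    Minimises sum((y - w * x) ** 2) + alpha * w ** 2 for the line y = w * x.
    Setting the derivative to zero gives w = sum(x * y) / (sum(x * x) + alpha).
    """
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + alpha)

# Toy data, roughly following y = 2x (an assumption for illustration).
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 3.9, 6.2, 7.8]

# A larger alpha means stronger regularization, which shrinks the learnt slope.
for alpha in (0.0, 1.0, 10.0, 100.0):
    slope = fit_ridge_slope(xs, ys, alpha)
    print(f"alpha={alpha:>5}: slope={slope:.3f}")
```

Only the slope is needed to make predictions afterwards; `alpha` influences which slope is learnt but is not itself part of the fitted model.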


![122.png](attachment:6a29d6d4-dc51-4b2b-bbce-b8455690c932.png)




## Model Evaluation and Cross Validation

We will now shift our attention to model evaluation. The key point to remember is that a model should never be evaluated on data that it has already seen. With that in mind, you will face one of the following two cases:

1. The training data is abundant.
2. The training data is limited.

The first case is straightforward because you can use as many observations as you like to train and test the model. In the second case, however, you need some ‘hack’ that lets the model be evaluated on unseen data without eating up the data available for training. This hack is called **cross-validation**.
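
One common form of this idea is k-fold cross-validation, sketched below in plain Python. The helper names (`k_fold_indices`, `fit_mean`, `mse`), the toy targets and the deliberately trivial 'predict the training mean' model are assumptions made for illustration. The data is split into k folds; each fold serves once as the unseen test set while the remaining folds are used for training, so every observation is used for both training and evaluation, but never in the same round.

```python
import random

def k_fold_indices(n_samples, k, seed=0):
    """Yield (train_idx, test_idx) pairs; each sample is tested exactly once."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i, test_idx in enumerate(folds):
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, test_idx

def fit_mean(ys):
    # Deliberately trivial "model": always predict the training mean.
    return sum(ys) / len(ys)

def mse(prediction, ys):
    # Mean squared error of a constant prediction on the given targets.
    return sum((y - prediction) ** 2 for y in ys) / len(ys)

# Toy targets (hypothetical values).
ys = [3.0, 5.0, 4.0, 6.0, 5.5, 4.5, 7.0, 3.5, 5.0, 6.5]

fold_scores = []
for train_idx, test_idx in k_fold_indices(len(ys), k=5):
    model = fit_mean([ys[j] for j in train_idx])               # train on k-1 folds
    fold_scores.append(mse(model, [ys[j] for j in test_idx]))  # test on the held-out fold

cv_score = sum(fold_scores) / len(fold_scores)
print(f"5-fold CV estimate of MSE: {cv_score:.3f}")
```

Averaging the per-fold scores gives a single estimate of performance on unseen data without permanently setting any observations aside.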

![118.png](attachment:75a3b24d-9d25-43d9-b40d-166446988d61.png)

# MCQ

#### Q1. Suppose Rohit builds two linear regression models to solve the car pricing problem. Model 1 has three features, and model 2 has 11 features. Which of these two models is likely to undergo a larger change when a new training data set is used?

 - [ ] Model 1

 - [ ] Model 2


**Comprehension - Bias Variance Tradeoff**

A data set of the form (x, 2x + 55 + e) was artificially generated, where e is normally distributed noise with a mean of zero and a variance of 1. The following three regression models were fitted to the data: a straight line, a degree-15 polynomial and a higher-degree polynomial that passes through all the training points.

![](https://d35ev2v1xsdze0.cloudfront.net/f7837974-8663-43dd-8f3b-fc70bec4f2c5-9nd2n5qc.png)
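
For readers who want to reproduce a data set of this form, the following is a minimal sketch. The range of x values, the number of points and the random seed are assumptions, since the comprehension does not state them. It generates points (x, 2x + 55 + e) and fits only the straight-line model using ordinary least squares.

```python
import random

rng = random.Random(42)                  # seed chosen arbitrarily
xs = [i / 10 for i in range(100)]        # hypothetical x values: 0.0, 0.1, ..., 9.9
ys = [2 * x + 55 + rng.gauss(0, 1) for x in xs]  # noise with mean 0 and variance 1

# Ordinary least squares for a straight line y = slope * x + intercept.
n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n
slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
    (x - mean_x) ** 2 for x in xs
)
intercept = mean_y - slope * mean_x

print(f"fitted line: y = {slope:.2f}x + {intercept:.2f}")
```

The fitted slope and intercept should land close to 2 and 55, because the straight line matches the form of the process that generated the data.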



#### Q2. Which of the following is the correct order of bias in the three models?

 - [ ] Straight line > Degree-15 > Polynomial
 - [ ] Straight line > Polynomial > Degree-15
 - [ ] Polynomial > Degree-15 > Straight line
 - [ ] Polynomial > Straight line > Degree-15

**Ans:** The bias is high when the model is very simple.

#### Q3. Why is the variance in the higher degree polynomial said to be higher than the other two models?

 - [ ] The variance in the y-values of the polynomial is clearly higher, as shown in the figure.

 - [ ] The model will change drastically from its current state when plotted on unseen test data.

 - [ ] The model will change drastically from its current state if the current training data is altered.

 - [ ] The y-values of the model will change drastically from the current y-values when tested on unseen data.

#### Q4. When is regularization typically performed?

- [ ] While the model is being tested on unseen data

- [ ] While the learning algorithm uses the training data to produce a model

- [ ] While the learning algorithm uses the test data to produce a model

- [ ] While the model uses the training data to produce a model

#### Q5. Why is it often not possible to use the validation set approach?

- [ ] The hyperparameters are often not needed to train a model.

- [ ] The data available for training and testing is limited.

- [ ] The data available for training and testing is unlimited.

- [ ] Cross validation is almost always a better alternative.

# Model Evaluation: Python Demonstration - I - Part 1

In the previous few segments, you learnt about regularization and hyperparameters. You also learnt about cross-validation. In the next few segments, you will learn how to carry out hyperparameter tuning and cross-validation in Python.



For demonstration, you will use **cross-validation with linear regression**. Then, you will **tune the hyperparameter** for the linear regression model. The hyperparameter for the linear regression model is the number of features being used for training.
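
The idea of tuning the number of features with cross-validation can be sketched without any external library. Everything below is an assumption made for illustration and is not the code from the course notebook: the data is synthetic rather than housing.csv, the helper functions are hypothetical, and the feature columns are assumed to be already ordered by usefulness, so 'using n features' means using the first n columns. In practice, a library such as scikit-learn would typically handle the fitting and the fold bookkeeping.

```python
import random

def solve_linear_system(a, b):
    """Solve a @ w = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    w = [0.0] * n
    for r in reversed(range(n)):
        w[r] = (m[r][n] - sum(m[r][c] * w[c] for c in range(r + 1, n))) / m[r][r]
    return w

def fit_linear(rows, ys):
    """Least-squares fit with an intercept, via the normal equations."""
    design = [[1.0] + list(r) for r in rows]
    p = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(p)] for i in range(p)]
    xty = [sum(d[i] * y for d, y in zip(design, ys)) for i in range(p)]
    return solve_linear_system(xtx, xty)

def predict(w, row):
    return w[0] + sum(wi * xi for wi, xi in zip(w[1:], row))

def cv_mse(rows, ys, n_features, k=5):
    """Average test MSE over k folds, using only the first n_features columns."""
    n = len(ys)
    fold_errors = []
    for fold in range(k):
        # Rows are i.i.d. here, so assigning folds by index needs no shuffling.
        test = [i for i in range(n) if i % k == fold]
        train = [i for i in range(n) if i % k != fold]
        w = fit_linear([rows[i][:n_features] for i in train], [ys[i] for i in train])
        errs = [(ys[i] - predict(w, rows[i][:n_features])) ** 2 for i in test]
        fold_errors.append(sum(errs) / len(errs))
    return sum(fold_errors) / k

# Synthetic data (an assumption): only the first two of five features matter.
rng = random.Random(0)
rows = [[rng.gauss(0, 1) for _ in range(5)] for _ in range(100)]
ys = [3 * r[0] + 2 * r[1] + rng.gauss(0, 1) for r in rows]

# Hyperparameter tuning: try each candidate value and keep the best CV score.
scores = {n_feat: cv_mse(rows, ys, n_feat) for n_feat in range(1, 6)}
best_n_features = min(scores, key=scores.get)
for n_feat, score in scores.items():
    print(f"{n_feat} feature(s): CV MSE = {score:.3f}")
print("Selected number of features:", best_n_features)
```

The number of features is never learnt from a single fit; it is chosen from outside by comparing cross-validated scores, which is what makes it a hyperparameter.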



For this demonstration, you will use the housing.csv file that was used in the module on Linear Regression.

Please find the car_price dataset [here](https://ml-course2-upgrad.s3.amazonaws.com/Model+Selection/Model+Evaluation/CarPrice_Assignment.csv), the housing dataset [here](https://ml-course2-upgrad.s3.amazonaws.com/Model+Selection/Model+Evaluation/Housing.csv) and the code file [here](https://github.com/ContentUpgrad/Model-Selection/blob/main/Model%20Evaluation/Cross-Validation%20-%20Linear%20Regression-checkpoint%20(1).ipynb).