# Logistic Regression - Regularization

## The problem of overfitting
- What is overfitting?
- What is regularization and how does it help?

## Overfitting with linear regression
- Using our house pricing example again
- Fit a linear function to the data - **not a great model**
- This is **<font color="#E30000">underfitting</font>** - 
  also known as **<font color="#E30000">high bias</font>**
  - Bias - if we're fitting a straight line to the data we have a strong preconception 
    that there should be a linear fit
  - In this case, this is not correct, but a straight line can't help being straight!
  <img src="images/underfitting.png">
- Fit a quadratic function
  - Works well **<font color="#1C3387">("just right")</font>**
  - But the curve goes down when the size of the house gets bigger (not shown in the drawing)
    - maybe we should use a **cubic** polynomial...
  <img src="images/just right.png">
- Fit a 4th order polynomial
  - Now the curve fits through all five examples
  - Seems to do a good job fitting the **training set**
  - But, despite fitting the data we've provided very well, this is actually not such a good model
    - It doesn't generalize well!
    - This is **<font color="#E30000">overfitting</font>** - 
      also known as **<font color="#E30000">high variance</font>**
  <img src="images/overfitting.png">      
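
The three fits above can be reproduced with a minimal sketch. The five data points are made up for illustration (they are not the values from the figures), and `fit` and `training_mse` are hypothetical helper names. With five examples, a 4th order polynomial has five coefficients, so it can pass through every point and drive the training error to essentially zero.

```python
import numpy as np

# Hypothetical training set: five houses (size in 1000 sq ft, price in $1000s).
sizes = np.array([1.0, 1.5, 2.0, 2.5, 3.0])
prices = np.array([200.0, 280.0, 330.0, 360.0, 370.0])

def fit(degree):
    """Least-squares polynomial fit of the given degree to the training set."""
    return np.polyfit(sizes, prices, degree)

def training_mse(degree):
    """Mean squared error of the fitted polynomial on the training set."""
    predictions = np.polyval(fit(degree), sizes)
    return float(np.mean((predictions - prices) ** 2))

for degree in (1, 2, 4):
    # Also predict for a house larger than any in the training set.
    extrapolated = np.polyval(fit(degree), 4.0)
    print(f"degree {degree}: training MSE = {training_mse(degree):.4f}, "
          f"prediction at size 4.0 = {extrapolated:.1f}")
```

A low training error on its own says nothing about generalization. Comparing the predictions at size 4.0, which lies outside the training range, is one rough way to see how differently the three models behave on new data.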

## To recap
- If we have too many features, the learned hypothesis may fit the training set almost perfectly, driving the cost function to (or very near) zero
  - But this tries too hard to fit the training set
  - Fails to provide a general solution - **<font color="#E30000">unable to generalize</font>** (apply to new examples)

## Overfitting with logistic regression
- The same thing can happen with logistic regression
  - A sigmoid of a simple linear function of the features (a straight-line decision boundary) underfits
  - But a high-order polynomial gives an overfitting (high-variance) hypothesis
  <img src="images/overfitting with logistics regression.png">
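
As an illustration of the two extremes, assume two features $x_1$ and $x_2$ and the sigmoid $g(z) = 1/(1 + e^{-z})$. The specific polynomial terms below are only an example of a "high order" hypothesis, not necessarily the ones drawn in the figure.

$$
\text{underfit:} \quad h_θ(x) = g(θ_0 + θ_1 x_1 + θ_2 x_2)
$$

$$
\text{overfit:} \quad h_θ(x) = g(θ_0 + θ_1 x_1 + θ_2 x_1^2 + θ_3 x_1^2 x_2 + θ_4 x_1^2 x_2^2 + \dots)
$$

The first gives a straight-line decision boundary. The second can bend the boundary around individual training examples.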

## Addressing overfitting
- Later we'll look at identifying when overfitting and underfitting are occurring
- Plotting the hypothesis is one way to decide (e.g. it looks "too curvy"), but this doesn't always work
- We often have lots of features, which makes it harder to plot the data and visualize it to decide 
  which features to keep and which to drop
- If you have lots of features and little data - overfitting can be a problem

### How do we deal with this?
- **Reduce number of features**
  - Manually select which features to keep
  - Use model selection algorithms (will not be covered here)
  - But, in reducing the number of features we lose some information
- **<font color="#1C3387" size="4em">Regularization</font>**
  - Keep all features, but **<font color="#1C3387" size="4em">reduce magnitude of parameters $θ$</font>**
  - Works well when we have a lot of features, each of which contributes a bit to predicting $y$

## Cost function optimization for regularization
- Penalize and make some of the $θ$ parameters really small
- e.g. here $θ_3$ and $θ_4$
  - modify our cost function to help penalize $θ_3$ and $θ_4$
<img src="images/regularization - 1.png">
- So here we end up with $θ_3$ and $θ_4$ being close to zero
  - So we're basically left with a quadratic function
<img src="images/regularization - 2.png">
- In this example, we penalized two of the parameter values
- More generally, regularization is as follows
<img src="images/regularization - 3.png">
  - By convention you don't penalize $θ_0$ - the penalty sum runs from $θ_1$ onwards
  - **<font color="#1C3387" size="3em">$λ$</font>** 
    is the **<font color="#1C3387" size="3em">regularization parameter</font>**
    - Controls a trade-off between our two goals
      - Want to fit the training set well
      - Want to keep parameters small
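
Written out, and assuming the figures use the usual squared-error cost for linear regression with $m$ training examples and $n$ features, the two steps look roughly like this. The constant 1000 is just an arbitrary large number chosen to make $θ_3$ and $θ_4$ expensive.

$$
\min_θ \; \frac{1}{2m}\sum_{i=1}^{m}\left(h_θ(x^{(i)}) - y^{(i)}\right)^2 + 1000\,θ_3^2 + 1000\,θ_4^2
$$

$$
J(θ) = \frac{1}{2m}\left[\sum_{i=1}^{m}\left(h_θ(x^{(i)}) - y^{(i)}\right)^2 + λ\sum_{j=1}^{n} θ_j^2\right]
$$

The first term rewards fitting the training set. The second term grows with the size of the parameters, and $λ$ sets how much that growth costs.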
  
- ### Small values for parameters correspond to a simpler hypothesis (smoother curve)
  - You effectively get rid of some of the terms
  - A simpler hypothesis is less prone to overfitting
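
A minimal sketch of a regularized cost, assuming the usual squared-error form with an L2 penalty on every parameter except $θ_0$. The function name `regularized_cost`, the argument `lam` (standing in for $λ$), and the toy data are all hypothetical.

```python
import numpy as np

def regularized_cost(theta, X, y, lam):
    """Squared-error cost plus an L2 penalty on theta[1:].

    X must contain a leading column of ones; theta[0] is not penalized.
    """
    m = len(y)
    residuals = X @ theta - y
    penalty = lam * np.sum(theta[1:] ** 2)
    return float((residuals @ residuals + penalty) / (2 * m))

# Hypothetical data: a column of ones (x_0 = 1) plus one feature.
X = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
y = np.array([2.0, 4.0, 6.0])
theta = np.array([0.0, 2.0])  # fits this toy data exactly

print(regularized_cost(theta, X, y, lam=0.0))  # fit term only
print(regularized_cost(theta, X, y, lam=3.0))  # fit term plus penalty
```

Here `theta` fits the data exactly, so with `lam=0.0` the cost is zero. With `lam=3.0` the same parameters cost $3 \cdot 2^2 / (2 \cdot 3) = 2$, all of it from the penalty.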

## Regularized linear regression
<img src="images/regularization - 4.png">
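
One common way to minimize the regularized cost is batch gradient descent. The sketch below assumes the standard update, in which $θ_0$ is updated as usual and every other $θ_j$ is first shrunk by a factor $(1 - αλ/m)$. The figure may show a different formulation. The function name, the learning rate `alpha`, the iteration count, and the data are illustrative choices.

```python
import numpy as np

def gradient_descent(X, y, lam, alpha=0.1, iterations=2000):
    """Batch gradient descent for linear regression with an L2 penalty.

    X must include a leading column of ones; theta[0] is not regularized.
    Update rule for j >= 1:
        theta_j := theta_j * (1 - alpha * lam / m) - alpha * (1/m) * sum((h - y) * x_j)
    """
    m, n = X.shape
    theta = np.zeros(n)
    shrink = np.ones(n)
    shrink[1:] = 1 - alpha * lam / m  # no shrinkage for theta[0]
    for _ in range(iterations):
        gradient = X.T @ (X @ theta - y) / m
        theta = theta * shrink - alpha * gradient
    return theta

# Hypothetical data: five points with standardized polynomial features up to degree 4.
x = np.array([1.0, 1.5, 2.0, 2.5, 3.0])
y = np.array([200.0, 280.0, 330.0, 360.0, 370.0])
powers = np.column_stack([x ** d for d in range(1, 5)])
powers = (powers - powers.mean(axis=0)) / powers.std(axis=0)
X = np.column_stack([np.ones(len(x)), powers])

for lam in (0.0, 1.0, 10.0):
    theta = gradient_descent(X, y, lam)
    print(f"lambda = {lam}: norm of theta[1:] = {np.linalg.norm(theta[1:]):.3f}")
```

The features are standardized so that a single learning rate works for all of them. Larger values of `lam` should give a smaller norm for `theta[1:]`, which is the "keep parameters small" goal in action.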