# Part I: Diagnosing Bias and Variance

**Bias and Variance**
![image.png](attachment:image.png)
- When the model underfits (high bias), the cost is high on both the training set and the cross-validation set
- When the model overfits (high variance), the training cost is low, but the cross-validation cost is high, so the model generalizes poorly to new data
- When the model fits well, both the training cost and the cross-validation cost are low

**Understanding Bias and Variance**
![image.png](attachment:image.png)
- The higher the degree of the polynomial, the lower the training cost, and vice versa
- For the cross-validation set, however, both a very low and a very high degree result in a high cost; $J_{cv}$ is lowest at some intermediate degree
- High bias (underfit) -> $J_{train}$ will be high ($J_{train} \approx J_{cv}$)
- High variance (overfit) -> $J_{cv} \gg J_{train}$ ($J_{train}$ may be low)
- High bias and high variance -> $J_{train}$ will be high, and $J_{cv} \gg J_{train}$
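
The pattern above can be reproduced on a toy problem. The following is a minimal sketch, assuming synthetic noisy-sine data, a simple 40/20 train/cross-validation split, and a helper `half_mse` that is not part of the original notes. It fits polynomials of increasing degree and prints both costs so they can be compared.

```python
import numpy as np

def half_mse(y_hat, y):
    """Squared-error cost J = (1 / 2m) * sum((y_hat - y)^2)."""
    return np.mean((y_hat - y) ** 2) / 2

# Synthetic data (an assumption for illustration): a noisy sine curve.
rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, 60)
y = np.sin(3 * x) + rng.normal(0, 0.1, 60)

# Simple split: first 40 points for training, last 20 for cross validation.
x_train, y_train = x[:40], y[:40]
x_cv, y_cv = x[40:], y[40:]

costs = {}  # degree -> (J_train, J_cv)
for degree in range(1, 9):
    # Least-squares polynomial fit on the training set only.
    coeffs = np.polyfit(x_train, y_train, degree)
    j_train = half_mse(np.polyval(coeffs, x_train), y_train)
    j_cv = half_mse(np.polyval(coeffs, x_cv), y_cv)
    costs[degree] = (j_train, j_cv)
    print(f"degree={degree}  J_train={j_train:.4f}  J_cv={j_cv:.4f}")
```

On data like this, $J_{train}$ should not increase as the degree grows, while $J_{cv}$ is the quantity to watch when picking a degree.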

# Part II: Regularization and Bias/Variance

**Linear Regression with Regularization**
![image.png](attachment:image.png)

- When we choose a very small $\lambda$, regularization has almost no effect, so we are likely to overfit (high variance)
- Conversely, choosing a very large $\lambda$ shrinks the weights toward 0, causing high bias (underfit)
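
For reference, the two bullets above follow from the usual form of the regularized squared-error cost. The notes only show the formula in the figure, so the exact notation here is an assumption:

$$
J(\vec{w}, b) = \frac{1}{2m} \sum_{i=1}^{m} \left( f_{\vec{w}, b}(\vec{x}^{(i)}) - y^{(i)} \right)^2 + \frac{\lambda}{2m} \sum_{j=1}^{n} w_j^2
$$

As $\lambda \to 0$ the second term vanishes and only the fit to the training data matters. As $\lambda$ grows, the second term dominates and minimizing $J$ forces every $w_j$ toward 0.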

**Choosing the Regularization Parameter**
![image.png](attachment:image.png)

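- Start with $\lambda = 0$ or a very small value, and roughly double it each time
- For each $\lambda$, fit the parameters $\vec{w}, b$ and record the cross-validation cost
- Choose the $\lambda$ with the lowest cross-validation cost, then report the test error of that model as the estimate of its generalization error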
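
The procedure above can be sketched as follows. This is a minimal illustration, not a definitive implementation: the synthetic data, the degree-8 polynomial features, the 60/20/20 split, and the helper names `poly_features`, `fit_ridge`, and `cost` are all assumptions. The fit uses the closed-form solution of regularized linear regression and leaves the bias term unregularized.

```python
import numpy as np

def poly_features(x, degree):
    # Columns are x^0 (bias), x^1, ..., x^degree.
    return np.vander(x, degree + 1, increasing=True)

def fit_ridge(X, y, lam):
    # Closed-form minimizer of the regularized squared-error cost.
    reg = lam * np.eye(X.shape[1])
    reg[0, 0] = 0.0  # the bias column is not regularized
    return np.linalg.solve(X.T @ X + reg, X.T @ y)

def cost(X, y, w):
    # Unregularized cost, used for evaluation: (1 / 2m) * sum of squared errors.
    return np.mean((X @ w - y) ** 2) / 2

# Synthetic data (an assumption for illustration).
rng = np.random.default_rng(1)
x = rng.uniform(-1, 1, 100)
y = np.sin(3 * x) + rng.normal(0, 0.1, 100)

X = poly_features(x, degree=8)
X_train, y_train = X[:60], y[:60]
X_cv, y_cv = X[60:80], y[60:80]
X_test, y_test = X[80:], y[80:]

# 0, then 0.01 doubled each step up to 10.24.
lambdas = [0.0] + [0.01 * 2 ** k for k in range(11)]

results = []  # one (lam, w, J_cv) tuple per candidate
for lam in lambdas:
    w = fit_ridge(X_train, y_train, lam)
    results.append((lam, w, cost(X_cv, y_cv, w)))

# Pick the candidate with the lowest cross-validation cost.
best_lam, best_w, best_j_cv = min(results, key=lambda r: r[2])
j_test = cost(X_test, y_test, best_w)
print(f"best lambda={best_lam}  J_cv={best_j_cv:.4f}  J_test={j_test:.4f}")
```

The test set is touched only once, after $\lambda$ has been chosen, so `j_test` is not biased by the selection step.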

**Understanding Bias/Variance and Lambda**
![image.png](attachment:image.png)

- At small $\lambda$, there is almost no regularization, so the training cost is low and the cross-validation cost is high (high variance)
- At large $\lambda$, the weights are pushed close to 0, so $f_{\vec{w}, b}(\vec{x}) \approx b$ and the model underfits (high bias)
- Note that this graph is mirrored relative to the cost vs. polynomial degree graph: high variance is now on the left and high bias is on the right

# Part III: Establishing a Baseline Level of Performance

**What is the Reasonable Level of Error You Can Hope to Get to?**
- Human-level performance
- Performance of competing algorithms
- A guess based on experience

**Bias/Variance Example**
![image.png](attachment:image.png)

- If the gap between the cross-validation error and the training error is large, while the training error is close to the baseline, the model has high variance
- If the gap between the training error and the cross-validation error is small, but the training error is well above the baseline, the model has high bias
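
The two comparisons above can be written as a small helper. This is only a sketch: the function name `diagnose`, the tolerance `tol`, and the error percentages in the usage lines are hypothetical values chosen for illustration. What counts as a "large" gap depends on the application.

```python
def diagnose(baseline, j_train, j_cv, tol=2.0):
    """Classify a model from three error values (same units, e.g. percent).

    tol is a hypothetical threshold for what counts as a large gap.
    """
    high_bias = (j_train - baseline) > tol    # training error far above baseline
    high_variance = (j_cv - j_train) > tol    # cross-validation error far above training error
    if high_bias and high_variance:
        return "high bias and high variance"
    if high_bias:
        return "high bias"
    if high_variance:
        return "high variance"
    return "neither"

# Illustrative numbers only (errors in percent).
print(diagnose(baseline=10.6, j_train=10.8, j_cv=14.8))  # high variance
print(diagnose(baseline=10.6, j_train=15.0, j_cv=15.5))  # high bias
print(diagnose(baseline=10.6, j_train=15.0, j_cv=19.7))  # high bias and high variance
```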

# Part IV: Learning Curves

**Learning Curves**
![image.png](attachment:image.png)

- For the training set, with few examples the model can fit them almost perfectly, so $J_{train}$ is low. As the number of examples $m$ goes up, $J_{train}$ increases; however, this curve flattens out as $m \to \infty$
- For the cross-validation set, $J_{cv}$ is high when the model is trained on few examples. It decreases and levels off as the training set gets larger, typically staying above $J_{train}$
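
A learning curve can be computed by training on the first $m$ examples for increasing $m$ and evaluating both costs each time. The sketch below assumes synthetic quadratic data, a straight-line model, and a 60/20 split. These choices and the helper `half_mse` are illustrative and do not come from the original notes.

```python
import numpy as np

def half_mse(y_hat, y):
    """Squared-error cost J = (1 / 2m) * sum((y_hat - y)^2)."""
    return np.mean((y_hat - y) ** 2) / 2

# Synthetic data (an assumption for illustration): a noisy parabola.
rng = np.random.default_rng(2)
x = rng.uniform(-1, 1, 80)
y = x ** 2 + rng.normal(0, 0.05, 80)

x_train, y_train = x[:60], y[:60]
x_cv, y_cv = x[60:], y[60:]

curve = []  # one (m, J_train, J_cv) tuple per training-set size
for m in [2, 5, 10, 20, 40, 60]:
    # Fit a straight line to the first m training examples only.
    coeffs = np.polyfit(x_train[:m], y_train[:m], 1)
    j_train = half_mse(np.polyval(coeffs, x_train[:m]), y_train[:m])
    # J_cv is always measured on the full cross-validation set.
    j_cv = half_mse(np.polyval(coeffs, x_cv), y_cv)
    curve.append((m, j_train, j_cv))
    print(f"m={m:2d}  J_train={j_train:.4f}  J_cv={j_cv:.4f}")
```

With $m = 2$ a straight line passes through both points, so $J_{train}$ is essentially 0. Because a line cannot capture the quadratic trend, this particular setup is a high-bias case.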

**High Bias**
![image.png](attachment:image.png)

- In this scenario, both $J_{train}$ and $J_{cv}$ are high and well above the baseline. The two curves flatten out close to each other, so adding more training data alone is unlikely to help

**High Variance**
![image.png](attachment:image.png)

- In an overfitting scenario, $J_{train}$ is close to (or even below) the baseline, but $J_{cv}$ is much higher than both. Adding more training data is likely to help narrow the gap

# Part V: Bias/Variance and Neural Networks

![image.png](attachment:image.png)

![image.png](attachment:image.png)

![image.png](attachment:image.png)