# Bonus Homework Lecture 3: Bias-Variance Tradeoff

## Preliminaries

### Imports

In [1]:
import os

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

import scipy.optimize
import sklearn.datasets
from sklearn.linear_model import LinearRegression


%matplotlib inline


### Data Directory

In [2]:
raw_data_dir="../../raw/C50"
data_dir="../../data/C50"

### Random Seed

In [3]:
seed=2506
np.random.seed(seed)

## Bias-Variance Trade-off

### Description of Problem

We will investigate the Bias-Variance decomposition in the learning problem of estimating the function
$$
    f(x) = \begin{cases}
                \max(0,1 - 1.7 x) \,\,\text{  for } x>0 \\
                \max(0,1 + 2 x)  \,\,\text{  for } x \le 0
            \end{cases}
$$
from data generated by the process
$$
    y = f(x) + \sigma(x) \,\epsilon
$$
where $\epsilon\sim \mathcal{N}(0,1)$ is Gaussian noise and
$$
    \sigma(x) = \sigma_0 \sqrt{ 1 - x^2}
$$
with $\sigma_0=0.05$.

In [4]:
sigma0=0.05

In [5]:
def f(x):
    slope=2-0.3*(x>0)
    return  np.maximum(0,1-slope*np.abs(x))

def sigma(x):
    return np.sqrt(1-x*x)*sigma0

### Visualize $f(x)$

<div class="alert alert-block alert-success"> Problem 0.1 </div>
Generate an array `X_test` of 101 uniformly spaced points in the range [-1,1].

<div class="alert alert-block alert-success"> Problem 0.2 </div>
Plot the function $f(x)$ in the range [-1,1].
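
One possible way to build the grid is sketched below. It assumes `numpy.linspace` is an acceptable tool here and repeats the definition of `f` from above only so that the snippet runs on its own; the name `f_test` is introduced purely for illustration.

```python
import numpy as np

def f(x):
    # Same target function as defined earlier in this notebook.
    slope = 2 - 0.3 * (x > 0)
    return np.maximum(0, 1 - slope * np.abs(x))

# 101 evenly spaced points on [-1, 1], both endpoints included.
X_test = np.linspace(-1, 1, 101)

# Values of the true function on the grid (illustrative name).
f_test = f(X_test)
```

With `matplotlib` imported as in the Preliminaries, `plt.plot(X_test, f_test)` would then draw the curve.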


### Generate Sample data

<div class="alert alert-block alert-info"> Problem 1.0 </div>
Using `numpy.random.uniform`, generate `N`=50 random samples of $X$ in the range (-1,1).

<div class="alert alert-block alert-info"> Problem 1.1 </div>
Using `numpy.random.normal`, generate N=50 random samples of $Y$ according to the process described above.

<div class="alert alert-block alert-info"> Problem 1.2 </div>
Plot the random sample of  `X` and `Y` that you generated and the true function on the same graph.

<div class="alert alert-block alert-info"> Problem 1.3 </div>
Write a Python function `generate_sample_Y(X)` that, given an array of inputs `X`, returns an array `Y` of points generated according to $f(x)+\sigma(x) \epsilon$.

<div class="alert alert-block alert-info"> Problem 1.4 </div>
Write a Python function `generate_sample(N)` that returns an array `X` of `N` uniformly distributed points on [-1,1] and an array `Y` of `N` points generated as per the process above.
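
A minimal sketch of the two generators follows, assuming `generate_sample_Y` takes the input array `X` as its argument. The definitions of `f`, `sigma` and `sigma0` are copied from above so the snippet is self-contained; treat it as one possible layout rather than the expected solution.

```python
import numpy as np

sigma0 = 0.05

def f(x):
    slope = 2 - 0.3 * (x > 0)
    return np.maximum(0, 1 - slope * np.abs(x))

def sigma(x):
    return np.sqrt(1 - x * x) * sigma0

def generate_sample_Y(X):
    # One independent standard-normal draw per input, scaled by sigma(x).
    eps = np.random.normal(0.0, 1.0, size=np.shape(X))
    return f(X) + sigma(X) * eps

def generate_sample(N):
    # Inputs are uniform on [-1, 1); outputs follow the noisy process.
    X = np.random.uniform(-1.0, 1.0, size=N)
    return X, generate_sample_Y(X)

np.random.seed(2506)
X, Y = generate_sample(50)
```

Because $\sigma(\pm 1)=0$, the generated values at the endpoints coincide with $f(\pm 1)$, which gives a quick sanity check.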

### Hypothesis Space

Our hypothesis space $\mathcal{H}_K$ will be **symmetric** (even) polynomials $p(x)$ up to degree $K$.

<div class="alert alert-block alert-info"> Problem 2.1 </div>
Write a function `generate_basis(X,K)` that evaluates a basis of $\mathcal{H}_K$ made of even monomials at an array of given points `X`.

[HINT] You may want to refer to the written homework for this week.
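
As a hedged illustration, the sketch below evaluates the even monomials $1, x^2, x^4, \dots$ up to the largest even power not exceeding $K$. The layout (one row per point, one column per monomial) is an assumption made here because it matches what `LinearRegression` expects; other conventions are equally valid.

```python
import numpy as np

def generate_basis(X, K):
    X = np.asarray(X, dtype=float)
    # Even powers 0, 2, 4, ..., up to the largest even number <= K.
    powers = np.arange(0, K + 1, 2)
    # Broadcasting yields a matrix of shape (len(X), len(powers)).
    return X[:, None] ** powers[None, :]

B = generate_basis([2.0, -2.0], 4)
```

Both rows of `B` are identical, which reflects the symmetry of the basis under $x \to -x$.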

<div class="alert alert-block alert-info"> Problem 2.2 </div>
Plot all basis functions up to K=6

### Regression Solver

<div class="alert alert-block alert-info"> Problem 3.1 </div>

Fill in the bodies of the `fit` and `predict` functions in the `class` below so that they solve the learning problem for $\hat{h}$, the least-squares
approximation to $f(x)$ within our hypothesis space $\mathcal{H}_K$.

[HINT] 

1. You worked out the math for this in assignment 2
2. The function `generate_basis`  can be useful
3. You can also look at how a similar problem is solved in the  [`BiasVarianceTradeOff`](.\BiasVarianceTradeOff.ipynb) notebook.

In [6]:
class RegressionSolver:
    def __init__(self,K):
        self.K=K
        self.base_model=LinearRegression(fit_intercept=False)
    def fit(self,X,Y):
        pass
    def predict(self,X):
        pass
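
One way the two methods could be filled in is sketched below. It assumes a `generate_basis` helper with one column per even monomial (repeated here so the snippet runs on its own) and simply delegates to the `LinearRegression` model already stored in `base_model`. Returning `self` from `fit` is an optional convenience, not something the assignment requires.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

def generate_basis(X, K):
    # Assumed layout: rows are points, columns are 1, x^2, x^4, ...
    X = np.asarray(X, dtype=float)
    powers = np.arange(0, K + 1, 2)
    return X[:, None] ** powers[None, :]

class RegressionSolver:
    def __init__(self, K):
        self.K = K
        self.base_model = LinearRegression(fit_intercept=False)

    def fit(self, X, Y):
        # Least squares on the basis features; the constant monomial
        # plays the role of the intercept, hence fit_intercept=False.
        self.base_model.fit(generate_basis(X, self.K), Y)
        return self

    def predict(self, X):
        return self.base_model.predict(generate_basis(X, self.K))

# Sanity check on noise-free data that lies exactly in the hypothesis space.
X_demo = np.linspace(-1, 1, 21)
Y_demo = 1 - 2 * X_demo**2
model = RegressionSolver(K=4).fit(X_demo, Y_demo)
```

On this noise-free quadratic the fitted model should reproduce `Y_demo` up to rounding error.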

<div class="alert alert-block alert-info"> Problem 3.2 </div>

1. Fit our model to the `X`, `Y` data you generated before, assuming K=10.
2. Plot the predicted values $\hat{h}(x)$ on the points defined by `X_test`

Compare to the true function $f(x)$.

<div class="alert alert-block alert-info"> Problem 3.3 </div>
Compute the *in sample* mean squared training error.

<div class="alert alert-block alert-info"> Problem 3.4 </div>
Generate 100 new data points and compute the *out of sample* mean squared test error.

###  Bias-Variance Decomposition

<div class="alert alert-block alert-info"> Problem 4.1 </div>
Still using K=10 and N=50:

Reserve some space so that you can keep track of the average of $\hat{h}(x)$ and $(\hat{h}(x))^2$ at each point in `X_test`.

Using R=1,000 repeats, proceed as follows:
1. Generate N random samples of X, Y according to our process specification.
2. Fit the model to the data.
3. Compute $\hat{h}$ on the uniform grid `X_test`.
4. Accumulate results so that you can later compute the average of $\hat{h}(x)$ and $(\hat{h}(x))^2$ at each point in `X_test`.
5. Use the function `generate_sample_Y` from Problem 1.3 to generate new `Y_test` values at each point in `X_test`.
6. Compute the squared error $\left(Y_{test}-\hat{h}(x)\right)^2$ at each point in `X_test`.
7. Accumulate results so that you can compute the squared error averaged over all `R` repeats of the procedure.

Using the results of the `R` repeats above:
1. Compute the average over the `R` repeats of $\hat{h}$ evaluated at each point in `X_test`.
2. Compute the average over the `R` repeats of $(\hat{h})^2$ evaluated at each point in `X_test`.
3. Compute the mean squared error of `Y_test` over the points `X_test`.
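
The procedure above can be sketched roughly as follows. To keep the snippet self-contained, `numpy.linalg.lstsq` is swapped in for the `RegressionSolver` class, and the helper name `run_experiment` and its return layout are assumptions made for illustration only.

```python
import numpy as np

sigma0 = 0.05

def f(x):
    slope = 2 - 0.3 * (x > 0)
    return np.maximum(0, 1 - slope * np.abs(x))

def sigma(x):
    return np.sqrt(1 - x * x) * sigma0

def generate_sample_Y(X):
    return f(X) + sigma(X) * np.random.normal(size=np.shape(X))

def generate_basis(X, K):
    powers = np.arange(0, K + 1, 2)
    return np.asarray(X, dtype=float)[:, None] ** powers[None, :]

def run_experiment(K=10, N=50, R=1000, seed=2506):
    np.random.seed(seed)
    X_test = np.linspace(-1, 1, 101)
    B_test = generate_basis(X_test, K)
    # Running sums of h, h^2 and the squared test error per grid point.
    sum_h = np.zeros_like(X_test)
    sum_h2 = np.zeros_like(X_test)
    sum_err2 = np.zeros_like(X_test)
    for _ in range(R):
        # Fresh training sample and least-squares fit (stand-in solver).
        X = np.random.uniform(-1, 1, size=N)
        Y = generate_sample_Y(X)
        coef, *_ = np.linalg.lstsq(generate_basis(X, K), Y, rcond=None)
        h = B_test @ coef
        # Fresh noisy targets on the grid for the prediction error.
        Y_test = generate_sample_Y(X_test)
        sum_h += h
        sum_h2 += h * h
        sum_err2 += (Y_test - h) ** 2
    # Averages over the R repeats at each grid point.
    return X_test, sum_h / R, sum_h2 / R, sum_err2 / R

X_test, mean_h, mean_h2, mse = run_experiment()
```

Keeping only running sums avoids storing all `R` prediction vectors; storing them in an `(R, 101)` array and averaging afterwards would work just as well.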

<div class="alert alert-block alert-info"> Problem 4.1 </div>
Plot the average of the prediction $\hat{h}(x)$ at the points `X_test`. Compare to the actual function.

<div class="alert alert-block alert-info"> Problem 4.2 </div>
Compute the bias at each point in `X_test`

<div class="alert alert-block alert-info"> Problem 4.3 </div>
Compute the variance  at each point in `X_test`

<div class="alert alert-block alert-info"> Problem 4.4 </div>
Show that the squared `bias`, the `variance`, and the irreducible error $\sigma(x)^2$ approximately add up to the mean squared prediction error for each point in `X_test`.
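
For reference, the identity being checked can be written out as below. This is a sketch that reads the bias term as the squared bias and assumes the noise in `Y_test` is independent of the training sample used to fit $\hat{h}$; expectations are taken over both.

$$
    \mathbb{E}\left[\left(Y_{test}-\hat{h}(x)\right)^2\right]
    = \left(\mathbb{E}[\hat{h}(x)] - f(x)\right)^2
    + \left(\mathbb{E}[\hat{h}(x)^2] - \mathbb{E}[\hat{h}(x)]^2\right)
    + \sigma(x)^2
$$

The three terms are the squared bias, the variance, and the irreducible error. The identity follows from writing $Y_{test}-\hat{h}(x) = \sigma(x)\epsilon + \left(f(x)-\mathbb{E}[\hat{h}(x)]\right) + \left(\mathbb{E}[\hat{h}(x)]-\hat{h}(x)\right)$ and noting that all cross terms have zero mean. With a finite number of repeats `R` the two sides agree only approximately.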

<div class="alert alert-block alert-info"> Problem 4.5 </div>
In which regions does the bias contribution dominate the prediction error? In which regions does the variance dominate?
In which regions does the irreducible error dominate?

### Bias-Variance Trade-off

<div class="alert alert-block alert-info"> Problem 5.1 </div>
Still using N=50 training sample points and averaging over $R=1,000$ repeats, find the value of $K$ that has the optimal bias-variance trade-off when averaged over all the points in `X_test`.

[HINT] You do not need to consider K larger than 14, as the variance becomes very large.

<div class="alert alert-block alert-info"> Problem 5.2 </div>
For which values of $K$ does the bias dominate, and for which values does the variance dominate?

<div class="alert alert-block alert-info"> Problem 5.3 </div>
Can the bias be made arbitrarily close to zero by increasing $K$? Explain your reasoning.

## Factor Modeling of the US Treasury Yield Curve

<div class="alert alert-block alert-warning"> Warning </div>
This problem may be a bit harder. Feel free to skip it.


In [2]:
treasury_data_dir="../../data/finance"

<div class="alert alert-block alert-info"> Problem 6.0 </div>
Load the H15 US Treasury data set for the period **1993-2002** into a pandas DataFrame.
(The file should have been generated by running the `FinancialDataSets` notebook, and it was called `H15_old.csv`.)

<div class="alert alert-block alert-info"> Problem 6.1 </div>
1. Separate the dataset into a training set with all dates before 1998-12-31 and a testing set with all dates afterwards.


<div class="alert alert-block alert-info"> Problem 6.2 </div>
Compute the daily rate changes for the training and test datasets
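
The split and differencing steps might look like the sketch below. The small `rates` frame is a made-up stand-in for the H15 data (hypothetical column names and values, date index assumed), and placing the cutoff date itself in the training set is an assumption, since the problem statement leaves that day unassigned.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the H15 data: date index, made-up rates.
dates = pd.bdate_range("1998-12-28", "1999-01-05")
rates = pd.DataFrame(
    {
        "1Y": np.linspace(4.5, 4.6, len(dates)),
        "30Y": np.linspace(5.1, 5.0, len(dates)),
    },
    index=dates,
)

# Assumption: the cutoff date itself is assigned to the training set.
cutoff = pd.Timestamp("1998-12-31")
train = rates.loc[rates.index <= cutoff]
test = rates.loc[rates.index > cutoff]

# Daily changes; the first row of each set has no predecessor and is dropped.
d_train = train.diff().dropna()
d_test = test.diff().dropna()
```

Differencing after the split means the change across the train/test boundary is discarded, which keeps the two sets strictly separate.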

## Multi-factor Gaussian model
The G2 Factor model can be generalized to K factors by defining
$$
    \Delta R(T)_t = \sum_{k=1}^K H_k(T) \Delta r_{t,k}
$$
where $\Delta r_{t,k}$ are the $K$ daily changes in interest  rate factors as of date $t$.

The factor loadings for each tenor $T$ are
$$
    H_k(T) = \frac{e^{-\lambda_k T} -1}{\lambda_k T}
$$
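
To make the two formulas above concrete, the following sketch evaluates the loadings $H_k(T)$ for a set of tenors and combines them with factor changes. The function name `factor_loadings` and all numerical values (tenors, $\lambda_k$, factor changes) are hypothetical and chosen only for illustration.

```python
import numpy as np

def factor_loadings(T, lambdas):
    # Returns a matrix with one row per tenor and one column per factor,
    # H[i, k] = (exp(-lambda_k * T_i) - 1) / (lambda_k * T_i).
    T = np.asarray(T, dtype=float)[:, None]
    lam = np.asarray(lambdas, dtype=float)[None, :]
    return (np.exp(-lam * T) - 1.0) / (lam * T)

tenors = np.array([1.0, 5.0, 10.0, 30.0])  # hypothetical tenors in years
lambdas = np.array([0.1, 1.0])             # hypothetical decay rates
H = factor_loadings(tenors, lambdas)       # shape (4, 2)

d_r = np.array([0.01, -0.02])              # hypothetical factor changes
d_R = H @ d_r                              # rate change at each tenor
```

For several dates at once, the factor changes can be stacked into a matrix with one row per date, and `d_r @ H.T` then gives the rate changes for every date and tenor.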

<div class="alert alert-block alert-info"> Problem 6.3 </div>
Using the class `G2_FactorModel` and the function `errors_30Y_factors` from the `Treasury_CurveModel` notebook as a guide, fit models with 2 and 3 factors to predict the 30Y Treasury rate change.

Fit both models to the training data set

<div class="alert alert-block alert-info"> Problem 6.4 </div>
Compute the training and test mean square error of the 2 and 3 factor models

<div class="alert alert-block alert-info"> Problem 6.5 </div>
Which model, the 2-factor or the 3-factor one, has the lower mean squared error on the training data? And on the testing data?

<div class="alert alert-block alert-info"> Problem 6.6 </div>
Does the difference between the 2 and 3 factor models look significant?