<a href="https://colab.research.google.com/github/yexf308/AppliedStatistics/blob/main/Homework/HW3/565Sp23HW3Q3.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [2]:
%pylab inline 
import pandas as pd
from scipy import linalg
from itertools import combinations
import scipy
import scipy.io as io
import scipy.sparse as sparse
from sklearn import model_selection  # provides LeaveOneOut, used by generateLearningCurve below

Populating the interactive namespace from numpy and matplotlib


# Q3: Polynomial Regression and Learning Curve
Recall that polynomial regression learns a function $h_{\boldsymbol{\theta}}(x) = \theta_0 + \theta_1 x + \theta_2 x^2 + \ldots + \theta_d x^d$, where $d$ is the degree of the polynomial.  We can equivalently write this as a linear model
\begin{equation}
h_{\boldsymbol{\theta}}(x) = \theta_0 \phi_0(x) + \theta_1 \phi_1(x)  + \theta_2 \phi_2(x)  + \ldots + \theta_d \phi_d(x)  \enspace ,
\end{equation}
using the basis expansion $\phi_j(x) = x^j$.  With this basis expansion, we obtain a linear model whose features are powers of the single input variable $x$.  We're still solving a linear regression problem, but are fitting a polynomial function of the input.
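
As a small illustration of the equivalence above, the following sketch builds the feature matrix $[\phi_0(x), \ldots, \phi_d(x)]$ and checks that the linear model in those features matches the polynomial evaluated directly. The helper name `poly_basis` and the coefficient values are hypothetical, chosen only for this example. Unlike the ```polyfeatures``` method described below, this helper includes the zero-th power.

```python
import numpy as np

def poly_basis(x, d):
    """Return the n-by-(d+1) matrix [x^0, x^1, ..., x^d] for a 1-D array x."""
    x = np.asarray(x, dtype=float).reshape(-1, 1)
    powers = np.arange(d + 1)      # exponents 0, 1, ..., d
    return x ** powers             # broadcasting: (n, 1) ** (d+1,) -> (n, d+1)

theta = np.array([1.0, -2.0, 0.5])  # hypothetical theta_0, theta_1, theta_2
x = np.array([0.0, 1.0, 2.0])

Phi = poly_basis(x, d=2)
h_linear = Phi @ theta                                   # linear model in the features
h_direct = theta[0] + theta[1] * x + theta[2] * x ** 2   # polynomial evaluated directly
print(np.allclose(h_linear, h_direct))
```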

## Q3.1: Implement regularized polynomial regression (20pt)
You may implement it however you like, using a closed-form solution.  We've included an example closed-form implementation of linear regression (you are welcome to build upon this implementation, but make CERTAIN you understand it, since you'll need to change several lines of it).  You are also welcome to build upon your implementation from the previous assignment, but you must follow the API below.  Note that all matrices are actually 2D numpy arrays in the implementation.

-  ```__init__(degree=1, reg_lambda=1E-8)``` : constructor with arguments of $d$ and $\lambda$.

- ```fit(X,Y)```: method to train the polynomial regression model

- ```predict(X)```: method to use the trained polynomial regression model for prediction.

-  ```polyfeatures(X, degree)```: expands the given $n \times 1$ matrix $X$ into an $n \times d$ matrix of polynomial features of degree $d$.  Note that the returned matrix will not include the zero-th power.


Note that the ```polyfeatures(X, degree)``` function maps the original univariate data into its higher order powers.  Specifically, $X$ will be an $n \times 1$ matrix $(X \in \mathbb{R}^{n \times 1})$ and this function will return the polynomial expansion of this data, a $n \times d$ matrix.  Note that this function will **not** add in the zero-th order feature (i.e., $x_0 = 1$).  You should add the $x_0$ feature separately, outside of this function, before training the model.

Leaving the $x_0$ column out of the matrix returned by ```polyfeatures()``` keeps the function general, so it could be applied to multivariate data as well. (If it did add the $x_0$ feature, we'd end up with multiple columns of 1's for multivariate data.)

Also, notice that the resulting features will be badly scaled if we use them in raw form.  For example, with a polynomial of degree $d = 8$ and $x = 20$, the basis expansion yields $x^1 = 20$ while $x^8 = 2.56 \times 10^{10}$ -- an absolutely huge difference in range.  Consequently, we will need to standardize the data before solving linear regression.  Standardize the data in ```fit()``` after you perform the polynomial feature expansion.  In ```predict()```, apply the same standardization transformation (using the statistics computed in ```fit()```) to the expanded new data before applying the model.
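
To make the scaling issue concrete, here is a minimal sketch, assuming z-score standardization (subtract the column mean, divide by the column standard deviation). The input values and the variable names (`F`, `mu`, `sigma`) are made up for illustration and are not part of the required API.

```python
import numpy as np

x = np.array([[5.0], [10.0], [20.0]])   # hypothetical training inputs, shape (3, 1)
powers = np.arange(1, 9)                # exponents 1..8 (no zero-th power)
F = x ** powers                         # raw polynomial features, shape (3, 8)
print(F[2, 0], F[2, 7])                 # x^1 and x^8 for x = 20

# Statistics are computed on the training features only ...
mu = F.mean(axis=0)
sigma = F.std(axis=0)                   # assumed nonzero here; guard against 0 in real code
F_std = (F - mu) / sigma

# ... and the same mu and sigma are reused for new data at prediction time.
x_new = np.array([[15.0]])
F_new_std = (x_new ** powers - mu) / sigma
```

After standardization every column has mean 0 and standard deviation 1, so no single power dominates the regularization penalty.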


In [None]:
"""
      Sample implementation of linear regression using direct computation of the solution

"""
#-----------------------------------------------------------------
#  Class LinearRegression - Closed Form Implementation
#-----------------------------------------------------------------

class LinearRegressionClosedForm:

    def __init__(self, reg_lambda=1E-8):
        """
        Constructor
        """
        self.regLambda = reg_lambda
        self.theta = None

    def fit(self, X, y):
        """
            Trains the model
            Arguments:
                X is a n-by-d array
                y is an n-by-1 array
            Returns:
                No return value
        """
        n = len(X)

        # add 1s column
        X_ = np.c_[np.ones([n, 1]), X]

        n, d = X_.shape
        d = d-1  # remove 1 for the extra column of ones we added to get the original num features

        # construct reg matrix
        reg_matrix = self.regLambda * np.eye(d + 1)
        reg_matrix[0, 0] = 0

        # analytical solution (X'X + regMatrix)^-1 X' y
        self.theta = np.linalg.pinv(X_.T.dot(X_) + reg_matrix).dot(X_.T).dot(y)

    def predict(self, X):
        """
        Use the trained model to predict values for each instance in X
        Arguments:
            X is a n-by-d numpy array
        Returns:
            an n-by-1 numpy array of the predictions
        """
        n = len(X)

        # add 1s column
        X_ = np.c_[np.ones([n, 1]), X]

        # predict
        return X_.dot(self.theta)



#-----------------------------------------------------------------
#  End of Class LinearRegression - Closed Form Implementation
#-----------------------------------------------------------------

In [None]:
#-----------------------------------------------------------------
#  Class PolynomialRegression
#-----------------------------------------------------------------

class PolynomialRegression:

    def __init__(self, degree=1, reg_lambda=1E-8):
        """
        Constructor
        """
        #TODO

    def polyfeatures(self, X, degree):
        """
        Expands the given X into an n * d array of polynomial features of
            degree d.

        Returns:
            An n-by-d numpy array, with each row comprising
            X, X ** 2, X ** 3, ... up to the dth power of X.
            Note that the returned matrix will not include the zero-th power.

        Arguments:
            X is an n-by-1 column numpy array
            degree is a positive integer
        """
        #TODO

    def fit(self, X, y):
        """
            Trains the model
            Arguments:
                X is a n-by-1 array
                y is an n-by-1 array
            Returns:
                No return value
            Note:
                You need to apply polynomial expansion and scaling
                first
        """
        #TODO

    def predict(self, X):
        """
        Use the trained model to predict values for each instance in X
        Arguments:
            X is a n-by-1 numpy array
        Returns:
            an n-by-1 numpy array of the predictions
        """
        # TODO


#-----------------------------------------------------------------
#  End of Class PolynomialRegression
#-----------------------------------------------------------------


The following code tests your implementation by plotting the learned function.  In this case, the script fits a polynomial of degree $d=8$ with no regularization ($\lambda = 0$).  From the plot, we see that the function fits the training data well, but it is unlikely to generalize well to new data points.  Try increasing the amount of regularization, and examine the resulting effect on the function.
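
To get a feel for what increasing $\lambda$ does, the sketch below fits a degree-8 model to synthetic data (a noisy sine curve, not `polydata.dat`) and prints the norm of the non-intercept coefficients for several values of $\lambda$. The helper `ridge_fit` is a hypothetical stand-alone function that mirrors the closed-form solution in `LinearRegressionClosedForm`; it is not the required `PolynomialRegression` class.

```python
import numpy as np

def ridge_fit(F, y, lam):
    """Closed-form regularized fit on standardized features F (n-by-d)."""
    n, d = F.shape
    A = np.c_[np.ones((n, 1)), F]       # prepend the column of ones (x_0)
    R = lam * np.eye(d + 1)
    R[0, 0] = 0.0                       # the intercept is not penalized
    return np.linalg.pinv(A.T @ A + R) @ A.T @ y

# Synthetic data, assumed purely for illustration
rng = np.random.default_rng(0)
x = np.linspace(0, 1, 12).reshape(-1, 1)
y = np.sin(2 * np.pi * x) + 0.1 * rng.standard_normal(x.shape)

F = x ** np.arange(1, 9)                        # degree-8 expansion
F = (F - F.mean(axis=0)) / F.std(axis=0)        # standardize each column

lams = [0.0, 0.1, 1.0, 100.0]
norms = []
for lam in lams:
    theta = ridge_fit(F, y, lam)
    norms.append(np.linalg.norm(theta[1:]))     # exclude the intercept
    print(f"lambda={lam:>6}: ||theta[1:]|| = {norms[-1]:.3f}")
```

Larger values of $\lambda$ shrink the coefficients toward zero, which flattens the fitted curve.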

In [1]:
!wget https://raw.githubusercontent.com/yexf308/AppliedStatistics/main/Homework/HW3/polydata.dat?raw=true -O polydata.dat

--2022-12-12 19:49:26--  https://raw.githubusercontent.com/yexf308/AppliedStatistics/main/Homework/HW3/polydata.dat?raw=true
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.110.133, 185.199.109.133, 185.199.111.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.110.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 57 [text/plain]
Saving to: ‘polydata.dat’


2022-12-12 19:49:26 (1.65 MB/s) - ‘polydata.dat’ saved [57/57]



In [4]:
allData = np.loadtxt('polydata.dat', delimiter=',')

X = allData[:, [0]]
y = allData[:, [1]]

# regression with degree = d
d = 8
model = PolynomialRegression(degree=d, reg_lambda=0)
model.fit(X, y)

# output predictions
xpoints = np.linspace(np.min(X), np.max(X), 100).reshape(-1, 1)
ypoints = model.predict(xpoints)

# plot curve
plt.figure()
plt.plot(X, y, 'rx')
plt.title('PolyRegression with d = '+str(d))
plt.plot(xpoints, ypoints, 'b-')
plt.xlabel('X')
plt.ylabel('Y')
plt.show()

## Q3.2: Learning Curve (20pt)

In this problem we will examine the bias-variance tradeoff through learning curves, which provide a valuable mechanism for evaluating it. Implement the ```learningCurve()``` function to compute the learning curves for a given training/test set.  The ```learningCurve(Xtrain, Ytrain, Xtest, Ytest, reg_lambda, degree)``` function should take in the training data (```Xtrain```, ```Ytrain```), the testing data (```Xtest```, ```Ytest```), the regularization parameter $\lambda$, and the polynomial degree $d$, in that order.

The function should return two arrays, ```errorTrain``` (the array of training errors) and ```errorTest``` (the array of testing errors).  Entry $i$ of each array (indexing from 0) should hold the training error (or testing error) for learning with $i+1$ training instances.  Note that index 0 won't actually matter, since we typically start displaying the learning curves with two or more instances.

When computing the learning curves, you should learn on ```Xtrain```[0:$i$] for $i = 1, \ldots, \text{numInstances}(\texttt{Xtrain})$, storing the results at index $i-1$ and each time computing the testing error over the **entire** test set.  There is no need to shuffle the training data, or to average the error over multiple trials -- just produce the learning curves for the given training/testing sets with the instances in their given order.  Recall that the error for regression problems is given by
\begin{equation}
\frac{1}{n} \sum_{i=1}^n (h_{\boldsymbol{\theta}}(\mathbf{x}_i) - y_i)^2 \enspace .
\end{equation}
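
The error formula above can be sketched in NumPy as follows. The helper name `mean_squared_error` and the sample values are assumptions made for this example. Both inputs are flattened first, because subtracting an $n \times 1$ array from a length-$n$ array would silently broadcast to an $n \times n$ matrix.

```python
import numpy as np

def mean_squared_error(y_pred, y_true):
    """Average squared difference between predictions and targets."""
    y_pred = np.asarray(y_pred, dtype=float).reshape(-1)   # flatten to shape (n,)
    y_true = np.asarray(y_true, dtype=float).reshape(-1)
    return np.mean((y_pred - y_true) ** 2)

y_true = np.array([[1.0], [2.0], [3.0]])   # hypothetical targets, shape (3, 1)
y_pred = np.array([[1.0], [2.5], [2.0]])   # hypothetical predictions, shape (3, 1)
err = mean_squared_error(y_pred, y_true)   # (0 + 0.25 + 1) / 3
print(err)
```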


In [None]:
def learningCurve(Xtrain, Ytrain, Xtest, Ytest, reg_lambda, degree):
    """
    Compute learning curve

    Arguments:
        Xtrain -- Training X, n-by-1 matrix
        Ytrain -- Training y, n-by-1 matrix
        Xtest -- Testing X, m-by-1 matrix
        Ytest -- Testing Y, m-by-1 matrix
        reg_lambda -- regularization factor
        degree -- polynomial degree

    Returns:
        errorTrain -- errorTrain[i] is the training error using
        model trained by Xtrain[0:(i+1)]
        errorTest -- errorTest[i] is the testing error using
        model trained by Xtrain[0:(i+1)]

    Note:
        errorTrain[0] and errorTest[0] won't actually matter, since we start displaying the learning curve at n = 2 (or higher)
    """

    n = len(Xtrain)

    errorTrain = np.zeros(n)
    errorTest = np.zeros(n)

    #TODO -- complete rest of method; errorTrain and errorTest are already the correct shape

    return errorTrain, errorTest


Once the function is written to compute the learning curves, you should test with the following code to plot the learning curves for various values of $\lambda$ and $d$. 

You should see plots similar to the following

<img src="https://github.com/yexf308/AppliedStatistics/blob/main/image/learning.png?raw=true" width="700" />




In [None]:
#----------------------------------------------------
# Plotting tools

def plotLearningCurve(errorTrain, errorTest, regLambda, degree):
    """
        plot computed learning curve
    """
    minX = 3
    maxY = max(errorTest[minX+1:])

    xs = np.arange(len(errorTrain))
    plt.plot(xs, errorTrain, 'r-o')
    plt.plot(xs, errorTest, 'b-o')
    plt.plot(xs, np.ones(len(xs)), 'k--')
    plt.legend(['Training Error', 'Testing Error'], loc='best')
    plt.title('Learning Curve (d='+str(degree)+', lambda='+str(regLambda)+')')
    plt.xlabel('Training samples')
    plt.ylabel('Error')
    plt.yscale('log')
    plt.ylim(top=maxY)
    plt.xlim((minX, 10))


def generateLearningCurve(X, y, degree, regLambda):
    """
        computing learning curve via leave one out CV
    """

    n = len(X)

    errorTrains = np.zeros((n, n-1))
    errorTests = np.zeros((n, n-1))

    loo = model_selection.LeaveOneOut()
    itrial = 0
    for train_index, test_index in loo.split(X):
        #print("TRAIN indices:", train_index, "TEST indices:", test_index)
        X_train, X_test = X[train_index], X[test_index]
        y_train, y_test = y[train_index], y[test_index]

        (errTrain, errTest) = learningCurve(X_train, y_train, X_test, y_test, regLambda, degree)

        errorTrains[itrial, :] = errTrain
        errorTests[itrial, :] = errTest
        itrial = itrial + 1

    errorTrain = errorTrains.mean(axis=0)
    errorTest = errorTests.mean(axis=0)

    plotLearningCurve(errorTrain, errorTest, regLambda, degree)



In [None]:
allData = np.loadtxt('polydata.dat', delimiter=',')

X = allData[:, [0]]
y = allData[:, [1]]

# generate Learning curves for different params
plt.figure(figsize=(15, 9), dpi=100)
plt.subplot(2, 3, 1)
generateLearningCurve(X, y, 1, 0)
plt.subplot(2, 3, 2)
generateLearningCurve(X, y, 4, 0)
plt.subplot(2, 3, 3)
generateLearningCurve(X, y, 8, 0)
plt.subplot(2, 3, 4)
generateLearningCurve(X, y, 8, .1)
plt.subplot(2, 3, 5)
generateLearningCurve(X, y, 8, 1)
plt.subplot(2, 3, 6)
generateLearningCurve(X, y, 8, 100)
plt.show()


Notice the following:

- The y-axis is using a log-scale and the ranges of the y-scale are all different for the plots.  The dashed black line indicates the $y=1$ line as a point of reference between the plots.
- The plot of the unregularized model with $d = 1$ shows poor training error, indicating a high bias (i.e., it is a standard univariate linear regression fit).
- The plot of the unregularized model ($\lambda = 0$) with $d = 8$ shows that the training error is low, but that the testing error is high.  There is a huge gap between the training and testing errors caused by the model overfitting the training data, indicating a high variance problem.
- As the regularization parameter increases (e.g., $\lambda = 1$) with $d = 8$, we see that the gap between the training and testing error narrows, with both the training and testing errors converging to a low value.  We can see that the model fits the data well and generalizes well, and therefore does not have either a high bias or a high variance problem.  Effectively, it has a good tradeoff between bias and variance.
- Once the regularization parameter is too high ($\lambda = 100$), we see that the training and testing errors are once again high, indicating a poor fit.  Effectively, there is too much regularization, resulting in high bias.


Make absolutely certain that you understand these observations, and how they relate to the learning curve plots.  In practice, we can choose the value for $\lambda$ via cross-validation to achieve the best bias-variance tradeoff.
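
One way to choose $\lambda$ by cross-validation is sketched below, using leave-one-out splits written in plain NumPy. This is an illustrative sketch under stated assumptions, not the required solution: the data are synthetic (a noisy sine curve), and the helpers `fit_predict` and `loo_cv_error` and the candidate grid `lambdas` are all hypothetical.

```python
import numpy as np

def fit_predict(x_tr, y_tr, x_te, degree, lam):
    """Expand, standardize with training statistics, fit, and predict on x_te."""
    powers = np.arange(1, degree + 1)
    F_tr = x_tr ** powers
    mu, sigma = F_tr.mean(axis=0), F_tr.std(axis=0)
    sigma[sigma == 0] = 1.0                       # avoid division by zero
    A_tr = np.c_[np.ones((len(x_tr), 1)), (F_tr - mu) / sigma]
    A_te = np.c_[np.ones((len(x_te), 1)), (x_te ** powers - mu) / sigma]
    R = lam * np.eye(degree + 1)
    R[0, 0] = 0.0                                 # do not penalize the intercept
    theta = np.linalg.pinv(A_tr.T @ A_tr + R) @ A_tr.T @ y_tr
    return A_te @ theta

def loo_cv_error(x, y, degree, lam):
    """Mean squared error over all leave-one-out splits."""
    n = len(x)
    errs = []
    for i in range(n):
        mask = np.arange(n) != i                  # hold out instance i
        pred = fit_predict(x[mask], y[mask], x[~mask], degree, lam)
        errs.append(((pred - y[~mask]) ** 2).item())
    return float(np.mean(errs))

# Synthetic data, assumed purely for illustration
rng = np.random.default_rng(0)
x = np.linspace(0, 1, 12).reshape(-1, 1)
y = np.sin(2 * np.pi * x) + 0.1 * rng.standard_normal(x.shape)

lambdas = [1e-3, 1e-2, 1e-1, 1.0, 10.0, 100.0]    # hypothetical candidate grid
cv = {lam: loo_cv_error(x, y, degree=8, lam=lam) for lam in lambdas}
best_lam = min(cv, key=cv.get)
print("best lambda:", best_lam)
```

Note that the standardization statistics are recomputed inside each split, so the held-out instance never influences the fit.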