# Linear Regression

Linear regression is a predictive model that assumes the outcome has a linear relationship with the observed signals, i.e. $\hat{y} = w^Tx$, where $\hat{y}$ is the outcome predicted for the given observation vector $x = (x_1, x_2, ..., x_k)$ and $w = (w_1, w_2, ..., w_k)$ are the learned weights of the model. A bias term can be absorbed into $w$ by prepending a constant 1 to $x$, which is what the code below does.
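A small worked example of the prediction $\hat{y} = w^Tx$ with $k = 3$. The weights, the input, and the bias `b` are arbitrary values chosen only for illustration.

```python
import numpy as np

# Hypothetical weights and observation, chosen only for illustration.
w = np.array([0.5, -1.0, 2.0])   # w = (w_1, w_2, w_3)
x = np.array([1.0, 3.0, 0.5])    # x = (x_1, x_2, x_3)

# The prediction is the inner product w^T x.
y_hat = w @ x
print(y_hat)  # 0.5*1 - 1*3 + 2*0.5 = -1.5

# A bias b can be absorbed into the weights by prepending a constant 1 to x.
b = 0.25
w_aug = np.concatenate(([b], w))
x_aug = np.concatenate(([1.0], x))
print(w_aug @ x_aug)  # 0.25 + (-1.5) = -1.25
```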

![linear regression](./imgs/linear_regression.png)

To quantify how well the model predicts, the mean squared error (MSE) over the $n$ training examples is used as the cost function, $C = \frac{1}{2n}\sum_m (y^{(m)} - \hat{y}^{(m)})^2$. Usually, a regularization term is added to make the model less likely to overfit the training data; with L2 regularization $\frac{\lambda}{2}||w||_2^2$, the cost function becomes $C = \frac{1}{2n} \sum_m (y^{(m)} - \hat{y}^{(m)})^2 + \frac{\lambda}{2}||w||_2^2$. For a single example, the partial derivative with respect to each weight $w_i$ is $-(y - \hat{y})x_i + \lambda w_i$. If the weights are updated with a mini-batch of $n$ examples by summing the per-example gradients, the gradient for $w_i$ is $-\sum_m (y^{(m)} - \hat{y}^{(m)})x_i^{(m)} + n\lambda w_i$, which is $n$ times $\partial C / \partial w_i$; the factor $n$ only rescales the learning rate.
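A finite-difference check is a quick way to confirm the gradient above. The sketch below uses synthetic data and an arbitrary $\lambda$; the function names `cost`, `grad`, and `numerical_grad` are hypothetical. It compares the analytical gradient $\frac{\partial C}{\partial w_i} = -\frac{1}{n}\sum_m (y^{(m)} - \hat{y}^{(m)})x_i^{(m)} + \lambda w_i$ against central differences, then checks that summing the per-example gradients gives $n$ times that value.

```python
import numpy as np

def cost(w, X, y, lam):
    """C = 1/(2n) * sum_m (y_m - w^T x_m)^2 + lam/2 * ||w||_2^2"""
    n = X.shape[0]
    residuals = y - X @ w
    return residuals @ residuals / (2 * n) + lam / 2 * (w @ w)

def grad(w, X, y, lam):
    """Analytical gradient: -(1/n) * sum_m (y_m - y_hat_m) * x_m + lam * w"""
    n = X.shape[0]
    return -X.T @ (y - X @ w) / n + lam * w

def numerical_grad(w, X, y, lam, eps=1e-6):
    """Central finite differences, one coordinate at a time."""
    g = np.zeros_like(w)
    for i in range(w.size):
        step = np.zeros_like(w)
        step[i] = eps
        g[i] = (cost(w + step, X, y, lam) - cost(w - step, X, y, lam)) / (2 * eps)
    return g

rng = np.random.default_rng(0)
X = rng.normal(size=(8, 4))   # n = 8 examples, k = 4 features
y = rng.normal(size=8)
w = rng.normal(size=4)
lam = 0.1

max_err = np.max(np.abs(grad(w, X, y, lam) - numerical_grad(w, X, y, lam)))
print("max abs difference:", max_err)  # should be close to zero

# Summing per-example gradients over the mini-batch gives n * dC/dw.
n = X.shape[0]
per_example_sum = sum(-(y[m] - X[m] @ w) * X[m] + lam * w for m in range(n))
print(np.allclose(per_example_sum, n * grad(w, X, y, lam)))  # True
```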

```python
import numpy as np

class LinearRegression:

    def __init__(self, ndim, l2_weight):
        self.W = np.random.randn(ndim + 1, 1)  # W[0] is the weight for the bias term
        self.l2_weight = l2_weight

    def predict(self, X):
        """Predict given a batch of inputs X of shape (n_batch, ndim)."""
        bias = np.ones((X.shape[0], 1))  # prepend a constant 1 to each input as the bias term
        X = np.concatenate((bias, X), axis=1)
        return X.dot(self.W)  # y_hat = w^T x for every row, dim: (n_batch, 1)

    def train(self, X, y, lr):
        """Update the model weights given a batch of training instances."""
        outputs = self.predict(X)  # dim: (n_batch, 1)
        pred_diffs = outputs - np.expand_dims(y, axis=1)  # y_hat - y, dim: (n_batch, 1)
        bias = np.ones((X.shape[0], 1))  # prepend a constant 1 to each input as the bias term
        X_with_bias = np.concatenate((bias, X), axis=1)  # dim: (n_batch, ndim+1)
        # Sum of per-example gradients: -sum_m (y - y_hat) * x_i + n * l2_weight * w_i
        dW = np.sum(pred_diffs * X_with_bias + self.l2_weight * self.W.T, axis=0)
        self.W -= lr * np.expand_dims(dW, axis=1)
        # Absolute value of the mean residual. Positive and negative errors can cancel,
        # so this is a rough progress signal rather than the MSE cost C.
        return abs(np.sum(pred_diffs) / len(pred_diffs))
```
    

```python
n_dim, n_batch = 15, 10

model = LinearRegression(ndim=n_dim, l2_weight=0.01)
X = np.random.randn(2, n_dim)
print("model.predict:", model.predict(X))

# Train the model to predict sum(x).
for it in range(100):
    X = np.random.randn(n_batch, n_dim)
    y = np.sum(X, axis=1)
    loss = model.train(X, y, 0.01)
    if it % 10 == 0:
        print("iteration %d, loss: %.5f" % (it + 1, loss))
```


```
model.predict: [[-4.14617347]
 [-1.56619301]]
iteration 1, loss: 0.47604
iteration 11, loss: 0.59036
iteration 21, loss: 0.28405
iteration 31, loss: 0.05715
iteration 41, loss: 0.01114
iteration 51, loss: 0.01075
iteration 61, loss: 0.00456
iteration 71, loss: 0.01138
iteration 81, loss: 0.00357
iteration 91, loss: 0.00009
```
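The logged `loss` is the absolute value of the mean residual, so errors of opposite sign can cancel and a small value does not by itself show accurate predictions. A held-out MSE is a more direct check. The sketch below is a self-contained re-implementation of the same update rule on the same task; the seeded generator, the 300 iterations, the 1000 test points, and the helper `add_bias` are choices made for this illustration, not part of the original notebook.

```python
import numpy as np

rng = np.random.default_rng(0)
n_dim, n_batch, lr, l2_weight = 15, 10, 0.01, 0.01

W = rng.normal(size=n_dim + 1)  # bias weight first, as in LinearRegression

def add_bias(X):
    """Hypothetical helper: prepend a column of ones for the bias term."""
    return np.concatenate((np.ones((X.shape[0], 1)), X), axis=1)

for _ in range(300):
    X = rng.normal(size=(n_batch, n_dim))
    y = X.sum(axis=1)                                   # target: sum of the inputs
    Xb = add_bias(X)
    pred_diffs = Xb @ W - y                             # y_hat - y
    dW = Xb.T @ pred_diffs + n_batch * l2_weight * W    # matrix form of dW in train()
    W -= lr * dW

# Evaluate on fresh data that was never used for training.
test_X = rng.normal(size=(1000, n_dim))
test_y = test_X.sum(axis=1)
residuals = test_y - add_bias(test_X) @ W
print("held-out MSE:", np.mean(residuals ** 2))
print("|mean residual|:", abs(residuals.mean()))
```

Because of the L2 penalty, the learned input weights settle slightly below 1 rather than exactly at 1, so the held-out MSE stays small but nonzero.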
