<a href="https://colab.research.google.com/github/vikrantmehta123/ML-Algs/blob/main/Linear_Regression_on_Boston_Housing.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Linear Regression on Boston Housing Dataset

### Getting the Dataset:
The cell below loads the dataset from the Keras library and splits it into training and test data.
* `Training_data` = Training data matrix of shape $(n, d)$
* `labels` = label vector corresponding to the training data
* `test_data` = Test data matrix of shape $(n_1, d)$ where $n_1$ is the number of examples in test dataset.
* `test_labels` = label vector corresponding to the test data

In [None]:
import numpy as np
from keras.datasets import boston_housing

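# Load the train/test split with a fixed seed for reproducibility.
Train, test = boston_housing.load_data(seed=111)
Training_data, labels = Train[0], Train[1]
# Lowercase `test_data` matches the name used in the later cells.
test_data, test_labels = test[0], test[1]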

Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/boston_housing.npz


### Add Dummy Feature to the Dataset

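Run the cell below to prepend a dummy feature (a column of ones, so that the first weight acts as the intercept) to the training matrix `Training_data` and the test matrix `test_data`. Note that `X` is then transposed to shape $(d+1, n)$, while `X_test` keeps shape $(n_1, d+1)$.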

In [None]:
# Column of ones for the intercept term.
dummy_feature_train = np.ones(Training_data.shape[0])
dummy_feature_test = np.ones(test_data.shape[0])
X = np.column_stack((dummy_feature_train, Training_data))
X_test = np.column_stack((dummy_feature_test, test_data))
# Store training examples as columns: X has shape (d + 1, n).
X = X.T
X.shape
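
To make the resulting layout concrete, here is a minimal sketch on a made-up matrix with 5 examples and 3 features. The names `toy_features` and `toy_X` are hypothetical and are not part of the notebook.

```python
import numpy as np

# Hypothetical toy feature matrix: 5 examples, 3 features.
toy_features = np.arange(15, dtype=float).reshape(5, 3)

# Prepend a column of ones so the first weight acts as the intercept.
toy_X = np.column_stack((np.ones(toy_features.shape[0]), toy_features))

# Transpose so that each column is one example, as done for X above.
toy_X = toy_X.T
print(toy_X.shape)  # (4, 5)
```

The first row of `toy_X` is all ones, and each of the 5 columns holds one example with its 3 features.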

## Section 1: Normal Equation

### Weight Vector $\vec{w}$

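Using the normal equation for linear regression, compute the weight vector as

$$ \vec{w} = (X \cdot X^T)^{-1} \cdot (X \cdot y) $$

Here $X$ is the $(d+1) \times n$ matrix built above and $y$ is the label vector `labels`. The cell below uses the pseudoinverse `np.linalg.pinv` in place of the plain inverse, which gives the same result when $X \cdot X^T$ is invertible and still returns a least-squares solution when it is not.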



In [None]:
w = np.linalg.pinv(X @ X.T) @ X @ labels
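
As a sanity check on the formula, the following sketch compares the normal-equation solution with NumPy's least-squares solver on synthetic, noiseless data. The data, `true_w`, and the `_demo` names are assumptions made for illustration only; they do not come from the Boston housing dataset.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data (an assumption for illustration): 50 examples, 3 features.
n, d = 50, 3
features = rng.normal(size=(n, d))
true_w = np.array([2.0, -1.0, 0.5, 3.0])  # intercept first

# Same layout as the notebook: shape (d + 1, n), examples as columns.
X_demo = np.column_stack((np.ones(n), features)).T
y_demo = X_demo.T @ true_w  # noiseless labels

# Normal equation, written exactly as in the cell above.
w_normal = np.linalg.pinv(X_demo @ X_demo.T) @ X_demo @ y_demo

# Least-squares solver applied to the (n, d + 1) design matrix.
w_lstsq, *_ = np.linalg.lstsq(X_demo.T, y_demo, rcond=None)

print(np.allclose(w_normal, w_lstsq))
```

Because the labels are noiseless, both solutions should also agree with `true_w` up to floating-point error.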

### Prediction

Using $X^T \cdot \vec{w} $, predict on the training data.


In [None]:
predictions = X.T @ w

### Training Loss

Using the predictions computed in the above cell, compute the loss on the training data. Consider the loss to be defined as:

$$ \sqrt{\dfrac{1}{n}\sum\limits_{i=1}^{n} (y_i- \hat{y}_i)^2}
$$

Where $\hat{y}_i$ is the prediction for $i^{th}$ data point. 



In [None]:
loss = np.sqrt(np.mean((labels - predictions) ** 2))
loss
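
Since the same loss is reused for every model in this notebook, it could be wrapped in a small helper. This is only a sketch; the function name `rmse` is hypothetical and the notebook itself computes the expression inline.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean squared error between two 1-D arrays of equal length."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return np.sqrt(np.mean((y_true - y_pred) ** 2))

# The errors are 0, 3 and 4, so the result is sqrt((0 + 9 + 16) / 3).
example_loss = rmse([1.0, 2.0, 3.0], [1.0, 5.0, 7.0])
print(example_loss)
```

With such a helper, the training loss above would be `rmse(labels, predictions)`.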

### Test Loss

Predict on the test dataset and compute the test loss, defined as

$$ \sqrt{\dfrac{1}{n_1}\sum\limits_{i=1}^{n_1} (y_i- \hat{y}_i)^2}
$$

where $\hat{y}_i$ is the prediction for the $i^{th}$ test data point and $n_1$ is the number of test examples.



In [None]:
test_predictions = X_test @ w
test_loss = np.sqrt(np.mean((test_labels - test_predictions) ** 2))
test_loss

5.327662216177319

## Section 2: Gradient Descent

Find the weight vector using gradient descent, starting from the zero vector, with a constant learning rate of $\eta = 10^{-10}$ and 100 iterations.



In [None]:
def gradient_descent(X, y, iters, w, lr):
    for i in range(iters):
        w = w - lr * (2 * (X @ X.T) @ w  -  2 * X @ y)
    return w
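
The update inside `gradient_descent` can be read as a step along the negative gradient of the sum of squared errors. The notebook does not state this objective explicitly, so the derivation below is inferred from the code. Here $x_i$ denotes the $i^{th}$ column of $X$.

$$
\begin{aligned}
L(\vec{w}) &= \sum_{i=1}^{n} (x_i^T \vec{w} - y_i)^2 = (X^T \vec{w} - y)^T (X^T \vec{w} - y) \\
\nabla L(\vec{w}) &= 2 X (X^T \vec{w} - y) = 2 X X^T \vec{w} - 2 X y \\
\vec{w}_{t+1} &= \vec{w}_t - \eta \, \nabla L(\vec{w}_t)
\end{aligned}
$$

This is the gradient of the summed squared error rather than of the root-mean-square loss used for evaluation; both are minimised by the same $\vec{w}$, because the latter is a monotone function of the former.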

In [None]:
w = np.zeros(X.shape[0])
w = gradient_descent(X, labels, 100, w, 1e-10)
w

0.058959061195902614

### Training Loss

Run the cell below to compute the training loss of the model learned by gradient descent. The loss is defined as

$$ \sqrt{\dfrac{1}{n}\sum\limits_{i=1}^{n} (y_i- \hat{y}_i)^2}
$$

where $\hat{y}_i$ is the prediction for the $i^{th}$ data point.



In [None]:
predictions = X.T @ w
np.sqrt(np.mean((predictions - labels)**2))

11.13727323702196

### Test Loss

Run the cell below to compute the test loss of the model learned by gradient descent. The loss is defined as

$$ \sqrt{\dfrac{1}{n_1}\sum\limits_{i=1}^{n_1} (y_i- \hat{y}_i)^2}
$$

where $\hat{y}_i$ is the prediction for the $i^{th}$ test data point and $n_1$ is the number of test examples.



In [None]:
test_predictions = X_test @ w
np.sqrt(np.mean((test_labels - test_predictions)**2))

10.964491250062146

## Section 3: Stochastic Gradient Descent

Run the cells below to find the weight vector using stochastic (mini-batch) gradient descent. A constant learning rate of $\eta = 10^{-10}$ is used for 1000 iterations, and the batch size is $\lfloor n/5 \rfloor$, where $n$ is the number of training samples (the code uses integer division). In the $i^{th}$ iteration, the batch is sampled with a random generator seeded with $i$. The final weight vector is the one obtained after the last update.



In [None]:
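def stochastic_gradient_descent(X, y, w, batch_size, iters, lr):
    for i in range(iters):
        # Reseed with the iteration index so each batch is reproducible.
        rng = np.random.default_rng(seed=i)
        # Draw batch indices (with replacement) from the n training columns.
        indices = rng.integers(0, X.shape[1], size=batch_size)
        samples = X[:, indices]
        sample_labels = y[indices]
        # Gradient step on the batch's sum of squared errors.
        w = w - lr * (2 * (samples @ samples.T) @ w - 2 * (samples @ sample_labels))
    return w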
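
In the function above, `rng.integers` draws each index independently, so the same training example can appear more than once in a batch. The sketch below contrasts this with sampling without replacement through `rng.choice`. The sizes are hypothetical and chosen only for illustration; the alternative is not what the notebook uses.

```python
import numpy as np

n_examples, batch = 10, 8  # hypothetical sizes for illustration

rng = np.random.default_rng(seed=0)
# Independent draws from {0, ..., n_examples - 1}: duplicates are possible.
with_replacement = rng.integers(0, n_examples, size=batch)

rng = np.random.default_rng(seed=0)
# Sampling without replacement: every index appears at most once.
without_replacement = rng.choice(n_examples, size=batch, replace=False)

print(with_replacement)
print(without_replacement)
print(len(np.unique(without_replacement)) == batch)  # True
```

Either choice gives a valid mini-batch gradient estimate; sampling with replacement is simply what the code in this section does.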


In [None]:
# Initialise the batch size (floor of n / 5) and the weight vector.
batch_size = X.shape[1] // 5
w = np.zeros(X.shape[0])

w = stochastic_gradient_descent(X, labels, w, batch_size, 1000, 1e-10)
w

0.10124206187312643

### Training Loss:

Run the cell below to compute the training loss of the model learned by stochastic gradient descent. The loss is defined as

$$ \sqrt{\dfrac{1}{n}\sum\limits_{i=1}^{n} (y_i- \hat{y}_i)^2}
$$

where $\hat{y}_i$ is the prediction for the $i^{th}$ data point.



In [None]:
train_predictions = X.T @ w
np.sqrt(np.mean((labels - train_predictions) ** 2))

8.6246359251934

### Test Loss

Run the cell below to compute the test loss of the model learned by stochastic gradient descent. The loss is defined as

$$ \sqrt{\dfrac{1}{n_1}\sum\limits_{i=1}^{n_1} (y_i- \hat{y}_i)^2}
$$

where $\hat{y}_i$ is the prediction for the $i^{th}$ test data point and $n_1$ is the number of test examples.


In [None]:
predictions = X_test @ w
np.sqrt(np.mean((test_labels - predictions) ** 2))

8.352305624451189