## Linear Regression II

This guide dives deeper into the nuts and bolts of implementing a linear regression model for prediction using PyTorch. While the conceptual basis is identical to that covered in [Linear Regression I](https://williampangbest1.github.io/machine_learning/linear_regressionI.html), we will expand our model to a more generalizable case using matrix operations. This guide closely follows d2l's [implementation](https://d2l.ai/chapter_linear-regression/linear-regression-scratch.html) of linear regression, which uses object-oriented programming (OOP).

Here's a sketch of what we'll be doing:

- **Defining helpers needed for OOP**<br>
We will define two helpers: a function that lets us add methods to a class *after the class has been created*, and a utility class that automatically saves the input arguments of a class's `__init__` method as attributes.

- **Creating a synthetic dataset**<br>
Once again, we will create a dataset from user-defined "ground truth" parameters. We'll also define a method that splits the data into random batches (more on this in the future; processing data in smaller batches can be much more efficient, especially when computation is parallelized).

- **Defining the model**<br>
The model relates the input and parameters to the output. We will be using the vectorized equation $\mathbf{\hat{y}} = \mathbf{Xw} + b$.

- **Defining the loss function**<br>
The loss function tells us how well our predictions are doing: it measures the distance between the prediction and the "true" expected output. We will use the vectorized form of the mean squared error. In addition, we will write a class that updates the parameters using the gradient of the loss with respect to those parameters, computed on a random batch of data (this is minibatch stochastic gradient descent).

- **Training the model**<br>
This is where we make predictions and iteratively adjust our parameter estimates so that they get closer and closer to the "ground truth". To do so, we:
    - Initialize the parameters (weights $\mathbf{w}$ and bias $b$)
    - Repeat until done:
        - Compute the gradient of the loss (i.e., its derivative with respect to the parameters) on each batch
        - Update our parameters
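For reference, here is one minibatch SGD step written out, as a sketch in d2l-style notation: $\mathcal{B}$ is a random batch, $\eta$ is the learning rate, and the loss is the half squared error $\frac{1}{2}(\hat{y}^{(i)} - y^{(i)})^2$ averaged over the batch, with $\hat{y}^{(i)} = \mathbf{w}^\top \mathbf{x}^{(i)} + b$.

$$
\begin{aligned}
\mathbf{w} &\leftarrow \mathbf{w} - \frac{\eta}{|\mathcal{B}|} \sum_{i \in \mathcal{B}} \mathbf{x}^{(i)} \left(\mathbf{w}^\top \mathbf{x}^{(i)} + b - y^{(i)}\right) \\
b &\leftarrow b - \frac{\eta}{|\mathcal{B}|} \sum_{i \in \mathcal{B}} \left(\mathbf{w}^\top \mathbf{x}^{(i)} + b - y^{(i)}\right)
\end{aligned}
$$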


### Object-Oriented Programming
These two helpers come from d2l's *Dive into Deep Learning*. For more information, see this [chapter](https://d2l.ai/chapter_linear-regression/oo-design.html) from [d2l](https://d2l.ai) on object-oriented design.

First, we define an `add_method` function. It lets us add methods (functions defined inside a class) to a class after the class has been created. This aids understanding, since we can build our tools modularly by adding methods one at a time.

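```python
def add_method(Class):
    """Decorator that registers the decorated function as a method of `Class`."""
    def wrapper(obj):
        setattr(Class, obj.__name__, obj)  # attach the function to the class
        return obj  # keep the function available under its own name as well
    return wrapper

# To add a method to a class:
# @add_method(name_of_class)
# def name_of_method(self):
#     ...
```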
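As a quick illustration (the `Greeter` class and `greet` method are hypothetical, made up for this example), here is a method being attached after the class, and even after an instance, already exists. Because `setattr` modifies the class itself, existing instances pick up the new method too.

```python
def add_method(Class):
    """Register the decorated function as a method of Class."""
    def wrapper(obj):
        setattr(Class, obj.__name__, obj)
        return obj
    return wrapper

class Greeter:  # hypothetical example class
    def __init__(self, name):
        self.name = name

g = Greeter('Ada')  # instance created before the method exists

@add_method(Greeter)
def greet(self):
    return f'Hello, {self.name}!'

print(g.greet())  # Hello, Ada!
```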

Next, we define a `HyperParameters` utility class. Its `save_hyperparameters` method saves all arguments of a class's `__init__` method as attributes, which simplifies the code.

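```python
import inspect

class HyperParameters:
    """Utility base class that saves the arguments of __init__ as attributes."""
    def save_hyperparameters(self, ignore=()):
        # Look at the frame of the caller, i.e. the __init__ that called us
        frame = inspect.currentframe().f_back
        _, _, _, local_vars = inspect.getargvalues(frame)
        skip = set(ignore) | {'self'}
        self.hparams = {k: v for k, v in local_vars.items()
                        if k not in skip and not k.startswith('_')}
        for k, v in self.hparams.items():
            setattr(self, k, v)  # e.g. self.lr = lr
```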

To illustrate the purpose of this utility class, let's look at two cases.

```python
# Case 1: assign each argument to an attribute by hand
class printmeplease:
    def __init__(self, first_word, second_word):
        self.first_word = first_word    # store it as an attribute
        self.second_word = second_word  # also store it as an attribute
        print(self.first_word + self.second_word)

output = printmeplease(first_word='Hello ', second_word='World')
```

```
Hello World
```

```python
# Case 2: let save_hyperparameters() do the assignments
class printmeplease(HyperParameters):
    def __init__(self, first_word, second_word):
        self.save_hyperparameters()
        # No longer needed:
        # self.first_word = first_word
        # self.second_word = second_word
        print(self.first_word + self.second_word)

output = printmeplease(first_word='Hello ', second_word='World')
```

```
Hello World
```

As **Case 2** shows, calling `self.save_hyperparameters()` turns every input argument into an attribute in a single line, which makes our code more compact.
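The `ignore` argument and the `hparams` dictionary are not exercised above. The sketch below (the `Trainer` class and its arguments are hypothetical) shows that arguments listed in `ignore` are skipped, while the rest are collected in `self.hparams` and set as attributes.

```python
import inspect

class HyperParameters:
    def save_hyperparameters(self, ignore=()):
        frame = inspect.currentframe().f_back  # the caller's frame (its __init__)
        _, _, _, local_vars = inspect.getargvalues(frame)
        skip = set(ignore) | {'self'}
        self.hparams = {k: v for k, v in local_vars.items()
                        if k not in skip and not k.startswith('_')}
        for k, v in self.hparams.items():
            setattr(self, k, v)

class Trainer(HyperParameters):  # hypothetical example class
    def __init__(self, lr, num_epochs, verbose=False):
        self.save_hyperparameters(ignore=['verbose'])

t = Trainer(lr=0.03, num_epochs=10)
print(t.hparams)              # {'lr': 0.03, 'num_epochs': 10}
print(hasattr(t, 'verbose'))  # False
```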

### Creating a synthetic dataset

First, we need some ground truth values. Recall that these are the values our algorithm is trying to "guess".

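As a reminder, the linear regression model with two features is
$\hat{y} = w_1x_1 + w_2x_2 + b$, which generalizes to $\mathbf{\hat{y}} = \mathbf{Xw} + b$ for any number of features.

The tensor `w = torch.tensor([2, -3.4])` holds the ground-truth weights $w_1 = 2$ and $w_2 = -3.4$, and `b = 4.2` is the ground-truth bias.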
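To see that the vectorized form computes the same thing as the per-sample formula, here is a small check on three made-up samples (the toy `X` values are illustrative, not taken from the dataset). Here `w` is 1-D, so `X @ w` has shape `(3,)`; the data generator below reshapes `w` into a column to get an `(n, 1)` output instead.

```python
import torch

w = torch.tensor([2, -3.4])   # ground-truth weights w1, w2
b = 4.2                       # ground-truth bias
X = torch.tensor([[1.0, 2.0],
                  [0.5, -1.0],
                  [0.0, 0.0]])  # 3 toy samples, 2 features each

# Vectorized: one matrix-vector product for all samples; b is broadcast
y_vec = torch.matmul(X, w) + b

# Per-sample: y_hat = w1 * x1 + w2 * x2 + b, one row at a time
y_loop = torch.stack([w[0] * x[0] + w[1] * x[1] + b for x in X])

print(torch.allclose(y_vec, y_loop))  # True
```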

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
import random

# Ground Truth Values
w = torch.tensor([2, -3.4])  # 2 features
b = 4.2
```

Now we create our synthetic data generator class. By default, it creates 2000 samples (`n=2000`). Let's break down the components:

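- `self.X` is our input matrix $\mathbf{X}$ of shape $n \times 2$ (in ML terms, the features)
- `noise` is an $n \times 1$ column of samples drawn from a standard normal distribution (mean 0, variance 1), scaled by the user-defined `noise` level, so its standard deviation equals `noise`
- `self.y` is the output $\mathbf{y} = \mathbf{Xw} + b + \boldsymbol{\epsilon}$, where $\boldsymbol{\epsilon}$ is the noise (in ML terms, the labels)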

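```python
class SyntheticRegressionData(nn.Module, HyperParameters):
    """Synthetic data for linear regression: y = Xw + b + noise."""
    def __init__(self, w, b, batch_size, noise=0.1, n=2000):
        super().__init__()
        # Saves w, b, batch_size, noise and n as attributes (self.batch_size, self.n, ...)
        self.save_hyperparameters()

        self.X = torch.randn(n, len(w))    # features, shape (n, num_features)
        noise = torch.randn(n, 1) * noise  # Gaussian noise with std = noise (reuses the name)
        self.y = torch.matmul(self.X, w.reshape((-1, 1))) + b + noise  # labels, shape (n, 1)
```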

Now we can generate our synthetic data with a single call: the class takes care of storing the hyperparameters and building the features and labels.

```python
data = SyntheticRegressionData(w=torch.tensor([2, -3.4]), b=4.2, batch_size=32)

print(f'The features shape: {data.X.shape}')
print(f'The labels shape: {data.y.shape}')
print(f'First entry of features: {data.X[0]}')
print(f'First entry of labels: {data.y[0]}')
```

```
The features shape: torch.Size([2000, 2])
The labels shape: torch.Size([2000, 1])
First entry of features: tensor([-0.5301, -0.6899])
First entry of labels: tensor([5.5020])
```
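To check that the labels carry the intended amount of noise, here is a sketch that rebuilds the same recipe with plain tensors (outside the class, with a fixed seed we chose for reproducibility) and verifies that the residual $\mathbf{y} - (\mathbf{Xw} + b)$ has mean close to 0 and standard deviation close to `noise = 0.1`.

```python
import torch

torch.manual_seed(0)  # fixed seed for this sketch only
n, noise_level = 2000, 0.1
w = torch.tensor([2, -3.4])
b = 4.2

# Same recipe as SyntheticRegressionData
X = torch.randn(n, len(w))
noise = torch.randn(n, 1) * noise_level
y = torch.matmul(X, w.reshape((-1, 1))) + b + noise

# Subtracting the noise-free part recovers the noise
residual = y - (torch.matmul(X, w.reshape((-1, 1))) + b)
print(residual.mean().item(), residual.std().item())  # close to 0 and 0.1
```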


Let's now use the `add_method` decorator defined earlier to add a method that yields random batches of samples.

```python
# Equivalent to: get_dataloader = add_method(SyntheticRegressionData)(get_dataloader)
@add_method(SyntheticRegressionData)
def get_dataloader(self):
    """Yield (X, y) minibatches of size batch_size in a random order."""
    indices = list(range(self.n))
    random.shuffle(indices)  # a new random order on every call (i.e., every epoch)
    for i in range(0, self.n, self.batch_size):
        batch_indices = torch.tensor(indices[i: i + self.batch_size])
        yield self.X[batch_indices], self.y[batch_indices]
```
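With `n = 2000` and `batch_size = 32`, the loop above produces $\lceil 2000 / 32 \rceil = 63$ batches: 62 full batches of 32 and a final batch of $2000 - 62 \times 32 = 16$ samples. The sketch below reproduces only the index-slicing logic in plain Python (no tensors) to confirm this, and to confirm that every sample is used exactly once per epoch.

```python
import random

n, batch_size = 2000, 32
indices = list(range(n))
random.shuffle(indices)  # random order, as in get_dataloader

# Same slicing as get_dataloader: indices[i : i + batch_size]
batches = [indices[i: i + batch_size] for i in range(0, n, batch_size)]

sizes = [len(batch) for batch in batches]
print(len(batches), sizes[0], sizes[-1])  # 63 32 16

# Each sample appears exactly once per epoch
flat = [idx for batch in batches for idx in batch]
print(sorted(flat) == list(range(n)))  # True
```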

With the method added to our `SyntheticRegressionData` class, we can now fetch a single batch of `batch_size` (here 32) samples.

```python
# data.get_dataloader() is equivalent to SyntheticRegressionData.get_dataloader(data)
for X, y in data.get_dataloader():
    print('Features:', X, '\n Labels:', y)
    break  # only look at the first batch
```

```
Features: tensor([[-6.9736e-01, -1.5922e+00],
        [-8.7239e-01, -2.1415e-01],
        [-4.2887e-02, -1.3465e+00],
        [ 1.0651e+00,  1.7080e-01],
        [-1.6511e+00, -1.1405e+00],
        [ 5.7221e-01, -1.6894e+00],
        [-8.1030e-01,  9.7975e-01],
        [ 1.3062e+00, -9.9221e-01],
        [-5.4856e-01,  1.8849e+00],
        [ 7.6326e-01,  1.1298e+00],
        [ 2.2546e+00,  6.6838e-01],
        [-1.3383e+00,  5.4558e-01],
        [ 6.1972e-01,  8.0645e-01],
        [-2.1210e-01, -2.0688e+00],
        [-1.0188e+00,  3.6359e-01],
        [ 1.8891e+00, -7.6200e-01],
        [-1.0102e+00,  7.9557e-01],
        [ 7.5183e-01, -8.6839e-01],
        [ 3.6415e-02, -3.6739e-01],
        [-5.0555e-01,  1.4970e+00],
        [-4.1458e-01,  9.4749e-01],
        [ 3.6671e-01, -3.4895e-01],
        [ 1.4519e-01,  1.0041e+00],
        [ 2.6658e-01, -7.5004e-01],
        [ 4.6444e-01,  2.6947e-01],
        [ 3.2993e-01, -2.0760e+00],
        [ 7.5001e-05, -9.1745e-01],
        ...
```
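PyTorch also ships a built-in loader that does the same job. As a point of comparison (this is not what the guide uses), `torch.utils.data.TensorDataset` combined with `DataLoader(..., shuffle=True)` yields shuffled minibatches with the same shapes; the random `X` and `y` below are stand-ins for the synthetic data.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

X = torch.randn(2000, 2)  # stand-in features
y = torch.randn(2000, 1)  # stand-in labels

# shuffle=True reshuffles the sample order at the start of every epoch
loader = DataLoader(TensorDataset(X, y), batch_size=32, shuffle=True)

X_batch, y_batch = next(iter(loader))
print(X_batch.shape, y_batch.shape)  # torch.Size([32, 2]) torch.Size([32, 1])
print(len(loader))                   # 63 batches per epoch
```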

### Defining the Model

```python
class Model(nn.Module, HyperParameters):
    """Linear regression model y_hat = Xw + b, trained from scratch."""
    def __init__(self, num_inputs=2, sigma=0.05):
        super().__init__()
        self.save_hyperparameters()  # saves num_inputs and sigma as attributes

        # Initialize parameters
        self.w = torch.normal(0, sigma, (num_inputs, 1), requires_grad=True)  # weights ~ N(0, sigma^2)
        self.b = torch.zeros(1, requires_grad=True)                            # bias starts at 0

    def forward(self, X):
        return torch.matmul(X, self.w) + self.b  # shape (batch_size, 1)

    def loss(self, y_hat, y):
        l = (y_hat - y.reshape(y_hat.shape)) ** 2 / 2  # half squared error, shape (batch_size, 1)
        return l.mean()                                 # scalar (0-dim) tensor

    def optimizer(self):
        # Performs one SGD update of w and b with learning rate 0.03
        # (the SGD class below does the update in its constructor)
        return SGD([self.w, self.b], 0.03)
```
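For the half squared-error loss averaged over a batch of $n$ samples, $L = \frac{1}{2n}\lVert \mathbf{Xw} + b - \mathbf{y} \rVert^2$, the gradients are $\nabla_{\mathbf{w}} L = \frac{1}{n}\mathbf{X}^\top(\mathbf{\hat{y}} - \mathbf{y})$ and $\partial L / \partial b = \frac{1}{n}\sum_i (\hat{y}_i - y_i)$. The sketch below (toy random data, variable names of our own) checks that autograd's `loss.backward()` produces exactly these values, which are what the update step later reads from `param.grad`.

```python
import torch

torch.manual_seed(0)
n = 32
X = torch.randn(n, 2)
y = torch.randn(n, 1)
w = torch.normal(0, 0.05, (2, 1), requires_grad=True)
b = torch.zeros(1, requires_grad=True)

# Same forward pass and loss as Model.forward and Model.loss
y_hat = torch.matmul(X, w) + b
loss = ((y_hat - y) ** 2 / 2).mean()
loss.backward()  # fills w.grad and b.grad

# Closed-form gradients of the same loss
with torch.no_grad():
    grad_w = torch.matmul(X.T, y_hat - y) / n  # shape (2, 1)
    grad_b = (y_hat - y).sum(dim=0) / n         # shape (1,)

print(torch.allclose(w.grad, grad_w, atol=1e-6),
      torch.allclose(b.grad, grad_b, atol=1e-6))  # True True
```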

```python
class SGD(HyperParameters):
    """Minibatch SGD: constructing it performs one update step on `params`."""
    def __init__(self, params, lr):
        self.save_hyperparameters()
        with torch.no_grad():  # the update itself must not be tracked by autograd
            for param in params:
                param -= lr * param.grad  # gradient step: p <- p - lr * dL/dp
                param.grad.zero_()        # reset so gradients don't accumulate into the next batch
```
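The hand-written update is the same rule that PyTorch's built-in `torch.optim.SGD` applies with its default settings (no momentum or weight decay). As a sketch, the comparison below applies one manual step and one built-in step to identical copies of some toy parameters (`make_params` and `compute_loss` are helper names of our own). Unlike the class above, the built-in optimizer separates the update (`step()`) from clearing the gradients (`zero_grad()`).

```python
import torch

torch.manual_seed(0)
X = torch.randn(32, 2)
y = torch.randn(32, 1)
lr = 0.03

def make_params():
    # Identical starting point for both optimizers
    return (torch.tensor([[0.1], [-0.2]], requires_grad=True),
            torch.zeros(1, requires_grad=True))

def compute_loss(w, b):
    return (((torch.matmul(X, w) + b) - y) ** 2 / 2).mean()

# Manual update, as in the SGD class above
w1, b1 = make_params()
compute_loss(w1, b1).backward()
with torch.no_grad():
    for p in (w1, b1):
        p -= lr * p.grad
        p.grad.zero_()

# Built-in optimizer
w2, b2 = make_params()
opt = torch.optim.SGD([w2, b2], lr=lr)
compute_loss(w2, b2).backward()
opt.step()
opt.zero_grad()

print(torch.allclose(w1, w2, atol=1e-7), torch.allclose(b1, b2, atol=1e-7))  # True True
```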

### Running the Model

```python
# Grab data
batch_size = 32
data = SyntheticRegressionData(w=torch.tensor([2, -3.4]), b=4.2, batch_size=batch_size)
num_epochs = 10
# Note: the learning rate (0.03) is set inside Model.optimizer

model = Model()
for epoch in range(num_epochs):
    for X, y in data.get_dataloader():
        y_hat = model(X)             # forward pass
        loss = model.loss(y_hat, y)  # scalar minibatch loss
        loss.backward()              # computes dL/dw and dL/db into model.w.grad and model.b.grad
        model.optimizer()            # one SGD step (also zeroes the gradients)

print('Estimated w:', model.w, '\n Estimated b:', model.b)
```

```
Params are: [tensor([[ 0.0212],
        [-0.0603]], requires_grad=True), tensor([0.], requires_grad=True)]
Params are: [tensor([[ 0.1136],
        [-0.1567]], requires_grad=True), tensor([0.], requires_grad=True)]
Params are: [tensor([[ 0.1136],
        [-0.1567]], requires_grad=True), tensor([0.1524], requires_grad=True)]
Params are: [tensor([[ 0.2197],
        [-0.2927]], requires_grad=True), tensor([0.1524], requires_grad=True)]
Params are: [tensor([[ 0.2197],
        [-0.2927]], requires_grad=True), tensor([0.3012], requires_grad=True)]
Params are: [tensor([[ 0.2825],
        [-0.4177]], requires_grad=True), tensor([0.3012], requires_grad=True)]
Params are: [tensor([[ 0.2825],
        [-0.4177]], requires_grad=True), tensor([0.4319], requires_grad=True)]
Params are: [tensor([[ 0.2982],
        [-0.5091]], requires_grad=True), tensor([0.4319], requires_grad=True)]
Params are: [tensor([[ 0.2982],
        [-0.5091]], requires_grad=True), tensor([0.5469], requires_grad=True)]
...
```
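To judge the fit, we can compare the learned parameters with the ground truth. The self-contained sketch below condenses the pieces above into a single loop with plain tensors (the fixed seeds and variable names are our own, not d2l's), using the same settings: 2000 samples, `noise = 0.1`, batch size 32, learning rate 0.03 and 10 epochs. With these settings the estimation errors should be small.

```python
import random
import torch

torch.manual_seed(0)
random.seed(0)

true_w = torch.tensor([2, -3.4])
true_b = 4.2
n, batch_size, lr, num_epochs = 2000, 32, 0.03, 10

# Synthetic data: y = Xw + b + noise
X = torch.randn(n, len(true_w))
y = torch.matmul(X, true_w.reshape((-1, 1))) + true_b + torch.randn(n, 1) * 0.1

# Parameters, initialized as in Model
w = torch.normal(0, 0.05, (2, 1), requires_grad=True)
b = torch.zeros(1, requires_grad=True)

for epoch in range(num_epochs):
    indices = list(range(n))
    random.shuffle(indices)  # new batch order each epoch
    for i in range(0, n, batch_size):
        idx = torch.tensor(indices[i: i + batch_size])
        y_hat = torch.matmul(X[idx], w) + b
        loss = ((y_hat - y[idx]) ** 2 / 2).mean()
        loss.backward()
        with torch.no_grad():  # SGD step, then reset gradients
            for p in (w, b):
                p -= lr * p.grad
                p.grad.zero_()

w_error = true_w - w.detach().reshape(true_w.shape)
b_error = true_b - b.item()
print('error in estimating w:', w_error)
print('error in estimating b:', b_error)
```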