<a href="https://colab.research.google.com/github/vkaggal/linear-regression-pytorch/blob/main/linear_regression.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [2]:
import numpy as np
import torch

In [3]:
# inputs (temperature, rainfall, humidity)
inputs = np.array(
    [[73,67,43],
     [91,88,64],
     [87,134,58],
     [102,43,37],
     [69,96,70]
    ], dtype='float32'
)

In [4]:
targets = np.array(
    [[56, 70],
     [81, 101],
     [119, 133],
     [22, 37],
     [103, 119],
    ], dtype='float32'
)

In [5]:
# Convert to PyTorch tensors. Why start with NumPy? Data is commonly loaded and pre-processed using NumPy.
inputs = torch.from_numpy(inputs)
targets = torch.from_numpy(targets)

print("x is,", inputs)
print("y is: ", targets)

x is, tensor([[ 73.,  67.,  43.],
        [ 91.,  88.,  64.],
        [ 87., 134.,  58.],
        [102.,  43.,  37.],
        [ 69.,  96.,  70.]])
y is:  tensor([[ 56.,  70.],
        [ 81., 101.],
        [119., 133.],
        [ 22.,  37.],
        [103., 119.]])


In [6]:
print(f"x's shape is {inputs.shape} and y's shape is {targets.shape}")

x's shape is torch.Size([5, 3]) and y's shape is torch.Size([5, 2])


### Our model

The model is nothing but a combination of weights and biases: each target is a weighted sum of the input variables plus a bias.

```
yield_apple = w11 * temp + w12 * rainfall + w13 * humidity + b1
yield_orange = w21 * temp + w22 * rainfall + w23 * humidity + b2
```

Looking at just the weights and biases, we notice that the weights form a matrix of shape 2x3 (one row per target, one column per input variable) and the biases form a vector of size 2.
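
To make the two equations concrete, here is a minimal sketch that evaluates them for the first row of the inputs. The values in `w_example` and `b_example` are made up purely for illustration (they are not the trained weights), and the last lines show that one matrix-vector product gives the same two numbers.

```python
import torch

# Hypothetical weights and biases, chosen only to keep the arithmetic easy to follow
w_example = torch.tensor([[0.5, 0.2, 0.1],    # w11, w12, w13 (apples)
                          [0.3, 0.4, 0.6]])   # w21, w22, w23 (oranges)
b_example = torch.tensor([1.0, 2.0])          # b1, b2

# First row of the inputs: temp=73, rainfall=67, humidity=43
x_row = torch.tensor([73.0, 67.0, 43.0])

# Spelled out exactly like the two equations above
yield_apple = (w_example[0] * x_row).sum() + b_example[0]
yield_orange = (w_example[1] * x_row).sum() + b_example[1]

# The same computation as a single matrix-vector product
both = w_example @ x_row + b_example
print(yield_apple.item(), yield_orange.item(), both)
```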

In [7]:
w = torch.randn(2,3, requires_grad=True)
b = torch.randn(2, requires_grad=True)
print(w)
print(b)

tensor([[ 1.9108, -0.2543,  2.3338],
        [ 0.9950, -0.6971,  1.1052]], requires_grad=True)
tensor([-1.2795, -0.0584], requires_grad=True)


In [16]:
print(w.shape, b.shape)

torch.Size([2, 3]) torch.Size([2])


Let's define a linear regression model that computes

$$y = x W^T + b$$

- you can see each target is a weighted sum of the input variables
- that sum is offset by a bias, so the prediction does not have to be zero when all the inputs are zero

- it is a good idea to get an intuition for the matrix shapes
- notice how the 5x3 inputs can be multiplied with the 2x3 weights once the weights are transposed to 3x2
- notice how 5x3 times 3x2 becomes 5x2 (hence we can add a 5x2 bias matrix to that result)
- but b is a vector! Right: PyTorch uses 'broadcasting' to treat b as if it were repeated for each of the 5 rows (read more in the documentation)

$$
X W^T + b
$$
$$
\begin{bmatrix} x_{11} & x_{12} & x_{13}\\ \vdots & \vdots & \vdots\\ x_{51} & x_{52} & x_{53}\end{bmatrix}
\times
\begin{bmatrix} w_{11} & w_{21} \\ w_{12} & w_{22}\\ w_{13} & w_{23}\end{bmatrix}
+
\begin{bmatrix} b_{1} & b_{2} \\ b_{1} & b_{2}\\ \vdots & \vdots\\ b_{1} & b_{2}\end{bmatrix}
$$
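
The shape bookkeeping above can be checked directly. This is a small sketch with placeholder values (ones and twos, not the real data); `b_repeated` is a name introduced here only to show that broadcasting gives the same result as explicitly repeating the bias for every row.

```python
import torch

x = torch.ones(5, 3)            # 5 samples, 3 features
w = torch.full((2, 3), 2.0)     # 2 outputs, 3 features
b = torch.tensor([10.0, 20.0])  # one bias per output

xw = x @ w.t()                  # (5, 3) @ (3, 2) gives (5, 2)
out = xw + b                    # b of shape (2,) is broadcast across the 5 rows

# Broadcasting matches explicitly repeating b for every row
b_repeated = b.unsqueeze(0).expand(5, 2)
print(xw.shape, out.shape, torch.equal(out, xw + b_repeated))
```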

In [8]:
# note how x is of shape 5x3, w is of shape 2x3, and b is a vector of size 2
def model(x):
  return x @ w.t() + b

In [9]:
# what are the predictions with random initialization?
predictions = model(inputs)
print(predictions)

tensor([[221.5303,  73.3956],
        [299.5961,  99.8764],
        [266.2523,  57.1983],
        [269.0448, 112.3493],
        [269.5262,  79.0413]], grad_fn=<AddBackward0>)


In [10]:
print("predictions with random initialization:\n",predictions, "\ncompare them to actual targets: \n", targets)

predictions with random initialization:
 tensor([[221.5303,  73.3956],
        [299.5961,  99.8764],
        [266.2523,  57.1983],
        [269.0448, 112.3493],
        [269.5262,  79.0413]], grad_fn=<AddBackward0>) 
compare them to actual targets: 
 tensor([[ 56.,  70.],
        [ 81., 101.],
        [119., 133.],
        [ 22.,  37.],
        [103., 119.]])


### Loss Function - MSE Loss

We need to evaluate how well our model is performing by comparing the predictions to the actual targets:
- calculate the difference between predictions and targets
- square the difference to remove negative values
- calculate the average of the elements in the resulting matrix

$$MSE = \frac{1}{N}\sum \limits _{i=1} ^N (Y_i- \hat Y_i)^2$$
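
As a sanity check on the three steps, here is a tiny worked example with made-up numbers that can be verified by hand. It also compares the result with `torch.nn.functional.mse_loss`, which averages over all elements by default.

```python
import torch
import torch.nn.functional as F

# Tiny made-up tensors so the arithmetic can be checked by hand
y_hat = torch.tensor([[1.0, 2.0], [3.0, 4.0]])
y = torch.tensor([[0.0, 2.0], [5.0, 1.0]])

diff = y_hat - y                 # [[1, 0], [-2, 3]]
squared = diff * diff            # [[1, 0], [4, 9]]
mse_by_hand = squared.sum() / squared.numel()   # (1 + 0 + 4 + 9) / 4 = 3.5

# The built-in MSE uses the same element-wise mean by default
mse_builtin = F.mse_loss(y_hat, y)
print(mse_by_hand.item(), mse_builtin.item())
```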


In [29]:
# MSE Loss function
def mse(t1, t2):
  diff = t1 - t2
  return torch.sum(diff*diff) / diff.numel()

We can explore a bit by checking out the loss for the randomly initialized model

In [30]:
loss = mse(predictions, targets)
print(loss) # the lower the loss, the better the model

tensor(118.5076, grad_fn=<DivBackward0>)


## That's bad - too high a loss - Gradients to the rescue
- compute the gradients (derivatives) of the loss with respect to the weights and biases
  - differentiate the loss with respect to each element of the weights matrix (keeping the rest constant)
- observe that the derivative of the loss with respect to the weights matrix is a matrix of the same dimensions
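
To see what `backward()` computes, the sketch below compares autograd's result with the closed-form gradient of the element-wise mean of squared errors. It uses random stand-in data with the same shapes as above, and `w_grad_manual` and `b_grad_manual` are names introduced only for this illustration.

```python
import torch

torch.manual_seed(0)
x = torch.randn(5, 3)
y = torch.randn(5, 2)
w = torch.randn(2, 3, requires_grad=True)
b = torch.randn(2, requires_grad=True)

pred = x @ w.t() + b
loss = ((pred - y) ** 2).mean()
loss.backward()                      # autograd fills w.grad and b.grad

# Closed-form gradients of the mean of squared errors
with torch.no_grad():
    err = pred - y                                        # shape (5, 2)
    w_grad_manual = (2.0 / err.numel()) * (err.t() @ x)   # shape (2, 3), same as w
    b_grad_manual = (2.0 / err.numel()) * err.sum(dim=0)  # shape (2,), same as b

print(w.grad.shape, torch.allclose(w.grad, w_grad_manual, atol=1e-5))
print(b.grad.shape, torch.allclose(b.grad, b_grad_manual, atol=1e-5))
```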

In [31]:
loss.backward()

In [32]:
print(w) # a matrix that represents weights
print(w.grad) # a matrix that represents the derivative of loss wrt each element of weights matrix

tensor([[-0.3597,  0.2519,  1.6156],
        [-0.1580,  0.4931,  1.1822]], requires_grad=True)
tensor([[ 141.7032, -176.1278,   71.6166],
        [ 128.9658, -128.3915,   13.4894]])


In [33]:
print(b,"\n",b.grad)

tensor([-1.3018, -0.0670], requires_grad=True) 
 tensor([1.5929, 1.1220])


### Can we reduce the loss using the gradients we calculated?

In [34]:
# update the weights and biases by subtracting a small quantity proportional to their gradients,
# then reset the gradients to zero
with torch.no_grad():
  w -= w.grad * 1e-5
  b -= b.grad * 1e-5
  w.grad.zero_()
  b.grad.zero_()


In [35]:
# Calculate loss
predictions = model(inputs)
loss = mse(predictions, targets)
print(loss)

tensor(117.6149, grad_fn=<DivBackward0>)


**Objective:** find the set of weights and biases (the matrix and the vector above) where the loss is the lowest.
- the loss function defined above is MSE, also called the quadratic cost function.
- Details: for a linear model, MSE is a quadratic function of the weights and biases. Plotted against a single weight it is a 'U'-shaped parabola, so the loss is convex and any minimum is a global minimum. Loss surfaces of more complex models are generally non-convex and can have multiple local minima.
- here the gradient indicates the rate of change of the loss, that is, the slope of the loss function with respect to the weights and biases
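
The idea of the gradient as a slope can be illustrated with a one-parameter toy loss. This is only a sketch: `toy_loss`, the starting value 5.0, and the step of 0.1 are arbitrary choices made for the illustration, not part of the model above.

```python
import torch

# A one-parameter toy loss, loss(p) = (p - 3)^2, with its minimum at p = 3
def toy_loss(p):
    return (p - 3.0) ** 2

p = torch.tensor(5.0, requires_grad=True)
loss = toy_loss(p)
loss.backward()
print(p.grad)   # positive slope, since p sits to the right of the minimum

step = 0.1
with torch.no_grad():
    loss_if_increased = toy_loss(p + step)   # moving along the slope
    loss_if_decreased = toy_loss(p - step)   # moving against the slope
print(loss_if_decreased.item(), loss.item(), loss_if_increased.item())
```

With a positive gradient, stepping against the slope lowers the loss, which is why the update subtracts a multiple of the gradient.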

**Note:**
> If the gradient element (of weights or biases) is positive:
> - increasing the element's value slightly will increase the loss
> - decreasing the element's value slightly will decrease the loss


### Training our model for multiple epochs

In [37]:
# repeat the steps we have tried so far in a loop, i.e. for multiple epochs
for i in range(100):
  # invoke the model
  predictions = model(inputs)
  # use MSE to calculate the loss
  loss = mse(predictions, targets)
  # calculate gradients
  loss.backward()

  # adjust weights
  with torch.no_grad():
    # change w and b by a small amount proportional to their gradients
    w -= w.grad * 1e-5
    b -= b.grad * 1e-5
    # reset the gradient values
    w.grad.zero_()
    b.grad.zero_()

In [38]:
predictions = model(inputs)
loss = mse(predictions, targets)
print(loss)


tensor(43.4952, grad_fn=<DivBackward0>)


In [40]:
print(predictions)
print(targets)

tensor([[ 56.9949,  70.7041],
        [ 88.8167, 102.9867],
        [103.9300, 127.0547],
        [ 22.0376,  39.5846],
        [112.6041, 121.6549]], grad_fn=<AddBackward0>)
tensor([[ 56.,  70.],
        [ 81., 101.],
        [119., 133.],
        [ 22.,  37.],
        [103., 119.]])


## Recap:

We have seen linear regression and how to implement each of the steps:
- input and target matrices (converted to tensors)
- tensors for weights and biases
- define a model (in our case linear regression)
- define a loss function
- train for n epochs

## How about we use PyTorch built-ins to build the linear regression model?

In [41]:
import torch.nn as nn

In [42]:
# Input (temp, rainfall, humidity)
inputs = np.array([[73, 67, 43], 
                   [91, 88, 64], 
                   [87, 134, 58], 
                   [102, 43, 37], 
                   [69, 96, 70], 
                   [74, 66, 43], 
                   [91, 87, 65], 
                   [88, 134, 59], 
                   [101, 44, 37], 
                   [68, 96, 71], 
                   [73, 66, 44], 
                   [92, 87, 64], 
                   [87, 135, 57], 
                   [103, 43, 36], 
                   [68, 97, 70]], 
                  dtype='float32')

# Targets (apples, oranges)
targets = np.array([[56, 70], 
                    [81, 101], 
                    [119, 133], 
                    [22, 37], 
                    [103, 119],
                    [57, 69], 
                    [80, 102], 
                    [118, 132], 
                    [21, 38], 
                    [104, 118], 
                    [57, 69], 
                    [82, 100], 
                    [118, 134], 
                    [20, 38], 
                    [102, 120]], 
                   dtype='float32')

inputs = torch.from_numpy(inputs)
targets = torch.from_numpy(targets)

In [43]:
# TensorDataset lets us access rows of inputs and targets together as tuples, using array indexing notation
from torch.utils.data import TensorDataset

In [44]:
train_ds = TensorDataset(inputs, targets)
train_ds[0:3]

(tensor([[ 73.,  67.,  43.],
         [ 91.,  88.,  64.],
         [ 87., 134.,  58.]]), tensor([[ 56.,  70.],
         [ 81., 101.],
         [119., 133.]]))

In [45]:
from torch.utils.data import DataLoader

In [48]:
# the idea is that the DataLoader returns one batch of data of size defined by batch_size
batch_size = 5
train_dl = DataLoader(train_ds, batch_size=batch_size, shuffle=True)

Let's inspect the DataLoader instance that we just created

In [49]:
# recall the dataloader has inputs and targets
for x,y in train_dl:
  print(x)
  print(y)
  break

tensor([[ 91.,  87.,  65.],
        [ 87., 134.,  58.],
        [ 87., 135.,  57.],
        [ 69.,  96.,  70.],
        [103.,  43.,  36.]])
tensor([[ 80., 102.],
        [119., 133.],
        [118., 134.],
        [103., 119.],
        [ 20.,  38.]])
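
A short sketch of what the `DataLoader` does over a full pass: with 15 samples and a batch size of 5 it yields 3 batches, and shuffling changes the order of rows without breaking the pairing between an input row and its target row. The tensors `xs` and `ys` are stand-in data with the same shapes as above, built so that the pairing is easy to verify.

```python
import torch
from torch.utils.data import TensorDataset, DataLoader

# Stand-in data: 15 samples, 3 features, 2 targets
xs = torch.arange(45, dtype=torch.float32).reshape(15, 3)   # row i starts with 3*i
ys = torch.arange(30, dtype=torch.float32).reshape(15, 2)   # row i starts with 2*i

ds = TensorDataset(xs, ys)
dl = DataLoader(ds, batch_size=5, shuffle=True)

print(len(ds), len(dl))   # number of samples, number of batches per epoch

seen = 0
for xb, yb in dl:
    # Recover the row index from each side to confirm x and y rows stay paired
    assert torch.equal(xb[:, 0] / 3, yb[:, 0] / 2)
    seen += xb.shape[0]
print(seen)               # every sample is visited once per epoch
```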


In [52]:
# recall the linear model we built by hand? Let's use the built-ins
# note how we don't explicitly initialize weights and biases
model = nn.Linear(3,2)
print(model.weight)
print(model.bias)

Parameter containing:
tensor([[-0.3197,  0.1291, -0.3409],
        [-0.5455, -0.4388,  0.4450]], requires_grad=True)
Parameter containing:
tensor([ 0.2920, -0.0380], requires_grad=True)
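
The printed shapes suggest that `nn.Linear` stores its weight as (out_features, in_features), just like the hand-written `w`. The sketch below checks that calling the layer matches `x @ weight.t() + bias`; the names `layer` and `by_hand` and the random input are used only for this illustration.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
layer = nn.Linear(3, 2)          # 3 input features, 2 outputs
x = torch.randn(4, 3)

# weight has shape (2, 3) and bias has shape (2,), like w and b in the hand-written model
by_hand = x @ layer.weight.t() + layer.bias
print(layer.weight.shape, layer.bias.shape)
print(torch.allclose(layer(x), by_hand, atol=1e-6))
```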


In [53]:
# note further how we can already invoke the model
predictions = model(inputs)
predictions

In [54]:
# continuing on our trajectory, we can utilize nn.functional package to define a loss function  
import torch.nn.functional as F
loss_fn = F.mse_loss

In [55]:
# let's take the loss_fn for a spin ;)
loss = loss_fn(model(inputs), targets)
print(loss)

tensor(18902.1133, grad_fn=<MseLossBackward>)


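### Optimizer
Recall how we updated the weights and biases using the gradients? Let's use SGD to achieve the same.

> Note:
> - SGD here is Stochastic Gradient Descent
> - it is called 'stochastic' because examples are selected in batches (typically with random shuffling) rather than as a single group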


In [57]:
# model.parameters() passed in tells the optimizer the matrices it could modify
# as part of the update step
opt = torch.optim.SGD(model.parameters(), lr=1e-5)
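
To connect the optimizer back to the manual update used earlier, here is a sketch showing that one `opt.step()` of plain SGD (no momentum or weight decay) matches subtracting `lr * grad` by hand. The two model copies, the random data, and the learning rate of 1e-2 are assumptions made only for this comparison.

```python
import copy
import torch
import torch.nn as nn

torch.manual_seed(0)
lr = 1e-2
x, y = torch.randn(5, 3), torch.randn(5, 2)

model_a = nn.Linear(3, 2)
model_b = copy.deepcopy(model_a)     # identical starting weights

# Optimizer route
opt = torch.optim.SGD(model_a.parameters(), lr=lr)
loss_a = ((model_a(x) - y) ** 2).mean()
loss_a.backward()
opt.step()          # updates every parameter using its gradient
opt.zero_grad()     # clears gradients before the next backward pass

# Manual route, as in the hand-written training loop
loss_b = ((model_b(x) - y) ** 2).mean()
loss_b.backward()
with torch.no_grad():
    for p in model_b.parameters():
        p -= lr * p.grad
        p.grad.zero_()

same = all(torch.allclose(pa, pb)
           for pa, pb in zip(model_a.parameters(), model_b.parameters()))
print(same)
```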

### Explore further:

- [Gradient descent visualization](https://www.youtube.com/watch?v=IHZwWFHWa-w)
- Boston [housing prices](https://www.kaggle.com/c/boston-housing)


## Training the model

1. recall the predictions with random weights and biases? We start by generating predictions
2. we then calculate the loss, which tells us how far the predictions are from the targets
3. to decide in which direction and how far to adjust the weights, we compute the gradients of the loss with respect to the weights and biases
4. we adjust the weights by subtracting a small quantity proportional to the gradient
5. we then reset the gradients to zero
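
The five steps above can be sketched as a training loop using the built-ins. This is one possible arrangement rather than a definitive implementation: `fit` is a hypothetical helper name, only the first five rows of the data are used to keep the example short, and the seed and epoch count are arbitrary.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.data import TensorDataset, DataLoader

# First five rows of the data used above, to keep the sketch short
inputs = torch.tensor([[73., 67., 43.], [91., 88., 64.], [87., 134., 58.],
                       [102., 43., 37.], [69., 96., 70.]])
targets = torch.tensor([[56., 70.], [81., 101.], [119., 133.],
                        [22., 37.], [103., 119.]])

torch.manual_seed(0)
train_dl = DataLoader(TensorDataset(inputs, targets), batch_size=5, shuffle=True)
model = nn.Linear(3, 2)
opt = torch.optim.SGD(model.parameters(), lr=1e-5)

def fit(num_epochs, model, loss_fn, opt, train_dl):   # hypothetical helper
    for _ in range(num_epochs):
        for xb, yb in train_dl:
            pred = model(xb)          # 1. generate predictions
            loss = loss_fn(pred, yb)  # 2. calculate the loss
            loss.backward()           # 3. compute gradients
            opt.step()                # 4. adjust weights using the gradients
            opt.zero_grad()           # 5. reset the gradients to zero

loss_before = F.mse_loss(model(inputs), targets).item()
fit(100, model, F.mse_loss, opt, train_dl)
loss_after = F.mse_loss(model(inputs), targets).item()
print(loss_before, loss_after)
```

Passing the model, loss function, optimizer, and data loader in as arguments keeps the loop reusable when any one of them is swapped out.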