<a href="https://colab.research.google.com/github/thiagolermen/ml-course/blob/main/src/2-multivariate-linear-regression/PyTorch-multivariate-linear-regression-from_scratch.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Multivariate Linear Regression - PyTorch

We're going to predict the price a house will sell for. The difference this time around is that we have more than one independent variable (feature): we're given both the size of the house in square feet and the number of bedrooms in the house.

## Imports

In [1]:
import os
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline

import torch

## Dataset

In [2]:
path = 'https://raw.githubusercontent.com/thiagolermen/ml-course/main/data/ex1data2.txt?token=AL353PBOIXU364U56BAPW6TAXPWLS'
data = pd.read_csv(path, header=None, names=['Size', 'Bedrooms', 'Price'])
data.head()

|   | Size | Bedrooms | Price |
|---|------|----------|-------|
| 0 | 2104 | 3 | 399900 |
| 1 | 1600 | 3 | 329900 |
| 2 | 2400 | 3 | 369000 |
| 3 | 1416 | 2 | 232000 |
| 4 | 3000 | 4 | 539900 |


### Data normalization

In [3]:
# standardize each column (z-score): subtract its mean, divide by its standard deviation
data = (data - data.mean())/data.std()
data.head()

|   | Size | Bedrooms | Price |
|---|------|----------|-------|
| 0 | 0.13001 | -0.223675 | 0.475747 |
| 1 | -0.50419 | -0.223675 | -0.084074 |
| 2 | 0.502476 | -0.223675 | 0.228626 |
| 3 | -0.735723 | -1.537767 | -0.867025 |
| 4 | 1.257476 | 1.090417 | 1.595389 |

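Standardization rescales each column to mean 0 and standard deviation 1; it does not make the values normally distributed. Because `data` is overwritten above, the original means and standard deviations are lost, and every prediction the model makes will be in standardized units rather than dollars. A minimal sketch of keeping the statistics so predictions can be mapped back, assuming hypothetical names `mu`, `sigma` and `to_price`, and using a toy frame built from the five rows shown earlier (the notebook itself standardizes all 47 rows, so its numbers differ):

```python
import pandas as pd

# Toy frame: the first five rows of the dataset (assumption, for illustration only)
raw = pd.DataFrame({
    "Size": [2104, 1600, 2400, 1416, 3000],
    "Bedrooms": [3, 3, 3, 2, 4],
    "Price": [399900, 329900, 369000, 232000, 539900],
})

# Keep the column statistics before standardizing
mu = raw.mean()
sigma = raw.std()  # pandas uses the sample std (ddof=1) by default

scaled = (raw - mu) / sigma

def to_price(z):
    """Hypothetical helper: map a standardized Price back to dollars."""
    return z * sigma["Price"] + mu["Price"]

print(scaled.round(3))
print(to_price(scaled["Price"]).round(2).tolist())
```

Applying `to_price` to the model's output is what turns a standardized prediction back into a sale price.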

### Data preprocessing

In [44]:
# set X (features: Size, Bedrooms) and y (target: Price)
X = data.iloc[:, 0:2]
y = data.iloc[:, 2:]

# convert to float32 NumPy arrays (np.matrix is deprecated)
X = X.to_numpy(dtype="float32")
y = y.to_numpy(dtype="float32")

# convert to torch tensors (shares memory with the arrays)
X = torch.from_numpy(X)
y = torch.from_numpy(y)

In [45]:
X.shape, y.shape

(torch.Size([47, 2]), torch.Size([47, 1]))

In [46]:
# initialize weights and biases
w = torch.randn(1, 2, requires_grad=True)
b = torch.randn(1, requires_grad=True)
print(w)
print(b)

tensor([[0.6531, 0.8529]], requires_grad=True)
tensor([0.5608], requires_grad=True)


## Model

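First we define the basic linear regression model. For a single example $x$ (a row vector holding the two features) the prediction is $\hat{y} = x w^T + b$; stacking all $m$ examples as the rows of $X$ gives:

$\hat{y} = X w^T + b$

where $X$ is $m \times 2$, $w$ is $1 \times 2$, and the scalar bias $b$ is broadcast to every row.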
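To make the shapes in `x @ w.t() + b` concrete, the sketch below compares the matrix form with the same prediction written out per example, $w_1 x_1 + w_2 x_2 + b$. The random tensors and the size `m = 4` are assumptions for illustration; `w` and `b` have the same shapes as in the notebook.

```python
import torch

torch.manual_seed(0)

m = 4                   # number of examples (illustrative)
X = torch.randn(m, 2)   # each row is one example: [size, bedrooms]
w = torch.randn(1, 2)   # one weight per feature, shape (1, 2)
b = torch.randn(1)      # single bias, broadcast to every row

# Matrix form: (m, 2) @ (2, 1) + (1,) -> (m, 1)
y_hat = X @ w.t() + b

# Same prediction, one example at a time: w1 * x1 + w2 * x2 + b
y_loop = torch.stack(
    [w[0, 0] * X[i, 0] + w[0, 1] * X[i, 1] + b[0] for i in range(m)]
).unsqueeze(1)

print(y_hat.shape)
print(torch.allclose(y_hat, y_loop))
```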

In [47]:
def model(x):
  return x @ w.t() + b

In [48]:
# Random predictions
preds = model(X)
print(preds)

tensor([[ 0.4549],
        [ 0.0408],
        [ 0.6982],
        [-1.2313],
        [ 2.3120],
        [ 1.4779],
        [-0.0135],
        [-0.1014],
        [-0.1400],
        [-0.0464],
        [ 1.4410],
        [ 0.3695],
        [ 0.2791],
        [ 4.6474],
        [-0.2321],
        [ 1.7368],
        [-1.3102],
        [-0.2584],
        [ 1.9907],
        [ 2.3375],
        [ 0.1780],
        [-0.8434],
        [ 0.0440],
        [ 1.4590],
        [ 1.9226],
        [-0.3701],
        [-0.0759],
        [ 0.8017],
        [ 0.5338],
        [ 0.8929],
        [-0.8837],
        [-2.6939],
        [ 1.5231],
        [ 1.3038],
        [ 1.3350],
        [-0.0932],
        [-0.2559],
        [ 1.5987],
        [ 3.3105],
        [ 1.6234],
        [-1.0275],
        [ 0.5650],
        [ 1.9562],
        [-0.2880],
        [-1.6947],
        [ 1.3687],
        [-0.2855]], grad_fn=<AddBackward0>)


## Loss function

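The classic cost function, with parameters $\theta = (w, b)$ and hypothesis $h_\theta(x_i) = x_i w^T + b$, is:

$J(\theta) = \dfrac {1}{2m} \displaystyle \sum _{i=1}^m \left ( \hat{y}_{i}- y_{i} \right)^2 = \dfrac {1}{2m} \displaystyle \sum _{i=1}^m \left (h_\theta (x_{i}) - y_{i} \right)^2$

The `mse` function below drops the factor $\frac{1}{2}$ and computes the plain mean squared error $\frac{1}{m} \sum_{i=1}^m (\hat{y}_i - y_i)^2 = 2J(\theta)$. A constant factor only rescales the gradients; it does not change where the minimum is.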
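A quick numerical check of how the two normalizations relate: with a target of shape `(m, 1)`, `diff.numel()` equals $m$, so the notebook's `mse` is exactly twice $J$, and it matches PyTorch's built-in `torch.nn.functional.mse_loss` (default `reduction="mean"`). The helper `cost_J` and the random tensors are assumptions for illustration.

```python
import torch
import torch.nn.functional as F

def mse(h, y):
    # mean squared error, as defined in the notebook
    diff = h - y
    return torch.sum(diff * diff) / diff.numel()

def cost_J(h, y):
    # hypothetical helper: J(theta) with the 1/(2m) factor
    m = y.shape[0]
    return torch.sum((h - y) ** 2) / (2 * m)

torch.manual_seed(0)
h = torch.randn(47, 1)  # stand-in predictions
y = torch.randn(47, 1)  # stand-in targets

print(mse(h, y), F.mse_loss(h, y), cost_J(h, y))
```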

In [49]:
# we calculate the mean squared error
def mse(h, y):
  diff = h - y
  return torch.sum(diff * diff) / (diff.numel())

In [50]:
# compute loss
loss = mse(preds, y)
print(loss)

tensor(1.2018, grad_fn=<DivBackward0>)


On average, each prediction differs from the actual target by roughly the square root of the loss (the root-mean-square error, about 1.1 here). That's pretty bad, because the targets have been standardized to a standard deviation of 1, so the typical error is as large as the spread of the prices themselves. It's called a loss because it indicates how bad the model is at predicting the target variable.
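One way to judge a loss of 1.2018 on standardized targets is to compare it with the trivial model that always predicts the mean, i.e. 0. If the targets are standardized with the sample standard deviation (as `pandas.DataFrame.std` does by default), their squares sum to $m - 1$, so that baseline's MSE is $(m-1)/m$, about 0.979 for $m = 47$. The random target below is an assumption standing in for the notebook's standardized `Price` column.

```python
import math
import torch

torch.manual_seed(0)

# Stand-in target, standardized with the sample std (torch.std is unbiased by default)
y = torch.randn(47, 1)
y = (y - y.mean()) / y.std()

# Baseline: always predict the mean of the standardized target, which is 0
baseline_mse = torch.mean((torch.zeros_like(y) - y) ** 2).item()

# RMSE of the randomly initialized model, from the loss printed above
rmse_random = math.sqrt(1.2018)

print(baseline_mse, 46 / 47, rmse_random)
```

So the randomly initialized model is slightly worse than simply predicting the average price.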

## Compute gradients

Since we set `requires_grad=True` on our weights and bias, calling `loss.backward()` computes the gradients of the loss with respect to them.

In [51]:
loss.backward()

The gradients are stored in the `.grad` attribute of the respective tensors.

In [52]:
print(w)
print(w.grad)

tensor([[0.6531, 0.8529]], requires_grad=True)
tensor([[0.5396, 1.5197]])
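What `loss.backward()` stores in `w.grad` and `b.grad` can be checked against the gradient derived by hand. For $L = \frac{1}{m}\sum_i (x_i w^T + b - y_i)^2$ the derivatives are $\partial L / \partial w = \frac{2}{m}(\hat{y} - y)^T X$ and $\partial L / \partial b = \frac{2}{m}\sum_i (\hat{y}_i - y_i)$. A minimal sketch on random data (an assumption; it does not reproduce the values printed above):

```python
import torch

torch.manual_seed(0)
m = 47
X = torch.randn(m, 2)
y = torch.randn(m, 1)
w = torch.randn(1, 2, requires_grad=True)
b = torch.randn(1, requires_grad=True)

# Forward pass and MSE loss, as in the notebook
preds = X @ w.t() + b
loss = torch.mean((preds - y) ** 2)
loss.backward()

# Analytic gradients of the MSE
with torch.no_grad():
    err = preds - y                          # shape (m, 1)
    grad_w = (2 / m) * err.t() @ X           # shape (1, 2), same as w
    grad_b = (2 / m) * err.sum().reshape(1)  # shape (1,), same as b

print(torch.allclose(w.grad, grad_w, atol=1e-6), torch.allclose(b.grad, grad_b, atol=1e-6))
```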


In [53]:
w.grad.zero_()
b.grad.zero_()
print(w.grad)
print(b.grad)

tensor([[0., 0.]])
tensor([0.])
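The gradients are reset with `zero_()` because PyTorch accumulates them: each call to `backward()` adds to whatever is already in `.grad` instead of overwriting it. A small sketch with a toy tensor (an assumption for illustration) shows the effect:

```python
import torch

w = torch.tensor([1.0, 2.0], requires_grad=True)
x = torch.tensor([3.0, 4.0])

# First backward pass: d(w . x)/dw = x
(w * x).sum().backward()
first = w.grad.clone()

# Second backward pass without zeroing: the new gradient is added to .grad
(w * x).sum().backward()
accumulated = w.grad.clone()

# Zero in place, as the notebook does after each update
w.grad.zero_()

print(first, accumulated, w.grad)
```

Without the reset, every gradient-descent step would use the sum of all previous gradients.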


## Train with GD

In [54]:
# train for 100 epochs
for i in range(100):
  preds = model(X)      # predictions
  loss = mse(preds, y)  # calculate the loss
  loss.backward()       # compute gradients (derivatives)
  with torch.no_grad(): # gradient descent update, not tracked by autograd
    w -= w.grad * 1e-5  # learning rate 1e-5
    b -= b.grad * 1e-5
    w.grad.zero_()      # reset gradients, since PyTorch accumulates them
    b.grad.zero_()

In [55]:
# calculate the new loss
preds = model(X)
loss = mse(preds, y)
print(loss)

tensor(1.1980, grad_fn=<DivBackward0>)
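The loss only moves from 1.2018 to 1.1980 because a learning rate of 1e-5 makes each step tiny: with gradients around 1, 100 steps change each parameter by roughly 0.001. Since the features are standardized, a much larger step is usually safe. The sketch below uses synthetic data with a known linear relationship (an assumption; it does not reproduce the house-price numbers), a learning rate of 0.1 chosen for illustration, and `torch.optim.SGD`, which without momentum or weight decay applies the same update `p -= lr * p.grad` as the manual loop.

```python
import torch

torch.manual_seed(0)

# Synthetic standardized data: known weights plus a little noise (assumption)
m = 47
X = torch.randn(m, 2)
true_w = torch.tensor([[0.9, -0.1]])
y = X @ true_w.t() + 0.05 * torch.randn(m, 1)

w = torch.randn(1, 2, requires_grad=True)
b = torch.randn(1, requires_grad=True)

opt = torch.optim.SGD([w, b], lr=0.1)  # lr chosen for illustration

losses = []
for epoch in range(100):
    preds = X @ w.t() + b
    loss = torch.mean((preds - y) ** 2)
    opt.zero_grad()   # clear accumulated gradients
    loss.backward()   # compute d(loss)/dw and d(loss)/db
    opt.step()        # gradient-descent update
    losses.append(loss.item())

print(losses[0], losses[-1])
print(w.detach(), b.detach())
```

Using an optimizer object also keeps the zeroing and the update in one place, which makes it easy to swap in a different update rule later.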


In [56]:
# Predictions
preds, y

(tensor([[ 0.4541],
         [ 0.0402],
         [ 0.6971],
         [-1.2297],
         [ 2.3086],
         [ 1.4752],
         [-0.0139],
         [-0.1018],
         [-0.1404],
         [-0.0468],
         [ 1.4382],
         [ 0.3687],
         [ 0.2784],
         [ 4.6410],
         [-0.2324],
         [ 1.7338],
         [-1.3085],
         [-0.2586],
         [ 1.9875],
         [ 2.3340],
         [ 0.1774],
         [-0.8421],
         [ 0.0435],
         [ 1.4563],
         [ 1.9206],
         [-0.3703],
         [-0.0764],
         [ 0.8006],
         [ 0.5329],
         [ 0.8917],
         [-0.8823],
         [-2.6900],
         [ 1.5203],
         [ 1.3023],
         [ 1.3323],
         [-0.0936],
         [-0.2562],
         [ 1.5959],
         [ 3.3062],
         [ 1.6205],
         [-1.0260],
         [ 0.5641],
         [ 1.9531],
         [-0.2882],
         [-1.6928],
         [ 1.3660],
         [-0.2857]], grad_fn=<AddBackward0>), tensor([[ 0.4757],
         [-0.08