
In [3]:
# @title Imports

import torch
import numpy as np

# Linear Regression
We will create a model that predicts crop yields for apples and oranges (target variables) by looking at the average temperature, rainfall and humidity (input variables or features) in a region. Here's the training data:

![linear-regression-training-data](https://i.imgur.com/6Ujttb4.png)

In a linear regression model, each target variable is estimated as a weighted sum of the input variables, offset by a constant known as the bias:

yield_apple = $w_{11} \cdot \text{temp} + w_{12} \cdot \text{rainfall} + w_{13} \cdot  \text{humidity} + b_1$ 

yield_orange = $w_{21} \cdot \text{temp} + w_{22} \cdot \text{rainfall} + w_{23} \cdot  \text{humidity} + b_2$ 

Visually, it means that the yield of apples is a linear or planar function of temperature, rainfall and humidity:

![linear-regression-graph](https://i.imgur.com/4DJ9f8X.png)

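The **learning part** of linear regression is to figure out a set of weights $w_{11}, w_{12}, \dots, w_{23}$ and biases $b_1, b_2$ from the training data, so that the model predicts the yields accurately.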

In [7]:
# Input (temp, rainfall, humidity)

inputs = np.array([[73, 67, 43],
                   [91, 88, 64],
                   [87, 134,58],
                   [102,43, 37],
                   [69, 96, 70]], dtype ='float32')

In [8]:
# Targets (apples, oranges)

targets = np.array([[56 , 70],
                    [81, 101],
                    [119,133],
                    [22 , 37],
                    [103,119]], dtype = 'float32')

In [9]:
inputs = torch.from_numpy(inputs)
targets = torch.from_numpy(targets)

# Linear Regression from scratch
$
~~~~~~~~~~~~~~X ~~~~~~~~~~~~~\cdot ~~~~~~~~~~~W^T ~~~~~ + ~~~~~~~~b \\~\\
\begin{bmatrix} 
x_{11} & x_{12} & x_{13} \\
x_{21} & x_{22} & x_{23} \\
\vdots & \vdots & \vdots \\
x_{51} & x_{52} & x_{53} \\
\end{bmatrix}
\cdot
\begin{bmatrix} 
w_{11} & w_{21}\\
w_{12} & w_{22}\\
w_{13} & w_{23}\\
\end{bmatrix} 
+
\begin{bmatrix} 
b_{1} & b_{2}\\
b_{1} & b_{2}\\
\vdots & \vdots\\
b_{1} & b_{2}\\
\end{bmatrix}
$

In [11]:
w = torch.randn(2,3,requires_grad = True)
b = torch.rand(2,requires_grad = True)
print(w)
print(b)

tensor([[ 0.2831,  0.1087, -2.1260],
        [ 1.1569, -1.3752,  2.2160]], requires_grad=True)
tensor([0.4744, 0.5268], requires_grad=True)


In [12]:
def model(x):
  # b has shape (2,), so it is broadcast across every row of x @ w.t()
  return x @ w.t() + b
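
As a sanity check, the matrix form used in `model` can be compared with the two scalar yield equations written out term by term. The sketch below uses made-up weights and biases (`demo_w`, `demo_b` are hypothetical values picked for easy arithmetic, not the notebook's parameters) and a single input row.

```python
import torch

# Hypothetical values chosen only for this check; not the notebook's weights.
demo_x = torch.tensor([[73., 67., 43.]])       # one row: temp, rainfall, humidity
demo_w = torch.tensor([[0.5, 0.25, -1.0],      # w11, w12, w13 (apples)
                       [1.0, -0.5, 2.0]])      # w21, w22, w23 (oranges)
demo_b = torch.tensor([1.0, 2.0])              # b1, b2

# Matrix form: (1, 3) @ (3, 2) -> (1, 2); demo_b of shape (2,) is broadcast.
matrix_out = demo_x @ demo_w.t() + demo_b

# The same two numbers from the scalar equations.
yield_apple = 0.5 * 73 + 0.25 * 67 + (-1.0) * 43 + 1.0
yield_orange = 1.0 * 73 + (-0.5) * 67 + 2.0 * 43 + 2.0

print(matrix_out)                  # tensor([[ 11.2500, 127.5000]])
print(yield_apple, yield_orange)   # 11.25 127.5
```

Each column of the matrix result corresponds to one target, which is why `w` has one row per target and `b` has one entry per target.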

In [14]:
preds = model(inputs)
print(preds)
# The weights are random, so these predictions are far from the targets.

tensor([[ -62.9917,   88.1277],
        [-100.2584,  126.6078],
        [ -83.6353,   45.4237],
        [ -44.6335,  141.3874],
        [-118.3737,  103.4500]], grad_fn=<AddBackward0>)


The predictions are far from the actual targets, which is expected because the weights and biases were initialized randomly. Before improving the model, we need a way to evaluate how well it is performing. We can compare the model's predictions with the actual targets using the following method:

* Calculate the difference between two matrices (preds and targets)
* Square all elements of the difference matrix to remove negative values
* Calculate the average of the elements in the resulting matrix.

In [22]:
def mse(t1, t2):
  diff = t2 - t1
  return torch.sum(diff**2) / diff.numel()

In [23]:
loss = mse(preds, targets)
print(loss)

tensor(16131.3545, grad_fn=<DivBackward0>)
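
The hand-written `mse` should agree with PyTorch's built-in `torch.nn.functional.mse_loss`, which averages over all elements by default. A minimal sketch on a tiny made-up example (the names `mse_by_hand`, `demo_preds` and `demo_targets` are introduced here only for illustration):

```python
import torch
import torch.nn.functional as F

def mse_by_hand(t1, t2):
    # Same three steps as above: difference, square, average over all elements.
    diff = t2 - t1
    return torch.sum(diff ** 2) / diff.numel()

demo_preds = torch.tensor([[1.0, 2.0], [3.0, 4.0]])
demo_targets = torch.tensor([[1.0, 0.0], [0.0, 4.0]])

by_hand = mse_by_hand(demo_preds, demo_targets)   # (0 + 4 + 9 + 0) / 4
built_in = F.mse_loss(demo_preds, demo_targets)

print(by_hand, built_in)   # tensor(3.2500) tensor(3.2500)
```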


# Compute the Gradients



In [24]:
loss.backward()
print(w)
print(w.grad)

tensor([[ 0.2831,  0.1087, -2.1260],
        [ 1.1569, -1.3752,  2.2160]], requires_grad=True)
tensor([[-12976.3154, -15038.6865,  -9286.3252],
        [  1121.8114,  -1054.2651,     22.5595]])


Each element of `w.grad` is the derivative of the loss with respect to the corresponding element of `w`. The same holds for `b.grad` and `b`:

In [25]:
print(b)
print(b.grad)

tensor([0.4744, 0.5268], requires_grad=True)
tensor([-158.1785,    8.9993])
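
For this particular loss the gradients also have a closed form: with $E = \text{preds} - \text{targets}$ and $N$ the number of elements in $E$, the derivative with respect to the weights is $\frac{2}{N} E^T X$ and with respect to the biases is $\frac{2}{N}$ times the column sums of $E$. The sketch below compares these formulas with what autograd produces. It uses its own small tensors (`demo_*` names are hypothetical) and `float64` so that the comparison is not dominated by rounding.

```python
import torch

torch.manual_seed(0)  # arbitrary seed, only so the sketch is repeatable

demo_x = torch.tensor([[73., 67., 43.],
                       [91., 88., 64.]], dtype=torch.float64)
demo_t = torch.tensor([[56., 70.],
                       [81., 101.]], dtype=torch.float64)
demo_w = torch.randn(2, 3, dtype=torch.float64, requires_grad=True)
demo_b = torch.randn(2, dtype=torch.float64, requires_grad=True)

# Forward pass and mean squared error, then let autograd fill in .grad.
demo_preds = demo_x @ demo_w.t() + demo_b
demo_loss = torch.sum((demo_preds - demo_t) ** 2) / demo_preds.numel()
demo_loss.backward()

# Closed-form derivatives of the same loss.
with torch.no_grad():
    err = demo_preds - demo_t                        # shape (2, 2)
    grad_w = (2.0 / err.numel()) * err.t() @ demo_x  # shape (2, 3)
    grad_b = (2.0 / err.numel()) * err.sum(dim=0)    # shape (2,)

print(torch.allclose(demo_w.grad, grad_w))   # True
print(torch.allclose(demo_b.grad, grad_b))   # True
```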


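The loss is a quadratic function of the weights and biases. If we vary a single element of `w` while holding all the others constant, the loss traces out a parabola. When the gradient for that element is negative, increasing the element slightly decreases the loss: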

![negative=gradient](https://i.imgur.com/w3Wii7C.png)

Before we proceed, we reset the gradients to zero by calling the `.zero_()` method. We need to do this because PyTorch accumulates gradients, i.e. the next time we call `.backward()` on the loss, the new gradient values are added to the existing ones, which may lead to unexpected results.
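
The accumulation is easy to see on a single scalar. A minimal sketch, using a hypothetical tensor `demo_x` that is unrelated to the model above:

```python
import torch

demo_x = torch.tensor(3.0, requires_grad=True)

(demo_x ** 2).backward()
first = demo_x.grad.item()         # d(x^2)/dx at x = 3

(demo_x ** 2).backward()           # second backward pass without a reset
accumulated = demo_x.grad.item()   # the new gradient is added to the old one

demo_x.grad.zero_()                # in-place reset
after_reset = demo_x.grad.item()

print(first, accumulated, after_reset)   # 6.0 12.0 0.0
```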

In [26]:
w.grad.zero_()
b.grad.zero_()
print(w.grad)
print(b.grad)

tensor([[0., 0., 0.],
        [0., 0., 0.]])
tensor([0., 0.])


# Adjust the weights and biases using gradient descent

1. Generate predictions
2. Calculate the loss
3. Compute the gradients w.r.t weights and biases
4. Adjust the weights by subtracting a small quantity proportional to the gradient
5. Reset the gradients to zero

In [28]:
# 1. Generate predictions

preds = model(inputs)
print(preds)

tensor([[ -62.9917,   88.1277],
        [-100.2584,  126.6078],
        [ -83.6353,   45.4237],
        [ -44.6335,  141.3874],
        [-118.3737,  103.4500]], grad_fn=<AddBackward0>)


In [29]:
# 2. Calculate the loss
loss = mse(preds, targets)
print(loss)

tensor(16131.3545, grad_fn=<DivBackward0>)


In [30]:
# 3. Compute the gradients w.r.t weights and biases
loss.backward()
print(w.grad)
print(b.grad)

tensor([[-12976.3154, -15038.6865,  -9286.3252],
        [  1121.8114,  -1054.2651,     22.5595]])
tensor([-158.1785,    8.9993])


In [31]:
# 4. Adjust the weights and biases by subtracting a small quantity
# proportional to the gradient (1e-5 is the learning rate)
# AND
# 5. Reset the gradients to zero

with torch.no_grad():
  w -= w.grad * 1e-5
  b -= b.grad * 1e-5
  w.grad.zero_()
  b.grad.zero_()

In [32]:
print(w)
print(b)

tensor([[ 0.4129,  0.2591, -2.0331],
        [ 1.1457, -1.3647,  2.2158]], requires_grad=True)
tensor([0.4760, 0.5267], requires_grad=True)


In [33]:
preds = model(inputs)
loss = mse(preds, targets)
print(loss)

tensor(11729.6387, grad_fn=<DivBackward0>)


# Train for multiple epochs

To reduce the loss further, we can repeat the process of adjusting the weights and biases using the gradients multiple times.

In [58]:
# For 100 epochs
for i in range(100):
  preds = model(inputs)
  loss = mse(preds, targets)
  loss.backward()
  with torch.no_grad():
    w -= w.grad * 1e-5 # learning rate, hyperparameter
    b -= b.grad * 1e-5
    w.grad.zero_()
    b.grad.zero_()
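
To see how the loss evolves rather than only its final value, the loss of every epoch can be recorded in a list. The sketch below is a self-contained variant of the loop above with its own parameters (`demo_w`, `demo_b`, `loss_history` are names introduced here, and the seed is arbitrary), so it does not touch the notebook's `w` and `b`.

```python
import torch

torch.manual_seed(0)  # arbitrary seed, only so the sketch is repeatable

demo_inputs = torch.tensor([[73., 67., 43.], [91., 88., 64.], [87., 134., 58.],
                            [102., 43., 37.], [69., 96., 70.]])
demo_targets = torch.tensor([[56., 70.], [81., 101.], [119., 133.],
                             [22., 37.], [103., 119.]])
demo_w = torch.randn(2, 3, requires_grad=True)
demo_b = torch.randn(2, requires_grad=True)
lr = 1e-5  # same learning rate as above

loss_history = []
for epoch in range(100):
    step_preds = demo_inputs @ demo_w.t() + demo_b
    step_loss = torch.mean((step_preds - demo_targets) ** 2)
    loss_history.append(step_loss.item())   # loss before this epoch's update
    step_loss.backward()
    with torch.no_grad():
        demo_w -= demo_w.grad * lr
        demo_b -= demo_b.grad * lr
        demo_w.grad.zero_()
        demo_b.grad.zero_()

print(loss_history[0], loss_history[-1])
```

Plotting `loss_history` (or printing every tenth entry) gives a quick indication of whether the learning rate is too small, too large, or about right.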

In [59]:
print(loss)
print(preds)
print(targets)

tensor(3.6726, grad_fn=<DivBackward0>)
tensor([[ 57.4211,  70.2757],
        [ 80.6893, 101.6085],
        [121.7346, 130.8796],
        [ 21.9270,  36.4262],
        [ 98.8826, 121.2138]], grad_fn=<AddBackward0>)
tensor([[ 56.,  70.],
        [ 81., 101.],
        [119., 133.],
        [ 22.,  37.],
        [103., 119.]])


# Linear regression using PyTorch built-ins

The model and training process above were implemented using basic matrix operations. But since this is such a common pattern, PyTorch has several built-in functions and classes to make it easy to create and train models.

Let's begin by importing the `torch.nn` package from PyTorch, which contains utility classes for building neural networks.

In [5]:
import torch
import numpy as np
import torch.nn as nn

In [6]:
# Input (temp, rainfall, humidity)
inputs = np.array([[73, 67, 43], [91, 88, 64], [87, 134, 58], 
                   [102, 43, 37], [69, 96, 70], [73, 67, 43], 
                   [91, 88, 64], [87, 134, 58], [102, 43, 37], 
                   [69, 96, 70], [73, 67, 43], [91, 88, 64], 
                   [87, 134, 58], [102, 43, 37], [69, 96, 70]], 
                  dtype='float32')

# Targets (apples, oranges)
targets = np.array([[56, 70], [81, 101], [119, 133], 
                    [22, 37], [103, 119], [56, 70], 
                    [81, 101], [119, 133], [22, 37], 
                    [103, 119], [56, 70], [81, 101], 
                    [119, 133], [22, 37], [103, 119]], 
                   dtype='float32')

inputs = torch.from_numpy(inputs)
targets = torch.from_numpy(targets)

In [7]:
print(inputs)
print(targets)

tensor([[ 73.,  67.,  43.],
        [ 91.,  88.,  64.],
        [ 87., 134.,  58.],
        [102.,  43.,  37.],
        [ 69.,  96.,  70.],
        [ 73.,  67.,  43.],
        [ 91.,  88.,  64.],
        [ 87., 134.,  58.],
        [102.,  43.,  37.],
        [ 69.,  96.,  70.],
        [ 73.,  67.,  43.],
        [ 91.,  88.,  64.],
        [ 87., 134.,  58.],
        [102.,  43.,  37.],
        [ 69.,  96.,  70.]])
tensor([[ 56.,  70.],
        [ 81., 101.],
        [119., 133.],
        [ 22.,  37.],
        [103., 119.],
        [ 56.,  70.],
        [ 81., 101.],
        [119., 133.],
        [ 22.,  37.],
        [103., 119.],
        [ 56.,  70.],
        [ 81., 101.],
        [119., 133.],
        [ 22.,  37.],
        [103., 119.]])


In [8]:
from torch.utils.data import TensorDataset

In [9]:
train_ds = TensorDataset(inputs, targets)
train_ds[0:3]

(tensor([[ 73.,  67.,  43.],
         [ 91.,  88.,  64.],
         [ 87., 134.,  58.]]), tensor([[ 56.,  70.],
         [ 81., 101.],
         [119., 133.]]))

The `TensorDataset` allows us to access a small section of the training data using the array indexing notation (`[0:3]` in the above code). It returns a tuple (or pair), in which the first element contains the input variables for the selected rows, and the second contains the targets.
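
Indexing with a single integer works the same way and returns one row of each tensor, while `len()` gives the number of rows. A small sketch on made-up tensors (`demo_x`, `demo_y`, `demo_ds` are hypothetical names):

```python
import torch
from torch.utils.data import TensorDataset

demo_x = torch.arange(12, dtype=torch.float32).reshape(4, 3)   # 4 rows, 3 features
demo_y = torch.arange(8, dtype=torch.float32).reshape(4, 2)    # 4 rows, 2 targets
demo_ds = TensorDataset(demo_x, demo_y)

print(len(demo_ds))           # 4

x0, y0 = demo_ds[0]           # a single row of inputs and targets
xs, ys = demo_ds[1:3]         # a slice of two rows
print(x0.shape, y0.shape)     # torch.Size([3]) torch.Size([2])
print(xs.shape, ys.shape)     # torch.Size([2, 3]) torch.Size([2, 2])
```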

We'll also create a `DataLoader`, which can split the data into batches of a predefined size while training. It also provides other utilities like shuffling and random sampling of the data.

In [10]:
from torch.utils.data import DataLoader

In [11]:
batch_size = 5
train_dl = DataLoader(train_ds, batch_size, shuffle = True)
for xb, yb in train_dl:
  print(xb)
  print(yb)

tensor([[102.,  43.,  37.],
        [ 91.,  88.,  64.],
        [ 69.,  96.,  70.],
        [ 69.,  96.,  70.],
        [ 73.,  67.,  43.]])
tensor([[ 22.,  37.],
        [ 81., 101.],
        [103., 119.],
        [103., 119.],
        [ 56.,  70.]])
tensor([[102.,  43.,  37.],
        [ 87., 134.,  58.],
        [ 91.,  88.,  64.],
        [ 87., 134.,  58.],
        [ 69.,  96.,  70.]])
tensor([[ 22.,  37.],
        [119., 133.],
        [ 81., 101.],
        [119., 133.],
        [103., 119.]])
tensor([[ 73.,  67.,  43.],
        [102.,  43.,  37.],
        [ 73.,  67.,  43.],
        [ 91.,  88.,  64.],
        [ 87., 134.,  58.]])
tensor([[ 56.,  70.],
        [ 22.,  37.],
        [ 56.,  70.],
        [ 81., 101.],
        [119., 133.]])
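
The batch size does not have to divide the dataset evenly. As a sketch, with 15 rows and a hypothetical batch size of 4, the last batch is smaller unless `drop_last=True` is passed (the zero-filled tensors below are placeholders, only the shapes matter):

```python
import torch
from torch.utils.data import TensorDataset, DataLoader

# Placeholder data with the same shape as the training set: 15 rows.
demo_ds = TensorDataset(torch.zeros(15, 3), torch.zeros(15, 2))

demo_dl = DataLoader(demo_ds, batch_size=4, shuffle=True)
batch_sizes = [xb.shape[0] for xb, yb in demo_dl]
print(len(demo_dl), batch_sizes)   # 4 [4, 4, 4, 3]

# drop_last=True discards the incomplete final batch.
drop_dl = DataLoader(demo_ds, batch_size=4, shuffle=True, drop_last=True)
print(len(drop_dl))                # 3
```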


# nn.Linear


In [12]:
model = nn.Linear(3,2)
print(model.weight)
print(model.bias)

Parameter containing:
tensor([[ 0.0070,  0.2341, -0.5157],
        [ 0.4375,  0.5435, -0.5753]], requires_grad=True)
Parameter containing:
tensor([0.4598, 0.5648], requires_grad=True)


In [13]:
# Parameters
list(model.parameters())

[Parameter containing:
 tensor([[ 0.0070,  0.2341, -0.5157],
         [ 0.4375,  0.5435, -0.5753]], requires_grad=True),
 Parameter containing:
 tensor([0.4598, 0.5648], requires_grad=True)]
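
`nn.Linear(3, 2)` stores a weight matrix of shape `(2, 3)` and a bias of shape `(2,)`, and calling the layer should match the from-scratch expression `x @ w.t() + b`. A quick check, assuming a fresh layer named `demo_layer` (a hypothetical name, separate from `model`):

```python
import torch
import torch.nn as nn

demo_layer = nn.Linear(3, 2)   # 3 input features, 2 outputs
demo_x = torch.tensor([[73., 67., 43.],
                       [91., 88., 64.]])

with torch.no_grad():
    layer_out = demo_layer(demo_x)
    manual_out = demo_x @ demo_layer.weight.t() + demo_layer.bias

print(demo_layer.weight.shape, demo_layer.bias.shape)    # torch.Size([2, 3]) torch.Size([2])
print(torch.allclose(layer_out, manual_out, atol=1e-4))  # True
```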

In [14]:
preds = model(inputs)
print(preds)

tensor([[ -5.5211,  44.1780],
        [-11.3089,  51.3853],
        [  2.5253,  78.0885],
        [ -7.8425,  47.2720],
        [-12.6840,  42.6575],
        [ -5.5211,  44.1780],
        [-11.3089,  51.3853],
        [  2.5253,  78.0885],
        [ -7.8425,  47.2720],
        [-12.6840,  42.6575],
        [ -5.5211,  44.1780],
        [-11.3089,  51.3853],
        [  2.5253,  78.0885],
        [ -7.8425,  47.2720],
        [-12.6840,  42.6575]], grad_fn=<AddmmBackward>)


In [15]:
import torch.nn.functional as F

In [16]:
loss_fn = F.mse_loss

In [17]:
loss = loss_fn(model(inputs), targets)
print(loss)

tensor(5222.2842, grad_fn=<MseLossBackward>)


# Optimizer

In [18]:
# model.parameters() tells the optimizer which tensors (weights and biases) to update
opt = torch.optim.SGD(model.parameters(), lr = 1e-5) 
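
With no momentum or weight decay, one `opt.step()` of `torch.optim.SGD` should perform the same update as the manual `w -= w.grad * lr` used earlier. The sketch below checks this on a separate layer (`demo_model` and `demo_opt` are hypothetical names, and the single training row is taken from the data above):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)  # arbitrary seed, only so the sketch is repeatable

demo_model = nn.Linear(3, 2)
demo_opt = torch.optim.SGD(demo_model.parameters(), lr=1e-5)

demo_x = torch.tensor([[73., 67., 43.]])
demo_y = torch.tensor([[56., 70.]])

F.mse_loss(demo_model(demo_x), demo_y).backward()

# What the manual update rule would produce, computed before the optimizer runs.
expected_w = demo_model.weight.detach() - 1e-5 * demo_model.weight.grad
expected_b = demo_model.bias.detach() - 1e-5 * demo_model.bias.grad

demo_opt.step()        # update the parameters in place
demo_opt.zero_grad()   # reset the gradients for the next step

print(torch.allclose(demo_model.weight, expected_w))   # True
print(torch.allclose(demo_model.bias, expected_b))     # True
```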

# Train the model

In [21]:
def fit(num_epochs, model, loss_fn, opt):
  for epoch in range(num_epochs):
    for xb, yb in train_dl:

      # 1. Generate predictions
      preds = model(xb)

      # 2. Calculate Loss
      loss = loss_fn(preds, yb)

      # 3. Compute gradients
      loss.backward()

      # 4. Update parameters using gradient
      opt.step()

      # 5. Reset the gradients to zero
      opt.zero_grad()

    # Print the progress every 10 epochs (loss of the last batch in the epoch)
    if (epoch + 1) % 10 == 0:
      print('Epoch [{}/{}], Loss: {:.4f}'.format(epoch+1, num_epochs, loss.item()))
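
Because `fit` reports the loss of whichever batch happened to come last in the epoch, the printed values can jump up and down even while the model is improving. One possible alternative, sketched here under the assumption that a per-epoch average is what we want, is to accumulate the batch losses and return their mean. `fit_with_history` is a hypothetical helper, not part of PyTorch, and it takes the data loader as an argument instead of reading the global `train_dl`.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.data import TensorDataset, DataLoader

def fit_with_history(num_epochs, model, loss_fn, opt, train_dl):
    """Hypothetical variant of fit() that returns the mean loss of each epoch."""
    history = []
    for epoch in range(num_epochs):
        total, count = 0.0, 0
        for xb, yb in train_dl:
            loss = loss_fn(model(xb), yb)
            loss.backward()
            opt.step()
            opt.zero_grad()
            total += loss.item() * len(xb)   # weight each batch by its size
            count += len(xb)
        history.append(total / count)
    return history

torch.manual_seed(0)  # arbitrary seed, only so the sketch is repeatable
demo_x = torch.tensor([[73., 67., 43.], [91., 88., 64.], [87., 134., 58.],
                       [102., 43., 37.], [69., 96., 70.]])
demo_y = torch.tensor([[56., 70.], [81., 101.], [119., 133.],
                       [22., 37.], [103., 119.]])
demo_dl = DataLoader(TensorDataset(demo_x, demo_y), batch_size=2, shuffle=True)
demo_model = nn.Linear(3, 2)
demo_opt = torch.optim.SGD(demo_model.parameters(), lr=1e-5)

history = fit_with_history(50, demo_model, F.mse_loss, demo_opt, demo_dl)
print(history[0], history[-1])
```

The returned list has one entry per epoch, so it can be printed or plotted in place of the per-batch values.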


In [22]:
fit(1000, model, loss_fn, opt)

Epoch [10/1000], Loss: 687.7610
Epoch [20/1000], Loss: 319.8154
Epoch [30/1000], Loss: 291.4352
Epoch [40/1000], Loss: 202.7767
Epoch [50/1000], Loss: 39.0083
Epoch [60/1000], Loss: 91.9822
Epoch [70/1000], Loss: 168.4325
Epoch [80/1000], Loss: 84.1687
Epoch [90/1000], Loss: 86.1852
Epoch [100/1000], Loss: 138.0452
Epoch [110/1000], Loss: 73.6995
Epoch [120/1000], Loss: 118.3605
Epoch [130/1000], Loss: 74.5655
Epoch [140/1000], Loss: 89.9339
Epoch [150/1000], Loss: 107.0951
Epoch [160/1000], Loss: 70.0687
Epoch [170/1000], Loss: 66.9747
Epoch [180/1000], Loss: 66.6426
Epoch [190/1000], Loss: 45.8467
Epoch [200/1000], Loss: 47.1806
Epoch [210/1000], Loss: 42.2707
Epoch [220/1000], Loss: 39.1814
Epoch [230/1000], Loss: 52.6602
Epoch [240/1000], Loss: 20.3500
Epoch [250/1000], Loss: 35.8094
Epoch [260/1000], Loss: 21.6285
Epoch [270/1000], Loss: 43.1765
Epoch [280/1000], Loss: 30.9804
Epoch [290/1000], Loss: 29.4664
Epoch [300/1000], Loss: 25.4830
Epoch [310/1000], Loss: 35.7365
Epoch [32