# 1.2 Linear Regression with PyTorch

We will now see how to implement a very simple task with PyTorch: fitting a linear regression to a small set of data points.

The task is: given a set of *independent* x values, learn to estimate the relationship (the coefficients, often written beta) between them and the corresponding *dependent* y values.

More info: https://en.wikipedia.org/wiki/Regression_analysis

First, we import our usual packages.

In [None]:
import torch
import torch.nn as nn
import numpy as np
import matplotlib.pyplot as plt

%matplotlib inline

We now create our data points. You can copy the same numbers or change them slightly; either way, the aim is only to show how the process works.

In [None]:
# Toy Dataset 
x_train = np.array([[3.3], [4.4], [5.5], [6.71], [6.93], [4.168], 
                    [9.779], [6.182], [7.59], [2.167], [7.042], 
                    [10.791], [5.313], [7.997], [3.1]], dtype=np.float32)

y_train = np.array([[1.7], [2.76], [2.09], [3.19], [1.694], [1.573], 
                    [3.366], [2.596], [2.53], [1.221], [2.827], 
                    [3.465], [1.65], [2.904], [1.3]], dtype=np.float32)

In [None]:
plt.plot(x_train, y_train, 'ro', label='Original data')
plt.legend()
plt.show()

Now we create a very simple model by subclassing `nn.Module` and overriding its `forward` method. Instances of `nn.Module` are callable, and their `forward` method is invoked when they are called (that is, given `m = MyModule()`, where `MyModule` is a subclass of `nn.Module`, the `__call__` method of `m` will call the `MyModule.forward` method).
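The relationship between calling a module and its `forward` method can be checked directly. The following is a minimal sketch; `Doubler` is a hypothetical module invented only for this illustration.

```python
import torch
import torch.nn as nn

# `Doubler` is a hypothetical module used only for this illustration
class Doubler(nn.Module):
    def forward(self, x):
        return 2 * x

m = Doubler()
x = torch.tensor([1.0, 2.0, 3.0])

# Calling the instance goes through nn.Module.__call__, which invokes forward
called = m(x)
direct = m.forward(x)
print(torch.equal(called, direct))  # True
```

In practice, prefer `m(x)` over `m.forward(x)`, because `__call__` also runs any hooks registered on the module.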

Our linear regression model will consist of just a single linear layer, also known as an affine layer or fully connected layer, which applies a linear transformation to the incoming data: `y = Wx + b`.

So we create the layer object in `__init__` and call it on the argument `x` inside the `forward` method.

Our linear layer has a single input feature and a single output value. PyTorch modules expect data in batches, with the first dimension indexing the samples, so at each step we send the whole of `x_train` through the model. That is why the arrays have an additional dimension: their shape is `(15, 1)`, meaning 15 samples with one feature each.
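To make the batch dimension concrete, here is a small sketch that passes a batch of shape `(15, 1)` through `nn.Linear(1, 1)` and reproduces the result by hand. The random input and the seed are assumptions chosen only for illustration; they are not the training data.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)  # arbitrary seed, only to make the run repeatable

layer = nn.Linear(1, 1)   # one input feature, one output value
x = torch.rand(15, 1)     # hypothetical batch: 15 samples, 1 feature each

out = layer(x)
print(out.shape)          # torch.Size([15, 1])

# The same computation written out: x @ W^T + b
manual = x @ layer.weight.T + layer.bias
print(torch.allclose(out, manual))
```

The layer is applied to every row of the batch independently, so the output keeps the first (batch) dimension.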

In [None]:
# Linear Regression Model
class LinearRegression(nn.Module):
    def __init__(self, input_size, output_size):
        super().__init__()
        self.linear = nn.Linear(input_size, output_size)

    def forward(self, x):
        out = self.linear(x)
        return out

We can now instantiate our model object, loss function and optimization algorithm.

We use `MSELoss` (which stands for Mean Squared Error Loss), which computes the mean squared error between two inputs: the model's output and the target (the ground truth, that is, the output the model should produce).
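As a quick check of what the loss computes, the sketch below compares `nn.MSELoss` with the mean of the squared differences written out by hand. The prediction and target values are hypothetical, chosen only for illustration.

```python
import torch
import torch.nn as nn

# Hypothetical predictions and targets, chosen only for illustration
outputs = torch.tensor([[2.0], [3.0], [5.0]])
targets = torch.tensor([[1.0], [3.0], [7.0]])

criterion = nn.MSELoss()  # default reduction is 'mean'
loss = criterion(outputs, targets)

# The same value computed by hand: mean of the squared differences
manual = ((outputs - targets) ** 2).mean()

print(loss.item(), manual.item())
```

Here the differences are 1, 0 and -2, so the squared errors are 1, 0 and 4, and their mean is 5/3.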

As the optimization algorithm we use Stochastic Gradient Descent (SGD), which consists of computing the loss and its derivative with respect to each of the model's parameters (the gradients). The algorithm then updates each parameter by applying `w' = w - lr * dl/dw`, where `lr` is the learning rate.
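The update rule can be verified against the built-in optimizer. The following sketch performs one step with `torch.optim.SGD` on one model and the same step by hand on an identical copy. The data, the learning rate and the seed are assumptions made for this illustration, and the comparison holds for plain SGD without momentum or weight decay.

```python
import copy
import torch
import torch.nn as nn

torch.manual_seed(0)  # arbitrary seed for repeatability

# Hypothetical data: 4 samples with one feature each
x = torch.tensor([[1.0], [2.0], [3.0], [4.0]])
y = torch.tensor([[2.0], [4.0], [6.0], [8.0]])
lr = 0.01

model_a = nn.Linear(1, 1)
model_b = copy.deepcopy(model_a)  # identical starting parameters
criterion = nn.MSELoss()

# (a) One step with the built-in optimizer
optimizer = torch.optim.SGD(model_a.parameters(), lr=lr)
optimizer.zero_grad()
criterion(model_a(x), y).backward()
optimizer.step()

# (b) The same step by hand: w' = w - lr * dl/dw
criterion(model_b(x), y).backward()
with torch.no_grad():
    for p in model_b.parameters():
        p -= lr * p.grad

same = all(torch.allclose(pa, pb)
           for pa, pb in zip(model_a.parameters(), model_b.parameters()))
print(same)
```

The manual update runs inside `torch.no_grad()` so that the parameter change itself is not recorded by autograd.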

In [None]:
# single "neural unit" layer
model = LinearRegression(1, 1)
# same as:
# model = nn.Linear(1, 1)

# Loss function: we will compute the derivatives of this function w.r.t. the model parameters
criterion = nn.MSELoss()
# Optimizer: will improve model parameters according to the computed derivatives
optimizer = torch.optim.SGD(model.parameters(), lr=0.001)

To train our simple model, we loop through the desired number of epochs, performing one optimization step per iteration of the `for` loop. Remember that the model sees every data point and its respective y in a single forward pass, so the loss (and therefore the gradients) is averaged over the whole dataset at each step.

In [None]:
# Convert the numpy arrays to torch tensors (once, outside the loop)
inputs = torch.from_numpy(x_train)
targets = torch.from_numpy(y_train)

# Train the model for 60 epochs
# One epoch is one pass over the entire dataset
for epoch in range(60):

    # Reset gradients, or they will be accumulated
    optimizer.zero_grad()
    outputs = model(inputs)  # Forward step
    loss = criterion(outputs, targets)  # Compute loss value
    loss.backward()  # Backward step
    optimizer.step()  # Modify parameters using gradients

    if (epoch + 1) % 5 == 0:
        print('Epoch [%d/60], Loss: %.4f' 
              % (epoch + 1, loss.item()))

We ran the SGD algorithm for 60 epochs. Let's see what the model learned by plotting the regression line (remember `y = Wx + b` from above?) together with the ground-truth points.

In [None]:
# Plot the graph
model.eval()
predicted = model(torch.from_numpy(x_train)).detach().numpy()  # detach from the autograd graph before converting
plt.plot(x_train, y_train, 'ro', label='Original data')
plt.plot(x_train, predicted, label='Fitted line')
plt.legend()
plt.show()

We are done with our task, and we can export the model parameters to a file so that we can load them again later when needed.

In [None]:
# Save the Model
torch.save(model.state_dict(), 'reg-model.pth')
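
Loading the parameters back requires building a model with the same architecture first and then restoring its state. The sketch below is a self-contained illustration under some assumptions: a plain `nn.Linear(1, 1)` stands in for the trained model, and the file is written to a temporary directory. With the `LinearRegression` class above you would instantiate `LinearRegression(1, 1)` instead and pass the path `'reg-model.pth'`.

```python
import os
import tempfile

import torch
import torch.nn as nn

model = nn.Linear(1, 1)  # stands in for the trained model

# Save to a temporary directory so this sketch leaves no file behind
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'reg-model.pth')
    torch.save(model.state_dict(), path)

    # Rebuild the same architecture, then load the saved parameters into it
    restored = nn.Linear(1, 1)
    restored.load_state_dict(torch.load(path))

restored.eval()
x = torch.tensor([[5.0]])
print(torch.equal(model(x), restored(x)))
```

Saving the `state_dict` rather than the whole model object keeps the file independent of how the class is defined in code, at the cost of having to recreate the model before loading.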