# Linear and Logistic Regression with PyTorch

Notebook created in PyTorch by [Santi Pascual](https://github.com/santi-pdp) for the [UPC School](https://www.talent.upc.edu/ing/estudis/formacio/curs/310400/postgrau-artificial-intelligence-deep-learning/) (2019) and updated by [Xavier Giro](https://imatge.upc.edu/web/people/xavier-giro) for [UPC TelecomBCN](https://telecombcn-dl.github.io/dlai-2019/) (2019).

The linear regression part is based on a previous version for the [Barcelona Technology School](https://barcelonatechnologyschool.com/master/master-in-big-data-solutions/) (BTS) created by [Victor Campos](https://scholar.google.com/citations?user=8fzVqSkAAAAJ&hl=en) (2018) and extended by [Daniel Fojo](https://www.linkedin.com/in/daniel-fojo/) (2019).

This lab is about linear and logistic regression with PyTorch. These two components are fundamental in every machine learning toolbox. Linear regression lets us predict real-valued outputs, such as future trends. Logistic regression, on the other hand, lets us tell classes apart by learning boundaries in the feature space that separate point clouds.


In [None]:
# We first import PyTorch and Numpy libraries as fundamental tools to work with arrays and tensors
import torch
import numpy as np
import matplotlib.pyplot as plt
# inline ensures we will automatically see the plots as soon as we operate with plot() calls
%matplotlib inline


## Data loading

We will first build a toy dataset. It is based on the line $y = 5x + 3$, with additive zero-mean Gaussian noise of standard deviation 3 that distorts the values. So what we know so far is:
1. We have a line from which we obtain outcomes $y(x)$.
2. These outcomes are distorted by noise that blurs the line, but the line is still the underlying generation process.

We will then use a linear model $\hat{y} = w\cdot x + b$ and learn the parameters $w$ and $b$ so that the predicted responses resemble those of the real data. Once the regressor is properly trained, its parameters should approximate those of the underlying line, $w=5$ and $b=3$, and we will be able to predict new values for arbitrary $x$.
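
As a quick sanity check of the setup described above, a closed-form least-squares fit should already recover values close to $w=5$ and $b=3$ from noisy samples. The sketch below is only an illustration under assumptions: the fixed seed, the variable names and the use of `np.polyfit` are not part of the lab, which learns the parameters with gradient descent instead.

```python
import numpy as np

# Hypothetical reproduction of the toy setup: y = 5x + 3 plus Gaussian noise (std 3)
rng = np.random.default_rng(0)  # fixed seed, assumed here only for repeatability
x = rng.uniform(0, 10, size=200)
y = 5 * x + 3 + rng.normal(0, 3, size=200)

# Closed-form least-squares fit of a degree-1 polynomial.
# np.polyfit returns the coefficients from highest degree to lowest: [slope, intercept]
w_hat, b_hat = np.polyfit(x, y, deg=1)
print('w estimate:', w_hat)
print('b estimate:', b_hat)
```

The estimates will not match 5 and 3 exactly because of the noise, and the same holds for the parameters learned later with PyTorch.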

In [None]:
# We generate NUM_SAMPLES data points in [0, X_SPAN)
# by sampling from a uniform distribution
NUM_SAMPLES = 200
X_SPAN = 10
train_X = np.random.rand(NUM_SAMPLES) * X_SPAN
# generate noise
noise = np.random.randn(NUM_SAMPLES) * 3
train_Y = 5*train_X + 3 + noise
n_samples = train_X.shape[0]

plt.scatter(train_X, train_Y)

In [None]:
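# We also build a test set to try new predictions with our model once trained.
# Fresh noise is drawn so that the test set does not reuse the training noise samples.
test_X = np.random.rand(NUM_SAMPLES) * X_SPAN
test_noise = np.random.randn(NUM_SAMPLES) * 3
test_Y = 5*test_X + 3 + test_noise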

## Building the Linear Regressor

The model we are pursuing behaves as a learnable line and has only 2 parameters. It is elegant to enclose everything within the `nn.Module` class of PyTorch. This special class only requires us to define the following:

1. The parameters of our model: regardless of whether it is a complicated neural layer, a simple scalar parameter, or a pizza (whutt?), everything is declared in the `__init__` of our class (normally).
2. The forward function of our model: we declare a signature for the `def forward(self, x)` function, or the `def forward(self, x, conditioning)` function, or the `def forward(self, x, y, z, t, w, o, z2, superdupervariable)` function; it is up to us what arguments we pass in. All that matters is to follow the skeleton defined in the next cell.
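
To make the two requirements above concrete, here is a minimal sketch of a module whose `forward` takes an extra argument. The class `ScaledShift` and its parameter names are hypothetical and chosen only for illustration; they are not the regressor used in this lab.

```python
import torch
import torch.nn as nn

class ScaledShift(nn.Module):  # hypothetical module, for illustration only

  def __init__(self):
    super().__init__()
    # anything wrapped in nn.Parameter is registered as a learnable parameter
    self.scale = nn.Parameter(torch.ones(1))
    self.shift = nn.Parameter(torch.zeros(1))

  def forward(self, x, conditioning):
    # forward can take whatever arguments we decide on
    return self.scale * x + self.shift + conditioning

model = ScaledShift()
# calling the module runs forward() with the same arguments
out = model(torch.tensor([1.0, 2.0]), torch.tensor([10.0, 20.0]))
# list the parameters that PyTorch registered for us
names = [name for name, _ in model.named_parameters()]
print(names)
print(out)
```

Calling `model.parameters()` returns the same registered parameters, which is what an optimizer receives later on.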

In [None]:
# first, we import the neural net module
import torch.nn as nn

# Now we define the linear regression class
class LinearRegression(nn.Module):
  
  def __init__(self):
    # ALWAYS CALL THE SUPERCLASS INIT OR PYTORCH WILL COMPLAIN
    super().__init__()
    # we define two parameters, w and b (randomly initialized)
    self.w = nn.Parameter(torch.randn(1))
    self.b = nn.Parameter(torch.randn(1))
    
  def forward(self, x):
    # we build the forward pass by writing the line equation
    y = self.w * x + self.b
    return y

In [None]:
# Predict the points of our erratic model for the test set 
# WATCH OUT FOR THE TORCH CONVERSION =D, You learned something useful in the
# previous lab about tensors!
lreg = LinearRegression()
test_X = torch.FloatTensor(test_X)
y_ = lreg(test_X)

# Check the parameters of our model
print('w initial value: ', lreg.w.item())
print('b initial value: ', lreg.b.item())

# Plot real data on a new figure (use the one from the training set we built)
# together with the predicted output for the test data
plt.scatter(train_X, train_Y)
plt.scatter(test_X.data.numpy(), y_.data.numpy())


I bet the predictions above look out of fit with respect to the data, since the parameters $w$ and $b$ in our model are initialized randomly. If they fit the training points right after random initialization, playing the lottery today might be a cool idea; luck might be on your side.

### Exercise 1

Can you tell what distribution $w$ and $b$ follow in our randomly initialized model?

Hint: understand the code in the defined `class`, the answer is there!

### Now let's iterate step by step to update the parameters of this line with BACKPROPAGATION and STOCHASTIC GRADIENT DESCENT!

<figure>

<img src='https://pbs.twimg.com/media/D4cxqsNWkAAB9he?format=png&name=900x900' width=250/>
<figcaption>Gradient descent updates the model by optimizing the parameters on pieces of data.</figcaption>
</figure>
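
Before relying on an optimizer, it may help to see what a single gradient descent update looks like when written by hand. The sketch below is an illustration under assumptions: it fits a hypothetical one-parameter model to a tiny dataset with slope 2, and it uses the full dataset at every step, whereas stochastic gradient descent applies the same update on small batches.

```python
import torch

# Hypothetical one-parameter model y_hat = w * x, fitted to a line of slope 2
w = torch.tensor([0.0], requires_grad=True)
x = torch.tensor([1.0, 2.0, 3.0])
y = 2.0 * x

lr = 0.05
initial_loss = ((w * x - y) ** 2).mean().item()
for step in range(50):
  loss = ((w * x - y) ** 2).mean()   # mean squared error
  loss.backward()                    # backpropagation fills w.grad with d(loss)/dw
  with torch.no_grad():
    w -= lr * w.grad                 # gradient descent update
  w.grad.zero_()                     # gradients accumulate, so clear them each step
final_loss = ((w * x - y) ** 2).mean().item()
print('w after training:', w.item())
```

An optimizer such as `optim.SGD` wraps the update and the gradient reset so they do not have to be written manually.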

In [None]:
from random import shuffle
# Shuffle datasets
def shuffle_dataset(X, Y):
  joint = torch.cat((X, Y), dim=1)
  joint = list(torch.chunk(joint, len(joint), dim=0))
  shuffle(joint)
  joint = torch.cat(joint, dim=0)
  return torch.chunk(joint, 2, dim=1)
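
The helper above shuffles by concatenating the inputs and targets, splitting the result into rows and shuffling the rows. A possible alternative, sketched here under the assumption that `X` and `Y` share their first dimension, is to index both tensors with the same random permutation. The function name `shuffle_with_randperm` is hypothetical.

```python
import torch

def shuffle_with_randperm(X, Y):  # hypothetical helper name
  # one random permutation of the row indices, applied to both tensors
  perm = torch.randperm(X.shape[0])
  return X[perm], Y[perm]

# Tiny demo: every target is 10 times its input, so pairs are easy to check
X_demo = torch.arange(6, dtype=torch.float32).view(-1, 1)
Y_demo = 10 * X_demo
X_shuf, Y_shuf = shuffle_with_randperm(X_demo, Y_demo)
print(torch.cat((X_shuf, Y_shuf), dim=1))
```

Because both tensors are indexed with the same permutation, each input stays paired with its target.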

### Exercise 2

Finish the following code to successfully train the linear regressor. To do that, consider that you have to do the forward pass with the training batch, compute the cost with the loss function, do the backpropagation and update your model's parameters.

In order to solve this exercise, you may want to review how a neural network was trained in the Backpropagation lab.

In [None]:
import torch.nn.functional as F
import torch.optim as optim

lreg = LinearRegression()

# Fit all training data
X = torch.FloatTensor(train_X).view(-1, 1)
Y = torch.FloatTensor(train_Y).view(-1, 1)
NUM_EPOCHS = 10
BATCH_SIZE=5
LR = 1e-2

# define MSE as the cost function
cost = F.mse_loss

opt = optim.SGD(lreg.parameters(), lr=LR)
avg_loss = None
avg_weight = 0.1
losses = []
for epoch in range(1, NUM_EPOCHS + 1):
    # the end indices go up to len(X) inclusive so the last batch is not dropped
    for (beg_i, end_i) in zip(range(0, len(X) - BATCH_SIZE + 1, BATCH_SIZE),
                              range(BATCH_SIZE, len(X) + 1, BATCH_SIZE)):
      x = X[beg_i:end_i]
      y = Y[beg_i:end_i]
      
      # TODO: finish the training steps here
      y_ = ...
      loss = ...
      loss...
      opt...
      opt...

      # Smooth the loss value that is saved to be plotted later
      if avg_loss is not None:
        avg_loss = avg_weight * loss.item() + (1 - avg_weight) * avg_loss
      else:
        avg_loss = loss.item()
      losses.append(avg_loss)

    # Reshuffle the training data after each epoch so that batches differ across epochs
    X, Y = shuffle_dataset(X, Y)
    
plt.ylabel('Smoothed loss by factor {:.2f}'.format(1 - avg_weight))
plt.xlabel('Iteration step')
plt.plot(losses)

Now that the linear regressor is trained, we can plot again the predicted values compared to those from the training set.



In [None]:
test_X = torch.FloatTensor(test_X)
y_ = lreg(test_X)
plt.scatter(train_X, train_Y)
plt.scatter(test_X.data.numpy(), y_.data.numpy())
print('w trained value: ', lreg.w.item())
print('b trained value: ', lreg.b.item())

### Exercise 3

Write the code to predict the values at $x=[0.5, 5, 8.75]$ and plot them overlaid on the previous plot.

In [None]:
# TODO: build the code to predict the y values corresponding to the above vector
# now that lreg is trained, and plot them overlaid on the previous plot with marker
# size s=200 in the scatter plot (check the doc on the scatter function https://matplotlib.org/3.1.1/api/_as_gen/matplotlib.pyplot.scatter.html)
x = ...
y = ...

print(y.data.numpy())

plt.scatter(train_X, train_Y)
plt.scatter(test_X.data.numpy(), y_.data.numpy())
# This is the scatter you have to do with your 3 (x, y) points
plt.scatter(x.data.numpy(), y.data.numpy(), s=200)

### Grand Finale, the Logistic regression

**Logistic regression** follows the same principle as linear regression: it is a linear model. The difference is that we use it to **establish a boundary between two classes**. Hence, it can be used to decide which of two outcomes corresponds to a given input, which makes it suitable for binary classification problems.

Below we generate a toy dataset with two two-dimensional Gaussian distributions. Each one belongs to a different class (0 or 1). You have to build a logistic regressor, following the scheme above but using the proper output activation function (it is non-linear, unlike in `LinearRegression`) and the proper loss function (NOPE, MSELoss IS NOT VALID). In this classification problem with two classes, we will use the binary cross-entropy loss (introduced in [these slides](https://github.com/telecombcn-dl/dlai-2019/raw/master/slides/dlai_2019_d04l1_losses.pdf)).
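
To see what the sigmoid and the binary cross-entropy compute, the sketch below evaluates both by hand and compares the result with `F.binary_cross_entropy`. The sigmoid is $\sigma(z) = 1/(1 + e^{-z})$ and, for targets $y_i \in \{0, 1\}$ and predicted probabilities $p_i$, the loss is $-\frac{1}{N}\sum_i \left[y_i \log p_i + (1 - y_i)\log(1 - p_i)\right]$. The scores and targets are made-up values for illustration, not data from this lab.

```python
import torch
import torch.nn.functional as F

# Made-up linear scores and binary targets, for illustration only
logits = torch.tensor([-2.0, 0.0, 3.0])
targets = torch.tensor([0.0, 1.0, 1.0])

# The sigmoid squashes any real score into a probability in (0, 1)
probs = torch.sigmoid(logits)

# Binary cross-entropy written out term by term, averaged over the samples
manual_bce = -(targets * torch.log(probs)
               + (1 - targets) * torch.log(1 - probs)).mean()

# Same quantity computed by PyTorch (default reduction is the mean)
library_bce = F.binary_cross_entropy(probs, targets)
print(manual_bce.item(), library_bce.item())
```

Note that `F.binary_cross_entropy` expects probabilities, so the model output has to pass through the sigmoid first.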

In [None]:
# We first generate some training data points
NUM_SAMPLES = 1000

class_0 = np.random.randn(NUM_SAMPLES, 2)
class_1 = np.random.randn(NUM_SAMPLES,2 ) + 2.5

train_X = np.concatenate((class_0, class_1), axis=0)
train_Y = np.concatenate((np.zeros((NUM_SAMPLES,)), np.ones((NUM_SAMPLES,))), axis=0)


_ = plt.scatter(class_0[:, 0], class_0[:, 1], alpha=0.15)
_ = plt.scatter(class_1[:, 0], class_1[:, 1], alpha=0.15)
_ = plt.scatter([0], [0], s=200, color='blue')
_ = plt.scatter([2.5], [2.5], s=200, color='red')

In the image above, two clouds of points are shown. The blue one corresponds to the zero-class points, whereas the orange one corresponds to the one-class points. The big blue dot and the big red dot are their respective centroids.

**Exercise 4: Define the LogisticRegression Module below. We have 2-D data points now, so use the `nn.Linear` layer to do the linear operation (https://pytorch.org/docs/stable/nn.html#torch.nn.Linear).**
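
As a reminder of how `nn.Linear` treats shapes, the sketch below applies a layer to a small random batch. The sizes (3 input features, 1 output, batch of 4) are hypothetical and deliberately different from the ones needed in this exercise.

```python
import torch
import torch.nn as nn

# nn.Linear(in_features, out_features) computes x @ W.T + b on the last dimension
layer = nn.Linear(3, 1)       # hypothetical sizes, for illustration only
batch = torch.randn(4, 3)     # 4 samples with 3 features each
out = layer(batch)

print(out.shape)              # one output value per sample
print(layer.weight.shape, layer.bias.shape)
```

The layer creates and registers its own weight and bias, so no explicit `nn.Parameter` is needed.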

In [None]:
class LogisticRegression(nn.Module):
  
  def __init__(self):
    super().__init__()
    # NOTE: BEWARE OF THE NUMBER OF FEATURES IN THE INPUT!
    
    # TODO: Linear projection
    # https://pytorch.org/docs/stable/nn.html#torch.nn.Linear
    self.proj = nn...

    # TODO: Sigmoid activation
    self.act = nn...
    
  def forward(self, x):

    # TODO:Combine the linear layer with the sigmoid activation    
    y = ...
    
    return y

**Exercise 4.2: Complete the training code based on the code for the linear regression. Use the following parameters to have a quick and effective convergence:**

* **NUM_EPOCHS=100**
* **BATCH_SIZE=15**
* **LR=1e-1 (with SGD)**


In [None]:
import torch.nn.functional as F
import torch.optim as optim

# Create an instance of a logistic regressor
loreg = LogisticRegression()

# Fit all training data
X = torch.FloatTensor(train_X).view(-1, 2)
Y = torch.FloatTensor(train_Y).view(-1, 1)
NUM_EPOCHS = 100
BATCH_SIZE=15
LR = 1e-1

# define binary cross entropy as the cost function
cost = F.binary_cross_entropy

# TODO: Complete the rest of the training code to perform the logistic regression
# You can use as a reference the one previously used for linear regression
# WARNING: If you copy & paste, update the reference of the optimizer to the
# logistic regressor (loreg) instead of the linear one (lreg).
...
    
# Plot the training curve of the loss function
plt.ylabel('Smoothed loss by factor {:.2f}'.format(1 - avg_weight))
plt.xlabel('Iteration step')
plt.plot(losses)

In the following, the `make_logistic_surface` function is provided. We will pass in a random logistic regression model and your trained logistic regression model. If all works fine, you should see a good fit of your logistic probability surface over the two class centroids: the zero-class should have probability close to zero (floor), and the one-class should have probability close to one (elevated).
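
Besides the visual check, a numerical check is to threshold the predicted probabilities at 0.5 and measure the accuracy. The sketch below is an illustration under assumptions: instead of a trained model it uses a hand-set linear layer whose boundary $x_1 + x_2 = 2.5$ lies midway between the two centroids, and it regenerates the two point clouds with a fixed seed.

```python
import numpy as np
import torch
import torch.nn as nn

# Regenerate two Gaussian clouds like the ones in this lab (fixed seed is an assumption)
rng = np.random.default_rng(0)
class_0 = rng.normal(size=(1000, 2))
class_1 = rng.normal(size=(1000, 2)) + 2.5
X = torch.tensor(np.concatenate((class_0, class_1)), dtype=torch.float32)
Y = torch.cat((torch.zeros(1000), torch.ones(1000)))

# Hand-set stand-in for a trained model: sigmoid(x1 + x2 - 2.5)
proj = nn.Linear(2, 1)
with torch.no_grad():
  proj.weight.copy_(torch.tensor([[1.0, 1.0]]))
  proj.bias.fill_(-2.5)
  probs = torch.sigmoid(proj(X)).view(-1)

# Probabilities above 0.5 are assigned to class 1, the rest to class 0
preds = (probs > 0.5).float()
accuracy = (preds == Y).float().mean().item()
print('accuracy:', accuracy)
```

The same thresholding can be applied to the output of any model that returns probabilities. Since the two clouds overlap slightly, the accuracy will stay somewhat below 1.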


In [None]:
from matplotlib import cm
%matplotlib inline
from mpl_toolkits.mplot3d import Axes3D


def make_logistic_surface(logistic):
  fig = plt.figure()
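  # add_subplot attaches the 3D axes to the figure; recent matplotlib versions
  # no longer do this automatically for Axes3D(fig)
  ax = fig.add_subplot(projection='3d')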
  ax.set_title('Logistic regression probability surface. Blue dot: zero-class centroid. Orange dot: one-class centroid.')
  X = []
  y_coords = np.linspace(-4, 6, 100)
  x_coords = np.linspace(-4, 6, 100)
  for n in y_coords:
    for m in x_coords:
      X.append([m, n])  # (x, y) order, so the first feature lies on the x axis
  X = torch.FloatTensor(X)
  Y_ = logistic(X)
  Y_ = Y_.data.numpy()
  # Y_ lists the grid row by row (outer loop over y, inner loop over x),
  # so a reshape recovers the (100, 100) probability surface
  surface = Y_.reshape(100, 100)
  xc, yc = np.meshgrid(x_coords, y_coords)

  surf = ax.plot_surface(xc, yc, surface, cmap=cm.coolwarm,
                         linewidth=0, antialiased=True,
                         alpha=0.5)
  _ = ax.scatter([0], [0], [1], s=200)
  _ = ax.plot([0, 1e-3], [0, 1e-3], [0, 1], linewidth=3, color='blue')
  _ = ax.scatter([2.5], [2.5], [1], s=200)
  _ = ax.plot([2.5, 2.5 + 1e-3], [2.5, 2.5 + 1e-3], [0, 1], linewidth=3, color='red')

**Random logistic regression**

We use a random seed that we know can give a very bad initialization, one that assigns probabilities close to zero to class 1 and close to one to class 0.

**Advice:** play a bit with the seed value and re-run the function call to see how the surface moves randomly

In [None]:
_ = torch.manual_seed(5)
loreg_nt = LogisticRegression()
make_logistic_surface(loreg_nt)

**Trained logistic regression**

Now we plot the probability surface of the model we trained.

In [None]:
make_logistic_surface(loreg)