## PyTorch - Linear Regression

In [1]:
import torch
import numpy as np
import sys

In [2]:
# Check whether a GPU is available; fall back to the CPU otherwise
device = torch.device("cuda:0" if (torch.cuda.is_available()) else "cpu")
print("Device: ", device)

Device:  cuda:0


We use linear regression as a case study to walk through the main components of PyTorch. We will cover the following:

1. Specifying input and target
2. Dataset and DataLoader
3. `nn.Linear` (Dense)
4. Define loss function
5. Define optimizer function
6. Train the model

### 1. Specifying input and target

In [4]:
# Input (temp, rainfall, humidity)
X_train = np.array([[73, 67, 43], [91, 88, 64], [87, 134, 58], 
                   [102, 43, 37], [69, 96, 70], [73, 67, 43], 
                   [91, 88, 64], [87, 134, 58], [102, 43, 37], 
                   [69, 96, 70], [73, 67, 43], [91, 88, 64], 
                   [87, 134, 58], [102, 43, 37], [69, 96, 70]], 
                  dtype='float32')

# Targets (apples, oranges)
Y_train = np.array([[56, 70], [81, 101], [119, 133], 
                    [22, 37], [103, 119], [56, 70], 
                    [81, 101], [119, 133], [22, 37], 
                    [103, 119], [56, 70], [81, 101], 
                    [119, 133], [22, 37], [103, 119]], 
                   dtype='float32')

# Convert the NumPy arrays to tensors
# torch.from_numpy shares memory with the array (not a copy); torch.tensor makes a copy
# inputs = torch.tensor(X_train)
# targets = torch.tensor(Y_train)
inputs = torch.from_numpy(X_train)
targets = torch.from_numpy(Y_train)

# print the shape of these tensors
# use either .size() or .shape
inputs.shape, targets.shape

(torch.Size([15, 3]), torch.Size([15, 2]))
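`torch.from_numpy` shares memory with the NumPy array, while `torch.tensor` copies the data. The difference can be checked directly; this is a minimal sketch, and the names `arr`, `shared`, and `copied` are illustrative, not part of the notebook:

```python
import numpy as np
import torch

arr = np.array([1.0, 2.0, 3.0], dtype="float32")  # illustrative array

shared = torch.from_numpy(arr)  # shares memory with arr
copied = torch.tensor(arr)      # independent copy of the data

arr[0] = 100.0                  # modify the NumPy array in place

print(shared)  # the first element follows the change to arr
print(copied)  # the first element keeps its original value
```

For small training sets like the one here, either route works; the distinction matters when the array is modified after conversion.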

### 2. Dataset and DataLoader

We'll create a `TensorDataset`, which gives access to rows of the inputs and targets as tuples. A `DataLoader` (discussed shortly) takes a dataset as its input, so to use one with data that started as NumPy arrays, we first wrap the tensors in a `TensorDataset`.

In [5]:
from torch.utils.data import TensorDataset

In [7]:
# wrap our inputs and targets in a dataset
# format: TensorDataset(X, y) where X.shape is (m, n) and y.shape is (m, k)
ds = TensorDataset(inputs, targets)
ds[0] # a tuple of two tensors: x and the corresponding y
# this (x, y) pairing is the format the DataLoader expects

(tensor([73., 67., 43.]), tensor([56., 70.]))

In [10]:
# A dataset can be iterated directly with a for loop; each item is one (x, y) sample
for i in ds:
    print(i)
    break

(tensor([73., 67., 43.]), tensor([56., 70.]))


### DataLoader

PyTorch training is usually done in batches (recall mini-batch gradient descent): each step takes one mini-batch and performs a gradient descent update on it.
The reason is memory. With a large dataset (say ~1M samples), you usually cannot fit everything into GPU RAM at once.

We'll now create a `DataLoader`, which can split the data into batches of a predefined size while training. It also provides other utilities like shuffling and random sampling of the data.

In [12]:
# A DataLoader is an iterable: looping over it yields one batch at a time
# Using a DataLoader is optional, but without it you have to
# select the mini-batches manually (as we did in our LR mini-batch class)
from torch.utils.data import DataLoader

In [13]:
# Define data loader
batch_size = 3 # a hyperparameter you choose
# too small: training runs slowly
# too big: you may get an "out of memory" error
dl = DataLoader(ds, batch_size, shuffle=True)

In [14]:
# dl is an iterable, so we can loop over it
for something in dl:
    print(something)
    break

[tensor([[ 69.,  96.,  70.],
        [ 87., 134.,  58.],
        [ 91.,  88.,  64.]]), tensor([[103., 119.],
        [119., 133.],
        [ 81., 101.]])]


In [16]:
for x, y in dl:
    print(f"X: {x}")
    print(f"Y: {y}")
    break

# dl can be iterated again and again, which is intentional: each full pass over it is one "epoch"
# the number of epochs is how many times we go through the whole dataset

X: tensor([[ 69.,  96.,  70.],
        [ 73.,  67.,  43.],
        [102.,  43.,  37.]])
Y: tensor([[103., 119.],
        [ 56.,  70.],
        [ 22.,  37.]])
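The relationship between dataset size, batch size, and batches per epoch can be sketched as follows. The toy tensors and the names `toy_ds` and `toy_dl` are assumptions made for illustration; a batch size of 4 is chosen so that the last batch comes out smaller than the others:

```python
import torch
from torch.utils.data import TensorDataset, DataLoader

# Toy data with the same shapes as the notebook: 15 samples, 3 features, 2 targets
X = torch.arange(45, dtype=torch.float32).reshape(15, 3)
Y = torch.arange(30, dtype=torch.float32).reshape(15, 2)

toy_ds = TensorDataset(X, Y)
toy_dl = DataLoader(toy_ds, batch_size=4, shuffle=True)

# One full pass over the loader is one epoch
batch_sizes = [xb.shape[0] for xb, yb in toy_dl]

print(len(toy_dl))   # 4 batches per epoch, i.e. ceil(15 / 4)
print(batch_sizes)   # [4, 4, 4, 3]: the last batch holds the remaining samples
```

With `shuffle=True` the samples inside each batch change from epoch to epoch, but every sample still appears exactly once per epoch.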


## 3. Modeling

### 3.1 Define our neural network

Instead of initializing the weights & biases manually, we can define the model using the `nn.Linear` class from PyTorch, which does it automatically.

In [18]:
import torch.nn as nn   # "nn" stands for neural network; this module provides many kinds of layers

# Define model
# format: nn.Linear(in_features, out_features)
# here:   nn.Linear(temp/rainfall/humidity, apples/oranges)
model = nn.Linear(3, 2)  # 3 input features, 2 output targets
print(model.weight)  # initialized from U(-1/sqrt(in_features), 1/sqrt(in_features))
print(model.weight.size()) # (out_features, in_features)
print(model.bias)
print(model.bias.size()) # (out_features,)

Parameter containing:
tensor([[-0.3479, -0.0461,  0.2091],
        [ 0.2365,  0.2009,  0.4819]], requires_grad=True)
torch.Size([2, 3])
Parameter containing:
tensor([ 0.0682, -0.2729], requires_grad=True)
torch.Size([2])
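The range of the initial values can be checked with a short sketch. It assumes the default initialization of `nn.Linear` in current PyTorch releases, which draws weights and biases uniformly from within ±1/sqrt(in_features); the seed and the name `layer` are illustrative:

```python
import math
import torch
import torch.nn as nn

torch.manual_seed(0)      # illustrative seed, for reproducibility only
layer = nn.Linear(3, 2)

bound = 1 / math.sqrt(layer.in_features)  # 1/sqrt(3), roughly 0.577
tol = 1e-6                                # small float32 rounding tolerance

weights_in_range = layer.weight.abs().max().item() <= bound + tol
bias_in_range = layer.bias.abs().max().item() <= bound + tol
print(weights_in_range, bias_in_range)
```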


In [28]:
# Parameters
list(model.parameters())  # model.parameters() returns a generator over the weight and bias

[Parameter containing:
 tensor([[-0.4408,  0.8783,  0.7804],
         [-0.2983,  0.7925,  0.9063]], requires_grad=True),
 Parameter containing:
 tensor([ 0.0636, -0.2776], requires_grad=True)]

In [30]:
# we can gauge the model's complexity by its number of trainable parameters
# p.numel() returns the number of elements in the tensor p
print(sum(p.numel() for p in model.parameters() if p.requires_grad))
# 6 weights and 2 biases

8


In [32]:
# Generate predictions, perform a forward pass
# format: model(inputs)
preds = model(inputs)
preds

tensor([[ 60.2941,  70.0181],
        [ 87.1939, 100.3242],
        [124.6786, 132.5363],
        [ 21.7492,  36.9094],
        [108.5999, 118.6643],
        [ 60.2941,  70.0181],
        [ 87.1939, 100.3242],
        [124.6786, 132.5363],
        [ 21.7492,  36.9094],
        [108.5999, 118.6643],
        [ 60.2941,  70.0181],
        [ 87.1939, 100.3242],
        [124.6786, 132.5363],
        [ 21.7492,  36.9094],
        [108.5999, 118.6643]], grad_fn=<AddmmBackward0>)
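The forward pass of `nn.Linear` is the affine map `x @ W.T + b`. A minimal sketch that reproduces it by hand, using a random toy input rather than the notebook's data (the names `layer` and `x` are illustrative):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)          # illustrative seed
layer = nn.Linear(3, 2)
x = torch.randn(5, 3)         # 5 samples, 3 features

out_layer = layer(x)                          # forward pass through the layer
out_manual = x @ layer.weight.T + layer.bias  # same computation written out

print(out_layer.shape)                                    # torch.Size([5, 2])
print(torch.allclose(out_layer, out_manual, atol=1e-6))   # the two results agree
```

The weight is stored with shape `(out_features, in_features)`, which is why it is transposed in the manual version.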

### 3.2 Define the loss function

In [22]:
criterion_mse = nn.MSELoss()
mse = criterion_mse(preds, targets)
print(mse)
print(mse.item())  # .item() extracts the loss as a plain Python float

tensor(6051.5464, grad_fn=<MseLossBackward0>)
6051.54638671875
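With its default settings, `nn.MSELoss` averages the squared differences over every element of the prediction tensor. A small sketch with made-up numbers (the tensors `pred` and `target` are illustrative) shows that the hand-written formula gives the same value:

```python
import torch
import torch.nn as nn

pred = torch.tensor([[1.0, 2.0], [3.0, 4.0]])
target = torch.tensor([[1.5, 2.0], [2.0, 6.0]])

loss_builtin = nn.MSELoss()(pred, target)
loss_manual = ((pred - target) ** 2).mean()  # mean over all 4 elements

# squared errors: 0.25, 0, 1, 4 -> sum 5.25 -> mean 1.3125
print(loss_builtin.item())  # 1.3125
print(loss_manual.item())   # 1.3125
```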


### 3.3 Define the optimizer

* Gradient Descent

We use `optim.SGD` to perform stochastic gradient descent where samples are selected in batches (often with random shuffling) instead of as a single group. Note that `model.parameters()` is passed as an argument to `optim.SGD`.

In [23]:
# Define optimizer
# momentum also uses past gradients in each update, which helps the optimizer
# move through flat regions and out of shallow local minima
# With momentum 0.9, the update direction is the current gradient plus 0.9 times the gradient
# from one step ago, plus 0.9^2 = 0.81 times the one from two steps ago, etc.
opt = torch.optim.SGD(model.parameters(), lr=0.0001, momentum=0.9) 
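The momentum rule can be written out by hand and compared against `torch.optim.SGD`. This is a sketch under the assumption of the optimizer's default settings (no dampening, no Nesterov momentum, no weight decay); the toy loss and the names `w_ref`, `w`, and `velocity` are illustrative:

```python
import torch

lr, momentum = 0.1, 0.9

# Parameter updated by torch.optim.SGD
w_ref = torch.tensor([1.0, -2.0], requires_grad=True)
sgd = torch.optim.SGD([w_ref], lr=lr, momentum=momentum)

# The same parameter updated by hand
w = torch.tensor([1.0, -2.0])
velocity = torch.zeros_like(w)

for step in range(3):
    # Optimizer path: loss = sum(w^2)
    sgd.zero_grad()
    loss = (w_ref ** 2).sum()
    loss.backward()
    sgd.step()

    # Manual path: the gradient of sum(w^2) is 2 * w
    grad = 2 * w
    velocity = momentum * velocity + grad  # accumulate past gradients
    w = w - lr * velocity                  # step along the accumulated direction

print(w_ref.detach())
print(w)
```

Both paths end at the same values, which confirms that each past gradient is scaled by one more factor of the momentum coefficient at every step.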

### 3.4 Actually train the model

1. Predict
2. Loss
3. Gradient
4. Update the weights

In [33]:
# Utility function to train the model
def fit(num_epochs, model, loss_fn, opt, train_dl):
    
    # Repeat for given number of epochs
    for epoch in range(num_epochs):
        
        # Train with batches of data
        for xb,yb in train_dl:
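            # xb and yb are one mini-batch of X_train and Y_train (batch size = 3):
            # 3 samples each, with the same number of features/targets as the full data
            
            # Tensor.to() is not in-place: it returns a new tensor, so reassign it.
            # Batches follow the model's device; to train on the GPU, call
            # model.to(device) before creating the optimizer.
            model_device = next(model.parameters()).device
            xb = xb.to(model_device)   # (batch, features) = (3, 3)
            yb = yb.to(model_device)   # (batch, targets)  = (3, 2)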
                    
            # 1. Predict (forward pass)
            pred = model(xb)
                      
            # 2. Calculate loss
            loss = loss_fn(pred, yb)
            
            # 3. Calculate gradient
            # 3.1 clear out the previous gradients
            # format: optimizer.zero_grad()
            opt.zero_grad()  # otherwise the gradients accumulate across batches
            
            # 3.2 call backward() on the loss to compute all the gradients (backpropagation)
            loss.backward()
            # backward() does NOT adjust the weights yet; it only backpropagates,
            # computing the gradient of the loss with respect to every parameter
            # (which is why it is called on the loss)
            
            # Print out the gradients.
            #print ('dL/dw: ', model.weight.grad) 
            #print ('dL/db: ', model.bias.grad)
            
            # 4. Update parameters using gradients
            opt.step()
            
        # Print the progress (the loss shown is that of the last mini-batch in the epoch)
        if (epoch+1) % 10 == 0:
            sys.stdout.write("\rEpoch [{}/{}], Loss: {:.4f}".format(epoch+1, num_epochs, loss.item()))
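The reason for clearing gradients before each backward pass can be seen on a single scalar parameter. This is a minimal sketch; the variable `w` is illustrative, and `w.grad.zero_()` stands in for what `opt.zero_grad()` achieves for every parameter:

```python
import torch

w = torch.tensor(3.0, requires_grad=True)

(w ** 2).backward()          # d(w^2)/dw = 2w = 6
first = w.grad.item()

(w ** 2).backward()          # without clearing, the new gradient is added to the old one
accumulated = w.grad.item()

w.grad.zero_()               # reset the stored gradient
(w ** 2).backward()
after_reset = w.grad.item()

print(first, accumulated, after_reset)  # 6.0 12.0 6.0
```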

In [26]:
#train for 100 epochs
fit(100, model, criterion_mse, opt, dl)

Epoch [100/100], Loss: 0.1913

In [27]:
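# Generate predictions on the full training set and compute the overall loss
# (the loss printed during training is for the last mini-batch only)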
preds = model(inputs)
loss = criterion_mse(preds, targets)
print(loss.item())

12.126470565795898
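When the model is only being evaluated, the forward pass can be wrapped in `torch.no_grad()` so that no computation graph is built. The sketch below uses a toy model and random data (all names are illustrative) and also shows that the full-dataset MSE equals the average of the per-batch MSEs when all batches have the same size, so a single batch's loss can differ noticeably from the overall figure:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)                 # illustrative seed
toy_model = nn.Linear(3, 2)
x = torch.randn(15, 3)               # 15 samples, 3 features
y = torch.randn(15, 2)               # 15 samples, 2 targets
loss_fn = nn.MSELoss()

with torch.no_grad():                # no gradients are tracked inside this block
    preds = toy_model(x)
    full_loss = loss_fn(preds, y)

    # Loss of each mini-batch of 3 samples (5 equal-sized batches)
    batch_losses = torch.stack([
        loss_fn(toy_model(xb), yb)
        for xb, yb in zip(x.split(3), y.split(3))
    ])

print(preds.requires_grad)           # False
print(full_loss.item(), batch_losses.mean().item())
print(batch_losses)                  # individual batches vary around the mean
```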
