# **E1 - Neural Networks with PyTorch**

## **Tensors**

PyTorch is an open-source deep learning framework originally developed by Facebook's AI research lab (now Meta AI). It is primarily used for building and training neural networks and is known for its ease of use, flexibility, and dynamic computational graph.

PyTorch is a complete library: it can train a deep learning model as well as run a model in inference mode, and it supports GPUs for faster training and inference. It is a platform that we cannot ignore.

### Task 0: Make sure you have PyTorch correctly installed

In [None]:
# Import libraries
import torch
import numpy as np
import torch.nn as nn
import torch.optim as optim

In [None]:
print(torch.__version__)

### Task 1: Create Tensors

Tensors are a specialized data structure, very similar to arrays and matrices. In PyTorch, we use tensors to encode the inputs and outputs of a model, as well as the model’s parameters.
Tensors are similar to NumPy’s ndarrays, except that tensors can also run on GPUs or other hardware accelerators.

In [None]:
# Initialize Tensors
# from data:
data = [[5, 1, 7, 9],[4, 12, 19, 0]]
x_data = torch.tensor(data)

# from a numpy array:
np_array = np.array(data)
x_np = torch.from_numpy(np_array)

# from another tensor:
x_ones = torch.ones_like(x_data) # retains the properties of x_data
x_rand = torch.rand_like(x_data, dtype=torch.float) # overrides the datatype of x_data

# with random or constant values:
shape = (4,3,)    # shape is a tuple of tensor dimensions
rand_tensor = torch.rand(shape)
ones_tensor = torch.ones(shape)
zeros_tensor = torch.zeros(shape)
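
One detail of the NumPy route is worth knowing: `torch.from_numpy` shares memory with the source array, whereas `torch.tensor` copies the data. The following is a minimal sketch of that behavior; the variable names `shared` and `copied` are illustrative and not part of the exercise.

```python
import numpy as np
import torch

np_array = np.array([[5, 1, 7, 9], [4, 12, 19, 0]])
shared = torch.from_numpy(np_array)   # shares memory with np_array
copied = torch.tensor(np_array)       # independent copy of the data

np_array[0, 0] = 100                  # modify the NumPy array in place

print(shared[0, 0].item())            # reflects the change made to np_array
print(copied[0, 0].item())            # still holds the original value
```

If an independent tensor is needed, copying with `torch.tensor` (or calling `.clone()` on the shared tensor) avoids surprising side effects.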

**Questions:**
1. Print all tensors to explore different initialization possibilities
2. What are the attributes of a tensor? *(Hint: print the shape and the data type of one of the tensors)*

**Solution**

In [None]:
#Solution Question 1:


In [None]:
#Solution Question 2:


An additional attribute of each tensor is the device on which it is stored.

In [None]:
print(f"Tensor is stored on: {rand_tensor.device}")

In [None]:
# There are two ways to set the device of a tensor.
# You can move an existing tensor to another device with .to("cpu") or .to("cuda")
if torch.cuda.is_available():
    rand_tensor = rand_tensor.to("cuda")
    print(f"Tensor is stored now on: {rand_tensor.device}")

# Or specify the device while creating the tensor
if torch.cuda.is_available():
    gpu_tensor = torch.tensor([3,4], device='cuda')
    print(f"New tensor is stored on: {gpu_tensor.device}")
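
A common way to avoid repeating the `torch.cuda.is_available()` check is to pick a device once and reuse it. The sketch below assumes a single optional CUDA GPU; the names `device`, `t`, and `u` are illustrative.

```python
import torch

# Pick the GPU when one is available, otherwise fall back to the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

t = torch.rand(4, 3, device=device)   # created directly on the chosen device
u = torch.ones(4, 3).to(device)       # created on the CPU, then moved

print(t.device, u.device)
print((t + u).shape)                  # both operands live on the same device
```

Written this way, the same notebook runs unchanged on machines with and without a GPU.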

### Task 2: Operations on Tensors

In [None]:
# Indexing and Slicing
rand_tensor = torch.rand(5,5)
print(rand_tensor)

**Questions:**
1. Print the first row or column of rand_tensor. Print the last row or column. Set every value in the middle column to 0.

In [None]:
# Joining tensors
tensor_a = torch.tensor([[1, 2, 3], [4, 5, 6]])
tensor_b = torch.tensor([[7, 8, 9], [10, 11, 12]])

# You can join tensors with torch.cat
join_cat = torch.cat((tensor_a, tensor_b), dim=0)
# or with torch.stack
join_stack = torch.stack((tensor_a, tensor_b), dim=0)

print(f"Concatenated tensor: \n {join_cat} \n")
print(f"Stacked tensor: \n {join_stack} \n")

2. What is the difference between torch.cat and torch.stack?

In [None]:
# Arithmetic operations

tensor = torch.rand(3,4)
tensor_trans = tensor.T
print(f"Tensor: \n {tensor} \n")
print(f"Transpose tensor: \n {tensor_trans} \n")

# Matrix multiplication between two tensors
product1 = tensor @ tensor_trans
print(f"Product 1: \n {product1} \n")
product2 = tensor.matmul(tensor_trans)
print(f"Product 2: \n {product2} \n")
product3 = torch.rand_like(product1)
torch.matmul(tensor, tensor_trans, out=product3)

# Element-wise product
el_product1 = tensor * tensor
print(f"Element product 1: \n {el_product1} \n")
el_product2 = tensor.mul(tensor)
print(f"Element product 2: \n {el_product2} \n")
el_product3 = torch.rand_like(tensor)
torch.mul(tensor, tensor, out=el_product3)
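
The shape rules behind the two kinds of product can be checked directly. This is a small sketch under the same setup as above (a random 3x4 tensor): matrix multiplication requires matching inner dimensions, while the element-wise product keeps the input shape.

```python
import torch

tensor = torch.rand(3, 4)

# (3, 4) @ (4, 3) -> (3, 3): the inner dimensions must match.
product = tensor @ tensor.T
print(product.shape)

# The operator form and the function form compute the same values.
print(torch.allclose(product, torch.matmul(tensor, tensor.T)))
print(torch.allclose(tensor * tensor, torch.mul(tensor, tensor)))

# The element-wise product keeps the shape of its inputs.
print((tensor * tensor).shape)
```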

**Solutions**

In [None]:
#Solution for Question 1


## **Neural Networks**

### Task 3: Loading Dataset

Download the dataset from Opal and save it to the same folder as the notebook.

The dataset contains information about breast cancer diagnoses. (ref.: https://archive.ics.uci.edu/dataset/17/breast+cancer+wisconsin+diagnostic)

The attributes are: ID, diagnosis, and 30 real-valued input features. The diagnosis is either M = malignant or B = benign.
- ignore column 1 = ID
- set the target to column 2 = diagnosis

In [None]:
dataset = np.loadtxt('wdbc.data', delimiter=',', dtype=str) # change the name to the path of the file, if needed
X = dataset[:, 2:].astype(np.float32)
y = dataset[:, 1]
y = np.where(y == 'M', 1, 0).astype(np.float32) # Convert diagnosis (M/B) to numerical labels (e.g., M -> 1, B -> 0)

X = torch.tensor(X, dtype=torch.float32)
y = torch.tensor(y, dtype=torch.float32).reshape(-1, 1)
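
To see what each loading step does without needing the file, the same steps can be run on a tiny in-memory stand-in. The three rows below are made up for illustration (3 features instead of 30) and are not taken from the real dataset; the names `toy_csv`, `X_toy`, and `y_toy` are hypothetical.

```python
import io
import numpy as np
import torch

# Made-up rows in the same layout: ID, diagnosis, features...
toy_csv = "1,M,1.5,2.0,3.5\n2,B,0.5,1.0,1.5\n3,B,0.1,0.2,0.3"

toy = np.loadtxt(io.StringIO(toy_csv), delimiter=',', dtype=str)
X_toy = toy[:, 2:].astype(np.float32)                      # drop ID and diagnosis
y_toy = np.where(toy[:, 1] == 'M', 1, 0).astype(np.float32)  # M -> 1, B -> 0

X_toy = torch.tensor(X_toy, dtype=torch.float32)
y_toy = torch.tensor(y_toy, dtype=torch.float32).reshape(-1, 1)

print(X_toy.shape, y_toy.shape)   # one row per sample; y is a column vector
print(y_toy.flatten().tolist())   # numerical labels
```

Printing `X.shape` and `y.shape` after loading the real file is a quick sanity check that the ID column was dropped and the target has one column.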

### Task 3 (continued): Build a Multilayer Perceptron Model

The network model is just a few layers of fully-connected perceptrons.
In this particular model, the dataset has 30 input features (predictors) and the output is a single value of 0 or 1. Therefore, the network should have 30 inputs (at the first layer) and 1 output (at the last layer). This is a network with 3 fully-connected layers. Each layer is created in PyTorch using the nn.Linear(x, y) syntax, in which the first argument is the number of inputs to the layer and the second is the number of outputs. Between the layers, a rectified linear (ReLU) activation is used, but at the output, a sigmoid activation is applied so that the output value lies between 0 and 1. This is a typical small network; a deep learning model stacks many such layers.

In [None]:
model = nn.Sequential(
    nn.Linear(30, 12),  # 30 input features, 12 hidden units
    nn.ReLU(),
    nn.Linear(12, 8),   # 12 input to 8 hidden units
    nn.ReLU(), 
    nn.Linear(8, 1),    # 8 input units to 1 output (binary classification)
    nn.Sigmoid()
)
print(model)
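
To get a feel for the size of this network, the trainable parameters can be counted. The sketch below rebuilds the same architecture so that it runs on its own; each `nn.Linear(in, out)` layer holds `in * out` weights plus `out` biases.

```python
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(30, 12), nn.ReLU(),
    nn.Linear(12, 8), nn.ReLU(),
    nn.Linear(8, 1), nn.Sigmoid(),
)

# List every weight matrix and bias vector with its shape.
for name, param in model.named_parameters():
    print(name, tuple(param.shape))

total = sum(p.numel() for p in model.parameters())
print(total)  # (30*12 + 12) + (12*8 + 8) + (8*1 + 1) = 372 + 104 + 9 = 485
```

The activation layers (ReLU, Sigmoid) have no parameters, so only the three linear layers contribute to the count.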

**Questions:**
1. Try to add another layer that outputs 20 values after the first Linear layer above. What should you change?

**Solution**

### Task 4: Train a PyTorch Model

Building a neural network in PyTorch does not specify how the model should be trained for a particular job. In fact, there are many variations in this respect, described by the hyperparameters. In PyTorch, as with deep learning models in general, you need to decide the following before training a model:

- What the dataset is, specifically what the inputs and targets look like
- Which loss function evaluates the goodness of fit of the model to the data
- Which optimization algorithm trains the model, and its parameters, such as the learning rate and the number of training iterations

Since this is a binary classification problem, the loss function should be binary cross-entropy. The target for each sample is 0 or 1, but in reality the model may output any value in between. The closer the output is to the target value, the better (i.e., the lower the loss).
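
As a worked illustration, binary cross-entropy averaged over the samples is -(y*log(p) + (1-y)*log(1-p)), where p is the model output and y the target. The sketch below compares this formula with `nn.BCELoss` on three made-up predictions; the numbers are chosen purely for illustration.

```python
import torch
import torch.nn as nn

y_pred = torch.tensor([[0.9], [0.2], [0.6]])   # model outputs (probabilities)
y_true = torch.tensor([[1.0], [0.0], [0.0]])   # targets

# Binary cross-entropy written out by hand, averaged over the samples.
manual = -(y_true * torch.log(y_pred)
           + (1 - y_true) * torch.log(1 - y_pred)).mean()

# The built-in loss uses mean reduction by default.
builtin = nn.BCELoss()(y_pred, y_true)

print(manual.item(), builtin.item())   # both approximately 0.415
```

The third sample contributes most of the loss: an output of 0.6 for a target of 0 is on the wrong side of 0.5 and is penalized more heavily than the two correct predictions.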

Gradient descent is the standard algorithm for optimizing neural networks. There are many variations of gradient descent, and Adam is one of the most widely used.

Implementing all the above, the following is the code of the training process:


In [None]:
loss_fn = nn.BCELoss() # binary cross-entropy
optimizer = optim.Adam(model.parameters(), lr=0.001)
 
n_epochs = 100
batch_size = 10
for epoch in range(n_epochs):
    for i in range(0, len(X), batch_size):
        Xbatch = X[i:i+batch_size]
        y_pred = model(Xbatch)
        ybatch = y[i:i+batch_size]
        loss = loss_fn(y_pred, ybatch)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    print(f'Finished epoch {epoch}, latest loss {loss.item():.4f}')

The inner for-loop above takes a batch of data and feeds it into the model. It then takes the model’s output and calculates the loss. Based on the loss, the optimizer updates the model parameters by one step so that the model fits the training data better. After a number of update steps, the model should fit the training data closely enough to predict the target with high accuracy.
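
The manual slicing can also be delegated to `torch.utils.data.DataLoader`, which additionally shuffles the samples each epoch. This is a sketch of that alternative, not the method used in this exercise; it runs on synthetic stand-in data (not the real dataset), and the names `X_demo`, `y_demo`, and `model_demo` are hypothetical.

```python
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)
# Synthetic stand-in data (NOT the real dataset): 50 samples, 30 features.
X_demo = torch.rand(50, 30)
y_demo = (X_demo.sum(dim=1, keepdim=True) > 15).float()

model_demo = nn.Sequential(nn.Linear(30, 12), nn.ReLU(),
                           nn.Linear(12, 1), nn.Sigmoid())
loss_fn = nn.BCELoss()
optimizer = optim.Adam(model_demo.parameters(), lr=0.001)

# The loader yields (inputs, targets) batches of 10, reshuffled every epoch.
loader = DataLoader(TensorDataset(X_demo, y_demo), batch_size=10, shuffle=True)

for epoch in range(3):
    epoch_loss = 0.0
    for Xbatch, ybatch in loader:
        y_pred = model_demo(Xbatch)
        loss = loss_fn(y_pred, ybatch)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        epoch_loss += loss.item() * len(Xbatch)   # weight by batch size
    print(f"epoch {epoch}: mean loss {epoch_loss / len(X_demo):.4f}")
```

Reporting the mean loss over the whole epoch, rather than the loss of the last batch, gives a less noisy picture of training progress.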

*Play around with the values of the number of epochs and the batch size*

**Questions:**
1. How does the loss change with different numbers of epochs *(e.g., 10, 50, 100, 200)*?
2. What is the influence of the batch size? What happens if you remove batching and train on the whole dataset at once?

**Solutions**

### Task 5: Test the Model

Some models behave differently in training and in inference, for example models that contain dropout or batch normalization layers.
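
Dropout is a simple layer that shows this difference. The sketch below is an illustration only; the model built in this exercise contains no dropout, so calling `eval()` on it does not change its outputs, but it remains a good habit.

```python
import torch
import torch.nn as nn

layer = nn.Dropout(p=0.5)
x = torch.ones(1, 8)

layer.train()          # training mode: elements are randomly zeroed,
out_train = layer(x)   # the survivors are scaled by 1 / (1 - p) = 2

layer.eval()           # inference mode: dropout passes the input through
out_eval = layer(x)

print(out_train)       # a random mix of 0s and 2s
print(out_eval)        # all ones, identical to x
```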

In [None]:
i = 5
X_sample = X[i:i+1]
model.eval()    # signal to the model that it will be run for inference
with torch.no_grad():    # disable gradient tracking, which is not needed for inference and saves resources
    y_pred = model(X_sample)
print(f"{X_sample[0]} -> {y_pred[0]}")

Evaluating the model: the model outputs a sigmoid value, which lies between 0 and 1. You can interpret this value by rounding it to the closest integer (i.e., a Boolean label). By comparing how often the rounded prediction matches the target, you can assign an accuracy (the fraction of correct predictions) to the model, as follows:

In [None]:
with torch.no_grad():
    y_pred = model(X)
accuracy = (y_pred.round() == y).float().mean()
print(f"Accuracy {accuracy}")
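
To make the computation concrete, here is the same accuracy formula applied to four made-up predictions. The values are chosen for illustration only and do not come from the trained model.

```python
import torch

y_pred = torch.tensor([[0.9], [0.4], [0.6], [0.1]])  # sigmoid outputs
y_true = torch.tensor([[1.0], [0.0], [0.0], [0.0]])  # targets

labels = y_pred.round()                     # 0.9 -> 1, 0.4 -> 0, 0.6 -> 1, 0.1 -> 0
accuracy = (labels == y_true).float().mean()
print(f"Accuracy {accuracy.item():.2f}")    # 3 of 4 correct -> 0.75
```

The comparison yields a Boolean tensor; converting it to float and taking the mean gives the fraction of correct predictions.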

**Questions:**
1. What is your accuracy?
2. Does it change if you change the number of epochs during the training?

### Bonus Task: Build a Handwritten Digit Recogniser

See the bonus notebook from the crash course.