# Assignment 2.

## Student Name/ID: Yuki Tome/m5271046

---

## Create MLP using PyTorch `nn` Modules

The aim of this assignment is to build a network with a structure similar to the one in exercise 2, this time using the PyTorch library.

In [1]:
import torch
import numpy as np
import h5py

%matplotlib inline

Fix the random seed and define the default data type for tensors.

In [2]:
torch.manual_seed(2)  # we set up a seed so that your output matches ours although the initialization is random.
dtype = torch.float

Helper functions for data loading. 

In [3]:
def load_dataset():
    train_dataset = h5py.File('/datasets/train_catvnoncat.h5', "r")
    train_set_x_orig = np.array(train_dataset["train_set_x"][:])  # your train set features
    train_set_y_orig = np.array(train_dataset["train_set_y"][:])  # your train set labels

    test_dataset = h5py.File('/datasets/test_catvnoncat.h5', "r")
    test_set_x_orig = np.array(test_dataset["test_set_x"][:])  # your test set features
    test_set_y_orig = np.array(test_dataset["test_set_y"][:])  # your test set labels

    classes = np.array(test_dataset["list_classes"][:])  # the list of classes

    train_set_y_orig = train_set_y_orig.reshape((1, train_set_y_orig.shape[0]))
    test_set_y_orig = test_set_y_orig.reshape((1, test_set_y_orig.shape[0]))

    return train_set_x_orig, train_set_y_orig, test_set_x_orig, test_set_y_orig, classes

In this assignment, we will use the same image dataset, which has two classes: cat and non-cat. Every image is represented as a numpy array of shape \[num_pixels, num_pixels, 3\] (3 is the number of RGB channels).

In [4]:
# Loading the data (cat/non-cat)
train_set_x_orig, train_set_y_orig, test_set_x_orig, test_set_y_orig, classes = load_dataset()

# Figuring out the dimensions and shapes of the problem
m_train = train_set_x_orig.shape[0]
m_test = test_set_x_orig.shape[0]
num_px = train_set_x_orig.shape[1]

print("Number of training examples: m_train = " + str(m_train))
print("Number of testing examples: m_test = " + str(m_test))
print("Height/Width of each image: num_px = " + str(num_px))
print("Each image is of size: (" + str(num_px) + ", " + str(num_px) + ", 3)")
print("train_set_x shape: " + str(train_set_x_orig.shape))
print("train_set_y shape: " + str(train_set_y_orig.shape))
print("test_set_x shape: " + str(test_set_x_orig.shape))
print("test_set_y shape: " + str(test_set_y_orig.shape))

Number of training examples: m_train = 209
Number of testing examples: m_test = 50
Height/Width of each image: num_px = 64
Each image is of size: (64, 64, 3)
train_set_x shape: (209, 64, 64, 3)
train_set_y shape: (1, 209)
test_set_x shape: (50, 64, 64, 3)
test_set_y shape: (1, 50)


Now you need to reshape these images into a numpy array of shape (num_samples, num_pixels $*$ num_pixels $*$ 3) and the labels into a numpy array of shape (num_samples, num_labels).

In [5]:
# Reshape the training and test examples
train_set_x_flatten = train_set_x_orig.reshape(train_set_x_orig.shape[0], -1)
test_set_x_flatten = test_set_x_orig.reshape(test_set_x_orig.shape[0], -1) 

# Reshape labels dataset
train_set_y = train_set_y_orig.T
test_set_y = test_set_y_orig.T

# "Standardize" the data: scale pixel values from [0, 255] to [0, 1]
train_set_x = train_set_x_flatten/255.
test_set_x = test_set_x_flatten/255.

Then you can create PyTorch Tensors and use them to train and test the PyTorch model (same as in part 2).

**Hint**
Here we use the previously defined `dtype`, and we set the `requires_grad` parameter to False since these tensors only hold data.

In [6]:
X_train = torch.tensor(train_set_x, dtype=dtype, requires_grad=False)
Y_train = torch.tensor(train_set_y, dtype=dtype, requires_grad=False)
X_test = torch.tensor(test_set_x, dtype=dtype, requires_grad=False)
Y_test = torch.tensor(test_set_y, dtype=dtype, requires_grad=False)
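
The conversion above can be checked on a small array. The snippet below is a minimal sketch, assuming hypothetical stand-in data (`features`, `labels`) rather than the cat dataset; it shows that `torch.tensor` converts a float64 numpy array to the requested `dtype` and that gradient tracking stays off.

```python
import numpy as np
import torch

# Hypothetical stand-in for the flattened, scaled images: 4 samples, 6 features.
features = np.arange(24, dtype=np.float64).reshape(4, 6) / 24.0
# Hypothetical integer labels of shape (num_samples, 1).
labels = np.array([[1], [0], [0], [1]])

X = torch.tensor(features, dtype=torch.float, requires_grad=False)
Y = torch.tensor(labels, dtype=torch.float, requires_grad=False)

print(X.shape, X.dtype, X.requires_grad)
print(Y.shape, Y.dtype, Y.requires_grad)
```

Casting the labels to float matters later, because the binary cross-entropy loss expects float targets.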

In this assignment you will follow a methodology similar to the one in parts 1 and 2.

**The general methodology to build a Neural Network using PyTorch is to:**
1. Define the neural network structure ( # of input units,  # of hidden units, etc). 
    - Obtain input and output layer sizes
    - **Problem 1:** Define type of layers and activations you want to use in your model, which will be used for
        forward propagation.
2. Initialize the model's parameters
3. **Problem 2:** Define loss function
4. Training loop:
    - Implement backward propagation to get the gradients
    - Update parameters (gradient descent)

Finally, implement `train_model()` by calling the previous functions in the right order. This will be your **Problem 3**.

### Step 1.1: Defining sizes of input and output layers ###

First, we define sizes of input and output layers.

In [7]:
def layer_sizes(X, Y):
    """
    Arguments:
    X -- input tensor of shape (number of examples, input size)
    Y -- labels tensor of shape (number of examples, output size)

    Returns:
    n_x -- the size of the input layer
    n_y -- the size of the output layer
    """
    n_x = X.size(1)
    n_y = Y.size(1)

    return n_x, n_y
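
As a quick check of the sizes this function returns, here is a minimal sketch that repeats `layer_sizes` so it runs on its own and feeds it zero tensors with the same shapes as the flattened cat dataset (209 examples of 64 * 64 * 3 values). The zero tensors are placeholders, not real data.

```python
import torch


def layer_sizes(X, Y):
    # Examples are stored along dimension 0, so dimension 1 holds the layer sizes.
    n_x = X.size(1)
    n_y = Y.size(1)
    return n_x, n_y


# Placeholder tensors with the dataset's shapes.
X_demo = torch.zeros(209, 64 * 64 * 3)
Y_demo = torch.zeros(209, 1)

n_x, n_y = layer_sizes(X_demo, Y_demo)
print(n_x, n_y)  # 12288 1
```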

---

## Problem 1 ##
### Step 1.2: Defining structure of the neural network ###

Implement the function `create_model`. It should have the same functionality as the `forward_propagation` function in parts 1 and 2. Your model should consist of:
    - one hidden layer with tanh activation function,
    - one output layer with sigmoid activation function

**Tips**

You can define your model as a sequence of Modules (layers, activations, etc.) using the [Sequential module](https://pytorch.org/docs/master/nn.html#torch.nn.Sequential), which contains other Modules and applies them in sequence to produce its output.

In parts 1 and 2 of this exercise, we used Dense or [Linear](https://pytorch.org/docs/master/nn.html#linear-layers) layers as the hidden and output layers. You can check in the [documentation](https://pytorch.org/docs/master/nn.html#non-linear-activations-weighted-sum-nonlinearity) which activation functions are available in the PyTorch library.

In [8]:
def create_model(n_x, n_h, n_y):
    """
    Arguments:
    n_x -- the size of the input layer
    n_h -- the size of the hidden layer
    n_y -- the size of the output layer
    Returns:
    model -- model consisting of one hidden layer and one output layer
    """

    # Your code here !!!
    model = torch.nn.Sequential(
        torch.nn.Linear(n_x, n_h),
        torch.nn.Tanh(),
        torch.nn.Linear(n_h, n_y),
        torch.nn.Sigmoid()
    )

    return model

In [9]:
# Check implementation
model_tmp = create_model(10, 5, 1)
print('Model structure is:')
print(model_tmp)
# Expected output: Sequential( (0):..., (1):..., (2):..., (3):...)

Model structure is:
Sequential(
  (0): Linear(in_features=10, out_features=5, bias=True)
  (1): Tanh()
  (2): Linear(in_features=5, out_features=1, bias=True)
  (3): Sigmoid()
)
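
To see what this structure does with data, the following sketch builds the same 10-5-1 model directly, runs a forward pass on a hypothetical random batch, and counts the learnable parameters. The batch size of 4 is an arbitrary choice for illustration.

```python
import torch

model = torch.nn.Sequential(
    torch.nn.Linear(10, 5),
    torch.nn.Tanh(),
    torch.nn.Linear(5, 1),
    torch.nn.Sigmoid(),
)

# Hypothetical batch: 4 examples with 10 features each.
batch = torch.randn(4, 10)
out = model(batch)

# Linear(10, 5) has 10*5 weights + 5 biases; Linear(5, 1) has 5*1 weights + 1 bias.
n_params = sum(p.numel() for p in model.parameters())

print(out.shape)   # one sigmoid activation per example
print(n_params)    # 61
```

Because the last module is a sigmoid, every output lies between 0 and 1 and can be read as the predicted probability of the positive class.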


---

### Step 2: Initialize parameters of the model ###
When you create a layer using a PyTorch Module, weights and biases are created and initialized automatically. However, the default initialization algorithm might not be optimal for your task.
1. Initialize weight matrices using the orthogonal initialization scheme from [torch.nn.init](https://pytorch.org/docs/master/nn.html#torch-nn-init).
2. Initialize biases with zeros.
    
**Hint:** You can use the `model.named_parameters()` function to access your model parameters and their respective names, so that you can distinguish weights from biases.

In [10]:
def initialize_parameters(model):
    """
    Initializes weights with an orthogonal scheme and biases with zeros, in place.

    Argument:
    model -- PyTorch model whose parameters are initialized
    """
    for name, param in model.named_parameters():
        if 'weight' in name:
            torch.nn.init.orthogonal_(param)
        elif 'bias' in name:
            torch.nn.init.constant_(param, 0)


In [11]:
initialize_parameters(model_tmp)

for name, param in model_tmp.named_parameters():
    print(name, param.max().item())

0.weight 0.768763542175293
0.bias 0.0
2.weight 0.6325703859329224
2.bias 0.0
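
The maximum values printed above do not show what "orthogonal" means here. As a minimal sketch, assuming a single `Linear(10, 5)` layer like the first layer of `model_tmp`, the check below verifies that the rows of the 5 x 10 weight matrix are orthonormal after initialization, so that $W W^T$ is close to the identity.

```python
import torch

layer = torch.nn.Linear(10, 5)
torch.nn.init.orthogonal_(layer.weight)
torch.nn.init.constant_(layer.bias, 0)

W = layer.weight.detach()   # shape (5, 10): fewer rows than columns
gram = W @ W.T              # (5, 5) matrix of row inner products

# With fewer rows than columns the rows are orthonormal, so gram is close to I.
print(torch.allclose(gram, torch.eye(5), atol=1e-4))
print(layer.bias.abs().max().item())
```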


___

## Problem 2 ##

### Step 3: Define loss function ###

Here, you need to declare the loss function that will be used later in `train_model()`. As in parts 1 and 2, use binary cross-entropy, but define it with a built-in PyTorch loss function.

**Tips** Check the documentation to see which [loss functions](https://pytorch.org/docs/master/nn.html#loss-functions) are available.

In [12]:
def create_cost_function():
    """
    Returns:
    loss - binary cross-entropy loss
    """
    # Your code here !!!
    return torch.nn.BCELoss()
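
To connect the built-in loss with the formula from parts 1 and 2, the sketch below compares `torch.nn.BCELoss` (which averages over elements by default) against the binary cross-entropy written out by hand. The probabilities and targets are made-up values for illustration.

```python
import torch

# Made-up sigmoid outputs and their true labels, shape (3, 1).
probs = torch.tensor([[0.9], [0.2], [0.6]])
targets = torch.tensor([[1.0], [0.0], [1.0]])

builtin = torch.nn.BCELoss()(probs, targets)

# J = -mean( y * log(a) + (1 - y) * log(1 - a) )
manual = -(targets * torch.log(probs)
           + (1 - targets) * torch.log(1 - probs)).mean()

print(builtin.item(), manual.item())
print(torch.allclose(builtin, manual))
```

Note that `BCELoss` expects probabilities, which is why the model ends with a sigmoid.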

---

### Step 4.1 ###

As in part 2, we will use automatic differentiation to compute the backward pass of the network, so you will invoke it inside your training loop.

The `autograd` package in PyTorch provides exactly this functionality. When using `autograd`, the forward pass of your network will define a computational graph. Backpropagating through this graph then allows you to easily compute gradients. See documentation for more details: https://pytorch.org/docs/master/autograd.html
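
A small illustration of this mechanism, independent of the assignment's model: for the made-up function $J(w) = \sum_i w_i^2$ the gradient is $2w$, and `backward()` stores exactly that in `w.grad`.

```python
import torch

# A toy "parameter" vector that autograd should track.
w = torch.tensor([1.0, -2.0, 3.0], requires_grad=True)

# The forward pass builds the computational graph.
loss = (w ** 2).sum()

# The backward pass fills w.grad with dJ/dw.
loss.backward()

print(w.grad)
print(torch.equal(w.grad, 2 * w.detach()))
```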


### Step 4.2 ###

Implement the manual update rule using gradient descent. 

**General gradient descent rule**: 

$ \theta = \theta - \alpha \frac{\partial J }{ \partial \theta }$ where $\alpha$ is the learning rate and $\theta$ represents a parameter.

**Hint**

  You can access gradients using `parameter.grad`.


In [13]:
def update_parameters(model, learning_rate):
    """
    Updates parameters using the gradient descent update rule given above

    Arguments:
    model -- PyTorch model whose parameters are updated in place
    learning_rate -- step size used in the gradient descent update
    """
    # Since we use manual update of parameters, we need to wrap in torch.no_grad()
    # because all parameters have requires_grad=True, but we don't need to track this.

    with torch.no_grad():
        for param in model.parameters():
            # Your code here !!!
            param -= learning_rate*param.grad
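
The update rule can be verified on a toy problem. The sketch below is an illustration under assumptions, not part of the assignment: it uses a single hypothetical `Linear(3, 1)` layer, random data and a mean squared error loss, applies one manual gradient descent step, and checks that each parameter equals its old value minus `learning_rate` times its gradient.

```python
import torch

torch.manual_seed(0)
model = torch.nn.Linear(3, 1)          # hypothetical toy model
x = torch.randn(8, 3)                  # hypothetical inputs
y = torch.randn(8, 1)                  # hypothetical targets
learning_rate = 0.01

loss_before = torch.nn.functional.mse_loss(model(x), y)
loss_before.backward()

# Keep copies so the update can be checked afterwards.
old_params = [p.detach().clone() for p in model.parameters()]
grads = [p.grad.clone() for p in model.parameters()]

# Manual update: theta = theta - alpha * dJ/dtheta, without tracking it in autograd.
with torch.no_grad():
    for p in model.parameters():
        p -= learning_rate * p.grad

step_matches = all(
    torch.allclose(p.detach(), p0 - learning_rate * g)
    for p, p0, g in zip(model.parameters(), old_params, grads)
)
loss_after = torch.nn.functional.mse_loss(model(x), y)

print(step_matches)
print(loss_before.item(), loss_after.item())
```

Without `torch.no_grad()`, the in-place subtraction on a tensor that requires gradients would raise an error.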


### Predictions

Use the model created by `create_model()` to make predictions by building a `predict()` function.

As you can recall from step 1.2, your model is a sequential Module. Module objects override the `__call__` operator so you can call them like functions. When doing so you pass a Tensor of input data to the Module and it produces a Tensor of output data.

**Hint**: predictions = $y_{prediction} = \mathbb{1}\{activation > 0.5\} = \begin{cases}
      1 & \text{if}\ activation > 0.5 \\
      0 & \text{otherwise}
    \end{cases}$  
    
As an example, if you would like to set the entries of a matrix X to 0 and 1 based on a threshold you would do: ```X_new = (X > threshold)```. Remember that in current PyTorch releases the comparison returns a boolean tensor (`torch.bool`), so you need to cast it back to a float tensor, for example with `.float()`.

In [14]:
def predict(model, X):
    """
    Using network output, predicts a class for each example in X

    Arguments:
    model -- trained model
    X -- input data of shape (m, n_x)

    Returns
    predictions -- vector of predictions of our model (non-cat: 0 / cat: 1)
    """

    # Computes probabilities using forward propagation, and classifies to 0/1 using 0.5 as the threshold.
    Y_predicted = model(X)
    predictions = (Y_predicted > 0.5).float()

    return predictions
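
The thresholding step in isolation, as a minimal sketch on made-up activations: the comparison yields a boolean mask, and `.float()` turns it into 0/1 values that can be compared with the float labels.

```python
import torch

# Made-up sigmoid activations for three examples.
activations = torch.tensor([[0.1], [0.5], [0.7]])

mask = activations > 0.5          # boolean tensor
predictions = mask.float()        # 0.0 / 1.0 values

print(mask.dtype)
print(predictions.squeeze(1).tolist())  # [0.0, 0.0, 1.0]
```

An activation of exactly 0.5 maps to class 0, because the comparison is strict.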

***

## Problem 3 ##

### Integrate all steps to train your model ###

Implement the full training process of your neural network model in the `train_model()` function.

**Instructions**: The neural network model has to use previous functions in the right order.

**Hint:** Check out [Module functions](https://pytorch.org/docs/master/nn.html#torch.nn.Module) to find one for zeroing gradients.

In [15]:
def train_model(X_train, Y_train, X_test, Y_test, n_h, num_iterations=10000,
                learning_rate=0.5, print_cost=False):
    """
    Arguments:
    X_train -- training set, a tensor of shape (m_train, num_px * num_px * 3)
    Y_train -- training labels, a tensor of shape (m_train, 1)
    X_test -- test set, a tensor of shape (m_test, num_px * num_px * 3)
    Y_test -- test labels, a tensor of shape (m_test, 1)
    n_h -- size of the hidden layer
    num_iterations -- number of iterations in gradient descent loop
    learning_rate -- hyperparameter representing the learning rate used in the update rule of update_parameters()
    print_cost -- if True, print the cost every 100 iterations

    Returns:
    d -- dictionary containing information about the model.
    """
    n_x, n_y = layer_sizes(X_train, Y_train)

    # Replace None with your own code !!!

    # Create model
    model = create_model(n_x, n_h, n_y)

    # Initialize parameters (in place; the function returns nothing)
    initialize_parameters(model)

    # Cost function
    cost_fn = create_cost_function()

    # Loop (gradient descent)
    for i in range(0, num_iterations):

        # Forward propagation: compute model outputs by passing input data to the model.
        A2 = model(X_train)

        # Cost function. Inputs: predicted and true values. Outputs: "cost".
        cost = cost_fn(A2,Y_train)

        # Print the cost every 100 iterations
        if print_cost and i % 100 == 0:
            print("Cost after iteration %i: %f" % (i, cost.item()))

        # Zero the gradients before running the backward pass. See hint in problem description
        model.zero_grad()

        # Backpropagation. Compute gradient of the cost function with respect to all the 
        # learnable parameters of the model. Use autograd to compute the backward pass.
        # Same as in part 2 of Exercise 2.
        cost.backward()

        # Gradient descent parameter update (in place; the function returns nothing).
        update_parameters(model, learning_rate)

    d = {"model": model,
         "learning_rate": learning_rate,
         "num_iterations": num_iterations}

    return d

In [16]:
d = train_model(X_train, Y_train, X_test, Y_test, 15,
                num_iterations=2000, learning_rate=0.05, print_cost=True)

# Predict test/train set examples
Y_prediction_test = predict(d["model"], X_test)
Y_prediction_train = predict(d["model"], X_train)
# Print train/test Errors
print("train accuracy: {} %".format(100 - torch.mean(torch.abs(Y_prediction_train - Y_train)) * 100))
print("test accuracy: {} %".format(100 - torch.mean(torch.abs(Y_prediction_test - Y_test)) * 100))

Cost after iteration 0: 0.775971
Cost after iteration 100: 0.523547
Cost after iteration 200: 0.496528
Cost after iteration 300: 0.455826
Cost after iteration 400: 0.493387
Cost after iteration 500: 0.252120
Cost after iteration 600: 0.176568
Cost after iteration 700: 0.198569
Cost after iteration 800: 0.075340
Cost after iteration 900: 0.048445
Cost after iteration 1000: 0.032887
Cost after iteration 1100: 0.025090
Cost after iteration 1200: 0.020666
Cost after iteration 1300: 0.017462
Cost after iteration 1400: 0.015119
Cost after iteration 1500: 0.012856
Cost after iteration 1600: 0.011347
Cost after iteration 1700: 0.010207
Cost after iteration 1800: 0.009280
Cost after iteration 1900: 0.008502
train accuracy: 100.0 %
test accuracy: 68.0 %
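
The accuracy expression used above, `100 - mean(|prediction - label|) * 100`, works because both tensors contain only 0 and 1, so the absolute difference is 1 exactly where a prediction is wrong. The sketch below shows on made-up values that it agrees with the more direct "fraction of matching entries".

```python
import torch

# Made-up predictions and labels: the third example is misclassified.
predictions = torch.tensor([[1.0], [0.0], [1.0], [1.0]])
labels = torch.tensor([[1.0], [0.0], [0.0], [1.0]])

acc_from_abs = 100 - torch.mean(torch.abs(predictions - labels)) * 100
acc_from_eq = (predictions == labels).float().mean() * 100

print(acc_from_abs.item(), acc_from_eq.item())  # 75.0 75.0
```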


___

# Optional #

1. Implement automatic update of model parameters using the [torch.optim](https://pytorch.org/docs/master/optim.html) package.
2. Add layers to your model and change activation functions.
3. Add regularization.
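
One possible starting point for items 1 and 3 is sketched below. It is an illustration under assumptions, not the expected solution: the toy data, layer sizes, learning rate and `weight_decay` value are all made up. `torch.optim.SGD` replaces the manual update, and its `weight_decay` argument adds an L2 penalty on the parameters.

```python
import torch

torch.manual_seed(0)

# Made-up toy data: the label is 1 when the first feature is positive.
X = torch.randn(32, 4)
Y = (X[:, :1] > 0).float()

model = torch.nn.Sequential(
    torch.nn.Linear(4, 8),
    torch.nn.Tanh(),
    torch.nn.Linear(8, 1),
    torch.nn.Sigmoid(),
)
loss_fn = torch.nn.BCELoss()

# SGD performs theta = theta - lr * grad; weight_decay adds L2 regularization.
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, weight_decay=1e-4)

initial_loss = loss_fn(model(X), Y).item()
for _ in range(200):
    optimizer.zero_grad()          # clear gradients from the previous step
    loss = loss_fn(model(X), Y)    # forward pass
    loss.backward()                # backward pass
    optimizer.step()               # parameter update
final_loss = loss_fn(model(X), Y).item()

print(initial_loss, final_loss)
```

Other optimizers from `torch.optim` can be swapped in by changing only the line that constructs `optimizer`.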