# Homework 5 (Full mark: 100pt)

You can use ``Google Colab`` if you would like to use GPU.
- https://colab.research.google.com/notebooks/welcome.ipynb
- https://theorydb.github.io/dev/2019/08/23/dev-ml-colab/

# 1. Regression (50pt)

**For this question, using PyTorch, implement the 1) ridge regression and 2) Lasso. You can refer to the tutorial link below for how to implement linear regression. Note that the ridge regression is the linear regression with L2 penalty, and the Lasso is the linear regression with L1 penalty. You should use ````Boston```` dataset as shown in the code below. You should not only write the code for the models, but also train them and show the test MSE.**
- https://github.com/yunjey/pytorch-tutorial/blob/master/tutorials/01-basics/linear_regression/main.py

<div>
<img src="figures/regressions.png" width="700"/>
</div>

In [1]:
import pandas as pd
from sklearn.model_selection import train_test_split
import torch
import torch.nn as nn
import numpy as np
# To fix the random seed
torch.manual_seed(0)
torch.cuda.manual_seed_all(0)
torch.backends.cudnn.deterministic = True
torch.backends.cudnn.benchmark = False

# load data
boston = pd.read_csv('data/Boston.csv').drop('Unnamed: 0', axis=1)
data = torch.FloatTensor(boston.values)
X = data[:,:-1] # Input (X)
y = data[:,-1].reshape(-1, 1) # Ground Truth (y)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20, random_state=42)

In [2]:
num_epochs = 100
learning_rate = 0.00003
l2_lambda = 1

class Ridge(nn.Module):
    def __init__(self, input_dim):
        super().__init__()
        ## Write your answer here
        self.model = nn.Linear(input_dim, input_dim)
    def forward(self, X):
        ## Write your answer here
        return self.model(X)

# Linear regression model
model = Ridge(X_train.shape[1])

# Loss and optimizer
criterion = nn.MSELoss() 
optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate)  

# Train the model
for epoch in range(num_epochs):
    l2_norm = sum(p.pow(2.0).sum() for p in model.parameters())
    inputs = X_train
    targets = y_train

    # Forward pass
    outputs = model(inputs)
    loss = criterion(outputs, targets) + l2_lambda*l2_norm
    
    # Backward and optimize
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    
    if (epoch+1) % 5 == 0:
        print ('Epoch [{}/{}], Loss: {:.4f}'.format(epoch+1, num_epochs, loss.item()))
        
print("Test loss: ", float(criterion(model(X_test), y_test)))
        

Epoch [5/100], Loss: 408.6328
Epoch [10/100], Loss: 205.7512
Epoch [15/100], Loss: 141.6315
Epoch [20/100], Loss: 120.5482
Epoch [25/100], Loss: 113.3427
Epoch [30/100], Loss: 110.6189
Epoch [35/100], Loss: 109.3489
Epoch [40/100], Loss: 108.5565
Epoch [45/100], Loss: 107.9269
Epoch [50/100], Loss: 107.3585
Epoch [55/100], Loss: 106.8181
Epoch [60/100], Loss: 106.2949
Epoch [65/100], Loss: 105.7851
Epoch [70/100], Loss: 105.2873
Epoch [75/100], Loss: 104.8007
Epoch [80/100], Loss: 104.3250
Epoch [85/100], Loss: 103.8597
Epoch [90/100], Loss: 103.4045
Epoch [95/100], Loss: 102.9593
Epoch [100/100], Loss: 102.5235
Test loss:  87.6945571899414


  return F.mse_loss(input, target, reduction=self.reduction)
  return F.mse_loss(input, target, reduction=self.reduction)


In [3]:
num_epochs = 100
learning_rate = 0.000003
l1_lambda = 1
        
class Lasso(nn.Module):
    def __init__(self, input_dim):
        super().__init__()
        ## Write your answer here
        self.model = nn.Linear(input_dim, 1)
        
    def forward(self, X):
        ## Write your answer here
        return self.model(X)

model = Lasso(X_train.shape[1])
# Loss and optimizer
criterion = nn.MSELoss() 
optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate)  

# Train the model
for epoch in range(num_epochs):
    l1_norm = sum(p.abs().sum() for p in model.parameters())
    inputs = X_train
    targets = y_train

    # Forward pass
    outputs = model(inputs)
    loss = criterion(outputs, targets) + l1_lambda*l1_norm
    
    # Backward and optimize
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    
    if (epoch+1) % 5 == 0:
        print ('Epoch [{}/{}], Loss: {:.4f}'.format(epoch+1, num_epochs, loss.item()))
        
print("Test loss: ", float(criterion(model(X_test), y_test)))

Epoch [5/100], Loss: 950.6422
Epoch [10/100], Loss: 305.8843
Epoch [15/100], Loss: 169.9936
Epoch [20/100], Loss: 139.9421
Epoch [25/100], Loss: 132.0015
Epoch [30/100], Loss: 128.7459
Epoch [35/100], Loss: 126.5395
Epoch [40/100], Loss: 124.6196
Epoch [45/100], Loss: 122.8235
Epoch [50/100], Loss: 121.1141
Epoch [55/100], Loss: 119.4806
Epoch [60/100], Loss: 117.9181
Epoch [65/100], Loss: 116.4228
Epoch [70/100], Loss: 114.9914
Epoch [75/100], Loss: 113.6210
Epoch [80/100], Loss: 112.3086
Epoch [85/100], Loss: 111.0515
Epoch [90/100], Loss: 109.8470
Epoch [95/100], Loss: 108.6927
Epoch [100/100], Loss: 107.5862
Test loss:  76.2603530883789


# 2. Autoencoder (50pt)
**Autoencoder is an unsupervised neural network model for learning representations of the input. In the figure below, you can see the structure of an autoencoder network. Given the original input image, we first encode the image using ``Encoder`` to a compressed representation, and reconstruct the image by using ``Decoder`` given the compressed representation. The compressed represesntation can be used as the dimension reduced representation of the original input. In this regard, autoencoder is also known as a model for dimensionality reduction (Recall PCA was also a method for dimensionality reduction). Note that the encoder and the decoder can be any neural network model such as MLP, CNN, MLP, etc.**

**For this question, you will use MNIST dataset to implement two versions of autoencoders: 1) An MLP-based autoencoder, and 2) A CNN-based autoencoder. The figure below shows an example of the MLP-based autoencoder. For the MLP-based autoencoder, you should follow the structure of the autoencoder shown in the figure below. For the CNN-based autoencoder, you are free to choose the architecture. You only need to implement ``Autoencoder_MLP`` class and ``Autoencoder_CNN`` class. Note that for ``Autoencoder_MLP``, you should flatten the original image into a vector, whereas for ``Autoencoder_CNN``, you can use the original image without any modification. After implementing these two classes, save the figures and observe the results. Write a few sentences to describe your findings. You do not need to submit the saved figure and the dataset for your final submission.**

<div>
<img src="figures/autoencoder.png" width="700"/>
</div>

In [6]:
import os

import torch
import torchvision
from torch import nn
from torch.utils.data import DataLoader
from torchvision import transforms
from torchvision.datasets import MNIST
from torchvision.utils import save_image

# Make a directory "saved_img" if it does not exist
if not os.path.exists('./saved_img'):
    os.mkdir('./saved_img')


def to_img(x):
    x = 0.5 * (x + 1)
    x = x.clamp(0, 1)
    x = x.view(x.size(0), 1, 28, 28)
    return x


num_epochs = 100
batch_size = 128
learning_rate = 1e-3

img_transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize([0.5], [0.5])
])

dataset = MNIST('./data', transform=img_transform)
dataloader = DataLoader(dataset, batch_size=batch_size, shuffle=True)


class Autoencoder_MLP(nn.Module):
    def __init__(self):
        super(Autoencoder_MLP, self).__init__()
        ## Write your answer here
        self.encoder = nn.Sequential(nn.Linear(784,128),
                                     nn.ReLU(),
                                     nn.Linear(128,64),
                                     nn.ReLU(),
                                     nn.Linear(64,12),
                                     nn.ReLU(),
                                     nn.Linear(12,3))
        self.decoder = nn.Sequential(nn.Linear(3,12),
                                     nn.ReLU(),
                                     nn.Linear(12,64),
                                     nn.ReLU(),
                                     nn.Linear(64,128),
                                     nn.ReLU(),
                                     nn.Linear(128,784))

    def forward(self, x):
        ## Write your answer here
        latent = self.encoder(x)
        out = self.decoder(latent)
        return out

class Autoencoder_CNN(nn.Module):
    def __init__(self):
        super(Autoencoder_CNN, self).__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 8, 3, stride=2, padding=1),
            nn.ReLU(),
            nn.Conv2d(8, 16, 3, stride=2, padding=1),
            nn.BatchNorm2d(16),
            nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=0),
            nn.ReLU(),
            nn.Flatten(start_dim=1),
            nn.Linear(3 * 3 * 32, 128),
            nn.ReLU(),
            nn.Linear(128, 4)
        )
        self.decoder = nn.Sequential(
            nn.Linear(4, 128),
            nn.ReLU(),
            nn.Linear(128, 3 * 3 * 32),
            nn.ReLU(),
            nn.Unflatten(dim=1, unflattened_size=(32, 3, 3)),
            nn.ConvTranspose2d(32, 16, 3, stride=2, output_padding=0),
            nn.BatchNorm2d(16),
            nn.ReLU(),
            nn.ConvTranspose2d(16, 8, 3, stride=2, 
            padding=1, output_padding=1),
            nn.BatchNorm2d(8),
            nn.ReLU(),
            nn.ConvTranspose2d(8, 1, 3, stride=2, padding=1, output_padding=1)
        )
        
    def forward(self, x):
        latent = self.encoder(x)
        out = self.decoder(latent)
        return out

# Uncomment below correspondingly

#model = Autoencoder_MLP()
model = Autoencoder_CNN()
criterion = nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate, weight_decay=1e-5)

for epoch in range(num_epochs):
    for data in dataloader:
        img, _ = data
        
        # Flatten the image for Autoencoder_MLP (You can remove this line for Autoencoder_CNN)
        #img = img.view(img.size(0), -1)
        
        # forward pass
        output = model(img)
        loss = criterion(output, img)
        # backward pass
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

    # Print log and save images
    print('epoch [{}/{}], loss:{:.4f}'.format(epoch + 1, num_epochs, loss.item()))
    if epoch % 10 == 0:
        pic = to_img(output.cpu().data)
        save_image(pic, './saved_img/cnn_image_{}.png'.format(epoch))

epoch [1/100], loss:0.1694
epoch [2/100], loss:0.1441
epoch [3/100], loss:0.1350
epoch [4/100], loss:0.1269
epoch [5/100], loss:0.1250
epoch [6/100], loss:0.1211
epoch [7/100], loss:0.1239
epoch [8/100], loss:0.1241
epoch [9/100], loss:0.1203
epoch [10/100], loss:0.1139
epoch [11/100], loss:0.1216
epoch [12/100], loss:0.1027
epoch [13/100], loss:0.1156
epoch [14/100], loss:0.1162
epoch [15/100], loss:0.1086
epoch [16/100], loss:0.1079
epoch [17/100], loss:0.1083
epoch [18/100], loss:0.1053
epoch [19/100], loss:0.1129
epoch [20/100], loss:0.1071
epoch [21/100], loss:0.1071
epoch [22/100], loss:0.1098
epoch [23/100], loss:0.1007
epoch [24/100], loss:0.1052
epoch [25/100], loss:0.1005
epoch [26/100], loss:0.1018
epoch [27/100], loss:0.1064
epoch [28/100], loss:0.1040
epoch [29/100], loss:0.1027
epoch [30/100], loss:0.1069
epoch [31/100], loss:0.1048
epoch [32/100], loss:0.1021
epoch [33/100], loss:0.0966
epoch [34/100], loss:0.1066
epoch [35/100], loss:0.1001
epoch [36/100], loss:0.0999
e

<table><tr>
<td> <img src="figures/cnn_image_90.png" alt="Drawing" style="width: 250px;"/> </td>
<td> <img src="figures/mlp_image_50.png" alt="Drawing" style="width: 250px;"/> </td>
</tr></table>
Left: CNN best epoch output, right: MLP best epoch output

As we can see, the CNN output is much clearer than MLP output. In CNN output, the digits are very clear, with practically no noise, whereas in MLP output image, the digits are "washed out" and there are sometimes "ghost" of another digit along with the actual digit. We can probably improve the MLP output by changing the architecture (changing the dimensions of the layers or the activation functions) or by hyperparameter (learning rate) tuning, but by design, CNN will most likely perform as it, by definition, can learn the spatial structure between the nearby pixels, which improves its denoising capabilities and the images are clearer and more accurate than those denoised by MLP