# Notebook for recreating results from three different model types
This notebook recreates the results of our best model for each of the model types used in the project.

We assess performance on the same test dataset for all models: the GTEx dataset restricted to Artery tissue types.




Note:
* The model checkpoints are stored on blackhole storage at `/dtu/blackhole/0b/155947/models/` on the DTU HPC infrastructure, which remains available until the end of January 2024.


In [1]:
import pickle
import torch
from torch.utils.data import DataLoader
import numpy as np
from tqdm import tqdm
import IsoDatasets
from VAE2 import VAE_lf
from FFNN import FeedForwardIsoform_small, FeedForwardIsoform_XL

### Initialize common functionalities

In [2]:
# Setup dataset
gtex_test = IsoDatasets.GtexDataset("/dtu-compute/datasets/iso_02456/hdf5-row-sorted/", include='Artery')

# Check gpu availability
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
print(f">> Using device: {device}")

# Setup MSE loss
criterion = torch.nn.MSELoss()

>> Using device: cpu


### Standalone DNN performance
Here we load a checkpoint for our best-performing standalone DNN and check its performance on the Artery test dataset.

In [3]:
# Init
STANDALONE_DNN_MODEL_PATH = "/dtu/blackhole/0b/155947/models/Best_STANDALONE_DENSE"
gtx_test_dataloader = DataLoader(gtex_test, batch_size=10, shuffle=True)

# Grab a sample to initialize output size for DNN class
gene_expr, isoform_expr, _ = next(iter(gtx_test_dataloader))

# DNN model
dnn = FeedForwardIsoform_XL(input_shape = gene_expr[0].size(), 
                            output_shape = isoform_expr[0].size())
checkpoint = torch.load(STANDALONE_DNN_MODEL_PATH, map_location=torch.device('cpu'))
dnn.load_state_dict(checkpoint['model_state_dict'])

dnn = dnn.to(device)

In [4]:
dnn.eval()
test_loss = []
for x, y, _ in tqdm(gtx_test_dataloader):
    x = x.to(device)
    y = y.to(device)

    # Run through network
    x = dnn.forward(x)

    loss = criterion(x, y)

    test_loss.append(loss.item())

100%|██████████| 134/134 [02:15<00:00,  1.01s/it]


In [5]:
mean_test_loss_standaloneDNN = np.mean(test_loss)
print('Mean test loss of standalone DNN is:', mean_test_loss_standaloneDNN)

Mean test loss of standalone DNN is: 0.19647285976071857
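
The loop above records one loss per batch and averages those values, and it runs with gradient tracking enabled. A minimal sketch of an alternative is shown here, assuming a plain `(x, y)` dataloader (the notebook's loader yields a third element that would need unpacking). The helper `evaluate_mse`, the toy linear model, and the toy tensors are hypothetical stand-ins, not part of the project code. It wraps evaluation in `torch.no_grad()` and weights each batch by its size, so a smaller final batch does not skew the mean.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset


def evaluate_mse(model, dataloader, device):
    """Return the per-sample mean MSE over every batch in dataloader."""
    model.eval()
    criterion = torch.nn.MSELoss()
    total, n_samples = 0.0, 0
    with torch.no_grad():  # gradients are not needed for evaluation
        for x, y in dataloader:
            x, y = x.to(device), y.to(device)
            loss = criterion(model(x), y)
            # Weight the batch mean by the batch size
            total += loss.item() * x.size(0)
            n_samples += x.size(0)
    return total / n_samples


# Toy stand-ins for the real model and dataset (hypothetical shapes)
torch.manual_seed(0)
toy_model = torch.nn.Linear(8, 4)
toy_data = TensorDataset(torch.randn(25, 8), torch.randn(25, 4))
toy_loader = DataLoader(toy_data, batch_size=10, shuffle=False)

toy_loss = evaluate_mse(toy_model, toy_loader, torch.device("cpu"))
print(f"Per-sample mean test loss: {toy_loss:.4f}")
```

With 25 samples and a batch size of 10, the last batch holds only 5 samples. The weighted sum still matches the MSE computed over the full dataset in one pass, which a plain mean of the three batch losses would not guarantee.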


### PCA-DNN performance
Here we load a checkpoint for our best-performing PCA-DNN and check its performance on the Artery test dataset.

In [6]:
# Init
PCA_SIZE = 1024
PCA_MODEL_PATH = f"/dtu/blackhole/0b/155947/models/ipca_model_n{PCA_SIZE}.pkl"
DNN_MODEL_PATH = "/dtu/blackhole/0b/155947/models/Best_PCA_DENSE"
gtx_test_dataloader = DataLoader(gtex_test, batch_size=10, shuffle=True)

# Loading the PCA
with open(PCA_MODEL_PATH, 'rb') as file:
    ipca = pickle.load(file)

# Grab a sample to initialize output size for DNN class
gene_expr, isoform_expr, _ = next(iter(gtx_test_dataloader))

# DNN model
dnn = FeedForwardIsoform_XL(input_shape = PCA_SIZE, 
                             output_shape = isoform_expr[0].size())
checkpoint = torch.load(DNN_MODEL_PATH, map_location=torch.device('cpu'))
dnn.load_state_dict(checkpoint['model_state_dict'])

dnn = dnn.to(device)

In [7]:
dnn.eval()
test_loss = []
for x, y, _ in tqdm(gtx_test_dataloader):
    # Perform PCA
    x = ipca.transform(x)

    # Datatype handling
    x = torch.from_numpy(x).float()
    x = x.to(device)
    y = y.to(device)

    # Run through network
    x = dnn.forward(x)

    loss = criterion(x, y)

    test_loss.append(loss.item())

100%|██████████| 134/134 [01:14<00:00,  1.81it/s]


In [8]:
mean_test_loss_PCADNN = np.mean(test_loss)
print('Mean test loss of PCADNN is:', mean_test_loss_PCADNN)

Mean test loss of PCADNN is: 0.20582015589991612


For the PCA-DNN, we were not able to recreate the lowest test loss of ~0.183. The best performance we obtained from a saved model with identical parameters was ~0.206.

### Encoder-DNN performance
Here we load a checkpoint for our best-performing Encoder-DNN and check its performance on the Artery test dataset.

In [9]:
# Init
LATENT_FEATURES = 256
ENCODER_MODEL_PATH = "/dtu/blackhole/0b/155947/models/Best_VAE"
DNN_MODEL_PATH = "/dtu/blackhole/0b/155947/models/Best_ENCODER_DENSE"
gtx_test_dataloader = DataLoader(gtex_test, batch_size=10, shuffle=True)

# Grab a sample to initialize input size for encoder and output size for DNN class
gene_expr, isoform_expr, _ = next(iter(gtx_test_dataloader))

# Loading VAE checkpoint to be utilized as encoder
vae = VAE_lf(input_shape=gene_expr[0].size(),
                       latent_features=LATENT_FEATURES)
checkpoint = torch.load(ENCODER_MODEL_PATH, map_location=torch.device('cpu'))
vae.load_state_dict(checkpoint['model_state_dict'])

# DNN model
dnn = FeedForwardIsoform_XL(input_shape = LATENT_FEATURES, 
                             output_shape = isoform_expr[0].size())
checkpoint = torch.load(DNN_MODEL_PATH, map_location=torch.device('cpu'))
dnn.load_state_dict(checkpoint['model_state_dict'])

vae = vae.to(device)
dnn = dnn.to(device)

In [10]:
vae.eval()
dnn.eval()
test_loss = []
for x, y, _ in tqdm(gtx_test_dataloader):
    # Send to device
    x = x.to(device)
    y = y.to(device)

    # Encode input to latent space
    mu, logvar = vae.encode_mu_var(x)
    z = vae.reparameterize(mu, logvar)

    # Run through network
    x = dnn.forward(z)

    # Calculate loss (no backpropagation during evaluation)
    loss = criterion(x, y).double()

    test_loss.append(loss.item())

100%|██████████| 134/134 [01:18<00:00,  1.70it/s]


In [11]:
mean_test_loss_encDNN = np.mean(test_loss)
print('Mean test loss of Encoder-DNN is:', mean_test_loss_encDNN)

Mean test loss of Encoder-DNN is: 0.42892515392445807


Upon recreating the Encoder-DNN with the optimal parameters, we achieved a test loss of ~0.429, which is lower (better) than the 0.491 obtained during the hyperparameter search.
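
The evaluation loop above draws the latent vector with `vae.reparameterize(mu, logvar)`. If `VAE_lf` follows the standard reparameterization trick (an assumption, since its implementation is not shown here), each pass samples fresh noise, so the reported loss can vary from run to run. The sketch below uses a hypothetical standalone `reparameterize` function to illustrate this and shows the common deterministic alternative of feeding the posterior mean to the downstream network.

```python
import torch


def reparameterize(mu, logvar):
    """Standard VAE sampling: z = mu + sigma * eps, with eps ~ N(0, I)."""
    std = torch.exp(0.5 * logvar)
    eps = torch.randn_like(std)
    return mu + std * eps


# Hypothetical encoder outputs for a batch of 4 with 256 latent features
mu = torch.zeros(4, 256)
logvar = torch.zeros(4, 256)  # sigma = 1 everywhere

# Two draws from the same posterior give two different latent vectors
z_a = reparameterize(mu, logvar)
z_b = reparameterize(mu, logvar)
samples_differ = not torch.equal(z_a, z_b)

# Deterministic alternative: use the posterior mean as the latent code
z_det = mu
```

Whether the mean or a sample is the right input depends on how the downstream DNN was trained, so this is an option to consider rather than a correction.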

### Comparison of the 3 models

In [13]:
print(f"Comparison of best performing models...\nStandalone DNN:\t{mean_test_loss_standaloneDNN}\nPCADNN:\t\t{mean_test_loss_PCADNN}\nencDNN:\t\t{mean_test_loss_encDNN}")

Comparison of best performing models...
Standalone DNN:	0.19647285976071857
PCADNN:		0.20582015589991612
encDNN:		0.42892515392445807
