# Testing the MOGONET paper

The link to the paper is [here](https://www.nature.com/articles/s41467-021-23774-w).

The link to the code repo is [here](https://github.com/txWang/MOGONET).

Authors: Tongxin Wang, Wei Shao, Zhi Huang, Haixu Tang, Jie Zhang, Zhengming Ding, Kun Huang.

## Table of Contents
- [DATA](#data)
- [Main Biomarker](#main-biomarker)
- [Main Mogonet](#main-mogonet)
- [Models](#models)
- [Train_Test](#train-test)
- [Feat Importance](#feat-importance)

### Data

To demonstrate the effectiveness of MOGONET, the authors applied the proposed method to **four different classification tasks** using **four different datasets** (CHECK).

Each dataset contains three types of omics data:
 - mRNA expression data (mRNA)
 - DNA methylation data (meth)
 - miRNA expression data (miRNA)

Datasets:
 1) BReast invasive CArcinoma (**BRCA**)
     1) mRNA, 1000 features
     2) meth, 1000 features
     3) miRNA, 503 features. An earlier count here gave 611 observations $\times$ 502 features, but the training tensor loaded below is 612 $\times$ 503, which matches [Table 1 of the paper](https://www.nature.com/articles/s41467-021-23774-w/tables/1)
 2) Religious Orders Study/Memory and Aging Project (**ROSMAP**)
 3) Low Grade Glioma (**LGG**): data missing
 4) KIPAN: data missing

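The feature counts above can be checked directly against the files. A minimal sketch, assuming the view files are headerless and comma-separated as `prepare_trte_data` expects: it reads a file with `np.loadtxt(..., delimiter=',')`, the same call that function uses, and returns (observations, features). The helper name `csv_shape` is hypothetical, and the path in the usage comment assumes the repo's `BRCA` folder sits in the working directory.

```python
import numpy as np


def csv_shape(path):
    """Return (observations, features) of a headerless, comma-separated view file.

    Hypothetical helper. np.loadtxt treats every row as data, the same way
    prepare_trte_data reads the views; a reader that treats the first row as a
    header would report one observation fewer.
    """
    data = np.loadtxt(path, delimiter=",", ndmin=2)  # ndmin=2 keeps 1-row files 2-D
    return data.shape


# Hypothetical usage, assuming the BRCA folder is in the working directory:
# print(csv_shape("BRCA/3_tr.csv"))
```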

    

```python
# Load libraries
import os
import time

import numpy as np
import torch

from train_test import train_test, prepare_trte_data
from utils import graph_from_dist_tensor, cosine_distance_torch
```
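`cosine_distance_torch` and `graph_from_dist_tensor` come from the repo's `utils` module and are imported here without being used. As a rough illustration of the first idea only, here is a sketch of a pairwise cosine distance in PyTorch, assuming the distance is 1 - cosine similarity between every pair of samples. This is not the repo's implementation, and the name `pairwise_cosine_distance` is hypothetical.

```python
import torch


def pairwise_cosine_distance(x, eps=1e-8):
    """Cosine distance (1 - cosine similarity) between every pair of rows of x.

    x: (n_samples, n_features) tensor; returns an (n_samples, n_samples) tensor.
    """
    # Scale each row to unit L2 norm; eps avoids dividing by zero for all-zero rows
    x_unit = x / x.norm(dim=1, keepdim=True).clamp(min=eps)
    # Dot products of unit vectors are cosine similarities
    similarity = x_unit @ x_unit.t()
    return 1.0 - similarity


samples = torch.tensor([[1.0, 0.0], [0.0, 1.0], [2.0, 0.0]])
print(pairwise_cosine_distance(samples))
```

Rows pointing in the same direction get distance 0 and orthogonal rows get 1, so a small distance marks similar samples. How `graph_from_dist_tensor` turns such a matrix into graph edges is not covered by this sketch.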

```python
# Check if CUDA is available
torch.cuda.is_available()
```

Output:

```text
False
```

#### BRCA

```python
# Loading the data for BRCA
BRCA_FOLDER = "BRCA/"
BRCA_view = [1, 2, 3]

# First, prepare the list of training tensors, the list of all (train + test)
# tensors, their index dictionary, and the corresponding class labels.
brca_train_list, brca_all_list, brca_idx_dict, brca_labels = prepare_trte_data(
    data_folder=BRCA_FOLDER, view_list=BRCA_view
)
```

#### ROSMAP

```python
# Loading the data for ROSMAP
ROSMAP_FOLDER = "ROSMAP/"
ROSMAP_view = [1, 2, 3]
rosmap_train_list, rosmap_all_list, rosmap_idx_dict, rosmap_labels = prepare_trte_data(
    data_folder=ROSMAP_FOLDER, view_list=ROSMAP_view
)
```
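Both loading cells call the same function, so a copy-paste slip such as passing the BRCA folder while loading ROSMAP goes unnoticed. A cheap safeguard, sketched below as a hypothetical helper `check_loaded_data`, compares the number of distinct labels with the expected class count (5 for BRCA and 2 for ROSMAP, as set in `main_runner`) and checks that every tensor has one row per sample. It assumes every class appears at least once in the labels.

```python
import numpy as np
import torch


def check_loaded_data(train_list, all_list, idx_dict, labels, num_class):
    """Raise AssertionError if the loaded data look inconsistent (hypothetical helper)."""
    n_all = len(idx_dict["tr"]) + len(idx_dict["te"])
    assert len(labels) == n_all, "labels and index dictionary disagree"
    n_found = len(np.unique(labels))
    assert n_found == num_class, f"expected {num_class} classes, found {n_found}"
    for tr, full in zip(train_list, all_list):
        # Each view needs one row per training sample and one row per sample overall
        assert tr.shape[0] == len(idx_dict["tr"]), "train tensor has wrong row count"
        assert full.shape[0] == n_all, "all-data tensor has wrong row count"


# Usage on the BRCA objects loaded above:
# check_loaded_data(brca_train_list, brca_all_list, brca_idx_dict, brca_labels, num_class=5)
```

The same call on the ROSMAP objects with `num_class=2` would have flagged the run in which that cell loaded the BRCA folder, since the BRCA labels contain more than two classes.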


#### Sizes

```python
# Helper function to print the number of tensors in each data list
# and the shape of each tensor
def print_size(train_list, all_list, dataset="BRCA"):
    if dataset not in ("BRCA", "ROSMAP"):
        raise ValueError(f"Wrong dataset input: {dataset}")
    tr_name = dataset + " Train List"
    all_name = dataset + " All List"

    print("#" * 50)
    print(f"The dataset is: {dataset}")
    # Print the number and shape of the tensors in the train data list
    print(f"Number of tensors in {tr_name}: {len(train_list)}")
    for tensor in train_list:
        print(f"The size of each tensor is: {tensor.shape}")
    print("#" * 50)

    # Print the number and shape of the tensors in the all data list
    print(f"Number of tensors in {all_name}: {len(all_list)}")
    for tensor in all_list:
        print(f"The size of each tensor is: {tensor.shape}")


print_size(brca_train_list, brca_all_list)
print_size(rosmap_train_list, rosmap_all_list, dataset="ROSMAP")
```


The ROSMAP sizes in this output are identical to the BRCA sizes because it was generated while the ROSMAP loading cell passed `BRCA_FOLDER` and `BRCA_view` to `prepare_trte_data`, so they are not the real ROSMAP dimensions. The BRCA tensors have 612 training samples and 875 samples in total (train + test).

```text
##################################################
The dataset is: BRCA
Number of tensors in BRCA Train List: 3
The size of each tensor is: torch.Size([612, 1000])
The size of each tensor is: torch.Size([612, 1000])
The size of each tensor is: torch.Size([612, 503])
##################################################
Number of tensors in BRCA All List: 3
The size of each tensor is: torch.Size([875, 1000])
The size of each tensor is: torch.Size([875, 1000])
The size of each tensor is: torch.Size([875, 503])
##################################################
The dataset is: ROSMAP
Number of tensors in ROSMAP Train List: 3
The size of each tensor is: torch.Size([612, 1000])
The size of each tensor is: torch.Size([612, 1000])
The size of each tensor is: torch.Size([612, 503])
##################################################
Number of tensors in ROSMAP All List: 3
The size of each tensor is: torch.Size([875, 1000])
The size of each tensor is: torch.Size([875, 1000])
The size of each tensor
```

### Main Biomarker

### Main Mogonet 

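```python
def main_runner(data_folder, view_list=None, lr_e_pretrain=1e-3, lr_e=5e-4, lr_c=1e-3,
                num_epoch_pretrain=500, num_epoch=500):
    """
    Main runner of the MOGONET algorithm. Sets the number of classes for the chosen
    dataset and calls train_test, which loads the pre-split *_tr.csv / *_te.csv files
    in data_folder, pretrains the per-view GCNs for num_epoch_pretrain epochs, then
    trains the full model for num_epoch epochs, printing test metrics every 50 epochs.

    Args:
        data_folder: dataset folder, either "ROSMAP" (2 classes) or "BRCA" (5 classes)
        view_list: omics views to use (defaults to [1, 2, 3])
        lr_e_pretrain: learning rate used while pretraining the GCN encoders
        lr_e: learning rate of the per-view GCN encoders and classifiers
        lr_c: learning rate of the cross-view classifier (VCDN)
        num_epoch_pretrain: number of pretraining epochs
        num_epoch: number of training epochs
    """
    if view_list is None:
        view_list = [1, 2, 3]

    if data_folder == "ROSMAP":
        num_class = 2
    elif data_folder == "BRCA":
        num_class = 5
    else:
        raise ValueError(f"Wrong dataset input: {data_folder}")

    train_test(data_folder, view_list, num_class,
               lr_e_pretrain, lr_e, lr_c,
               num_epoch_pretrain, num_epoch)
```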

```python
# Running on BRCA
start = time.time()
main_runner("BRCA", [1, 2, 3])
end = time.time()
print(f"Time taken to train: {end - start}")
```


```text
Pretrain GCNs...

Training...
```

| Epoch | Test ACC | Test F1 weighted | Test F1 macro |
|------:|---------:|-----------------:|--------------:|
| 0 | 0.110 | 0.129 | 0.108 |
| 50 | 0.715 | 0.639 | 0.461 |
| 100 | 0.760 | 0.731 | 0.648 |
| 150 | 0.779 | 0.752 | 0.678 |
| 200 | 0.802 | 0.779 | 0.718 |
| 250 | 0.810 | 0.790 | 0.736 |
| 300 | 0.798 | 0.771 | 0.710 |
| 350 | 0.802 | 0.774 | 0.722 |
| 400 | 0.798 | 0.781 | 0.727 |
| 450 | 0.802 | 0.781 | 0.726 |
| 500 | 0.806 | 0.795 | 0.746 |

```text
Time taken to train: 65.84119582176208
```
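In the BRCA results, macro F1 stays below weighted F1 at every logged epoch. Weighted F1 averages the per-class F1 scores weighted by how many test samples each class has, while macro F1 gives every class equal weight, so a gap like this suggests the smaller classes are predicted less well. A sketch with scikit-learn on toy 3-class labels (not BRCA data), assuming `train_test` reports the standard scikit-learn definitions of these metrics:

```python
import numpy as np
from sklearn.metrics import accuracy_score, f1_score

# Toy labels and predictions for a 3-class problem (illustrative, not BRCA data)
y_true = np.array([0, 0, 0, 0, 1, 1, 2, 2])
y_pred = np.array([0, 0, 0, 0, 1, 0, 2, 1])

acc = accuracy_score(y_true, y_pred)
f1_weighted = f1_score(y_true, y_pred, average="weighted")  # weights classes by support
f1_macro = f1_score(y_true, y_pred, average="macro")        # every class counts equally
print(f"ACC {acc:.3f}  F1 weighted {f1_weighted:.3f}  F1 macro {f1_macro:.3f}")
```

Here the large class 0 is predicted perfectly while the two 2-sample classes each lose one sample, giving accuracy 0.75, weighted F1 of about 0.736 and macro F1 of about 0.685.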


```python
# Running on ROSMAP
main_runner("ROSMAP", [1, 2, 3])
```


```text
Pretrain GCNs...

Training...
```

| Epoch | Test ACC | Test F1 | Test AUC |
|------:|---------:|--------:|---------:|
| 0 | 0.425 | 0.000 | 0.315 |
| 50 | 0.472 | 0.606 | 0.463 |
| 100 | 0.670 | 0.737 | 0.795 |
| 150 | 0.783 | 0.800 | 0.865 |
| 200 | 0.811 | 0.815 | 0.879 |
| 250 | 0.792 | 0.807 | 0.894 |
| 300 | 0.821 | 0.822 | 0.892 |
| 350 | 0.811 | 0.804 | 0.884 |
| 400 | 0.821 | 0.819 | 0.906 |
| 450 | 0.821 | 0.812 | 0.894 |
| 500 | 0.802 | 0.800 | 0.887 |

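For the binary ROSMAP task, scikit-learn's default `average="binary"` computes F1 for the positive class (label 1), and AUC is computed from predicted probabilities rather than hard labels. F1 is 0 whenever no positive sample is predicted correctly, for example when the model predicts the negative class for every test sample, which may explain the 0.000 at epoch 0. A sketch on toy data, assuming `train_test` uses these standard definitions; `zero_division=0` needs scikit-learn 0.22 or later.

```python
import numpy as np
from sklearn.metrics import accuracy_score, f1_score, roc_auc_score

# Toy binary labels and predicted probabilities of class 1 (illustrative only)
y_true = np.array([0, 0, 1, 1, 1])
prob_pos = np.array([0.1, 0.4, 0.35, 0.8, 0.9])
y_pred = (prob_pos >= 0.5).astype(int)  # hard labels from a 0.5 threshold

acc = accuracy_score(y_true, y_pred)
f1 = f1_score(y_true, y_pred)          # F1 of the positive class
auc = roc_auc_score(y_true, prob_pos)  # uses the scores, not the thresholded labels

# Predicting class 0 for every sample gives F1 = 0 on the positive class
f1_all_negative = f1_score(y_true, np.zeros_like(y_true), zero_division=0)
print(acc, f1, auc, f1_all_negative)
```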


### Models

### Train Test


```python
type(brca_labels)
```

Output:

```text
numpy.ndarray
```
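Because `brca_labels` is a plain NumPy integer array, it has to become a PyTorch tensor of dtype `long` before it can be passed to a loss such as `torch.nn.CrossEntropyLoss`. A sketch using toy labels in place of `brca_labels`, which also counts samples per class with `np.unique`:

```python
import numpy as np
import torch

labels = np.array([0, 2, 1, 0, 0, 4, 3, 2])  # toy stand-in for brca_labels

# Number of samples in each class
classes, counts = np.unique(labels, return_counts=True)
print(dict(zip(classes.tolist(), counts.tolist())))

# CrossEntropyLoss expects integer class indices of dtype torch.long
labels_tensor = torch.from_numpy(labels).long()
logits = torch.randn(len(labels), 5)  # random scores for 5 classes
loss = torch.nn.CrossEntropyLoss()(logits, labels_tensor)
print(loss.item())
```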

### Feat Importance

## Others

### prepare_trte_data

```python
# Helper function to prepare the data
import os

import numpy as np
import torch

cuda = torch.cuda.is_available()


def prepare_trte_data(data_folder, view_list):
    """
    Reads the {view}_tr.csv / {view}_te.csv files and labels_tr.csv / labels_te.csv
    in data_folder, converts each view to a tensor, and returns the objects below.

    Args:
        data_folder: path of the folder to read the data from
        view_list: indices of the views to load; view i is read from i_tr.csv and
                   i_te.csv (here [1, 2, 3], i.e. mRNA, meth and miRNA)
    Returns:
        data_train_list: list of tensors with the training data, one per view
        data_all_list: list of tensors with the training data followed by the
                       test data, one per view
        idx_dict: dict mapping "tr" and "te" to the row indices of the training
                  and test samples in data_all_list
        labels: numpy array with the class of each sample (train, then test)
    """
    num_view = len(view_list)
    # Load the labels and cast them to integers
    labels_tr = np.loadtxt(os.path.join(data_folder, "labels_tr.csv"), delimiter=",").astype(int)
    labels_te = np.loadtxt(os.path.join(data_folder, "labels_te.csv"), delimiter=",").astype(int)

    # Read each view's _tr / _te csv file and append it to the corresponding list
    data_tr_list = []
    data_te_list = []
    for i in view_list:
        data_tr_list.append(np.loadtxt(os.path.join(data_folder, str(i) + "_tr.csv"), delimiter=","))
        data_te_list.append(np.loadtxt(os.path.join(data_folder, str(i) + "_te.csv"), delimiter=","))
    num_tr = data_tr_list[0].shape[0]
    num_te = data_te_list[0].shape[0]

    # Stack the training rows on top of the test rows for every view
    data_mat_list = []
    for i in range(num_view):
        data_mat_list.append(np.concatenate((data_tr_list[i], data_te_list[i]), axis=0))

    # Convert to float tensors, moving them to the GPU when one is available
    data_tensor_list = []
    for i in range(len(data_mat_list)):
        data_tensor_list.append(torch.FloatTensor(data_mat_list[i]))
        if cuda:
            data_tensor_list[i] = data_tensor_list[i].cuda()

    # Row indices of the training and test samples in the stacked tensors
    idx_dict = {}
    idx_dict["tr"] = list(range(num_tr))
    idx_dict["te"] = list(range(num_tr, num_tr + num_te))

    data_train_list = []
    data_all_list = []
    for i in range(len(data_tensor_list)):
        data_train_list.append(data_tensor_list[i][idx_dict["tr"]].clone())
        data_all_list.append(torch.cat((data_tensor_list[i][idx_dict["tr"]].clone(),
                                        data_tensor_list[i][idx_dict["te"]].clone()), 0))
    labels = np.concatenate((labels_tr, labels_te))

    return data_train_list, data_all_list, idx_dict, labels
```
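`idx_dict` selects the right rows only because every view stacks its training rows before its test rows. A toy sketch of that bookkeeping, with made-up arrays standing in for one view's `_tr` and `_te` files:

```python
import numpy as np
import torch

# Toy view: 3 training rows and 2 test rows (stand-ins for 1_tr.csv / 1_te.csv)
train_rows = np.arange(6, dtype=np.float32).reshape(3, 2)
test_rows = np.arange(6, 10, dtype=np.float32).reshape(2, 2)

# Stack the training rows on top of the test rows, as prepare_trte_data does
view_all = torch.from_numpy(np.concatenate((train_rows, test_rows), axis=0))

num_tr, num_te = train_rows.shape[0], test_rows.shape[0]
idx_dict = {"tr": list(range(num_tr)), "te": list(range(num_tr, num_tr + num_te))}

# Indexing with a list of row indices selects (and copies) those rows
train_back = view_all[idx_dict["tr"]]
test_back = view_all[idx_dict["te"]]
print(idx_dict)  # {'tr': [0, 1, 2], 'te': [3, 4]}
print(test_back)
```

If the two files were stacked in the opposite order, the same index lists would silently return test rows as training data, so the stacking order and `idx_dict` have to stay in sync.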