# Testing the MOGONET paper

The link to the paper is [here](https://www.nature.com/articles/s41467-021-23774-w).

The link to the code repo is [here](https://github.com/txWang/MOGONET).

Authors: Tongxin Wang, Wei Shao, Zhi Huang, Haixu Tang, Jie Zhang, Zhengming Ding, Kun Huang.

## Table of Contents
- [Data](#Data)
- [Main Biomarker](#Main-Biomarker)
- [Main Mogonet](#Main-Mogonet)
- [Models](#Models)
- [Train_Test](#Train_Test)
- [Feat Importance](#Feat-Importance)

### Data

To demonstrate the effectiveness of MOGONET, the authors applied the proposed method to **four different classification tasks** using **four different datasets** (CHECK).

Three types of omics data for each dataset:
 - mRNA expression data (mRNA)
 - DNA methylation data (meth)
 - miRNA expression data (miRNA)

Datasets:
 1) BReast invasive CArcinoma (**BRCA**)
     1) mRNA, 1000 features
     2) meth, 1000 features
     3) miRNA, 611 observations $\times$ 502 features (**NOT 503** as shown in [paper](https://www.nature.com/articles/s41467-021-23774-w/tables/1))
 2) Religious Orders Study/Memory and Aging Project (**ROSMAP**)
 3) Low Grade Glioma (**LGG**) --- Missing
 4) KIPAN --- Missing
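The observation and feature counts listed above can be checked directly against the files. The following is a minimal sketch, assuming each view is stored as comma-separated files named `<view>_tr.csv` and `<view>_te.csv` inside the dataset folder (the layout the loading helper in this notebook reads). The names `csv_shape` and `view_shapes` are hypothetical helpers introduced here for illustration.

```python
import csv
import os


def csv_shape(path):
    """Return (rows, columns) of a comma-separated file, skipping empty lines."""
    n_rows, n_cols = 0, 0
    with open(path, newline="") as handle:
        for row in csv.reader(handle):
            if not row:
                continue
            n_rows += 1
            n_cols = max(n_cols, len(row))
    return n_rows, n_cols


def view_shapes(data_folder, view_list):
    """Return {view: (n_train + n_test, n_features)} for every view in view_list."""
    shapes = {}
    for view in view_list:
        tr_rows, tr_cols = csv_shape(os.path.join(data_folder, f"{view}_tr.csv"))
        te_rows, te_cols = csv_shape(os.path.join(data_folder, f"{view}_te.csv"))
        if tr_cols != te_cols:
            raise ValueError(f"view {view}: train has {tr_cols} features, test has {te_cols}")
        shapes[view] = (tr_rows + te_rows, tr_cols)
    return shapes

# Example call (requires the data files on disk):
# view_shapes("BRCA/", [1, 2, 3])
```

Comparing the returned tuples with the table in the paper is a quick way to confirm or refute a discrepancy such as the 502 versus 503 miRNA features noted above.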


    

In [1]:
# Load libraries 
import os
import numpy as np
import torch



In [2]:
# Helper function to read the data
cuda = torch.cuda.is_available()


def prepare_trte_data(data_folder, view_list):
    """
    Gets all the *tr.csv and *te.csv in the data_folder, and transforms these to list of tensors, then
    storing it on several returned objects

    Args: 
        data_folder: path to read the data
        view_list: list of files to be viewed [1,2,3] here
    Returns:
        data_train_list: list of tensors of the train data
        data_all_list: list of tensors of combined train and test data
        idx_dict: dict that corresponds to the label (id) of both train,
                  and test data
        labels:  numpy array that stores the actual class of each observation
    """
    num_view = len(view_list)
    # Get the labels and transform it to integer to map it
    labels_tr = np.loadtxt(os.path.join(data_folder, "labels_tr.csv"), delimiter=',')
    labels_te = np.loadtxt(os.path.join(data_folder, "labels_te.csv"), delimiter=',')
    labels_tr = labels_tr.astype(int)
    labels_te = labels_te.astype(int)
    
    # Load the train and test matrices of each view ("<view>_tr.csv" / "<view>_te.csv")
    data_tr_list = []
    data_te_list = []
    for i in view_list:
        data_tr_list.append(np.loadtxt(os.path.join(data_folder, str(i)+"_tr.csv"), delimiter=','))
        data_te_list.append(np.loadtxt(os.path.join(data_folder, str(i)+"_te.csv"), delimiter=','))
    num_tr = data_tr_list[0].shape[0]
    num_te = data_te_list[0].shape[0]

    # Stack the train rows on top of the test rows for every view
    data_mat_list = []
    for i in range(num_view):
        data_mat_list.append(np.concatenate((data_tr_list[i], data_te_list[i]), axis=0))

    # Convert to float tensors and move them to the GPU when one is available
    data_tensor_list = []
    for i in range(len(data_mat_list)):
        data_tensor_list.append(torch.FloatTensor(data_mat_list[i]))
        if cuda:
            data_tensor_list[i] = data_tensor_list[i].cuda()

    # Row indices of the train and test samples within the stacked matrices
    idx_dict = {}
    idx_dict["tr"] = list(range(num_tr))
    idx_dict["te"] = list(range(num_tr, (num_tr+num_te)))

    # Build the returned lists: train-only tensors and train+test tensors
    data_train_list = []
    data_all_list = []
    for i in range(len(data_tensor_list)):
        data_train_list.append(data_tensor_list[i][idx_dict["tr"]].clone())
        data_all_list.append(torch.cat((data_tensor_list[i][idx_dict["tr"]].clone(),
                                       data_tensor_list[i][idx_dict["te"]].clone()),0))

    # Labels follow the same order: train first, then test
    labels = np.concatenate((labels_tr, labels_te))
    
    return data_train_list, data_all_list, idx_dict, labels
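The indexing convention used by `prepare_trte_data` can be illustrated on a toy example: the training rows come first in the combined matrix and the test rows follow, so `idx_dict` only records two contiguous ranges. The sketch below uses plain NumPy arrays with made-up values in place of the real CSV data and tensors. `stack_train_test` is a hypothetical name, not part of the original code.

```python
import numpy as np


def stack_train_test(train, test):
    """Stack train rows above test rows and record which rows belong to which split."""
    data_all = np.concatenate((train, test), axis=0)
    num_tr, num_te = train.shape[0], test.shape[0]
    idx_dict = {
        "tr": list(range(num_tr)),                   # first num_tr rows
        "te": list(range(num_tr, num_tr + num_te)),  # remaining num_te rows
    }
    return data_all, idx_dict


# Toy data: 3 training samples and 2 test samples with 2 features each
toy_train = np.arange(6, dtype=float).reshape(3, 2)
toy_test = np.arange(100, 104, dtype=float).reshape(2, 2)
toy_all, toy_idx = stack_train_test(toy_train, toy_test)

# Indexing the combined matrix with idx_dict recovers each split
recovered_train = toy_all[toy_idx["tr"]]
recovered_test = toy_all[toy_idx["te"]]
```

Because the labels are concatenated in the same order, `labels[idx_dict["te"]]` gives the test labels that match the test rows.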

#### BRCA

In [6]:
# Loading the data for BRCA
BRCA_FOLDER = "BRCA/"
BRCA_view = [1,2,3]


brca_train_list, brca_all_list, brca_idx_dict, brca_labels = prepare_trte_data(data_folder=BRCA_FOLDER,
                                                                        view_list=BRCA_view)

In [9]:
brca_labels

array([2, 4, 0, 3, 1, 3, 3, 1, 1, 3, 1, 3, 0, 1, 3, 0, 3, 2, 3, 4, 0, 1,
       3, 4, 0, 4, 3, 4, 4, 0, 0, 1, 4, 3, 0, 3, 1, 4, 0, 3, 1, 4, 3, 3,
       2, 3, 0, 2, 3, 4, 3, 3, 1, 1, 3, 1, 1, 4, 3, 3, 4, 4, 3, 4, 3, 2,
       3, 3, 3, 3, 3, 3, 1, 0, 2, 3, 3, 3, 3, 3, 4, 3, 3, 3, 3, 4, 3, 3,
       3, 2, 3, 4, 3, 3, 3, 3, 3, 1, 3, 4, 3, 3, 3, 1, 0, 1, 0, 4, 3, 3,
       1, 0, 3, 1, 4, 3, 3, 2, 1, 3, 2, 3, 3, 4, 0, 3, 3, 1, 1, 1, 1, 4,
       3, 4, 3, 4, 3, 3, 4, 3, 3, 4, 3, 0, 3, 4, 1, 3, 0, 1, 3, 0, 3, 1,
       0, 3, 3, 3, 1, 0, 2, 4, 3, 1, 3, 3, 1, 1, 3, 1, 3, 3, 3, 3, 3, 1,
       3, 3, 4, 3, 4, 3, 3, 0, 4, 4, 3, 1, 3, 4, 3, 4, 3, 0, 3, 4, 3, 3,
       3, 1, 1, 1, 3, 3, 3, 3, 3, 3, 3, 0, 3, 3, 3, 4, 1, 3, 3, 3, 3, 4,
       3, 3, 3, 4, 0, 2, 3, 3, 1, 3, 3, 1, 1, 1, 0, 0, 3, 3, 4, 3, 0, 1,
       3, 0, 3, 3, 3, 3, 3, 4, 3, 4, 3, 0, 3, 3, 4, 4, 4, 3, 3, 3, 4, 3,
       4, 3, 3, 4, 2, 1, 3, 3, 3, 3, 3, 3, 4, 3, 3, 1, 3, 1, 1, 4, 3, 3,
       0, 3, 4, 3, 4, 3, 3, 3, 3, 4, 3, 0, 3, 4, 1,
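The raw array is hard to read at a glance, so a per-class count is often more useful for judging class balance. Here is a minimal sketch using `collections.Counter`, run on the first ten values of the output above as a stand-in for the full array. `class_counts` is a hypothetical helper, and the real `brca_labels` array could be passed to it in the same way.

```python
from collections import Counter


def class_counts(labels):
    """Return {class_id: number_of_samples}, sorted by class id."""
    counts = Counter(int(label) for label in labels)
    return dict(sorted(counts.items()))


# First ten labels from the printed array, used here only as a small example
toy_labels = [2, 4, 0, 3, 1, 3, 3, 1, 1, 3]
toy_counts = class_counts(toy_labels)
print(toy_counts)
```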

#### ROSMAP

In [38]:
# Loading the data for ROSMAP
ROSMAP_FOLDER = "ROSMAP/"
ROSMAP_view = [1, 2, 3]
rosmap_train_list, rosmap_all_list, rosmap_idx_dict, ros_labels = prepare_trte_data(data_folder=ROSMAP_FOLDER,
                                                                        view_list=ROSMAP_view)


### Main-Biomarker

### Main-Mogonet 

### Models

### Train_Test

### Feat-Importance