### GraphAnnDataModule Use case

This interactive notebook lets you train a model of your choice on different Squidpy datasets. Walking through it step by step shows how the GraphAnnDataModule class can be used to train different types of models. For more information on the example models, see https://www.biorxiv.org/content/10.1101/2021.07.11.451750v1

In [7]:
%load_ext autoreload
%autoreload 2

The autoreload extension is already loaded. To reload it, use:
  %reload_ext autoreload


In [8]:
import os
from argparse import ArgumentParser, Namespace
from random import choices
import pytorch_lightning as pl
from typing import Callable, List, Optional, Sequence, Union
import squidpy as sq
import torch
from torch_geometric.loader import RandomNodeSampler
import pandas as pd
from torch_geometric.data import Data
from anndata import AnnData
from gpu_spatial_graph_pipeline.utils import adata2data
from gpu_spatial_graph_pipeline.data.datamodule import GraphAnnDataModule
from gpu_spatial_graph_pipeline.models.linear_ncem import LinearNCEM
from gpu_spatial_graph_pipeline.models.non_linear_ncem import NonLinearNCEM
from gpu_spatial_graph_pipeline.models.graph_embedding import GraphEmbedding
from gpu_spatial_graph_pipeline.data.datasets import DatasetHartmann

from ipywidgets import widgets

### Choose/Upload dataset

Run the cell below to choose the dataset to train on

In [12]:
dropvals1 = widgets.Dropdown(options=[('custom', 1),('mibitof', 2)], value=1,description="dataset")
dropvals1


Dropdown(description='dataset', options=(('custom', 1), ('mibitof', 2)), value=1)
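
Each dropdown option is a `(label, value)` pair, and the later cells branch on the integer value. The following is a minimal sketch, using plain tuples rather than the widget itself, of how a label can be recovered from a value without relying on the values being consecutive integers starting at 1. The helper name `label_for` is hypothetical and not part of ipywidgets or this package.

```python
# Options mirror the (label, value) pairs passed to the dropdown above.
options = [("custom", 1), ("mibitof", 2)]


def label_for(value, options):
    """Return the label paired with `value`, or raise KeyError if absent."""
    for label, option_value in options:
        if option_value == value:
            return label
    raise KeyError(f"no option with value {value!r}")


selected_label = label_for(2, options)
print(selected_label)  # mibitof
```

Looking the label up by value keeps working if options are reordered, whereas positional indexing such as `options[value - 1][0]` depends on the order of the list.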

Run the cell below to choose the learning type for splitting the data

In [13]:
dropvals2= widgets.Dropdown(options=[('nodewise', 1),('graphwise', 2)], value=1,description="learning type")
dropvals2

Dropdown(description='learning type', options=(('nodewise', 1), ('graphwise', 2)), value=1)
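
The two learning types differ in what gets assigned to the train, validation and test splits. The sketch below illustrates the general idea in plain Python. It is an assumption about the concept, not the actual GraphAnnDataModule implementation, and the helper `split_indices` and the 80/10/10 fractions are hypothetical.

```python
import random


def split_indices(n_items, fractions=(0.8, 0.1, 0.1), seed=0):
    """Shuffle range(n_items) and cut it into train/val/test index lists."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    n_train = int(fractions[0] * n_items)
    n_val = int(fractions[1] * n_items)
    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]  # remainder goes to the test split
    return train, val, test


# Graphwise: whole graphs (for example, images) are assigned to one split each.
n_graphs = 58  # illustrative number of graphs
graph_train, graph_val, graph_test = split_indices(n_graphs)

# Nodewise: the nodes of a graph are assigned to splits, expressed as a
# boolean mask per split over the same graph.
n_nodes = 12  # illustrative number of nodes in one graph
node_train, node_val, node_test = split_indices(n_nodes)
train_nodes = set(node_train)
train_mask = [i in train_nodes for i in range(n_nodes)]
```

With a nodewise split every batch can carry all three masks at once, which is one reason a batch might expose `train_mask`, `val_mask` and `test_mask` attributes.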

Run the cell below to instantiate your datamodule

In [14]:
if dropvals1.value == 1:
    # Custom datasets are not supported yet.
    raise NotImplementedError("Custom datasets are not implemented yet.")
elif dropvals1.value == 2:
    # adata = sq.datasets.mibitof()
    dataset = DatasetHartmann(data_path='./example_data/hartmann/')
    adata = list(dataset.img_celldata.values())
    feature_names = ['Cluster_preprocessed', 'donor']

    def anndata2data(adata):
        # Convert one AnnData object into a graph using the chosen features.
        return adata2data(adata, feature_names)

    # Model inputs: number of distinct values per feature, and number of genes.
    num_features = tuple(len(set(dataset.celldata.obs[name])) for name in feature_names)
    num_genes = dataset.celldata.shape[1]

# Map the dropdown's integer value back to its label ('nodewise' or 'graphwise').
learning_type = dropvals2.options[dropvals2.value - 1][0]
dm = GraphAnnDataModule(adata=adata, adata2data_fn=anndata2data, num_workers=8, batch_size=100, learning_type=learning_type)

dm.setup()
# Print the first three training batches.
print("Sample of batches from custom datamodule:")
for i, batch in enumerate(dm.train_dataloader()):
    print(batch)
    if i == 2:
        break


Loading data from raw files
registering celldata




collecting image-wise celldata
adding graph-level covariates
Loaded 58 images with complete data from 4 patients over 63747 cells with 36 cell features and 8 distinct celltypes.
Sample of batches from custom datamodule:
DataBatch(x=[699, 12], edge_index=[2, 600], y=[699, 36], Xd=[699, 76], batch=[699], ptr=[59], train_mask=[699], val_mask=[699], test_mask=[699], batch_size=100)
DataBatch(x=[697, 12], edge_index=[2, 600], y=[697, 36], Xd=[697, 76], batch=[697], ptr=[59], train_mask=[697], val_mask=[697], test_mask=[697], batch_size=100)
DataBatch(x=[696, 12], edge_index=[2, 600], y=[696, 36], Xd=[696, 76], batch=[696], ptr=[59], train_mask=[696], val_mask=[696], test_mask=[696], batch_size=100)
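
Assuming these are standard PyTorch Geometric batches, `ptr` stores cumulative node offsets and therefore has one more entry than the number of graphs, so `ptr=[59]` is consistent with the 58 loaded images. The `batch` vector assigns each node to its graph. The 12 columns of `x` may correspond to a one-hot encoding of the 8 cell types and 4 donors, although the notebook does not state this. The following sketch reproduces the `ptr` and `batch` bookkeeping in plain Python; the helper `batch_pointers` is hypothetical.

```python
from itertools import accumulate


def batch_pointers(nodes_per_graph):
    """Build PyG-style `ptr` and `batch` lists from per-graph node counts."""
    # ptr[i] is the index of the first node of graph i; ptr[-1] is the total.
    ptr = [0] + list(accumulate(nodes_per_graph))
    # batch[j] is the index of the graph that node j belongs to.
    batch = [g for g, n in enumerate(nodes_per_graph) for _ in range(n)]
    return ptr, batch


ptr, batch = batch_pointers([3, 2, 4])
print(ptr)    # [0, 3, 5, 9]
print(batch)  # [0, 0, 0, 1, 1, 2, 2, 2, 2]
```

The nodes of graph `i` are those at positions `ptr[i]` up to, but not including, `ptr[i + 1]`.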


### Choose model

Run the cell below to choose the model to train

In [16]:
dropvals_model = widgets.Dropdown(options=[('custom', 1),('linear_ncem', 2),('nonlinear_ncem', 3),('graph_embedding', 4)], value=1,description="model")
dropvals_model

Dropdown(description='model', options=(('custom', 1), ('linear_ncem', 2), ('nonlinear_ncem', 3), ('graph_embed…

Run the cell below to instantiate your model

In [21]:
if dropvals_model.value == 1:
    # Custom models are not supported yet.
    raise NotImplementedError("Custom models are not implemented yet.")
elif dropvals_model.value == 2:
    model = LinearNCEM(in_channels=num_features, out_channels=num_genes, model_type='spatial', lr=0.0001, weight_decay=0.000001)
elif dropvals_model.value == 3:
    model = NonLinearNCEM(in_channels=num_features, encoder_hidden_dims=10, latent_dim=30, decoder_hidden_dims=10, out_channels=num_genes, lr=0.0001, weight_decay=0.000001)
elif dropvals_model.value == 4:
    model = GraphEmbedding(num_features=11, latent_dim=30, lr=0.0001, weight_decay=0.000001)

model

LinearNCEM(
  (model_sigma): LinearSpatial(
    (linear): Linear(in_features=76, out_features=36, bias=True)
  )
  (model_mu): LinearSpatial(
    (linear): Linear(in_features=76, out_features=36, bias=True)
  )
  (loss_module): GaussianNLLLoss()
)
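
The printed model pairs a mean head (`model_mu`) with a second head (`model_sigma`) and a `GaussianNLLLoss`. The sketch below shows the per-element Gaussian negative log-likelihood in plain Python, assuming the form used by `torch.nn.GaussianNLLLoss` with its default `full=False` (the constant term is dropped). Whether `model_sigma` outputs a variance or a standard deviation is not shown in the notebook, so the `var` argument here is an assumption.

```python
import math


def gaussian_nll(target, mean, var, eps=1e-6):
    """Per-element Gaussian NLL without the constant 0.5 * log(2 * pi) term."""
    var = max(var, eps)  # clamp the variance for numerical stability
    return 0.5 * (math.log(var) + (target - mean) ** 2 / var)


print(gaussian_nll(1.0, 1.0, 1.0))       # 0.0
print(gaussian_nll(0.0, 1.0, 1.0))       # 0.5
print(gaussian_nll(1.0, 1.0, 0.01) < 0)  # True
```

Because of the `log(var)` term, this loss becomes negative when the predicted variance is small and the prediction is accurate, so a negative loss value is not an error in itself.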

Run the cell below to choose the accelerator

In [18]:
dropvals_gpu= widgets.Dropdown(options=[('gpu', 1),('cpu', 2)], value=1,description="accelerator")
dropvals_gpu

Dropdown(description='accelerator', options=(('gpu', 1), ('cpu', 2)), value=1)

Run the cell below to train your model

In [22]:
# Pick the accelerator from the dropdown: value 1 is 'gpu', anything else is 'cpu'.
accelerator = 'gpu' if dropvals_gpu.value == 1 else 'cpu'
trainer: pl.Trainer = pl.Trainer(accelerator=accelerator, max_epochs=10, log_every_n_steps=10)

trainer.fit(model, datamodule=dm)

GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]

  | Name        | Type            | Params
------------------------------------------------
0 | model_sigma | LinearSpatial   | 2.8 K 
1 | model_mu    | LinearSpatial   | 2.8 K 
2 | loss_module | GaussianNLLLoss | 0     
------------------------------------------------
5.5 K     Trainable params
0         Non-trainable params
5.5 K     Total params
0.022     Total estimated model params size (MB)


Sanity Checking: 0it [00:00, ?it/s]

Training: 0it [00:00, ?it/s]

Validation: 0it [00:00, ?it/s]


`Trainer.fit` stopped: `max_epochs=10` reached.
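
The parameter counts in the model summary can be checked by hand from the layer shapes printed earlier: each head is a single `Linear(in_features=76, out_features=36, bias=True)`. The following sketch reproduces the arithmetic, assuming 32-bit (4-byte) parameters for the size estimate.

```python
def linear_param_count(in_features, out_features, bias=True):
    """Number of trainable parameters in a fully connected layer."""
    weights = in_features * out_features
    biases = out_features if bias else 0
    return weights + biases


per_head = linear_param_count(76, 36)  # one LinearSpatial head
total = 2 * per_head                   # model_mu plus model_sigma
size_mb = total * 4 / 1e6              # 4 bytes per float32 parameter

print(per_head)           # 2772, shown as 2.8 K
print(total)              # 5544, shown as 5.5 K
print(round(size_mb, 3))  # 0.022
```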


Run the cell below to test your model

In [23]:
trainer.test(model, datamodule=dm)

LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]


Testing: 0it [00:00, ?it/s]

────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
       Test metric             DataLoader 0
────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
        test_loss           -0.5218654400910083
      test_r2_score         -0.2679919666638514
────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────


[{'test_r2_score': -0.2679919666638514, 'test_loss': -0.5218654400910083}]
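
A negative `test_r2_score` is possible. The notebook does not show how the metric is computed, so the sketch below assumes the standard coefficient of determination, R² = 1 − SS_res / SS_tot, under which a model that fits worse than always predicting the mean of the targets scores below zero. The toy values are illustrative only.

```python
def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


y_true = [1.0, 2.0, 3.0]
print(r2_score(y_true, [1.0, 2.0, 3.0]))  # 1.0, perfect prediction
print(r2_score(y_true, [2.0, 2.0, 2.0]))  # 0.0, always predicting the mean
print(r2_score(y_true, [3.0, 2.0, 1.0]))  # -3.0, worse than the mean
```

Under this definition, the reported value of about -0.27 would mean that the model, after 10 epochs, does not yet beat a constant mean prediction on the test split.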