### Perform tensor decomposition with GPUs on Google Colab

The tensor factorization step of the Liana + cell2cell workflow runs much faster on a GPU, so we run that step on Google Colab.

In [1]:
%%time
!pip install liana cell2cell decoupler omnipath seaborn==0.11

Collecting liana
  Downloading liana-1.4.0-py3-none-any.whl.metadata (6.4 kB)
Collecting cell2cell
  Downloading cell2cell-0.7.4-py3-none-any.whl.metadata (5.3 kB)
Collecting decoupler
  Downloading decoupler-1.8.0-py3-none-any.whl.metadata (4.6 kB)
Collecting omnipath
  Downloading omnipath-1.0.8-py3-none-any.whl.metadata (6.5 kB)
Collecting seaborn==0.11
  Downloading seaborn-0.11.0-py3-none-any.whl.metadata (2.2 kB)
Collecting anndata>=0.7.4 (from liana)
  Downloading anndata-0.11.1-py3-none-any.whl.metadata (8.2 kB)
Collecting docrep>=0.3.1 (from liana)
  Downloading docrep-0.3.2.tar.gz (33 kB)
  Preparing metadata (setup.py) ... done
Collecting mudata (from liana)
  Downloading mudata-0.3.1-py3-none-any.whl.metadata (8.3 kB)
Collecting pre-commit>=3.0.0 (from liana)
  Downloading pre_commit-4.0.1-py2.py3-none-any.whl.metadata (1.3 kB)
Collecting scanpy>=1.8.0 (from liana)
  Downloading scanpy-1.10.4-py3-none-any.whl.metadata (9.3 kB)
Collecting umap-learn (from cell2ce

In [2]:
import os
import pickle
import warnings
from collections import Counter, defaultdict

import pandas as pd
import scanpy as sc
import plotnine as p9
import torch

import liana as li
import cell2cell as c2c
import decoupler as dc  # needed for pathway enrichment

warnings.filterwarnings('ignore')

%matplotlib inline

In [3]:
import seaborn as sns
import matplotlib.pyplot as plt

import matplotlib
matplotlib.rcParams['pdf.fonttype'] = 42
matplotlib.rcParams['ps.fonttype'] = 42

In [4]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [5]:
!ls '/content/drive/MyDrive/Lab_Research'

bams			    hg38.chrom.sizes		  MultiVI_test
blacklist.bed.gz	    hg38.chrom.subset.sizes	  MultiVI_tutorial
chrombpnet_model	    hg38.fa			  nohup.out
chrombpnet_tutorial	    hg38.fa.fai			  output_auxiliary
ENCODE_MultiVI_analysis     liana_all_D_ND_fetal	  output_negatives.bed
ENCODE_RNA_analysis	    liana_disease_v_non_diseased  Penn_RNA_scvi_integration
ENCSR868FGK_bias_fold_0.h5  liana_final			  splits
HDAC3_project		    MultiVI_all_nuclei


In [6]:
working_dir = "/content/drive/MyDrive/Lab_Research/liana_final/"

# change to the working directory
os.chdir(working_dir)

In [7]:
# load in the tensor from step 1 (performed on methylome3 server)
with open("disease_binary_cardiac_tensor.pkl", 'rb') as file:
    tensor = pickle.load(file)

# load in the tensor metadata
with open("disease_binary_cardiac_tensor_metadata.pkl", 'rb') as file:
    tensor_meta = pickle.load(file)

Check which device is available and select the GPU if there is one; it makes this step dramatically faster (days on a CPU versus hours on a GPU).

In [8]:
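# NOTE: to use the CPU instead of a GPU, set use_gpu = False
use_gpu = True

if use_gpu:
    import torch
    import tensorly as tl

    # fall back to the CPU if Colab did not assign a GPU runtime
    device = "cuda" if torch.cuda.is_available() else "cpu"
    if device == "cuda":
        # run tensorly's factorization routines on PyTorch so they can use the GPU
        tl.set_backend('pytorch')
else:
    device = "cpu"

device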

'cuda'
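The cell above only reports the device string. To also see which GPU Colab assigned (for example a T4) and how much memory it has, a small helper like the one below could be used. It is a sketch: `select_device` is a hypothetical name (not part of Liana or cell2cell), and it falls back to the CPU when PyTorch, tensorly, or a GPU is missing.

```python
def select_device(use_gpu: bool = True) -> str:
    """Return 'cuda' if a usable GPU is present (and use_gpu is True), else 'cpu'.

    Hypothetical helper: mirrors the notebook cell above and additionally
    prints the GPU name and total memory when a GPU is found.
    """
    if not use_gpu:
        return "cpu"
    try:
        import torch
    except ImportError:  # PyTorch not installed: fall back to the CPU
        return "cpu"
    if not torch.cuda.is_available():
        return "cpu"

    # Switch tensorly to the PyTorch backend so factorization can run on the GPU
    try:
        import tensorly as tl
        tl.set_backend("pytorch")
    except ImportError:
        pass

    props = torch.cuda.get_device_properties(0)
    print(f"Using GPU: {props.name} ({props.total_memory / 1024**3:.1f} GiB)")
    return "cuda"


device = select_device(use_gpu=True)
print(device)
```

Returning a plain string keeps the result compatible with the `device=` argument passed to the pipeline below.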

### Run tensor decomposition with GPU

Without a GPU this step can take days. On a Colab T4 GPU, the full tensor decomposition takes about 1.5 to 2 hours (in the run below, the elbow analysis alone took about 73 minutes).
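For intuition about what the factorization step computes: as far as we understand, cell2cell fits a non-negative CP (PARAFAC) decomposition through tensorly by default, giving one factor matrix per tensor dimension with one column per latent factor. The NumPy sketch below fits a non-negative CP model with Lee-Seung-style multiplicative updates on a small synthetic 4-way tensor. This is a simplified, swapped-in algorithm for illustration only, not the solver cell2cell uses, and the helper names (`unfold`, `khatri_rao`, `cp_to_tensor`, `nn_cp_mu`) are hypothetical.

```python
import numpy as np


def unfold(X: np.ndarray, mode: int) -> np.ndarray:
    """Mode-n matricization: rows index `mode`, columns the remaining modes (C order)."""
    return np.moveaxis(X, mode, 0).reshape(X.shape[mode], -1)


def khatri_rao(mats: list) -> np.ndarray:
    """Column-wise Kronecker product, row-ordered to match `unfold`."""
    rank = mats[0].shape[1]
    out = mats[0]
    for M in mats[1:]:
        out = (out[:, None, :] * M[None, :, :]).reshape(-1, rank)
    return out


def cp_to_tensor(factors: list) -> np.ndarray:
    """Rebuild the full tensor from CP factor matrices."""
    shape = tuple(F.shape[0] for F in factors)
    return (factors[0] @ khatri_rao(factors[1:]).T).reshape(shape)


def nn_cp_mu(X: np.ndarray, rank: int, n_iter: int = 500, seed: int = 0, eps: float = 1e-12):
    """Non-negative CP via multiplicative updates; returns (factors, relative errors)."""
    rng = np.random.default_rng(seed)
    factors = [rng.random((dim, rank)) for dim in X.shape]
    norm_X = np.linalg.norm(X)
    errors = []
    for _ in range(n_iter):
        for n in range(X.ndim):
            others = [factors[m] for m in range(X.ndim) if m != n]
            # Gram of the Khatri-Rao product = Hadamard product of the small Grams
            gram = np.ones((rank, rank))
            for F in others:
                gram *= F.T @ F
            numer = unfold(X, n) @ khatri_rao(others)
            denom = factors[n] @ gram + eps
            factors[n] *= numer / denom  # multiplicative update keeps entries >= 0
        errors.append(np.linalg.norm(X - cp_to_tensor(factors)) / norm_X)
    return factors, errors


# Synthetic non-negative rank-3 tensor shaped like context x LR pair x sender x receiver
rng = np.random.default_rng(1)
true_factors = [rng.random((dim, 3)) for dim in (5, 8, 4, 4)]
X = cp_to_tensor(true_factors)
factors, errors = nn_cp_mu(X, rank=3)
print(f"relative error: first iteration {errors[0]:.3f}, last iteration {errors[-1]:.4f}")
```

Every mode update needs an unfolding and a Khatri-Rao product, and the elbow analysis repeats the whole fit for each candidate rank, which is why these dense matrix products benefit so much from a GPU.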

In [9]:
tensor2 = c2c.analysis.run_tensor_cell2cell_pipeline(tensor,
                                                    tensor_meta,
                                                    copy_tensor=True, # Whether to output a new tensor or modify the original
                                                    tf_optimization='regular', # How robust the analysis should be ('regular' or 'robust')
                                                    random_state=0, # Random seed for reproducibility
                                                    device=device, # Device to use: 'cuda' for a GPU with the PyTorch backend, 'cpu' otherwise
                                                    elbow_metric='error', # Metric to use in the elbow analysis
                                                    smooth_elbow=False, # Whether to smooth the elbow-analysis metric
                                                    upper_rank=15, # Maximum number of factors to try in the elbow analysis
                                                    tf_init='random', # Initialization method of the tensor factorization
                                                    tf_svd='numpy_svd', # Type of SVD to use if the initialization is 'svd'
                                                    cmaps=None, # Color palettes used to color each tensor dimension; must be a list of palettes
                                                    sample_col='Element', # Column containing the elements in the tensor metadata
                                                    group_col='Category', # Column containing the major groups in the tensor metadata
                                                    output_fig=False, # Whether to output the figures. If False, figures won't be saved as files even if a folder is passed in output_folder
                                                    )

Running Elbow Analysis


100%|██████████| 15/15 [1:13:00<00:00, 292.02s/it]


The rank at the elbow is: 7
Running Tensor Factorization
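The pipeline tried ranks 1 to 15 and reported an elbow at rank 7. To show the idea behind picking an elbow, the sketch below implements one common heuristic: rescale ranks and errors to [0, 1] and choose the point farthest from the straight line joining the first and last points (a Kneedle-style rule). This is an illustrative assumption, not necessarily cell2cell's exact method, and the error values in the usage example are made up.

```python
from typing import Sequence


def elbow_rank(ranks: Sequence[int], errors: Sequence[float]) -> int:
    """Pick the rank whose normalized (rank, error) point lies farthest from the
    chord joining the first and last points. Assumes `ranks` is sorted ascending."""
    if len(ranks) != len(errors) or len(ranks) < 3:
        raise ValueError("need at least three (rank, error) pairs of equal length")
    r_min, r_max = min(ranks), max(ranks)
    e_min, e_max = min(errors), max(errors)
    if r_max == r_min or e_max == e_min:
        return ranks[0]  # flat curve: no elbow to find
    # Rescale both axes to [0, 1] so their units do not bias the distance
    xs = [(r - r_min) / (r_max - r_min) for r in ranks]
    ys = [(e - e_min) / (e_max - e_min) for e in errors]
    x0, y0, x1, y1 = xs[0], ys[0], xs[-1], ys[-1]
    # Perpendicular distance of each point to the line through (x0, y0) and (x1, y1)
    dx, dy = x1 - x0, y1 - y0
    norm = (dx ** 2 + dy ** 2) ** 0.5
    distances = [abs(dy * x - dx * y + x1 * y0 - y1 * x0) / norm for x, y in zip(xs, ys)]
    best = max(range(len(ranks)), key=distances.__getitem__)
    return ranks[best]


# Made-up reconstruction errors for ranks 1..8 (illustration only)
ranks = list(range(1, 9))
errors = [1.0, 0.6, 0.35, 0.2, 0.18, 0.17, 0.165, 0.16]
print(elbow_rank(ranks, errors))  # → 4
```

Normalizing both axes first matters: without it, the distance would be dominated by whichever axis has the larger numeric range.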


In [10]:
c2c.io.export_variable_with_pickle(tensor2, "post_factorization_cardiac_tensor.pkl")

post_factorization_cardiac_tensor.pkl  was correctly saved.
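`c2c.io.export_variable_with_pickle` writes the factorized tensor to disk so that script 03 can reload it. The sketch below shows the same save/reload round trip with plain `pickle`, in the style of the loading cell earlier in this notebook; `save_pickle` and `load_pickle` are hypothetical helper names, and we assume (without having checked cell2cell's source) that the cell2cell function does something similar.

```python
import pickle
from pathlib import Path
from typing import Any


def save_pickle(obj: Any, path) -> None:
    """Serialize `obj` to `path` in binary mode, creating parent folders if needed."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    with open(path, "wb") as fh:
        pickle.dump(obj, fh, protocol=pickle.HIGHEST_PROTOCOL)


def load_pickle(path) -> Any:
    """Load an object previously written with pickle."""
    with open(path, "rb") as fh:
        return pickle.load(fh)


# Usage in script 03 (hypothetical): tensor2 = load_pickle("post_factorization_cardiac_tensor.pkl")
```

One caveat worth checking: if the saved object still holds PyTorch tensors on the GPU, unpickling it on a machine without CUDA (such as a CPU-only server) may fail, so it may help to confirm where the factors live before exporting.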


### Proceed to script 03 to visualize the results