# Benchmarking of integration methods
This notebook gives a short overview of how to use the scIB module and runs a brief analysis of Tabula Muris thymus and bone marrow data.

In [1]:
import scanpy as sc
import scIB
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np

In [2]:
import warnings
warnings.filterwarnings('ignore')

In [3]:
import copy

In [4]:
%matplotlib inline

In [5]:
file = '../merged_adata.h5ad'
batch = 'method'
hvg = None

## Read the data

In [6]:
adata = sc.read(file)

In [7]:
methods = {}
adatas = {}

## Run the integration methods
The functions for the integration methods are in `scIB.integration`. Generally, the methods expect an AnnData object and the batch key as input. Runtime and memory usage are measured with `scIB.metrics.measureTM`, which returns the memory usage in MB, the runtime in seconds, and the output of the tested function. In the cells below, the return value of each method is read from `methods[name][2][0]`.
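To make the shape of that return value concrete, here is a minimal sketch of a helper that times a call and records its peak memory. It is not scIB's implementation: `measure_time_and_memory` and `toy_integration` are hypothetical names, and the sketch uses the standard library's `tracemalloc`, which only sees allocations made through Python's allocator and so will under-report memory used by native code. It also returns a flat triple, whereas the notebook indexes the real output one level deeper (`[2][0]`).

```python
import time
import tracemalloc


def measure_time_and_memory(func, *args, **kwargs):
    """Hypothetical stand-in for a measureTM-style helper (not scIB's code).

    Returns (peak_memory_mb, runtime_s, output).
    """
    tracemalloc.start()
    start = time.perf_counter()
    try:
        output = func(*args, **kwargs)
        runtime_s = time.perf_counter() - start
        # get_traced_memory() returns (current, peak) in bytes.
        _, peak_bytes = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return peak_bytes / 1e6, runtime_s, output


def toy_integration(values, batch_key):
    # Placeholder workload standing in for an integration function.
    return sorted(values), batch_key


memory_mb, runtime_s, output = measure_time_and_memory(
    toy_integration, [3, 1, 2], "method"
)
print(f"memory usage: {memory_mb:.3f} MB")
print(f"runtime: {runtime_s:.6f} s")
```

Wrapping the call in `try`/`finally` ensures tracing is switched off even if the measured function raises.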

In [8]:
methods['scanorama'] = scIB.metrics.measureTM(scIB.integration.runScanorama, adata, batch)

Found 12914 genes among all datasets
[[0.         0.53899291]
 [0.         0.        ]]
Processing datasets (0, 1)
memory usage:6695.0 MB
runtime: 65.0 s


In [9]:
methods['scanorama'][2][0][1].obsm['X_pca'] = methods['scanorama'][2][0][0]
adatas['scanorama'] = methods['scanorama'][2][0][1]
sc.pp.neighbors(adatas['scanorama'])

In [10]:
methods['bbknn'] = scIB.metrics.measureTM(scIB.integration.runBBKNN, adata, batch)

memory usage:731.0 MB
runtime: 5.0 s


In [11]:
#scgen = scIB.metrics.measureTM(scIB.integration.runScGen, adata, batch = batch)

In [None]:
methods['mnn'] = scIB.metrics.measureTM(scIB.integration.runMNN, adata, batch)

Performing cosine normalization...
Starting MNN correct iteration. Reference batch: 0
Step 1 of 1: processing batch 1
  Looking for MNNs...


In [None]:
adatas['mnn'] = methods['mnn'][2][0]
sc.tl.pca(adatas['mnn'], svd_solver='arpack')
sc.pp.neighbors(adatas['mnn'])

In [None]:
methods['harmony'] = scIB.metrics.measureTM(scIB.integration.runHarmony, adata, batch)

In [None]:
adatas['harmony'] = copy.deepcopy(adata)
adatas['harmony'].obsm['X_pca'] = methods['harmony'][2][0]
sc.pp.neighbors(adatas['harmony'])

In [None]:
methods['seurat'] = scIB.metrics.measureTM(scIB.integration.runSeurat, adata, batch)

In [None]:
adatas['seurat'] = methods['seurat'][2][0]
sc.tl.pca(adatas['seurat'], svd_solver='arpack')
sc.pp.neighbors(adatas['seurat'])

## Runtime analysis
Here, we compare the runtime and memory usage of all tested methods.

In [None]:
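# Collect memory usage (MB) and runtime (s) per method.
# An explicit dtype avoids the pandas warning for empty Series.
mem = pd.Series(dtype=float)
runtime = pd.Series(dtype=float)
for name, result in methods.items():
    mem[name] = result[0]
    runtime[name] = result[1]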

In [None]:
mem.plot.bar()
plt.show()
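The comparison can also be read off without a plot. The sketch below ranks methods by runtime using only the two measurements printed in the cell outputs above (Scanorama and BBKNN). The `results` dictionary is an illustrative stand-in for `methods`, with the method output replaced by `None`.

```python
# Measurements copied from the cell outputs above; the third tuple slot
# (the method output) is replaced by None for this illustration.
results = {
    "scanorama": (6695.0, 65.0, None),
    "bbknn": (731.0, 5.0, None),
}

memory_mb = {name: r[0] for name, r in results.items()}
runtime_s = {name: r[1] for name, r in results.items()}

# Rank methods from fastest to slowest.
ranking = sorted(runtime_s, key=runtime_s.get)

for name in ranking:
    print(f"{name:<10} {runtime_s[name]:>6.1f} s {memory_mb[name]:>8.1f} MB")
```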

## Quantifying the quality of integration

### Silhouette score

In [None]:
# Mean silhouette score per method, using the batch key defined at the top.
sil = pd.Series(dtype=float)
for i in adatas.keys():
    sil[i] = np.mean(scIB.metrics.silhouette_score(adatas[i], batch, 'cell_ontology_class')[0])

In [None]:
sil.plot.bar()

In [None]:
scIB.metrics.plot_silhouette_score(adatas, verbose=False)
plt.show()

### NMI

In [None]:
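# Cluster each integrated dataset with Louvain, then compare the clusters
# to the annotated cell types via normalized mutual information (NMI).
nmi = pd.Series(dtype=float)
for i in adatas.keys():
    sc.tl.louvain(adatas[i], key_added='louvain_post')
    nmi[i] = scIB.metrics.nmi(adatas[i], 'cell_ontology_class', 'louvain_post')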

In [None]:
nmi
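For readers unfamiliar with the metric, the following sketch computes normalized mutual information between two label lists in plain Python. It assumes normalization by the arithmetic mean of the two entropies. Whether `scIB.metrics.nmi` uses this normalization is not stated in this notebook, so the function names and the choice of normalization here are illustrative only.

```python
from collections import Counter
from math import log


def entropy(labels):
    # Shannon entropy (natural log) of a label list.
    n = len(labels)
    return -sum((c / n) * log(c / n) for c in Counter(labels).values())


def mutual_information(a, b):
    # Sum over observed label pairs of p(x, y) * log(p(x, y) / (p(x) * p(y))).
    n = len(a)
    count_a, count_b = Counter(a), Counter(b)
    joint = Counter(zip(a, b))
    return sum(
        (c / n) * log(c * n / (count_a[x] * count_b[y]))
        for (x, y), c in joint.items()
    )


def nmi_arithmetic(a, b):
    # Assumption: normalize by the arithmetic mean of the two entropies.
    h_a, h_b = entropy(a), entropy(b)
    if h_a == 0 and h_b == 0:
        return 1.0  # both labelings are constant, so they trivially agree
    return 2 * mutual_information(a, b) / (h_a + h_b)


cell_types = ["T cell", "T cell", "B cell", "B cell"]
matching_clusters = ["0", "0", "1", "1"]
unrelated_clusters = ["0", "1", "0", "1"]

score_matching = nmi_arithmetic(cell_types, matching_clusters)
score_unrelated = nmi_arithmetic(cell_types, unrelated_clusters)
```

A clustering that reproduces the cell type labels exactly scores 1, while one that is statistically independent of them scores 0. Higher values in the `nmi` series therefore indicate that cell type structure was better preserved after integration.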