Fig. 5: Benchmark STEMNET
---

In this notebook, we run STEMNET on the pancreas data and extract its fate probabilities towards the Alpha, Beta, Epsilon and Delta populations.

# Preliminaries

## Import packages

In [1]:
# import standard packages
import os
import sys

import pandas as pd

# import single-cell packages
import cellrank as cr
import scanpy as sc
import scvelo as scv
import anndata2ri

anndata2ri.activate()
%load_ext rpy2.ipython

In [4]:
%%R
library(STEMNET)

## Print package versions for reproducibility

In [3]:
cr.logging.print_versions()

cellrank==1.0.0-rc.12+gb3e00a8 scanpy==1.6.0 anndata==0.7.4 numpy==1.19.2 numba==0.51.2 scipy==1.5.2 pandas==1.1.3 scikit-learn==0.23.2 statsmodels==0.12.0 python-igraph==0.8.3 scvelo==0.2.3.dev26+g8351f46 pygam==0.8.0 matplotlib==3.3.2 seaborn==0.11.0


In [5]:
%%R
packageVersion("STEMNET")

[1] ‘0.1’


## Set up paths

In [9]:
sys.path.insert(0, "../../../../")  # this depends on the notebook depth and must be adapted per notebook

from paths import DATA_DIR

## Load the data

In [3]:
adata = cr.datasets.pancreas(DATA_DIR / "pancreas" / "pancreas.h5ad")
del adata.uns['neighbors']  # remove this entry, which crashes the anndata2ri conversion
adata

AnnData object with n_obs × n_vars = 2531 × 27998
    obs: 'day', 'proliferation', 'G2M_score', 'S_score', 'phase', 'clusters_coarse', 'clusters', 'clusters_fine', 'louvain_Alpha', 'louvain_Beta'
    var: 'highly_variable_genes'
    uns: 'clusters_colors', 'clusters_fine_colors', 'day_colors', 'louvain_Alpha_colors', 'louvain_Beta_colors', 'pca'
    obsm: 'X_pca', 'X_umap'
    layers: 'spliced', 'unspliced'
    obsp: 'connectivities', 'distances'

### Preprocess the data

In [4]:
scv.pp.filter_and_normalize(adata, min_shared_counts=10, n_top_genes=3000)

Filtered out 20788 genes that are detected 10 counts (shared).
Normalized count data: X, spliced, unspliced.
Exctracted 3000 highly variable genes.
Logarithmized X.


### Extract cluster information for STEMNET

In [5]:
clusters = ['Alpha', 'Beta', 'Epsilon', 'Delta']
# one boolean column per terminal cluster: True where a cell belongs to that cluster
cluster_pop = pd.DataFrame({c: adata.obs['clusters_fine'].isin([c]) for c in clusters})
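To make the layout of `cluster_pop` concrete, here is a small dependency-free sketch on toy data. The annotation list and the `collapse` helper are hypothetical and only illustrate the idea. The helper mimics, under our reading of the call used later in the notebook, what `booleanTable2Character(..., other_value=NA)` does: one label per cell, with cells outside the terminal clusters left unlabelled (`None` here, `NA` in R).

```python
clusters = ["Alpha", "Beta", "Epsilon", "Delta"]

# Hypothetical per-cell annotations; "Progenitor" stands in for any
# cluster that is not one of the terminal populations.
toy_annotation = ["Alpha", "Progenitor", "Delta", "Beta", "Progenitor", "Epsilon"]

# One boolean column per terminal cluster, mirroring the layout of cluster_pop.
toy_table = {c: [label == c for label in toy_annotation] for c in clusters}


def collapse(table, other_value=None):
    """Collapse a boolean membership table into one label per cell."""
    n_cells = len(next(iter(table.values())))
    labels = []
    for i in range(n_cells):
        hits = [name for name, column in table.items() if column[i]]
        if len(hits) > 1:
            raise ValueError(f"cell {i} belongs to more than one population: {hits}")
        labels.append(hits[0] if hits else other_value)
    return labels


toy_pop = collapse(toy_table)
print(toy_pop)
```

Cells labelled with a population name play the role of mature cells, and the unlabelled ones are the cells whose fate STEMNET is asked to predict.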

# Analysis

## Convert the data

In [7]:
%%R -i cluster_pop -i adata
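# collapse the boolean table into one label per cell; cells outside the terminal clusters become NA
pop <- booleanTable2Character(cluster_pop, other_value=NA)
# transpose the genes x cells assay to cells x genes
expression <- t(as.matrix(adata@assays@data[['X']]))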

## Run STEMNET

In [8]:
%%R
result <- runSTEMNET(expression, pop)

R[write to console]: At an optimal value of lambda, the misclassification rate for mature populations is 2.19%.



## Print the results

In [9]:
%%R
print(result)

Object of class stemnet with 1755 stem cells and 776 mature cells assigned to one of 4 target populations of the following sizes:

  Alpha    Beta   Delta Epsilon 
    259     308      70     139 
At an optimal value of lambda, the misclassification rate for mature populations is  2.19 %.
Posterior probability matrix (truncated):
          Alpha       Beta      Delta    Epsilon
[1,] 0.30170666 0.32956571 0.17114554 0.19758209
[2,] 0.43379233 0.23734492 0.09273911 0.23612364
[3,] 0.18556265 0.37417405 0.06436661 0.37589668
[4,] 0.88067863 0.03276495 0.02337865 0.06317778
[5,] 0.04970928 0.15549984 0.74896418 0.04582670
[6,] 0.11588795 0.78105586 0.04637508 0.05668112


## Extract the probabilities

In [10]:
%%R -o probs
probs <- result@posteriors
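Before using the posteriors downstream, a quick sanity check can be worthwhile. The sketch below is not part of the original analysis. It copies the six rows from the truncated printout above and checks that each row sums to one (up to the rounding of the printed digits), then reads off the most likely fate per cell. The column order is taken from the header of that printout.

```python
import math

# Column order as shown in the header of the printed posterior matrix.
columns = ["Alpha", "Beta", "Delta", "Epsilon"]

# The six rows shown in the truncated printout.
rows = [
    [0.30170666, 0.32956571, 0.17114554, 0.19758209],
    [0.43379233, 0.23734492, 0.09273911, 0.23612364],
    [0.18556265, 0.37417405, 0.06436661, 0.37589668],
    [0.88067863, 0.03276495, 0.02337865, 0.06317778],
    [0.04970928, 0.15549984, 0.74896418, 0.04582670],
    [0.11588795, 0.78105586, 0.04637508, 0.05668112],
]

# Each row is a probability distribution over the four fates,
# so it should sum to one within printing precision.
row_sums = [sum(r) for r in rows]
assert all(math.isclose(s, 1.0, abs_tol=1e-6) for s in row_sums)

# Most likely fate per cell: the column holding the largest posterior.
top_fate = [columns[r.index(max(r))] for r in rows]
print(top_fate)
```

The same two checks could be run on the full `probs` array once it is available in Python.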

Wrap the posterior probabilities in a CellRank `Lineage` object and store it in `adata.obsm`.

In [11]:
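# the printed posterior matrix lists its columns as Alpha, Beta, Delta, Epsilon,
# i.e. alphabetically and not in the order of `clusters`; name the columns accordingly
slin = cr.tl.Lineage(probs, names=sorted(clusters))
adata.obsm['stemnet_final_states'] = slin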
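Lineage names are paired with columns by position, so the name list and the column order of the posterior matrix must agree. The printed result lists the columns as Alpha, Beta, Delta, Epsilon, whereas `clusters` was defined as Alpha, Beta, Epsilon, Delta. The sketch below shows how columns could be permuted into a chosen name order. `reorder_columns` is a hypothetical helper, not a CellRank or STEMNET function. For a NumPy array, the same index list could be applied as `probs[:, idx]`.

```python
def reorder_columns(matrix, current_names, wanted_names):
    """Return a copy of `matrix` whose columns follow `wanted_names`."""
    if sorted(current_names) != sorted(wanted_names):
        raise ValueError("both name lists must contain the same populations")
    # Position of each wanted column in the current layout.
    idx = [current_names.index(name) for name in wanted_names]
    return [[row[j] for j in idx] for row in matrix]


stemnet_order = ["Alpha", "Beta", "Delta", "Epsilon"]  # header of the printed matrix
clusters = ["Alpha", "Beta", "Epsilon", "Delta"]       # order defined earlier in the notebook

# Fifth row of the printed matrix, where the Delta posterior dominates.
row = [0.04970928, 0.15549984, 0.74896418, 0.04582670]

aligned = reorder_columns([row], stemnet_order, clusters)[0]

# After alignment, position-based naming with `clusters` picks the right fate.
fate = clusters[aligned.index(max(aligned))]
print(fate)
```

Without the permutation, pairing this row with `clusters` by position would attribute the dominant posterior to Epsilon instead of Delta.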

## Save the results

In [15]:
sc.write(DATA_DIR / "benchmarking" / "stemnet" / "adata.h5ad", adata)