Example Notebook for Interactive Plotting in Scanpy
---

In this notebook, we demonstrate some interactive plotting functions that integrate with the analysis framework [scanpy](https://scanpy.readthedocs.io/en/latest/). Please note: the interactive plots will **not load** when you view this notebook online on GitHub. To see the plots, clone the repo to your machine and run the notebook there.

Structure of this notebook
---
This notebook contains a typical data analysis workflow for single-cell RNA-seq data. It contains the following sections:
* Getting Started: import relevant packages, import some exemplary data, annotate the data, filter out low-quality cells and normalize the count depth. This section is mostly technical; however, it demonstrates the use of the plotting function `interactive_hist` when searching for filtering thresholds.
* Interactive Plotting: this section demonstrates how to use the remaining interactive plotting functions.

# Getting Started

## Import Packages

In [1]:
import warnings
warnings.filterwarnings(action='ignore')

import numpy as np
import scanpy as sc
import re

sc.logging.print_versions()
sc.settings.verbosity = 0



scanpy==1.4.4.post1 anndata==0.6.22.post1 umap==0.3.8 numpy==1.17.2 scipy==1.3.1 pandas==0.25.1 scikit-learn==0.21.3 statsmodels==0.10.1 python-igraph==0.7.1 louvain==0.6.1


In [2]:
import interactive_plotting as ipl

import holoviews as hv
hv.extension('bokeh')

from bokeh.io import output_notebook
output_notebook()

In [3]:
np.random.seed(42)

## Import Example Data

In [4]:
adata = sc.datasets.paul15()
adata.var_names_make_unique()
adata

... storing 'paul15_clusters' as categorical
Trying to set attribute `.uns` of view, making a copy.


AnnData object with n_obs × n_vars = 2730 × 3451 
    obs: 'paul15_clusters'
    uns: 'iroot'

## Additional Annotations

Annotate mitochondrial genes.

In [5]:
regex = re.compile('^(mt).*', re.IGNORECASE)
# Note: this pattern matches every gene name starting with 'mt', case-insensitively.
mito_genes = [name for name in adata.var_names if regex.search(name)]
adata.var['mito'] = False
adata.var.loc[mito_genes, 'mito'] = True
print('Found {} mito genes and annotated.'.format(len(mito_genes)))

sc.pp.calculate_qc_metrics(adata, qc_vars=['mito'], inplace=True)

Found 15 mito genes and annotated.
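
The pattern used above matches every gene name that starts with `mt`, regardless of case, so nuclear genes whose symbols happen to start with these letters are flagged as well. The following is a minimal sketch of the difference, using a hypothetical list of mouse gene symbols (not taken from the dataset) and assuming the `mt-` prefix convention for mitochondrial genes. Whether the example data follows this convention is not checked here.

```python
import re

# Hypothetical gene symbols, for illustration only.
gene_names = ['mt-Co1', 'mt-Nd1', 'Mtor', 'Mta1', 'Gata1']

loose = re.compile('^(mt).*', re.IGNORECASE)  # the pattern used in this notebook
strict = re.compile('^mt-', re.IGNORECASE)    # assumes the 'mt-' prefix convention

loose_hits = [g for g in gene_names if loose.search(g)]
strict_hits = [g for g in gene_names if strict.search(g)]

print(loose_hits)   # ['mt-Co1', 'mt-Nd1', 'Mtor', 'Mta1']
print(strict_hits)  # ['mt-Co1', 'mt-Nd1']
```

If the stricter pattern finds no genes in a dataset, mitochondrial genes are probably absent or named differently, and the percentage of mitochondrial counts should be interpreted with care.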


Here we create artificial batch, group and plate labels, because we would like to use such labels later.

In [6]:
adata.obs['batch'] = np.random.choice(['batch_1', 'batch_2'], adata.n_obs)

adata.obs['group'] = np.random.choice(['group_1', 'group_2'], adata.n_obs)
adata.obs['group'] = adata.obs['group'].astype('category')

adata.obs['plate'] = np.random.choice(['plate_1', 'plate_2', 'plate_3'], adata.n_obs)
adata.obs['plate'] = adata.obs['plate'].astype('category')

**Plot interactive histograms.** These can be used for finding filtering thresholds.

## Filtering

Based on the histograms above, filter out low-quality cells.

In [7]:
sc.pp.filter_cells(adata, min_genes=200)
adata = adata[adata.obs['total_counts'] < 8000].copy()
adata = adata[adata.obs['pct_counts_mito'] < 2].copy()
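
The three filtering steps above combine into a single condition per cell. As a minimal sketch in plain Python, with hypothetical QC values and illustrative key names (the helper `passes_qc` is not part of scanpy):

```python
# Hypothetical per-cell QC values; key names are illustrative.
cells = [
    {'id': 'c1', 'n_genes': 150, 'total_counts': 900, 'pct_counts_mito': 0.5},
    {'id': 'c2', 'n_genes': 800, 'total_counts': 4000, 'pct_counts_mito': 1.2},
    {'id': 'c3', 'n_genes': 2500, 'total_counts': 9500, 'pct_counts_mito': 0.8},
    {'id': 'c4', 'n_genes': 600, 'total_counts': 3000, 'pct_counts_mito': 3.1},
]

def passes_qc(cell, min_genes=200, max_counts=8000, max_pct_mito=2):
    """Return True if the cell satisfies all three thresholds."""
    return (cell['n_genes'] >= min_genes
            and cell['total_counts'] < max_counts
            and cell['pct_counts_mito'] < max_pct_mito)

kept = [c['id'] for c in cells if passes_qc(c)]
print(kept)  # ['c2']
```

Here `c1` fails the minimum gene count, `c3` exceeds the count limit and `c4` exceeds the mitochondrial percentage limit.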

## Normalization

We use a standard normalization workflow, the `zheng17` recipe that ships with scanpy.

In [8]:
adata.raw = adata.copy()
sc.pp.recipe_zheng17(adata, plot=False)
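
The recipe bundles several preprocessing steps, including count-depth normalization and a log transformation. The sketch below illustrates only these two ideas on hypothetical counts in plain Python. It is not scanpy's implementation, and the choice of the median total as the target depth is an assumption made for this example.

```python
import math
from statistics import median

# Hypothetical raw counts: one row per cell, one column per gene.
counts = [
    [10, 0, 30],   # total 40
    [5, 5, 10],    # total 20
    [0, 60, 20],   # total 80
]

totals = [sum(row) for row in counts]
target = median(totals)  # target depth shared by all cells after scaling

# Scale each cell to the target depth, then apply log(1 + x).
normalized = [[value * target / total for value in row]
              for row, total in zip(counts, totals)]
logged = [[math.log1p(value) for value in row] for row in normalized]

print([sum(row) for row in normalized])  # [40.0, 40.0, 40.0]
```

After scaling, every cell has the same total count, so differences between cells no longer reflect sequencing depth.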

## Embedding and Clustering

In [9]:
sc.pp.neighbors(adata, n_neighbors=30, n_pcs=7, random_state=42)
sc.tl.louvain(adata, resolution=0.45, random_state=42)
sc.tl.umap(adata, random_state=42)

In [10]:
sc.tl.dpt(adata)

# Interactive Plotting

## General scatterplot

This is the most general scatterplot: the `x` and `y` options can be indices, gene names or components of a basis (e.g. `'pca:1'` or `'umap:0'`). Here's an illustration:

In [11]:
ipl.ex.scatter(adata, 'pca:1', 'umap:0', color='dpt_pseudotime', subsample='decimate', keep_frac=0.8,
               perc=[0, 95])
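
The `perc=[0, 95]` argument is not explained in this notebook. Assuming it restricts the color range to the given percentiles of the color values (an assumption, not confirmed here), the effect can be sketched in plain Python. The helpers `nearest_rank_percentile` and `clip_to_percentiles` are hypothetical and use a simple nearest-rank definition of a percentile.

```python
import math

def nearest_rank_percentile(values, q):
    """Nearest-rank percentile of `values` for q in [0, 100]."""
    ordered = sorted(values)
    if q <= 0:
        return ordered[0]
    rank = math.ceil(q / 100 * len(ordered))
    return ordered[rank - 1]

def clip_to_percentiles(values, lower_q, upper_q):
    """Clip every value to the [lower_q, upper_q] percentile range."""
    low = nearest_rank_percentile(values, lower_q)
    high = nearest_rank_percentile(values, upper_q)
    return [min(max(v, low), high) for v in values]

# Hypothetical color values: 20 regular values plus one extreme outlier.
color_values = [i / 20 for i in range(20)] + [50.0]
clipped = clip_to_percentiles(color_values, 0, 95)

print(max(color_values), max(clipped))  # 50.0 0.95
```

Without clipping, the single outlier would stretch the colorbar so far that the remaining values become hard to distinguish.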

Gene vs. gene. Note the jittering applied:

In [12]:
ipl.ex.scatter(adata, adata.var_names[0], adata.var_names[1], jitter=(0.1, 0.1),
               color='louvain', subsample='datashade', keep_frac=0.8, use_raw=True)
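
Jittering adds a small amount of random noise to each coordinate so that cells with identical values no longer sit exactly on top of each other. A minimal sketch, assuming uniform noise (the distribution actually used by the plotting function is not shown in this notebook) and a hypothetical helper `jitter`:

```python
import random

def jitter(values, width, seed=0):
    """Add uniform noise in [-width, width] to each value."""
    rng = random.Random(seed)
    return [v + rng.uniform(-width, width) for v in values]

# Hypothetical expression values for one gene: many cells share the same value.
expression = [0, 0, 0, 1, 1, 2]
jittered = jitter(expression, width=0.1)

# Every jittered value stays within `width` of its original value.
print(all(abs(j - e) <= 0.1 for j, e in zip(jittered, expression)))  # True
```

The noise width should stay small relative to the spacing of the data, otherwise neighboring values start to overlap.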

It can even plot along an ordering such as `DPT`, selected via `order_key`! Note that `y` doesn't have to be a gene in this case.

In [13]:
ipl.ex.scatter(adata, None, adata.var_names[121], order_key='dpt_pseudotime',
               hover_keys=['louvain', 'dpt_pseudotime'],
               color='paul15_clusters', subsample='decimate', keep_frac=0.8, use_raw=False)

## Heatmap

By default, it's just an innocent-looking heatmap, nothing special. I would perhaps add a dropdown to select `.raw` or processed data, if available.

In [14]:
ipl.ex.heatmap(adata, adata.var_names[:100], use_raw=True)

However, if `show_highlight=True`, a second heatmap appears. It highlights whatever is selected in the first one with box-select. Note that the colorbar ranges also change:

In [15]:
ipl.ex.heatmap(adata, adata.var_names[:100], agg_fns=['mean', 'var', 'min', 'max'],
               show_highlight=True)
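
The `agg_fns` argument names aggregation functions. Assuming each heatmap tile summarizes the expression of one gene over the cells of one group (an assumption about the plot, not confirmed here), the aggregation can be sketched in plain Python with hypothetical data. Population variance is used for `'var'`; the plotting function may use a different definition.

```python
from collections import defaultdict
from statistics import mean, pvariance

# Hypothetical expression of one gene, with a cluster label per cell.
expression = [1.0, 3.0, 2.0, 6.0, 10.0]
clusters = ['A', 'A', 'A', 'B', 'B']

# Collect the expression values of each group.
by_group = defaultdict(list)
for value, label in zip(expression, clusters):
    by_group[label].append(value)

# Apply every aggregation function to every group.
agg_fns = {'mean': mean, 'var': pvariance, 'min': min, 'max': max}
summary = {label: {name: fn(values) for name, fn in agg_fns.items()}
           for label, values in by_group.items()}

print(summary['B'])  # {'mean': 8.0, 'var': 4.0, 'min': 6.0, 'max': 10.0}
```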

When clicking the second heatmap with `show_scatter=True`, a scatterplot is created based on the clicked heatmap cell. The type of scatterplot depends on the argument `compare={'basis', 'order', 'genes'}`:

* if `basis`, plots the expression of the clicked gene on the first 2 components of the selected basis
* if `order`, plots the gene expression over the specified ordering (e.g. `DPT`)
* if `genes`, plots the clicked gene vs. the one selected from the drop-down menu

All these options can be used in a group-wise or group-agnostic fashion. The former means that only cells belonging to the group of the clicked heatmap cell are visualized; the latter means that all cells in the second heatmap's groups are plotted. That is, with `groupwise=False`, you should see all groups in the scatterplot if the second heatmap contains all groups, and only a subset if it contains a subset. This is mainly of interest when `compare='basis'`.
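
The difference between the two modes can be sketched as a simple selection rule. The helper `cells_to_plot` below is hypothetical and only mirrors the description above; it is not the library's code.

```python
def cells_to_plot(cell_groups, clicked_group, highlighted_groups, groupwise):
    """Return the indices of the cells that would be shown in the scatterplot."""
    if groupwise:
        wanted = {clicked_group}          # only the clicked heatmap cell's group
    else:
        wanted = set(highlighted_groups)  # every group in the second heatmap
    return [i for i, group in enumerate(cell_groups) if group in wanted]

# Hypothetical group label per cell and groups present in the second heatmap.
cell_groups = ['A', 'B', 'C', 'A', 'B']
highlighted = ['A', 'B']

print(cells_to_plot(cell_groups, 'A', highlighted, groupwise=True))   # [0, 3]
print(cells_to_plot(cell_groups, 'A', highlighted, groupwise=False))  # [0, 1, 3, 4]
```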

In [16]:
ipl.ex.heatmap(adata, adata.var_names[:50], use_raw=False, agg_fns=['mean', 'var', 'min', 'max'],
               compare='order', order_keys=['dpt_pseudotime'],
               show_highlight=True, show_scatter=True,
               hover_keys=['louvain', 'dpt_pseudotime'])

In [17]:
ipl.ex.heatmap(adata, adata.var_names[:50], use_raw=False, agg_fns=['mean', 'var', 'min', 'max'],
               compare='basis', order_keys=['dpt_pseudotime'],
               show_highlight=True, show_scatter=True,
               hover_keys=['louvain', 'dpt_pseudotime'])

With `groupwise=True`, you should see only the cells belonging to the clicked heatmap cell's group; otherwise, you should see all the groups selected in the second heatmap.

In [18]:
ipl.ex.heatmap(adata, adata.var_names[:100], group_key='paul15_clusters')