Example Notebook for Interactive Plotting in Scanpy
---

In this notebook, we demonstrate several interactive plotting functions that integrate with the analysis framework [scanpy](https://scanpy.readthedocs.io/en/latest/). Please note: the interactive plots will **not load** when you view this notebook online on GitHub. To see the plots, clone the repository to your machine and open the notebook there.

Structure of this notebook
---
This notebook contains a typical data analysis workflow for single-cell RNA-seq data. It contains the following sections:
* Getting Started: import the relevant packages, load some example data, annotate the data, filter out low-quality cells and normalize the count depth. This section is mostly technical, but it also shows how the plotting function ```interactive_hist``` helps to find good filtering thresholds.
* Interactive Plotting: this section demonstrates how to use the remaining interactive plotting functions.

# Getting Started

## Import Packages

In [11]:
# import standard packages
import warnings
warnings.filterwarnings(action='once')
import numpy as np
import re
import scanpy as sc  # `scanpy.api` is deprecated; the top-level import exposes the same functions
import os
import sys

sc.logging.print_versions()
sc.settings.verbosity = 0

scanpy==1.4 anndata==0.6.18 numpy==1.15.4 scipy==1.2.0 pandas==0.23.4 scikit-learn==0.20.2 statsmodels==0.9.0 python-igraph==0.7.1 louvain==0.6.1 


In [9]:
# add the parent directory to the python path and import the interactive plotting functions
sys.path.insert(0, os.path.dirname(os.getcwd()))
import interactive_plotting as ipl  
from bokeh.io import output_notebook
output_notebook()

In [10]:
np.random.seed(42)

## Import Data

In [12]:
# import example data
adata = sc.datasets.paul15()
adata.var_names_make_unique()
adata

... storing 'paul15_clusters' as categorical


AnnData object with n_obs × n_vars = 2730 × 3451 
    obs: 'paul15_clusters'
    uns: 'iroot'

## Additional Annotations

In [14]:
# annotate mitochondrial genes
# note: this pattern matches every gene name that starts with "mt" (case-insensitive)
regex = re.compile('^(mt).*', re.IGNORECASE)
mito_genes = [name for name in adata.var_names if regex.search(name)]
adata.var['mito'] = False
adata.var.loc[mito_genes, 'mito'] = True
print('Found {} mito genes and annotated.'.format(len(mito_genes)))

sc.pp.calculate_qc_metrics(adata, qc_vars=['mito'], inplace=True)

Found 15 mito genes and annotated.
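
To make the quality metrics used in the next cells concrete, here is a minimal NumPy sketch of how per-cell `total_counts`, `n_genes_by_counts` and `pct_counts_mito` can be derived from a count matrix. The toy matrix and the helper name `qc_metrics` are made up for illustration; the sketch follows the definitions of these metrics and is not scanpy's implementation.

```python
import numpy as np

def qc_metrics(counts, mito_mask):
    """Per-cell QC metrics from a dense cells x genes count matrix (illustrative helper)."""
    counts = np.asarray(counts, dtype=float)
    total_counts = counts.sum(axis=1)               # count depth per cell
    n_genes_by_counts = (counts > 0).sum(axis=1)    # genes with at least one count
    mito_counts = counts[:, mito_mask].sum(axis=1)  # counts in flagged genes only
    # cells without any counts get 0 percent instead of a division by zero
    pct_counts_mito = 100 * np.divide(
        mito_counts, total_counts,
        out=np.zeros_like(total_counts), where=total_counts > 0,
    )
    return total_counts, n_genes_by_counts, pct_counts_mito

# toy data: 3 cells x 4 genes, the last gene is flagged as mitochondrial
toy_counts = np.array([[5, 0, 3, 2],
                       [0, 0, 0, 0],
                       [1, 1, 1, 1]])
toy_mito = np.array([False, False, False, True])

total, n_genes, pct_mito = qc_metrics(toy_counts, toy_mito)
print(total, n_genes, pct_mito)
```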


In [15]:
# create artificial batch labels - we do this here because we would like to use batch labels later
adata.obs['batch'] = np.random.choice(['batch_1', 'batch_2'], adata.n_obs)

adata.obs['group'] = np.random.choice(['group_1', 'group_2'], adata.n_obs)
adata.obs['group'] = adata.obs['group'].astype('category')

adata.obs['plate'] = np.random.choice(['plate_1', 'plate_2', 'plate_3'], adata.n_obs)
adata.obs['plate'] = adata.obs['plate'].astype('category')

In [17]:
# plot interactive histograms
# these can be used for finding filtering thresholds
ipl.interactive_hist(adata, groups=['plate'],
                     keys=['n_genes_by_counts', 'total_counts', 'pct_counts_mito'], 
                     fill_alpha=0.3,
                     plot_width=400, plot_height=400)
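
Thresholds read off the histograms can be cross-checked numerically. The sketch below summarises a metric with quantiles on synthetic data. The helper `suggest_thresholds` and the chosen quantiles (1% and 99%) are assumptions for illustration and are not part of `interactive_plotting`.

```python
import numpy as np

def suggest_thresholds(values, lower_q=0.01, upper_q=0.99):
    """Return the (lower, upper) quantiles of a QC metric (illustrative helper)."""
    values = np.asarray(values, dtype=float)
    low, high = np.quantile(values, [lower_q, upper_q])
    return low, high

# synthetic count depths, standing in for adata.obs['total_counts']
rng = np.random.RandomState(0)
toy_total_counts = rng.lognormal(mean=8.0, sigma=0.5, size=1000)

low, high = suggest_thresholds(toy_total_counts)
kept = (toy_total_counts >= low) & (toy_total_counts <= high)
print("keep cells with {:.0f} <= total_counts <= {:.0f} ({} of {})".format(
    low, high, kept.sum(), kept.size))
```

Quantiles are only a starting point; the shape of the histogram (for example a second mode of low-quality cells) usually matters more than a fixed percentage.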

## Filtering

In [18]:
# based on the histograms above, filter out low-quality cells
sc.pp.filter_cells(adata, min_genes=200)
adata = adata[adata.obs['total_counts'] < 8000].copy()
adata = adata[adata.obs['pct_counts_mito'] < 2].copy()

## Normalization

In [19]:
# we use a standard normalization workflow
adata.raw = adata.copy()
sc.pp.recipe_zheng17(adata, plot=False)
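
For readers who want to see what count depth normalization does to the data, the following is a simplified NumPy sketch: scale every cell to the median count depth, log-transform, then standardise each gene. `recipe_zheng17` also includes gene filtering and selection steps that are omitted here, and the helper name `normalize_log_scale` is hypothetical. The sketch assumes that cells without counts have already been filtered out.

```python
import numpy as np

def normalize_log_scale(counts):
    """Depth-normalize, log-transform and scale a cells x genes matrix (simplified sketch)."""
    counts = np.asarray(counts, dtype=float)
    depth = counts.sum(axis=1, keepdims=True)   # count depth per cell, assumed > 0
    target = np.median(depth)
    normalized = counts / depth * target        # every cell now sums to the median depth
    logged = np.log1p(normalized)
    mean = logged.mean(axis=0)
    std = logged.std(axis=0)
    std[std == 0] = 1.0                         # leave constant genes unscaled
    return normalized, (logged - mean) / std

# toy data: the second cell was sequenced ten times deeper than the first
toy_counts = np.array([[1, 3],
                       [10, 30],
                       [2, 2]])
normalized, scaled = normalize_log_scale(toy_counts)
print(normalized)
```

After normalization the first two cells become identical, which is the point of the step: differences in sequencing depth no longer dominate the comparison between cells.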

## Embedding and Clustering

In [20]:
# compute embedding and clustering
sc.pp.neighbors(adata, n_neighbors=30, n_pcs=7, random_state=42)
sc.tl.louvain(adata, resolution=0.45, random_state=42)
sc.tl.umap(adata, random_state=42)
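
Both the clustering and the UMAP embedding are built on the neighborhood graph from `sc.pp.neighbors`. As a rough illustration of what `n_pcs` and `n_neighbors` control, here is a brute-force NumPy sketch that projects cells onto the first principal components and looks up each cell's nearest neighbors. The helper `knn_indices` is hypothetical, and scanpy's implementation differs in detail (for instance, it also computes connectivity weights).

```python
import numpy as np

def knn_indices(X, n_neighbors, n_pcs):
    """Indices of the n_neighbors nearest cells in PCA space (brute-force sketch)."""
    X = np.asarray(X, dtype=float)
    centered = X - X.mean(axis=0)
    # PCA via SVD: project onto the first n_pcs right singular vectors
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    pcs = centered @ vt[:n_pcs].T
    # pairwise squared Euclidean distances between all cells
    sq = (pcs ** 2).sum(axis=1)
    dist = sq[:, None] + sq[None, :] - 2 * pcs @ pcs.T
    np.fill_diagonal(dist, np.inf)  # a cell is not its own neighbor
    return np.argsort(dist, axis=1)[:, :n_neighbors]

# toy data: two well-separated groups of 20 cells in 5 dimensions
rng = np.random.RandomState(42)
toy_X = np.vstack([rng.normal(0, 1, size=(20, 5)),
                   rng.normal(10, 1, size=(20, 5))])
neighbors = knn_indices(toy_X, n_neighbors=5, n_pcs=2)
print(neighbors[:3])
```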



# Interactive Plotting

## Interactive Histogram - Visualise Binning on Embedding

The following is another interactive plotting function for histograms. It allows you to define thresholds on quality variables such as the count depth. The cells that fall into the different bins defined by these cutoffs can then be visualised in an embedding.

In [22]:
ipl.thresholding_hist(adata, key='n_counts', categories=dict(cat_1=[0, 700], cat_2=[700, 1200]))
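
The `categories` argument maps a label to a `[low, high]` interval. The sketch below shows one way such a mapping can be turned into per-cell labels. Treating the intervals as half-open (`low <= value < high`) and labelling everything else `'other'` are assumptions made here for illustration; the helper `assign_categories` is not part of `interactive_plotting`.

```python
import numpy as np

def assign_categories(values, categories, default="other"):
    """Label each value by the interval it falls into (illustrative helper)."""
    values = np.asarray(values, dtype=float)
    labels = np.full(values.shape, default, dtype=object)
    for name, (low, high) in categories.items():
        # assumption: intervals are half-open, low <= value < high
        labels[(values >= low) & (values < high)] = name
    return labels

# toy count depths, standing in for adata.obs['n_counts']
toy_n_counts = np.array([50, 699, 700, 1199, 1200, 5000])
labels = assign_categories(toy_n_counts, dict(cat_1=[0, 700], cat_2=[700, 1200]))
print(labels)
```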

## Highlighting Differentially Expressed Genes

In [23]:
sc.tl.rank_genes_groups(adata, groupby='louvain')
ipl.highlight_de(adata, cell_keys='batch', legend_loc='top_right')

... storing 'batch' as categorical


## Finding the Root Cell for DPT (Pseudotime Computation)

In [26]:
# when computing pseudotime (DPT), we have to choose a root cell. The following plotting function makes this 
# easy. Just zoom into the area where you would like to select the root cell and hover over the cells
# to see their indices.
sc.tl.diffmap(adata)
ipl.highlight_indices(adata, key='group', basis='diffmap', components=[1, 3])

In [32]:
# using the root cell spotted above, compute a DPT
adata.uns['iroot'] = 840
sc.tl.dpt(adata)
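
If you prefer a reproducible starting point over picking the index by eye, one simple heuristic is to take the cell with the most extreme value along a diffusion component and then verify it in the plot. The sketch below uses a toy array in place of `adata.obsm['X_diffmap']`; the helper `extreme_cell` is hypothetical, and whether an extreme cell is a biologically sensible root still has to be judged from the data.

```python
import numpy as np

def extreme_cell(embedding, component, largest=True):
    """Index of the cell with the largest (or smallest) value on one component."""
    column = np.asarray(embedding)[:, component]
    return int(np.argmax(column) if largest else np.argmin(column))

# toy embedding: 3 cells x 2 components
toy_diffmap = np.array([[0.1, -0.3],
                        [0.5, 0.2],
                        [-0.4, 0.9]])
root_candidate = extreme_cell(toy_diffmap, component=0, largest=False)
print(root_candidate)
```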

In [33]:
ipl.link_plot(adata, bases=['diffmap', 'umap'], components=[[1, 2], [1, 2]],
              genes=list(map(lambda r: r[0], adata.uns['rank_genes_groups']['names']))[:10],
              cutoff=True,
              key='louvain', distance='dpt', legend_loc='top_right')
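
The `genes=` expression above reads the `names` record array stored by `sc.tl.rank_genes_groups`, which has one field per group and one row per rank. Taking `r[0]` from every row therefore yields the top-ranked genes of the first group. The toy array below (with invented gene names) shows the same access pattern, plus an equivalent lookup by field name.

```python
import numpy as np

# toy stand-in for adata.uns['rank_genes_groups']['names']; gene names are invented
names = np.array(
    [("GeneA", "GeneX"), ("GeneB", "GeneY"), ("GeneC", "GeneZ")],
    dtype=[("0", "U10"), ("1", "U10")],
)

# same pattern as in the cell above: first field of every row, truncated to the top 2
top_first_group = [str(row[0]) for row in names][:2]

# equivalent, but explicit about which group is used
same_by_field = [str(gene) for gene in names["0"][:2]]

# top genes for every group at once
per_group = {group: [str(gene) for gene in names[group][:2]]
             for group in names.dtype.names}
print(top_first_group, per_group)
```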





## Visualise Individual Gene Trends along Lineages

This plotting function is an easy way to visualise gene trends along paths. These paths correspond to differentiation trajectories and can be found by using PAGA or DPT with n_branchings>0. The following plotting function allows you to observe changes in gene expression of one path relative to another.

In [31]:
ipl.velocity_plot(adata,
                  genes=list(map(lambda r: r[0], adata.uns['rank_genes_groups']['names']))[:1],
                  paths=[['0', '1'], ['0', '2']])
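
To give an idea of what a gene trend along a path is, the following sketch orders cells by pseudotime and smooths the expression of one gene with a moving average. This is a generic illustration on synthetic data and not necessarily how `velocity_plot` computes its curves; the helper `smoothed_trend` and the window size are assumptions.

```python
import numpy as np

def smoothed_trend(pseudotime, expression, window=5):
    """Moving-average expression trend along pseudotime (illustrative helper)."""
    pseudotime = np.asarray(pseudotime, dtype=float)
    expression = np.asarray(expression, dtype=float)
    order = np.argsort(pseudotime)              # order cells along the trajectory
    kernel = np.ones(window) / window
    # 'valid' keeps only windows that are fully covered by cells
    time_mid = np.convolve(pseudotime[order], kernel, mode="valid")
    trend = np.convolve(expression[order], kernel, mode="valid")
    return time_mid, trend

# synthetic gene that increases along pseudotime, with noise
rng = np.random.RandomState(1)
toy_time = rng.uniform(0, 1, size=200)
toy_expr = 2 * toy_time + rng.normal(0, 0.2, size=200)

time_mid, trend = smoothed_trend(toy_time, toy_expr, window=5)
print(trend[:3], trend[-3:])
```

Computing one such curve per path, restricted to the cells of that path, makes it possible to compare how the same gene behaves along two trajectories.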

