In this notebook, we preprocess the [Morris data](https://doi.org/10.1038/s41586-018-0744-4) and compute RNA velocities using scVelo.

# Preliminaries

## Dependency notebooks

1. [MK_2020-10-16_merge_data.ipynb](MK_2020-10-16_merge_data.ipynb) - only needed if you merge the data yourself (otherwise the merged data is downloaded from figshare)

## Import packages

In [1]:
# import standard packages
import sys

# import single-cell packages
import scanpy as sc
import scvelo as scv

# set verbosity levels
sc.settings.verbosity = 2
scv.settings.verbosity = 3 

## Print package versions for reproducibility

In [2]:
scv.logging.print_versions()

scvelo==0.2.2  scanpy==1.6.0  anndata==0.7.4  loompy==3.0.6  numpy==1.19.2  scipy==1.5.2  matplotlib==3.3.2  sklearn==0.23.2  pandas==1.1.3  


## Set up paths

In [3]:
sys.path.insert(0, "../../..")  # this depends on the notebook depth and must be adapted per notebook

from paths import DATA_DIR, CACHE_DIR
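
The `paths` module imported above is not shown in this notebook. As a rough sketch of what such a module could look like, assuming it lives at the repository root and that the data and cache folders are named `data` and `cache` (the `data` name is a guess; the `cache` name matches the cache root printed further down):

```python
# Hypothetical paths.py, placed at the repository root (an assumption, not the authors' file).
from pathlib import Path

try:
    # When imported as a module, anchor all paths at the folder containing this file.
    ROOT = Path(__file__).resolve().parent
except NameError:
    # Fallback when the code is pasted into a notebook cell, where __file__ is undefined.
    ROOT = Path.cwd()

DATA_DIR = ROOT / "data"    # assumed folder name for input and output data
CACHE_DIR = ROOT / "cache"  # folder name taken from the cache root shown in the output below
```

Because both names are `pathlib.Path` objects, expressions such as `DATA_DIR / "morris_data" / "adata.h5ad"` work as used in the cells of this notebook.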

## Set up caching

Note: we use a caching extension called `scachepy` for this analysis, see [here](https://github.com/theislab/scachepy). It shortens the runtime of this notebook by loading the results of the most expensive computations from disk instead of recomputing them. Below, we check whether scachepy is installed; if it is not, all results are recomputed automatically.

In [4]:
try:
    import scachepy
    c = scachepy.Cache(CACHE_DIR / "morris_data", separate_dirs=True)
except ImportError:
    c = None
    
use_caching = c is not None
c

Cache(root=/home/icb/marius.lange/python_projects/cellrank_reproducibility/cache/morris_data, ext='.pickle', compression='None')

## Set global parameters

In [5]:
# use cached values (False) or force recomputation (True)?
force_recompute = False
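
To illustrate the idea behind caching and the `force_recompute` flag, here is a minimal, standard-library-only sketch of a compute-or-load pattern. It is not how scachepy is implemented; the helper `cached_call`, the function `expensive` and the demo file name are hypothetical and serve only as an illustration.

```python
import pickle
import tempfile
from pathlib import Path


def cached_call(func, cache_file, force=False):
    """Return func()'s result, loading it from cache_file when one exists."""
    cache_file = Path(cache_file)
    if cache_file.exists() and not force:
        # Cache hit: skip the expensive computation entirely.
        with cache_file.open("rb") as fh:
            return pickle.load(fh)
    # Cache miss or forced recomputation: compute, then store for next time.
    result = func()
    cache_file.parent.mkdir(parents=True, exist_ok=True)
    with cache_file.open("wb") as fh:
        pickle.dump(result, fh)
    return result


calls = []  # records how often the expensive function really runs


def expensive():
    calls.append(1)
    return {"answer": sum(range(1000))}


# Hypothetical cache location inside a temporary directory.
demo_file = Path(tempfile.mkdtemp()) / "morris_data" / "demo.pickle"

first = cached_call(expensive, demo_file)               # computes and writes the cache
second = cached_call(expensive, demo_file)              # loads from the cache
third = cached_call(expensive, demo_file, force=True)   # recomputes despite the cache
```

With `force=False` the second call never runs `expensive`, which is the behaviour we rely on for the costly steps of this notebook; setting the flag to `True` discards that shortcut and refreshes the stored result.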

## Load the data

In [6]:
adata = sc.read(DATA_DIR / "morris_data" / "adata.h5ad",
                backup_url="https://ndownloader.figshare.com/files/25120694?private_link=a187bbb4aa21f7223523")
adata

AnnData object with n_obs × n_vars = 104679 × 22630
    obs: 'batch'
    layers: 'spliced', 'unspliced'

# Preprocessing

## Filter, normalize and calculate PCA

In [7]:
scv.pp.filter_and_normalize(adata, min_shared_counts=40, n_top_genes=1500)
sc.tl.pca(adata)

Filtered out 11419 genes that are detected 40 counts (shared).
Normalized count data: X, spliced, unspliced.
Exctracted 1500 highly variable genes.
Logarithmized X.
computing PCA
    on highly variable genes
    with n_comps=50
    finished (0:00:19)


## Calculate neighbors and velocity moments

In [8]:
sc.pp.neighbors(adata, n_pcs=30, n_neighbors=30)
scv.pp.moments(adata, n_pcs=30, n_neighbors=30)

computing neighbors
    using 'X_pca' with n_pcs = 30
    finished (0:00:45)
computing moments based on connectivities
    finished (0:00:10) --> added 
    'Ms' and 'Mu', moments of un/spliced abundances (adata.layers)


## Recover the velocity dynamics

In [None]:
if not use_caching:
    scv.tl.recover_dynamics(adata)
else:
    c.tl.recover_dynamics(adata, force=force_recompute, fname="recover_dynamics.pickle")

No cache found in `recover_dynamics.pickle.pickle`, computing values.
recovering dynamics
... 54%

## Calculate the velocities and velocity graph

In [None]:
scv.tl.velocity(adata, mode='dynamical')
scv.tl.velocity_graph(adata)

## Write the results

In [None]:
sc.write(DATA_DIR / "morris_data" / "adata_preprocessed.h5ad", adata)