# Preliminaries

## Dependency notebooks

1. [MK_2020-10-16_load_data.ipynb](MK_2020-10-16_load_data.ipynb)

## Import packages

In [4]:
# import standard packages
import sys

# import single-cell packages
import scanpy as sc
import scvelo as scv

# set verbosity levels
sc.settings.verbosity = 2
scv.settings.verbosity = 3 

## Print package versions for reproducibility

In [5]:
scv.logging.print_versions()

scvelo==0.2.2  scanpy==1.6.0  anndata==0.7.4  loompy==3.0.6  numpy==1.19.2  scipy==1.5.2  matplotlib==3.3.2  sklearn==0.23.2  pandas==1.1.3  


## Set up paths

In [12]:
sys.path.insert(0, "../../..")  # this depends on the notebook depth and must be adapted per notebook

from paths import DATA_DIR, CACHE_DIR
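Because the relative path above has to be adjusted by hand for every notebook depth, one alternative is to search upwards for the module instead. The following is only a sketch: it assumes that `paths.py` sits in the repository root, and the helper name `find_project_root` is hypothetical and not part of this project.

```python
import sys
from pathlib import Path


def find_project_root(start: Path, marker: str = "paths.py") -> Path:
    """Walk upwards from `start` until a directory containing `marker` is found."""
    start = Path(start).resolve()
    for candidate in (start, *start.parents):
        if (candidate / marker).is_file():
            return candidate
    raise FileNotFoundError(f"no {marker!r} found in or above {start}")


def add_project_root_to_sys_path(start: Path) -> Path:
    """Put the detected project root at the front of sys.path (hypothetical helper)."""
    root = find_project_root(start)
    if str(root) not in sys.path:
        sys.path.insert(0, str(root))
    return root
```

In a notebook, calling `add_project_root_to_sys_path(Path.cwd())` before `from paths import DATA_DIR, CACHE_DIR` would then work at any directory depth, provided the assumption about the location of `paths.py` holds.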

## Set up caching

Note: this analysis uses the caching extension [`scachepy`](https://github.com/theislab/scachepy) to shorten the notebook's runtime: results of the most expensive computations are loaded from disk instead of being recomputed. The cell below checks whether scachepy is installed; if it is not, all results are recomputed automatically.

In [None]:
try:
    import scachepy
    c = scachepy.Cache(CACHE_DIR / "morris_data", separate_dirs=True)
except ImportError:
    c = None
    
use_caching = c is not None
c
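To make the compute-or-load idea behind the cache concrete, here is a minimal pickle-based sketch. It is not how scachepy is implemented; the function `cached_call` and its parameters are hypothetical and only illustrate the role that a `force` flag and a cache file name play.

```python
import pickle
from pathlib import Path


def cached_call(func, cache_file, force=False):
    """Return func()'s result, loading it from `cache_file` when possible.

    With force=True the cached file is ignored and overwritten.
    """
    cache_file = Path(cache_file)
    if cache_file.is_file() and not force:
        with cache_file.open("rb") as handle:
            return pickle.load(handle)

    result = func()  # the expensive computation
    cache_file.parent.mkdir(parents=True, exist_ok=True)
    with cache_file.open("wb") as handle:
        pickle.dump(result, handle)
    return result
```

The first call computes and stores the result, later calls read it back, and `force=True` recomputes, which mirrors the intent of the `force_recompute` switch used in this notebook.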

## Set global parameters

In [11]:
# should cached values be used, or should results be recomputed?
force_recompute = False

## Load the data

In [2]:
adata = sc.read(DATA_DIR / "morris_data" / "adata.h5ad")
adata

AnnData object with n_obs × n_vars = 104679 × 22630
    obs: 'batch'
    var: 'Accession', 'Chromosome', 'End', 'Start', 'Strand'
    layers: 'ambiguous', 'matrix', 'spliced', 'unspliced'

# Preprocessing

## Filter, normalize and calculate PCA

In [3]:
scv.pp.filter_and_normalize(adata, min_shared_counts=40, n_top_genes=1500)
sc.tl.pca(adata)

Filtered out 11419 genes that are detected 40 counts (shared).
Normalized count data: X, spliced, unspliced.
Exctracted 1500 highly variable genes.
Logarithmized X.
computing moments based on connectivities
    finished (0:00:21) --> added 
    'Ms' and 'Mu', moments of un/spliced abundances (adata.layers)
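As a rough illustration of what the filtering and normalization step does, the NumPy sketch below keeps genes by their shared counts, scales each cell to the median total count and log-transforms the spliced matrix. It is a simplification under stated assumptions: "shared" is taken to mean counts from cells in which a gene is detected in both layers, dense arrays are used, and the selection of highly variable genes and the PCA are left out. All function names are hypothetical, and the library's actual implementation may differ in detail.

```python
import numpy as np


def shared_counts_per_gene(spliced, unspliced):
    """Sum spliced + unspliced counts over cells where both are non-zero."""
    both_detected = (spliced > 0) & (unspliced > 0)
    return ((spliced + unspliced) * both_detected).sum(axis=0)


def normalize_per_cell(counts):
    """Scale every non-empty cell to the median total count."""
    totals = counts.sum(axis=1, keepdims=True)
    target = np.median(totals[totals > 0])
    return counts / np.where(totals > 0, totals, 1.0) * target


def filter_and_normalize_sketch(spliced, unspliced, min_shared_counts):
    """Return the gene mask, log-transformed X, and normalized layers."""
    keep = shared_counts_per_gene(spliced, unspliced) >= min_shared_counts
    spliced_norm = normalize_per_cell(spliced[:, keep].astype(float))
    unspliced_norm = normalize_per_cell(unspliced[:, keep].astype(float))
    return keep, np.log1p(spliced_norm), spliced_norm, unspliced_norm
```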


## Calculate neighbors and velocity moments

In [None]:
sc.pp.neighbors(adata, n_pcs=30, n_neighbors=30)
scv.pp.moments(adata, n_pcs=30, n_neighbors=30)
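The moments stored in `Ms` and `Mu` are, in essence, abundances averaged over each cell's nearest neighbors. The sketch below shows that idea with a brute-force neighbor search and a uniform average. It assumes dense arrays and is only feasible for small inputs; the function `knn_moments` is hypothetical, and the library works on the neighbor graph computed by `sc.pp.neighbors` rather than on pairwise distances.

```python
import numpy as np


def knn_moments(embedding, layer, n_neighbors):
    """Average `layer` over each cell's nearest neighbors in `embedding`.

    embedding: (n_cells, n_dims) coordinates, e.g. principal components
    layer:     (n_cells, n_genes) abundances, e.g. spliced counts
    """
    embedding = np.asarray(embedding, dtype=float)
    layer = np.asarray(layer, dtype=float)

    # pairwise squared Euclidean distances between all cells
    diff = embedding[:, None, :] - embedding[None, :, :]
    dist = (diff ** 2).sum(axis=-1)

    # indices of the closest cells; every cell is its own nearest neighbor
    idx = np.argsort(dist, axis=1)[:, :n_neighbors]

    # layer[idx] has shape (n_cells, n_neighbors, n_genes)
    return layer[idx].mean(axis=1)
```

Averaging over neighbors reduces the sampling noise of individual cells, which is why the dynamics are fitted on the smoothed layers rather than on raw counts.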

## Recover the velocity dynamics

In [None]:
if not use_caching:
    scv.tl.recover_dynamics(adata)
else:
    c.tl.recover_dynamics(adata, force=force_recompute, fname="recover_dynamics.pickle")

recovering dynamics
... 90%
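For orientation, the dynamical model fits per-gene splicing kinetics, commonly written as du/dt = α − βu and ds/dt = βu − γs, with transcription rate α, splicing rate β and degradation rate γ. The sketch below integrates these equations with a simple Euler scheme; the function names and parameter values are purely illustrative and not taken from this dataset, and the library's fitting procedure is considerably more involved.

```python
def simulate_splicing(alpha, beta, gamma, t_end, dt=0.01, u0=0.0, s0=0.0):
    """Euler integration of du/dt = alpha - beta*u, ds/dt = beta*u - gamma*s."""
    u, s = u0, s0
    for _ in range(int(round(t_end / dt))):
        du = alpha - beta * u
        ds = beta * u - gamma * s
        u, s = u + dt * du, s + dt * ds
    return u, s


def rna_velocity(u, s, beta, gamma):
    """Velocity is the time derivative of the spliced abundance."""
    return beta * u - gamma * s
```

At steady state the unspliced and spliced abundances approach α/β and α/γ respectively, and the velocity goes to zero; cells away from steady state have positive (induction) or negative (repression) velocity.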

## Calculate the velocities and velocity graph

In [9]:
scv.tl.velocity(adata, mode='dynamical')
scv.tl.velocity_graph(adata)

computing velocities
    finished (0:02:53) --> added 
    'velocity', velocity vectors for each individual cell (adata.layers)
computing velocity graph
    finished (0:10:35) --> added 
    'velocity_graph', sparse matrix with cosine correlations (adata.uns)
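The velocity graph scores, for each cell, how well its velocity vector points towards each neighboring cell. The sketch below computes a plain cosine similarity between a cell's velocity and the displacement to each neighbor. It is a simplified illustration: the function `cosine_to_neighbors` is hypothetical, and the library's "cosine correlations" may involve additional steps, such as centering, that are not reproduced here.

```python
import numpy as np


def cosine_to_neighbors(X, V, cell, neighbors):
    """Cosine similarity between V[cell] and the displacements X[neighbors] - X[cell]."""
    X = np.asarray(X, dtype=float)
    V = np.asarray(V, dtype=float)

    deltas = X[neighbors] - X[cell]          # (n_neighbors, n_genes)
    numerator = deltas @ V[cell]             # dot product per neighbor
    denominator = np.linalg.norm(deltas, axis=1) * np.linalg.norm(V[cell])

    # leave entries at zero where a vector has zero length
    return np.divide(
        numerator,
        denominator,
        out=np.zeros_like(numerator),
        where=denominator > 0,
    )
```

Values close to 1 mark neighbors that lie in the direction of the predicted change, values close to -1 mark neighbors in the opposite direction.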


## Write the results

In [10]:
# save the preprocessed object with AnnData's own writer
adata.write(DATA_DIR / "morris_data" / "adata_preprocessed.h5ad")