# Scachepy - caching extension for Scanpy

This notebook shows the usage of Scachepy, a caching extension for Scanpy. It pickles the result of an operation that takes too long to compute (or that must be reproducible), so that the result can be loaded instead of recomputed.

Please **beware** that this is a self-proclaimed beta version and that tests are only **slowly** being written.

In [1]:
%load_ext autoreload
%autoreload 2

In [2]:
import scanpy as sc
import scvelo as scv
import scachepy 
import tempfile



# Data loading and cache creation

In [3]:
handle = tempfile.TemporaryDirectory()
c = scachepy.Cache(handle.name, backend='pickle')
c

Cache(backend=PickleBackend(dir='/tmp/tmp8q7net25'), ext='.pickle')

Currently, only the `pickle` backend is implemented. To change the backend directory, do the following:

```python
c.backend.dir = ...
```
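To make the role of `dir` concrete, here is a minimal sketch of what a pickle backend could look like. It assumes one file per cached result, named `<dir>/<fname>.pickle` (matching the `ext='.pickle'` shown above). `MinimalPickleBackend` and its `exists`/`save`/`load` methods are hypothetical and are not scachepy's actual `PickleBackend` API.

```python
import os
import pickle
import tempfile


class MinimalPickleBackend:
    """Hypothetical stand-in for a pickle backend; not scachepy's actual class."""

    def __init__(self, directory, ext=".pickle"):
        self.dir = directory  # where cache files live; may be reassigned later
        self.ext = ext

    def _path(self, fname):
        # e.g. fname="pca" -> "<dir>/pca.pickle"
        return os.path.join(self.dir, fname + self.ext)

    def exists(self, fname):
        return os.path.isfile(self._path(fname))

    def save(self, fname, obj):
        os.makedirs(self.dir, exist_ok=True)
        with open(self._path(fname), "wb") as fout:
            pickle.dump(obj, fout)

    def load(self, fname):
        with open(self._path(fname), "rb") as fin:
            return pickle.load(fin)


with tempfile.TemporaryDirectory() as first, tempfile.TemporaryDirectory() as second:
    backend = MinimalPickleBackend(first)
    backend.save("pca", {"X_pca": [[0.1, 0.2]]})
    print(backend.exists("pca"))  # True
    backend.dir = second          # analogous to `c.backend.dir = ...`
    print(backend.exists("pca"))  # False, the new directory is empty
```

Because the file path is recomputed on every call, reassigning `dir` simply points all subsequent loads and saves at a different directory.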

In [4]:
!rm -rf $c.backend.dir/*

In [5]:
adata = sc.datasets.paul15()
adata.var_names_make_unique()



... storing 'paul15_clusters' as categorical


# Usage principles

In [6]:
c.pp.pca(adata)

No cache found, computing values.


In [7]:
c.pp.pca(adata, force=False, verbose=True)

Loading data from: `pca.pickle`.


The difference between `c.pp.pca` and `c.pp.pcarr` is that the former operates on an `anndata.AnnData` object, whereas the latter takes an `np.ndarray` and caches only the `X_pca` entry of `obsm`.

In [8]:
X_pca = c.pp.pcarr(adata.X)
X_pca.shape

No cache found, computing values.


(2730, 50)

## Extra arguments

Each caching function accepts a few useful arguments:
+ `verbose=...` - be verbose (default: `True`)
+ `fname=...` - overrides the default filename
+ `force=...` - whether to force recaching (default: `False`)
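The interplay of `fname`, `force` and `verbose` can be illustrated with a self-contained compute-or-load sketch. `cached_call` and its `cache_dir` parameter are hypothetical; the printed messages only mirror the notebook output and are not taken from scachepy's source.

```python
import os
import pickle
import tempfile


def cached_call(fn, *args, fname, cache_dir, force=False, verbose=True, **kwargs):
    """Hypothetical compute-or-load helper illustrating `fname`, `force` and `verbose`."""
    path = os.path.join(cache_dir, fname + ".pickle")
    if not force and os.path.isfile(path):
        # A cached file exists and recomputation was not forced: load it
        if verbose:
            print(f"Loading data from: `{fname}.pickle`.")
        with open(path, "rb") as fin:
            return pickle.load(fin)
    if verbose:
        print("Forced computing values." if force else "No cache found, computing values.")
    result = fn(*args, **kwargs)
    # Store the fresh result so the next call with the same `fname` can load it
    with open(path, "wb") as fout:
        pickle.dump(result, fout)
    return result


with tempfile.TemporaryDirectory() as cache_dir:
    calls = []

    def slow_square(x):
        calls.append(x)  # record each real computation
        return x * x

    cached_call(slow_square, 4, fname="square", cache_dir=cache_dir)  # computes
    cached_call(slow_square, 4, fname="square", cache_dir=cache_dir)  # loads
    cached_call(slow_square, 4, fname="square", cache_dir=cache_dir,
                force=True, verbose=False)                             # recomputes
    print(len(calls))  # 2
```

In this sketch the cache is keyed only by the filename, so calling it again with different arguments but the same `fname` returns the stale result unless `force=True` is passed.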

In [9]:
c.pp.neighbors(adata, force=True, fname='foo', verbose=False)

To load the result back, the same `fname` has to be specified again.

In [10]:
c.pp.neighbors(adata, fname='foo')

Loading data from: `foo.pickle`.


## Copy argument

These functions also work with `copy=True`, where applicable.
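Scanpy's `copy` convention is that `copy=True` returns a modified copy and leaves the input untouched, while `copy=False` modifies the input in place (returning `None`). The hypothetical `write_cached` below sketches how cached values could be written back under that convention, using a plain dict of dicts as a stand-in for an `AnnData` object; it is not scachepy's implementation.

```python
import copy


def write_cached(target, cached, copy_target=False):
    """Hypothetical: write cached values into `target`, a dict of dicts standing in for AnnData."""
    # Work on a deep copy when requested, so the caller's object is left untouched
    obj = copy.deepcopy(target) if copy_target else target
    for attr, values in cached.items():
        # e.g. attr="uns", values={"louvain": {...}}
        obj.setdefault(attr, {}).update(values)
    return obj if copy_target else None


adata_like = {"uns": {}, "obs": {}}
cached = {"uns": {"louvain": {"params": {"resolution": 1.0}}}}

adata_louvain = write_cached(adata_like, cached, copy_target=True)
print("louvain" in adata_like["uns"])     # False
print("louvain" in adata_louvain["uns"])  # True
```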

In [11]:
assert 'louvain' not in adata.uns

In [12]:
adata_louvain = c.tl.louvain(adata, copy=True)

No cache found, computing values.


In [13]:
assert 'louvain' not in adata.uns
assert 'louvain' in adata_louvain.uns

## Default functions

Caching functions usually compute the values with their default function and cache the specified keys. However, they can be used more flexibly, e.g. with a custom function (note that this example is only a demonstration):

In [14]:
def test(*args, **kwargs):
    # the AnnData object arrives as the first positional argument
    print('Look Ma, no hands!')
    return sc.pp.log1p(args[0])  # log-transforms adata.X in place

By default, `c.pp.expression` has no default function, but we can easily supply one: the function needs to be passed as the first argument, even before the `adata` object.
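One way such an optional leading function could be detected is by checking whether the first positional argument is callable, assuming the data object itself is not. `split_fn` is a hypothetical helper, not part of scachepy.

```python
def split_fn(args, default_fn=None):
    """Hypothetical: treat args[0] as the function to run if it is callable."""
    if args and callable(args[0]):
        # e.g. (test, adata) -> run `test` on (adata,)
        return args[0], args[1:]
    # No function supplied: fall back to the default, which may be None;
    # that only matters if nothing is cached and a value must be computed
    return default_fn, args


def shout(x):
    return f"Look Ma, no hands! ({x})"


fn, rest = split_fn((shout, "adata"))
print(fn(*rest))             # Look Ma, no hands! (adata)
print(split_fn(("adata",)))  # (None, ('adata',))
```

Allowing `None` here fits the notebook, where the later call without `test` still loads `test.pickle`.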

In [15]:
adata.X

array([[0., 0., 0., ..., 0., 2., 0.],
       [0., 0., 1., ..., 0., 1., 0.],
       [1., 0., 3., ..., 2., 3., 0.],
       ...,
       [0., 0., 1., ..., 0., 0., 0.],
       [3., 0., 3., ..., 0., 1., 0.],
       [0., 0., 4., ..., 1., 1., 1.]], dtype=float32)

In [16]:
c.pp.expression(test, adata, force=True, fname='test')

Forced computing values.
Look Ma, no hands!


In [17]:
c.pp.expression(adata, fname='test')

Loading data from: `test.pickle`.


The values have changed as expected. The `anndata.AnnData` object is taken either from the first element of `args` or from the key `adata` in `kwargs`.
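A sketch of that lookup rule follows. `find_adata` is hypothetical, and its precedence (checking `kwargs['adata']` before `args[0]`) is an assumption, since the text does not say which wins when both are present.

```python
def find_adata(args, kwargs):
    """Hypothetical: return the AnnData object from `kwargs['adata']` or `args[0]`."""
    # Assumed precedence: an explicit `adata=` keyword wins over positional arguments
    if "adata" in kwargs:
        return kwargs["adata"]
    if args:
        return args[0]
    raise ValueError("Pass the AnnData object as the first positional argument or as `adata=`.")


print(find_adata(("adata_pos", 1), {}))       # adata_pos
print(find_adata((), {"adata": "adata_kw"}))  # adata_kw
```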

In [18]:
adata.X

array([[0.       , 0.       , 0.       , ..., 0.       , 1.0986123,
        0.       ],
       [0.       , 0.       , 0.6931472, ..., 0.       , 0.6931472,
        0.       ],
       [0.6931472, 0.       , 1.3862944, ..., 1.0986123, 1.3862944,
        0.       ],
       ...,
       [0.       , 0.       , 0.6931472, ..., 0.       , 0.       ,
        0.       ],
       [1.3862944, 0.       , 1.3862944, ..., 0.       , 0.6931472,
        0.       ],
       [0.       , 0.       , 1.609438 , ..., 0.6931472, 0.6931472,
        0.6931472]], dtype=float32)

In [19]:
!ls $c.backend.dir

foo.pickle  louvain.pickle  pca_arr.pickle  pca.pickle	test.pickle


# Implemented caching functions

Below are all the functions that have a caching version implemented. Future improvements will most likely include adding default docstrings, simplifying and refactoring the code, and possibly supporting `.h5ad` as a storage backend.

In [20]:
c.pp

pp(pcarr=<caching function of "scanpy.preprocessing._simple.pca">, expression=<caching function of "scachepy.scachepy.<lambda>">, moments=<caching function of "scvelo.preprocessing.moments.moments">, pca=<caching function of "scanpy.preprocessing._simple.pca">, neighbors=<caching function of "scanpy.neighbors.neighbors">)

In [21]:
c.tl

tl(louvain=<caching function of "scanpy.tools._louvain.louvain">, umap=<caching function of "scanpy.tools._umap.umap">, diffmap=<caching function of "scanpy.tools._diffmap.diffmap">, paga=<caching function of "scanpy.tools._paga.paga">, velocity=<caching function of "scvelo.tools.velocity.velocity">, velocity_graph=<caching function of "scvelo.tools.velocity_graph.velocity_graph">, draw_graph=<caching function of "scanpy.tools._draw_graph.draw_graph">)

In [22]:
c.pl

pl()

## Creating caching functions

Creating new caching functions is relatively simple, as seen below. Note that the `_cache1` suffix is only present because a `dict` needs unique keys; it gets stripped down the line (the keys can be specified in other ways too, but this is the most convenient one).

In [23]:
c.foo = c.cache(dict(uns='pca',
                     uns_cache1='neighbors',
                     obsm='X_pca',
                     varm='PCs',
                     layers='Ms',
                     layers_cache1='Mu'),
                default_fn=scv.pp.moments,  # function run when no cache exists
                default_fname='moments')    # file name used when `fname` is not given
c.foo

<caching function of "scvelo.preprocessing.moments.moments">
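The `_cacheN` suffix stripping described above could be sketched as follows, grouping the requested keys by `AnnData` attribute. `parse_cache_spec` and the exact suffix pattern (`_cache` followed by digits) are assumptions for illustration; scachepy may parse the keys differently.

```python
import re
from collections import defaultdict

# Hypothetical pattern: matches keys such as "uns_cache1" and captures "uns"
_SUFFIX = re.compile(r"^(?P<attr>.+?)_cache\d+$")


def parse_cache_spec(spec):
    """Group cached keys by AnnData attribute, dropping any `_cacheN` suffix."""
    grouped = defaultdict(list)
    for raw_key, value in spec.items():
        match = _SUFFIX.match(raw_key)
        attr = match.group("attr") if match else raw_key
        grouped[attr].append(value)
    return dict(grouped)


spec = dict(uns='pca', uns_cache1='neighbors', obsm='X_pca',
            varm='PCs', layers='Ms', layers_cache1='Mu')
print(parse_cache_spec(spec))
# {'uns': ['pca', 'neighbors'], 'obsm': ['X_pca'], 'varm': ['PCs'], 'layers': ['Ms', 'Mu']}
```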