(Courtesy of Martin Prete)

I've built the Docker and Singularity images for cell2location from the files on GitHub. The only change to the environment is that I added jupyterlab and ipykernel, so that a Jupyter kernel can be created for the environment; the Dockerfile now creates that kernel. These changes are on my fork.

First, log in to the farm:

```
ssh lg18@farm5-login
```

Go to the notebooks folder and create a `file.sh` containing:

```
#!/usr/bin/env bash

bsub -q gpu-normal -M200000 \
  -G team292 \
  -R"select[mem>200000] rusage[mem=200000, ngpus_physical=1.00] span[hosts=1]" \
  -gpu "mode=shared:j_exclusive=yes" -Is \
  /software/singularity-v3.5.3/bin/singularity exec \
  --no-home \
  --nv \
  -B /nfs/users/nfs_l/lg18/team292/lg18/gonads/data/visium/cell2location:/notebooks \
  -B /nfs/users/nfs_l/lg18/team292/lg18/gonads/data/visium/cell2location:/data \
  /nfs/cellgeni/singularity/images/cell2location-latest.sif \
  /bin/bash -c "HOME=$(mktemp -d) jupyter notebook --notebook-dir=/notebooks --NotebookApp.token='cell2loc' --ip=0.0.0.0 --port=1234 --no-browser --allow-root"
```

The first part (`bsub ...`, up to `-Is`) launches an interactive GPU job on the farm; you probably already do this with your own scripts or with the same command.

Breaking the rest down, it tells:
- singularity to execute something: `/software/singularity-v3.5.3/bin/singularity exec`
- don't mount my home folder by default: `--no-home`
- use the GPUs: `--nv`
- mount this folder as `/notebooks` inside the container: `-B /nfs/users/nfs_l/lg18/team292/lg18/gonads/data/visium/cell2location:/notebooks` (the next `-B` line mounts the same folder again as `/data`)
- launch this particular image file: `/nfs/cellgeni/singularity/images/cell2location-latest.sif`
- now run bash, set my home folder to a temporary folder and start jupyter: `/bin/bash -c "HOME=$(mktemp -d) jupyter notebook --notebook-dir=/notebooks --NotebookApp.token='cell2loc' --ip=0.0.0.0 --port=1234 --no-browser --allow-root"`
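The list above describes one long command in which each program wraps the next: `bsub` starts `singularity exec`, which starts `/bin/bash -c`, which starts Jupyter. A minimal Python sketch that assembles the same arguments as nested lists makes that layering explicit, assuming the paths and LSF options from `file.sh` (the variable names are illustrative only). Note that the whole Jupyter command is a single argument to `bash -c`; `shlex.join` then prints everything with POSIX shell quoting, which can be handy for regenerating the script after changing a path.

```python
import shlex

# Paths and options copied from file.sh; replace them with your own.
DATA_DIR = "/nfs/users/nfs_l/lg18/team292/lg18/gonads/data/visium/cell2location"
IMAGE = "/nfs/cellgeni/singularity/images/cell2location-latest.sif"
SINGULARITY = "/software/singularity-v3.5.3/bin/singularity"

# Innermost layer: the command string that bash -c runs inside the container.
jupyter_cmd = (
    "HOME=$(mktemp -d) jupyter notebook --notebook-dir=/notebooks "
    "--NotebookApp.token='cell2loc' --ip=0.0.0.0 --port=1234 "
    "--no-browser --allow-root"
)

# Middle layer: singularity runs bash inside the image.
singularity_cmd = [
    SINGULARITY, "exec",
    "--no-home",                     # don't mount the home folder
    "--nv",                          # make the host GPUs available
    "-B", f"{DATA_DIR}:/notebooks",  # folder visible as /notebooks
    "-B", f"{DATA_DIR}:/data",       # same folder, also visible as /data
    IMAGE,
    "/bin/bash", "-c", jupyter_cmd,  # the whole Jupyter command is ONE argument
]

# Outer layer: LSF interactive GPU job that runs the singularity command.
bsub_cmd = [
    "bsub", "-q", "gpu-normal", "-M200000", "-G", "team292",
    "-R", "select[mem>200000] rusage[mem=200000, ngpus_physical=1.00] span[hosts=1]",
    "-gpu", "mode=shared:j_exclusive=yes", "-Is",
] + singularity_cmd

# Print a shell-quoted command line (shlex.join needs Python 3.8+).
print(shlex.join(bsub_cmd))
```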


In [1]:
import sys
import scanpy as sc
import anndata
import pandas as pd
import numpy as np
import os
import matplotlib.pyplot as plt
from os import listdir
from os.path import isfile, join



data_type = 'float32'
sc.settings.set_figure_params(dpi = 100, color_map = 'RdPu', dpi_save = 100,
                              vector_friendly = True, format = 'pdf',
                              facecolor='white')


# this line forces theano to use the GPU; it must be set before importing cell2location
os.environ["THEANO_FLAGS"] = 'device=cuda0,floatX=' + data_type + ',force_device=True'
# if using the CPU uncomment this:
#os.environ["THEANO_FLAGS"] = 'device=cpu,floatX=float32,openmp=True,force_device=True'
#os.environ["OMP_NUM_THREADS"] = '8'


import cell2location

from matplotlib import rcParams
import seaborn as sns

# scanpy prints a lot of warnings; uncomment the filter below to silence them
import warnings
# warnings.filterwarnings('ignore')

Using cuDNN version 7605 on context None
Mapped name None to device cuda0: Tesla V100-SXM2-32GB (0000:07:00.0)
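The `THEANO_FLAGS` value set in the cell above is a comma-separated list of `key=value` settings, and the GPU and CPU variants differ only in a few keys. The hypothetical helper `theano_flags` below (not part of Theano or cell2location) is a small sketch that builds either string from the same settings the notebook uses; as in the notebook, the result has to be assigned to `os.environ` before `import cell2location`.

```python
import os


def theano_flags(use_gpu: bool = True, data_type: str = "float32") -> str:
    """Build a THEANO_FLAGS string like the ones in the notebook (hypothetical helper)."""
    if use_gpu:
        settings = {"device": "cuda0", "floatX": data_type, "force_device": "True"}
    else:
        settings = {"device": "cpu", "floatX": data_type,
                    "openmp": "True", "force_device": "True"}
    # Dicts keep insertion order, so the keys appear in the order written above.
    return ",".join(f"{key}={value}" for key, value in settings.items())


# Must run before `import cell2location`, because Theano reads the flags when imported.
os.environ["THEANO_FLAGS"] = theano_flags(use_gpu=True)
# CPU variant, as in the commented-out lines of the notebook:
# os.environ["THEANO_FLAGS"] = theano_flags(use_gpu=False)
# os.environ["OMP_NUM_THREADS"] = "8"
print(os.environ["THEANO_FLAGS"])  # device=cuda0,floatX=float32,force_device=True
```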


In [10]:
path = '/nfs/users/nfs_l/lg18/team292/lg18/cell2location/'
sample_IDs = ["secretory", "all", "proliferative"]
# sample_IDs = ["secretory"]
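The loop in the next cell reads two files per sample, `<path><sample>/<sample>_visium_with_spatial.h5ad` and `<path><sample>/<sample>_scRNAseq.h5ad`. Since the full run takes over two hours, it may be worth checking that every input exists before starting. A minimal sketch, assuming exactly that file layout; `sample_inputs` and `missing_inputs` are hypothetical helpers, not part of scanpy or cell2location.

```python
import os


def sample_inputs(path, sample):
    """Return the Visium and scRNA-seq .h5ad paths the loop reads for one sample."""
    sample_dir = os.path.join(path, sample)
    return {
        "visium": os.path.join(sample_dir, sample + "_visium_with_spatial.h5ad"),
        "scRNAseq": os.path.join(sample_dir, sample + "_scRNAseq.h5ad"),
    }


def missing_inputs(path, sample_IDs):
    """List (sample, kind, file) for every expected input file that does not exist."""
    missing = []
    for sample in sample_IDs:
        for kind, file in sample_inputs(path, sample).items():
            if not os.path.isfile(file):
                missing.append((sample, kind, file))
    return missing


# Usage in the notebook, with `path` and `sample_IDs` from the cell above:
# for sample, kind, file in missing_inputs(path, sample_IDs):
#     print(f"{sample}: missing {kind} file {file}")
```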

In [12]:
%%time

for sample in sample_IDs:
    print(sample)
    
    # Read Visium data in anndata format
    adata_raw_spatial = sc.read(path+sample+'/' + sample + '_visium_with_spatial.h5ad')
#     adata_raw_spatial.var_names_make_unique()
    # keep only spots with at least 1000 detected genes
    sc.pp.filter_cells(adata_raw_spatial, min_genes=1000)
    # adata_raw_spatial.obs.head()
    # adata_raw_spatial.var.head()
#     # Using ENSEMBL
#     adata_raw_spatial.var['SYMBOL'] = adata_raw_spatial.var_names
#     adata_raw_spatial.var.rename(columns={'gene_ids': 'ENSEMBL'}, inplace=True)
#     adata_raw_spatial.var_names = adata_raw_spatial.var['ENSEMBL']
#     adata_raw_spatial.var.drop(columns='ENSEMBL', inplace=True)
    # Make raw
    adata_raw_spatial.raw = adata_raw_spatial.copy()
    adata_raw_spatial.obs['spotID'] = adata_raw_spatial.obs.index
    
    # Reading scRNA data in anndata format
#     scRNAseq_file = [f for f in listdir(path+sample+'/cell2location/') if isfile(join(path+sample+'/cell2location/', f)) and 'scRNAseq' in f]
    adata_raw_sc = sc.read(path+sample+'/' + sample + '_scRNAseq.h5ad')
    adata_raw_sc.raw = adata_raw_sc.copy()
    # adata_raw_sc.obs.head()
    # adata_raw_sc.var.head()
    
    # Running cell2location
    results_folder = path+sample+'/cell2location/'
    os.makedirs(os.path.join(results_folder, 'std_model'), exist_ok=True)

    r = cell2location.run_cell2location(

          # Single cell reference signatures as anndata
          # (could also be data as anndata object for estimating signatures analytically - `sc_data=adata_snrna_raw`)
          sc_data=adata_raw_sc,
          # Spatial data as anndata object
          sp_data=adata_raw_spatial,

          # the column in sc_data.obs that gives the cluster identity of each cell
          summ_sc_data_args={'cluster_col': "labels"},

          train_args={'use_raw': True, # By default uses raw slots in both of the input datasets.
                      'n_iter': 30000, # Increase the number of iterations if needed (see below)

                      # When analysing data that contains multiple samples,
                      # cell2location will select a model version which pools information across samples
                      # For details see https://cell2location.readthedocs.io/en/latest/cell2location.models.html#module-cell2location.models.CoLocationModelNB4E6V2
                      'sample_name_col': 'sample'}, # Column in sp_data.obs with Sample ID

          # Number of posterior samples to use for estimating parameters,
          # reduce if not enough GPU memory
          posterior_args={'n_samples': 1000},


          export_args={'path': results_folder + 'std_model/', # path where to save results
                       'run_name_suffix': sample # optional suffix appended to the run name
                      },

          model_kwargs={ # Prior on the number of cells, cell types and co-located combinations

                        'cell_number_prior': {
                            # Use visual inspection of the tissue image to determine
                            # the average number of cells per spot,
                            # an approximate count is good enough:
                            'cells_per_spot': 8,
                            # Prior on the number of cell types (or factors) in each spot
                            'factors_per_spot': 4,
                            # Prior on the number of correlated cell type combinations in each spot
                            'combs_per_spot': 2.5
                        },

                         # Prior on change in sensitivity between technologies
                        'gene_level_prior':{
                            # Prior on average change in expression level from scRNA-seq to spatial technology,
                            # this reflects your belief about the sensitivity of the technology in your experiment
                            'mean': 1/2,
                            # Prior on how much individual genes differ from that average,
                            # a good choice of this value should be lower than the mean
                            'sd': 1/4
                        }
          }
    )


secretory




### Summarising single cell clusters ###
### Creating model ### - time 0.02 min
### Analysis name: LocationModelLinearDependentWMultiExperiment_13clusters_2111locations_14454genessecretory
### Training model ###


Finished [100%]: Average Loss = 1.8515e+07


Finished [100%]: Average Loss = 1.8515e+07


### Sampling posterior ### - time 21.5 min




### Saving results ###




### Ploting results ###
### Plotting posterior of W / cell locations ###
Some error in plotting with scanpy or `cell2location.plt.plot_factor_spatial()`
 IndexError('index 0 is out of bounds for axis 0 with size 0')
### Done ### - time 22.15 min
all




### Summarising single cell clusters ###
### Creating model ### - time 0.06 min
### Analysis name: LocationModelLinearDependentWMultiExperiment_17clusters_6663locations_15393genesall
### Training model ###


Finished [100%]: Average Loss = 5.7291e+07


Finished [100%]: Average Loss = 5.7291e+07


### Sampling posterior ### - time 65.64 min


### Saving results ###




### Ploting results ###
### Plotting posterior of W / cell locations ###
Some error in plotting with scanpy or `cell2location.plt.plot_factor_spatial()`
 IndexError('index 0 is out of bounds for axis 0 with size 0')
### Done ### - time 67.23 min
proliferative




### Summarising single cell clusters ###
### Creating model ### - time 0.04 min
### Analysis name: LocationModelLinearDependentWMultiExperiment_14clusters_4552locations_15243genesproliferative
### Training model ###


Finished [100%]: Average Loss = 3.8556e+07


Finished [100%]: Average Loss = 3.8552e+07


### Sampling posterior ### - time 44.26 min




### Saving results ###




### Ploting results ###
### Plotting posterior of W / cell locations ###




Some error in plotting with scanpy or `cell2location.plt.plot_factor_spatial()`
 IndexError('index 0 is out of bounds for axis 0 with size 0')
### Done ### - time 45.15 min
CPU times: user 1h 45min 9s, sys: 30min 11s, total: 2h 15min 21s
Wall time: 2h 16min 30s
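The `gene_level_prior` passed to `run_cell2location` in the loop is specified as a mean (1/2) and a standard deviation (1/4). One standard way to turn such a pair into a Gamma prior is moment matching: shape = mean²/sd² and rate = mean/sd². The sketch below assumes that Gamma parameterisation purely for illustration (check the cell2location source for exactly how it converts these values); `gamma_from_mean_sd` is a hypothetical helper. It also shows why the comment recommends an sd lower than the mean: the shape is then (mean/sd)² > 1, so the prior's mode sits away from zero.

```python
import numpy as np


def gamma_from_mean_sd(mean, sd):
    """Moment-matched Gamma(shape, rate) with the given mean and standard deviation."""
    shape = mean ** 2 / sd ** 2
    rate = mean / sd ** 2
    return shape, rate


shape, rate = gamma_from_mean_sd(1 / 2, 1 / 4)  # values from gene_level_prior above
print(shape, rate)                              # 4.0 8.0

# For shape >= 1 the mode is (shape - 1) / rate, i.e. above zero when sd < mean.
print((shape - 1) / rate)                       # 0.375

# Sanity check by sampling: NumPy's gamma takes a scale, which is 1 / rate.
rng = np.random.default_rng(0)
draws = rng.gamma(shape, 1 / rate, size=200_000)
print(draws.mean(), draws.std())                # close to 0.5 and 0.25
```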
