(Courtesy of Martin Prete)

I've built the Docker and Singularity images for cell2location from the files on GitHub. The only change I made to the environment was adding jupyterlab and ipykernel so that a kernel can be created for it; the Dockerfile now creates that kernel. These changes are on my fork.

First, log in to the farm:

```
ssh lg18@farm5-login
```

Go to the notebooks folder and create a file.sh with:

```
#!/usr/bin/env bash

bsub -q gpu-normal -M200000 \
  -G team292 \
  -R"select[mem>200000] rusage[mem=200000, ngpus_physical=1.00] span[hosts=1]"  \
  -gpu "mode=shared:j_exclusive=yes" -Is \
  /software/singularity-v3.5.3/bin/singularity exec \
  --no-home  \
  --nv \
  -B /nfs/users/nfs_l/lg18/team292/lg18/gonads/data/visium/cell2location:/notebooks \
  -B /nfs/users/nfs_l/lg18/team292/lg18/gonads/data/visium/cell2location:/data \
  /nfs/cellgeni/singularity/images/cell2location-latest.sif \
  /bin/bash -c "HOME=$(mktemp -d) jupyter notebook --notebook-dir=/notebooks --NotebookApp.token='cell2loc' --ip=0.0.0.0 --port=1234 --no-browser --allow-root"
```

The first part launches an interactive job with a GPU on the farm; you probably already do this with your own scripts or with the same command.

Breaking the rest down further, it tells:
- Singularity to execute something: `/software/singularity-v3.5.3/bin/singularity exec`
- not to mount my home folder by default: `--no-home`
- to use the GPUs: `--nv`
- to mount this folder as `/notebooks` inside the container: `-B /nfs/users/nfs_l/lg18/team292/lg18/gonads/data/visium/cell2location:/notebooks` (the second `-B` mounts the same folder as `/data`)
- to launch this particular image file: `/nfs/cellgeni/singularity/images/cell2location-latest.sif`
- to run bash, set my home folder to a temp folder and start Jupyter: `/bin/bash -c "HOME=$(mktemp -d) jupyter notebook --notebook-dir=/notebooks --NotebookApp.token='cell2loc' --ip=0.0.0.0 --port=1234 --no-browser --allow-root"`
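As a rough sketch of how the pieces listed above fit together, the Singularity part of the command can be assembled from its parts in Python. The helper name `build_singularity_cmd` and its parameters are made up for illustration; the paths, flags, port and token are the ones used in the script above. The code only builds and prints the command, it does not submit anything to the farm.

```python
import shlex


def build_singularity_cmd(host_dir, image, port=1234, token="cell2loc",
                          singularity="/software/singularity-v3.5.3/bin/singularity"):
    """Return the `singularity exec ...` part of the command as a list of arguments.

    Hypothetical helper for illustration; it mirrors the flags used in file.sh.
    """
    # Command run inside the container: temp HOME, then start Jupyter.
    jupyter = (
        "HOME=$(mktemp -d) jupyter notebook --notebook-dir=/notebooks "
        f"--NotebookApp.token='{token}' --ip=0.0.0.0 --port={port} "
        "--no-browser --allow-root"
    )
    return [
        singularity, "exec",
        "--no-home",                     # do not mount the home folder
        "--nv",                          # expose the GPUs to the container
        "-B", f"{host_dir}:/notebooks",  # host folder seen as /notebooks
        "-B", f"{host_dir}:/data",       # same host folder seen as /data
        image,                           # the .sif image to launch
        "/bin/bash", "-c", jupyter,
    ]


cmd = build_singularity_cmd(
    "/nfs/users/nfs_l/lg18/team292/lg18/gonads/data/visium/cell2location",
    "/nfs/cellgeni/singularity/images/cell2location-latest.sif",
)
# shlex.join quotes each argument so the line can be pasted into a shell.
print(shlex.join(cmd))
```

To adapt it for another user, only `host_dir` (and possibly `port` and `token`) should need to change.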


In [1]:
import sys
import scanpy as sc
import anndata
import pandas as pd
import numpy as np
import os
import matplotlib.pyplot as plt
from os import listdir
from os.path import isfile, join



data_type = 'float32'
sc.settings.set_figure_params(dpi = 100, color_map = 'RdPu', dpi_save = 100,
                              vector_friendly = True, format = 'pdf',
                              facecolor='white')


# this line forces theano to use the GPU and should go before importing cell2location
os.environ["THEANO_FLAGS"] = 'device=cuda0,floatX=' + data_type + ',force_device=True'
# if using the CPU uncomment this:
#os.environ["THEANO_FLAGS"] = 'device=cpu,floatX=float32,openmp=True,force_device=True'
#os.environ["OMP_NUM_THREADS"] = '8'


import cell2location

from matplotlib import rcParams
import seaborn as sns

# uncomment the filter below to silence scanpy, which prints a lot of warnings
import warnings
# warnings.filterwarnings('ignore')

Using cuDNN version 7605 on context None
Mapped name None to device cuda0: Tesla V100-SXM2-32GB (0000:62:00.0)
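The GPU/CPU switch in the cell above is just a string assigned to `THEANO_FLAGS`. A minimal sketch of wrapping the two variants from that cell in one helper could look like this; the function name `theano_flags` is an assumption for illustration, while the flag values are copied from the cell. As in the original, the variable has to be set before cell2location is imported.

```python
import os


def theano_flags(use_gpu=True, data_type="float32"):
    """Build the THEANO_FLAGS string for GPU or CPU (hypothetical helper)."""
    if use_gpu:
        # Same flags as the GPU line in the notebook cell.
        return f"device=cuda0,floatX={data_type},force_device=True"
    # Same flags as the commented-out CPU line in the notebook cell.
    return f"device=cpu,floatX={data_type},openmp=True,force_device=True"


# Must run before `import cell2location`, otherwise the flags are ignored.
os.environ["THEANO_FLAGS"] = theano_flags(use_gpu=True)
print(os.environ["THEANO_FLAGS"])
```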


In [2]:
# pip list

In [3]:
path = '/nfs/users/nfs_l/lg18/team292/lg18/gonads/data/visium/cell2location/'
sample_IDs = ['Hrv15', 'Hrv69','Hrv3', 'Hrv27', "Hrv13", "Hrv41", "F81",
              'Hrv58', "F91","Hrv11", "F83", "F94"]
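Since the loop below runs for several hours, it can be worth checking first that every input file is in place. The sketch below assumes the layout used in the loop, `<path>/<sample>/<sample>_visium.h5ad` and `<path>/<sample>/<sample>_scRNAseq.h5ad`; the function name `missing_inputs` is made up for illustration.

```python
import os


def missing_inputs(path, sample_ids,
                   suffixes=("_visium.h5ad", "_scRNAseq.h5ad")):
    """Return the expected input files that do not exist (hypothetical helper)."""
    missing = []
    for sample in sample_ids:
        for suffix in suffixes:
            # Layout assumed from the loop: <path>/<sample>/<sample><suffix>
            candidate = os.path.join(path, sample, sample + suffix)
            if not os.path.isfile(candidate):
                missing.append(candidate)
    return missing


# Example call with two of the sample IDs from the cell above.
example_path = "/nfs/users/nfs_l/lg18/team292/lg18/gonads/data/visium/cell2location/"
for f in missing_inputs(example_path, ["Hrv15", "Hrv69"]):
    print("missing:", f)
```

An empty result means all files were found; anything printed should be fixed before starting the run.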

In [5]:
%%time

for sample in sample_IDs:
    print(sample)
    
    # Reading Visium data in anndata format
    adata_raw_spatial = sc.read(path+sample+'/' + sample + '_visium.h5ad')
    adata_raw_spatial.raw = adata_raw_spatial.copy()
    adata_raw_spatial.obs['spotID'] = adata_raw_spatial.obs.index
    # adata_raw_spatial.obs.head()
    # adata_raw_spatial.var.head()
    
    # Reading scRNA data in anndata format
#     scRNAseq_file = [f for f in listdir(path+sample+'/cell2location/') if isfile(join(path+sample+'/cell2location/', f)) and 'scRNAseq' in f]
    adata_raw_sc = sc.read(path+sample+'/' + sample + '_scRNAseq.h5ad')
    adata_raw_sc.raw = adata_raw_sc.copy()
    # adata_raw_sc.obs.head()
    # adata_raw_sc.var.head()
    
    # Running cell2location
    results_folder = path+sample+'/cell2location/'
    os.makedirs(results_folder + 'std_model', exist_ok=True)

    r = cell2location.run_cell2location(

          # Single cell reference signatures as anndata
          # (could also be data as anndata object for estimating signatures analytically - `sc_data=adata_snrna_raw`)
          sc_data=adata_raw_sc,
          # Spatial data as anndata object
          sp_data=adata_raw_spatial,

          # the column in sc_data.obs that gives cluster identity of each cell
          summ_sc_data_args={'cluster_col': "labels"},

          train_args={'use_raw': True, # By default uses raw slots in both of the input datasets.
                      'n_iter': 30000, # Increase the number of iterations if needed (see below)

                      # When analysing data that contains multiple samples,
                      # cell2location will select a model version which pools information across samples
                      # For details see https://cell2location.readthedocs.io/en/latest/cell2location.models.html#module-cell2location.models.CoLocationModelNB4E6V2
                      'sample_name_col': 'sample'}, # Column in sp_data.obs with Sample ID

          # Number of posterior samples to use for estimating parameters,
          # reduce if not enough GPU memory
          posterior_args={'n_samples': 1000},


          export_args={'path': results_folder + 'std_model/', # path where to save results
                       'run_name_suffix': sample # optional suffix to modify the name of the run
                      },

          model_kwargs={ # Prior on the number of cells, cell types and co-located combinations

                        'cell_number_prior': {
                            # Use visual inspection of the tissue image to determine
                            # the average number of cells per spot,
                            # an approximate count is good enough:
#                             'cells_per_spot': 8,
                            'cells_per_spot': 20,
                            # Prior on the number of cell types (or factors) in each spot
                            'factors_per_spot': 4,
                            # Prior on the number of correlated cell type combinations in each spot
                            'combs_per_spot': 2.5
                        },

                         # Prior on change in sensitivity between technologies
                        'gene_level_prior':{
                            # Prior on average change in expression level from scRNA-seq to spatial technology,
                            # this reflects your belief about the sensitivity of the technology in your experiment
                            'mean': 1/2,
                            # Prior on how much individual genes differ from that average,
                            # a good choice of this value should be lower than the mean
                            'sd': 1/4
                        }
          }
    )


Hrv15


Variable names are not unique. To make them unique, call `.var_names_make_unique`.
Variable names are not unique. To make them unique, call `.var_names_make_unique`.


### Summarising single cell clusters ###
### Creating model ### - time 0.01 min
### Analysis name: LocationModelLinearDependentW_1experiments_13clusters_1338locations_14284genesHrv15
### Training model ###


Finished [100%]: Average Loss = 1.476e+07
Finished [100%]: Average Loss = 1.476e+07
### Sampling posterior ### - time 14.64 min
### Saving results ###
### Ploting results ###
### Plotting posterior of W / cell locations ###
### Done ### - time 15.51 min
Hrv69


Variable names are not unique. To make them unique, call `.var_names_make_unique`.
Variable names are not unique. To make them unique, call `.var_names_make_unique`.


### Summarising single cell clusters ###
### Creating model ### - time 0.02 min
### Analysis name: LocationModelLinearDependentWMultiExperiment_2experiments_21clusters_3765locations_16078genesHrv69
### Training model ###


Finished [100%]: Average Loss = 3.176e+07
Finished [100%]: Average Loss = 3.1762e+07
### Sampling posterior ### - time 39.15 min
### Saving results ###
### Ploting results ###
### Plotting posterior of W / cell locations ###
### Done ### - time 41.68 min
Hrv3


Variable names are not unique. To make them unique, call `.var_names_make_unique`.
Variable names are not unique. To make them unique, call `.var_names_make_unique`.


### Summarising single cell clusters ###
### Creating model ### - time 0.01 min
### Analysis name: LocationModelLinearDependentW_1experiments_14clusters_759locations_15412genesHrv3
### Training model ###


Finished [100%]: Average Loss = 9.0513e+06
Finished [100%]: Average Loss = 9.0512e+06
### Sampling posterior ### - time 10.61 min
### Saving results ###
### Ploting results ###
### Plotting posterior of W / cell locations ###
### Done ### - time 11.39 min
Hrv27


Variable names are not unique. To make them unique, call `.var_names_make_unique`.
Variable names are not unique. To make them unique, call `.var_names_make_unique`.


### Summarising single cell clusters ###
### Creating model ### - time 0.01 min
### Analysis name: LocationModelLinearDependentW_1experiments_15clusters_1150locations_14107genesHrv27
### Training model ###


Finished [100%]: Average Loss = 7.9492e+06
Finished [100%]: Average Loss = 7.9461e+06
### Sampling posterior ### - time 13.12 min
### Saving results ###
### Ploting results ###
### Plotting posterior of W / cell locations ###
### Done ### - time 14.03 min
Hrv13


Variable names are not unique. To make them unique, call `.var_names_make_unique`.
Variable names are not unique. To make them unique, call `.var_names_make_unique`.


### Summarising single cell clusters ###
### Creating model ### - time 0.01 min
### Analysis name: LocationModelLinearDependentW_1experiments_18clusters_448locations_15344genesHrv13
### Training model ###


Finished [100%]: Average Loss = 4.6538e+06
Finished [100%]: Average Loss = 4.6535e+06
### Sampling posterior ### - time 7.81 min
### Saving results ###
### Ploting results ###
### Plotting posterior of W / cell locations ###
### Done ### - time 8.67 min
Hrv41


Variable names are not unique. To make them unique, call `.var_names_make_unique`.
Variable names are not unique. To make them unique, call `.var_names_make_unique`.


### Summarising single cell clusters ###
### Creating model ### - time 0.0 min
### Analysis name: LocationModelLinearDependentW_1experiments_13clusters_502locations_15097genesHrv41
### Training model ###


Finished [100%]: Average Loss = 5.2509e+06
Finished [100%]: Average Loss = 5.2509e+06
### Sampling posterior ### - time 8.31 min
### Saving results ###
### Ploting results ###
### Plotting posterior of W / cell locations ###
### Done ### - time 9.0 min
F81


Variable names are not unique. To make them unique, call `.var_names_make_unique`.
Variable names are not unique. To make them unique, call `.var_names_make_unique`.


### Summarising single cell clusters ###
### Creating model ### - time 0.01 min
### Analysis name: LocationModelLinearDependentW_1experiments_17clusters_351locations_15047genesF81
### Training model ###


Finished [100%]: Average Loss = 3.889e+06
Finished [100%]: Average Loss = 3.8891e+06
### Sampling posterior ### - time 7.18 min
### Saving results ###
### Ploting results ###
### Plotting posterior of W / cell locations ###
### Done ### - time 7.9 min
Hrv58


Variable names are not unique. To make them unique, call `.var_names_make_unique`.
Variable names are not unique. To make them unique, call `.var_names_make_unique`.


### Summarising single cell clusters ###
### Creating model ### - time 0.01 min
### Analysis name: LocationModelLinearDependentWMultiExperiment_2experiments_21clusters_2963locations_16078genesHrv58
### Training model ###


Finished [100%]: Average Loss = 1.8497e+07
Finished [100%]: Average Loss = 1.8497e+07
### Sampling posterior ### - time 31.59 min
### Saving results ###
### Ploting results ###
### Plotting posterior of W / cell locations ###
### Done ### - time 34.0 min
F91


Variable names are not unique. To make them unique, call `.var_names_make_unique`.
Variable names are not unique. To make them unique, call `.var_names_make_unique`.


### Summarising single cell clusters ###
### Creating model ### - time 0.02 min
### Analysis name: LocationModelLinearDependentWMultiExperiment_2experiments_22clusters_797locations_15288genesF91
### Training model ###


Finished [100%]: Average Loss = 1.1119e+07
Finished [100%]: Average Loss = 1.1119e+07
### Sampling posterior ### - time 10.91 min
### Saving results ###
### Ploting results ###
### Plotting posterior of W / cell locations ###
### Done ### - time 12.7 min
Hrv11


Variable names are not unique. To make them unique, call `.var_names_make_unique`.
Variable names are not unique. To make them unique, call `.var_names_make_unique`.


### Summarising single cell clusters ###
### Creating model ### - time 0.01 min
### Analysis name: LocationModelLinearDependentWMultiExperiment_2experiments_21clusters_1051locations_16065genesHrv11
### Training model ###


Finished [100%]: Average Loss = 9.9704e+06
Finished [100%]: Average Loss = 9.9708e+06
### Sampling posterior ### - time 14.1 min
### Saving results ###
### Ploting results ###
### Plotting posterior of W / cell locations ###
### Done ### - time 15.74 min
F83


Variable names are not unique. To make them unique, call `.var_names_make_unique`.
Variable names are not unique. To make them unique, call `.var_names_make_unique`.


### Summarising single cell clusters ###
### Creating model ### - time 0.02 min
### Analysis name: LocationModelLinearDependentWMultiExperiment_2experiments_21clusters_4790locations_15228genesF83
### Training model ###


Finished [100%]: Average Loss = 4.0417e+07
Finished [100%]: Average Loss = 4.0416e+07
### Sampling posterior ### - time 45.55 min
### Saving results ###
### Ploting results ###
### Plotting posterior of W / cell locations ###
### Done ### - time 48.29 min
F94


Variable names are not unique. To make them unique, call `.var_names_make_unique`.
Variable names are not unique. To make them unique, call `.var_names_make_unique`.


### Summarising single cell clusters ###
### Creating model ### - time 0.01 min
### Analysis name: LocationModelLinearDependentWMultiExperiment_2experiments_12clusters_1465locations_14203genesF94
### Training model ###


Finished [100%]: Average Loss = 1.3945e+07
Finished [100%]: Average Loss = 1.3945e+07
### Sampling posterior ### - time 16.23 min
### Saving results ###
### Ploting results ###
### Plotting posterior of W / cell locations ###
### Done ### - time 17.35 min
CPU times: user 3h 4min 8s, sys: 51min 11s, total: 3h 55min 19s
Wall time: 3h 57min 50s
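
The log above reports a runtime for each sample, and the 12 `### Done ###` times add up to about 236 minutes, close to the reported wall time. A small sketch for pulling these numbers out of a saved copy of the log is shown below. It assumes the log keeps the `### Analysis name: ...` and `### Done ### - time ... min` line formats seen above, and that the text after `genes` in the analysis name is the `run_name_suffix` (here the sample ID). The function name `summarise_log` is hypothetical.

```python
import re

# Patterns taken from the log lines printed above.
ANALYSIS_RE = re.compile(
    r"### Analysis name: \w+?_(\d+)experiments_(\d+)clusters_"
    r"(\d+)locations_(\d+)genes(\w*)"
)
DONE_RE = re.compile(r"### Done ### - time ([\d.]+) min")


def summarise_log(log_text):
    """Return one dict per finished run: sample, model sizes and minutes."""
    runs, current = [], None
    for line in log_text.splitlines():
        m = ANALYSIS_RE.search(line)
        if m:
            current = {
                "sample": m.group(5),
                "experiments": int(m.group(1)),
                "clusters": int(m.group(2)),
                "locations": int(m.group(3)),
                "genes": int(m.group(4)),
            }
            continue
        m = DONE_RE.search(line)
        if m and current is not None:
            current["minutes"] = float(m.group(1))
            runs.append(current)
            current = None
    return runs


# Two runs copied from the output above.
log = """\
### Analysis name: LocationModelLinearDependentW_1experiments_13clusters_1338locations_14284genesHrv15
### Done ### - time 15.51 min
### Analysis name: LocationModelLinearDependentWMultiExperiment_2experiments_21clusters_3765locations_16078genesHrv69
### Done ### - time 41.68 min
"""
runs = summarise_log(log)
for r in runs:
    print(r["sample"], r["locations"], "locations,", r["minutes"], "min")
total = sum(r["minutes"] for r in runs)
print("total minutes:", round(total, 2))
```

Tabulating the runs this way makes it easy to see that the longest runs (F83, Hrv69, Hrv58) are also the ones with the most locations.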
