(Courtesy of Martin Prete)

I've built the Docker and Singularity images for cell2location from the files on GitHub. The only change I made to the environment was adding jupyterlab and ipykernel so that a kernel can be created for it; the Dockerfile now creates that kernel. These changes are on my fork.

First you need to log in to the farm:

```
ssh lg18@farm5-login
```

Go to the notebooks folder and create a file.sh with:

```
#!/usr/bin/env bash

bsub -q gpu-normal -M200000 \
  -G team292 \
  -R"select[mem>200000] rusage[mem=200000, ngpus_physical=1.00] span[hosts=1]"  \
  -gpu "mode=shared:j_exclusive=yes" -Is \
  /software/singularity-v3.5.3/bin/singularity exec \
  --no-home  \
  --nv \
  -B /nfs/users/nfs_l/lg18/team292/lg18/gonads/data/visium/cell2location:/notebooks \
  -B /nfs/users/nfs_l/lg18/team292/lg18/gonads/data/visium/cell2location:/data \
  /nfs/cellgeni/singularity/images/cell2location-latest.sif \
  /bin/bash -c "HOME=$(mktemp -d) jupyter notebook --notebook-dir=/notebooks --NotebookApp.token='cell2loc' --ip=0.0.0.0 --port=1234 --no-browser --allow-root"
```

The first part launches an interactive job with a GPU on the farm; you probably do this already with your own scripts or with the same command.

Breaking it down further, it tells:
- singularity to execute something `/software/singularity-v3.5.3/bin/singularity exec`
- don't mount my home folder by default `--no-home`
- use GPUs `--nv`
- mount this folder as `/notebooks` inside the container `-B /nfs/users/nfs_l/lg18/team292/lg18/gonads/data/visium/cell2location:/notebooks`
- mount the same folder as `/data` inside the container as well `-B /nfs/users/nfs_l/lg18/team292/lg18/gonads/data/visium/cell2location:/data`
- launch this particular image file `/nfs/cellgeni/singularity/images/cell2location-latest.sif`
- now run bash, set my home folder to a temp folder and start jupyter `/bin/bash -c "HOME=$(mktemp -d) jupyter notebook --notebook-dir=/notebooks --NotebookApp.token='cell2loc' --ip=0.0.0.0 --port=1234 --no-browser --allow-root"`
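
Once the job starts, Jupyter listens on port 1234 of the node running the job, with the fixed token `cell2loc` set in the command. As a minimal sketch, the address to open in a browser can be assembled as below. `notebook_url` is a hypothetical helper and `farm-node-01` is a made-up placeholder, so substitute the host your job actually lands on.

```python
from urllib.parse import urlencode


def notebook_url(host, port=1234, token="cell2loc"):
    """Build the URL for the Jupyter server started by file.sh.

    port and token default to the values passed to `jupyter notebook`
    in the script; host is whichever node the job is scheduled on.
    """
    return f"http://{host}:{port}/?{urlencode({'token': token})}"


# 'farm-node-01' is a placeholder, not a real hostname
print(notebook_url("farm-node-01"))
```

If the node is not directly reachable from your machine, the same URL should work through whatever port forwarding you normally use for the farm.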


In [1]:
import sys
import scanpy as sc
import anndata
import pandas as pd
import numpy as np
import os
import matplotlib.pyplot as plt
from os import listdir
from os.path import isfile, join



data_type = 'float32'
sc.settings.set_figure_params(dpi = 100, color_map = 'RdPu', dpi_save = 100,
                              vector_friendly = True, format = 'pdf',
                              facecolor='white')


# this line forces theano to use the GPU and should go before importing cell2location
os.environ["THEANO_FLAGS"] = 'device=cuda0,floatX=' + data_type + ',force_device=True'
# if using the CPU uncomment this:
#os.environ["THEANO_FLAGS"] = 'device=cpu,floatX=float32,openmp=True,force_device=True'
#os.environ["OMP_NUM_THREADS"] = '8'


import cell2location

from matplotlib import rcParams
import seaborn as sns

# silence scanpy that prints a lot of warnings
import warnings
# warnings.filterwarnings('ignore')

Using cuDNN version 7605 on context None
Mapped name None to device cuda0: Tesla V100-SXM2-16GB (0000:07:00.0)
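
Theano reads `THEANO_FLAGS` once, when it is first imported, which is why the variable has to be set before `import cell2location`. A minimal sketch of switching between the GPU and CPU variants from the cell above follows; `theano_flags` is a hypothetical helper, and the flag values are the ones already used in that cell.

```python
import os


def theano_flags(use_gpu=True, data_type="float32"):
    """Return a THEANO_FLAGS string matching the two variants in the cell above."""
    if use_gpu:
        return f"device=cuda0,floatX={data_type},force_device=True"
    return f"device=cpu,floatX={data_type},openmp=True,force_device=True"


# Set the variable before the first `import cell2location`;
# changing it afterwards has no effect on the already-imported Theano.
os.environ["THEANO_FLAGS"] = theano_flags(use_gpu=True)
print(os.environ["THEANO_FLAGS"])
```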


In [2]:
# pip list

In [5]:
path = '/nfs/users/nfs_l/lg18/team292/lg18/gonads/data/visium/cell2location/'
sample_IDs = ["F91","Hrv11",'females',  "F94", "Hrv11", "F83", "Hrv41", "F81"]
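
Note that "Hrv11" appears twice in the list above, so the loop below processes that sample twice. A minimal sketch of dropping repeated IDs and checking that the input files exist before starting a multi-hour run could look like this. `missing_inputs` is a hypothetical helper; the file naming follows the `sc.read` calls in the next cell.

```python
import os

path = '/nfs/users/nfs_l/lg18/team292/lg18/gonads/data/visium/cell2location/'
sample_IDs = ["F91", "Hrv11", "females", "F94", "Hrv11", "F83", "Hrv41", "F81"]

# dict.fromkeys keeps the first occurrence of each ID and preserves order
unique_IDs = list(dict.fromkeys(sample_IDs))


def missing_inputs(base, samples):
    """List the expected .h5ad inputs that are not present on disk."""
    expected = [
        os.path.join(base, s, f"{s}{suffix}")
        for s in samples
        for suffix in ("_visium.h5ad", "_scRNAseq.h5ad")
    ]
    return [f for f in expected if not os.path.isfile(f)]


print(unique_IDs)
print(missing_inputs(path, unique_IDs))
```

An empty second list means every sample has both its Visium and its scRNA-seq file in place.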

In [6]:
%%time

for sample in sample_IDs:
    print(sample)
    
    # Reading Visium data in anndata format
    adata_raw_spatial = sc.read(path+sample+'/' + sample + '_visium.h5ad')
    adata_raw_spatial.raw = adata_raw_spatial.copy()
    adata_raw_spatial.obs['spotID'] = adata_raw_spatial.obs.index
    # adata_raw_spatial.obs.head()
    # adata_raw_spatial.var.head()
    
    # Reading scRNA data in anndata format
#     scRNAseq_file = [f for f in listdir(path+sample+'/cell2location/') if isfile(join(path+sample+'/cell2location/', f)) and 'scRNAseq' in f]
    adata_raw_sc = sc.read(path+sample+'/' + sample + '_scRNAseq.h5ad')
    adata_raw_sc.raw = adata_raw_sc.copy()
    # adata_raw_sc.obs.head()
    # adata_raw_sc.var.head()
    
    # Running cell2location
    results_folder = path+sample+'/cell2location/'
    # results_folder already ends with '/', so no extra separator is needed
    os.makedirs(results_folder + 'std_model', exist_ok=True)

    r = cell2location.run_cell2location(

          # Single cell reference signatures as anndata
          # (could also be data as anndata object for estimating signatures analytically - `sc_data=adata_snrna_raw`)
          sc_data=adata_raw_sc,
          # Spatial data as anndata object
          sp_data=adata_raw_spatial,

          # the column in sc_data.obs that gives cluster identity of each cell
          summ_sc_data_args={'cluster_col': "labels"},

          train_args={'use_raw': True, # By default uses raw slots in both of the input datasets.
                      'n_iter': 30000, # Increase the number of iterations if needed (see below)

                      # When analysing data that contains multiple samples,
                      # cell2location will select a model version which pools information across samples
                      # For details see https://cell2location.readthedocs.io/en/latest/cell2location.models.html#module-cell2location.models.CoLocationModelNB4E6V2
                      'sample_name_col': 'sample'}, # Column in sp_data.obs with Sample ID

          # Number of posterior samples to use for estimating parameters,
          # reduce if not enough GPU memory
          posterior_args={'n_samples': 1000},


          export_args={'path': results_folder + 'std_model/', # path where to save results
                       'run_name_suffix': sample # optional suffix to modify the name of the run
                      },

          model_kwargs={ # Prior on the number of cells, cell types and co-located combinations

                        'cell_number_prior': {
                            # Use visual inspection of the tissue image to determine
                            # the average number of cells per spot,
                            # an approximate count is good enough:
#                             'cells_per_spot': 8,
                            'cells_per_spot': 20,
                            # Prior on the number of cell types (or factors) in each spot
                            'factors_per_spot': 4,
                            # Prior on the number of correlated cell type combinations in each spot
                            'combs_per_spot': 2.5
                        },

                         # Prior on change in sensitivity between technologies
                        'gene_level_prior':{
                            # Prior on average change in expression level from scRNA-seq to spatial technology,
                            # this reflects your belief about the sensitivity of the technology in your experiment
                            'mean': 1/2,
                            # Prior on how much individual genes differ from that average,
                            # a good choice of this value should be lower than the mean
                            'sd': 1/4
                        }
          }
    )


F91
Variable names are not unique. To make them unique, call `.var_names_make_unique`.
Variable names are not unique. To make them unique, call `.var_names_make_unique`.
### Summarising single cell clusters ###
### Creating model ### - time 0.01 min
### Analysis name: LocationModelLinearDependentWMultiExperiment_24clusters_797locations_15186genesF91
### Training model ###
Finished [100%]: Average Loss = 1.1113e+07
Finished [100%]: Average Loss = 1.1114e+07
### Sampling posterior ### - time 10.1 min
### Saving results ###
### Ploting results ###
### Plotting posterior of W / cell locations ###
### Done ### - time 10.7 min
Hrv11
Variable names are not unique. To make them unique, call `.var_names_make_unique`.
Variable names are not unique. To make them unique, call `.var_names_make_unique`.
### Summarising single cell clusters ###
### Creating model ### - time 0.0 min
### Analysis name: LocationModelLinearDependentWMultiExperiment_23clusters_1051locations_15427genesHrv11
### Training model ###
Finished [100%]: Average Loss = 9.8947e+06
Finished [100%]: Average Loss = 9.8947e+06
### Sampling posterior ### - time 12.47 min
### Saving results ###
### Ploting results ###
### Plotting posterior of W / cell locations ###
### Done ### - time 13.04 min
females
Variable names are not unique. To make them unique, call `.var_names_make_unique`.
Variable names are not unique. To make them unique, call `.var_names_make_unique`.
### Summarising single cell clusters ###
### Creating model ### - time 0.05 min
### Analysis name: LocationModelLinearDependentWMultiExperiment_25clusters_6638locations_15456genesfemales
### Training model ###
Finished [100%]: Average Loss = 6.1664e+07
Finished [100%]: Average Loss = 6.1658e+07
### Sampling posterior ### - time 57.36 min
### Saving results ###
### Ploting results ###
### Plotting posterior of W / cell locations ###
### Done ### - time 59.2 min
F94
Variable names are not unique. To make them unique, call `.var_names_make_unique`.
Variable names are not unique. To make them unique, call `.var_names_make_unique`.
### Summarising single cell clusters ###
### Creating model ### - time 0.01 min
### Analysis name: LocationModelLinearDependentWMultiExperiment_17clusters_1465locations_13835genesF94
### Training model ###
Finished [100%]: Average Loss = 1.3877e+07
Finished [100%]: Average Loss = 1.388e+07
### Sampling posterior ### - time 14.51 min
### Saving results ###
### Ploting results ###
### Plotting posterior of W / cell locations ###
### Done ### - time 15.03 min
Hrv11
Variable names are not unique. To make them unique, call `.var_names_make_unique`.
Variable names are not unique. To make them unique, call `.var_names_make_unique`.
### Summarising single cell clusters ###
### Creating model ### - time 0.0 min
### Analysis name: LocationModelLinearDependentWMultiExperiment_23clusters_1051locations_15427genesHrv11
### Training model ###
Finished [100%]: Average Loss = 9.8947e+06
Finished [100%]: Average Loss = 9.8946e+06
### Sampling posterior ### - time 12.53 min
### Saving results ###
### Ploting results ###
### Plotting posterior of W / cell locations ###
### Done ### - time 13.13 min
F83
Variable names are not unique. To make them unique, call `.var_names_make_unique`.
Variable names are not unique. To make them unique, call `.var_names_make_unique`.
### Summarising single cell clusters ###
### Creating model ### - time 0.03 min
### Analysis name: LocationModelLinearDependentWMultiExperiment_19clusters_4790locations_15184genesF83
### Training model ###
Finished [100%]: Average Loss = 4.041e+07
Finished [100%]: Average Loss = 4.0405e+07
### Sampling posterior ### - time 40.75 min
### Saving results ###
### Ploting results ###
### Plotting posterior of W / cell locations ###
### Done ### - time 41.92 min
Hrv41
Variable names are not unique. To make them unique, call `.var_names_make_unique`.
Variable names are not unique. To make them unique, call `.var_names_make_unique`.
### Summarising single cell clusters ###
### Creating model ### - time 0.0 min
### Analysis name: LocationModelLinearDependentW_18clusters_502locations_15073genesHrv41
### Training model ###
Finished [100%]: Average Loss = 5.241e+06
Finished [100%]: Average Loss = 5.2412e+06
### Sampling posterior ### - time 8.65 min
### Saving results ###
### Ploting results ###
### Plotting posterior of W / cell locations ###
### Done ### - time 9.03 min
F81
Variable names are not unique. To make them unique, call `.var_names_make_unique`.
Variable names are not unique. To make them unique, call `.var_names_make_unique`.
### Summarising single cell clusters ###
### Creating model ### - time 0.01 min
### Analysis name: LocationModelLinearDependentW_21clusters_351locations_14668genesF81
### Training model ###
Finished [100%]: Average Loss = 3.8643e+06
Finished [100%]: Average Loss = 3.8643e+06
### Sampling posterior ### - time 6.68 min
### Saving results ###
### Ploting results ###
### Plotting posterior of W / cell locations ###
### Done ### - time 7.05 min
CPU times: user 2h 35min 36s, sys: 38min 25s, total: 3h 14min 2s
Wall time: 3h 15min 54s
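
To see where the time went, the per-sample runtimes can be pulled from the `### Done ###` lines. This is only a sketch: it assumes the cell output has been saved as plain text, and here the relevant lines are simply pasted into a string, with sample names and times copied from the output above.

```python
import re

# Sample names and "Done" lines copied from the output above
log = """\
F91
### Done ### - time 10.7 min
Hrv11
### Done ### - time 13.04 min
females
### Done ### - time 59.2 min
F94
### Done ### - time 15.03 min
Hrv11
### Done ### - time 13.13 min
F83
### Done ### - time 41.92 min
Hrv41
### Done ### - time 9.03 min
F81
### Done ### - time 7.05 min
"""

done = re.compile(r"### Done ### - time ([\d.]+) min")
runs = []       # (sample, minutes) in run order
current = None
for line in log.splitlines():
    m = done.match(line)
    if m:
        runs.append((current, float(m.group(1))))
    elif line and not line.startswith("###"):
        current = line  # a bare line is the sample name printed by the loop

for sample, minutes in runs:
    print(f"{sample}: {minutes} min")
print(f"total: {sum(t for _, t in runs):.2f} min")
```

The reported times add up to about 169 minutes of the roughly 196-minute wall time. The two Hrv11 runs together account for about 26 minutes, and the two largest inputs (females with 6638 locations and F83 with 4790) take the longest.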
