 ![CellphoneDB Logo](https://www.cellphonedb.org/images/cellphonedb_logo_33.png) | CellphoneDB is a publicly available repository of curated receptors, ligands and their interactions.

# CellPhoneDB method 1

In this example we use method 1 (`cpdb_analysis_method`) to study how cell-cell interactions change between a subset of immune cells and trophoblast cells as the trophoblasts differentiate and invade the maternal uterus. This method calculates the mean expression of the interacting partners (the proteins participating in the interaction) that are expressed in more than a `threshold` fraction of the cells in each cluster (for example, `threshold = 0.1` corresponds to 10% of the cells).
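
To make the `threshold` filter concrete, here is a minimal sketch on made-up data. It is not CellPhoneDB's actual implementation: it assumes a gene counts as expressed in a cell when its value is greater than zero, and it uses an illustrative threshold of 0.5. The table, cluster names and gene names are hypothetical.

```python
import pandas as pd

# Toy normalised counts: rows are cells, columns are genes (hypothetical values).
counts = pd.DataFrame(
    {
        "GENE_A": [1.2, 0.6, 0.8, 0.0, 0.0, 0.0],
        "GENE_B": [0.0, 0.0, 2.0, 1.0, 0.0, 1.0],
    },
    index=["cell1", "cell2", "cell3", "cell4", "cell5", "cell6"],
)
# Cluster label of each cell (hypothetical cluster names).
clusters = pd.Series(
    ["c1", "c1", "c2", "c2", "c2", "c2"], index=counts.index, name="cell_type"
)

threshold = 0.5  # illustrative only; the call in this tutorial uses 0.1

# Fraction of cells in each cluster with non-zero expression of each gene.
fraction_expressing = (counts > 0).groupby(clusters).mean()

# Mean expression per cluster, set to 0 where too few cells express the gene.
cluster_means = counts.groupby(clusters).mean()
filtered_means = cluster_means.where(fraction_expressing > threshold, 0.0)

print(filtered_means)
```

In this toy table `GENE_A` is kept in cluster `c1` (all cells express it) but zeroed in `c2` (one cell in four), while `GENE_B` is kept only in `c2`.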

This notebook assumes that you know either how to download the CellPhoneDB database or how to create your own database. If this is not the case, please check `T0_BuildDBfromFiles.ipynb` or `T0_DownloadDB.ipynb`. In this notebook we will explain how to run CellPhoneDB with the **basic analysis method** (method 1).

> This method does not test for interaction significance. If you need this, please use method 2 (`cpdb_statistical_analysis_method`).

### Check Python version

In [1]:
import pandas as pd
import sys
import os

pd.set_option('display.max_columns', 100)
os.chdir('/home/jovyan/cpdb_tutorial')

Check that the environment contains Python >= 3.8, as required by CellPhoneDB.

In [2]:
print(sys.version)

3.9.13 (main, Oct 13 2022, 21:15:33) 
[GCC 11.2.0]
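
Instead of reading the printed version by eye, the requirement can be checked programmatically. The helper name `python_is_supported` below is hypothetical; the minimum version is the one stated in this notebook.

```python
import sys

MIN_PYTHON = (3, 8)  # minimum version stated in this tutorial


def python_is_supported(version_info=sys.version_info, minimum=MIN_PYTHON):
    """Return True if the (major, minor) version is at least `minimum`."""
    return tuple(version_info[:2]) >= minimum


if not python_is_supported():
    print(f"Python {MIN_PYTHON[0]}.{MIN_PYTHON[1]} or newer is required.")
```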


___
### Install CellPhoneDB

Install the latest version of CellPhoneDB in the current conda environment. \
Remove the `--quiet` flag if you want to see a detailed description of the installation process.

> pip install --quiet cellphonedb
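
After installing, it can be useful to confirm which version ended up in the environment. The sketch below uses the standard library module `importlib.metadata`; the helper name `installed_version` is hypothetical, and it returns `None` when the package is not installed.

```python
from importlib import metadata


def installed_version(package_name):
    """Return the installed version string of a package, or None if absent."""
    try:
        return metadata.version(package_name)
    except metadata.PackageNotFoundError:
        return None


print(installed_version("cellphonedb"))
```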

___
### Input files
The basic analysis method accepts 4 input files (3 mandatory).
- **cpdb_file_path**: (mandatory) path to the database `cellphonedb.zip`.
- **meta_file_path**: (mandatory) path to the meta file linking cell barcodes to cluster labels `metadata.tsv`.
- **counts_file_path**: (mandatory) path to the normalized counts file (not z-transformed), either in text format or h5ad (recommended) `normalised_log_counts.h5ad`.
- **microenvs_file_path**: (optional) path to the microenvironment file that groups cell types/clusters by microenvironments. When providing a microenvironment file, CellPhoneDB will restrict the interactions to those cells within the same microenvironment.

The `microenvs_file_path` content will depend on the biological question that the researcher wants to answer.

> In this **example** we are studying how cell-cell interactions change between a subset of immune cells and trophoblast cells as the trophoblasts differentiate and invade the maternal uterus. Method 1 does not permute cluster labels and does not compute P-values; for each pair of clusters it reports the mean of the average receptor expression level in one cluster and the average ligand expression level in the interacting cluster. When a microenvironment file (`microenvs_file_path`) is provided, only clusters within the same microenvironment are paired.

In [3]:
cpdb_file_path = 'db/v4.1.0/cellphonedb.zip'
meta_file_path = 'data/metadata.tsv'
counts_file_path = 'data/normalised_log_counts.h5ad'
microenvs_file_path = 'data/microenvironment.tsv'
out_path = 'results/method1'
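
Before going further, it may help to confirm that the input paths exist, so that a typo is caught early. This is a small sketch using only the standard library; the helper name `missing_inputs` is hypothetical and the paths are the ones defined above.

```python
from pathlib import Path


def missing_inputs(paths):
    """Return the subset of `paths` that do not exist on disk."""
    return [str(path) for path in paths if not Path(path).exists()]


input_paths = [
    "db/v4.1.0/cellphonedb.zip",
    "data/metadata.tsv",
    "data/normalised_log_counts.h5ad",
    "data/microenvironment.tsv",
]
print(missing_inputs(input_paths))
```

An empty list means every file was found relative to the current working directory.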

### Inspect input files

<span style="color:green">**1)**</span> The **metadata** file is composed of two columns:
- **barcode_sample**: this column indicates the barcode of each cell in the experiment.
- **cell_type**: this column denotes the cell label assigned.

In [4]:
metadata = pd.read_csv(meta_file_path, sep = '\t')
metadata.head(3)

Unnamed: 0,barcode_sample,cell_type
0,AGCGATTAGTCTAACC-1_Pla_HDBR10917733,B_cells
1,ATCCGTGAGGCTAGAA-1_Pla_Camb10714918,B_cells
2,AGTAACCCATTAAAGG-1_Pla_HDBR10917733,B_cells
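
A quick sanity check of the metadata table can catch common problems before running the analysis. The sketch below is an assumption about what is worth checking (required columns, duplicated barcodes, missing labels), not a validation performed by CellPhoneDB itself; the helper name `metadata_problems` is hypothetical. The toy table reuses the three rows displayed above.

```python
import pandas as pd

REQUIRED_COLUMNS = ["barcode_sample", "cell_type"]  # column names shown above


def metadata_problems(table):
    """Return a list of human-readable problems found in a metadata table."""
    missing = [col for col in REQUIRED_COLUMNS if col not in table.columns]
    if missing:
        return [f"missing columns: {missing}"]
    problems = []
    if table["barcode_sample"].duplicated().any():
        problems.append("duplicated barcodes")
    if table["cell_type"].isna().any():
        problems.append("cells without a cell_type label")
    return problems


toy_metadata = pd.DataFrame(
    {
        "barcode_sample": [
            "AGCGATTAGTCTAACC-1_Pla_HDBR10917733",
            "ATCCGTGAGGCTAGAA-1_Pla_Camb10714918",
            "AGTAACCCATTAAAGG-1_Pla_HDBR10917733",
        ],
        "cell_type": ["B_cells", "B_cells", "B_cells"],
    }
)
print(metadata_problems(toy_metadata))  # prints []
```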


<span style="color:green">**2)**</span> The **counts** file is a scanpy h5ad object. The dimensions and order of this object must coincide with those of the metadata file (i.e. both files must contain the same number of cells).

In [5]:
import anndata

adata = anndata.read_h5ad(counts_file_path)
adata.shape

(3312, 30800)

Check that the barcodes in the metadata and in the counts are the same.

In [6]:
# list.sort() sorts in place and returns None, so compare sorted copies instead.
sorted(adata.obs.index) == sorted(metadata['barcode_sample'])

True
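
When the two sets of barcodes do not match, a plain `True`/`False` does not say which barcodes differ. The following sketch reports them explicitly; the helper name `compare_barcodes` is hypothetical, and in this notebook it could be called as `compare_barcodes(adata.obs.index, metadata['barcode_sample'])`.

```python
def compare_barcodes(counts_barcodes, metadata_barcodes):
    """Return (only_in_counts, only_in_metadata) as two sorted lists."""
    counts_set = set(counts_barcodes)
    metadata_set = set(metadata_barcodes)
    only_in_counts = sorted(counts_set - metadata_set)
    only_in_metadata = sorted(metadata_set - counts_set)
    return only_in_counts, only_in_metadata


# Toy example with made-up barcodes.
only_counts, only_meta = compare_barcodes(["AAA-1", "CCC-1"], ["AAA-1", "GGG-1"])
print(only_counts, only_meta)  # prints ['CCC-1'] ['GGG-1']
```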

<span style="color:green">**3)**</span> **Microenvironments** defines the cell types that belong to a given microenvironment. CellPhoneDB will only calculate interactions between cells that belong to the same microenvironment. In this file we are defining two microenvironments.

In [7]:
microenv = pd.read_csv(microenvs_file_path,
                       sep = '\t')
microenv.head(3)

Unnamed: 0,cell_type,microenvironment
0,PV MMP11,Env1
1,PV MYH11,Env1
2,PV STEAP4,Env1


Display the cell types grouped by microenvironment.

In [8]:
microenv.groupby('microenvironment', group_keys = False)['cell_type'] \
    .apply(lambda x : list(x.value_counts().index))

microenvironment
Env1    [PV MMP11, PV MYH11, PV STEAP4, EVT_1, EVT_2, ...
Name: cell_type, dtype: object
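
To illustrate what restricting interactions to a microenvironment means, the sketch below enumerates the cluster pairs that share a microenvironment. It is a simplified illustration, not CellPhoneDB's internal code: the helper name is hypothetical, and the toy table (including the `Env2` label and its members) is made up; only the column names follow the file shown above.

```python
from itertools import product

import pandas as pd

# Toy microenvironment table; the assignment of cell types is hypothetical.
toy_microenv = pd.DataFrame(
    {
        "cell_type": ["PV MMP11", "EVT_1", "B_cells", "T_cells"],
        "microenvironment": ["Env1", "Env1", "Env2", "Env2"],
    }
)


def pairs_within_microenvironments(table, separator="|"):
    """List the ordered cell type pairs that share a microenvironment."""
    pairs = set()
    for _, group in table.groupby("microenvironment"):
        cell_types = group["cell_type"].unique()
        pairs.update(separator.join(pair) for pair in product(cell_types, repeat=2))
    return sorted(pairs)


allowed_pairs = pairs_within_microenvironments(toy_microenv)
print(len(allowed_pairs))  # prints 8
```

With two cell types in each of two microenvironments, 8 ordered pairs remain instead of the 16 that would be formed without the restriction.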

____
### Run basic analysis
The output of this method will be saved in `output_path` and also returned, here assigned to the variables `means` and `deconvoluted`.

In [9]:
from cellphonedb.src.core.methods import cpdb_analysis_method

means, deconvoluted = cpdb_analysis_method.call(
    cpdb_file_path = cpdb_file_path,           # mandatory: CellPhoneDB database zip file.
    meta_file_path = meta_file_path,           # mandatory: tsv file defining barcodes to cell label.
    counts_file_path = counts_file_path,       # mandatory: normalized count matrix.
    counts_data = 'hgnc_symbol',               # defines the gene annotation in counts matrix.
    output_path = out_path,                    # path to save results; microenvs_file_path is not passed in this call (None).
    separator = '|',                           # sets the string used to separate cell types in the results dataframes, e.g. "cellA|cellB".
    threshold = 0.1,                           # defines the minimum fraction of cells expressing a gene for it to be used in the analysis.
    result_precision = 3,                      # sets the rounding for the mean values in the results.
    debug = False,                             # saves all intermediate tables employed during the analysis in pkl format.
    output_suffix = None                       # replaces the timestamp in the output file names with a user-defined string (default: None).
)

[ ][CORE][08/03/23-14:35:24][INFO] [Non Statistical Method] Threshold:0.1 Precision:3
Reading user files...
The following user files were loaded successfully:
data/normalised_log_counts.h5ad
data/metadata.tsv
[ ][CORE][08/03/23-14:35:27][INFO] Running Real Analysis
[ ][CORE][08/03/23-14:35:27][INFO] Building results
Saved means_result to results/method1/simple_analysis_means_result_03_08_2023_14:35:28.txt
Saved deconvoluted_result to results/method1/simple_analysis_deconvoluted_result_03_08_2023_14:35:28.txt
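
Because the output file names carry a timestamp, locating the most recent result from a later session requires a small search. The sketch below uses `pathlib` and the file name prefixes visible in the log above; the helper name `latest_result` is hypothetical.

```python
from pathlib import Path


def latest_result(output_dir, prefix):
    """Return the most recently modified '<prefix>_*.txt' file, or None."""
    directory = Path(output_dir)
    if not directory.is_dir():
        return None
    matches = sorted(
        directory.glob(f"{prefix}_*.txt"),
        key=lambda path: path.stat().st_mtime,
    )
    return matches[-1] if matches else None


means_file = latest_result("results/method1", "simple_analysis_means_result")
print(means_file)
```

The returned path can then be read with `pd.read_csv`, assuming the files are tab-separated; passing `output_suffix` to the call above is an alternative that makes the file names predictable.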


___
### Description of output files

**Means** fields:
- **id_cp_interaction**: Unique CellPhoneDB identifier for each interaction stored in the database.
- **interacting_pair**: Name of the interacting pairs separated by “|”.
- **partner A or B**: Identifier for the first interacting partner (A) or the second (B). It can be a UniProt identifier (prefix `simple:`) or a complex (prefix `complex:`).
- **gene A or B**: Gene identifier for the first interacting partner (A) or the second (B). The identifier will depend on the input user list.
- **secreted**: True if one of the partners is secreted.
- **Receptor A or B**: True if the first interacting partner (A) or the second (B) is annotated as a receptor in our database.
- **annotation_strategy**: Curated if the interaction was annotated by the CellPhoneDB developers. Otherwise, the name of the database where the interaction has been downloaded from.
- **is_integrin**: True if one of the partners is an integrin.
- **means**: Mean values for all the interacting partners: mean value refers to the total mean of the individual partner average expression values in the corresponding interacting pairs of cell types. If one of the mean values is 0, then the total mean is set to 0.
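
The rule described for **means** can be written as a small helper. This is an illustrative sketch based only on the description above: the function name `interaction_mean` is hypothetical, the input values are made up, and partners that are multi-subunit complexes are not covered.

```python
def interaction_mean(mean_a, mean_b, precision=3):
    """Mean of two partner averages; 0 if either partner average is 0."""
    if mean_a == 0 or mean_b == 0:
        return 0.0
    return round((mean_a + mean_b) / 2, precision)


print(interaction_mean(0.5, 1.5))  # prints 1.0
print(interaction_mean(0.5, 0.0))  # prints 0.0
```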

In [10]:
means.head(3)

Unnamed: 0,id_cp_interaction,interacting_pair,partner_a,partner_b,gene_a,gene_b,secreted,receptor_a,receptor_b,annotation_strategy,is_integrin,B_cells|B_cells,B_cells|DC,B_cells|EVT_1,B_cells|EVT_2,B_cells|Endo_F,B_cells|Endo_L,B_cells|Endo_M,B_cells|GC,B_cells|Granulocytes,B_cells|HOFB,B_cells|ILC3,B_cells|M3,B_cells|MO,B_cells|NK,B_cells|PV MMP11,B_cells|PV MYH11,B_cells|PV STEAP4,B_cells|Plasma,B_cells|SCT,B_cells|T_cells,B_cells|VCT,B_cells|VCT_CCC,B_cells|VCT_fusing,B_cells|VCT_p,B_cells|dDC,B_cells|dEpi_lumenal,B_cells|dEpi_secretory,B_cells|dM1,B_cells|dM2,B_cells|dNK1,B_cells|dNK2,B_cells|dNK3,B_cells|dS1,B_cells|dS2,B_cells|dS3,B_cells|dT_cells,B_cells|dT_regs,B_cells|eEVT,B_cells|fF1,...,iEVT|dS3,iEVT|dT_cells,iEVT|dT_regs,iEVT|eEVT,iEVT|fF1,iEVT|fF2,iEVT|iEVT,iEVT|uSMC,uSMC|B_cells,uSMC|DC,uSMC|EVT_1,uSMC|EVT_2,uSMC|Endo_F,uSMC|Endo_L,uSMC|Endo_M,uSMC|GC,uSMC|Granulocytes,uSMC|HOFB,uSMC|ILC3,uSMC|M3,uSMC|MO,uSMC|NK,uSMC|PV MMP11,uSMC|PV MYH11,uSMC|PV STEAP4,uSMC|Plasma,uSMC|SCT,uSMC|T_cells,uSMC|VCT,uSMC|VCT_CCC,uSMC|VCT_fusing,uSMC|VCT_p,uSMC|dDC,uSMC|dEpi_lumenal,uSMC|dEpi_secretory,uSMC|dM1,uSMC|dM2,uSMC|dNK1,uSMC|dNK2,uSMC|dNK3,uSMC|dS1,uSMC|dS2,uSMC|dS3,uSMC|dT_cells,uSMC|dT_regs,uSMC|eEVT,uSMC|fF1,uSMC|fF2,uSMC|iEVT,uSMC|uSMC
0,CPI-CS0A5B6BD7A,12oxoLeukotrieneB4_byPTGR1_LTB4R,complex:12oxoLeukotrieneB4_byPTGR1,simple:Q15722,,LTB4R,True,False,True,curated,False,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.018,0.0,0.028,0.026,0.0,0.016,0.017,0.0,0.137,0.068,0.0,0.07,0.0,0.074,0.0,0.0,0.089,0.0,0.106,0.113,0.075,0.0,0.0,0.078,0.325,0.068,0.099,0.076,0.078,0.0,0.075,0.0,0.0,0.085,0.094,0.103,0.071,0.107,0.0,0.069,0.071,0.0,0.07,0.0,0.079,0.077,0.0,0.068,0.068
1,CPI-CS047D9C0D7,LeukotrieneB4_byLTA4H_LTB4R,complex:LeukotrieneB4_byLTA4H,simple:Q15722,,LTB4R,True,False,True,curated,False,0.0,0.149,0.08,0.0,0.082,0.0,0.086,0.0,0.0,0.101,0.0,0.117,0.125,0.087,0.0,0.0,0.09,0.337,0.08,0.111,0.088,0.09,0.0,0.087,0.0,0.0,0.096,0.106,0.115,0.083,0.119,0.0,0.081,0.083,0.0,0.082,0.0,0.091,0.089,...,0.0,0.017,0.0,0.027,0.024,0.0,0.015,0.016,0.0,0.092,0.024,0.0,0.025,0.0,0.03,0.0,0.0,0.045,0.0,0.061,0.069,0.03,0.0,0.0,0.033,0.28,0.024,0.055,0.032,0.034,0.0,0.03,0.0,0.0,0.04,0.05,0.058,0.026,0.063,0.0,0.024,0.027,0.0,0.025,0.0,0.035,0.032,0.0,0.023,0.024
2,CPI-CS04A56D5BE,12oxoLeukotrieneB4_byPTGR1_LTB4R2,complex:12oxoLeukotrieneB4_byPTGR1,simple:Q9NPC1,,LTB4R2,True,False,True,curated,False,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.021,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.079,0.0,0.0,0.0,0.0,0.0,0.066,0.0,0.067,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.068,0.0,0.069,0.0,0.0,0.066,0.0,0.0,0.0,0.0,0.073,0.0,0.0,0.0
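
Since the means table has one column per pair of cell types, a common next step is to rank the interactions for a single pair. The sketch below assumes the column layout shown above; the helper name `top_interactions` is hypothetical, and the toy table copies a few values from the three rows displayed.

```python
import pandas as pd

# A few columns and values copied from the three rows shown above.
toy_means = pd.DataFrame(
    {
        "interacting_pair": [
            "12oxoLeukotrieneB4_byPTGR1_LTB4R",
            "LeukotrieneB4_byLTA4H_LTB4R",
            "12oxoLeukotrieneB4_byPTGR1_LTB4R2",
        ],
        "B_cells|DC": [0.0, 0.149, 0.0],
        "uSMC|DC": [0.137, 0.092, 0.0],
    }
)


def top_interactions(means_table, cell_pair, n=10):
    """Return the n highest non-zero means for one cell type pair."""
    scores = means_table.set_index("interacting_pair")[cell_pair]
    return scores[scores > 0].sort_values(ascending=False).head(n)


print(top_interactions(toy_means, "uSMC|DC"))
```

With the real output, the same call would be `top_interactions(means, 'uSMC|DC')`. Keep in mind that these are mean expression values, not significance scores.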


**Deconvoluted** fields:
- **gene_name**: Gene identifier for one of the subunits that are participating in the interaction defined in “means.csv” file. The identifier will depend on the input of the user list.
- **uniprot**: UniProt identifier for one of the subunits that are participating in the interaction defined in “means.csv” file.
- **is_complex**: True if the subunit is part of a complex, False otherwise.
- **protein_name**: Protein name for one of the subunits that are participating in the interaction defined in “means.csv” file.
- **complex_name**: Complex name if the subunit is part of a complex. Empty if not.
- **id_cp_interaction**: Unique CellPhoneDB identifier for each of the interactions stored in the database.
- **mean**: Mean expression of the corresponding gene in each cluster.

In [11]:
deconvoluted.head(4)

gene,gene_name,uniprot,is_complex,protein_name,complex_name,id_cp_interaction,B_cells,DC,EVT_1,EVT_2,Endo_F,Endo_L,Endo_M,GC,Granulocytes,HOFB,ILC3,M3,MO,NK,PV MMP11,PV MYH11,PV STEAP4,Plasma,SCT,T_cells,VCT,VCT_CCC,VCT_fusing,VCT_p,dDC,dEpi_lumenal,dEpi_secretory,dM1,dM2,dNK1,dNK2,dNK3,dS1,dS2,dS3,dT_cells,dT_regs,eEVT,fF1,fF2,iEVT,uSMC
UBASH3B,UBASH3B,Q8TF42,True,UBS3B_HUMAN,Dehydroepiandrosterone_bySTS,CPI-CS09B8977D7,0.24,0.0,0.621,1.06,0.105,0.212,0.216,1.424,0.0,0.411,0.25,0.465,0.068,0.127,0.114,0.0,0.0,0.0,0.161,0.0,0.484,0.576,0.158,0.454,0.608,0.0,0.124,0.24,0.36,1.134,0.632,0.437,0.118,0.224,0.162,0.025,0.124,1.182,0.055,0.0,1.402,0.338
UBASH3B,UBASH3B,Q8TF42,True,UBS3B_HUMAN,Dehydroepiandrosterone_bySTS,CPI-CS05760BB78,0.24,0.0,0.621,1.06,0.105,0.212,0.216,1.424,0.0,0.411,0.25,0.465,0.068,0.127,0.114,0.0,0.0,0.0,0.161,0.0,0.484,0.576,0.158,0.454,0.608,0.0,0.124,0.24,0.36,1.134,0.632,0.437,0.118,0.224,0.162,0.025,0.124,1.182,0.055,0.0,1.402,0.338
UBASH3B,UBASH3B,Q8TF42,True,UBS3B_HUMAN,Dehydroepiandrosterone_bySTS,CPI-CS0259A0EB4,0.24,0.0,0.621,1.06,0.105,0.212,0.216,1.424,0.0,0.411,0.25,0.465,0.068,0.127,0.114,0.0,0.0,0.0,0.161,0.0,0.484,0.576,0.158,0.454,0.608,0.0,0.124,0.24,0.36,1.134,0.632,0.437,0.118,0.224,0.162,0.025,0.124,1.182,0.055,0.0,1.402,0.338
SULT1A1,SULT1A1,P50225,True,ST1A1_HUMAN,DHEAsulfate_bySULT2B,CPI-CS099F73A95,0.0,0.186,0.003,0.0,0.0,0.014,0.024,0.0,0.0,0.077,0.097,0.034,0.213,0.0,0.0,0.0,0.053,0.0,0.001,0.0,0.008,0.016,0.0,0.015,0.0,0.236,0.033,0.122,0.096,0.007,0.025,0.0,0.019,0.022,0.03,0.031,0.0,0.232,0.148,0.133,0.0,0.021
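
The deconvoluted table can be used to look up which subunits take part in a given interaction and how strongly each is expressed in the clusters of interest. The sketch below assumes the column layout shown above; the helper name `expression_for_interaction` is hypothetical, and the toy table copies a few values from the rows displayed.

```python
import pandas as pd

# A few columns and values copied from the rows shown above.
toy_deconvoluted = pd.DataFrame(
    {
        "gene_name": ["UBASH3B", "UBASH3B", "SULT1A1"],
        "complex_name": [
            "Dehydroepiandrosterone_bySTS",
            "Dehydroepiandrosterone_bySTS",
            "DHEAsulfate_bySULT2B",
        ],
        "id_cp_interaction": ["CPI-CS09B8977D7", "CPI-CS05760BB78", "CPI-CS099F73A95"],
        "B_cells": [0.24, 0.24, 0.0],
        "EVT_1": [0.621, 0.621, 0.003],
    }
)


def expression_for_interaction(table, interaction_id, clusters):
    """Return the subunits of one interaction with their means per cluster."""
    rows = table[table["id_cp_interaction"] == interaction_id]
    return rows[["gene_name", "complex_name", *clusters]].reset_index(drop=True)


print(expression_for_interaction(toy_deconvoluted, "CPI-CS09B8977D7", ["B_cells", "EVT_1"]))
```

With the real output, `deconvoluted` would be passed in place of the toy table.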
