# A Python package for sccomp - Tests differences in cell type proportions and variability from single-cell data
Cellular omics such as single-cell genomics, proteomics, and microbiomics allow the characterization of tissue and microbial community composition, which can be compared between conditions to identify biological drivers. This strategy has been critical to unveiling markers of disease progression in conditions such as cancer and pathogen infections.

For cellular omic data, no method for differential variability analysis exists, and methods for differential composition analysis only take a few fundamental data properties into account. Here we introduce sccomp, a generalised method for differential composition and variability analyses capable of jointly modelling data count distribution, compositionality, group-specific variability, and proportion mean-variability association, while being robust to outliers.

sccomp is an extensive analysis framework that allows realistic data simulation and cross-study knowledge transfer. We demonstrate that mean-variability association is ubiquitous across technologies, highlighting the inadequacy of the widely used Dirichlet-multinomial modelling and providing essential principles for differential variability analysis.

## Installation
Here is a demo of installing the package in editable mode:

In [1]:
%pip install -e .

Defaulting to user installation because normal site-packages is not writeable
Obtaining file:///home/chzhan1/Python/SAiGENCI/sccompPy
Installing collected packages: sccompPy
  Attempting uninstall: sccompPy
    Found existing installation: sccompPy 0.1.0
    Uninstalling sccompPy-0.1.0:
      Successfully uninstalled sccompPy-0.1.0
  distutils: /home/chzhan1/.local/lib/python3.9/site-packages
  sysconfig: /home/chzhan1/.local/lib64/python3.9/site-packages
  user = True
  home = None
  root = None
  prefix = None
  Running setup.py develop for sccompPy
Successfully installed sccompPy-0.1.0
Note: you may need to restart the kernel to use updated packages.
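
As a quick sanity check after installation, the sketch below looks up the installed version through the standard library's `importlib.metadata`. The helper name `installed_version` is made up for illustration; the distribution name `sccompPy` is taken from the pip output above.

```python
from importlib import metadata
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


# "sccompPy" is the distribution name shown in the pip output above.
print(installed_version("sccompPy"))
```

If the editable install succeeded, this should print a version string (the pip log above reports `0.1.0`); `None` means the distribution is not visible to the current interpreter, in which case restarting the kernel may help.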


## Import the `sccompPy` package

In [2]:
import sccompPy

## Load the embedded dataset

In [3]:
import pkg_resources
import pandas as pd

data_file_path = pkg_resources.resource_filename("sccompPy", "data/count_obj.csv")

count_obj = pd.read_csv(data_file_path)
count_obj

| Unnamed: 0 | sample | type | phenotype | count | cell_group | proportion |
|---|---|---|---|---|---|---|
| 0 | 10x_6K | benign | b_cell_macrophage_precursor_or_follicular_LTB_... | 42 | BM | 0.008350 |
| 1 | 10x_6K | benign | B_cell:immature | 361 | B1 | 0.071769 |
| 2 | 10x_6K | benign | B_cell:immature_IGLC3_IGLC2 | 57 | B2 | 0.011332 |
| 3 | 10x_6K | benign | B_cell:Memory_ITM2C_IGHA1_MZB1_JCHAIN | 40 | B3 | 0.007952 |
| 4 | 10x_6K | benign | Dendritic_CD11_CD1_high_mito | 75 | Dm | 0.014911 |
| ... | ... | ... | ... | ... | ... | ... |
| 715 | SRR7244582 | benign | T_cell:CD8+_GZMK_DUSP2_LYAR_CCL5 | 197 | CD8 2 | 0.060727 |
| 716 | SRR7244582 | benign | T_cell:CD8+_non_activated | 320 | CD8 3 | 0.098644 |
| 717 | SRR7244582 | benign | T_cell:CD8+_PPBP_SAT1 | 39 | CD8 4 | 0.012022 |
| 718 | SRR7244582 | benign | T_cell:CD8+_S100B | 88 | CD8 5 | 0.027127 |
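
The `proportion` column appears to be each cell group's count divided by the total count of its sample: for sample `10x_6K`, 42 / 0.008350 and 361 / 0.071769 both give roughly 5030. A minimal sketch of that computation on toy data follows; the values and sample names are hypothetical, and only the column names mirror `count_obj`.

```python
import pandas as pd

# Toy data (hypothetical values) with the same column names as count_obj.
toy = pd.DataFrame({
    "sample": ["s1", "s1", "s1", "s2", "s2"],
    "cell_group": ["B1", "B2", "BM", "B1", "B2"],
    "count": [50, 30, 20, 10, 30],
})

# Proportion of each cell group within its sample: count / sample total.
toy["proportion"] = toy["count"] / toy.groupby("sample")["count"].transform("sum")

# Within every sample the proportions should sum to 1.
per_sample_total = toy.groupby("sample")["proportion"].sum()
print(toy)
print(per_sample_total)
```

The same `groupby(...).transform("sum")` pattern could be applied to `count_obj` to check that its `proportion` column is consistent with `count`.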


## `sccomp_estimate` function

In [4]:
estimate_res = sccompPy.sccomp_estimate(
    data = count_obj,
    formula_composition = '~ 0 + type', 
    sample = 'sample',
    cell_group = 'cell_group',
    count = 'count',
    verbose = False
)

17:23:36 - cmdstanpy - INFO - CmdStan start processing



17:23:45 - cmdstanpy - INFO - CmdStan done processing.
Exception: Exception: beta_binomial_lpmf: First prior sample size parameter[1] is inf, but must be positive finite! (in 'glm_multi_beta_binomial.stan', line 214, column 16 to line 219, column 19) (in 'glm_multi_beta_binomial.stan', line 653, column 3 to line 683, column 8)
Exception: Exception: beta_binomial_lpmf: First prior sample size parameter[1] is inf, but must be positive finite! (in 'glm_multi_beta_binomial.stan', line 214, column 16 to line 219, column 19) (in 'glm_multi_beta_binomial.stan', line 653, column 3 to line 683, column 8)
Exception: Exception: beta_binomial_lpmf: Second prior sample size parameter[5] is 0, but must be positive finite! (in 'glm_multi_beta_binomial.stan', line 214, column 16 to line 219, column 19) (in 'glm_multi_beta_binomial.stan', line 653, column 3 to line 683, column 8)
Exception: Exception: beta_binomial_lpmf: First prior sample size parameter[1] is inf, but must be positive finite! (in 'glm
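
To make the formula `'~ 0 + type'` concrete: the `0 +` drops the intercept, so each level of `type` gets its own indicator column. The sketch below builds such a matrix with `pandas.get_dummies` on toy metadata. It only illustrates what the formula encodes; how sccompPy constructs its design matrix internally is not shown in this README, and the sample names are hypothetical. The column labels follow the `type[benign]` / `type[cancer]` naming seen in the `sccomp_test` output.

```python
import pandas as pd

# Toy sample-level metadata (hypothetical values).
samples = pd.DataFrame({
    "sample": ["s1", "s2", "s3", "s4"],
    "type": ["benign", "cancer", "cancer", "benign"],
})

# One 0/1 indicator column per level of `type`, and no intercept column:
# this is what the "0 +" in '~ 0 + type' asks for.
design = pd.get_dummies(samples["type"]).astype(int)
design.columns = [f"type[{level}]" for level in design.columns]
design.index = samples["sample"]
print(design)
```

With this parameterisation every coefficient is a per-group mean on the model's scale, which is why group differences are requested afterwards as explicit contrasts.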




In [5]:
estimate_res.keys()

dict_keys(['fit', 'model_input', 'truncation_df2', 'sample', 'cell_group', 'count', 'formula_composition', 'formula_variability', 'noise_model'])

## `sccomp_test` function

In [6]:
test_res = sccompPy.sccomp_test(estimate_res, contrasts=['type[cancer] - type[benign]'])
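
A contrast such as `'type[cancer] - type[benign]'` is, conceptually, a difference between two model coefficients. In a Bayesian fit this difference is usually evaluated on every posterior draw and then summarised. The sketch below shows that general idea with synthetic draws; it is not sccompPy's implementation, and the 95% interval is an assumption made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic posterior draws (hypothetical) for two coefficients of one cell group.
draws_benign = rng.normal(loc=1.0, scale=0.1, size=4000)
draws_cancer = rng.normal(loc=0.5, scale=0.1, size=4000)

# The contrast is evaluated draw by draw, so its uncertainty is propagated.
contrast_draws = draws_cancer - draws_benign

# Summarise with the median and a 95% interval (interval width is an assumption).
lower, effect, upper = np.quantile(contrast_draws, [0.025, 0.5, 0.975])
print(f"{lower:.3f} {effect:.3f} {upper:.3f}")
```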





### `sccomp_test` returns a `dict` whose first entry, `result`, contains the result table

In [7]:
test_res['result']

| Unnamed: 0 | cell_group | parameter | c_lower | c_effect | c_upper | c_pH0 | FDR | N_Eff | R_k_hat | v_lower | v_effect | v_upper | v_pH0 | factor | design_matrix_col | count_data |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | B1 | type[benign] | 0.885494 | 1.147990 | 1.386073 | 0.00000 | 0.000000 | 4636.410 | 0.999638 | | | | | type | type[benign] | sample type phenotype co... |
| 1 | B1 | type[cancer] | 0.149581 | 0.495901 | 0.817201 | 0.01325 | 0.000990 | 6088.560 | 0.999419 | | | | | type | type[cancer] | sample type phenotype co... |
| 2 | B1 | type[cancer] - type[benign] | -1.056506 | -0.654636 | -0.234130 | 0.00675 | 0.001275 | | | | | | | | | sample type phenotype co... |
| 3 | B2 | type[benign] | 0.393509 | 0.729689 | 1.017565 | 0.00025 | 0.000024 | 5653.390 | 0.999175 | | | | | type | type[benign] | sample type p... |
| 4 | B2 | type[cancer] | -0.426521 | 0.028897 | 0.444366 | 0.63525 | 0.087956 | 4994.700 | 0.999616 | | | | | type | type[cancer] | sample type p... |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 103 | TM2 | type[cancer] | -1.220657 | -0.906117 | -0.590917 | 0.00000 | 0.000000 | 3886.600 | 0.999885 | | | | | type | type[cancer] | sample type ... |
| 104 | TM2 | type[cancer] - type[benign] | -0.169545 | 0.274726 | 0.722179 | 0.23100 | 0.055646 | | | | | | | | | sample type ... |
| 105 | TM3 | type[benign] | -1.706025 | -0.801815 | 0.287057 | 0.09700 | 0.012589 | 913.663 | 0.999229 | | | | | type | type[benign] | sample type ... |
| 106 | TM3 | type[cancer] | -3.921500 | -2.679785 | -1.412898 | 0.00000 | 0.000000 | 738.823 | 1.000240 | | | | | type | type[cancer] | sample type ... |
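
A typical next step is to keep only the contrast rows that pass an FDR threshold. The sketch below rebuilds two contrast rows from the table above (B1 and TM2) as a small DataFrame and filters them; the 0.05 cutoff is an assumption, not a value prescribed by sccompPy.

```python
import pandas as pd

# Values copied from the result table above (contrast rows for B1 and TM2).
result = pd.DataFrame({
    "cell_group": ["B1", "TM2"],
    "parameter": ["type[cancer] - type[benign]"] * 2,
    "c_lower": [-1.056506, -0.169545],
    "c_effect": [-0.654636, 0.274726],
    "c_upper": [-0.234130, 0.722179],
    "FDR": [0.001275, 0.055646],
})

# Keep contrast rows whose FDR is below the chosen threshold (0.05, an assumption).
is_contrast = result["parameter"] == "type[cancer] - type[benign]"
significant = result[is_contrast & (result["FDR"] < 0.05)]
print(significant[["cell_group", "c_effect", "FDR"]])
```

On the real output, the same filter could be applied to `test_res['result']` in place of the hand-built `result` frame.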
