This script quantifies the model's uncertainty in its cell abundance estimates by computing the difference between the 95% and 5% quantiles of the posterior distribution of $w_{s,f}$, the estimated abundance of cell type f at location s.

The goal is to help explain why the cell type localizations predicted by the model were not fully validated by cell markers.

If the model was uncertain about its predictions, the incomplete validation by cell markers becomes more understandable.

Please note that this script was only run on sample B1 for demonstration.

**Author:** Yiqing Wang

**Date:** 2024-7-31

INPUT: spatially mapped Cell2location model and AnnData

OUTPUT: plot that shows the 5% ~ 95% quantile range of the estimated cell abundance, separated by cell type; plot that shows the normalized-by-mean 5% ~ 95% quantile range

1. Load AnnData

In [1]:
import scanpy as sc
import cell2location
import numpy as np
import matplotlib.pyplot as plt

In [2]:
data_dir = "path/to/data"  # named data_dir to avoid shadowing the built-in dir()
sample = "B1"  # specify sample name
run_name = f"{data_dir}/test_results/{sample}_run_name"
output_dir = f"{run_name}/other_qc"

In [None]:
# Load AnnData object
adata_file = f"{run_name}/sp_mapped.h5ad"
adata_vis = sc.read_h5ad(adata_file)

# Load model
mod = cell2location.models.Cell2location.load(f"{run_name}", adata_vis)
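The plotting functions below read three posterior summaries from `adata.uns["mod"]`: `post_sample_means`, `post_sample_q05` and `post_sample_q95`, each holding a `w_sf` entry. A minimal sketch of a pre-flight check follows. The helper name `missing_posterior_keys` is hypothetical and not part of cell2location, and it is shown here on a plain dict rather than a real AnnData object.

```python
def missing_posterior_keys(
    uns_mod,
    required=("post_sample_means", "post_sample_q05", "post_sample_q95"),
    site="w_sf",
):
    """List required posterior summaries that lack `site` in a dict shaped like adata.uns["mod"].

    Hypothetical helper for illustration only.
    """
    return [key for key in required if site not in uns_mod.get(key, {})]


# Plain-dict stand-in for adata.uns["mod"], with the 95% quantile missing
example_uns_mod = {
    "post_sample_means": {"w_sf": None},
    "post_sample_q05": {"w_sf": None},
}
missing = missing_posterior_keys(example_uns_mod)
print(missing)  # ['post_sample_q95']
```

With the real object, the call would be `missing_posterior_keys(adata_vis.uns["mod"])`. An empty list would suggest that the summaries used later in this notebook are available.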

2. Calculate the range between the 5% and the 95% quantile of the posterior distribution of estimated cell abundance, separated by cell type

In [8]:
def plot_interquantile(model):
    """Plot the width of the 5%-95% posterior interval of w_sf for each cell type."""
    adata = model.adata

    # Cell type names, used as column names in adata.obs
    factor_names = list(adata.uns["mod"]["factor_names"])

    # Subtract the 5% quantile from the 95% quantile.
    # The result is an (n_locations x n_cell_types) array of interquantile range values;
    # for sample B1 this is 2299 x 15.
    interq = adata.uns["mod"]["post_sample_q95"]["w_sf"] - adata.uns["mod"]["post_sample_q05"]["w_sf"]

    # Add the interquantile range to adata.obs, with each cell type as a column.
    # This has to be done for sc.pl.spatial to recognize the data.
    # Note: this overwrites any existing adata.obs columns with the same names.
    adata.obs[factor_names] = interq

    sc.pl.spatial(
        adata,
        cmap="magma",
        color=factor_names,  # pass the column names, not the DataFrame itself
        ncols=4,
        size=1.3,
        img_key="hires",
        alpha_img=1,
        vmin=0,
        vmax="p99.2",
    )

In [None]:
plot_interquantile(mod)
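For intuition about the quantity plotted above, here is a minimal sketch that computes the same 5% ~ 95% range directly from posterior samples. The samples are synthetic gamma draws used as a stand-in for real posterior samples of `w_sf`. The array shape and distribution parameters are assumptions for illustration, and this is not necessarily how cell2location computes its stored quantiles.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for posterior samples of w_sf,
# shaped (n_samples, n_locations, n_cell_types). All sizes are illustrative.
n_samples, n_locations, n_cell_types = 1000, 5, 3
samples = rng.gamma(shape=2.0, scale=1.0, size=(n_samples, n_locations, n_cell_types))

# 5% and 95% quantiles across the sample axis, one value per (location, cell type)
q05 = np.quantile(samples, 0.05, axis=0)
q95 = np.quantile(samples, 0.95, axis=0)

# Width of the 5% ~ 95% interval: a wider interval means a more uncertain estimate
interq = q95 - q05
print(interq.shape)  # (5, 3)
```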

3. Attempt to normalize the 5% ~ 95% quantile range by dividing it by the mean of the posterior distribution of the estimated cell abundance. 

The rationale is that, for a posterior distribution, the range between the 5% and 95% quantiles tends to be larger when the mean is larger. We therefore need a way to normalize this range so that it can be compared across distributions (i.e. across locations and cell types). One attempt I made was to divide the range by the mean of the posterior distribution.

The caveat of this strategy is that when the mean is very small, the normalized range becomes very large, far exceeding the values at spots where the mean is larger. However, we only care about the spots where the mean is considerable, because those are the spots with enough estimated cell abundance for uncertainty to be worth quantifying.

Therefore, a better normalization approach still needs to be developed, perhaps one that excludes spots with a very low mean, i.e. very low estimated cell abundance.

In [10]:
def plot_interquantile_norm(model):
    """Plot the 5%-95% posterior interval width of w_sf, divided by the posterior mean."""
    adata = model.adata

    # Cell type names, used as column names in adata.obs
    factor_names = list(adata.uns["mod"]["factor_names"])

    # Subtract the 5% quantile from the 95% quantile
    interq = adata.uns["mod"]["post_sample_q95"]["w_sf"] - adata.uns["mod"]["post_sample_q05"]["w_sf"]

    # Divide the range by the mean of the posterior distribution of the estimated cell abundance.
    # Values become very large wherever the mean is close to zero.
    interq_norm = interq / adata.uns["mod"]["post_sample_means"]["w_sf"]

    # Add the normalized interquantile range to adata.obs, with each cell type as a column.
    # Note: this overwrites any existing adata.obs columns with the same names.
    adata.obs[factor_names] = interq_norm

    sc.pl.spatial(
        adata,
        cmap="magma",
        color=factor_names,  # pass the column names, not the DataFrame itself
        ncols=4,
        size=1.3,
        img_key="hires",
        alpha_img=1,
        vmin=0,
        vmax="p99.2",
    )

In [None]:
plot_interquantile_norm(mod)
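The low-mean caveat described in this section could be handled by masking spots whose posterior mean falls below a cutoff before dividing. The following is a minimal sketch of that idea on toy arrays. The function name `masked_relative_range` and the threshold `min_mean=0.5` are hypothetical choices for illustration, not values used in this analysis.

```python
import numpy as np


def masked_relative_range(q05, q95, mean, min_mean=0.5):
    """Return (q95 - q05) / mean, with NaN wherever mean < min_mean.

    `min_mean` is a hypothetical cutoff; a suitable value would need to be
    chosen per dataset.
    """
    q05, q95, mean = (np.asarray(a, dtype=float) for a in (q05, q95, mean))
    keep = mean >= min_mean
    out = np.full(mean.shape, np.nan)
    # Divide only where the mean is large enough; other entries stay NaN
    np.divide(q95 - q05, mean, out=out, where=keep)
    return out


# Toy values shaped (n_locations, n_cell_types); two entries have a near-zero mean
q05 = np.array([[0.001, 1.0], [2.0, 0.0]])
q95 = np.array([[0.02, 3.0], [6.0, 0.01]])
mean = np.array([[0.005, 2.0], [4.0, 0.002]])

rel = masked_relative_range(q05, q95, mean)
print(rel)
```

Without the mask, the two low-mean entries would give ratios of 3.8 and 5.0, larger than the ratio of 1.0 at the spots with substantial abundance, which is the distortion this section describes. With the real data, the three inputs would be the `w_sf` arrays from `post_sample_q05`, `post_sample_q95` and `post_sample_means`.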