This script runs the plot_spatial_QC_across_batches() function, which shows the spatial distribution of (A) total raw RNA counts, (B) estimated total cell abundance, and (C) estimated RNA detection sensitivity.

This script also writes out the estimated cell type abundance table, for potential downstream use. 

**Author:** Yiqing Wang

**Date:** 2024-7-30

INPUT: spatially mapped AnnData and model

OUTPUT: 

1) QC plot produced by plot_spatial_QC_across_batches()

2) estimated cell type abundance table

1. Load AnnData and model

In [1]:
import scanpy as sc
import cell2location
import matplotlib.pyplot as plt
import pandas as pd

In [2]:
data_dir = "path/to/data"  # avoid the name `dir`, which shadows a Python builtin
sample = "C1"  # specify sample name
run_name = f"{data_dir}/test_results/{sample}_run_name"
output_dir = f"{run_name}/other_qc"

In [None]:
# Load AnnData and model

adata_file = f"{run_name}/sp_mapped.h5ad"
adata_vis = sc.read_h5ad(adata_file)

mod = cell2location.models.Cell2location.load(f"{run_name}", adata_vis)

2. Run plot_spatial_QC_across_batches()

In [None]:
# Generate the QC plots
fig = mod.plot_spatial_QC_across_batches()

# Display the plot
plt.show()

# Save the plot (uncomment to use; output_dir must already exist)
# fig.savefig(f"{output_dir}/spatial_QC_across_batches.png")

What this plot shows:

A. Total RNA counts. Based on the source code, these are most likely the raw, unnormalized total RNA counts per location. This has not been fully verified and should be confirmed by reading the source code more closely.

B. Total estimated cell abundance. For each location, this is the sum of the estimated abundances of all cell types, that is, the sum of w_sf over f (s = location, f = cell type). See the supplementary methods for more details.

C. Detection sensitivity (y_s). This is the estimated RNA detection sensitivity at each location.

In theory, the total estimated cell abundance should be the result of adjusting the raw total RNA counts for detection sensitivity. The distribution of total estimated cell abundance should mirror the cell density observed in histology.

If there are multiple samples from different batches, you should see similar total cell abundance across samples but distinct detection sensitivities.
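The relationship between the three panels can be illustrated with a toy calculation. The sketch below is not the cell2location model. It assumes a made-up abundance matrix, made-up detection sensitivities, and a single hypothetical constant `counts_per_cell`. It only shows that panel B is a row sum of w_sf and that, under this simplification, expected raw counts scale with both total abundance and detection sensitivity.

```python
import pandas as pd

# Toy posterior means of w_sf: 4 locations (rows) x 3 cell types (columns).
# All values are invented for illustration.
w_sf = pd.DataFrame(
    [[2.0, 0.5, 0.0],
     [1.0, 1.0, 1.0],
     [0.0, 3.0, 0.5],
     [0.5, 0.5, 0.5]],
    index=["loc1", "loc2", "loc3", "loc4"],
    columns=["typeA", "typeB", "typeC"],
)

# Panel B: total estimated cell abundance per location = sum of w_sf over f.
total_abundance = w_sf.sum(axis=1)

# Panel C (hypothetical values): loc3 and loc4 have half the sensitivity,
# as might happen for a sample from a different batch.
y_s = pd.Series([1.0, 1.0, 0.5, 0.5], index=w_sf.index)

# Simplifying assumption: every cell contributes the same number of
# transcripts, so expected raw counts (panel A) scale as
# y_s * total abundance * counts_per_cell.
counts_per_cell = 1000.0
expected_counts = y_s * total_abundance * counts_per_cell

# Dividing out the sensitivity recovers the abundance, which is why
# abundance can be comparable across batches even when raw counts are not.
recovered_abundance = expected_counts / (y_s * counts_per_cell)

print(pd.DataFrame({
    "total_abundance": total_abundance,
    "y_s": y_s,
    "expected_counts": expected_counts,
}))
```

In this toy example loc3 has the highest total abundance but lower expected raw counts than loc1 and loc2, because its detection sensitivity is lower.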

3. Write out cell type abundance table

This is a table with locations as rows and cell types as columns, showing the model-estimated abundance of each cell type at each location. It can serve as the starting point for further downstream analyses of cell abundance.

The mean of the posterior distribution of cell abundance is used here, but the 5% or 95% quantile can be used instead (see the commented-out line in the code).
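Switching between posterior summaries can be wrapped in a small helper. The sketch below uses a hypothetical in-memory dictionary standing in for `adata_vis.uns["mod"]`, with invented values. The helper name `abundance_table` and the key `"post_sample_q95"` are assumptions made by analogy with the `"post_sample_q05"` key used in this notebook, so check which keys your own object actually contains.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for adata_vis.uns["mod"]; values are invented.
mod_uns = {
    "factor_names": ["typeA", "typeB"],
    "post_sample_means": {"w_sf": np.array([[1.0, 2.0], [3.0, 4.0]])},
    "post_sample_q05": {"w_sf": np.array([[0.5, 1.5], [2.5, 3.5]])},
    "post_sample_q95": {"w_sf": np.array([[1.5, 2.5], [3.5, 4.5]])},  # assumed key
}
barcodes = ["loc1", "loc2"]  # stands in for adata_vis.obs.index


def abundance_table(mod_uns, barcodes, summary="means"):
    """Build a locations x cell types table for one posterior summary.

    `summary` is the suffix after "post_sample_", e.g. "means" or "q05".
    """
    key = f"post_sample_{summary}"
    if key not in mod_uns:
        raise KeyError(f"{key!r} not found; available keys: {sorted(mod_uns)}")
    return pd.DataFrame(
        mod_uns[key]["w_sf"],
        index=barcodes,
        columns=mod_uns["factor_names"],
    )


mean_df = abundance_table(mod_uns, barcodes, "means")
q05_df = abundance_table(mod_uns, barcodes, "q05")
q95_df = abundance_table(mod_uns, barcodes, "q95")

# Width of the 5%-95% interval as a rough per-entry uncertainty measure.
interval_width = q95_df - q05_df
print(interval_width)
```

The 5% quantile gives a conservative abundance estimate, while the interval width indicates how uncertain each entry is.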

In [None]:
# Extract mean of posterior of cell abundance
cell_abun = adata_vis.uns["mod"]["post_sample_means"]["w_sf"]
# cell_abun = adata_vis.uns["mod"]["post_sample_q05"]["w_sf"]
cell_abun_df = pd.DataFrame(cell_abun)

# Set row names to location barcodes
cell_abun_df.index = adata_vis.obs.index

# Set column names to cell types
cell_abun_df.columns = adata_vis.uns["mod"]["factor_names"]

cell_abun_df.to_csv(f"{run_name}/{sample}_cell_abundance_table.csv")
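As one possible downstream use, the saved table can be read back and converted to per-location cell type proportions. The sketch below writes a small invented table to a temporary file so that it runs on its own. The file name and values are hypothetical, and in practice you would read the CSV written above instead.

```python
import os
import tempfile

import pandas as pd

# Toy abundance table (locations x cell types); values are invented.
toy_table = pd.DataFrame(
    [[2.0, 2.0], [1.0, 3.0]],
    index=["loc1", "loc2"],
    columns=["typeA", "typeB"],
)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "toy_cell_abundance_table.csv")
    toy_table.to_csv(path)
    # index_col=0 restores the location barcodes as the row index.
    loaded = pd.read_csv(path, index_col=0)

# Divide each row by its total to get cell type proportions per location.
proportions = loaded.div(loaded.sum(axis=1), axis=0)

# Cell type with the highest estimated abundance at each location
# (ties resolve to the first column).
dominant_type = proportions.idxmax(axis=1)

print(proportions)
print(dominant_type)
```

Proportions discard the total abundance per location, so keep the original table if cell density matters for the analysis.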