This notebook makes additional QC plots for the negative binomial (NB) regression model used to estimate cell type signatures, by running the model's plot_QC() method.

**Author:** Yiqing Wang
**Date:** 2024-7-27

INPUT:
1) NB regression model
2) Single cell data (AnnData) with estimated cell type signatures

OUTPUT: model QC plots
1) Reconstruction accuracy
2) Average expression of every gene in every cell type vs. model-estimated expression of every gene in every cell type

In [10]:
import scanpy as sc
import cell2location
import matplotlib.pyplot as plt

1. Loading the trained model and the single cell data (an AnnData object)

In [4]:
data_dir = "path/to/data"  # avoid shadowing the built-in dir()
ref_run_name = f"{data_dir}/test_results/run_name"

In [None]:
# Read single cell AnnData and load trained model
adata_file = f"{ref_run_name}/sc_trained.h5ad"
adata_ref = sc.read_h5ad(adata_file)
mod = cell2location.models.RegressionModel.load(ref_run_name, adata_ref)

2. Summarizing posterior distributions and exporting estimated cell type signatures to AnnData

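Note that export_posterior() must be run again here, even though it was already run after model training in hpc_sc_regression.py.
Otherwise, mod.plot_QC() raises an error, presumably because the posterior samples it uses are kept in memory on the model object and are not restored when the saved model is loaded.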

In [None]:
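# Sample from the posterior (1000 samples, in mini-batches of 2048 cells) and
# export the posterior summaries, including the cell type signatures, to adata_ref
adata_ref = mod.export_posterior(
    adata_ref, sample_kwargs={"num_samples": 1000, "batch_size": 2048}
)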
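Once export_posterior() has run, the estimated signatures can also be read back from adata_ref as a genes × cell types table. The sketch below follows the pattern used in the cell2location tutorial; the helper name get_signatures is hypothetical, and the column names assume the default "means" summary of per_cluster_mu_fg, stored in adata_ref.varm with a fallback to adata_ref.var.

```python
def get_signatures(adata_ref, summary="means"):
    """Return a genes x cell types DataFrame of estimated expression signatures.

    Hypothetical helper following the cell2location tutorial pattern; it assumes
    export_posterior() stored columns named '<summary>_per_cluster_mu_fg_<cell type>'.
    """
    factor_names = list(adata_ref.uns["mod"]["factor_names"])
    key = f"{summary}_per_cluster_mu_fg"
    cols = [f"{key}_{name}" for name in factor_names]
    # Signatures are looked up in .varm first, then in .var as a fallback
    if key in adata_ref.varm.keys():
        signatures = adata_ref.varm[key][cols].copy()
    else:
        signatures = adata_ref.var[cols].copy()
    # Rename columns to plain cell type names
    signatures.columns = factor_names
    return signatures
```

For example, `inf_aver = get_signatures(adata_ref)` followed by `inf_aver.iloc[0:5, 0:5]` would show the estimated expression of the first five genes in the first five cell types.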

3. Making model QC plots

- Reconstruction accuracy: observed RNA counts vs. the posterior expected values of RNA counts estimated by the model
- Estimated vs. average expression: the average expression of every gene in every cell type vs. the model-estimated expression of every gene in every cell type
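The comparison in the second plot can be sketched roughly as follows: average the observed counts of each gene within each cell type, then plot log10(x + 1) of these averages against log10(x + 1) of the model estimates as a 2D histogram. This is a simplified illustration, not cell2location's own implementation; the helper names are hypothetical, and it assumes `estimated` is a genes × cell types array whose columns follow the same (sorted) cell type order as the averages.

```python
import numpy as np
import pandas as pd
import matplotlib.colors as mcolors
import matplotlib.pyplot as plt


def cluster_average_expression(counts, labels, gene_names=None):
    """Mean count of every gene in every cell type (rows: genes, columns: cell types).

    counts: cells x genes matrix (dense, or sparse with a .toarray() method).
    labels: one cell type label per cell; groupby sorts the cell types.
    """
    if hasattr(counts, "toarray"):
        counts = counts.toarray()  # densify a scipy sparse matrix
    df = pd.DataFrame(counts, columns=gene_names)
    return df.groupby(np.asarray(labels)).mean().T


def plot_average_vs_estimated(average, estimated, bins=50):
    """2D histogram of log10(x + 1): observed averages vs. model estimates."""
    x = np.log10(np.asarray(average, dtype=float).ravel() + 1)
    y = np.log10(np.asarray(estimated, dtype=float).ravel() + 1)
    fig, ax = plt.subplots()
    # Log-scaled colours, since most gene/cell type pairs pile up near zero
    ax.hist2d(x, y, bins=bins, norm=mcolors.LogNorm())
    ax.set_xlabel("Mean expression for every gene in every cell type")
    ax.set_ylabel("Estimated expression for every gene in every cell type")
    return fig, ax
```

Points close to the diagonal indicate that the model-estimated expression agrees with the simple per-cell-type averages.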

Note: because mod.plot_QC() draws two plots in a single call, I have not yet found a way to save both with plt.savefig(); for now, they need to be saved manually.
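One possible workaround is to intercept plt.show() while plot_QC() runs, so that each figure is saved to its own file before it is displayed. This assumes plot_QC() draws with matplotlib.pyplot and calls plt.show() after each plot, which appears to be how recent cell2location versions behave but has not been checked for every version; save_figures_shown_during is a hypothetical helper, not part of cell2location. Any new figure left open without a plt.show() call is saved at the end as well.

```python
import os
from unittest import mock

import matplotlib.pyplot as plt


def save_figures_shown_during(plot_func, out_dir=".", prefix="qc_plot", fmt="png"):
    """Call plot_func() and save each figure it draws to its own file.

    Hypothetical helper: temporarily wraps matplotlib.pyplot.show so that every
    plt.show() call made inside plot_func first saves the current figure, then
    displays and closes it. Returns the list of saved file paths.
    """
    os.makedirs(out_dir, exist_ok=True)
    original_show = plt.show
    figs_before = set(plt.get_fignums())
    saved_paths = []

    def _save(fig):
        path = os.path.join(out_dir, f"{prefix}_{len(saved_paths) + 1}.{fmt}")
        fig.savefig(path, bbox_inches="tight")
        saved_paths.append(path)

    def _save_then_show(*args, **kwargs):
        fig = plt.gcf()
        _save(fig)
        original_show(*args, **kwargs)  # still display the plot in the notebook
        plt.close(fig)

    with mock.patch.object(plt, "show", _save_then_show):
        plot_func()

    # Save figures that plot_func drew but never passed to plt.show()
    for num in plt.get_fignums():
        if num not in figs_before:
            fig = plt.figure(num)
            _save(fig)
            plt.close(fig)
    return saved_paths
```

For this model, the call would be something like `save_figures_shown_during(mod.plot_QC, out_dir=f"{ref_run_name}/qc_plots")`, where the qc_plots folder name is only an example. Patching the module attribute (rather than replacing the backend) keeps the inline display unchanged, because the wrapper calls the original plt.show() after saving.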

In [None]:
mod.plot_QC()