This script performs preprocessing by removing mitochondrial (MT) genes from 10x Visium spatial data.

**Author:** Yiqing Wang

**Date:** 2024-6-18

INPUT: untransformed, unnormalized spatial data

OUTPUT: spatial AnnData with MT genes removed
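Because the input must be raw counts, a quick check after loading can catch a matrix that was already normalized or log-transformed. The function below is a hypothetical heuristic, not a scanpy function. Raw UMI counts are non-negative integers, so any negative or fractional value suggests the data were transformed. Passing the check does not prove the data are raw: for example, counts multiplied by an integer factor would still pass. It accepts a dense NumPy array or a SciPy sparse matrix such as `adata_vis.X`.

```python
import numpy as np


def looks_like_raw_counts(X):
    """Heuristic: True if every stored value is a non-negative integer.

    Works on dense NumPy arrays and on SciPy sparse matrices (for sparse input
    only the explicitly stored entries are checked; implicit zeros are fine).
    """
    if hasattr(X, "toarray"):  # SciPy sparse matrix: inspect stored values only
        values = np.asarray(X.data)
    else:
        values = np.asarray(X)
    return bool(np.all(values >= 0) and np.all(np.mod(values, 1) == 0))


# Usage on made-up toy counts:
counts = np.array([[0, 3], [1, 0]])
print(looks_like_raw_counts(counts))            # True
print(looks_like_raw_counts(np.log1p(counts)))  # False
```

In this notebook it could be run as `looks_like_raw_counts(adata_vis.X)` right after the data are loaded in step 2.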

1. Import libraries

In [None]:
import os
import scanpy as sc

2. Load Visium spatial data

In [None]:
data_dir = "/path/to/data"
os.chdir(data_dir)  # relative paths below (e.g. results_folder) resolve from here
sample = "D1"  # sample id

# Path to the Space Ranger output: the outs/ folder, which contains
# filtered_feature_bc_matrix.h5 and the spatial/ folder.
data_folder = f"/path/to/data/spaceranger/count-{sample}/outs"

results_folder = "./test_results/"
run_name = os.path.join(results_folder, f"{sample}_run_name")
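Before loading, it can help to confirm that `data_folder` really is a Space Ranger `outs/` folder. The sketch below is a hypothetical helper (`missing_visium_files` is not part of scanpy) that lists which of the files needed to build the object are absent: the filtered count matrix in HDF5 format, the scale factors, and the spot positions, which Space Ranger 2.0 and later names `tissue_positions.csv` and older versions name `tissue_positions_list.csv`. The file list is an assumption based on the standard Space Ranger layout; adjust it if your output differs.

```python
from pathlib import Path

# Files assumed to be required inside outs/ (standard Space Ranger layout).
REQUIRED_FILES = [
    "filtered_feature_bc_matrix.h5",
    "spatial/scalefactors_json.json",
]
# Spot coordinates: newer Space Ranger versions use the first name, older ones the second.
POSITION_FILES = [
    "spatial/tissue_positions.csv",
    "spatial/tissue_positions_list.csv",
]


def missing_visium_files(outs_dir):
    """Return the expected Space Ranger outputs that are missing from `outs_dir`."""
    outs = Path(outs_dir)
    missing = [rel for rel in REQUIRED_FILES if not (outs / rel).is_file()]
    # Either spelling of the positions file is acceptable.
    if not any((outs / rel).is_file() for rel in POSITION_FILES):
        missing.append(" or ".join(POSITION_FILES))
    return missing
```

In the notebook, `missing_visium_files(data_folder)` should return an empty list before `sc.read_visium` is called.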

In [None]:
# Load Visium spatial data

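# read_visium expects the Space Ranger outs/ folder: it reads
# filtered_feature_bc_matrix.h5 plus the coordinates and images in spatial/
adata_vis = sc.read_visium(
    path=data_folder,
    library_id=sample,
    load_images=True,
)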

In [None]:
print(f"Number of spatial locations in Visium data: {adata_vis.shape[0]}")
print(f"Number of genes in Visium data: {adata_vis.shape[1]}")

3. Set sample names

In [None]:
# Set the sample name (library_id)
# I am not sure if this is necessary for subsequent steps. I will keep it for now.
adata_vis.obs["sample"] = list(adata_vis.uns["spatial"].keys())[0]  # library_id set before

4. (Optional) Convert gene row names from gene symbols to gene IDs

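This section is optional; whether the conversion is needed depends on the subsequent steps.

For this dataset the conversion was not necessary, because the gene names in the single-cell reference data are also gene symbols. Note, however, that step 5 reads the gene symbols from the `SYMBOL` column created in the first line of the cell below, so keep that line even if you skip the re-indexing.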

In [None]:
adata_vis.var["SYMBOL"] = adata_vis.var_names  # save gene symbols (used in step 5)
# Replace the row names (gene symbols) with gene IDs.
# drop=True removes the "gene_ids" column once it becomes the index;
# inplace=True modifies adata_vis.var directly.
adata_vis.var.set_index("gene_ids", drop=True, inplace=True)
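One reason to index genes by ID rather than by symbol is that symbols need not be unique: several gene IDs can share one symbol, and scanpy typically warns that variable names are not unique when Visium data are loaded with symbols as row names. The sketch below is a hypothetical helper (`duplicated_symbols`, not a scanpy function) that lists every symbol mapped to more than one ID. In this notebook it could be called as `duplicated_symbols(adata_vis.var["SYMBOL"], adata_vis.var_names)` after the cell above. The gene names and IDs in the example are made up.

```python
from collections import defaultdict


def duplicated_symbols(symbols, gene_ids):
    """Map each gene symbol that occurs more than once to all of its gene IDs."""
    ids_by_symbol = defaultdict(list)
    for symbol, gene_id in zip(symbols, gene_ids):
        ids_by_symbol[symbol].append(gene_id)
    # Keep only symbols shared by two or more IDs.
    return {s: ids for s, ids in ids_by_symbol.items() if len(ids) > 1}


# Made-up toy example: "GeneA" appears twice with different IDs.
symbols = ["GeneA", "GeneB", "GeneA"]
gene_ids = ["ID0001", "ID0002", "ID0003"]
print(duplicated_symbols(symbols, gene_ids))  # {'GeneA': ['ID0001', 'ID0003']}
```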

5. Remove mitochondrial genes (keeping their counts in the object) and save the new AnnData object

In [None]:
adata_vis.var["mt_gene"] = [
    gene.startswith("mt-") for gene in adata_vis.var["SYMBOL"]
]  # identify mitochondrial genes (mouse MT genes start with "mt-")

print(f"Number of mitochondrial genes in Visium data: {sum(adata_vis.var['mt_gene'])}")

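# Store the MT counts (spots x MT genes) in .obsm so they remain available after removal
adata_vis.obsm["mt"] = adata_vis[:, adata_vis.var["mt_gene"].values].X.toarray()
# Remove MT genes; .copy() turns the sliced view into an independent AnnData object
adata_vis = adata_vis[:, ~adata_vis.var["mt_gene"].values].copy()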

print("mitochondrial genes removed")
print(
    f"Number of genes in Visium data after MT genes were removed: {adata_vis.shape[1]}"
)

os.makedirs(run_name, exist_ok=True)  # create the output folder if it does not exist yet
adata_vis.write(f"{run_name}/adata_vis_MTremoved.h5ad")
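The logic of step 5 can be reproduced on a small array to see what it does. This is a minimal NumPy sketch with made-up data, not the AnnData code itself: a boolean mask selects the MT columns, those counts are set aside (as `adata_vis.obsm["mt"]` holds them above), and the remaining columns form the filtered matrix. Here the prefix check is lowercased so that both mouse (`mt-`) and human (`MT-`) symbols match; that is an assumption beyond the original code, which targets mouse only. The set-aside counts then give a per-spot MT fraction, a common QC metric.

```python
import numpy as np

# Made-up toy data: 3 spots x 4 genes, standing in for adata_vis.X and var["SYMBOL"].
symbols = ["Actb", "mt-Co1", "Gapdh", "mt-Nd1"]
X = np.array([
    [5, 2, 0, 1],
    [0, 0, 3, 0],
    [1, 4, 2, 3],
])

# Case-insensitive prefix check: matches mouse "mt-" and human "MT-" symbols.
is_mt = np.array([s.lower().startswith("mt-") for s in symbols])

mt_counts = X[:, is_mt]  # set aside, like adata_vis.obsm["mt"]
X_no_mt = X[:, ~is_mt]   # filtered matrix without MT genes

# No counts are lost: both parts add up to the original per-spot totals.
assert np.array_equal(mt_counts.sum(axis=1) + X_no_mt.sum(axis=1), X.sum(axis=1))

# Per-spot fraction of counts from MT genes (0 for spots with no counts at all).
totals = X.sum(axis=1)
mt_fraction = np.divide(
    mt_counts.sum(axis=1), totals, out=np.zeros(len(totals)), where=totals > 0
)
print(mt_fraction)
```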