# FuseMap Tutorial II: Integrating Spatial Transcriptomics Data Across Different Technology Platforms

In this tutorial, we'll demonstrate how to use FuseMap to integrate spatial transcriptomics data from different technology platforms, specifically by combining image-based (STARmap) and sequencing-based (Slide-seq) technologies. Cross-platform integration is important but particularly challenging because:

1. Different technologies capture spatial information at different resolutions
2. The data formats and characteristics vary significantly between platforms
3. The gene capture efficiency and coverage differ between methods
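Point 3 can be checked directly before any modeling. Image-based methods such as STARmap typically measure a targeted gene panel, while Slide-seq captures transcripts more broadly, so only part of the gene space is shared. The following is a minimal sketch of counting shared and platform-specific genes. The helper `summarize_gene_overlap` and the toy gene lists are hypothetical illustrations and are not part of FuseMap.

```python
def summarize_gene_overlap(genes_a, genes_b):
    """Split two gene lists into shared genes and genes unique to each list."""
    set_a, set_b = set(genes_a), set(genes_b)
    return {
        "shared": sorted(set_a & set_b),
        "only_a": sorted(set_a - set_b),
        "only_b": sorted(set_b - set_a),
    }

# Hypothetical toy gene lists, not taken from the tutorial datasets
starmap_genes = ["Slc17a7", "Gad1", "Mbp", "Aqp4"]
slideseq_genes = ["Slc17a7", "Gad1", "Mbp", "Snap25", "Pvalb"]

overlap = summarize_gene_overlap(starmap_genes, slideseq_genes)
print(len(overlap["shared"]), overlap["only_a"], overlap["only_b"])
```

After loading, the same helper accepts the `var_names` of the two AnnData objects, for example `summarize_gene_overlap(X_input[0].var_names, X_input[1].var_names)`.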

We'll walk through each step carefully, explaining the rationale and technical details along the way.

## 1. Data Preparation

For this tutorial, we'll use mouse brain data from two different platforms:

- STARmap data: A high-resolution image-based spatial transcriptomics method (Shi et al., [Nature paper](https://www.nature.com/articles/s41586-023-06569-5#data-availability))
- Slide-seq data: A sequencing-based spatial transcriptomics method with high throughput (Langlieb et al., [Nature paper](https://www.nature.com/articles/s41586-023-06818-7#data-availability))

Both datasets are from mouse brain tissue, which allows us to demonstrate cross-platform integration while maintaining biological relevance.

```python
import warnings
warnings.filterwarnings("ignore")
```

```python
import logging
import os

import matplotlib.pyplot as plt
import pandas as pd
import scanpy as sc
from easydict import EasyDict as edict

from fusemap import ModelType, seed_all, setup_logging, spatial_integrate

# Seed the random number generators so runs are reproducible
seed_all(0)

# Set plotting style
plt.rcParams['figure.figsize'] = (10, 10)
plt.rcParams['font.size'] = 12
```

## 2. Data Loading and Preprocessing

Careful preprocessing is crucial when combining platforms, because each one stores its expression data and spatial coordinates in its own way. We start by pointing to the two input files and creating an output directory:

```python
# Set paths to data (replace these with the locations of your .h5ad files)
data_dir_list = [
    '/Users/mingzeyuan/Workspace/FuseMap/data/starmap.h5ad',
    '/Users/mingzeyuan/Workspace/FuseMap/data/slideseq_Puck34.h5ad'
]
output_dir = '/Users/mingzeyuan/Workspace/FuseMap/output'
os.makedirs(output_dir, exist_ok=True)
```
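Because the paths above are machine-specific, you can fail fast with a clear message before the loading loop runs. The following is a minimal sketch. The helpers `missing_inputs` and `require_inputs` are hypothetical and not part of FuseMap, and the demo uses a temporary directory in place of the real data folder.

```python
import os
import tempfile

def missing_inputs(paths):
    """Return the entries of `paths` that do not point to an existing file."""
    return [p for p in paths if not os.path.isfile(p)]

def require_inputs(paths):
    """Raise once, listing every missing path, instead of failing mid-loop."""
    missing = missing_inputs(paths)
    if missing:
        raise FileNotFoundError(f"Missing input file(s): {missing}")

# Demo: one file that exists and one that does not
with tempfile.TemporaryDirectory() as tmp:
    present = os.path.join(tmp, "starmap.h5ad")
    open(present, "w").close()
    absent = os.path.join(tmp, "slideseq_Puck34.h5ad")
    result = missing_inputs([present, absent])
    require_inputs([present])  # passes silently: the file exists
```

In the notebook, calling `require_inputs(data_dir_list)` right after defining the paths would be enough.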

```python
args = edict(dict(output_save_dir=output_dir,
                  keep_celltype="",
                  keep_tissueregion="",
                  use_llm_gene_embedding="false",
                  pretrain_model_path=""))
```

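```python
setup_logging(args.output_save_dir)

# Save the run configuration as a one-row CSV next to the outputs
# (os.path.join adds the path separator that plain string concatenation omitted)
config_df = pd.DataFrame({key: [value] for key, value in vars(args).items()})
config_df.to_csv(os.path.join(args.output_save_dir, "config.csv"), index=False)

# Log the run arguments and the ModelType settings
logging.info("\n\n\033[95mArguments:\033[0m \n%s\n\n", vars(args))
logging.info("\n\n\033[95mArguments:\033[0m \n%s\n\n", vars(ModelType))
```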
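The `config.csv` written above is a one-row table: a header of argument names followed by one row of values. The following is a minimal sketch of reading it back with the standard-library `csv` module, which returns every value as a string and keeps empty strings such as `keep_celltype=""` as `""`. By contrast, `pandas.read_csv` treats empty fields as NaN by default. The helpers `save_config` and `load_config` are hypothetical and only mirror that flat layout.

```python
import csv
import os
import tempfile

def save_config(config, path):
    """Write a flat dict as a one-row CSV whose header is the dict's keys."""
    with open(path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=list(config))
        writer.writeheader()
        writer.writerow(config)

def load_config(path):
    """Read the one-row CSV back into a dict; all values come back as str."""
    with open(path, newline="") as f:
        return dict(next(csv.DictReader(f)))

# Hypothetical values shaped like the tutorial's args
config = {"output_save_dir": "/tmp/out", "keep_celltype": "", "use_llm_gene_embedding": "false"}
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "config.csv")
    save_config(config, path)
    restored = load_config(path)
```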

### Loading and Processing the Data
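For cross-platform integration, the spatial coordinates need care:

- STARmap provides cell-level coordinates.
- Slide-seq provides bead (spot) coordinates.

The loading loop looks for coordinates in several common layouts (`col`/`row` columns, `obsm["spatial"]`, or the `Raw_Slideseq_X`/`Raw_Slideseq_Y` columns). It copies whichever it finds into shared `obs["x"]` and `obs["y"]` columns and tags each dataset with its section name, file name, and platform. The coordinate values themselves are not rescaled, so each dataset keeps its original relative positions.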
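The loading loop copies coordinates without rescaling them. If you want the two platforms on a comparable numeric scale while keeping relative positions intact, one option is an isotropic transform: center the points and divide both axes by the same factor. The following is a minimal sketch of that idea. The function `standardize_coords` is hypothetical, and this tutorial does not state whether FuseMap expects raw or rescaled coordinates.

```python
import numpy as np

def standardize_coords(xy):
    """Center 2-D coordinates and divide both axes by one shared scale factor.

    A single (isotropic) scale keeps angles, the aspect ratio and the ordering
    of pairwise distances unchanged, so relative positions are preserved.
    """
    xy = np.asarray(xy, dtype=float)
    centered = xy - xy.mean(axis=0)
    # RMS distance of the points from their centroid
    scale = np.sqrt((centered ** 2).sum(axis=1).mean())
    if scale == 0:  # all points identical: nothing to rescale
        return centered
    return centered / scale

coords = np.array([[0.0, 0.0], [30.0, 0.0], [0.0, 40.0], [30.0, 40.0]])
scaled = standardize_coords(coords)
```

On an AnnData object this could be applied as `data.obs[["x", "y"]] = standardize_coords(data.obs[["x", "y"]].to_numpy())`. KNN and Delaunay neighbor sets do not change under translation and uniform scaling, so this matters mainly if a later step uses absolute distances.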

```python
X_input = []
for ind, data_dir in enumerate(data_dir_list):
    print(f"Loading {data_dir}")
    data = sc.read_h5ad(data_dir)

    # Copy spatial coordinates into common obs["x"] / obs["y"] columns,
    # checking the layouts these files commonly use
    if "x" not in data.obs.columns:
        if "col" in data.obs.columns and "row" in data.obs.columns:
            data.obs["x"] = data.obs["col"]
            data.obs["y"] = data.obs["row"]
        elif "spatial" in data.obsm.keys():
            data.obs["x"] = data.obsm["spatial"][:, 0]
            data.obs["y"] = data.obsm["spatial"][:, 1]
        elif 'Raw_Slideseq_X' in data.obs.columns and 'Raw_Slideseq_Y' in data.obs.columns:
            data.obs["x"] = data.obs['Raw_Slideseq_X']
            data.obs["y"] = data.obs['Raw_Slideseq_Y']
        else:
            raise ValueError(f"Spatial coordinates not found in expected format for {data_dir}")

    # Add dataset-specific metadata (platform is inferred from the file name)
    platform = 'STARmap' if 'starmap' in data_dir.lower() else 'Slide-seq'
    data.obs['name'] = f'section{ind}'
    data.obs['file_name'] = os.path.basename(data_dir)
    data.obs['platform'] = platform

    # Print the platform string, not the whole obs['platform'] column
    print(f"Loaded {data.n_obs} spots/cells with {data.n_vars} genes from {platform}")
    X_input.append(data)

# Set integration parameters: one entry per dataset, in data_dir_list order.
# Delaunay triangulation for STARmap (cell-level), KNN for Slide-seq (spot-level)
kneighbor = ["delaunay", "knn"]
# Input type of each dataset; both are spatial ("ST") data
input_identity = ["ST", "ST"]
print(f"Loaded {len(X_input)} datasets from different platforms")
```
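The `kneighbor` list selects a spatial-graph strategy per dataset. To make the `"knn"` option concrete, the following is a minimal brute-force sketch of a k-nearest-neighbor spatial graph. The function names are hypothetical and this is not FuseMap's internal graph construction, whose details (such as the value of k) this tutorial does not show. The sketch computes all pairwise distances, which costs O(n²) memory. For real datasets, `sklearn.neighbors.NearestNeighbors` or `scipy.spatial.cKDTree` are the usual tools, and `scipy.spatial.Delaunay` builds the triangulation used for the `"delaunay"` option.

```python
import numpy as np

def knn_neighbors(xy, k):
    """Indices of the k nearest other points for each point (brute force)."""
    xy = np.asarray(xy, dtype=float)
    # Pairwise Euclidean distances, shape (n, n)
    dist = np.linalg.norm(xy[:, None, :] - xy[None, :, :], axis=-1)
    np.fill_diagonal(dist, np.inf)  # a point is not its own neighbor
    return np.argsort(dist, axis=1, kind="stable")[:, :k]

def knn_edge_list(neighbors):
    """Turn the (n, k) neighbor table into a set of undirected edges (i, j) with i < j."""
    edges = set()
    for i, row in enumerate(neighbors):
        for j in row:
            edges.add((min(i, int(j)), max(i, int(j))))
    return edges

# Hypothetical bead positions along a line
beads = np.array([[0.0, 0.0], [1.0, 0.0], [3.0, 0.0], [10.0, 0.0]])
nbrs = knn_neighbors(beads, k=1)
edges = knn_edge_list(nbrs)
```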

## 3. Cross-platform Integration

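Now we'll run the integration with FuseMap. The algorithm aims to:
1. Learn platform-invariant features
2. Preserve the spatial relationships specific to each technology
3. Create a unified representation of the tissue

`spatial_integrate` saves its outputs to `args.output_save_dir`, which is where the next section reads the embeddings from.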

```python
# Run the integration
spatial_integrate(X_input, args, kneighbor, input_identity)
```
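Before plotting, it can help to confirm which AnnData files the run actually wrote, since the visualization cells read them by name from `output_dir`. The following is a minimal sketch. The helper `list_h5ad_outputs` is hypothetical, and the demo stands in a temporary directory for `output_dir`.

```python
import os
import tempfile

def list_h5ad_outputs(output_dir):
    """Sorted names of the .h5ad files written directly into output_dir."""
    return sorted(
        name
        for name in os.listdir(output_dir)
        if name.endswith(".h5ad") and os.path.isfile(os.path.join(output_dir, name))
    )

# Self-contained demo on a temporary directory standing in for output_dir
with tempfile.TemporaryDirectory() as demo_dir:
    for fname in ["ad_celltype_embedding.h5ad", "config.csv"]:
        open(os.path.join(demo_dir, fname), "w").close()
    found = list_h5ad_outputs(demo_dir)
```

In the notebook, `print(list_h5ad_outputs(output_dir))` shows the real file names.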

## 4. Visualization

### Read the single-cell embedding

```python
# Load the single-cell embedding, build a 50-neighbor graph on it, and compute a UMAP
ad_embed = sc.read_h5ad(os.path.join(output_dir, 'ad_celltype_embedding.h5ad'))
sc.pp.neighbors(ad_embed, n_neighbors=50, use_rep='X')
sc.tl.umap(ad_embed)
ax = sc.pl.umap(ad_embed, color='batch', size=1, show=False)
ax.set_title('Single-cell embedding, colored by sample ID')
plt.show()
```
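The UMAP gives a visual impression of how well the two platforms mix. A simple numeric companion is the average fraction of each cell's nearest neighbors in embedding space that come from a different batch. The following is a minimal sketch of that proxy, in the spirit of neighborhood-mixing metrics such as LISI but much simpler. The function `cross_batch_fraction` is hypothetical and is not a FuseMap metric.

```python
import numpy as np

def cross_batch_fraction(embedding, batch, k=10):
    """Mean fraction of each point's k nearest neighbors (Euclidean, in
    embedding space) that carry a different batch label."""
    X = np.asarray(embedding, dtype=float)
    batch = np.asarray(batch)
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    np.fill_diagonal(d2, np.inf)  # exclude self-matches
    nn = np.argsort(d2, axis=1, kind="stable")[:, :k]
    return float((batch[nn] != batch[:, None]).mean())

points = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]])
separated = cross_batch_fraction(points, ["a", "a", "b", "b"], k=1)
mixed = cross_batch_fraction(points, ["a", "b", "a", "b"], k=1)
```

With two equally sized batches, random mixing gives a value of roughly 0.5, and 0 means the batches never share neighborhoods. The pairwise-distance matrix is O(n²), so on real data, apply the function to a random subsample of rows of a dense `ad_embed.X` together with the matching `ad_embed.obs['batch']` values.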

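### Read the spatial embedding

```python
# File name assumed by analogy with ad_celltype_embedding.h5ad;
# check output_dir for the spatial embedding file your run produced
ad_embed = sc.read_h5ad(os.path.join(output_dir, 'ad_tissueregion_embedding.h5ad'))
sc.pp.neighbors(ad_embed, n_neighbors=50, use_rep='X')
sc.tl.umap(ad_embed)
ax = sc.pl.umap(ad_embed, color='batch', size=1, show=False)
ax.set_title('Spatial embedding, colored by sample ID')
plt.show()
```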

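### Read the gene embedding

```python
# File name assumed; check output_dir for the gene embedding file your run produced.
# Assumes genes are stored as observations (rows), so the UMAP places genes.
ad_embed = sc.read_h5ad(os.path.join(output_dir, 'ad_gene_embedding.h5ad'))
sc.pp.neighbors(ad_embed, n_neighbors=50, use_rep='X')
sc.tl.umap(ad_embed)
# No color key: a 'batch' column may not exist in the gene embedding
ax = sc.pl.umap(ad_embed, size=1, show=False)
ax.set_title('Gene embedding')
plt.show()
```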
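A common way to explore a gene embedding is to ask which genes lie closest to a gene of interest. The following is a minimal sketch that ranks genes by cosine similarity. It assumes genes are rows of the embedding matrix, as the UMAP cell above implies. The function `nearest_genes` and the toy vectors are hypothetical.

```python
import numpy as np

def nearest_genes(gene_names, vectors, query, top_n=5):
    """Genes whose embedding vectors have the highest cosine similarity to `query`."""
    names = list(gene_names)
    V = np.asarray(vectors, dtype=float)
    norms = np.linalg.norm(V, axis=1, keepdims=True)
    norms[norms == 0] = 1.0  # avoid dividing by zero for all-zero vectors
    V = V / norms
    sims = V @ V[names.index(query)]
    order = np.argsort(-sims, kind="stable")
    ranked = [(names[i], float(sims[i])) for i in order if names[i] != query]
    return ranked[:top_n]

# Hypothetical 2-D gene vectors
genes = ["GeneA", "GeneB", "GeneC"]
vecs = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]])
result = nearest_genes(genes, vecs, "GeneA", top_n=2)
```

On the loaded embedding, this could be called as `nearest_genes(ad_embed.obs_names, ad_embed.X, ad_embed.obs_names[0])`, converting `ad_embed.X` with `.toarray()` first if it is sparse.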