# FuseMap Tutorial IV: Mapping to new datasets

In this tutorial, we will demonstrate how to map new spatial transcriptomics data to an existing FuseMap integration. This is useful when you want to analyze new samples in the context of previously integrated datasets.

FuseMap provides functionality to project new data points into the same latent space as the reference integration, allowing you to:

1. Compare new samples to existing integrated data
2. Transfer annotations and insights from the reference to new data
3. Analyze spatial patterns across old and new datasets together
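
To make the second point concrete: once reference and new cells share a latent space, annotations are commonly transferred by nearest-neighbor voting. The sketch below is not FuseMap's API. It assumes the embeddings have already been extracted as plain coordinate tuples, and `transfer_labels`, the toy embeddings, and the labels are all hypothetical.

```python
import math
from collections import Counter

def transfer_labels(ref_embeddings, ref_labels, query_embeddings, k=3):
    """Assign each query point the majority label of its k nearest reference points."""
    predicted = []
    for query in query_embeddings:
        # Rank reference points by Euclidean distance to the query point.
        ranked = sorted(
            range(len(ref_embeddings)),
            key=lambda i: math.dist(query, ref_embeddings[i]),
        )
        # Majority vote among the k closest reference points.
        votes = Counter(ref_labels[i] for i in ranked[:k])
        predicted.append(votes.most_common(1)[0][0])
    return predicted

# Toy 2-D embeddings standing in for a real latent space (hypothetical values).
ref_embeddings = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (5.0, 5.0), (5.1, 4.9), (4.9, 5.2)]
ref_labels = ["neuron", "neuron", "neuron", "astrocyte", "astrocyte", "astrocyte"]
query_embeddings = [(0.05, 0.1), (5.05, 5.0)]

predicted_labels = transfer_labels(ref_embeddings, ref_labels, query_embeddings, k=3)
print(predicted_labels)  # ['neuron', 'astrocyte']
```

For real datasets a vectorized neighbor search would be preferable; the loop here only illustrates the idea.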

We use the integrated model trained on MERFISH and STARmap data, then map Slide-seq data onto it.

- MERFISH data: Zhang et al., [Nature paper](https://www.nature.com/articles/s41586-023-06808-9#data-availability)
- STARmap data: Shi et al., [Nature paper](https://www.nature.com/articles/s41586-023-06569-5#data-availability)
- Slide-seq data: Langlieb et al., [Nature paper](https://www.nature.com/articles/s41586-023-06818-7#data-availability)

### 1. Define arguments

In [1]:
import warnings
warnings.filterwarnings("ignore")

In [2]:
import os
import scanpy as sc
from easydict import EasyDict as edict
from fusemap import seed_all, spatial_map
import copy
seed_all(0)

In [3]:
pretrain_model_path = "/Users/mingzeyuan/Workspace/FuseMap/output/integrate_merfish_starmap"
output_save_dir = "/Users/mingzeyuan/Workspace/FuseMap/output/map_slideseq"
args = edict(dict(pretrain_model_path=pretrain_model_path, output_save_dir=output_save_dir))
data_dir_list = ["/Users/mingzeyuan/Workspace/FuseMap/data/slideseq_Puck34.h5ad"]
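
Because the paths above are machine-specific, it can help to verify them before loading anything. The helper below is a hypothetical convenience, not part of FuseMap; it only assumes that the pretrained model path and each input file should already exist and that inputs are `.h5ad` files.

```python
import os

def prepare_mapping_paths(pretrain_model_path, output_save_dir, data_paths):
    """Return a list of problems found; an empty list means the paths look usable."""
    problems = []
    # Assumption: the pretrained model location must exist before mapping.
    if not os.path.exists(pretrain_model_path):
        problems.append(f"Pretrained model path not found: {pretrain_model_path}")
    for path in data_paths:
        if not os.path.isfile(path):
            problems.append(f"Input file not found: {path}")
        elif not path.endswith(".h5ad"):
            problems.append(f"Expected an .h5ad file: {path}")
    # Create the output directory up front so later writes do not fail.
    os.makedirs(output_save_dir, exist_ok=True)
    return problems
```

Calling `prepare_mapping_paths(pretrain_model_path, output_save_dir, data_dir_list)` and printing the returned list surfaces a mistyped path immediately.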

### 2. Data loading and pre-processing

In [4]:
X_input = []
for ind, data_dir in enumerate(data_dir_list):
    print(f"Loading {data_dir}")
    data = sc.read_h5ad(data_dir)
    # Handle spatial coordinates: they must end up in obs['x'] and obs['y']
    if "x" not in data.obs.columns:
        if "col" in data.obs.columns and "row" in data.obs.columns:
            data.obs["x"] = data.obs["col"]
            data.obs["y"] = data.obs["row"]
        elif "spatial" in data.obsm.keys():
            data.obs["x"] = data.obsm["spatial"][:,0]
            data.obs["y"] = data.obsm["spatial"][:,1]
        else:
            raise ValueError(f"Please provide spatial coordinates in the obs['x'] and obs['y'] columns for {data_dir}")
    
    # Add metadata
    data.obs['name'] = f'section{ind}'
    data.obs['file_name'] = os.path.basename(data_dir)
    print(f"Loaded {data.shape[0]} cells with {data.shape[1]} genes from {data.obs['file_name'].iloc[0]}")
    X_input.append(data)
    
# Set parameters for integration
kneighbor = ["delaunay"] * len(X_input)
input_identity = ["ST"] * len(X_input)
print(f"Loaded {len(X_input)} datasets")

Loading /Users/mingzeyuan/Workspace/FuseMap/data/slideseq_Puck34.h5ad


ValueError: Please provide spatial coordinates in the obs['x'] and obs['y'] columns for /Users/mingzeyuan/Workspace/FuseMap/data/slideseq_Puck34.h5ad
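
The error above means the file contained none of the coordinate fields the loading loop looks for. One way to diagnose this is to check which candidate fields are present before running the loop. The helper below is a hypothetical sketch that mirrors the loop's lookup order; it works on plain name collections so it can be tried without any data, and the example field names are made up.

```python
def find_coordinate_source(obs_columns, obsm_keys):
    """Return which fields could supply x/y coordinates, or None if none are present."""
    obs_columns = set(obs_columns)
    obsm_keys = set(obsm_keys)
    if {"x", "y"} <= obs_columns:
        return "obs['x'], obs['y']"
    if {"col", "row"} <= obs_columns:
        return "obs['col'], obs['row']"
    if "spatial" in obsm_keys:
        return "obsm['spatial']"
    return None

# Hypothetical field names for illustration only.
print(find_coordinate_source(["cell_type", "col", "row"], []))      # obs['col'], obs['row']
print(find_coordinate_source(["cell_type"], ["spatial", "X_pca"]))  # obsm['spatial']
print(find_coordinate_source(["pos_x", "pos_y"], []))               # None
```

With a loaded AnnData object, you would pass `data.obs.columns` and `data.obsm.keys()`. A `None` result indicates that the coordinates are stored under different names and need to be copied into `obs['x']` and `obs['y']` by hand.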

### 3. Mapping FuseMap to new datasets

In [5]:
for i in range(len(X_input)):
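    # Copy the shared arguments so each dataset gets its own output directory
    args_i = copy.copy(args)
    args_i.output_save_dir = os.path.join(args.output_save_dir, X_input[i].obs['file_name'].unique()[0])
    # Map this dataset with its own graph method and data-type label
    spatial_map([X_input[i]], args_i, [kneighbor[i]], [input_identity[i]])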