# Proteomics Tutorial with SpatialData `blobs`

This tutorial shows two equivalent ways to build a Vitessce proteomics config:

1. `proteomics_from_split_sources`: image, labels, and table are passed as separate paths.
2. `proteomics_from_spatialdata`: layers are resolved from a SpatialData object and can use coordinate systems.

Both workflows use the same underlying data so you can compare behavior directly.


In [1]:
%load_ext autoreload
%autoreload 2

In [2]:
import harpy_vitessce as hpv



In [3]:
import tempfile
from pathlib import Path

tmp_dir = Path(tempfile.mkdtemp(prefix="spatialdata_blobs"))

In [None]:
import scanpy as sc
from spatialdata.datasets import blobs
from spatialdata.models import TableModel

sdata = blobs()

adata = sdata["table"]

# add Leiden clusters and a UMAP embedding with a minimal (dummy) scanpy pipeline
sc.pp.scale(adata, max_value=10)
sc.pp.pca(
    adata,
    n_comps=2,
    svd_solver="arpack",
)
sc.pp.neighbors(
    adata,
    use_rep="X_pca",
    n_neighbors=10,
)
sc.tl.leiden(adata, resolution=0.6, key_added="leiden")
sc.tl.umap(adata, min_dist=0.3)

# Uncomment the lines below to verify that Vitessce (when using SpatialDataWrapper) falls back
# to the table index if the table has no instance/region key.
# import uuid
# del adata.obs["instance_id"]
# del adata.uns[TableModel.ATTRS_KEY]
# adata.obs.index = [f"segmentation_{uuid.uuid4()}" for _ in range(len(adata.obs))]

spatialdata_path = tmp_dir / "sdata.zarr"
sdata.write(
    spatialdata_path,
    overwrite=True,
)



[34mINFO    [0m The Zarr backing store has been changed from [3;35mNone[0m the new file path:                                      
         [35m/var/folders/sz/t3tgg4fs4tz9btm0fbqg_tzc0000gn/T/spatialdata_blobs4s_o1gst/[0m[95msdata.zarr[0m                     


### Why index alignment matters for split sources

`proteomics_from_split_sources` uses separate wrappers for image/labels/table.
For cell-level linking, segmentation IDs in `labels_source` should match
the AnnData observation IDs (`adata.obs_names`, i.e. the table index).
If they do not match, selections in spatial and feature views cannot be synchronized correctly.


In [5]:
import dask.array as da

display(sdata["table"].obs.index)
# these should match the non-background (non-zero) IDs in the segmentation mask
display(da.unique(sdata["blobs_labels"].data).compute())

Index(['1', '2', '3', '4', '5', '6', '8', '9', '10', '11', '12', '13', '15',
       '16', '17', '18', '19', '20', '22', '23', '24', '25', '26', '27', '29',
       '30'],
      dtype='object')

array([ 0,  1,  2,  3,  4,  5,  6,  8,  9, 10, 11, 12, 13, 15, 16, 17, 18,
       19, 20, 22, 23, 24, 25, 26, 27, 29, 30], dtype=int16)
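The visual comparison above can also be done programmatically. The sketch below uses a hypothetical helper, `check_label_alignment`, which is not part of harpy_vitessce or SpatialData. It assumes label value `0` is background (in the output above, `0` is the only mask value without a table row) and compares IDs as strings, because the table index holds strings while the mask holds integers.

```python
from collections.abc import Iterable


def check_label_alignment(
    label_ids: Iterable[int],
    obs_names: Iterable[str],
    background: int = 0,
) -> tuple[set[str], set[str]]:
    """Compare segmentation label IDs with table observation IDs.

    Hypothetical helper: returns (labels_without_rows, rows_without_labels).
    Both sets are empty when the split sources are fully aligned.
    """
    # The mask stores integers and the AnnData index stores strings,
    # so normalize both sides to strings before comparing.
    labels = {str(int(i)) for i in label_ids if int(i) != background}
    obs = {str(name) for name in obs_names}
    return labels - obs, obs - labels


# IDs copied from the outputs above (the mask also contains background 0).
mask_ids = [0, 1, 2, 3, 4, 5, 6, 8, 9, 10, 11, 12, 13, 15, 16, 17, 18,
            19, 20, 22, 23, 24, 25, 26, 27, 29, 30]
table_ids = [str(i) for i in mask_ids if i != 0]

missing_rows, missing_labels = check_label_alignment(mask_ids, table_ids)
print(missing_rows, missing_labels)  # set() set()
```

In the notebook, the same helper could be called as `check_label_alignment(da.unique(sdata["blobs_labels"].data).compute(), sdata["table"].obs_names)`; any non-empty set points to cells that cannot be linked between the spatial and feature views.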

### Build a Vitessce config from split sources

This call reads image, labels, and table from separate paths and builds a linked
multi-view layout (spatial view + marker/cluster views).


In [16]:
from IPython.display import HTML, display

vc = hpv.proteomics_from_split_sources(
    img_source=spatialdata_path
    / "images"
    / "blobs_multiscale_image",  # the image must have dimensions ("c", "y", "x")
    labels_source=spatialdata_path
    / "labels"
    / "blobs_labels",  # the segmentation mask must have dimensions ("y", "x")
    microns_per_pixel_image=0.5,  # set to the pixel size of your data
    microns_per_pixel_mask=0.5,
    channels=[0, 1, 2],
    adata_source=spatialdata_path / "tables" / "table",
    visualize_feature_matrix=False,
    visualize_heatmap=True,
    embedding_key="X_umap",
    embedding_key_display_name="UMAP",
    cluster_key="leiden",
    cluster_key_display_name="Leiden clusters",
)

url = vc.web_app()
display(HTML(f'<a href="{url}" target="_blank">Open in Vitessce</a>'))

2026-02-19 11:52:59.196 | INFO     | harpy_vitessce.vitessce_config._image:build_image_layer_config:141 - No palette provided and 3 channels selected; rendering with the default channel palette.


### SpatialData-native workflow

With `proteomics_from_spatialdata`, linkage is derived from SpatialData table annotations.
The table should annotate the labels element using SpatialData table attributes
(region/instance semantics), rather than relying on index matching alone.


In [None]:
sdata["table"].uns[TableModel.ATTRS_KEY]  # the table annotates the `blobs_labels` element

{'instance_key': 'instance_id',
 'region': 'blobs_labels',
 'region_key': 'region'}
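Conceptually, these attributes say which labels element each table row belongs to (`region`, stored per row in the `region_key` column) and which segmentation ID the row refers to (`instance_key`). The pure-Python sketch below illustrates that mapping on a toy table. The helper `rows_by_label` and the toy rows are hypothetical, and this is not a description of how harpy_vitessce or Vitessce implement the lookup internally.

```python
from typing import Any

# Attributes as shown in the output above.
attrs = {"instance_key": "instance_id", "region": "blobs_labels", "region_key": "region"}

# Toy obs rows (hypothetical); in the notebook these come from sdata["table"].obs.
obs_rows = [
    {"obs_name": "1", "region": "blobs_labels", "instance_id": 1},
    {"obs_name": "2", "region": "blobs_labels", "instance_id": 2},
    {"obs_name": "3", "region": "blobs_labels", "instance_id": 3},
]


def rows_by_label(
    rows: list[dict[str, Any]], attrs: dict[str, Any], element: str
) -> dict[int, str]:
    """Map segmentation label IDs of `element` to table observation names."""
    region_key = attrs["region_key"]
    instance_key = attrs["instance_key"]
    return {
        int(row[instance_key]): row["obs_name"]
        for row in rows
        # keep only rows that annotate the requested labels element
        if row[region_key] == element
    }


lookup = rows_by_label(obs_rows, attrs, attrs["region"])
print(lookup)  # {1: '1', 2: '2', 3: '3'}
```

Because the lookup in this sketch goes through the `instance_id` column, the table index itself does not need to equal the label values; only when the instance/region keys are missing does index matching become necessary again.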

### Coordinate systems and transformations

For the SpatialData-based API, view-space scaling and reorientation are controlled
through named coordinate systems.

Below we add:
- `micron`: isotropic scaling from pixels to microns.
- `rotation`: an affine rotation in the `(x, y)` plane.
- `global`: required for OME-NGFF compatibility.


In [None]:
import numpy as np
from spatialdata.transformations import Affine, Identity, Scale, set_transformation

microns_per_pixel = 10
rotation_degrees = 20
rotation_radians = np.deg2rad(rotation_degrees)
rotation_matrix = [
    [np.cos(rotation_radians), -np.sin(rotation_radians), 0.0],
    [np.sin(rotation_radians), np.cos(rotation_radians), 0.0],
    [0.0, 0.0, 1.0],
]

transformations = {
    "micron": Scale(axes=("x", "y"), scale=[microns_per_pixel, microns_per_pixel]),
    "rotation": Affine(
        matrix=rotation_matrix, input_axes=("x", "y"), output_axes=("x", "y")
    ),
    "global": Identity(),  # a "global" coordinate system is needed for OME-NGFF
}

set_transformation(
    sdata["blobs_multiscale_image"],
    transformation=transformations,
    set_all=True,
    write_to_sdata=sdata,
)

set_transformation(
    sdata["blobs_labels"],
    transformation=transformations,
    set_all=True,
    write_to_sdata=sdata,
)
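To build intuition for what the `micron` and `rotation` coordinate systems do, the sketch below applies the same 3×3 homogeneous matrices to a single pixel coordinate in plain Python. It mirrors the `Scale` and `Affine` definitions above but does not call SpatialData; the helper `apply_affine` is hypothetical, and the pixel point `(100, 0)` is an arbitrary example.

```python
import math

microns_per_pixel = 10
rotation_radians = math.radians(20)

# Homogeneous (x, y, 1) matrices matching the cell above.
scale_matrix = [
    [microns_per_pixel, 0.0, 0.0],
    [0.0, microns_per_pixel, 0.0],
    [0.0, 0.0, 1.0],
]
rotation_matrix = [
    [math.cos(rotation_radians), -math.sin(rotation_radians), 0.0],
    [math.sin(rotation_radians), math.cos(rotation_radians), 0.0],
    [0.0, 0.0, 1.0],
]


def apply_affine(matrix, x, y):
    """Map a pixel coordinate (x, y) through a 3x3 homogeneous matrix."""
    vec = (x, y, 1.0)
    out = [sum(m * v for m, v in zip(row, vec)) for row in matrix]
    return out[0], out[1]


print(apply_affine(scale_matrix, 100, 0))     # (1000.0, 0.0)
print(apply_affine(rotation_matrix, 100, 0))  # approx (93.97, 34.20)
```

Each named coordinate system maps pixel (intrinsic) coordinates directly, so in this setup `rotation` rotates in pixel units without micron scaling, and the rotation pivots around the pixel origin (the top-left corner of the image) rather than the image center.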

### Build a Vitessce config from SpatialData

We render the same data twice, changing only `to_coordinate_system` (`micron` vs `rotation`),
so you can see how coordinate-system selection affects the spatial view while preserving
feature-level linking.


In [10]:
from IPython.display import HTML, display

vc = hpv.proteomics_from_spatialdata(
    sdata_path=spatialdata_path,
    labels_layer="blobs_labels",
    img_layer="blobs_multiscale_image",
    table_layer="table",
    channels=[0, 1, 2],
    visualize_feature_matrix=True,
    to_coordinate_system="micron",  # specify the micron coordinate system.
    visualize_heatmap=True,
    embedding_key="X_umap",
    cluster_key="leiden",
    cluster_key_display_name="Leiden",
)

url = vc.web_app()
display(HTML(f'<a href="{url}" target="_blank">Open in Vitessce</a>'))


vc = hpv.proteomics_from_spatialdata(
    sdata_path=spatialdata_path,
    labels_layer="blobs_labels",
    img_layer="blobs_multiscale_image",
    table_layer="table",
    channels=[0, 1, 2],
    visualize_feature_matrix=True,
    to_coordinate_system="rotation",  # specify the rotated coordinate system.
    visualize_heatmap=True,
    embedding_key="X_umap",
    cluster_key="leiden",
    cluster_key_display_name="Leiden",
)

url = vc.web_app()
display(HTML(f'<a href="{url}" target="_blank">Open in Vitessce</a>'))

2026-02-19 12:13:43.680 | INFO     | harpy_vitessce.vitessce_config._image:build_image_layer_config:141 - No palette provided and 3 channels selected; rendering with the default channel palette.

2026-02-19 12:13:43.844 | INFO     | harpy_vitessce.vitessce_config._image:build_image_layer_config:141 - No palette provided and 3 channels selected; rendering with the default channel palette.
