# Interactive Visualization of `EHRData` with Vitessce

This tutorial demonstrates how to create interactive visualizations of EHRData objects using [Vitessce](https://vitessce.io/).

Vitessce provides linked, coordinated views that allow you to explore clinical data interactively in a web browser or Jupyter notebook.


## Load Data


In [1]:
%load_ext autoreload
%autoreload 2

In [2]:
import ehrdata as ed
from pathlib import Path

In [None]:
# Load a subset of PhysioNet 2012 data
# TODO: tabular? I think better longitudinal.
# and then explain that can take slice.
# edata = ed.dt.physionet2012(
#     data_path=Path("/Users/eljas.roellin/Documents/ehrapy_workspace/ehrdata/ehrapy_data/physionet2012"),
#     layer="tem_data",
#     # n_subsamples=500, subsample_seed=42
# )
# edata

In [None]:
# ed.io.write_h5ad(edata, "physionet2012_subset.h5ad")

In [3]:
edata = ed.io.read_h5ad("physionet2012_subset.h5ad")

[93m![0m This operation does not affect numeric layer tem_data.
[94m‚Ä¢[0m Harmonizing missing values of layer tem_data


## Save to Zarr Format

Vitessce reads data from the Zarr format. Let's save our data:


In [None]:
!pip show zarr

In [None]:
# Save to Zarr
zarr_path = Path("physionet2012_subset.zarr")
ed.io.write_zarr(edata, zarr_path)
print(f"Data saved to {zarr_path}")

## Generate Vitessce Configuration

The `ehrdata.integrations.vitessce.gen_config()` helper function creates a Vitessce configuration tailored for your EHRData object.

You can specify which patient attributes to visualize using the `obs_sets` parameter:


In [None]:
zarr_path / "anndata"

In [None]:
# Generate Vitessce configuration
vc = ed.integrations.vitessce.gen_config(
    path=zarr_path,
    name="PhysioNet 2012 ICU Data",
    obs_sets={
        "obs/Gender": "Gender",
        "obs/ICUType": "ICU Type",
        "obs/In-hospital_death": "Mortality",
        "obs/set": "Hospital Set",
    },
    obs_embeddings={},
)

In [None]:
vc.widget()

In [None]:
import anndata as ad

ad.io.write_zarr(Path("physionet2012_subset_adata.zarr"), ad.AnnData(edata))

In [12]:
edata.var_names

Index(['ALP', 'ALT', 'AST', 'Albumin', 'BUN', 'Bilirubin', 'Cholesterol',
       'Creatinine', 'DiasABP', 'FiO2', 'GCS', 'Glucose', 'HCO3', 'HCT', 'HR',
       'K', 'Lactate', 'MAP', 'MechVent', 'Mg', 'NIDiasABP', 'NIMAP',
       'NISysABP', 'Na', 'PaCO2', 'PaO2', 'Platelets', 'RespRate', 'SaO2',
       'SysABP', 'Temp', 'TroponinI', 'TroponinT', 'Urine', 'WBC', 'Weight',
       'pH'],
      dtype='object', name='Parameter')

In [13]:
import numpy as np

edata.obsm["X_pca"] = np.array(
    [
        edata[:, edata.var_names == "HR", 10].layers["tem_data"].reshape(1, -1),
        edata[:, edata.var_names == "NIDiasABP", 10].layers["tem_data"].reshape(1, -1),
    ]
).reshape(-1, 2)  # np.random.rand(edata.n_obs, 10)

In [14]:
(np.isnan(edata.obsm["X_pca"]).sum(1) == 0).sum()

np.int64(6307)

In [15]:
edata2 = edata[np.where(np.isnan(edata.obsm["X_pca"]).sum(1) == 0)[0].tolist()]

In [None]:
len(edata.obsm["X_pca"])

In [None]:
print((np.isnan(edata.obsm["X_pca"]).sum(1) == 2).sum())

In [None]:
edata.obs

In [23]:
edata2 = edata2.copy()
edata2.X = edata2.layers["tem_data"][:, :, 10].reshape(edata2.n_obs, -1)

In [26]:
def optimize_and_gen_config(
    edata,
    zarr_filepath=Path("adata_for_vitessce.zarr"),
    *,
    obs_columns=None,
    obs_labels=None,  # Optional custom labels for obs columns
    obs_embeddings=None,
    obs_embedding_labels=None,  # Optional custom labels for embeddings
    var_cols=None,
    layer="tem_data",
    timestep=0,
):
    """Optimize EHRData for Vitessce and generate configuration.

    Args:
        edata: EHRData object to visualize
        zarr_filepath: Path to save the optimized zarr file
        obs_columns: List of observation column names (without 'obs/' prefix)
        obs_labels: Optional dict mapping column names to display labels
        obs_embeddings: List of embedding keys (without 'obsm/' prefix)
        obs_embedding_labels: Optional dict mapping embedding keys to display labels
        var_cols: Optional list of variable columns to include

    Returns:
        VitessceConfig object
        # TODO: add preview with image, and circled variables
    """
    # from vitessce.data_utils import optimize_adata
    import anndata as ad

    # Create default labels if not provided
    if obs_embeddings is None:
        obs_embeddings = ["X_pca"]
    if obs_columns is None:
        obs_columns = ["Gender", "ICUType", "In-hospital_death", "set"]
    if obs_labels is None:
        obs_labels = {col: col.replace("_", " ").title() for col in obs_columns}
    if obs_embedding_labels is None:
        obs_embedding_labels = {emb: emb.upper() for emb in obs_embeddings}

    # Store as AnnData object for Vitessce
    adata = ad.AnnData(edata)
    adata.write_zarr(zarr_filepath)

    # Construct obs_sets dict with 'obs/' prefix
    obs_sets = {f"obs/{col}": obs_labels[col] for col in obs_columns}

    # Construct obs_embeddings dict with 'obsm/' prefix
    obs_embeddings_dict = {f"obsm/{emb}": obs_embedding_labels[emb] for emb in obs_embeddings}

    # Generate and return Vitessce config
    vc = ed.integrations.vitessce.gen_config(
        path=zarr_filepath,
        name="PhysioNet 2012 ICU Data",
        obs_sets=obs_sets,
        obs_embeddings=obs_embeddings_dict,
    )
    return vc

In [27]:
vc = optimize_and_gen_config(edata2)
vc.widget()

VitessceWidget(uid='6948')

In [None]:
# Generate Vitessce configuration
vc = ed.integrations.vitessce.gen_config(
    path=Path("physionet2012_subset_adata.zarr"),
    name="PhysioNet 2012 ICU Data",
    obs_sets={
        "obs/Gender": "Gender",
        "obs/ICUType": "ICU Type",
        "obs/In-hospital_death": "Mortality",
        "obs/set": "Hospital Set",
    },
    obs_embeddings={},
)
vc  # .widget()

In [None]:
# Generate Vitessce configuration
import zarr

vc = ed.integrations.vitessce.gen_config(
    store=zarr.storage.StorePath("physionet2012_subset_adata.zarr"),
    name="PhysioNet 2012 ICU Data",
    obs_sets={
        "obs/Gender": "Gender",
        "obs/ICUType": "ICU Type",
        "obs/In-hospital_death": "Mortality",
        "obs/set": "Hospital Set",
    },
    obs_embeddings={},
)
vc  # .widget()

## Display the Interactive Visualization

Now display the interactive widget! You can:
- **Select patient groups** by gender, ICU type, or mortality outcome
- **Explore feature distributions** across different patient groups
- **View heatmaps** of clinical variables
- **All views are linked** - selections in one view update all others


In [None]:
# Display the widget
# TODO: make a nicer one
vc  # .widget()

## Summary

In this tutorial, we learned:

- ‚úÖ How to save EHRData to Zarr format with `.write_zarr()`
- ‚úÖ How to use `ed.integrations.vitessce.gen_config()` to create interactive visualizations
- ‚úÖ How to specify which patient attributes to visualize with `obs_sets`
- ‚úÖ How to display Vitessce widgets in Jupyter notebooks

The `gen_config()` helper handles the complexity of setting up Vitessce, allowing you to focus on exploring your clinical data!

## Where to go next

### üè• Working with OMOP Data
- **[OMOP Common Data Model](omop_intro)** - Load standardized healthcare data from OMOP databases and construct EHRData objects step-by-step.
