# Interactive Visualization of `EHRData` with Vitessce

This tutorial demonstrates how to create interactive visualizations of EHRData objects using [Vitessce](https://vitessce.io/).

Vitessce provides linked, coordinated views that allow you to explore clinical data interactively in a web browser or Jupyter notebook.

```{note}
**Prerequisites:** This tutorial assumes familiarity with the concepts from:
- [Getting Started](getting_started) - Understanding the EHRData structure
- [Real Dataset Example: PhysioNet 2012](real_dataset_example_physionet2012) - Working with real clinical data
```

## Load Data

We'll use the **PhysioNet 2012 ICU Challenge** dataset, which contains time series measurements from ICU patients:

- **~4,000 patients** admitted to intensive care units
- **37 clinical variables** including vital signs (heart rate, blood pressure), lab values (glucose, creatinine), and demographics
- **48 hours of measurements** after ICU admission
- **Outcome**: In-hospital mortality prediction

This dataset is ideal for demonstrating interactive exploration of multivariate clinical time series.

```python
%load_ext autoreload
%autoreload 2
```

```python
import ehrdata as ed
```

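```python
# To download the full PhysioNet 2012 dataset instead, uncomment the next line:
# edata = ed.dt.physionet2012(layer="tem_data")

# This tutorial reads a locally saved subset of the dataset
edata = ed.io.read_h5ad("physionet2012_subset.h5ad")
```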

[93m![0m This operation does not affect numeric layer tem_data.
[94m•[0m Harmonizing missing values of layer tem_data

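The log above reports that missing values in `tem_data` were harmonized; ICU time series are sparse, so it helps to know how much data each variable actually has before choosing what to visualize. Below is a minimal sketch that uses a toy NumPy array in place of the real layer. It assumes the layer is shaped (patients, variables, timesteps); on the loaded object, the array would presumably be `edata.layers["tem_data"]`. The helper `missing_fraction_per_variable` is hypothetical and not part of ehrdata.

```python
import numpy as np


def missing_fraction_per_variable(layer: np.ndarray) -> np.ndarray:
    """Fraction of missing (NaN) entries per variable.

    Hypothetical helper; assumes `layer` is shaped (n_patients, n_variables, n_timesteps).
    """
    if layer.ndim != 3:
        raise ValueError("expected a 3D array (patients, variables, timesteps)")
    # Average the NaN mask over the patient and time axes, keeping the variable axis.
    return np.isnan(layer).mean(axis=(0, 2))


# Toy stand-in for the time series layer: 2 patients, 3 variables, 4 timesteps.
toy = np.ones((2, 3, 4))
toy[0, 1, :2] = np.nan  # variable 1: 2 of 8 entries missing
toy[:, 2, :] = np.nan   # variable 2: never measured
print(missing_fraction_per_variable(toy).tolist())  # [0.0, 0.25, 1.0]
```

Variables with a high missing fraction at the chosen timestep will show few points in the scatterplot and histograms.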

## Generate Vitessce Configuration

[Vitessce](http://vitessce.io/) creates **interactive widgets directly in Jupyter notebooks** with linked, coordinated views. When you select data in one view, all other views update automatically - making it easy to explore patterns across different visualizations simultaneously.

The library is highly customizable with many view types and configuration options. See the [vitessce-python documentation](https://python-docs.vitessce.io/) for comprehensive examples and advanced configurations.
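The linking of views can be pictured as several views subscribing to a shared selection state: when one view changes a value (for example, the selected feature), every view linked to that value re-renders. Below is a conceptual sketch of that pattern only. The class names `CoordinationSpace` and `View` are hypothetical and are not the vitessce-python API; consult the vitessce-python documentation for how views are actually linked in a configuration.

```python
from collections import defaultdict
from typing import Any, Callable


class CoordinationSpace:
    """Hypothetical shared state: views read and write named coordination values."""

    def __init__(self) -> None:
        self._values: dict[str, Any] = {}
        self._listeners: dict[str, list[Callable[[Any], None]]] = defaultdict(list)

    def subscribe(self, key: str, callback: Callable[[Any], None]) -> None:
        self._listeners[key].append(callback)

    def set(self, key: str, value: Any) -> None:
        self._values[key] = value
        # Notify every view linked to this coordination key.
        for callback in self._listeners[key]:
            callback(value)


class View:
    """Hypothetical view that re-renders whenever a linked value changes."""

    def __init__(self, name: str, space: CoordinationSpace, key: str) -> None:
        self.name = name
        self.rendered_with: Any = None
        space.subscribe(key, self.render)

    def render(self, value: Any) -> None:
        self.rendered_with = value


space = CoordinationSpace()
views = [View(n, space, "selected_feature") for n in ("heatmap", "violin", "histogram")]
# Selecting a feature in one place (e.g. the feature list) updates all linked views.
space.set("selected_feature", "HR")
print([(v.name, v.rendered_with) for v in views])
```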

**ehrdata provides a quickstart** via `ed.integrations.vitessce.optimize_and_gen_config()`, which creates a sensible default configuration for clinical time series data. You can specify:
- `obs_columns`: Patient attributes to group by (e.g., gender, ICU type)
- `scatter_var_cols`: Variables to plot in the scatterplot
- `obs_embedding`: Dimensionality reduction for patient positioning (e.g., PCA)
- `layer` and `timestep`: Which time series layer and timepoint to visualize
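Conceptually, `layer` and `timestep` reduce the 3D time series layer to a 2D patients × variables snapshot that the scatterplot, heatmap, and histograms can display. Below is a minimal sketch of that reduction, assuming the layer is shaped (patients, variables, timesteps); the helper `slice_timestep` is hypothetical, and whether `optimize_and_gen_config()` performs exactly this step internally is an assumption.

```python
import numpy as np


def slice_timestep(layer: np.ndarray, var_names: list[str], timestep: int,
                   variables: list[str]) -> np.ndarray:
    """Return a (n_patients, len(variables)) matrix at one timestep.

    Hypothetical helper; assumes `layer` is shaped (n_patients, n_variables, n_timesteps).
    """
    n_t = layer.shape[2]
    if not 0 <= timestep < n_t:
        raise IndexError(f"timestep {timestep} outside [0, {n_t})")
    missing = [v for v in variables if v not in var_names]
    if missing:
        raise KeyError(f"unknown variables: {missing}")
    cols = [var_names.index(v) for v in variables]
    # Select the requested variables, then take a single timestep.
    return layer[:, cols, timestep]


# Toy layer: 3 patients, 3 variables, 48 hourly timesteps.
var_names = ["HR", "NIDiasABP", "WBC"]
layer = np.arange(3 * 3 * 48, dtype=float).reshape(3, 3, 48)
snapshot = slice_timestep(layer, var_names, timestep=10, variables=["HR", "NIDiasABP"])
print(snapshot.shape)  # (3, 2)
```

Checking the timestep range and variable names up front gives a clear error before any configuration is generated.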


Lets take a look at this in action.

```python
vc = ed.integrations.vitessce.optimize_and_gen_config(
    edata,
    obs_columns=["Gender", "ICUType", "In-hospital_death", "set"],
    scatter_var_cols=["HR", "NIDiasABP"],
    obs_embedding="X_pca",
    layer="tem_data",
    timestep=10,
)
vc.widget()
```


The output should look like this (the views can be rearranged):

<p style="text-align:center;">
<img src="../_static/tutorial_images/vitessce_preview.png" alt="vitessce_preview">
</p>

### View Components

| View | Description |
|------|-------------|
| **Person Sets** (top left) | Hierarchical grouping of patients by categorical variables (Gender, ICU Type, etc.) |
| **Person Set Sizes** (middle left) | Bar chart showing patient counts per group |
| **Feature List** (top middle) | List of clinical variables to select for visualization |
| **Scatterplot** (middle right) | Patients positioned by PCA embedding, colored by selected group |
| **Heatmap** (bottom right) | Matrix view of variable values across patients |
| **Violinplot** (bottom left) | Comparison of distributions between selected Person Sets |
| **Histograms** (bottom middle) | Distribution of selected variable values |
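The scatterplot positions patients by `obs_embedding="X_pca"`, which presumes a PCA embedding is already stored on the object (presumably in `edata.obsm["X_pca"]`; this tutorial loads a prepared subset). Below is a minimal NumPy sketch of a 2D PCA on a patients × variables snapshot. Per-column mean imputation and standardization are assumptions made for this sketch; in practice you would likely use a dedicated PCA implementation such as scikit-learn's `PCA`.

```python
import numpy as np


def pca_2d(X: np.ndarray) -> np.ndarray:
    """Project rows of X (patients x variables) onto the first two principal components.

    Missing values are mean-imputed per column and columns are standardized;
    both choices are assumptions made for this sketch.
    """
    X = np.array(X, dtype=float)  # copy so the input is not modified
    col_mean = np.nanmean(X, axis=0)
    nan_rows, nan_cols = np.where(np.isnan(X))
    X[nan_rows, nan_cols] = col_mean[nan_cols]
    X = X - X.mean(axis=0)
    std = X.std(axis=0)
    X = X / np.where(std > 0, std, 1.0)
    # Rows of Vt are principal axes; U * S gives the projected coordinates.
    U, S, Vt = np.linalg.svd(X, full_matrices=False)
    return U[:, :2] * S[:2]


rng = np.random.default_rng(0)
snapshot = rng.normal(size=(50, 5))  # toy data, not PhysioNet values
snapshot[3, 1] = np.nan
coords = pca_2d(snapshot)
print(coords.shape)  # (50, 2)
```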

`Vitessce` is most useful when you interact with the views: because all views are linked, a selection in one view updates the others.

For instance, with just two clicks we can switch to another variable (e.g. `WBC`, white blood cell count in cells/nL) at hour 10 after ICU entry, and compare groups by `In-hospital_death` instead of `Gender`:

<p style="text-align:center; ">
<img src="../_static/tutorial_images/vitessce_preview_wbc_inhospitaldeath.png" alt="vitessce_preview">
</p>
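The violin plot compares the distribution of the selected variable between groups. A rough numerical counterpart can be sketched in plain Python by grouping values by label and summarizing the non-missing ones. The helper `summarize_by_group` is hypothetical, and the values below are invented for illustration; they are not PhysioNet 2012 results.

```python
import math
from statistics import median


def summarize_by_group(values, groups):
    """Median and count of non-missing values per group label (hypothetical helper)."""
    by_group: dict = {}
    for value, group in zip(values, groups):
        if value is None or math.isnan(value):
            continue  # skip missing measurements, common in sparse ICU data
        by_group.setdefault(group, []).append(value)
    return {g: {"n": len(v), "median": median(v)} for g, v in sorted(by_group.items())}


# Toy WBC values at one timestep (invented, not from PhysioNet 2012).
wbc = [8.1, 12.5, float("nan"), 9.0, 15.5, 7.7]
death = [0, 1, 0, 0, 1, 0]
print(summarize_by_group(wbc, death))
# {0: {'n': 3, 'median': 8.1}, 1: {'n': 2, 'median': 14.0}}
```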

Beyond predefined groups such as `Gender` and `In-hospital_death`, you can define new groups with the lasso tool on the scatterplot. Select the lasso icon (we made the scatterplot slightly larger for this screenshot) and draw around the patients you want to explore based on their position in the scatterplot. Run this notebook to try it yourself!

<p style="text-align:center; ">
<img src="../_static/tutorial_images/vitessce_preview_scatterplotlasso.png" alt="vitessce_preview">
</p>
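A lasso selection amounts to finding the patients whose 2D embedding coordinates fall inside the drawn polygon. Below is a minimal sketch using the ray-casting point-in-polygon test; `point_in_polygon` and `lasso_select` are hypothetical helpers, and Vitessce's own implementation may differ.

```python
def point_in_polygon(x: float, y: float, polygon: list[tuple[float, float]]) -> bool:
    """Ray casting: toggle on each edge crossed by a horizontal ray from (x, y) to +infinity."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge straddle the horizontal line through y?
        if (y1 > y) != (y2 > y):
            # x-coordinate where the edge crosses that horizontal line.
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def lasso_select(coords, polygon):
    """Indices of patients whose embedding coordinates lie inside the lasso polygon."""
    return [i for i, (x, y) in enumerate(coords) if point_in_polygon(x, y, polygon)]


# Toy 2D embedding (e.g. the first two PCA components) for five patients.
coords = [(0.5, 0.5), (2.0, 2.0), (0.2, 0.9), (-1.0, 0.0), (0.9, 0.1)]
lasso = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]  # unit square
print(lasso_select(coords, lasso))  # [0, 2, 4]
```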

This becomes particularly interesting with representation-learning approaches, which learn meaningful representations from complex data; the machine learning notebooks of `ehrdata` and `ehrapy` show how such approaches can be applied.

## Summary

In this tutorial, we learned:

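- ✅ How to load the PhysioNet 2012 data as an `EHRData` object with `ed.io.read_h5ad()`
- ✅ How to use `ed.integrations.vitessce.optimize_and_gen_config()` to create interactive visualizations
- ✅ How to choose the patient attributes (`obs_columns`), scatterplot variables (`scatter_var_cols`), embedding (`obs_embedding`), `layer`, and `timestep` to visualize
- ✅ How to display Vitessce widgets in Jupyter notebooks with `vc.widget()` and explore them using linked selections and the lasso tool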

## Further Resources

- **[Vitessce](http://vitessce.io/)** - The Vitessce visual integration tool for exploration of spatial single-cell experiments
- **[vitessce-python Documentation](https://python-docs.vitessce.io/)** - Python API documentation for Vitessce