# Advanced: Data Management and Sharing with Lamin

This tutorial demonstrates how to use [Lamin](https://lamin.ai/) to:

- **Store** EHRData objects in the cloud with full provenance tracking
- **Share** interactive visualizations with collaborators via LaminHub

```{note}
**You don't need Lamin** to work with `ehrdata`, and this notebook is **OPTIONAL** when learning about `ehrdata`.

Lamin provides functionality to query, trace, and validate datasets and models at scale.

This notebook shows how one aspect of Lamin, its browser-based interface **LaminHub**, can be leveraged to share interactive `EHRData` visualizations.
```

## Why Lamin?

Lamin provides:
- 📊 **Data versioning** - Track changes to your datasets over time
- 🌐 **Cloud storage** - Share large datasets without email attachments
- 🔗 **Lineage tracking** - Understand how datasets are derived
- 👥 **Collaboration** - Easy sharing with team members

## Prerequisites

This tutorial builds on earlier tutorials:
1. **Getting Started** - Basic EHRData concepts
2. **OMOP Introduction** - Loading OMOP data
3. **Interactive Visualization** - Vitessce basics

You'll also need:
- A Lamin account (sign up at [lamin.ai](https://lamin.ai))
- Access to a Lamin instance (or create your own)

## Setup

Install required packages:


In [None]:
%pip install lamindb

**Important:** Before running this notebook, authenticate with Lamin from your terminal:

```bash
lamin login <your-email>
```


In [None]:
import ehrdata as ed
import lamindb as ln
import pandas as pd
from pathlib import Path
import duckdb

## Connect to Lamin

Connect to your Lamin instance (replace with your instance name):


In [5]:
# Replace "theislab/ehr" with your own instance, e.g. "your-username/your-instance"
ln.connect("theislab/ehr")

[92m‚Üí[0m connected lamindb: theislab/ehr


## Part 1: Load and Prepare OMOP Data

Let's start by loading clinical data from MIMIC-IV in the OMOP Common Data Model format :cite:`reyna2020early` :cite:`goldberger2000physiobank`.




In [6]:
# Set up database connection
con = duckdb.connect(":memory:")
ed.dt.mimic_iv_omop(backend_handle=con)

# Load patient visits
edata = ed.io.omop.setup_obs(
    backend_handle=con,
    observation_table="person_visit_occurrence",
    death_table=True,
)

# Load measurement variables
edata = ed.io.omop.setup_variables(
    edata=edata,
    layer="measurements",
    backend_handle=con,
    data_tables=["measurement"],
    data_field_to_keep=["value_as_number"],
    interval_length_number=1,
    interval_length_unit="h",
    num_intervals=24,
    time_precision="datetime",
    enrich_var_with_feature_info=True,
)

print(f"Loaded {edata.n_obs} visits with {edata.n_vars} measurement types")



Loaded 852 visits with 450 measurement types
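The `setup_variables` call above requests `num_intervals=24` intervals of `interval_length_number=1` hour each (`interval_length_unit="h"`), i.e. a grid of 24 hourly intervals per visit. The plain-Python sketch below illustrates fixed-width time binning in general; it is not ehrdata's implementation. The helper `bin_measurements`, measuring time from the visit start, and keeping the last value when several measurements fall into the same hour are assumptions made for this example.

```python
from datetime import datetime, timedelta


# Hypothetical helper: a sketch of fixed-width time binning, NOT ehrdata's implementation.
def bin_measurements(visit_start, measurements, interval=timedelta(hours=1), num_intervals=24):
    """Place (timestamp, value) pairs into num_intervals bins relative to visit_start.

    Values outside [visit_start, visit_start + num_intervals * interval) are dropped.
    If several values fall into the same bin, the last one in input order wins
    (an assumption chosen for this illustration).
    """
    bins = [None] * num_intervals
    for timestamp, value in measurements:
        index = (timestamp - visit_start) // interval  # timedelta // timedelta -> int
        if 0 <= index < num_intervals:
            bins[index] = value
    return bins


# Illustrative data for a single visit starting at 08:00
start = datetime(2150, 1, 1, 8, 0)
heart_rate = [
    (datetime(2150, 1, 1, 8, 15), 88.0),  # hour 0
    (datetime(2150, 1, 1, 8, 45), 92.0),  # hour 0 again: overwrites 88.0
    (datetime(2150, 1, 1, 10, 5), 79.0),  # hour 2
    (datetime(2150, 1, 2, 9, 0), 70.0),   # 25 h after start: outside the 24 h window
]
bins = bin_measurements(start, heart_rate)
print(bins[:4])  # [92.0, None, 79.0, None]
```

Hours without any measurement stay empty (`None` here), which is why time-binned EHR layers typically contain many missing values.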


For text descriptions in the `Vitessce` visualization, we use the `"concept_name"` column of `.var` as the index:

In [7]:
edata.var.set_index("concept_name", inplace=True)

We convert datetime columns to strings so that the `EHRData` object can be stored in zarr:

In [8]:
for column in edata.obs.columns:
    if pd.api.types.is_datetime64_any_dtype(edata.obs[column]):
        edata.obs[column] = edata.obs[column].astype(str)

for column in edata.var.columns:
    if pd.api.types.is_datetime64_any_dtype(edata.var[column]):
        edata.var[column] = edata.var[column].astype(str)
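After this conversion, the timestamps come back as strings when the data is read from zarr. A minimal standard-library sketch of parsing them back is shown below; with pandas, calling `pd.to_datetime` on the column does the same job. The helper `parse_timestamps` is hypothetical, and the example values assume ISO-style strings like those shown.

```python
from datetime import datetime

# Illustrative values, assumed to look like the strings written by `.astype(str)`
stored = ["2150-01-01 08:00:00", "2150-01-02"]


# Hypothetical helper: turn stored ISO-style strings back into datetime objects.
def parse_timestamps(values):
    # datetime.fromisoformat accepts both "YYYY-MM-DD HH:MM:SS" and date-only strings
    return [datetime.fromisoformat(value) for value in values]


parsed = parse_timestamps(stored)
print(parsed)  # [datetime.datetime(2150, 1, 1, 8, 0), datetime.datetime(2150, 1, 2, 0, 0)]
```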

## Part 2: Create Visualization and Upload to Lamin

Now we'll create an interactive `Vitessce` visualization and upload it to Lamin. 

First, let's create a Vitessce config that will automatically save our data to zarr format:

In [None]:
# Generate the Vitessce config, write the data to zarr, and return a local Lamin artifact in one call
zarr_path = Path("mimic_iv_visits.zarr")

vc, artifact = ed.integrations.vitessce.gen_default_config(
    edata,
    zarr_filepath=zarr_path,
    obs_columns=["gender_concept_id", "race_concept_id"],
    layer="measurements",
    timestep=0,
    return_lamin_artifact=True,
)

print(f"✓ Created Vitessce config and saved data to {zarr_path}")

✓ Created Vitessce config and saved data to mimic_iv_visits.zarr
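`EHRData` layers such as `"measurements"` carry a time dimension (observations × variables × time intervals), while a Vitessce view displays a 2D matrix. Our reading of the `timestep=0` argument, stated as an assumption, is that it selects the first time interval for display. The sketch below illustrates that kind of slicing on nested lists; the helper `slice_timestep` and the toy data are hypothetical.

```python
# Toy 3D layer: 2 observations x 3 variables x 4 time intervals (illustrative data)
layer = [
    [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]],
    [[13, 14, 15, 16], [17, 18, 19, 20], [21, 22, 23, 24]],
]


# Hypothetical helper: extract the obs x var matrix at a single time interval.
def slice_timestep(layer_3d, timestep):
    return [[var_series[timestep] for var_series in obs_row] for obs_row in layer_3d]


snapshot = slice_timestep(layer, 0)
print(snapshot)  # [[1, 5, 9], [13, 17, 21]]
```

With a NumPy array of shape `(n_obs, n_vars, n_t)`, the same slice would be written `layer[:, :, timestep]`.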


### Upload to Lamin

Now let's upload this dataset to Lamin. This happens in two steps:

1. Create a Lamin `Artifact` from our dataset locally (already done by `gen_default_config` above)
2. Upload the `Artifact` to the remote Lamin database

**What is a Lamin Artifact?**
A `ln.Artifact` is Lamin's way of tracking data files with rich metadata:
- **Provenance**: Who created it, when, from what sources
- **Versioning**: Automatic tracking of changes
- **Storage**: Seamless upload to cloud storage
- **Discovery**: Easy search and retrieval via metadata tags

**What happens during `artifact.save()`?**
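1. Checks the artifact's content hash, which is already computed when the artifact is created, to avoid storing duplicates
2. Uploads the file(s) to your configured cloud storage (S3, GCS, etc.)
3. Registers metadata in the Lamin database
4. Records lineage, i.e. the run that created the artifact, if the notebook is tracked with `ln.track()`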
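Deduplication by content hash can be illustrated with a small standard-library sketch. A `.zarr` store is a directory of many files, so the sketch hashes each file and combines the digests. It is not Lamin's hashing algorithm: the helper `directory_content_hash`, the choice of MD5, and the way per-file digests are combined are assumptions for illustration. The result only resembles, in length and alphabet, the `hash` strings in Lamin's artifact output.

```python
import base64
import hashlib
import tempfile
from pathlib import Path


# Hypothetical helper: NOT Lamin's algorithm, just the general idea of a content hash.
def directory_content_hash(directory):
    """Combine per-file MD5 digests of a directory tree into one 22-character string.

    Files are visited in sorted order so the result does not depend on filesystem
    listing order; relative paths are included so renames change the hash too.
    """
    root = Path(directory)
    combined = hashlib.md5()
    for file_path in sorted(p for p in root.rglob("*") if p.is_file()):
        combined.update(file_path.relative_to(root).as_posix().encode())
        combined.update(hashlib.md5(file_path.read_bytes()).digest())
    # 16-byte digest -> 24 base64url characters with "==" padding -> 22 after stripping
    return base64.urlsafe_b64encode(combined.digest()).decode().rstrip("=")


with tempfile.TemporaryDirectory() as tmp:
    store = Path(tmp) / "example.zarr"
    (store / "X").mkdir(parents=True)
    (store / "X" / "0.0").write_bytes(b"\x00\x01")
    (store / ".zattrs").write_text("{}")
    first = directory_content_hash(store)
    second = directory_content_hash(store)  # unchanged content -> same hash
    (store / "X" / "0.0").write_bytes(b"\x00\x02")
    changed = directory_content_hash(store)  # modified chunk -> different hash

print(first == second, first != changed, len(first))  # True True 22
```

Because identical content always yields the same hash, a registry can recognize that a dataset was already stored and return the existing record instead of uploading it again.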


Let's see what the artifact looks like when printed in the notebook:

In [13]:
artifact

Artifact(uid='4ozkjwU5dDx5hAov0000', version_tag=None, is_latest=True, key=None, description='MIMIC-IV visits with 24-hour hourly measurements', suffix='.zarr', kind='dataset', otype='AnnData', size=538445, hash='38OfbGgMuqiAFZFoj_64nw', n_files=291, n_observations=None, branch_id=1, space_id=1, storage_id=1, run_id=None, schema_id=None, created_by_id=2, created_at=2026-01-25 16:06:13 UTC, is_locked=False)
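The `uid` above has 20 characters. As we understand lamindb's versioning convention, the first 16 characters identify the artifact across all of its versions and the last 4 encode the version (`0000` for the first one). The hypothetical helper below simply splits a uid along that assumed boundary.

```python
# Hypothetical helper based on our understanding of Lamin's uid layout (16 + 4 characters).
def split_lamin_uid(uid):
    """Split a 20-character artifact uid into (stem, version_suffix)."""
    if len(uid) != 20:
        raise ValueError(f"expected a 20-character uid, got {len(uid)} characters")
    return uid[:16], uid[16:]


stem, suffix = split_lamin_uid("4ozkjwU5dDx5hAov0000")
print(stem, suffix)  # 4ozkjwU5dDx5hAov 0000
```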

Now upload it to Lamin:

In [None]:
# Upload to cloud storage
artifact.save()

print(f"✓ Uploaded artifact: {artifact.uid}")
print(f"  Cloud URL: {artifact.path.to_url()}")

✓ Uploaded artifact: 4ozkjwU5dDx5hAov0000
  Cloud URL: https://lamin<...>.zarr


We also upload the `Vitessce` config `vc` to Lamin:

In [12]:
from lamindb.integrations import save_vitessce_config

# Save config as an artifact
vc_artifact = save_vitessce_config(
    vc,
    # description="Interactive view of MIMIC-IV OMOP visits",
)

print(f"✓ Saved Vitessce config: {vc_artifact.uid}")
print("Now anyone with access can view this in LaminHub!")

[92m‚Üí[0m VitessceConfig references these artifacts:
Artifact(uid='4ozkjwU5dDx5hAov0000', version_tag=None, is_latest=True, key=None, description='MIMIC-IV visits with 24-hour hourly measurements', suffix='.zarr', kind='dataset', otype='AnnData', size=538445, hash='38OfbGgMuqiAFZFoj_64nw', n_files=291, n_observations=None, branch_id=1, space_id=1, storage_id=1, run_id=None, schema_id=None, created_by_id=2, created_at=2026-01-25 16:06:13 UTC, is_locked=False)
[92m‚Üí[0m returning artifact with same hash: Artifact(uid='DT7KBv1uRxIjyizx0000', version_tag=None, is_latest=True, key=None, description=None, suffix='.vitessce.json', kind='__lamindb_config__', otype=None, size=2189, hash='2mIxbXBQFxB77UBjKGGzjg', n_files=None, n_observations=None, branch_id=1, space_id=1, storage_id=1, run_id=15, schema_id=None, created_by_id=2, created_at=2026-01-25 16:50:37 UTC, is_locked=False); to track this artifact as an input, use: ln.Artifact.get()
[92m‚Üí[0m VitessceConfig: https://lamin.ai/thei

## Part 3: Explore the Interactive Visualization in the Browser

The visualization is now accessible from LaminHub in your browser, with no need to start Jupyter or write any code. Opening it in LaminHub looks like this:

<p style="text-align:center; ">
<img src="../_static/tutorial_images/vitessce_preview_mimiciv_laminhub.png" alt="vitessce_preview_mimiciv">
</p>

We created the `Vitessce` visualization in this notebook and can still explore it here. Collaborators, however, don't need to run (or understand) this notebook to explore the dataset: a web browser is all they need.

In [None]:
# Preview the Vitessce widget in the notebook
vc.widget()

<p style="text-align:center; ">
<img src="../_static/tutorial_images/vitessce_preview_mimiciv.png" alt="vitessce_preview_mimiciv">
</p>

## Summary

In this tutorial, you learned how to:

✅ **Load** OMOP data into EHRData  
✅ **Store** datasets in Lamin with metadata  
✅ **Share** interactive visualizations via LaminHub  

## Key Benefits of Using Lamin

1. **Collaboration** - Team members can easily access and explore datasets
2. **Reproducibility** - Full lineage tracking ensures transparent workflows
3. **Versioning** - Track changes and compare different versions
4. **Sharing** - Share interactive visualizations without local setup

## Next Steps

- **Learn more about Lamin**: [lamin.ai](https://lamin.ai)

## Resources

- **Lamin Documentation**: [docs.lamin.ai](https://docs.lamin.ai)
- **Vitessce**: [vitessce.io](https://vitessce.io)
- **OMOP CDM**: [ohdsi.github.io/CommonDataModel](https://ohdsi.github.io/CommonDataModel)
