# Jupyter Hooks + Session → AnnData Bridge (He developmental complete)

This notebook is a hands-on testbed for:

- **Hooks** (UI → Python): `on_selection`, `on_hover`, `on_click`, `on_ready`, `on_message`
- **Imperative state** (no callbacks): `viewer.state`, `viewer.wait_for_ready()`, `viewer.wait_for_event(...)`
- **Session capture (no download)**: `viewer.get_session_bundle()` → `CellucidSessionBundle`
- **Mutate AnnData from session state**: `bundle.apply_to_anndata(...)` / `viewer.apply_session_to_anndata(...)`

Dataset:
- `cellucid-python/data/experiments/he_developmental_complete_with_3d_umap.h5ad`

Notes:
- The embedded viewer loads from `https://www.cellucid.com` (network access required).
- Session state is currently **index-based**: cells are identified by row position (0..N-1), so reordering or subsetting the AnnData breaks the mapping between session state and cells.
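This minimal sketch shows why positional identity is fragile. It uses hypothetical cell names, not names from this dataset. After one row is dropped, the same indices resolve to different cells, while a name-based remap recovers the original ones. `remap_indices` is a hypothetical helper, not part of cellucid.

```python
# Hypothetical cell names; session state stores positions (0..N-1), not names.
obs_names = ["AAAC-1", "AAAG-1", "AACT-1", "AAGT-1", "ACGT-1"]
selected = [1, 3]  # positional indices, as captured from the viewer


def names_at(indices, names):
    """Resolve positional indices against a given row order."""
    return [names[i] for i in indices]


def remap_indices(indices, old_names, new_names):
    """Hypothetical helper: translate positions via stable cell names.

    Cells that are missing from the new row order are dropped.
    """
    position = {name: i for i, name in enumerate(new_names)}
    return [position[old_names[i]] for i in indices if old_names[i] in position]


subset = obs_names[1:]  # drop the first cell: every later index shifts by one

print(names_at(selected, obs_names))  # ['AAAG-1', 'AAGT-1']
print(names_at(selected, subset))     # ['AACT-1', 'ACGT-1']  (wrong cells)
print(remap_indices(selected, obs_names, subset))  # [0, 2]
```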


In [1]:
from pathlib import Path
import sys

HERE = Path(__file__).resolve().parent if "__file__" in globals() else Path.cwd()

def find_project_root(start: Path) -> Path:
    """Locate the `cellucid-python` repo root (folder containing `pyproject.toml`)."""
    for candidate in [start, *start.parents]:
        if (candidate / "pyproject.toml").exists():
            return candidate
        if (candidate / "cellucid-python" / "pyproject.toml").exists():
            return candidate / "cellucid-python"
    return start

PROJECT_ROOT = find_project_root(HERE)
SRC_DIR = PROJECT_ROOT / "src"
if SRC_DIR.exists() and str(SRC_DIR) not in sys.path:
    sys.path.append(str(SRC_DIR))


In [2]:
from __future__ import annotations

from pathlib import Path

import anndata as ad

import cellucid
from cellucid import show_anndata, apply_cellucid_session_to_anndata

print("cellucid version:", getattr(cellucid, "__version__", "unknown"))


cellucid version: 0.0.1a2


In [3]:
# Find the dataset file from wherever this notebook is executed.
# Supports common working directories:
# - repo root (`.../_`)
# - `cellucid-python/`
# - `cellucid-python/docs/...`

DATA_REL_1 = Path("cellucid-python/data/experiments/he_developmental_complete_with_3d_umap.h5ad")
DATA_REL_2 = Path("data/experiments/he_developmental_complete_with_3d_umap.h5ad")

def find_data_path() -> Path:
    cwd = Path.cwd()
    for base in [cwd, *cwd.parents]:
        for rel in (DATA_REL_1, DATA_REL_2):
            p = base / rel
            if p.exists():
                return p.resolve()
    raise FileNotFoundError(
        "Could not find he_developmental_complete_with_3d_umap.h5ad. "
        "Tried: 'cellucid-python/data/experiments/...' and 'data/experiments/...' in cwd and parents."
    )

DATA_PATH = find_data_path()
DATASET_ID = "he_developmental_complete_with_3d_umap"

print("DATA_PATH:", DATA_PATH)
print("Exists:", DATA_PATH.exists())
print("Size (MB):", round(DATA_PATH.stat().st_size / (1024 * 1024), 1))


DATA_PATH: /Users/kemalinecik/git_nosync/_/cellucid-python/data/experiments/he_developmental_complete_with_3d_umap.h5ad
Exists: True
Size (MB): 290.1


In [4]:
# Load AnnData in backed mode (fast + low memory).
# This is enough for:
# - hooks inspection (obs metadata)
# - applying sessions (adds columns to .obs/.uns)

adata = ad.read_h5ad(DATA_PATH, backed="r")

print("adata:", adata)
print("n_obs:", adata.n_obs, "n_vars:", adata.n_vars)
print("obs columns:", list(adata.obs.columns))
print("obsm keys:", list(adata.obsm.keys()))


adata: AnnData object with n_obs × n_vars = 71650 × 8192 backed at '/Users/kemalinecik/git_nosync/_/cellucid-python/data/experiments/he_developmental_complete_with_3d_umap.h5ad'
    obs: 'sample_ID', 'organ', 'age', 'cell_type', 'sex', 'sex_inferred', 'concatenated_integration_covariates', 'integration_donor', 'integration_biological_unit', 'integration_sample_status', 'integration_library_platform_coarse', 'n_genes', 'LVL3', 'LVL2', 'LVL1', 'LVL0', '_scvi_batch', '_scvi_labels'
    uns: 'metrics', 'neighbors', 'rank_genes_groups', 'umap'
    obsm: 'Unintegrated', 'X_pca', 'X_umap', 'X_umap_1d', 'X_umap_2d', 'X_umap_3d', 'harmony', 'scvi'
    obsp: 'connectivities', 'distances'
n_obs: 71650 n_vars: 8192
obs columns: ['sample_ID', 'organ', 'age', 'cell_type', 'sex', 'sex_inferred', 'concatenated_integration_covariates', 'integration_donor', 'integration_biological_unit', 'integration_sample_status', 'integration_library_platform_coarse', 'n_genes', 'LVL3', 'LVL2', 'LVL1', 'LVL0', '_scvi

## 1) Start the viewer (AnnData → embedded UI)

This starts a local data server and embeds an iframe pointing at `https://www.cellucid.com`.

**Important (browser security):** the hosted viewer is HTTPS. Many browsers/webviews (notably VSCode)
block HTTPS → HTTP localhost fetches as mixed content. We therefore run the local server over **HTTPS**.
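The browser rule being tripped here fits in a few lines. This is a simplified sketch: real browsers add exemptions (some treat loopback addresses such as `http://127.0.0.1` as potentially trustworthy), which is one reason behavior differs between a normal browser tab and a VSCode webview. `is_mixed_content` is a hypothetical helper, not part of cellucid.

```python
from urllib.parse import urlsplit


def is_mixed_content(page_url: str, resource_url: str) -> bool:
    """True when an HTTPS page requests a plain-HTTP resource (simplified rule)."""
    return urlsplit(page_url).scheme == "https" and urlsplit(resource_url).scheme == "http"


viewer_page = "https://www.cellucid.com"
print(is_mixed_content(viewer_page, "http://127.0.0.1:8765"))   # True: may be blocked
print(is_mixed_content(viewer_page, "https://127.0.0.1:8765"))  # False: allowed if the cert is trusted
```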

If the viewer can’t connect and `viewer.debug_connection()` reports a likely certificate issue,
open `viewer.server_url + "/_cellucid/health"` once in a normal browser tab and accept/trust the certificate,
then reload the viewer.
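You can reproduce the same health check with the standard library. This helps separate "server is down" from "certificate is not trusted". In the sketch below, `health_url` and `probe` are hypothetical helpers, and the `/_cellucid/health` path comes from the `debug_connection()` output. `verify=False` disables certificate checks and is meant only for this local diagnostic.

```python
import json
import ssl
import urllib.request


def health_url(server_url: str) -> str:
    """Build the health endpoint URL from the local server URL."""
    return server_url.rstrip("/") + "/_cellucid/health"


def probe(url: str, verify: bool = True, timeout: float = 5.0):
    """Return (status, payload) on success or (None, error string) on failure."""
    ctx = ssl.create_default_context()
    if not verify:
        # Diagnostics only: accept the self-signed localhost certificate.
        ctx.check_hostname = False
        ctx.verify_mode = ssl.CERT_NONE
    try:
        with urllib.request.urlopen(url, context=ctx, timeout=timeout) as resp:
            return resp.status, json.loads(resp.read().decode("utf-8"))
    except Exception as exc:  # URLError, SSL errors, timeouts, invalid JSON
        return None, f"{type(exc).__name__}: {exc}"
```

With the viewer running, `probe(health_url(viewer.server_url), verify=False)` should return status 200. With `verify=True`, the call is likely to fail with `CERTIFICATE_VERIFY_FAILED` for a self-signed certificate, which matches the `python_health_status` / `python_health_verified_error` pair in the diagnostics. Python's trust store can also differ from the operating system's.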

If you’re on a remote kernel (HPC/JupyterHub), you’ll need SSH tunneling.
See the docs page: `docs/user_guide/web_app/b_data_loading/05_jupyter_tutorial.md`.
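Remote kernels are usually reached with SSH local port forwarding. The port the kernel-side server listens on (8765 in this run's `debug_connection()` output) is forwarded to the same port on your machine, so the browser can still reach `127.0.0.1:8765`. The sketch below only assembles the standard OpenSSH command. The helper and host name are hypothetical; follow the linked docs page for the supported setup.

```python
import shlex
from typing import List, Optional


def ssh_tunnel_command(remote_host: str, port: int = 8765, user: Optional[str] = None) -> List[str]:
    """Hypothetical helper: build an OpenSSH local port-forward command.

    -N: run no remote command, just hold the tunnel open.
    -L: forward local `port` to 127.0.0.1:`port` as seen from `remote_host`.
    """
    target = f"{user}@{remote_host}" if user else remote_host
    return ["ssh", "-N", "-L", f"{port}:127.0.0.1:{port}", target]


cmd = ssh_tunnel_command("login.cluster.example", user="me")
print(shlex.join(cmd))  # ssh -N -L 8765:127.0.0.1:8765 me@login.cluster.example
```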


In [5]:
viewer = show_anndata(
    DATA_PATH,
    height=650,
    dataset_name="He developmental complete (3D UMAP)",
    dataset_id=DATASET_ID,
    use_https=True,
)

# Some notebook frontends require explicitly re-displaying the viewer object.
viewer


In [7]:
# Connectivity diagnostics (useful in VSCode/JupyterLab where devtools can be limited).
conn = viewer.debug_connection()
conn

{'server_url': 'https://127.0.0.1:8765',
 'viewer_url': 'https://www.cellucid.com?remote=https://127.0.0.1:8765&anndata=true&jupyter=true&viewerId=796deb82986f1761&viewerToken=11655c6d235b99504e272ca81a1a3e88',
 'viewer_id': '796deb82986f1761',
 'notebook_type': 'vscode',
 'server_running': True,
 'server_use_https': True,
 'server_ssl_certfile': '/Users/kemalinecik/.cellucid/ssl/localhost.crt',
 'server_ssl_keyfile': '/Users/kemalinecik/.cellucid/ssl/localhost.key',
 'health_url': 'https://127.0.0.1:8765/_cellucid/health',
 'python_health_status': 200,
 'python_health_json': {'status': 'ok',
  'type': 'anndata',
  'version': '0.0.1a2',
  'format': 'h5ad',
  'is_backed': True,
  'n_cells': 71650,
  'n_genes': 8192},
 'python_health_verified_error': 'URLError: <urlopen error [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: self-signed certificate (_ssl.c:1017)>',
 'likely_browser_block': 'localhost_https_certificate_not_trusted_by_os',
 'likely_browser_fix': 'VSCode webviews 

In [None]:
# Robust scripts should wait for the frontend to be ready.
ready = viewer.wait_for_ready(timeout=120)
print("ready event:", ready)
print("viewer.state.ready:", viewer.state.ready)


## 2) Python → UI commands

These commands are delivered to the iframe via `postMessage` and authenticated with a per-viewer token.
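The viewer URL in the `debug_connection()` output carries a `viewerId` and a `viewerToken`. Conceptually, each command travels in an envelope that names its target viewer and includes that token. Below is a hedged sketch of a receiving-side check, assuming a plain dict envelope. The field names `token`, `command`, and `payload` are hypothetical, and in practice the check runs in the viewer's browser-side message handler, which would typically also verify the message origin.

```python
import hmac
import secrets
from typing import Any, Dict

# Hypothetical envelope; cellucid's real message fields are not shown in this notebook.
VIEWER_ID = secrets.token_hex(8)      # 16 hex chars, like viewerId in the diagnostics
VIEWER_TOKEN = secrets.token_hex(16)  # 32 hex chars, like viewerToken in the diagnostics


def make_message(command: str, payload: Dict[str, Any]) -> Dict[str, Any]:
    return {"viewerId": VIEWER_ID, "token": VIEWER_TOKEN, "command": command, "payload": payload}


def accept(message: Dict[str, Any]) -> bool:
    """Receiver side: reject messages for other viewers or with a wrong token."""
    if message.get("viewerId") != VIEWER_ID:
        return False
    token = message.get("token")
    # Constant-time comparison avoids leaking how many characters matched.
    return isinstance(token, str) and hmac.compare_digest(token.encode(), VIEWER_TOKEN.encode())


good = make_message("set_color_by", {"field": "cell_type"})
forged = {**good, "token": "0" * 32}
other_viewer = {**good, "viewerId": "someone-else"}
print(accept(good), accept(forged), accept(other_viewer))  # True False False
```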


In [None]:
# Pick an existing obs field (this dataset has 'cell_type')
viewer.set_color_by("cell_type")

# Highlight a few known indices
viewer.highlight_cells([0, 10, 42, 100], color="#00cc66")

# Hide a couple cells (then undo)
viewer.set_visibility([0, 10], visible=False)
viewer.set_visibility([0, 10], visible=True)

# Reset camera + clear highlights
viewer.reset_view()
viewer.clear_highlights()


## 3) UI → Python hooks + imperative state

### Register hooks

Run the next cell, then interact in the UI:
- lasso-select a region (should trigger `selection`)
- hover/click (should trigger `hover`/`click`)
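Decorator-based hooks follow a simple pattern: registering a function stores it under an event type, and each incoming event is dispatched to the stored functions. The sketch below is a hypothetical stand-in, not cellucid's implementation. It assumes `on_message` acts as a catch-all, as the `_on_any` hook below suggests, and isolates errors so one failing hook cannot silence the others.

```python
from collections import defaultdict
from typing import Any, Callable, DefaultDict, Dict, List

Event = Dict[str, Any]
Hook = Callable[[Event], None]


class HookRegistry:
    """Hypothetical stand-in for `viewer.on_*` decorators (not cellucid's code)."""

    def __init__(self) -> None:
        self._hooks: DefaultDict[str, List[Hook]] = defaultdict(list)
        self.errors: List[str] = []

    def on(self, event_type: str) -> Callable[[Hook], Hook]:
        def decorator(fn: Hook) -> Hook:
            self._hooks[event_type].append(fn)
            return fn  # return the function unchanged so it stays directly callable
        return decorator

    def dispatch(self, event_type: str, event: Event) -> None:
        # Type-specific hooks first, then catch-all "message" hooks (never twice).
        targets = list(self._hooks["message"])
        if event_type != "message":
            targets = [*self._hooks[event_type], *targets]
        for fn in targets:
            try:
                fn(event)
            except Exception as exc:  # one broken hook must not block the rest
                self.errors.append(f"{fn.__name__}: {exc}")


registry = HookRegistry()
seen: List[tuple] = []


@registry.on("selection")
def log_selection(ev: Event) -> None:
    seen.append(("selection", len(ev.get("cells") or [])))


@registry.on("message")
def log_any(ev: Event) -> None:
    seen.append(("message", ev["type"]))


registry.dispatch("selection", {"type": "selection", "source": "lasso", "cells": [4, 8, 15]})
print(seen)  # [('selection', 3), ('message', 'selection')]
```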


In [None]:
# Minimal debug hooks (prints only small summaries to avoid flooding).

@viewer.on_ready
def _on_ready(ev):
    print("[hook ready]", ev)

@viewer.on_selection
def _on_selection(ev):
    cells = ev.get("cells") or []
    source = ev.get("source")
    print(f"[hook selection] source={source!r} n={len(cells)} head={cells[:10]}")

    # Example analysis: show cell_type breakdown for selected cells.
    try:
        if cells:
            vc = adata.obs["cell_type"].iloc[cells].value_counts().head(10)
            print("[hook selection] cell_type top10:\n", vc)
    except Exception as e:
        print("[hook selection] analysis failed:", e)

@viewer.on_hover
def _on_hover(ev):
    # Hover can fire frequently; print only when a concrete cell is hovered.
    cell = ev.get("cell")
    if cell is not None:
        print("[hook hover] cell=", cell)

@viewer.on_click
def _on_click(ev):
    print("[hook click]", ev)

@viewer.on_message
def _on_any(ev):
    # Uncomment for very verbose debugging:
    # print("[hook message]", ev)
    pass

print("Hooks registered. Now interact in the UI.")
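Hover events can fire many times per second. Besides filtering out empty hovers, a time-based throttle caps how often the handler runs. This is a minimal sketch; `throttle` is a hypothetical helper, not a cellucid API. It drops events that arrive too soon and does not replay the last one.

```python
import functools
import time
from typing import Any, Callable, Dict, Optional

Event = Dict[str, Any]


def throttle(interval_s: float, clock: Callable[[], float] = time.monotonic):
    """Drop calls arriving less than `interval_s` seconds after the last accepted one."""
    def decorator(fn: Callable[[Event], Any]) -> Callable[[Event], Optional[Any]]:
        last_accepted = float("-inf")

        @functools.wraps(fn)
        def wrapper(ev: Event) -> Optional[Any]:
            nonlocal last_accepted
            now = clock()
            if now - last_accepted < interval_s:
                return None  # too soon: skip this event
            last_accepted = now
            return fn(ev)

        return wrapper
    return decorator


# Demo with a fake clock so the result is deterministic.
ticks = iter([0.00, 0.05, 0.30])
hovered = []


@throttle(0.2, clock=lambda: next(ticks))
def on_hover(ev: Event) -> None:
    hovered.append(ev["cell"])


for cell in (1, 2, 3):
    on_hover({"cell": cell})
print(hovered)  # [1, 3]
```

In the notebook, place the registration decorator above the throttle (`@viewer.on_hover` over `@throttle(0.2)`) so that the viewer registers the throttled wrapper.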


### Imperative wait (no callback)

Run the next cell; it will **block** until the next selection arrives (up to the 120 s timeout).
Then make a selection in the UI.


In [None]:
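ev = viewer.wait_for_event("selection", timeout=120)
cells = ev.get("cells") or []
# Print a summary rather than the full (possibly huge) cell list.
print("wait_for_event(selection) returned:", {"source": ev.get("source"), "n_cells": len(cells), "head": cells[:10]})
print("viewer.state.selection now:", viewer.state.selection)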


## 4) Session capture (no manual download)

Before capturing, do some UI work so the session contains state worth applying:
- create one or more **highlight groups** (confirm a selection)
- optionally create **user-defined fields** (categorical from pages / duplicate continuous field)

Then run the capture cell below.


In [None]:
@viewer.on_session_bundle
def _on_session_bundle(ev):
    # This fires when the upload completes (success or error).
    print("[hook session_bundle]", ev)

bundle = viewer.get_session_bundle(timeout=180)
print("bundle:", bundle)
print("bundle.path:", bundle.path)
print("bundle.path exists:", bundle.path.exists())


In [None]:
manifest = bundle.manifest
fp = manifest.get("datasetFingerprint")
print("datasetFingerprint:", fp)
print("chunk count:", len(manifest.get("chunks") or []))


## 5) Apply session → mutate AnnData

This reads the session bundle and materializes its state (highlight groups, user-defined fields and, optionally, field overlays) into AnnData.

Defaults are intentionally safe:
- adds new columns (highlights / user-defined)
- destructive edits (rename/delete overlays) are opt-in
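Here is a minimal sketch of what "safe by default" means, using a plain dict of columns in place of `.obs`. New columns never overwrite existing ones (a numeric suffix is added on collision), and deletions happen only behind an explicit flag. The suffixing rule and both helpers are hypothetical; cellucid's actual collision handling may differ.

```python
from typing import Dict, Iterable, List

Columns = Dict[str, List[object]]


def unique_name(name: str, taken: Iterable[str]) -> str:
    """Return `name`, or `name_1`, `name_2`, ... if it already exists."""
    taken = set(taken)
    if name not in taken:
        return name
    i = 1
    while f"{name}_{i}" in taken:
        i += 1
    return f"{name}_{i}"


def apply_columns(obs: Columns, new: Columns, to_delete: Iterable[str] = (), allow_delete: bool = False) -> Columns:
    """Add `new` columns without overwriting; delete only when explicitly allowed."""
    out = dict(obs)  # shallow copy: the caller's mapping is never mutated
    for name, values in new.items():
        out[unique_name(name, out)] = values
    if allow_delete:
        for name in to_delete:
            out.pop(name, None)
    return out


obs = {"cell_type": ["a", "b"]}
new = {"cell_type": ["x", "y"], "cellucid_highlight__g1": [True, False]}
safe = apply_columns(obs, new, to_delete=["cell_type"])
print(sorted(safe))  # ['cell_type', 'cell_type_1', 'cellucid_highlight__g1']
destructive = apply_columns(obs, {}, to_delete=["cell_type"], allow_delete=True)
print(sorted(destructive))  # []
```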


In [None]:
before_obs_cols = set(map(str, adata.obs.columns))

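# Apply in place to the backed AnnData handle (keeps memory low). In backed mode only X
# stays on disk; the new .obs/.uns entries are held in memory and not written to the .h5ad.
# If you want a copy, set inplace=False (may be RAM-heavy for large datasets).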
out = apply_cellucid_session_to_anndata(
    adata,
    bundle,
    inplace=True,
    store_uns=True,
    expected_dataset_id=DATASET_ID,
    apply_field_overlays=True,
    update_original_obs=False,
    delete_policy="none",
    add_highlights=True,
    highlights_target="obs",
    highlights_prefix="cellucid_highlight__",
    add_user_defined_fields=True,
)

after_obs_cols = set(map(str, out.obs.columns))
added = sorted(after_obs_cols - before_obs_cols)

print("Applied to AnnData.")
print("New obs columns added (count):", len(added))
print("New obs columns (head):", added[:20])
print("Highlight columns added (count):", sum(c.startswith("cellucid_highlight__") for c in added))


### Optional: highlights as a sparse matrix (`.obsm`)

If you have many highlight groups, storing highlights in `.obsm["cellucid_highlights"]` can be more memory-efficient than many boolean `.obs` columns.
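The savings come from storing only member cells. The `nnz` check above suggests a SciPy sparse matrix, though its exact layout is not specified here. The stdlib sketch below builds CSR arrays (`indptr`, `indices`, with all stored values implicitly True) from group membership lists. It also gives rough byte counts, assuming 1-byte booleans and int32 indices. Both helpers are hypothetical, and the group sizes are illustrative, not from a real session.

```python
from typing import Dict, List, Tuple


def groups_to_csr(groups: Dict[str, List[int]], n_cells: int) -> Tuple[List[str], List[int], List[int]]:
    """CSR layout of an (n_cells x n_groups) boolean membership matrix.

    Only True entries are stored, so the data array is implicit (all True).
    """
    names = list(groups)
    column = {name: j for j, name in enumerate(names)}
    per_row: List[List[int]] = [[] for _ in range(n_cells)]
    for name, members in groups.items():
        for cell in set(members):
            per_row[cell].append(column[name])
    indptr: List[int] = [0]
    indices: List[int] = []
    for cols in per_row:
        indices.extend(sorted(cols))
        indptr.append(len(indices))
    return names, indptr, indices


def storage_bytes(n_cells: int, n_groups: int, nnz: int, index_bytes: int = 4) -> Tuple[int, int]:
    """Rough sizes: dense bool columns vs CSR (int32 indices + bool data + indptr)."""
    dense = n_cells * n_groups
    sparse = nnz * (index_bytes + 1) + (n_cells + 1) * index_bytes
    return dense, sparse


names, indptr, indices = groups_to_csr({"g1": [0, 2], "g2": [2, 3]}, n_cells=5)
print(names, indptr, indices)  # ['g1', 'g2'] [0, 1, 1, 3, 4, 4] [0, 0, 1, 1]
# 71,650 cells (this dataset) with 50 illustrative groups of 500 cells each:
print(storage_bytes(71650, 50, nnz=50 * 500))  # (3582500, 411604)
# A single group: dense columns beat CSR's fixed indptr cost.
print(storage_bytes(71650, 1, nnz=500))        # (71650, 289104)
```

The last line shows why the advice is conditional on having *many* groups. With only a handful of groups, plain boolean columns are smaller.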


In [None]:
out2 = apply_cellucid_session_to_anndata(
    adata,
    bundle,
    inplace=True,
    store_uns=True,
    expected_dataset_id=DATASET_ID,
    add_highlights=True,
    highlights_target="obsm",
    add_user_defined_fields=False,
    apply_field_overlays=False,
)

m = out2.obsm.get("cellucid_highlights")
print("obsm['cellucid_highlights']:", type(m), getattr(m, "shape", None), getattr(m, "nnz", None))


In [None]:
# Inspect the stored session receipt in .uns
session = out.uns.get("cellucid", {}).get("session", {})
print("uns['cellucid']['session'] keys:", sorted(session.keys()))
print("datasetFingerprintMatchesAnnData:", session.get("datasetFingerprintMatchesAnnData"))
print("skippedDatasetDependentChunks:", session.get("skippedDatasetDependentChunks"))
print("highlights summary keys:", sorted((session.get("highlights") or {}).keys()))
print("user_defined_fields keys:", sorted((session.get("user_defined_fields") or {}).keys()))


## 6) One-liner: `viewer.apply_session_to_anndata(...)`

This runs `get_session_bundle()`, applies the bundle to AnnData, and then deletes the temporary bundle file.

To avoid name collisions with columns added earlier, this example uses different prefixes for highlight and user-defined columns.
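The get → apply → cleanup sequence is an acquire/use/release pattern. The release step belongs in a `finally` block, so a failing apply does not leave temporary files behind. This is a stdlib sketch; `temporary_bundle` is hypothetical and does not necessarily reflect how cellucid implements the one-liner.

```python
import os
import tempfile
from contextlib import contextmanager
from typing import Iterator


@contextmanager
def temporary_bundle(data: bytes, suffix: str = ".cellucid-session") -> Iterator[str]:
    """Hypothetical: write `data` to a temp file, yield its path, always delete it."""
    fd, path = tempfile.mkstemp(suffix=suffix)
    try:
        with os.fdopen(fd, "wb") as fh:
            fh.write(data)
        yield path
    finally:
        if os.path.exists(path):
            os.remove(path)


# Normal path: the file exists while in use and is gone afterwards.
with temporary_bundle(b"demo bundle bytes") as path:
    kept_path = path
    existed_inside = os.path.exists(path)

# Failure path: the simulated apply raises, yet cleanup still runs.
try:
    with temporary_bundle(b"demo bundle bytes") as path:
        failed_path = path
        raise ValueError("simulated apply failure")
except ValueError:
    pass

print(existed_inside, os.path.exists(kept_path), os.path.exists(failed_path))  # True False False
```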


In [None]:
before_obs_cols = set(map(str, adata.obs.columns))

_ = viewer.apply_session_to_anndata(
    adata,
    inplace=True,
    store_uns=True,
    add_highlights=True,
    highlights_target="obs",
    highlights_prefix="cellucid_highlight_one_liner__",
    add_user_defined_fields=True,
    user_defined_prefix="one_liner__",
)

after_obs_cols = set(map(str, adata.obs.columns))
added = sorted(after_obs_cols - before_obs_cols)
print("New columns (head):", added[:20])


## 7) Identity guard test (expected dataset id)

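This is a *negative test*: apply the same bundle to a freshly loaded AnnData handle with a deliberately wrong `expected_dataset_id`.
Dataset-dependent chunks should be skipped, and the `.uns` receipt should list them under `skippedDatasetDependentChunks`.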
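The guard's decision can be sketched as follows. If the fingerprint does not match the target AnnData (or the expected dataset id), every chunk that depends on cell positions is skipped, and view-only chunks are kept. The manifest layout below (`datasetId`, `cellCount`, `datasetDependent`, chunk `id`s) is hypothetical; only the receipt key names in this notebook come from cellucid.

```python
from typing import Any, Dict, List, Optional, Tuple


def guard_chunks(
    manifest: Dict[str, Any], expected_dataset_id: Optional[str], n_obs: int
) -> Tuple[bool, List[str], List[str]]:
    """Return (fingerprint_matches, kept_chunk_ids, skipped_chunk_ids).

    Hypothetical manifest layout: a fingerprint with `datasetId`/`cellCount`
    and chunks flagged `datasetDependent` (cell-indexed state).
    """
    fp = manifest.get("datasetFingerprint") or {}
    matches = fp.get("cellCount") == n_obs and (
        expected_dataset_id is None or fp.get("datasetId") == expected_dataset_id
    )
    kept: List[str] = []
    skipped: List[str] = []
    for chunk in manifest.get("chunks") or []:
        if chunk.get("datasetDependent") and not matches:
            skipped.append(chunk["id"])  # cell indices could point at the wrong cells
        else:
            kept.append(chunk["id"])
    return matches, kept, skipped


manifest = {
    "datasetFingerprint": {"datasetId": "he_developmental_complete_with_3d_umap", "cellCount": 71650},
    "chunks": [
        {"id": "camera", "datasetDependent": False},
        {"id": "highlights", "datasetDependent": True},
    ],
}
print(guard_chunks(manifest, "WRONG_DATASET_ID", 71650))  # (False, ['camera'], ['highlights'])
print(guard_chunks(manifest, "he_developmental_complete_with_3d_umap", 71650))  # (True, ['camera', 'highlights'], [])
```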


In [None]:
adata_guard = ad.read_h5ad(DATA_PATH, backed="r")
before = set(map(str, adata_guard.obs.columns))

out_guard = apply_cellucid_session_to_anndata(
    adata_guard,
    bundle,
    inplace=True,
    store_uns=True,
    expected_dataset_id="WRONG_DATASET_ID",
    add_highlights=True,
    add_user_defined_fields=True,
)

after = set(map(str, out_guard.obs.columns))
added = sorted(after - before)
session = out_guard.uns.get("cellucid", {}).get("session", {})

print("datasetFingerprint:", session.get("datasetFingerprint"))
print("expectedDatasetId:", session.get("expectedDatasetId"))
print("datasetFingerprintMatchesAnnData:", session.get("datasetFingerprintMatchesAnnData"))
print("skippedDatasetDependentChunks:", session.get("skippedDatasetDependentChunks"))
print("New obs columns added:", len(added))


## 8) Cleanup

- `bundle.cleanup()` removes the temp `.cellucid-session` file for the captured bundle.
- `viewer.stop()` shuts down the local data server.


In [None]:
tmp_path = bundle.path
print("bundle temp exists before cleanup:", tmp_path.exists())
bundle.cleanup()
print("bundle temp exists after cleanup:", tmp_path.exists())

# Stop the server when you're done (important if you re-run cells often).
viewer.stop()
