# Jupyter Integration Tutorial

This tutorial covers options 10-11: embedding Cellucid directly in Jupyter notebooks.

## Jupyter Options Overview

| Option | Method | Data Format | In-Memory | h5ad | zarr | Lazy Loading |
|---|--------|-------------|-----------|------|------|--------------|
| 10 | `show()` | Pre-exported | No | No | No | Yes |
| 11 | `show_anndata()` | h5ad/zarr/memory | **Yes** | **Yes** | **Yes** | Yes |

## When to Use

- Interactive analysis workflows
- Exploring analysis results immediately
- Programmatic viewer control (highlight, filter, color)
- Integration with scanpy/Python analysis

In [None]:
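# Setup: when this notebook runs from a source checkout (one directory below the
# project root), make the local `src/` package importable.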
import sys
from pathlib import Path

PROJECT_ROOT = Path.cwd().parent
SRC_DIR = PROJECT_ROOT / "src"
if SRC_DIR.exists() and str(SRC_DIR) not in sys.path:
    sys.path.insert(0, str(SRC_DIR))

---

# Option 10: Show Pre-exported Data

| Property | Value |
|----------|-------|
| Data Format | Pre-exported binary |
| Python Required | Yes |
| Lazy Loading | Yes |
| Performance | Best |

Use `show()` for pre-exported data created with `prepare()`.

In [None]:
from cellucid import show

# Show pre-exported data
# viewer = show("/path/to/export_dir", height=500)

print("Use show() for pre-exported data")
print("Example: viewer = show('/path/to/export')")

---

# Option 11: Show h5ad, zarr, or In-Memory AnnData

| Property | Value |
|----------|-------|
| Data Format | h5ad, zarr, or in-memory AnnData |
| In-Memory AnnData | **Yes** |
| h5ad/zarr File Support | **Yes** |
| Python Required | Yes |
| Lazy Loading | Yes (backed mode for h5ad, chunked for zarr) |
| Performance | Good |

Use `show_anndata()` for direct visualization without export.

In [None]:
import numpy as np
import pandas as pd
import anndata as ad
from scipy import sparse

# Create example AnnData (random values; seeded so results are reproducible)
np.random.seed(0)
n_cells = 5000
n_genes = 1000

# Sparse expression matrix
X = sparse.random(n_cells, n_genes, density=0.1, format='csr', dtype=np.float32)

adata = ad.AnnData(X)
adata.var_names = [f"Gene_{i}" for i in range(n_genes)]
adata.obs_names = [f"Cell_{i}" for i in range(n_cells)]

# REQUIRED: UMAP embeddings
adata.obsm['X_umap_3d'] = np.random.randn(n_cells, 3).astype(np.float32) * 5
adata.obsm['X_umap_2d'] = np.random.randn(n_cells, 2).astype(np.float32) * 5

# Cell metadata
adata.obs['cell_type'] = pd.Categorical(
    np.random.choice(['T cell', 'B cell', 'Macrophage', 'NK cell', 'Dendritic'], n_cells)
)
adata.obs['cluster'] = pd.Categorical(
    np.random.choice([f'C{i}' for i in range(10)], n_cells)
)
adata.obs['n_counts'] = np.random.poisson(5000, n_cells)
adata.obs['score'] = np.random.uniform(0, 1, n_cells)

print(f"Created AnnData: {adata}")
print(f"obsm keys: {list(adata.obsm.keys())}")
print(f"obs columns: {list(adata.obs.columns)}")

In [None]:
from cellucid import show_anndata

# Visualize in-memory AnnData directly
# This starts a local server and embeds the viewer

# Uncomment to visualize:
# viewer = show_anndata(adata, height=500)

print("Use show_anndata(adata) to visualize in-memory AnnData")

In [None]:
# From h5ad file (lazy loading)
# viewer = show_anndata("/path/to/data.h5ad", height=500)

# From zarr store (lazy loading)
# viewer = show_anndata("/path/to/data.zarr", height=500)

print("Use show_anndata() for h5ad files, zarr stores, or in-memory AnnData in Jupyter")

---

# Programmatic Viewer Control

The viewer object returned by `show()` and `show_anndata()` supports programmatic control.

## Available Methods

| Method | Description |
|--------|-------------|
| `highlight_cells(indices, color)` | Highlight specific cells |
| `clear_highlights()` | Remove all highlights |
| `set_color_by(field)` | Change coloring field |
| `set_visibility(indices, visible)` | Show/hide cells |
| `reset_view()` | Reset camera to default |
| `stop()` | Stop the server |
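The methods above take cell indices. The examples in this notebook pass 0-based row positions (in `adata.obs` order) as plain Python integers, so a small helper that turns any boolean selection into that form keeps the calls consistent. `mask_to_indices` is a hypothetical helper, not part of Cellucid, and the assumption that the viewer expects positions in `adata.obs` order is taken from the examples below.

```python
import numpy as np
import pandas as pd


def mask_to_indices(mask) -> list[int]:
    """Convert a boolean mask (NumPy array or pandas Series) to 0-based row positions.

    Hypothetical helper: assumes the viewer expects positions in the same
    order as adata.obs, given as plain Python ints.
    """
    arr = np.asarray(mask, dtype=bool)
    if arr.ndim != 1:
        raise ValueError("mask must be one-dimensional")
    # flatnonzero returns the positions of True entries; tolist() yields Python ints
    return np.flatnonzero(arr).tolist()


# Usage with a pandas Series, as you would get from adata.obs
obs = pd.DataFrame({"score": [0.1, 0.9, 0.85, 0.3]}, index=["c0", "c1", "c2", "c3"])
selected = mask_to_indices(obs["score"] > 0.8)
print(selected)  # [1, 2]
# viewer.highlight_cells(selected, color="#ff0000")
```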

In [None]:
# Example: Highlight cells from analysis

# Simulate an analysis result: cells with score > 0.8
high_score_cells = np.where(adata.obs['score'] > 0.8)[0].tolist()
print(f"Found {len(high_score_cells)} cells with score > 0.8")

# Highlight them in the viewer
# viewer.highlight_cells(high_score_cells, color="#ff0000")

In [None]:
# Example: Highlight by cell type

# Positional (0-based) indices of T cells, independent of how cells are named
t_cell_positions = np.flatnonzero((adata.obs['cell_type'] == 'T cell').to_numpy()).tolist()
print(f"Found {len(t_cell_positions)} T cells")

# Highlight T cells in blue
# viewer.highlight_cells(t_cell_positions, color="#0066ff")
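To mark several cell types at once, you can map each category to a color and issue one `highlight_cells` call per color. The sketch below only builds the (positions, color) pairs; `highlight_plan` and the palette are hypothetical, and whether successive `highlight_cells` calls add to or replace earlier highlights is not stated here, so check that behavior in your Cellucid version.

```python
import numpy as np
import pandas as pd


def highlight_plan(labels: pd.Series, palette: dict[str, str]) -> list[tuple[list[int], str]]:
    """Return (positions, color) pairs for each category listed in `palette`.

    Hypothetical helper. Positions are 0-based rows of `labels`;
    categories that do not occur in `labels` are skipped.
    """
    values = labels.astype(str).to_numpy()
    plan = []
    for category, color in palette.items():
        positions = np.flatnonzero(values == category).tolist()
        if positions:
            plan.append((positions, color))
    return plan


cell_type = pd.Series(pd.Categorical(["T cell", "B cell", "T cell", "NK cell"]))
palette = {"T cell": "#0066ff", "B cell": "#ff6600", "Dendritic": "#00aa00"}
plan = highlight_plan(cell_type, palette)
for positions, color in plan:
    print(color, positions)
    # viewer.highlight_cells(positions, color=color)
```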

In [None]:
# Example: Change coloring

# Color by categorical field
# viewer.set_color_by("cell_type")

# Color by continuous field
# viewer.set_color_by("n_counts")

# Color by gene expression
# viewer.set_color_by("Gene_0")

print("Available coloring fields:")
print(f"  Categorical: {[c for c in adata.obs.columns if isinstance(adata.obs[c].dtype, pd.CategoricalDtype)]}")
print(f"  Continuous: {[c for c in adata.obs.columns if pd.api.types.is_numeric_dtype(adata.obs[c])]}")
print(f"  Genes: {adata.var_names[:5].tolist()} ... ({adata.n_vars} total)")
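As the examples above show, `set_color_by` takes a field name that can be an `obs` column or a gene. Checking the name first catches typos before the request reaches the viewer. `resolve_color_field` is a hypothetical helper; it assumes `obs` columns take precedence when a name exists in both places, which may not match how Cellucid resolves such a clash.

```python
import pandas as pd


def resolve_color_field(field: str, obs: pd.DataFrame, var_names) -> str:
    """Classify a coloring field as 'categorical', 'continuous', or 'gene'.

    Hypothetical helper: checks obs columns first, then gene names.
    """
    if field in obs.columns:
        dtype = obs[field].dtype
        if isinstance(dtype, pd.CategoricalDtype):
            return "categorical"
        if pd.api.types.is_numeric_dtype(dtype):
            return "continuous"
        raise TypeError(f"obs column {field!r} is neither categorical nor numeric")
    if field in set(var_names):
        return "gene"
    raise KeyError(f"{field!r} is not an obs column or a gene name")


obs = pd.DataFrame({
    "cell_type": pd.Categorical(["T cell", "B cell"]),
    "n_counts": [5100, 4870],
})
genes = ["Gene_0", "Gene_1"]
for name in ["cell_type", "n_counts", "Gene_0"]:
    print(name, "->", resolve_color_field(name, obs, genes))
# viewer.set_color_by("cell_type")
```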

In [None]:
# Example: Hide/show cells

# Hide first 100 cells
# viewer.set_visibility(list(range(100)), visible=False)

# Show them again
# viewer.set_visibility(list(range(100)), visible=True)

print("Use set_visibility() to hide/show cells")
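`set_visibility` works on explicit index lists, so showing *only* a subset of cells means hiding its complement. A minimal sketch, assuming 0-based positions as in the other examples; `complement_indices` is a hypothetical helper.

```python
import numpy as np


def complement_indices(n_cells: int, keep) -> list[int]:
    """Return every position in range(n_cells) that is not listed in `keep`."""
    keep_mask = np.zeros(n_cells, dtype=bool)
    keep_mask[np.asarray(keep, dtype=int)] = True
    return np.flatnonzero(~keep_mask).tolist()


keep = [0, 2, 5]
hide = complement_indices(8, keep)
print(hide)  # [1, 3, 4, 6, 7]
# viewer.set_visibility(hide, visible=False)  # show only `keep`
# viewer.set_visibility(hide, visible=True)   # restore the hidden cells
```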

In [None]:
# Example: Reset and cleanup

# Reset camera
# viewer.reset_view()

# Clear all highlights
# viewer.clear_highlights()

# Stop the server when done
# viewer.stop()

print("Use viewer.stop() to cleanup when done")

---

# Integration with Analysis Workflows

Because the viewer accepts plain cell indices and field names, results from a scanpy (or any Python) analysis can be pushed into it directly.

In [None]:
# Example: Visualize after differential expression analysis

# Assume we've run differential expression and found marker genes
# marker_gene = "Gene_42"

# Highlight cells with high expression of marker gene
# expression = adata[:, marker_gene].X.toarray().flatten()
# high_expr_cells = np.where(expression > np.percentile(expression, 90))[0].tolist()
# 
# viewer.highlight_cells(high_expr_cells, color="#ff6600")
# viewer.set_color_by(marker_gene)

print("Example workflow: highlight cells with high marker gene expression")
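The commented workflow above calls `.toarray()`, which only works when `X` is sparse. Below is a sketch of a version that accepts either a dense or a sparse column and returns the positions above a percentile; `top_percentile_indices` is a hypothetical helper, and it keeps the original strict `>` comparison, so cells exactly at the threshold are excluded.

```python
import numpy as np
from scipy import sparse


def top_percentile_indices(column, q: float = 90.0) -> list[int]:
    """Positions whose value is strictly above the q-th percentile of `column`.

    `column` may be a NumPy array or a SciPy sparse matrix/array
    (for example adata[:, gene].X).
    """
    if sparse.issparse(column):
        column = column.toarray()
    values = np.asarray(column, dtype=float).ravel()
    threshold = np.percentile(values, q)
    return np.flatnonzero(values > threshold).tolist()


dense = np.arange(10, dtype=float).reshape(-1, 1)  # one "gene" across 10 cells
print(top_percentile_indices(dense))                     # [9]
print(top_percentile_indices(sparse.csr_matrix(dense)))  # [9]
# viewer.highlight_cells(top_percentile_indices(adata[:, "Gene_42"].X), color="#ff6600")
```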

In [None]:
# Example: Visualize clustering results

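# After running Leiden clustering (requires `import scanpy as sc` and sc.pp.neighbors(adata))
# sc.tl.leiden(adata)
# viewer.set_color_by("leiden")

# Highlight a specific cluster, passing positional indices rather than cell names
# cluster_0_positions = np.flatnonzero((adata.obs['leiden'] == '0').to_numpy()).tolist()
# viewer.highlight_cells(cluster_0_positions, color="#00ff00")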

print("Example workflow: visualize and highlight clustering results")
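Selections such as `adata.obs[...].index` give cell *names*, while the viewer examples in this notebook pass integer positions. If you already have names, `pandas.Index.get_indexer` maps them to positions against `adata.obs_names`. `names_to_positions` is a hypothetical helper; it assumes unique cell names and raises on names that are not found instead of passing `-1` through to the viewer.

```python
import numpy as np
import pandas as pd


def names_to_positions(obs_names: pd.Index, names) -> list[int]:
    """Map cell names to 0-based positions in `obs_names` (e.g. adata.obs_names).

    Requires `obs_names` to be unique, as get_indexer does.
    """
    positions = obs_names.get_indexer(pd.Index(names))
    missing = np.asarray(names, dtype=object)[positions == -1]
    if missing.size:
        raise KeyError(f"unknown cell names: {missing[:5].tolist()}")
    return positions.tolist()


obs_names = pd.Index(["Cell_0", "Cell_1", "Cell_2", "Cell_3"])
leiden = pd.Series(["0", "1", "0", "2"], index=obs_names)
cluster_0_names = leiden[leiden == "0"].index
print(names_to_positions(obs_names, cluster_0_names))  # [0, 2]
# viewer.highlight_cells(names_to_positions(adata.obs_names, cluster_0_names), color="#00ff00")
```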

---

# Advanced: Viewer Classes

For more control, use the viewer classes directly.

In [None]:
from cellucid import CellucidViewer, AnnDataViewer

# For pre-exported data
# viewer = CellucidViewer(
#     data_dir="/path/to/export",
#     port=8766,          # Specific port
#     height=600,         # Viewer height
#     interactive=True,   # Enable programmatic control
#     auto_open=False,    # Don't display immediately
# )
# viewer.display()  # Display when ready

# For AnnData (h5ad, zarr, or in-memory)
# viewer = AnnDataViewer(
#     data=adata,  # or "/path/to/file.h5ad" or "/path/to/data.zarr"
#     port=8767,
#     height=600,
#     interactive=True,
#     auto_open=False,
# )
# viewer.display()

print("Use CellucidViewer or AnnDataViewer for more control")

---

# Performance Comparison

| Method | Initial Load | Gene Query | Memory | Centroids |
|--------|--------------|------------|--------|----------|
| `show()` (exported) | Fast | Fast | Low | Full |
| `show_anndata(adata)` | Medium | Medium | High (whole AnnData in RAM) | Full |
| `show_anndata("file.h5ad")` | Medium | Medium | Low | Full |
| `show_anndata("data.zarr")` | Medium | Medium | Low | Full |

### Recommendations

| Scenario | Recommendation |
|----------|----------------|
| Quick exploration | `show_anndata(adata)` |
| Large h5ad dataset | `show_anndata("file.h5ad")` (lazy) |
| Large zarr dataset | `show_anndata("data.zarr")` (lazy) |
| Repeated use | Pre-export + `show()` |
| Publication figure | Pre-export + `show()` |
| Cloud data (S3/GCS) | zarr format recommended |

---

# Troubleshooting

### "No UMAP embeddings found"

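Your AnnData needs a UMAP embedding in `obsm`. For example, compute a 3D UMAP with scanpy and store it under the key used in this notebook: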

```python
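import scanpy as sc

sc.pp.neighbors(adata)                 # UMAP needs the kNN graph
sc.tl.umap(adata, n_components=3)      # 3D embedding stored in adata.obsm['X_umap']
adata.obsm['X_umap_3d'] = adata.obsm['X_umap']

# Save as h5ad and/or zarr to load with show_anndata()
adata.write("data.h5ad")
adata.write_zarr("data.zarr")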
```
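To check an AnnData for usable embeddings before calling `show_anndata()`, you can scan `adata.obsm` for 2- or 3-column arrays whose key starts with `X_umap`. This naming rule is an assumption based on the keys used in this notebook (`X_umap_2d`, `X_umap_3d`, and scanpy's `X_umap`), not Cellucid's documented lookup logic; `find_umap_keys` is hypothetical.

```python
import numpy as np


def find_umap_keys(obsm) -> dict[str, int]:
    """Return {key: n_dims} for obsm entries named X_umap* with 2 or 3 columns.

    `obsm` can be adata.obsm or any mapping of key -> 2D array.
    """
    found = {}
    for key in obsm.keys():
        if not key.startswith("X_umap"):
            continue
        arr = np.asarray(obsm[key])
        if arr.ndim == 2 and arr.shape[1] in (2, 3):
            found[key] = arr.shape[1]
    return found


obsm = {
    "X_pca": np.zeros((100, 50)),
    "X_umap_3d": np.zeros((100, 3)),
    "X_umap_2d": np.zeros((100, 2)),
}
print(find_umap_keys(obsm))  # {'X_umap_3d': 3, 'X_umap_2d': 2}
```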

### "Port already in use"

Stop previous viewers or use a different port:

```python
viewer.stop()  # Stop previous viewer
viewer = show_anndata(adata, port=8800)  # Use different port
```
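If you would rather not guess a free port, you can ask the operating system for one by binding to port 0 and then pass it as `port=`. This is a standard socket technique, not a Cellucid feature; `find_free_port` is hypothetical, and another process could still claim the port between this check and the viewer starting.

```python
import socket


def find_free_port(host: str = "127.0.0.1") -> int:
    """Ask the OS for an unused TCP port on `host` by binding to port 0."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.bind((host, 0))
        return sock.getsockname()[1]


port = find_free_port()
print(port)
# viewer = show_anndata(adata, port=port)
```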

### Viewer not displaying

- Ensure you're in a Jupyter environment
- Check browser console for errors (F12)
- Try refreshing the notebook

### Slow gene loading

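Slow gene queries are expected when serving AnnData directly (see the Performance Comparison table). For faster queries, pre-export the data with `prepare(quantize_var=True)` and open the export with `show()`.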

---

# Summary

## Quick Reference

| Task | Code |
|------|------|
| Show pre-exported | `viewer = show("/path/to/export")` |
| Show in-memory AnnData | `viewer = show_anndata(adata)` |
| Show h5ad file | `viewer = show_anndata("/path.h5ad")` |
| Show zarr store | `viewer = show_anndata("/path.zarr")` |
| Highlight cells | `viewer.highlight_cells([1,2,3], "#ff0000")` |
| Change coloring | `viewer.set_color_by("cell_type")` |
| Hide cells | `viewer.set_visibility([1,2,3], False)` |
| Reset view | `viewer.reset_view()` |
| Clear highlights | `viewer.clear_highlights()` |
| Stop server | `viewer.stop()` |

## Key Points

1. **`show()`** - For pre-exported data (best performance)
2. **`show_anndata()`** - For direct AnnData visualization (most convenient)
   - Works with in-memory AnnData, h5ad files, and zarr stores
3. **All viewers support programmatic control** - highlight, color, filter
4. **Always call `viewer.stop()`** when done to free resources

---

## Related Notebooks

- [01_loading_options_overview.ipynb](01_loading_options_overview.ipynb) - All 14 options
- [02_local_demo_tutorial.ipynb](02_local_demo_tutorial.ipynb) - GitHub Pages hosting
- [04_server_tutorial.ipynb](04_server_tutorial.ipynb) - CLI and Python servers