# Cellucid Loading Options Overview

This notebook provides a comprehensive overview of all 14 data loading options in Cellucid.

## The 14 Loading Options

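Cellucid supports 6 deployment methods. The two demo methods take pre-exported binary data only; the other four also accept h5ad files and zarr stores. Rows 7, 9, and 11 each cover two options (h5ad and zarr), so the 11 rows below account for all 14 options: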

| # | Method | Exported | h5ad | zarr | Python | Lazy Load | Performance |
|---|--------|----------|------|------|--------|-----------|-------------|
| 1 | Local Demo (GitHub) | ✅ | - | - | No* | Yes | Best |
| 2 | Remote Demo (GitHub) | ✅ | - | - | No* | Yes | Best |
| 3 | Browser File Picker | ✅ | - | - | No | Yes | Best |
| 4 | Browser File Picker | - | ✅ | - | No | **No** | Slower |
| 5 | Browser File Picker | - | - | ✅ | No | **No** | Slower |
| 6 | Server CLI | ✅ | - | - | Yes | Yes | Best |
| 7 | Server CLI | - | ✅ | ✅ | Yes | Yes | Good |
| 8 | Python serve() | ✅ | - | - | Yes | Yes | Best |
| 9 | Python serve_anndata() | - | ✅ | ✅ | Yes | Yes | Good |
| 10 | Jupyter show() | ✅ | - | - | Yes | Yes | Best |
| 11 | Jupyter show_anndata() | - | ✅ | ✅ | Yes | Yes | Good |

\* Python required for initial export, not for viewing

**Summary by method:**

| Method | Exported | h5ad | zarr | Total |
|--------|----------|------|------|-------|
| Local/Remote Demo | ✅ | - | - | 2 |
| Browser File Picker | ✅ | ✅ | ✅ | 3 |
| Server CLI | ✅ | ✅ | ✅ | 3 |
| Python serve | ✅ | ✅ | ✅ | 3 |
| Jupyter | ✅ | ✅ | ✅ | 3 |
| **Total** | | | | **14** |
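
The total can be checked by enumerating method and format pairs. The sketch below is only a bookkeeping aid: the `SUPPORTED_FORMATS` dictionary is a hypothetical name that mirrors the summary table, with Local and Remote Demo listed separately, and it is not part of the Cellucid API.

```python
# Formats accepted by each deployment method, copied from the summary table.
SUPPORTED_FORMATS = {
    "Local Demo": ["exported"],
    "Remote Demo": ["exported"],
    "Browser File Picker": ["exported", "h5ad", "zarr"],
    "Server CLI": ["exported", "h5ad", "zarr"],
    "Python serve": ["exported", "h5ad", "zarr"],
    "Jupyter": ["exported", "h5ad", "zarr"],
}

# One loading option per (method, format) pair.
options = [
    (method, fmt)
    for method, formats in SUPPORTED_FORMATS.items()
    for fmt in formats
]

print(len(SUPPORTED_FORMATS), "methods,", len(options), "loading options")
# prints: 6 methods, 14 loading options
```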

### Key Notes:
- **Browser h5ad/zarr**: Entire file loaded into memory - no lazy loading due to JavaScript limitations
- **Python h5ad/zarr modes**: True lazy loading via AnnData backed mode (h5ad) or zarr's native chunked access
- **Pre-exported data**: Always fastest - use for production and sharing
- **zarr stores**: Can be a directory (.zarr) or a file - the Python server auto-detects the format

---

## Quick Decision Guide

```
Need to share publicly?
  |
  +-- YES --> Options 1-2: Local/Remote Demo (export + GitHub Pages)
  |
  +-- NO --> Have Python available?
                |
                +-- NO --> Options 3-5: Browser File Picker
                |           (Prefer exported for large datasets)
                |
                +-- YES --> Working in Jupyter?
                              |
                              +-- YES --> Options 10-11: Jupyter integration
                              |           show() or show_anndata()
                              |
                              +-- NO --> Need remote access?
                                          |
                                          +-- YES --> Options 6-7: Server CLI
                                          |
                                          +-- NO --> Options 8-9: Python serve
```
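
The same decision tree can be written as a small function, which is handy if you want to document the choice in a pipeline or README. This is a sketch only: `recommend_options` and its parameter names are hypothetical, and the returned strings simply repeat the labels from the diagram above.

```python
def recommend_options(share_publicly, has_python=False,
                      in_jupyter=False, needs_remote_access=False):
    """Walk the decision guide top to bottom and return the matching options."""
    if share_publicly:
        return "Options 1-2: Local/Remote Demo"
    if not has_python:
        return "Options 3-5: Browser File Picker"
    if in_jupyter:
        return "Options 10-11: Jupyter integration"
    if needs_remote_access:
        return "Options 6-7: Server CLI"
    return "Options 8-9: Python serve"


# Example: Python is available, not in Jupyter, data lives on a remote server.
print(recommend_options(False, has_python=True, needs_remote_access=True))
```

The questions are checked in the same order as the diagram, so an earlier "yes" always wins over later flags.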

In [None]:
# Setup: ensure cellucid is importable
import sys
from pathlib import Path

# For development, add src to path
PROJECT_ROOT = Path.cwd().parent
SRC_DIR = PROJECT_ROOT / "src"
if SRC_DIR.exists() and str(SRC_DIR) not in sys.path:
    sys.path.insert(0, str(SRC_DIR))

---

# Options 1-2: Local & Remote Demo (GitHub Pages)

**Best for:** Public demos, sharing datasets, publications

| Property | Option 1: Local Demo | Option 2: Remote Demo |
|----------|---------------------|----------------------|
| Data Format | Pre-exported binary | Pre-exported binary |
| Python Required | No (after export) | No (after export) |
| Lazy Loading | Yes | Yes |
| Performance | Best | Best |
| How it works | Clone repo + file:// | GitHub Pages URL |

## Option 1: Local Demo

1. Export your data to binary format with `prepare()`
2. Host the exported files on GitHub
3. Users clone the repo and open locally via `file://`

## Option 2: Remote Demo

1. Export your data to binary format with `prepare()`
2. Host the exported files on GitHub Pages or any static file server
3. Users access via `https://cellucid.com?remote=<your-url>`
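
A viewer link of this form can be assembled with plain string formatting. The helper below is a minimal sketch: `remote_viewer_url` is a hypothetical name, not a Cellucid function, and the optional `anndata=true` flag is the one used in the SSH tunnel example later in this notebook. The data URL is left unencoded to match the links shown in this notebook; whether the viewer also accepts a percent-encoded value is not covered here.

```python
VIEWER_URL = "https://cellucid.com"


def remote_viewer_url(data_url, anndata=False):
    """Build a viewer link that points Cellucid at a remote data URL."""
    url = f"{VIEWER_URL}?remote={data_url}"
    if anndata:
        # Flag used when the remote end is an AnnData server rather than
        # a directory of pre-exported files.
        url += "&anndata=true"
    return url


print(remote_viewer_url("https://username.github.io/repo/datasets/my_dataset"))
```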

## Example

In [None]:
from cellucid import prepare
import anndata as ad

# Load your data
# adata = ad.read_h5ad("your_data.h5ad")

# Export for web hosting
# prepare(
#     adata,
#     output_dir="./docs/datasets/my_dataset",  # GitHub Pages serves from docs/
#     dataset_name="My Public Dataset",
#     compress=True,       # Essential for web
#     quantize_obs=True,   # Smaller files
#     quantize_var=True,   # Faster gene queries
# )

print("After export, push to GitHub and enable Pages.")
print("Access at: https://cellucid.com?remote=https://username.github.io/repo/datasets/my_dataset")

---

# Options 3-5: Browser File Picker

**Best for:** Quick preview, no Python needed

## Option 3: Exported Data

| Property | Value |
|----------|-------|
| Data Format | Pre-exported binary |
| Python Required | No |
| Lazy Loading | Yes |
| Performance | Best |

## Option 4: h5ad File Directly

| Property | Value |
|----------|-------|
| Data Format | h5ad |
| Python Required | No |
| Lazy Loading | **No** (browser limitation) |
| Performance | Slower (no quantization, full memory load) |

## Option 5: zarr Store Directly

| Property | Value |
|----------|-------|
| Data Format | zarr |
| Python Required | No |
| Lazy Loading | **No** (browser limitation) |
| Performance | Slower (no quantization, full memory load) |

## How it works

1. Go to [cellucid.com](https://cellucid.com)
2. Pick one of the three loaders:
   - Click the "Folder" button to select an exported directory
   - Click the ".h5ad" button to select an h5ad file directly
   - Click the ".zarr" button to select a zarr store directly

### Important: Browser h5ad/zarr Limitations

When loading h5ad or zarr in browser:
- **Entire file loaded into memory** - JavaScript cannot do partial HDF5/zarr reads efficiently
- **No quantization** - Higher memory usage than exported data
- **Slower gene queries** - No pre-computed indices
- **No centroids** - Performance tradeoff

**Recommendation:** For h5ad/zarr files larger than 100 MB or with more than 100k cells, use one of the Python-based options (7, 9, or 11) instead.
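
This rule of thumb can be turned into a quick pre-check before opening a file in the browser. The sketch below assumes 100 MB means 100 × 1024² bytes; `store_size_bytes`, `prefer_python_loading`, and both constants are hypothetical names, and the thresholds are guidance from this section, not limits enforced by Cellucid.

```python
from pathlib import Path

MAX_BROWSER_BYTES = 100 * 1024 ** 2   # 100 MB (assumed binary megabytes)
MAX_BROWSER_CELLS = 100_000           # 100k cells


def store_size_bytes(path):
    """Size of an h5ad file, or the summed file sizes of a zarr directory."""
    p = Path(path)
    if p.is_dir():
        return sum(f.stat().st_size for f in p.rglob("*") if f.is_file())
    return p.stat().st_size


def prefer_python_loading(path, n_cells=None):
    """Return True if the data exceeds either browser-loading threshold."""
    too_big = store_size_bytes(path) > MAX_BROWSER_BYTES
    too_many = n_cells is not None and n_cells > MAX_BROWSER_CELLS
    return too_big or too_many
```

Pass `n_cells` when you already know the cell count (for example from `adata.n_obs`); otherwise only the on-disk size is checked.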

---

# Options 6-7: Server via Terminal (CLI)

**Best for:** Team sharing, remote server access

## Option 6: Serve Exported Data

| Property | Value |
|----------|-------|
| Data Format | Pre-exported binary |
| Python Required | Yes (server) |
| Lazy Loading | Yes |
| Performance | Best |

```bash
cellucid serve /path/to/export_dir --port 8765
```

## Option 7: Serve h5ad or zarr Directly

| Property | Value |
|----------|-------|
| Data Format | h5ad or zarr |
| Python Required | Yes (server) |
| Lazy Loading | **Yes** (backed mode for h5ad, chunked access for zarr) |
| Performance | Good |

```bash
# Serve h5ad file
cellucid serve-anndata /path/to/data.h5ad --port 8765

# Serve zarr store (directory or file)
cellucid serve-anndata /path/to/data.zarr --port 8765

# Options:
#   --no-browser    Don't auto-open browser
#   --no-backed     Load entire file into memory
#   --latent-key    Specify latent space for outlier quantiles
```

## Remote Server with SSH Tunnel

```bash
# On remote server
cellucid serve-anndata /data/large_dataset.h5ad --no-browser
# or
cellucid serve-anndata /data/large_dataset.zarr --no-browser

# On local machine
ssh -L 8765:localhost:8765 user@server

# Open in browser
# https://cellucid.com?remote=http://localhost:8765&anndata=true
```
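
Before opening the viewer URL, it can help to confirm that the tunnel is actually forwarding the port. The following is a minimal sketch using only the standard library; `port_is_open` is a hypothetical helper, and it only checks that something accepts TCP connections on the port, not that the listener is a Cellucid server.

```python
import socket


def port_is_open(host="localhost", port=8765, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


if port_is_open():
    print("Tunnel is up: open the viewer URL in your browser")
else:
    print("Nothing is listening on localhost:8765; check the ssh -L command")
```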

---

# Options 8-9: Server via Python

**Best for:** Programmatic control, integration with analysis pipelines

## Option 8: Serve Exported Data

| Property | Value |
|----------|-------|
| Data Format | Pre-exported binary |
| Python Required | Yes |
| Lazy Loading | Yes |
| Performance | Best |

In [None]:
from cellucid import serve

# Serve pre-exported data
# serve("/path/to/export_dir", port=8765)
print("Use serve() to serve pre-exported data")

## Option 9: Serve h5ad, zarr, or In-Memory AnnData

| Property | Value |
|----------|-------|
| Data Format | h5ad, zarr, or in-memory AnnData |
| In-Memory AnnData | **Yes** |
| h5ad/zarr File Support | **Yes** |
| Python Required | Yes |
| Lazy Loading | Yes (backed mode for h5ad, chunked for zarr) |
| Performance | Good |

In [None]:
from cellucid import serve_anndata

# Serve h5ad file (lazy loading)
# serve_anndata("/path/to/data.h5ad", port=8765)

# Serve zarr store (lazy loading)
# serve_anndata("/path/to/data.zarr", port=8765)

# Or serve in-memory AnnData
# serve_anndata(adata, port=8765)

print("Use serve_anndata() for h5ad files, zarr stores, or in-memory AnnData")

---

# Options 10-11: Jupyter Integration

**Best for:** Interactive analysis, exploration, iterative workflows

## Option 10: Show Exported Data

| Property | Value |
|----------|-------|
| Data Format | Pre-exported binary |
| Python Required | Yes |
| Lazy Loading | Yes |
| Performance | Best |

In [None]:
from cellucid import show

# Show pre-exported data
# viewer = show("/path/to/export_dir", height=500)

print("Use show() for pre-exported data in Jupyter")

## Option 11: Show h5ad, zarr, or In-Memory AnnData

| Property | Value |
|----------|-------|
| Data Format | h5ad, zarr, or in-memory AnnData |
| In-Memory AnnData | **Yes** |
| h5ad/zarr File Support | **Yes** |
| Python Required | Yes |
| Lazy Loading | Yes (backed mode for h5ad, chunked for zarr) |
| Performance | Good |

In [None]:
import anndata as ad
import numpy as np
from scipy import sparse

from cellucid import show_anndata

# Create a small synthetic AnnData for demonstration

n_cells, n_genes = 1000, 500
X = sparse.random(n_cells, n_genes, density=0.1, format='csr', dtype=np.float32)
adata = ad.AnnData(X)
adata.obsm['X_umap_3d'] = np.random.randn(n_cells, 3).astype(np.float32)
adata.obs['cluster'] = np.random.choice(['A', 'B', 'C'], n_cells)
adata.var_names = [f"Gene_{i}" for i in range(n_genes)]

print(f"Created example AnnData: {adata}")

# Uncomment to visualize:
# viewer = show_anndata(adata, height=500)

In [None]:
# From h5ad file (lazy loading)
# viewer = show_anndata("/path/to/data.h5ad", height=500)

# From zarr store (lazy loading)
# viewer = show_anndata("/path/to/data.zarr", height=500)

print("Use show_anndata() for h5ad files, zarr stores, or in-memory AnnData in Jupyter")

## Jupyter Programmatic Interaction

All viewers returned by `show()` and `show_anndata()` support programmatic control:

In [None]:
# Example interaction (uncomment when viewer is displayed)

# Highlight specific cells
# viewer.highlight_cells([0, 1, 2, 3, 4], color="#ff0000")

# Change coloring field
# viewer.set_color_by("cluster")

# Hide/show cells
# viewer.set_visibility([0, 1, 2], visible=False)

# Reset camera view
# viewer.reset_view()

# Clear highlights
# viewer.clear_highlights()

# Stop the server when done
# viewer.stop()

---

# Summary: All 14 Options at a Glance

| # | Method | Export | h5ad | zarr | Python | Lazy | Perf |
|---|--------|--------|------|------|--------|------|------|
| 1 | Local Demo | ✅ | - | - | No* | Yes | Best |
| 2 | Remote Demo | ✅ | - | - | No* | Yes | Best |
| 3 | Browser Picker | ✅ | - | - | No | Yes | Best |
| 4 | Browser Picker | - | ✅ | - | No | No | Slow |
| 5 | Browser Picker | - | - | ✅ | No | No | Slow |
| 6 | CLI serve | ✅ | - | - | Yes | Yes | Best |
| 7 | CLI serve-anndata | - | ✅ | ✅ | Yes | Yes | Good |
| 8 | Python serve | ✅ | - | - | Yes | Yes | Best |
| 9 | Python serve_anndata | - | ✅ | ✅ | Yes | Yes | Good |
| 10 | Jupyter show | ✅ | - | - | Yes | Yes | Best |
| 11 | Jupyter show_anndata | - | ✅ | ✅ | Yes | Yes | Good |

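\* Python required for initial export, but not for viewing

Rows 7, 9, and 11 each count as two options (h5ad and zarr), which brings the 11 rows to 14 options in total.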

### zarr Notes
- **zarr stores** can be a directory (`.zarr/`) or a consolidated file
- Python server auto-detects the format
- zarr provides native chunked access for efficient lazy loading
- Ideal for cloud storage scenarios (S3, GCS) with remote zarr stores
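
To make the idea of format detection concrete, here is a path-based heuristic. It is an illustration only and not the Python server's actual detection logic, which is not shown in this notebook: `guess_store_format` is a hypothetical name, and the rules (any directory is treated as zarr, files are classified by suffix) are assumptions.

```python
from pathlib import Path


def guess_store_format(path):
    """Guess 'h5ad' or 'zarr' from a path (assumed heuristic, see above)."""
    p = Path(path)
    if p.is_dir():
        # zarr stores are most commonly directories.
        return "zarr"
    suffix = p.suffix.lower()
    if suffix == ".h5ad":
        return "h5ad"
    if suffix == ".zarr":
        # A zarr store saved as a single file.
        return "zarr"
    raise ValueError(f"Cannot guess format for {path!r}")
```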

---

## Related Notebooks

- [02_local_demo_tutorial.ipynb](02_local_demo_tutorial.ipynb) - GitHub Pages hosting
- [03_browser_file_picker_tutorial.ipynb](03_browser_file_picker_tutorial.ipynb) - No-Python option
- [04_server_tutorial.ipynb](04_server_tutorial.ipynb) - CLI and Python servers
- [05_jupyter_tutorial.ipynb](05_jupyter_tutorial.ipynb) - Jupyter integration