# Local & Remote Demo Tutorial (GitHub Pages Hosting)

This tutorial shows how to host Cellucid visualizations on GitHub Pages for public sharing.

## Options 1-2: Local & Remote Demo

| Property | Option 1: Local | Option 2: Remote |
|----------|-----------------|------------------|
| Data Format | Pre-exported binary | Pre-exported binary |
| Python Required | No (after export) | No (after export) |
| Lazy Loading | Yes | Yes |
| Performance | Best | Best |
| Access Method | Clone repo + file:// | GitHub Pages URL |

## When to Use

- Public demos for papers/presentations
- Sharing datasets with collaborators
- Permanent, shareable visualizations
- No Python required for viewers

## Prerequisites

- Python with cellucid installed
- AnnData with UMAP embeddings
- GitHub account (for hosting)

In [None]:
# Setup
import sys
from pathlib import Path

PROJECT_ROOT = Path.cwd().parent
SRC_DIR = PROJECT_ROOT / "src"
if SRC_DIR.exists() and str(SRC_DIR) not in sys.path:
    sys.path.insert(0, str(SRC_DIR))

## Step 1: Prepare Your AnnData

Your AnnData must store UMAP embeddings in `obsm`. The recognized keys and their required shapes:

| Key | Shape | Description |
|-----|-------|-------------|
| `X_umap_3d` | (n_cells, 3) | 3D UMAP coordinates |
| `X_umap_2d` | (n_cells, 2) | 2D UMAP coordinates |
| `X_umap` | (n_cells, 2 or 3) | Auto-detected |

Optional but recommended:
- `obs`: Cell metadata (categorical and continuous)
- `X`: Gene expression matrix (dense or sparse)
- `obsp['connectivities']`: KNN graph edges
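A quick shape check before exporting can catch a malformed embedding early. The sketch below encodes the table above as a small validator. `check_umap_keys` is a hypothetical helper, not part of cellucid, and it accepts any mapping of arrays, so `adata.obsm` (which behaves like a mapping) should work as the first argument.

```python
import numpy as np

# Allowed number of columns per key, taken from the table above.
EXPECTED_DIMS = {"X_umap_3d": (3,), "X_umap_2d": (2,), "X_umap": (2, 3)}


def check_umap_keys(obsm, n_cells):
    """Return a list of problems; an empty list means the layout looks fine."""
    problems = []
    present = [key for key in EXPECTED_DIMS if key in obsm]
    if not present:
        problems.append("no UMAP key found; expected one of " + ", ".join(EXPECTED_DIMS))
    for key in present:
        shape = np.asarray(obsm[key]).shape
        allowed = " or ".join(str(d) for d in EXPECTED_DIMS[key])
        if len(shape) != 2 or shape[0] != n_cells or shape[1] not in EXPECTED_DIMS[key]:
            problems.append(f"{key} has shape {shape}, expected ({n_cells}, {allowed})")
    return problems


# Plain dicts stand in for adata.obsm here.
good = {"X_umap_3d": np.zeros((100, 3), dtype=np.float32)}
bad = {"X_umap_2d": np.zeros((100, 3), dtype=np.float32)}
print(check_umap_keys(good, 100))
print(check_umap_keys(bad, 100))
```

With a real dataset the call would be `check_umap_keys(adata.obsm, adata.n_obs)`.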

In [None]:
import numpy as np
import pandas as pd
import anndata as ad
from scipy import sparse

# Example: Create properly structured AnnData
n_cells = 10000
n_genes = 2000

# Sparse expression matrix (typical for scRNA-seq)
X = sparse.random(n_cells, n_genes, density=0.1, format='csr', dtype=np.float32)

adata = ad.AnnData(X)
adata.var_names = [f"Gene_{i}" for i in range(n_genes)]
adata.obs_names = [f"Cell_{i}" for i in range(n_cells)]

# REQUIRED: 3D UMAP coordinates
adata.obsm['X_umap_3d'] = np.random.randn(n_cells, 3).astype(np.float32) * 5

# Optional: 2D UMAP for comparison
adata.obsm['X_umap_2d'] = np.random.randn(n_cells, 2).astype(np.float32) * 5

# Categorical metadata
adata.obs['cell_type'] = pd.Categorical(
    np.random.choice(['T cell', 'B cell', 'Macrophage', 'NK cell'], n_cells)
)
adata.obs['cluster'] = pd.Categorical(
    np.random.choice([f'Cluster_{i}' for i in range(12)], n_cells)
)

# Continuous metadata
adata.obs['n_counts'] = np.random.poisson(5000, n_cells)
adata.obs['percent_mt'] = np.random.uniform(0, 10, n_cells)

print("AnnData structure:")
print(f"  Cells: {adata.n_obs:,}")
print(f"  Genes: {adata.n_vars:,}")
print(f"  obsm keys: {list(adata.obsm.keys())}")
print(f"  obs columns: {list(adata.obs.columns)}")

## Step 2: Export for Web

Use `prepare()` to create optimized binary files:

In [None]:
from cellucid import prepare

# Export with all optimizations enabled
OUTPUT_DIR = "./demo_export"

# Uncomment to actually export:
# prepare(
#     adata,
#     output_dir=OUTPUT_DIR,
#     dataset_name="Demo Dataset",
#     description="Example dataset for GitHub Pages hosting",
#     
#     # Compression (essential for web hosting)
#     compress=True,           # gzip all binary files
#     quantize_obs=True,       # 8/16-bit quantization for obs
#     quantize_var=True,       # quantize gene expression
#     
#     # Include all features
#     export_connectivity=True,
# )

print(f"Export directory: {OUTPUT_DIR}")

## Step 3: Export File Structure

After export, you'll have:

```
demo_export/
├── dataset_identity.json      # Dataset metadata
├── obs_manifest.json          # Cell metadata schema
├── var_manifest.json          # Gene expression schema
├── connectivity_manifest.json # KNN edge schema (if included)
├── points_3d.bin.gz           # 3D coordinates (gzipped)
├── points_2d.bin.gz           # 2D coordinates (if available)
├── obs/                       # Cell metadata binaries
│   ├── cell_type.codes.u8.bin.gz
│   ├── cell_type.outliers.f32.bin.gz
│   ├── cluster.codes.u8.bin.gz
│   ├── n_counts.values.f32.bin.gz
│   └── ...
├── var/                       # Gene expression binaries
│   ├── Gene_0.values.f32.bin.gz
│   └── ...
└── connectivity/              # KNN edges (if included)
    ├── edges.src.bin.gz
    └── edges.dst.bin.gz
```
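Before publishing, it can help to confirm that the export directory matches the listing above and to see how large it is. The following is a minimal sketch using only the standard library. `summarize_export` and `EXPECTED_FILES` are hypothetical names, and which files the viewer strictly requires is an assumption, so adjust the list to your export.

```python
from pathlib import Path

# Top-level files from the listing above. Which of these are mandatory is an
# assumption; trim the tuple for exports without gene expression, for example.
EXPECTED_FILES = (
    "dataset_identity.json",
    "obs_manifest.json",
    "var_manifest.json",
    "points_3d.bin.gz",
)


def summarize_export(export_dir, expected=EXPECTED_FILES):
    """Report missing top-level files plus file count and total size in bytes."""
    root = Path(export_dir)
    missing = [name for name in expected if not (root / name).is_file()]
    files = [path for path in root.rglob("*") if path.is_file()]
    total_bytes = sum(path.stat().st_size for path in files)
    return {"missing": missing, "n_files": len(files), "total_bytes": total_bytes}


# Example usage once the export exists:
# report = summarize_export("./demo_export")
# print(report["missing"], f"{report['total_bytes'] / 1e6:.1f} MB")
```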

## Step 4: Host on GitHub Pages

### Option A: Project Repository

```bash
# Repository layout:
#
# my-project/
# ├── docs/
# │   └── datasets/
# │       └── demo_export/  # Your exported data
# │           ├── dataset_identity.json
# │           └── ...
# └── README.md

# Push to GitHub
git add .
git commit -m "Add Cellucid dataset"
git push

# Enable GitHub Pages: Settings > Pages > Source: main, /docs
```
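Copying the export into the Option A layout can be scripted. This is a sketch under the assumption that the repository is already cloned locally. `stage_for_pages` is a hypothetical helper and the paths in the usage comment are placeholders.

```python
import shutil
from pathlib import Path


def stage_for_pages(export_dir, repo_dir):
    """Copy an export into <repo>/docs/datasets/<export name> and return the target."""
    src = Path(export_dir)
    dst = Path(repo_dir) / "docs" / "datasets" / src.name
    # dirs_exist_ok (Python 3.8+) lets a re-export overwrite an earlier copy.
    shutil.copytree(src, dst, dirs_exist_ok=True)
    return dst


# Example usage (placeholder paths):
# stage_for_pages("./demo_export", "../my-project")
```

After staging, the `git add`, `git commit`, and `git push` steps shown above publish the files.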

### Option B: Dedicated Dataset Repository

```bash
# Create a repository containing only the data:
#
# my-dataset/
# ├── dataset_identity.json
# ├── obs_manifest.json
# ├── var_manifest.json
# ├── points_3d.bin.gz
# ├── obs/
# └── var/

# Push and enable Pages from root
```

## Step 5: Share the URL

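After enabling GitHub Pages, share your visualization by passing the dataset's Pages URL to the viewer. With the Option A layout, `docs/` is the published root, so it does not appear in the URL: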

```
https://cellucid.com?remote=https://username.github.io/my-project/datasets/demo_export
```

### URL Parameters

| Parameter | Description | Example |
|-----------|-------------|--------|
| `remote` | URL to your dataset | `https://username.github.io/data` |
| `dataset` | Dataset ID (if hosting multiple) | `demo` |
| `color` | Initial color field | `cell_type` |
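Assembling the link in code avoids typos when sharing several datasets. The sketch below uses the three parameters from the table. `build_share_url` is a hypothetical helper, and keeping `:` and `/` unescaped simply mirrors the example URL above; whether the viewer also accepts a percent-encoded `remote` value is not stated here.

```python
from urllib.parse import urlencode

VIEWER_URL = "https://cellucid.com"


def build_share_url(remote, dataset=None, color=None):
    """Build a viewer link from the parameters in the table above."""
    params = {"remote": remote}
    if dataset is not None:
        params["dataset"] = dataset
    if color is not None:
        params["color"] = color
    # safe=":/" keeps the nested URL readable, matching the example link.
    return f"{VIEWER_URL}?{urlencode(params, safe=':/')}"


url = build_share_url(
    "https://username.github.io/my-project/datasets/demo_export",
    color="cell_type",
)
print(url)
```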

## Step 6: Test Locally First

Before pushing to GitHub, test locally:

In [None]:
from cellucid import show

# Test the exported data locally
# viewer = show(OUTPUT_DIR, height=500)

# Or use command line:
# $ cellucid serve ./demo_export

print("Test locally with: cellucid serve ./demo_export")

## Size Optimization Tips

For large datasets, optimize file sizes:

| Optimization | `prepare()` argument | Reduction |
|--------------|---------|----------|
| gzip compression | `compress=True` | 3-5x |
| 8-bit quantization | `quantize_obs=True` | 4x |
| Gene quantization | `quantize_var=True` | 4x |
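To see where a 4x figure can come from, consider storing a `float32` column as 8-bit codes plus an offset and a scale. The sketch below is a generic min/max quantizer written for illustration; it is not claimed to be the scheme cellucid uses internally, and `quantize_u8` and `dequantize_u8` are hypothetical names.

```python
import numpy as np


def quantize_u8(values):
    """Map float values onto 256 evenly spaced levels between their min and max."""
    lo, hi = float(values.min()), float(values.max())
    scale = (hi - lo) / 255 if hi > lo else 1.0
    codes = np.round((values - lo) / scale).astype(np.uint8)
    return codes, lo, scale


def dequantize_u8(codes, lo, scale):
    """Recover approximate float32 values from the 8-bit codes."""
    return codes.astype(np.float32) * scale + lo


rng = np.random.default_rng(0)
values = rng.uniform(0, 10, 10_000).astype(np.float32)  # e.g. a percent_mt column
codes, lo, scale = quantize_u8(values)
restored = dequantize_u8(codes, lo, scale)

size_ratio = values.nbytes / codes.nbytes          # 4 bytes vs 1 byte per cell
max_error = float(np.abs(restored - values).max())  # bounded by about scale / 2
print(size_ratio, max_error)
```

The trade-off is precision: each value is rounded to the nearest of 256 levels, which is usually fine for coloring points but not for downstream statistics.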

### Typical Sizes

| Dataset Size | h5ad | Exported (optimized) |
|--------------|------|----------------------|
| 10k cells | ~50 MB | ~5 MB |
| 100k cells | ~500 MB | ~50 MB |
| 1M cells | ~5 GB | ~500 MB |

## Troubleshooting

### "CORS error" when loading

GitHub Pages adds CORS headers automatically. If using a different host:
- Ensure `Access-Control-Allow-Origin: *` header is set
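When using another host, the header can be checked from Python before sharing a link. This is a rough sketch with the standard library. `fetch_headers` and `allows_any_origin` are hypothetical helpers, and the `Origin` value is sent because some hosts only add CORS headers to cross-origin requests.

```python
import urllib.request


def allows_any_origin(headers):
    """True if the headers carry the wildcard CORS value recommended above."""
    value = headers.get("Access-Control-Allow-Origin", "")
    return value.strip() == "*"


def fetch_headers(url, origin="https://cellucid.com", timeout=10):
    """Send a HEAD request with an Origin header and return the response headers."""
    request = urllib.request.Request(url, method="HEAD", headers={"Origin": origin})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.headers


# Example usage (requires network access; replace with your dataset URL):
# headers = fetch_headers("https://username.github.io/data/dataset_identity.json")
# print(allows_any_origin(headers))
```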

### "File not found" errors

- Verify all files were pushed to GitHub
- Check case sensitivity in filenames
- Wait a few minutes after pushing (Pages can be slow to update)

### Large files rejected by GitHub

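GitHub rejects files larger than 100 MB. For large datasets:
- Git LFS (`git lfs track "*.bin.gz"`) lets the repository hold larger files, but GitHub Pages does not serve LFS-tracked files, so it is not a fix for Pages-hosted datasets
- For Pages hosting, keep each file under the limit, or use a dedicated file host (S3, Cloudflare R2) and set the CORS header described above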
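Finding oversized files before `git push` saves a rejected upload. The sketch below scans an export directory; `find_oversized` is a hypothetical helper, and the default threshold assumes the limit is enforced at 100 MiB.

```python
from pathlib import Path

GITHUB_FILE_LIMIT = 100 * 1024 * 1024  # assumed per-file limit, in bytes


def find_oversized(export_dir, limit=GITHUB_FILE_LIMIT):
    """Return (relative path, size in bytes) for files above the limit, largest first."""
    root = Path(export_dir)
    hits = []
    for path in root.rglob("*"):
        if path.is_file():
            size = path.stat().st_size
            if size > limit:
                hits.append((path.relative_to(root).as_posix(), size))
    return sorted(hits, key=lambda item: item[1], reverse=True)


# Example usage once the export exists:
# for name, size in find_oversized("./demo_export"):
#     print(f"{name}: {size / 1e6:.0f} MB")
```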

---

## Summary

1. **Prepare AnnData** with UMAP in `obsm`
2. **Export** with `prepare(compress=True, quantize_*=True)`
3. **Test locally** with `cellucid serve` or `show()`
4. **Push to GitHub** and enable Pages
5. **Share URL**: `https://cellucid.com?remote=<your-github-pages-url>`

### Option 1 vs Option 2

| Option | How to Access |
|--------|---------------|
| **Local Demo** | Clone repo, open `index.html` via `file://` |
| **Remote Demo** | Use `https://cellucid.com?remote=<url>` |

---

## Next: [03_browser_file_picker_tutorial.ipynb](03_browser_file_picker_tutorial.ipynb)