# Browser File Picker Tutorial

This tutorial covers using Cellucid directly in the browser without any Python setup.

## Options 3-5: Browser File Picker

### Option 3: Exported Data

| Property | Value |
|----------|-------|
| Data Format | Pre-exported binary |
| Python Required | No |
| Lazy Loading | Yes |
| Performance | Best |

### Option 4: h5ad File Directly

| Property | Value |
|----------|-------|
| Data Format | h5ad |
| Python Required | No |
| Lazy Loading | **No** |
| Performance | Slower |

### Option 5: zarr Store Directly

| Property | Value |
|----------|-------|
| Data Format | zarr |
| Python Required | No |
| Lazy Loading | **No** |
| Performance | Slower |

## When to Use

- Quick preview without installing Python
- Sharing data with non-technical collaborators
- Viewing data on machines without Python

---

## Using the Browser File Picker

### Step 1: Go to Cellucid

Open [https://cellucid.com](https://cellucid.com) in your browser.

### Step 2: Select Your Data

In the left panel, you'll see three buttons:

| Button | Function |
|--------|----------|
| **Folder** | Select a pre-exported directory |
| **.h5ad** | Select an h5ad file directly |
| **.zarr** | Select a zarr store directly |

### For Pre-exported Data (Recommended)

1. Click **"Folder"**
2. Navigate to your exported directory
3. Select the folder containing `dataset_identity.json`
4. Data loads with full lazy loading support
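
Before opening the picker, it can help to confirm that the folder you plan to select really contains `dataset_identity.json`. The snippet below is a minimal sketch of such a check; `looks_like_export` is a hypothetical helper written for this tutorial, not part of Cellucid, and it only tests for the presence of the marker file, not its contents.

```python
from pathlib import Path

# Marker file name taken from the step above.
MARKER_FILE = "dataset_identity.json"


def looks_like_export(folder) -> bool:
    """Return True if `folder` is a directory that directly contains the marker file."""
    folder = Path(folder)
    return folder.is_dir() and (folder / MARKER_FILE).is_file()
```

If this returns `False`, you have probably selected a parent or child of the export directory rather than the directory itself.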

### For h5ad Files (Quick Preview)

1. Click **".h5ad"**
2. Select your `.h5ad` file
3. Wait for the file to load into browser memory
4. Explore your data

### For zarr Stores (Quick Preview)

1. Click **".zarr"**
2. Select your `.zarr` directory or file
3. Wait for the data to load into browser memory
4. Explore your data

---

## h5ad and zarr Requirements

For h5ad and zarr files to work in the browser, they need:

### Required

```python
# One of these UMAP embeddings in obsm:
adata.obsm['X_umap_3d']  # shape: (n_cells, 3) - recommended
adata.obsm['X_umap_2d']  # shape: (n_cells, 2)
adata.obsm['X_umap']     # shape: (n_cells, 2 or 3)
```
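
A quick pre-flight check can tell you whether a file is likely to pass this requirement. The sketch below is an assumption-laden illustration: `pick_umap_key` is a hypothetical helper, and the priority order simply follows the order in which the keys are listed above, which may not match what the viewer does internally. It accepts any mapping whose values expose a `.shape`, so `adata.obsm` can be passed directly.

```python
from typing import Mapping, Optional

# Accepted key -> allowed number of embedding dimensions (from the list above).
# The iteration order is an assumption, not documented viewer behaviour.
UMAP_KEYS = {
    "X_umap_3d": (3,),
    "X_umap_2d": (2,),
    "X_umap": (2, 3),
}


def pick_umap_key(obsm: Mapping) -> Optional[str]:
    """Return the first listed key whose array is 2-D with an allowed column count."""
    for key, allowed_dims in UMAP_KEYS.items():
        if key not in obsm:
            continue
        shape = getattr(obsm[key], "shape", ())
        if len(shape) == 2 and shape[1] in allowed_dims:
            return key
    return None
```

For example, `pick_umap_key(adata.obsm)` returning `None` suggests the file would trigger the "No UMAP embeddings found" error described under Troubleshooting.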

### Optional

```python
# Cell metadata for coloring/filtering
adata.obs  # DataFrame with categorical and continuous columns

# Gene expression for gene coloring
adata.X  # sparse or dense matrix
adata.var_names  # gene identifiers

# KNN connectivity for edge visualization
adata.obsp['connectivities']  # sparse matrix
```

### zarr-specific Notes

- zarr stores can be directories (`.zarr/`) or consolidated files
- AnnData zarr format stores data in chunked arrays
- Works with AnnData saved via `adata.write_zarr("data.zarr")`

---

## Browser h5ad/zarr Limitations

When loading h5ad or zarr files directly in the browser, there are important limitations:

### 1. No Lazy Loading

The **entire file** is loaded into browser memory before any data can be accessed.

**Why?** JavaScript cannot perform efficient partial reads from HDF5 files. For zarr, while the format supports chunking, browser-based access still requires loading significant portions into memory.

### 2. Higher Memory Usage

- No quantization (all values stored as full precision)
- Both CSR matrix indices and values in memory
- Browser memory limits apply (~2-4 GB on most systems)
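
To see how quickly these costs add up, the following back-of-the-envelope sketch estimates the size of one sparse copy of the expression matrix. It assumes 4-byte (`float32`) values and 4-byte (`int32`) indices; the example dimensions (100,000 cells, 20,000 genes, 10% non-zero) are illustrative choices, and `sparse_bytes` is a hypothetical helper rather than anything Cellucid provides.

```python
def sparse_bytes(n_cells, n_genes, density, value_bytes=4, index_bytes=4):
    """Approximate bytes held by one CSR copy: values + column indices + row pointers."""
    nnz = round(n_cells * n_genes * density)
    values = nnz * value_bytes            # one stored value per non-zero
    col_indices = nnz * index_bytes       # one column index per non-zero
    row_pointers = (n_cells + 1) * index_bytes
    return values + col_indices + row_pointers


# Illustrative dataset; these numbers are assumptions, not measurements.
one_copy = sparse_bytes(n_cells=100_000, n_genes=20_000, density=0.1)
print(f"One sparse copy: {one_copy / 1024**3:.2f} GiB")
```

If a second, column-oriented copy is built for gene queries, the total for the matrix alone would be roughly double this figure, which is worth comparing against the browser limits listed above.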

### 3. Slower Gene Queries

- No pre-computed indices
- CSR to CSC conversion done on first gene query
- Each gene extraction requires array scanning
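
The points above can be made concrete with a plain-Python sketch of a CSR to CSC conversion followed by a single-gene lookup. This is a generic counting-sort conversion written for illustration, not Cellucid's actual implementation, and the function names are hypothetical. It shows why the conversion touches every non-zero once, and why a gene lookup afterwards only reads that gene's slice.

```python
def csr_to_csc(data, indices, indptr, n_cols):
    """Convert CSR arrays (row-major) into CSC arrays (column-major)."""
    n_rows = len(indptr) - 1

    # Pass 1: count the non-zeros in each column.
    col_ptr = [0] * (n_cols + 1)
    for col in indices:
        col_ptr[col + 1] += 1
    # Turn the counts into start offsets (cumulative sum).
    for col in range(n_cols):
        col_ptr[col + 1] += col_ptr[col]

    # Pass 2: scatter every non-zero into its column's slice.
    next_slot = col_ptr[:-1].copy()
    csc_data = [0.0] * len(data)
    csc_rows = [0] * len(data)
    for row in range(n_rows):
        for k in range(indptr[row], indptr[row + 1]):
            dest = next_slot[indices[k]]
            csc_data[dest] = data[k]
            csc_rows[dest] = row
            next_slot[indices[k]] += 1
    return csc_data, csc_rows, col_ptr


def gene_column(csc_data, csc_rows, col_ptr, gene_idx, n_rows):
    """Return one gene's expression as a dense list of length n_rows."""
    values = [0.0] * n_rows
    for k in range(col_ptr[gene_idx], col_ptr[gene_idx + 1]):
        values[csc_rows[k]] = csc_data[k]
    return values
```

In Python you would normally call SciPy's `X.tocsc()` instead; the explicit loops here are only meant to show the work involved in the one-time conversion.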

### 4. No Centroids

- Centroid computation requires loading all coordinates
- Skipped to avoid additional memory/time overhead

### Recommendations by Dataset Size

| Dataset Size | Recommendation |
|--------------|----------------|
| < 50k cells | Browser h5ad/zarr works well |
| 50k - 100k cells | Pre-export recommended |
| 100k - 500k cells | Use Python (`show_anndata()`) |
| > 500k cells | Pre-export required |
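
If you want to apply this table in a script, for example to print a hint before saving a file, a small lookup is enough. The sketch below is hypothetical (`recommend_loading` is not a Cellucid function), and because the table does not say which row owns the exact boundary values, the handling of exactly 50k, 100k, and 500k cells is an assumption.

```python
def recommend_loading(n_cells: int) -> str:
    """Map a cell count to the recommendation from the table above."""
    if n_cells < 50_000:
        return "Browser h5ad/zarr works well"
    if n_cells <= 100_000:      # boundary handling is an assumption
        return "Pre-export recommended"
    if n_cells <= 500_000:      # boundary handling is an assumption
        return "Use Python (show_anndata())"
    return "Pre-export required"
```

With an AnnData object in hand, `recommend_loading(adata.n_obs)` gives the matching row.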

---

## Preparing h5ad for Browser Loading

If you want to use browser h5ad loading, prepare your file appropriately:

```python
# Setup: make the local src/ directory importable when running from the repo
import sys
from pathlib import Path

PROJECT_ROOT = Path.cwd().parent
SRC_DIR = PROJECT_ROOT / "src"
if SRC_DIR.exists() and str(SRC_DIR) not in sys.path:
    sys.path.insert(0, str(SRC_DIR))
```

```python
import numpy as np
import pandas as pd
import anndata as ad
from scipy import sparse

# Create example data
n_cells = 10000
n_genes = 2000

# Sparse expression (CSR format - most common)
X = sparse.random(n_cells, n_genes, density=0.1, format='csr', dtype=np.float32)

adata = ad.AnnData(X)
adata.var_names = [f"Gene_{i}" for i in range(n_genes)]
adata.obs_names = [f"Cell_{i}" for i in range(n_cells)]

# REQUIRED: Add UMAP embeddings (random placeholders for this demo)
adata.obsm['X_umap_3d'] = np.random.randn(n_cells, 3).astype(np.float32) * 5
adata.obsm['X_umap_2d'] = np.random.randn(n_cells, 2).astype(np.float32) * 5

# Add cell metadata
adata.obs['cell_type'] = pd.Categorical(
    np.random.choice(['T cell', 'B cell', 'Macrophage'], n_cells)
)
adata.obs['n_counts'] = np.random.poisson(5000, n_cells)

print("AnnData ready for browser loading:")
print(f"  Shape: {adata.shape}")
print(f"  X format: {type(adata.X).__name__}")
print(f"  obsm keys: {list(adata.obsm.keys())}")
```

```python
# Estimate file size from the sparse matrix and embeddings
# (obs/var metadata is ignored here, so treat this as a lower bound)
x_size = adata.X.data.nbytes + adata.X.indices.nbytes + adata.X.indptr.nbytes
obsm_size = sum(arr.nbytes for arr in adata.obsm.values())
estimated_mb = (x_size + obsm_size) / 1024 / 1024

print(f"Estimated uncompressed size: {estimated_mb:.1f} MB")
# adata.write() does not compress by default; the 0.5 factor is a rough guess
# for files written with compression enabled, e.g. compression="gzip"
print(f"h5ad file will be ~{estimated_mb * 0.5:.1f} MB (if written with HDF5 compression)")

if estimated_mb > 200:
    print("\n" + "="*50)
    print("WARNING: Large file detected!")
    print("Browser loading may be slow. Consider:")
    print("  1. Use show_anndata() in Python (lazy loading)")
    print("  2. Pre-export with prepare()")
    print("="*50)
```

```python
# Save as h5ad (uncomment to write the file)
# adata.write("browser_demo.h5ad")
print("Save with: adata.write('browser_demo.h5ad')")
```

---

## Comparison: h5ad/zarr vs Pre-exported

| Feature | Browser h5ad/zarr | Pre-exported |
|---------|-------------------|---------------|
| Initial load | **Entire file** | Metadata only |
| Memory usage | High | Low |
| Gene query | 50-500ms | 10-50ms |
| Lazy loading | No | Yes |
| File size | Original size | 3-5x smaller |
| Setup required | None | Python export |

### When to Use Each

**Use h5ad/zarr:**
- Quick preview of small datasets
- One-time exploration
- No Python available

**Use pre-exported:**
- Large datasets (> 50k cells)
- Repeated use
- Sharing with others
- Production visualizations

### zarr vs h5ad

| Feature | zarr | h5ad |
|---------|------|------|
| Format | Directory or file | Single file |
| Chunking | Native | Limited |
| Cloud storage | Better | Limited |
| Portability | Multiple files | Single file |

---

## Troubleshooting

### "No UMAP embeddings found"

Your h5ad/zarr needs UMAP coordinates in `obsm`. Add them with scanpy:

```python
import scanpy as sc
sc.pp.neighbors(adata)
sc.tl.umap(adata, n_components=3)
adata.obsm['X_umap_3d'] = adata.obsm['X_umap']

# Save as h5ad
adata.write("with_umap.h5ad")

# Or save as zarr
adata.write_zarr("with_umap.zarr")
```

### Browser tab crashes or freezes

File is too large for browser memory. Options:
1. Use Python with `show_anndata()` (lazy loading)
2. Pre-export with `prepare()`
3. Subsample your data
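
For the subsampling option, one simple approach is to draw a random subset of cell indices and slice the AnnData object with them. The sketch below uses only the standard library; `subsample_indices` is a hypothetical helper, and the target of 50,000 cells mentioned afterwards is an illustrative choice based on the size table earlier in this tutorial.

```python
import random


def subsample_indices(n_obs, n_keep, seed=0):
    """Pick `n_keep` distinct row indices out of `n_obs`, sorted to keep cell order."""
    if n_keep >= n_obs:
        return list(range(n_obs))
    rng = random.Random(seed)  # fixed seed so the subset is reproducible
    return sorted(rng.sample(range(n_obs), n_keep))
```

With an AnnData object loaded, `adata[subsample_indices(adata.n_obs, 50_000)].copy()` should yield a smaller object that can then be written with `adata.write(...)` as shown above.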

### Very slow loading

Expected for large files. The entire h5ad/zarr must be:
1. Downloaded into browser memory
2. Parsed by h5wasm (h5ad) or zarr.js (zarr)
3. Converted to arrays

For faster loading, pre-export your data.

### Gene expression coloring is slow

The first gene query triggers a CSR→CSC matrix conversion (a one-time cost). Subsequent queries are faster.

---

## Summary

### Browser File Picker - Key Points

1. **No Python needed** - works directly in browser
2. **Three options:**
   - Folder button: Pre-exported data (recommended)
   - .h5ad button: Direct h5ad loading
   - .zarr button: Direct zarr loading
3. **h5ad/zarr limitations:**
   - Entire file loaded into memory
   - No lazy loading
   - Best for small datasets (< 50k cells)

### Quick Links

- [Cellucid Web Viewer](https://cellucid.com)
- [01_loading_options_overview.ipynb](01_loading_options_overview.ipynb) - All 14 options
- [02_local_demo_tutorial.ipynb](02_local_demo_tutorial.ipynb) - GitHub Pages hosting
- [04_server_tutorial.ipynb](04_server_tutorial.ipynb) - Python server options