# Tutorial for CITE-seq data

More generally, this tutorial applies to any dataset containing two observation-by-feature matrices:

<img src="https://docs.google.com/drawings/d/e/2PACX-1vTnc5F8W-GAfJcYclSArNX2wixfiPGafVYoM0bDBY28-uepJ4C7CL4o6wttH0lco696NHxMEpzFXdUq/pub?w=1373&h=569"/>

For CITE-seq, Feature 1 is RNA and Feature 2 is antibody-derived tags (ADT).

This setup is analogous to SNARE-seq, where Feature 2 is chromatin accessibility peaks rather than ADTs.
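To make this data model concrete, here is a minimal pure-Python sketch of a dataset that holds two observation-by-feature matrices. The matrices share the same observations (cells) but have different features. The class `TwoModalityDataset` and its fields are hypothetical and are not part of vitessce or Seurat; the tutorial itself stores each modality in a separate AnnData object. The counts are made-up toy values.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TwoModalityDataset:
    """Hypothetical toy container: two observation-by-feature matrices over the same cells."""
    obs_names: List[str]                # shared observations (e.g. cell barcodes)
    feature1_names: List[str]           # e.g. genes (RNA)
    feature1_matrix: List[List[float]]  # one row per observation
    feature2_names: List[str]           # e.g. antibody-derived tags (ADT)
    feature2_matrix: List[List[float]]  # one row per observation

    def __post_init__(self):
        # Both matrices must have one row per observation and one column per feature.
        for label, names, matrix in (
            ("feature1", self.feature1_names, self.feature1_matrix),
            ("feature2", self.feature2_names, self.feature2_matrix),
        ):
            if len(matrix) != len(self.obs_names):
                raise ValueError(f"{label}: expected {len(self.obs_names)} rows, got {len(matrix)}")
            for row in matrix:
                if len(row) != len(names):
                    raise ValueError(f"{label}: expected {len(names)} columns, got {len(row)}")

    def shape(self, which: str):
        # Returns (n_obs, n_features) for "feature1"; any other value selects feature2.
        names = self.feature1_names if which == "feature1" else self.feature2_names
        return (len(self.obs_names), len(names))


ds = TwoModalityDataset(
    obs_names=["cell_a", "cell_b"],
    feature1_names=["CD3E", "MS4A1", "LYZ"],
    feature1_matrix=[[5, 0, 1], [0, 3, 0]],
    feature2_names=["CD3", "CD19"],
    feature2_matrix=[[120, 4], [7, 88]],
)
print(ds.shape("feature1"), ds.shape("feature2"))  # (2, 3) (2, 2)
```

The number of observations is shared, while each modality has its own feature count. This corresponds to the 8617 × 2000 RNA and 8617 × 13 ADT objects produced later in this tutorial.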

## 1. Convert data

The data used in this notebook can be obtained from https://satijalab.org/seurat/articles/multimodal_vignette.html.

We can use the `seurat_to_anndata_zarr` function from `vitessceR` to convert the Seurat object to an AnnData object [written to Zarr format](https://anndata.readthedocs.io/en/latest/generated/anndata.AnnData.write_zarr.html#anndata.AnnData.write_zarr).


```R
# Install R dependencies
install.packages("Seurat")
install.packages("devtools")
devtools::install_github("mojaveazure/seurat-disk")
devtools::install_github("vitessce/vitessceR")
```

Note: the file paths in `read.csv` below will need to be updated based on the location of the raw data files.
```R
library(Seurat)

# Load in the RNA UMI matrix

# Note that this dataset also contains ~5% of mouse cells, which we can use as negative
# controls for the protein measurements. For this reason, each gene name in the gene
# expression matrix is prefixed with HUMAN_ or MOUSE_.
cbmc.rna <- as.sparse(read.csv(file = "/Users/mkeller/Downloads/GSE100866_CBMC_8K_13AB_10X-RNA_umi.csv.gz", sep = ",",
                               header = TRUE, row.names = 1))

# To make life a bit easier going forward, we discard all but the top 100 most highly
# expressed mouse genes and remove the 'HUMAN_' prefix from the human gene names
cbmc.rna <- CollapseSpeciesExpressionMatrix(cbmc.rna)

# Load in the ADT UMI matrix
cbmc.adt <- as.sparse(read.csv(file = "/Users/mkeller/Downloads/GSE100866_CBMC_8K_13AB_10X-ADT_umi.csv.gz", sep = ",",
                               header = TRUE, row.names = 1))

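# Since measurements were made in the same cells, the two matrices have identical
# column names
all.equal(colnames(cbmc.rna), colnames(cbmc.adt))

# Create a Seurat object based on the scRNA-seq data
cbmc <- CreateSeuratObject(counts = cbmc.rna)

# Create a new assay to store the ADT information and add it to the Seurat object
cbmc[["ADT"]] <- CreateAssayObject(counts = cbmc.adt)

# All operations below are performed on the RNA assay.
# Set and verify that the default assay is RNA
DefaultAssay(cbmc) <- "RNA"
DefaultAssay(cbmc)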

# perform visualization and clustering steps
cbmc <- NormalizeData(cbmc)
cbmc <- FindVariableFeatures(cbmc)
cbmc <- ScaleData(cbmc)
cbmc <- RunPCA(cbmc, verbose = FALSE)
cbmc <- FindNeighbors(cbmc, dims = 1:30)
cbmc <- FindClusters(cbmc, resolution = 0.8, verbose = FALSE)
cbmc <- RunUMAP(cbmc, dims = 1:30)
DimPlot(cbmc, label = TRUE)

# Normalize ADT data using the centered log-ratio (CLR) transformation,
# computed within each cell (margin = 2)
DefaultAssay(cbmc) <- "ADT"
cbmc <- NormalizeData(cbmc, normalization.method = "CLR", margin = 2)
DefaultAssay(cbmc) <- "RNA"

# Alternatively, the following single command gives the same result
# without switching the default assay
# cbmc <- NormalizeData(cbmc, normalization.method = "CLR", margin = 2, assay = "ADT")


vitessceR::seurat_to_anndata_zarr(cbmc, file.path("data", "multimodal_vignette.rna.h5ad.zarr"), assay = "RNA")
vitessceR::seurat_to_anndata_zarr(cbmc, file.path("data", "multimodal_vignette.adt.h5ad.zarr"), assay = "ADT")
```
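The CLR normalization above uses `margin = 2`, which means each cell is normalized separately across its ADT features. The pure-Python sketch below reproduces the formula that, as we understand it, Seurat's CLR method applies to each cell's vector `x`: `log1p(x / exp(sum(log1p(x[x > 0])) / length(x)))`. Treat it as an illustration rather than a drop-in replacement, and check the Seurat source for your installed version. The function names are hypothetical. Rows are cells here, following the AnnData orientation, whereas Seurat stores features × cells.

```python
import math
from typing import List


def clr_per_cell(counts: List[float]) -> List[float]:
    """Centered log-ratio transform of one cell's ADT counts (sketch of margin = 2).

    The scale factor is exp(sum of log1p over the positive counts / number of
    features), so zero counts contribute to the denominator length but not the sum.
    """
    log_sum = sum(math.log1p(x) for x in counts if x > 0)
    scale = math.exp(log_sum / len(counts))
    return [math.log1p(x / scale) for x in counts]


def clr_matrix_by_cell(matrix: List[List[float]]) -> List[List[float]]:
    """Apply clr_per_cell to every row, assuming rows are cells and columns are ADTs."""
    return [clr_per_cell(row) for row in matrix]


cell = [120.0, 4.0, 0.0]  # made-up counts for three tags in one cell
print([round(v, 3) for v in clr_per_cell(cell)])
```

Because the scale factor is computed per cell, a cell with higher overall tag counts is rescaled more strongly. Zero counts map to zero, and the ordering of a cell's tags is preserved.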

## 1.1. Check the converted data objects (optional)

We can open the converted AnnData objects in Python and check their contents to ensure the conversion was successful. The paths below are relative, so run this code from the same working directory that was used for the R conversion.

```python
from anndata import read_zarr
from os.path import join

rna_zarr = join("data", "multimodal_vignette.rna.h5ad.zarr")
adt_zarr = join("data", "multimodal_vignette.adt.h5ad.zarr")

rna_adata = read_zarr(rna_zarr)
rna_adata
```

```
AnnData object with n_obs × n_vars = 8617 × 2000
    obs: 'orig.ident', 'nCount_RNA', 'nFeature_RNA', 'nCount_ADT', 'nFeature_ADT', 'RNA_snn_res.0.8', 'seurat_clusters'
    var: 'vst.mean', 'vst.variance', 'vst.variance.expected', 'vst.variance.standardized', 'vst.variable'
    obsm: 'X_pca', 'X_umap'
    varm: 'PCs'
```

```python
adt_adata = read_zarr(adt_zarr)
adt_adata
```

```
AnnData object with n_obs × n_vars = 8617 × 13
    obs: 'orig.ident', 'nCount_RNA', 'nFeature_RNA', 'nCount_ADT', 'nFeature_ADT', 'RNA_snn_res.0.8', 'seurat_clusters'
    var: 'features'
    obsm: 'X_umap'
```
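Both objects were exported from the same Seurat object, so they should describe the same 8617 cells in the same order. The two summaries above suggest this but do not prove it. A small helper such as the hypothetical `first_obs_mismatch` below can confirm it. It is plain Python, not part of `anndata`, and accepts any two sequences of observation names, for example `list(rna_adata.obs_names)` and `list(adt_adata.obs_names)`.

```python
from typing import Optional, Sequence


def first_obs_mismatch(a: Sequence[str], b: Sequence[str]) -> Optional[int]:
    """Return the index of the first differing observation name, or None if identical.

    If one sequence is a prefix of the other, the index just past the shorter one
    is returned.
    """
    for i, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return i
    if len(a) != len(b):
        return min(len(a), len(b))
    return None


# Usage with the objects loaded above (assumes they are still in scope):
# assert first_obs_mismatch(list(rna_adata.obs_names), list(adt_adata.obs_names)) is None
```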

## 2. Configure visualization

Next, we configure the visualization using the `vitessce` Python package.

```python
from vitessce import (
    VitessceConfig,
    ViewType as vt,
    CoordinationType as ct,
    AnnDataWrapper,
)
```

### Instantiate a `VitessceConfig` object

We begin the configuration process by creating an object using the `VitessceConfig` constructor. This takes three parameters:
- `schema_version`: the schema version that the configuration should conform to. Valid values can be found at http://vitessce.io/docs/view-config-json/#version. This tutorial uses `1.0.15`, the most recent version as of 1/16/2023.
- `name`: a name for the configuration.
- `description` (optional): a brief description of the configuration.


```python
vc = VitessceConfig(schema_version="1.0.15", name='Multimodal seurat object', description='RNA+ADT')
```

### Add data

To add data to the configuration, we first run [add_dataset](https://vitessce.github.io/vitessce-python/api_config.html#vitessce.config.VitessceConfig.add_dataset) which takes the dataset `name` as a parameter.

This returns a new `VitessceConfigDataset` instance. We can add objects which represent local data such as AnnData stores by running [add_object](https://vitessce.github.io/vitessce-python/api_config.html#vitessce.config.VitessceConfigDataset.add_object) on the dataset instance. To enable multiple `add_object` calls to be chained together, the `add_object` function also returns the `VitessceConfigDataset` instance.

We will store the `VitessceConfigDataset` instance in a variable (`dataset`) to use later when configuring views.

__Note__: the functions used in the following cells (`.add_dataset`, `.add_object`, and `.add_view`) have side effects, i.e., they modify the `vc` object. Running these cells more than once on the same `vc` instance may result in an unexpected configuration, so be sure to run the cells in order, starting from the initial `vc = VitessceConfig(...)` cell.
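The chaining and side-effect behavior described above can be illustrated with a toy builder in plain Python. `ToyConfig` and `ToyDataset` are hypothetical stand-ins, not the vitessce API. Like `add_object`, `ToyDataset.add_object` returns the dataset itself, so calls can be chained. Like `add_dataset`, `ToyConfig.add_dataset` mutates the config, so running the same cell twice leaves two datasets behind.

```python
class ToyDataset:
    def __init__(self, name):
        self.name = name
        self.objects = []

    def add_object(self, obj):
        # Mutate this dataset, then return it so another add_object can be chained.
        self.objects.append(obj)
        return self


class ToyConfig:
    def __init__(self):
        self.datasets = []

    def add_dataset(self, name):
        # Side effect: the config keeps every dataset ever added to it.
        ds = ToyDataset(name)
        self.datasets.append(ds)
        return ds


vc_toy = ToyConfig()
dataset_toy = vc_toy.add_dataset("RNA+ADT").add_object("rna").add_object("adt")
print(len(vc_toy.datasets), dataset_toy.objects)  # 1 ['rna', 'adt']

# Re-running the same "cell" without recreating vc_toy adds a second dataset.
vc_toy.add_dataset("RNA+ADT").add_object("rna").add_object("adt")
print(len(vc_toy.datasets))  # 2
```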

```python
dataset = vc.add_dataset(name='RNA+ADT').add_object(AnnDataWrapper(
    # We run add_object with adata_path=rna_zarr first to add the cell-by-gene matrix and associated metadata.
    adata_path=rna_zarr,
    obs_embedding_paths=["obsm/X_umap", "obsm/X_pca"],
    obs_embedding_names=["UMAP", "PCA"],
    obs_set_paths=["obs/seurat_clusters"],
    obs_set_names=["Seurat Clusters"],
    obs_feature_matrix_path="X",
    # To be explicit that the features represent genes and gene expression, we specify that here.
    coordination_values={
        "featureType": "gene",
        "featureValueType": "expression"
    }
)).add_object(AnnDataWrapper(
    # We next run add_object with adata_path=adt_zarr to add the cell-by-ADT matrix and associated metadata.
    adata_path=adt_zarr,
    obs_embedding_paths=["obsm/X_umap"],
    obs_embedding_names=["UMAP"],
    obs_set_paths=["obs/seurat_clusters"],
    obs_set_names=["Seurat Clusters"],
    obs_feature_matrix_path="X",
    # If the features do not represent genes and gene expression, we specify alternate values here.
    coordination_values={
        "featureType": "tag",
        "featureValueType": "count"
    }
))
```

### Add views

Next, we configure the visualization and controller views of interest. Based on the data available in the dataset, we might want to add the following view types:
- two UMAP `SCATTERPLOT` views (one for genes and one for ADTs)
- two lists for selecting features (one for genes and one for ADTs), using the `FEATURE_LIST` view type
- two `HEATMAP` views, to visualize the two cell-by-feature matrices

```python
umap_scatterplot_by_rna = vc.add_view(vt.SCATTERPLOT, dataset=dataset, mapping="UMAP")
umap_scatterplot_by_adt = vc.add_view(vt.SCATTERPLOT, dataset=dataset, mapping="UMAP")

gene_list = vc.add_view(vt.FEATURE_LIST, dataset=dataset)
protein_list = vc.add_view(vt.FEATURE_LIST, dataset=dataset)

rna_heatmap = vc.add_view(vt.HEATMAP, dataset=dataset).set_props(transpose=True)
adt_heatmap = vc.add_view(vt.HEATMAP, dataset=dataset).set_props(transpose=True)
```

### Coordinate views

Views can be linked on different properties by using the [link_views](https://vitessce.github.io/vitessce-python/api_config.html#vitessce.config.VitessceConfig.link_views) function.

```python
# We need to specify which of the two features (i.e., genes or tags) the different plots correspond to.
# We also need to make sure the selection of genes and tags are scoped to only the corresponding plots,
# and we want to make sure the color mappings are independent for each modality.
coordination_types = [ct.FEATURE_TYPE, ct.FEATURE_VALUE_TYPE, ct.FEATURE_SELECTION, ct.OBS_COLOR_ENCODING, ct.FEATURE_VALUE_COLORMAP_RANGE]
vc.link_views([umap_scatterplot_by_rna, gene_list, rna_heatmap], coordination_types, ["gene", "expression", None, 'cellSetSelection', [0.0, 0.3]])
vc.link_views([umap_scatterplot_by_adt, protein_list, adt_heatmap], coordination_types, ["tag", "count", None, 'cellSetSelection', [0.0, 1.0]])

# We can link the two scatterplots on their zoom level and (X,Y) center point so that zooming/panning is coordinated.
vc.link_views([umap_scatterplot_by_rna, umap_scatterplot_by_adt], [ct.EMBEDDING_ZOOM, ct.EMBEDDING_TARGET_X, ct.EMBEDDING_TARGET_Y], [3, 0, 0])
```
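Conceptually, `link_views` places the listed views in a shared coordination scope for each coordination type and stores one value per scope. Views in the same group therefore read and write the same value, while the RNA and ADT groups stay independent. The toy model below is a hypothetical plain-Python illustration of that idea. It is not how vitessce-python implements `link_views`, and the scope names it generates are made up.

```python
from itertools import count


class ToyCoordination:
    """Toy coordination model: type -> scope -> value, and view -> type -> scope."""

    def __init__(self):
        self.space = {}        # {coordination type: {scope name: value}}
        self.view_scopes = {}  # {view name: {coordination type: scope name}}
        self._ids = count()

    def link_views(self, views, c_types, c_values):
        # One new scope per coordination type, shared by every listed view.
        for c_type, value in zip(c_types, c_values):
            scope = f"scope{next(self._ids)}"  # hypothetical scope naming
            self.space.setdefault(c_type, {})[scope] = value
            for view in views:
                self.view_scopes.setdefault(view, {})[c_type] = scope

    def get_value(self, view, c_type):
        return self.space[c_type][self.view_scopes[view][c_type]]

    def set_value(self, view, c_type, value):
        # Writing through one view updates the value for all views sharing the scope.
        self.space[c_type][self.view_scopes[view][c_type]] = value


coord = ToyCoordination()
demo_types = ["featureType", "featureSelection"]
coord.link_views(["rna_scatter", "gene_list", "rna_heatmap"], demo_types, ["gene", None])
coord.link_views(["adt_scatter", "protein_list", "adt_heatmap"], demo_types, ["tag", None])

coord.set_value("gene_list", "featureSelection", ["CD3E"])
print(coord.get_value("rna_heatmap", "featureSelection"))  # ['CD3E']
print(coord.get_value("adt_heatmap", "featureSelection"))  # None
```

Selecting a gene in the RNA feature list reaches the RNA heatmap but not the ADT heatmap, which is the scoping that the comments in the cell above ask for.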


```python
# We define a layout for the plots using two rows.
# In the first row, we add the three gene-related visualizations,
# and in the second row, we add the three ADT-related visualizations.
vc.layout(
    (rna_heatmap | (umap_scatterplot_by_rna | gene_list))
    / (adt_heatmap | (umap_scatterplot_by_adt | protein_list))
);
```
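In `vc.layout`, `|` places views side by side and `/` stacks them vertically. In Python, `/` binds more tightly than `|`, which is why each row is parenthesized. The hypothetical sketch below mimics this operator style in plain Python to show how the nested expression turns into grid positions. It assumes a 12 × 12 grid in which every operator splits the available space equally between its two operands. This is a simplification for illustration, not a description of how vitessce-python computes layouts.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


class Composable:
    # `a | b` puts a and b side by side; `a / b` stacks a above b.
    def __or__(self, other):
        return ToySplit("h", self, other)

    def __truediv__(self, other):
        return ToySplit("v", self, other)


@dataclass
class ToyView(Composable):
    name: str


@dataclass
class ToySplit(Composable):
    direction: str  # "h" for |, "v" for /
    first: Composable
    second: Composable


Box = Tuple[float, float, float, float]  # (x, y, w, h)


def place(node: Composable, x: float = 0, y: float = 0, w: float = 12, h: float = 12,
          out: Optional[Dict[str, Box]] = None) -> Dict[str, Box]:
    """Assign each ToyView a grid box, halving the region at every split."""
    if out is None:
        out = {}
    if isinstance(node, ToyView):
        out[node.name] = (x, y, w, h)
    elif node.direction == "h":
        place(node.first, x, y, w / 2, h, out)
        place(node.second, x + w / 2, y, w / 2, h, out)
    else:
        place(node.first, x, y, w, h / 2, out)
        place(node.second, x, y + h / 2, w, h / 2, out)
    return out


v = {n: ToyView(n) for n in ["rna_heatmap", "umap_rna", "gene_list",
                             "adt_heatmap", "umap_adt", "protein_list"]}
positions = place(
    (v["rna_heatmap"] | (v["umap_rna"] | v["gene_list"]))
    / (v["adt_heatmap"] | (v["umap_adt"] | v["protein_list"]))
)
print(positions["rna_heatmap"], positions["gene_list"])  # (0, 0, 6.0, 6.0) (9.0, 0, 3.0, 6.0)
```

Under this equal-split assumption, each heatmap takes half of its row, and the scatterplot and feature list share the other half.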

### Render the widget into the notebook

To render the interactive visualization into the notebook, we run the [widget](https://vitessce.github.io/vitessce-python/api_config.html#vitessce.config.VitessceConfig.widget) function. 

```python
vw = vc.widget()
vw
```
