## Imports

In [None]:
import scanpy as sc
import anndata
from anndata import AnnData
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import gc
import scvi
import shutil
from utils import *
import mplscience

# Restore matplotlib's original rcParams so seaborn styling does not leak into the plots
sns.reset_orig()

## Align libraries
👉 align_libraries.py

## Process libraries
👉 process_libraries.py

## Integrate libraries
👉 integrate_libraries.py

## Visualization & Analysis

The cells below show a few things you can do with your integrated data.

In [None]:
# TODO: Provide the full path to the integrated data
adata = anndata.read_h5ad("your_absolute_path_to_the_integrated_data.h5ad")

# put total counts on log scale for visualization
adata.obs["log_total_count"] = np.log1p(adata.obs["total_counts"])

In [None]:
adata

We can start by visualizing the scVI-generated UMAP next to the PCA-generated one, which is not batch corrected.  
If integration went well, cells from different libraries should be well mixed in the scVI UMAP, while the PCA UMAP may still separate them by library.

In [None]:
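with mplscience.style_context():
    # sc.pl.umap always reads adata.obsm["X_umap"], so point it at each embedding in turn.
    adata.obsm["X_umap"] = adata.obsm["X_umap_pca"]  # PCA-based UMAP (not batch corrected)
    sc.pl.umap(adata, color=["library_id", "log_total_count"], wspace=0.5)
    # Leave X_umap set to the scVI embedding so later sc.pl.umap calls use it.
    adata.obsm["X_umap"] = adata.obsm["X_umap_scvi"]  # scVI-based UMAP (batch corrected)
    sc.pl.umap(adata, color=["library_id", "log_total_count"], wspace=0.5)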

In [None]:
# Visualize Leiden clusters computed on the scVI latent space
sc.pl.umap(adata, color="scvi_leiden")

Next, you might want to run differential expression (with scvi-tools or scanpy) to find marker genes for each scVI Leiden cluster in your dataset.  
- scvi DE tutorial: https://docs.scvi-tools.org/en/stable/tutorials/notebooks/api_overview.html#Differential-expression
- scanpy DE tutorial: https://scanpy-tutorials.readthedocs.io/en/latest/pbmc3k.html#Finding-marker-genes