# Setup

# MissionBio Tapestri Single-Cell Analysis with EspressoPro

This notebook demonstrates a comprehensive single-cell protein analysis pipeline using MissionBio Tapestri data. The analysis leverages EspressoPro for automated cell type annotation and includes advanced refinement techniques.

## Key Features:
- **Data**: Multi-sample PBMC dataset (HD01 and HD02 samples)
- **Analysis**: Complete protein-based single-cell characterization
- **Methods**: UMAP dimensionality reduction, graph-based clustering, automated annotation
- **Refinement**: Small cluster handling, mixed cluster resolution, cluster-based improvements, mast and/or custom cell type detection

## Workflow Overview:
1. Data loading and setup
2. Sample-wise analysis (normalization, dimensionality reduction, clustering)
3. EspressoPro automated cell type prediction
4. Multi-step annotation refinement
5. Final visualization and validation

## Loading modules

In [None]:
# Standard library
import copy
import os
import random
import sys

# Third-party
import anndata as ad
import espressopro as ep
import matplotlib.pyplot as plt
import missionbio.mosaic as ms
import numpy as np
import pandas as pd
import plotly.graph_objects as go
import scanpy as sc
from anndata import AnnData
from matplotlib import rc_context

In [None]:
import warnings

# Silence pandas PerformanceWarning messages for the whole session
warnings.filterwarnings("ignore", category=pd.errors.PerformanceWarning)

## Setting seed

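`PYTHONHASHSEED` is read when the Python interpreter starts, so it has to be set before the kernel is launched. For a conda environment, run `conda env config vars set PYTHONHASHSEED=0` in a shell and then re-activate the environment.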

In [None]:
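# Setting PYTHONHASHSEED here only affects child processes; the running
# interpreter fixed its hash seed at start-up.
os.environ['PYTHONHASHSEED'] = '0'
# Seed Python's and NumPy's global random number generators
random.seed(42)
np.random.seed(42)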

In [None]:
import sys

def ensure_pythonhashseed(seed=0):
    """Re-launch the current process with PYTHONHASHSEED set to `seed` if it is not already."""
    current_seed = os.environ.get("PYTHONHASHSEED")

    seed = str(seed)
    if current_seed != seed:
        print(f'Setting PYTHONHASHSEED="{seed}"')
        os.environ["PYTHONHASHSEED"] = seed
        # Replace the current process with a fresh interpreter that picks up the new seed
        os.execl(sys.executable, sys.executable, *sys.argv)

In [None]:
# Sanity check: after random.seed(42) above, this draw should be the same on every fresh run
random_draw = random.getrandbits(128)

print("random draw: %032x" % random_draw)
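The two kinds of reproducibility discussed above can be checked separately. The sketch below uses only the standard library: reseeding `random` should reproduce the same draw, and a child interpreter started with `PYTHONHASHSEED=0` should return the same string hash on every launch. The helper names `seeded_draw` and `string_hash_in_subprocess` are illustrative and not part of any library used in this notebook.

```python
import os
import random
import subprocess
import sys


def seeded_draw(seed=42):
    """Return the first 128-bit draw after reseeding Python's `random` module."""
    random.seed(seed)
    return random.getrandbits(128)


def string_hash_in_subprocess(hash_seed="0"):
    """Start a fresh interpreter with PYTHONHASHSEED set and return hash('abc')."""
    env = dict(os.environ, PYTHONHASHSEED=hash_seed)
    result = subprocess.run(
        [sys.executable, "-c", "print(hash('abc'))"],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return int(result.stdout.strip())


# Reseeding gives the same draw within one process.
same_draw = seeded_draw(42) == seeded_draw(42)
# Two separate interpreters with the same hash seed agree on string hashes.
same_hash = string_hash_in_subprocess("0") == string_hash_in_subprocess("0")

print("random draw reproducible:", same_draw)
print("string hash reproducible:", same_hash)
```

The child process is needed because the hash seed of the interpreter that is already running cannot be changed.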

# Loading data

Loading the MissionBio example multi-sample PBMC dataset. This dataset contains protein expression measurements from multiple healthy donor samples (HD01, HD02) measured using the Tapestri platform.

In [None]:
PBMC_samples = ms.load_example_dataset(path="Multisample PBMC", single=False)

# PBMC - HD01 Analysis

In [None]:
PBMC_HD01 = PBMC_samples.samples[0]

In [None]:
PBMC_HD01

## Remove non-variable ADTs

In [None]:
PBMC_HD01.protein = PBMC_HD01.protein.drop(['IgG1', 'IgG2a', 'IgG2b'])

## Normalisation

Performing two essential preprocessing steps:
1. **Normalization**: Corrects for technical variations and library size differences between cells using ADT CLR normalisation with Seurat flavour (akin to Muon ADT normalisation)
2. **Scaling**: Standardizes protein expression values to enable proper dimensionality reduction and downstream analysis

These steps ensure that subsequent dimensionality reduction and clustering are not biased by technical artifacts.

In [None]:
ep.Normalise_protein_data(PBMC_HD01, inplace=True, axis=1, flavor="seurat")
ep.Scale_protein_data(PBMC_HD01, inplace=True)
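For intuition, the two steps can be sketched in plain NumPy. This is a minimal sketch, not the EspressoPro implementation: it assumes the Seurat-style CLR formula `log1p(x / exp(mean(log1p(x))))` applied per cell (row-wise), followed by a per-protein z-score. The function names `clr_seurat` and `zscore_columns` are hypothetical, and the axis convention of `ep.Normalise_protein_data` is not verified here.

```python
import numpy as np


def clr_seurat(counts):
    """Seurat-style CLR per row: log1p(x / exp(mean(log1p(x))))."""
    counts = np.asarray(counts, dtype=float)
    # log1p(0) == 0, so zero counts contribute nothing to the sum.
    log_mean = np.log1p(counts).sum(axis=1, keepdims=True) / counts.shape[1]
    return np.log1p(counts / np.exp(log_mean))


def zscore_columns(values):
    """Centre each column and divide by its population standard deviation."""
    values = np.asarray(values, dtype=float)
    std = values.std(axis=0)
    std[std == 0] = 1.0  # leave constant columns at zero instead of dividing by 0
    return (values - values.mean(axis=0)) / std


# Toy matrix: 3 cells (rows) x 3 proteins (columns)
toy_counts = np.array([[0, 0, 0], [1, 10, 100], [5, 5, 5]])
toy_normalised = clr_seurat(toy_counts)
toy_scaled = zscore_columns(toy_normalised)
print(np.round(toy_normalised, 3))
print(np.round(toy_scaled, 3))
```

A cell with no counts stays at zero after CLR, and each protein column has mean zero after scaling.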

## Dimensionality reduction

In [None]:
PBMC_HD01.protein.run_pca(attribute='Scaled_reads', components=45, show_plot=True, random_state=42, svd_solver='randomized')

We first compute 45 components to inspect the variance-explained plot, then keep only the leading components up to the point where the curve plateaus and additional components yield diminishing returns. Here that is 8 components.

In [None]:
PBMC_HD01.protein.run_pca(attribute='Scaled_reads', components=8, show_plot=False, random_state=42, svd_solver='randomized')
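The plateau can also be located numerically rather than by eye. The sketch below is one possible heuristic on synthetic data, not what `run_pca` does internally: it computes the explained-variance ratio from a singular value decomposition and counts the components that each contribute at least a chosen share of the variance. The `min_gain=0.05` cut-off and both function names are assumptions made for illustration.

```python
import numpy as np


def explained_variance_ratio(scaled):
    """Fraction of total variance captured by each principal component."""
    centered = scaled - scaled.mean(axis=0)
    singular_values = np.linalg.svd(centered, compute_uv=False)
    variances = singular_values ** 2
    return variances / variances.sum()


def count_informative_components(ratio, min_gain=0.05):
    """Count components whose individual contribution is at least `min_gain`."""
    return int(np.sum(ratio >= min_gain))


rng = np.random.default_rng(42)
# Synthetic stand-in: 500 cells x 20 proteins, three strong axes plus unit noise.
data = rng.normal(size=(500, 20))
data[:, :3] *= np.array([10.0, 8.0, 6.0])

ratio = explained_variance_ratio(data)
n_keep = count_informative_components(ratio)
print("components to keep:", n_keep)
```

On real data the cut-off should be chosen by looking at the variance-explained plot, since the shape of the plateau differs between panels.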

UMAP (Uniform Manifold Approximation and Projection) embeds the cells in 2D for visualization while preserving local neighborhood structure. Here it is run on the 8 retained principal components (`attribute='pca'`).

**Parameters used:**
- `n_neighbors=50`: Larger neighborhood for global structure preservation
- `min_dist=0.1`: Allows for tight clustering of similar cells  
- `spread=8`: Broader distribution of points in embedding space
- `random_state=42`: Ensures reproducible results

In [None]:
PBMC_HD01.protein.run_umap(attribute='pca', random_state=42, n_neighbors=50, min_dist=0.1, spread=8, n_components=2)

## Unsupervised clustering

In [None]:
PBMC_HD01.protein.cluster(attribute='umap', method='graph-community', k=5, random_state=42) 

In [None]:
fig = PBMC_HD01.protein.scatterplot(attribute='umap',colorby='label')
go.Figure(fig)

In [None]:
PBMC_HD01.protein.row_attrs['Clusters'] = copy.copy(PBMC_HD01.protein.row_attrs['label'])

## EspressoPro predictions and annotation

EspressoPro uses machine learning models trained on extensive reference datasets to automatically predict cell types based on protein expression profiles. 

**Process:**
1. `generate_predictions()`: Creates probability scores for each potential cell type
2. `annotate_data()`: Assigns final cell type labels based on highest confidence predictions

**Output annotations** (stored with an `Averaged.` prefix, e.g. `Averaged.Simplified.Celltype`):
- `Simplified.Celltype`: Broad cell type categories (e.g., T cells, B cells, Monocytes)
- `Detailed.Celltype`: Specific cell subtypes (e.g., CD4+ T cells, CD8+ T cells, NK cells)

In [None]:
# Predict per-cell-type scores, then assign cell type labels
PBMC_HD01 = ep.generate_predictions(obj=PBMC_HD01)
PBMC_HD01 = ep.annotate_data(obj=PBMC_HD01)
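The idea of turning per-class probabilities into labels can be illustrated in a few lines. This is a generic sketch with made-up numbers and hypothetical class names; it does not reproduce how EspressoPro builds or averages its predictions.

```python
import numpy as np

# Hypothetical class names and per-cell probabilities (rows = cells)
cell_types = np.array(["B", "CD4_T", "NK"])
probabilities = np.array([
    [0.80, 0.15, 0.05],
    [0.10, 0.70, 0.20],
    [0.30, 0.30, 0.40],
])

# Pick the class with the highest probability for each cell
best_index = probabilities.argmax(axis=1)
predicted_labels = cell_types[best_index]
confidence = probabilities.max(axis=1)

for label, score in zip(predicted_labels, confidence):
    print(f"{label}: {score:.2f}")
```

Keeping the winning probability next to the label makes low-confidence calls, such as the third cell here, easy to spot.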

In [None]:
fig = PBMC_HD01.protein.scatterplot(attribute='umap',colorby='Averaged.Simplified.Celltype')
go.Figure(fig)

In [None]:
fig = PBMC_HD01.protein.scatterplot(attribute='umap',colorby='Averaged.Detailed.Celltype')
go.Figure(fig)

## Mark small clusters as "Small"

Small-Cluster Flagging (Size-Based)

This step identifies **very small clusters** that are likely technical artifacts or rare, unreliable events and flags them as **Small**.

**Method:**
- For each cluster, count the number of cells
- If the cluster size is **below the minimum cell threshold** (e.g. < 3 cells),
  label **all cells in that cluster** as *Small*
- Otherwise, keep the original cell type labels
- Results are written to a new output column

**Why use this:**
- Reduces noise from spurious or low-support clusters  
- Prevents over-interpretation of rare events  
- Improves robustness and clarity of downstream analyses and visualizations  

In [None]:
PBMC_HD01 = ep.mark_small_clusters(
    PBMC_HD01,
    label_in="Averaged.Detailed.Celltype",
    label_out="Averaged.Detailed.Celltype.Small",
    cluster_col="Clusters",
    min_cells=3,
)
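The size-based rule described above can be sketched with the standard library. This is an illustration of the logic only, on toy lists; the function name `mark_small` is hypothetical and the real `ep.mark_small_clusters` operates on the sample object.

```python
from collections import Counter


def mark_small(labels, clusters, min_cells=3, small_label="Small"):
    """Relabel every cell in clusters with fewer than `min_cells` cells."""
    sizes = Counter(clusters)
    return [
        small_label if sizes[cluster] < min_cells else label
        for label, cluster in zip(labels, clusters)
    ]


# Toy input: cluster "0" has 3 cells, "1" has 2 cells, "2" has 1 cell
toy_clusters = ["0", "0", "0", "1", "1", "2"]
toy_labels = ["B", "B", "NK", "CD4_T", "CD4_T", "NK"]

flagged = mark_small(toy_labels, toy_clusters, min_cells=3)
print(flagged)
```

Only the clusters with fewer than three cells are relabelled; cluster "0" keeps its per-cell labels, including the single NK call.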

In [None]:
fig = PBMC_HD01.protein.scatterplot(attribute='umap',colorby='Averaged.Detailed.Celltype.Small')
go.Figure(fig)

In [None]:
PBMC_HD01.protein.signaturemap('Normalized_reads',
                           splitby='Averaged.Detailed.Celltype.Small')

## Mark clusters with mixed annotation as "mixed"

Cluster-Level Mixed-Label Detection (Frequency-Based)

This step identifies clusters that do **not** have a dominant cell type and flags them as **Mixed**.

**Method:**
- For each cluster, count cell type labels
- Compute the frequency of the most common label
- If this frequency is **below the threshold**, label **all cells in the cluster** as *Mixed*
- Otherwise, keep the original cell type labels
- Results are written to a new output column

**Why use this:**
- Explicitly flags heterogeneous clusters  
- Prevents over-interpreting mixed populations  
- Provides a clean checkpoint before downstream refinement or consensus steps

In [None]:
PBMC_HD01 = ep.mark_mixed_clusters(
    PBMC_HD01,
    label_in="Averaged.Detailed.Celltype.Small",
    label_out="Averaged.Detailed.Celltype.Mixed",
    cluster_col="Clusters",
    min_frequency_threshold=0.35,
)
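The frequency rule can be sketched in the same way. The function below is a hypothetical stand-in that works on plain lists; it assumes a cluster is flagged when the share of its most common label is strictly below the threshold.

```python
from collections import Counter, defaultdict


def mark_mixed(labels, clusters, min_frequency=0.35, mixed_label="Mixed"):
    """Relabel all cells of clusters whose top label is rarer than `min_frequency`."""
    per_cluster = defaultdict(Counter)
    for label, cluster in zip(labels, clusters):
        per_cluster[cluster][label] += 1

    mixed_clusters = set()
    for cluster, counts in per_cluster.items():
        top_count = counts.most_common(1)[0][1]
        if top_count / sum(counts.values()) < min_frequency:
            mixed_clusters.add(cluster)

    return [
        mixed_label if cluster in mixed_clusters else label
        for label, cluster in zip(labels, clusters)
    ]


# Cluster "0" is 75% B cells; cluster "1" has four different labels (25% each)
toy_clusters = ["0"] * 4 + ["1"] * 4
toy_labels = ["B", "B", "B", "NK", "B", "NK", "CD4_T", "CD8_T"]

marked = mark_mixed(toy_labels, toy_clusters, min_frequency=0.35)
print(marked)
```

Cluster "0" keeps its original per-cell labels because its top label is dominant, while every cell in cluster "1" becomes `Mixed`.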


In [None]:
fig = PBMC_HD01.protein.scatterplot(attribute='umap',colorby='Averaged.Detailed.Celltype.Mixed')
go.Figure(fig)

In [None]:
PBMC_HD01.protein.signaturemap('Normalized_reads',
                           splitby='Averaged.Detailed.Celltype.Mixed')

## Refine and expand annotation using unsupervised data

Cluster-Level Cell Type Assignment (Frequency-Based)

This step assigns a **single cell type per cluster** based on label frequency.

**Method:**
- For each cluster, count cell type labels
- If the most frequent label exceeds the dominance threshold, assign it to all cells
- Otherwise, label the entire cluster as **Mixed** (same as *mark_mixed_clusters*)
- Results are written to the output label column

**Why use this:**
- Enforces cluster consistency  
- Flags heterogeneous clusters clearly

In [None]:
PBMC_HD01, summary, pivot = ep.suggest_cluster_celltype_identity(
    sample=PBMC_HD01,
    dominance_threshold=0.35,
    label_in="Averaged.Detailed.Celltype.Mixed",
    cluster_col="Clusters",
    label_out="Averaged.Detailed.Celltype.Refined",
    rewrite=True,
    verbose=True
)
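In contrast to the mixed-cluster step, this step overwrites the labels of dominant clusters as well. A minimal sketch of that logic on toy lists follows; the function name `assign_cluster_identity` is hypothetical, and treating a frequency exactly at the threshold as dominant is an assumption, since the boundary behaviour of `ep.suggest_cluster_celltype_identity` is not documented here.

```python
from collections import Counter, defaultdict


def assign_cluster_identity(labels, clusters, dominance_threshold=0.35, mixed_label="Mixed"):
    """Give every cell in a cluster the cluster's dominant label, or `mixed_label`."""
    per_cluster = defaultdict(Counter)
    for label, cluster in zip(labels, clusters):
        per_cluster[cluster][label] += 1

    identity = {}
    for cluster, counts in per_cluster.items():
        top_label, top_count = counts.most_common(1)[0]
        fraction = top_count / sum(counts.values())
        identity[cluster] = top_label if fraction >= dominance_threshold else mixed_label

    return [identity[cluster] for cluster in clusters], identity


# Cluster "0" is 75% B cells; cluster "1" has four different labels (25% each)
toy_clusters = ["0"] * 4 + ["1"] * 4
toy_labels = ["B", "B", "B", "NK", "B", "NK", "CD4_T", "CD8_T"]

refined, cluster_identity = assign_cluster_identity(toy_labels, toy_clusters)
print(cluster_identity)
print(refined)
```

The single NK call inside cluster "0" is overwritten with the cluster's dominant label, which is the difference from the previous step.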

In [None]:
PBMC_HD01.protein.signaturemap('Normalized_reads',
                           splitby='Averaged.Detailed.Celltype.Refined')

## Annotate custom celltypes using known signatures

Signature-Based Cell Type Annotation (Mast / Custom)

This step identifies cells expressing a **specific protein marker signature** using pyUCell scores and probabilistic modeling.

**Method:**

* Compute a pyUCell signature score from positive (and optional negative) markers
* Model the score distribution with an **adaptive GMM**
* Select cells from the **right-hand (high-score) component** with high posterior probability
* Relabel only those cells as the target cell type
* Optionally plot the GMM fit for QC

**Why use this:**

* Reliably detects rare or subtle populations
* Avoids hard score thresholds
* Adapts to both bimodal and skewed (tail-heavy) distributions
* Provides interpretable QC plots


In [None]:
PBMC_HD01 = ep.add_mast_annotation(
    PBMC_HD01,
    layer="Normalized_reads",
    label_in='Averaged.Detailed.Celltype.Refined',
    label_out='Averaged.Detailed.Celltype.Final',
    plot_gmm=True,
    tail_q=0.80,
    plot_title="Mast signature score — adaptive GMM",
    verbose=True
)
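The selection of high-scoring cells can be illustrated with a plain two-component Gaussian mixture. This sketch swaps in a simple 1D EM fit written in NumPy; it is not the adaptive GMM used by EspressoPro, it does not compute pyUCell scores, and it ignores `tail_q`. The scores are synthetic, and the function names and the 0.9 posterior cut-off are assumptions.

```python
import numpy as np


def component_posteriors(x, weights, means, variances):
    """Posterior probability of each Gaussian component for every score."""
    density = (
        weights
        * np.exp(-0.5 * (x[:, None] - means) ** 2 / variances)
        / np.sqrt(2.0 * np.pi * variances)
    )
    return density / density.sum(axis=1, keepdims=True)


def fit_two_component_gmm(scores, n_iter=100):
    """Fit a 1D two-component Gaussian mixture with plain EM."""
    x = np.asarray(scores, dtype=float)
    # Start one component at each end of the score range.
    means = np.array([x.min(), x.max()])
    variances = np.full(2, x.var() + 1e-6)
    weights = np.array([0.5, 0.5])
    for _ in range(n_iter):
        resp = component_posteriors(x, weights, means, variances)  # E-step
        totals = resp.sum(axis=0)                                  # M-step
        weights = totals / x.size
        means = (resp * x[:, None]).sum(axis=0) / totals
        variances = (resp * (x[:, None] - means) ** 2).sum(axis=0) / totals + 1e-6
    return weights, means, variances


rng = np.random.default_rng(0)
# Synthetic signature scores: 950 background cells and 50 high-scoring cells.
scores = np.concatenate([rng.normal(0.1, 0.05, 950), rng.normal(0.8, 0.05, 50)])
labels = np.array(["Other"] * scores.size, dtype=object)

weights, means, variances = fit_two_component_gmm(scores)
high = int(np.argmax(means))  # right-hand (high-score) component
posterior_high = component_posteriors(scores, weights, means, variances)[:, high]
selected = posterior_high >= 0.9  # hypothetical posterior cut-off
labels[selected] = "Target"

print("cells relabelled:", int(selected.sum()))
```

Only cells that the high-score component claims with high posterior probability are relabelled; all other cells keep their incoming label.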

In [None]:
# -------- Build minimal AnnData --------
coords = np.asarray(PBMC_HD01.protein.row_attrs["umap"])
if coords.ndim != 2 or coords.shape[1] < 2:
    raise ValueError(f"row_attrs['umap'] must be (n_cells, >=2). Got {coords.shape}")
coords = coords[:, :2]

df_protein = PBMC_HD01.protein.get_attribute("Normalized_reads", constraint="row+col")

pbmc = AnnData(X=np.asarray(df_protein.values))
pbmc.obs_names = df_protein.index.astype(str)
pbmc.var_names = df_protein.columns.astype(str)
pbmc.obsm["X_umap"] = coords

# add score column
pbmc.obs["Mast_signature_score"] = np.asarray(PBMC_HD01.protein.row_attrs["Mast_signature_score"], dtype=float)

# -------- Plot (1x1) --------
with rc_context({"figure.figsize": (4.35, 4.35)}):
    fig, ax = plt.subplots(1, 1, figsize=(4.35, 4.35))

    sc.pl.umap(
        pbmc,
        color="Mast_signature_score",
        add_outline=True,
        cmap="magma",
        frameon=False,
        size=50,
        alpha=0.9,
        title="",
        ax=ax,
        show=False,
    )

    ax.set_xlabel("")
    ax.set_ylabel("")
    ax.set_xticks([])
    ax.set_yticks([])

    fig.tight_layout()

    plt.show()
    plt.close(fig)

In [None]:
PBMC_HD01 = ep.add_signature_annotation(
    PBMC_HD01,
    layer="Normalized_reads",
    label_in='Averaged.Detailed.Celltype.Final',
    label_out='Averaged.Detailed.Celltype.Final',
    positive_markers=["CD14", "CD33", "CD11b", "CD64"],
    negative_markers="",
    cell_type_label="CD14_Mono",
    verbose=True,
    plot_gmm=True,
    plot_title="CD14 Mono signature score — adaptive GMM",
    tail_q=0.80,
)

In [None]:
# -------- Build minimal AnnData --------
coords = np.asarray(PBMC_HD01.protein.row_attrs["umap"])
if coords.ndim != 2 or coords.shape[1] < 2:
    raise ValueError(f"row_attrs['umap'] must be (n_cells, >=2). Got {coords.shape}")
coords = coords[:, :2]

df_protein = PBMC_HD01.protein.get_attribute("Normalized_reads", constraint="row+col")

pbmc = AnnData(X=np.asarray(df_protein.values))
pbmc.obs_names = df_protein.index.astype(str)
pbmc.var_names = df_protein.columns.astype(str)
pbmc.obsm["X_umap"] = coords

# add score column
pbmc.obs["CD14_Mono_signature_score"] = np.asarray(PBMC_HD01.protein.row_attrs["CD14_Mono_signature_score"], dtype=float)

# -------- Plot (1x1) --------
with rc_context({"figure.figsize": (4.35, 4.35)}):
    fig, ax = plt.subplots(1, 1, figsize=(4.35, 4.35))

    sc.pl.umap(
        pbmc,
        color="CD14_Mono_signature_score",
        add_outline=True,
        cmap="magma",
        frameon=False,
        size=50,
        alpha=0.9,
        title="",
        ax=ax,
        show=False,
    )

    ax.set_xlabel("")
    ax.set_ylabel("")
    ax.set_xticks([])
    ax.set_yticks([])

    fig.tight_layout()

    plt.show()
    plt.close(fig)

## Final visualization of refined cell type annotations

This UMAP plot displays the final refined cell type annotations after all processing steps including:
- Initial EspressoPro predictions
- Flagging of small clusters (< 3 cells)
- Identification of mixed clusters
- Annotation refinement leveraging unsupervised clusters
- Addition of mast and custom cell type annotations

The `Averaged.Detailed.Celltype.Final` field holds the final cell type assignment for each cell after all refinement steps.

In [None]:
fig = PBMC_HD01.protein.scatterplot(attribute='umap',colorby='Averaged.Detailed.Celltype.Final')
go.Figure(fig)

In [None]:
PBMC_HD01.protein.signaturemap('Normalized_reads',
                           splitby='Averaged.Detailed.Celltype.Final')

In [None]:
PBMC_HD01 = ep.clear_annotation(PBMC_HD01)

In [None]:
PBMC_HD01.protein.row_attrs

# PBMC - HD02 Analysis

In [None]:
PBMC_HD02 = PBMC_samples.samples[1]

In [None]:
PBMC_HD02

## Normalisation

Performing two essential preprocessing steps:
1. **Normalization**: Corrects for technical variations and library size differences between cells using ADT CLR normalisation with Seurat flavour (akin to Muon ADT normalisation)
2. **Scaling**: Standardizes protein expression values to enable proper dimensionality reduction and downstream analysis

These steps ensure that subsequent dimensionality reduction and clustering are not biased by technical artifacts.

In [None]:
ep.Normalise_protein_data(PBMC_HD02, inplace=True, axis=1, flavor="seurat")
ep.Scale_protein_data(PBMC_HD02, inplace=True)

## Dimensionality reduction

In [None]:
PBMC_HD02.protein.run_pca(attribute='Scaled_reads', components=45, show_plot=True, random_state=42, svd_solver='randomized')

We first compute 45 components to inspect the variance-explained plot, then keep only the leading components up to the point where the curve plateaus and additional components yield diminishing returns. Here that is 8 components.

In [None]:
PBMC_HD02.protein.run_pca(attribute='Scaled_reads', components=8, show_plot=False, random_state=42, svd_solver='randomized')

UMAP (Uniform Manifold Approximation and Projection) embeds the cells in 2D for visualization while preserving local neighborhood structure. Here it is run on the 8 retained principal components (`attribute='pca'`).

**Parameters used:**
- `n_neighbors=50`: Larger neighborhood for global structure preservation
- `min_dist=0.1`: Allows for tight clustering of similar cells  
- `spread=8`: Broader distribution of points in embedding space
- `random_state=42`: Ensures reproducible results

In [None]:
PBMC_HD02.protein.run_umap(attribute='pca', random_state=42, n_neighbors=50, min_dist=0.1, spread=8, n_components=2)

## Unsupervised clustering

In [None]:
PBMC_HD02.protein.cluster(attribute='umap', method='graph-community', k=5, random_state=42) 

In [None]:
fig = PBMC_HD02.protein.scatterplot(attribute='umap',colorby='label')
go.Figure(fig)

In [None]:
PBMC_HD02.protein.row_attrs['Clusters'] = copy.copy(PBMC_HD02.protein.row_attrs['label'])

## EspressoPro predictions and annotation

EspressoPro uses machine learning models trained on extensive reference datasets to automatically predict cell types based on protein expression profiles. 

**Process:**
1. `generate_predictions()`: Creates probability scores for each potential cell type
2. `annotate_data()`: Assigns final cell type labels based on highest confidence predictions

**Output annotations** (stored with an `Averaged.` prefix, e.g. `Averaged.Simplified.Celltype`):
- `Simplified.Celltype`: Broad cell type categories (e.g., T cells, B cells, Monocytes)
- `Detailed.Celltype`: Specific cell subtypes (e.g., CD4+ T cells, CD8+ T cells, NK cells)

In [None]:
# Predict per-cell-type scores, then assign cell type labels
PBMC_HD02 = ep.generate_predictions(obj=PBMC_HD02)
PBMC_HD02 = ep.annotate_data(obj=PBMC_HD02)

In [None]:
fig = PBMC_HD02.protein.scatterplot(attribute='umap',colorby='Averaged.Simplified.Celltype')
go.Figure(fig)

In [None]:
fig = PBMC_HD02.protein.scatterplot(attribute='umap',colorby='Averaged.Detailed.Celltype')
go.Figure(fig)

## Mark small clusters as "Small"

Small-Cluster Flagging (Size-Based)

This step identifies **very small clusters** that are likely technical artifacts or rare, unreliable events and flags them as **Small**.

**Method:**
- For each cluster, count the number of cells
- If the cluster size is **below the minimum cell threshold** (e.g. < 3 cells),
  label **all cells in that cluster** as *Small*
- Otherwise, keep the original cell type labels
- Results are written to a new output column

**Why use this:**
- Reduces noise from spurious or low-support clusters  
- Prevents over-interpretation of rare events  
- Improves robustness and clarity of downstream analyses and visualizations  

In [None]:
PBMC_HD02 = ep.mark_small_clusters(
    PBMC_HD02,
    label_in="Averaged.Detailed.Celltype",
    label_out="Averaged.Detailed.Celltype.Small",
    cluster_col="Clusters",
    min_cells=3,
)

In [None]:
fig = PBMC_HD02.protein.scatterplot(attribute='umap',colorby='Averaged.Detailed.Celltype.Small')
go.Figure(fig)

In [None]:
PBMC_HD02.protein.signaturemap('Normalized_reads',
                           splitby='Averaged.Detailed.Celltype.Small')

## Mark clusters with mixed annotation as "mixed"

Cluster-Level Mixed-Label Detection (Frequency-Based)

This step identifies clusters that do **not** have a dominant cell type and flags them as **Mixed**.

**Method:**
- For each cluster, count cell type labels
- Compute the frequency of the most common label
- If this frequency is **below the threshold**, label **all cells in the cluster** as *Mixed*
- Otherwise, keep the original cell type labels
- Results are written to a new output column

**Why use this:**
- Explicitly flags heterogeneous clusters  
- Prevents over-interpreting mixed populations  
- Provides a clean checkpoint before downstream refinement or consensus steps

In [None]:
PBMC_HD02 = ep.mark_mixed_clusters(
    PBMC_HD02,
    label_in="Averaged.Detailed.Celltype.Small",
    label_out="Averaged.Detailed.Celltype.Mixed",
    cluster_col="Clusters",
    min_frequency_threshold=0.35,
)


In [None]:
fig = PBMC_HD02.protein.scatterplot(attribute='umap',colorby='Averaged.Detailed.Celltype.Mixed')
go.Figure(fig)

In [None]:
PBMC_HD02.protein.signaturemap('Normalized_reads',
                           splitby='Averaged.Detailed.Celltype.Mixed')

## Refine and expand annotation using unsupervised data

Cluster-Level Cell Type Assignment (Frequency-Based)

This step assigns a **single cell type per cluster** based on label frequency.

**Method:**
- For each cluster, count cell type labels
- If the most frequent label exceeds the dominance threshold, assign it to all cells
- Otherwise, label the entire cluster as **Mixed** (same as *mark_mixed_clusters*)
- Results are written to the output label column

**Why use this:**
- Enforces cluster consistency  
- Flags heterogeneous clusters clearly

In [None]:
PBMC_HD02, summary, pivot = ep.suggest_cluster_celltype_identity(
    sample=PBMC_HD02,
    dominance_threshold=0.35,
    label_in="Averaged.Detailed.Celltype.Mixed",
    cluster_col="Clusters",
    label_out="Averaged.Detailed.Celltype.Refined",
    rewrite=True,
    verbose=True
)

In [None]:
PBMC_HD02.protein.signaturemap('Normalized_reads',
                           splitby='Averaged.Detailed.Celltype.Refined')

## Annotate custom celltypes using known signatures

Signature-Based Cell Type Annotation (Mast / Custom)

This step identifies cells expressing a **specific protein marker signature** using pyUCell scores and probabilistic modeling.

**Method:**

* Compute a pyUCell signature score from positive (and optional negative) markers
* Model the score distribution with an **adaptive GMM**
* Select cells from the **right-hand (high-score) component** with high posterior probability
* Relabel only those cells as the target cell type
* Optionally plot the GMM fit for QC

**Why use this:**

* Reliably detects rare or subtle populations
* Avoids hard score thresholds
* Adapts to both bimodal and skewed (tail-heavy) distributions
* Provides interpretable QC plots


In [None]:
PBMC_HD02 = ep.add_mast_annotation(
    PBMC_HD02,
    layer="Normalized_reads",
    label_in='Averaged.Detailed.Celltype.Refined',
    label_out='Averaged.Detailed.Celltype.Final',
    plot_gmm=True,
    tail_q=0.80,
    plot_title="Mast signature score — adaptive GMM",
    verbose=True
)

In [None]:
# -------- Build minimal AnnData --------
coords = np.asarray(PBMC_HD02.protein.row_attrs["umap"])
if coords.ndim != 2 or coords.shape[1] < 2:
    raise ValueError(f"row_attrs['umap'] must be (n_cells, >=2). Got {coords.shape}")
coords = coords[:, :2]

df_protein = PBMC_HD02.protein.get_attribute("Normalized_reads", constraint="row+col")

pbmc = AnnData(X=np.asarray(df_protein.values))
pbmc.obs_names = df_protein.index.astype(str)
pbmc.var_names = df_protein.columns.astype(str)
pbmc.obsm["X_umap"] = coords

# add score column
pbmc.obs["Mast_signature_score"] = np.asarray(PBMC_HD02.protein.row_attrs["Mast_signature_score"], dtype=float)

# -------- Plot (1x1) --------
with rc_context({"figure.figsize": (4.35, 4.35)}):
    fig, ax = plt.subplots(1, 1, figsize=(4.35, 4.35))

    sc.pl.umap(
        pbmc,
        color="Mast_signature_score",
        add_outline=True,
        cmap="magma",
        frameon=False,
        size=50,
        alpha=0.9,
        title="",
        ax=ax,
        show=False,
    )

    ax.set_xlabel("")
    ax.set_ylabel("")
    ax.set_xticks([])
    ax.set_yticks([])

    fig.tight_layout()

    plt.show()
    plt.close(fig)

In [None]:
PBMC_HD02 = ep.add_signature_annotation(
    PBMC_HD02,
    layer="Normalized_reads",
    label_in='Averaged.Detailed.Celltype.Final',
    label_out='Averaged.Detailed.Celltype.Final',
    positive_markers=["CD14", "CD33", "CD11b", "CD64"],
    negative_markers="",
    cell_type_label="CD14_Mono",
    verbose=True,
    plot_gmm=True,
    plot_title="CD14 Mono signature score — adaptive GMM",
    tail_q=0.80,
)

In [None]:
# -------- Build minimal AnnData --------
coords = np.asarray(PBMC_HD02.protein.row_attrs["umap"])
if coords.ndim != 2 or coords.shape[1] < 2:
    raise ValueError(f"row_attrs['umap'] must be (n_cells, >=2). Got {coords.shape}")
coords = coords[:, :2]

df_protein = PBMC_HD02.protein.get_attribute("Normalized_reads", constraint="row+col")

pbmc = AnnData(X=np.asarray(df_protein.values))
pbmc.obs_names = df_protein.index.astype(str)
pbmc.var_names = df_protein.columns.astype(str)
pbmc.obsm["X_umap"] = coords

# add score column
pbmc.obs["CD14_Mono_signature_score"] = np.asarray(PBMC_HD02.protein.row_attrs["CD14_Mono_signature_score"], dtype=float)

# -------- Plot (1x1) --------
with rc_context({"figure.figsize": (4.35, 4.35)}):
    fig, ax = plt.subplots(1, 1, figsize=(4.35, 4.35))

    sc.pl.umap(
        pbmc,
        color="CD14_Mono_signature_score",
        add_outline=True,
        cmap="magma",
        frameon=False,
        size=50,
        alpha=0.9,
        title="",
        ax=ax,
        show=False,
    )

    ax.set_xlabel("")
    ax.set_ylabel("")
    ax.set_xticks([])
    ax.set_yticks([])

    fig.tight_layout()

    plt.show()
    plt.close(fig)

## Final visualization of refined cell type annotations

This UMAP plot displays the final refined cell type annotations after all processing steps including:
- Initial EspressoPro predictions
- Flagging of small clusters (< 3 cells)
- Identification of mixed clusters
- Annotation refinement leveraging unsupervised clusters
- Addition of mast and custom cell type annotations

The `Averaged.Detailed.Celltype.Final` field holds the final cell type assignment for each cell after all refinement steps.

In [None]:
fig = PBMC_HD02.protein.scatterplot(attribute='umap',colorby='Averaged.Detailed.Celltype.Final')
go.Figure(fig)

In [None]:
PBMC_HD02.protein.signaturemap('Normalized_reads',
                           splitby='Averaged.Detailed.Celltype.Final')

In [None]:
PBMC_HD02 = ep.clear_annotation(PBMC_HD02)

In [None]:
PBMC_HD02.protein.row_attrs

# Analysis Summary

## Completed Analysis Pipeline

This notebook runs a single-cell protein analysis workflow on two PBMC samples (HD01 and HD02). Both samples follow the same processing steps; the only difference is that the IgG ADTs (`IgG1`, `IgG2a`, `IgG2b`) are dropped for HD01 only.

### Key Analysis Steps:
1. **Data preprocessing** - Normalization and scaling of protein expression
2. **Dimensionality reduction** - PCA followed by UMAP for visualization  
3. **Unsupervised clustering** - Graph-community detection for initial cell grouping
4. **Automated annotation** - EspressoPro machine learning predictions
5. **Quality control** - Detection of small clusters and mixed populations
6. **Refinement** - Cluster-based improvements
7. **Specialized detection** - Mast or custom cell type identification
8. **Validation** - Multiple visualization methods (UMAP, heatmaps, signature maps)

### Final Results:
- **High-quality cell type annotations** with multiple levels of detail
- **Robust cell populations** validated through multiple approaches
- **Comprehensive visualization** enabling biological interpretation
- **Reproducible workflow** suitable for similar datasets

### Applications:
This analysis framework can be applied to:
- Clinical sample characterization
- Treatment response studies  
- Cell atlas construction
- Biomarker discovery
- Quality control for single-cell experiments