Let us now consider hierarchical cell-type relationships. For this we use the Allen Brain datasets, which come with cell-type annotations at two (non-overlapping) levels of granularity.

In [1]:
import sys

sys.path.append("../src/")

import json
import logging
import scanpy as sc
import refcm
from refcm import RefCM

refcm.start_logging(logging.DEBUG)

In [2]:
mtg = sc.read_h5ad("../data/MTG.h5ad")
alm = sc.read_h5ad("../data/ALM.h5ad")
visp = sc.read_h5ad("../data/VISp.h5ad")

[h5py._conv      ] [DEBUG   ] : Creating converter from 3 to 5


Let us first retrieve the hierarchical relationships between cell types:

In [3]:
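# pair each coarse label ("labels3") with its fine label ("labels34")
labels = mtg.obs[["labels3", "labels34"]].set_index("labels3")
coarse_levels = labels.index.unique().to_list()

# for each coarse type, collect the distinct fine types observed under it
hierarchy = {
    level: labels.loc[level].drop_duplicates().values.flatten().tolist()
    for level in coarse_levels
}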

print(json.dumps(hierarchy, indent=4, sort_keys=True))

{
    "Excitatory": [
        "Exc L5/6 IT 3",
        "Exc L6 CT",
        "Exc L6b",
        "Exc L4/5 IT",
        "Exc L5/6 IT 2",
        "Exc L6 IT 1",
        "Exc L6 IT 2",
        "Exc L5/6 NP",
        "Exc L2/3 IT",
        "Exc L5/6 IT 1",
        "Exc L3/5 IT",
        "Exc L5 PT"
    ],
    "Inhibitory": [
        "Sst 1",
        "Lamp5 Rosehip",
        "Vip 5",
        "Pvalb 2",
        "Sst 3",
        "Pvalb 1",
        "Vip 3",
        "Pax6",
        "Vip 1",
        "Vip Sncg",
        "Vip 4",
        "Lamp5 2",
        "Lamp5 Lhx6",
        "Sst 4",
        "Chandelier",
        "Vip 2",
        "Sst 2",
        "Sst 5",
        "Sst Chodl",
        "Lamp5 1"
    ],
    "Non-neuronal": [
        "Astrocyte",
        "Oligo"
    ]
}
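
To compare mappings against this hierarchy it helps to have the reverse lookup as well, from each fine type to its coarse parent. The following is a minimal sketch that uses only an excerpt of the printed hierarchy; the names `hierarchy_excerpt` and `invert_hierarchy` are illustrative and not part of RefCM.

```python
# Excerpt of the hierarchy printed above (a few entries per coarse type).
hierarchy_excerpt = {
    "Excitatory": ["Exc L6 CT", "Exc L6b", "Exc L2/3 IT"],
    "Inhibitory": ["Sst 1", "Pvalb 2", "Chandelier"],
    "Non-neuronal": ["Astrocyte", "Oligo"],
}


def invert_hierarchy(hierarchy):
    """Return a fine -> coarse lookup; raise if a fine type has two parents."""
    fine_to_coarse = {}
    for coarse, fine_labels in hierarchy.items():
        for fine in fine_labels:
            previous = fine_to_coarse.get(fine)
            if previous is not None and previous != coarse:
                raise ValueError(f"{fine!r} is listed under {previous!r} and {coarse!r}")
            fine_to_coarse[fine] = coarse
    return fine_to_coarse


fine_to_coarse = invert_hierarchy(hierarchy_excerpt)
print(fine_to_coarse["Chandelier"])
```

The `ValueError` branch doubles as a check that the two annotation levels really are nested, that is, that no fine type appears under more than one coarse type.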


We can now map across these levels and across datasets, here with VISp as the query and ALM as the reference, and evaluate the result as follows. Let us first consider mapping from the granular to the coarse resolution.

In [4]:
# ensure we allow the query clusters to "merge" without restriction
rcm = RefCM(max_merges=-1)
rcm.setref(alm, "ALM", "labels3")
m = rcm.annotate(visp, "VISp", "labels34")

[refcm.refcm     ] [INFO    ] : NOTE: raw counts expected in anndata .X attributes.
[refcm.refcm     ] [DEBUG   ] : Loading cached mapping costs from /Users/valerio/Library/Caches/refcm/cache.json.
[refcm.embeddings] [DEBUG   ] : Using 1503 genes.
[refcm.refcm     ] [DEBUG   ] : Computing Wasserstein distances.
|████████████████| [100.00% ] : 00:26
[refcm.refcm     ] [DEBUG   ] : Saving mapping costs to /Users/valerio/Library/Caches/refcm/cache.json.
[refcm.refcm     ] [DEBUG   ] : starting LP optimization
[refcm.refcm     ] [DEBUG   ] : optimization terminated w. status "Optimal"


In [5]:
m.set_type_equality_strictness(0.7)
m.eval("labels34")
m.display_matching_costs(ground_truth_obs_key="labels34")

[refcm.matchings ] [DEBUG   ] : [+|0.70] Astrocyte            mapped to Non-neuronal
[refcm.matchings ] [DEBUG   ] : [+|0.70] Chandelier           mapped to Inhibitory
[refcm.matchings ] [DEBUG   ] : [+|0.70] Exc L2/3 IT          mapped to Excitatory
[refcm.matchings ] [DEBUG   ] : [+|0.70] Exc L3/5 IT          mapped to Excitatory
[refcm.matchings ] [DEBUG   ] : [+|0.70] Exc L4/5 IT          mapped to Excitatory
[refcm.matchings ] [DEBUG   ] : [+|0.70] Exc L5 PT            mapped to Excitatory
[refcm.matchings ] [DEBUG   ] : [+|0.70] Exc L5/6 IT 1        mapped to Excitatory
[refcm.matchings ] [DEBUG   ] : [+|0.70] Exc L5/6 IT 2        mapped to Excitatory
[refcm.matchings ] [DEBUG   ] : [+|0.70] Exc L5/6 IT 3        mapped to Excitatory
[refcm.matchings ] [DEBUG   ] : [+|0.70] Exc L5/6 NP          mapped to E

Comparing with the previously established hierarchy, every granular cell type was indeed mapped to its correct coarse cell type!
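
This comparison can also be done programmatically rather than by eye. The sketch below assumes the fine-to-coarse mapping has been written out as a plain dictionary (here filled in by hand from the first few log lines above); `predicted` and `find_mismatches` are hypothetical names, and nothing here relies on RefCM's own matching object.

```python
# Excerpt of the hierarchy established earlier (coarse type -> fine types).
hierarchy_excerpt = {
    "Excitatory": ["Exc L2/3 IT", "Exc L3/5 IT"],
    "Inhibitory": ["Chandelier"],
    "Non-neuronal": ["Astrocyte"],
}

# Hypothetical plain-dict form of the mapping, copied by hand from the log.
predicted = {
    "Astrocyte": "Non-neuronal",
    "Chandelier": "Inhibitory",
    "Exc L2/3 IT": "Excitatory",
    "Exc L3/5 IT": "Excitatory",
}


def find_mismatches(predicted, hierarchy):
    """List (fine, coarse) pairs where coarse is not the parent of fine."""
    return [
        (fine, coarse)
        for fine, coarse in predicted.items()
        if fine not in hierarchy.get(coarse, [])
    ]


mismatches = find_mismatches(predicted, hierarchy_excerpt)
print(mismatches)  # an empty list means every pair agrees with the hierarchy
```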

Conversely, let us now map from coarse to granular annotations:

In [6]:
# ensure we allow the query clusters to "split" without restriction
rcm = RefCM(max_splits=-1)
rcm.setref(alm, "ALM", "labels34")
m = rcm.annotate(visp, "VISp", "labels3")

[refcm.refcm     ] [INFO    ] : NOTE: raw counts expected in anndata .X attributes.
[refcm.refcm     ] [DEBUG   ] : Loading cached mapping costs from /Users/valerio/Library/Caches/refcm/cache.json.
[refcm.embeddings] [DEBUG   ] : Using 1503 genes.
[refcm.refcm     ] [DEBUG   ] : Computing Wasserstein distances.
|████████████████| [100.00% ] : 00:24
[refcm.refcm     ] [DEBUG   ] : Saving mapping costs to /Users/valerio/Library/Caches/refcm/cache.json.
[refcm.refcm     ] [DEBUG   ] : starting LP optimization
[refcm.refcm     ] [DEBUG   ] : optimization terminated w. status "Optimal"


In [7]:
m.set_type_equality_strictness(0.7)
m.eval("labels3")
fig = m.display_matching_costs(
    ground_truth_obs_key="labels3",
    show_all_labels=True,
    angle_x_labels=True,
    width=1000,
    height=400,
)
fig.show()
# fig.write_image("trees/refcm_brain.png", scale=3)

[refcm.matchings ] [DEBUG   ] : [+|0.70] Excitatory           mapped to Exc L2/3 IT
[refcm.matchings ] [DEBUG   ] : [+|0.70] Excitatory           mapped to Exc L3/5 IT
[refcm.matchings ] [DEBUG   ] : [+|0.70] Excitatory           mapped to Exc L4/5 IT
[refcm.matchings ] [DEBUG   ] : [+|0.70] Excitatory           mapped to Exc L5 PT
[refcm.matchings ] [DEBUG   ] : [+|0.70] Excitatory           mapped to Exc L5/6 IT 1
[refcm.matchings ] [DEBUG   ] : [+|0.70] Excitatory           mapped to Exc L5/6 IT 2
[refcm.matchings ] [DEBUG   ] : [+|0.70] Excitatory           mapped to Exc L5/6 IT 3
[refcm.matchings ] [DEBUG   ] : [+|0.70] Excitatory           mapped to Exc L5/6 NP
[refcm.matchings ] [DEBUG   ] : [+|0.70] Excitatory           mapped to Exc L6 CT
[refcm.matchings ] [DEBUG   ] : [+|0.70] Excitatory           mapped to E

Comparing this graph with the previous one and with the established hierarchy, we conclude that the coarse-to-granular mapping also recovers the correct links!
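
The same check can be sketched for the coarse-to-granular direction, where one coarse query type is linked to several fine reference types. As before, the mapping is assumed to have been copied into a plain dictionary by hand from the log; `predicted_splits` and `find_invalid_splits` are illustrative names, not RefCM functions, and the final call uses a deliberately wrong, made-up link to show what a failure would look like.

```python
# Excerpt of the hierarchy established earlier (coarse type -> fine types).
hierarchy_excerpt = {
    "Excitatory": ["Exc L2/3 IT", "Exc L3/5 IT", "Exc L4/5 IT", "Exc L5 PT"],
    "Inhibitory": ["Chandelier", "Sst 1"],
}

# Hypothetical plain-dict form of the coarse -> fine links from the log.
predicted_splits = {
    "Excitatory": ["Exc L2/3 IT", "Exc L3/5 IT", "Exc L4/5 IT", "Exc L5 PT"],
}


def find_invalid_splits(predicted_splits, hierarchy):
    """List (coarse, fine) links where fine is not a child of coarse."""
    invalid = []
    for coarse, fine_labels in predicted_splits.items():
        allowed = set(hierarchy.get(coarse, []))
        invalid.extend((coarse, fine) for fine in fine_labels if fine not in allowed)
    return invalid


invalid = find_invalid_splits(predicted_splits, hierarchy_excerpt)
print(invalid)  # an empty list means every link respects the hierarchy

# Made-up counterexample: an inhibitory type linked to "Excitatory".
print(find_invalid_splits({"Excitatory": ["Sst 1"]}, hierarchy_excerpt))
```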