## HRA Hierarchical Tissue Unit Annotation

In this notebook, we build on [an existing notebook on hierarchical tissue unit annotation](https://github.com/HickeyLab/Hierarchical-Tissue-Unit-Annotation) by [Dr. John Hickey](https://bme.duke.edu/people/john-hickey/). Concretely, we take a CSV file with cell positions, cell types, donor IDs, and extraction sites, and use it to create a node-dist-vis widget. For more information and documentation on hra-jupyter-widgets, please see [https://github.com/x-atlas-consortia/hra-jupyter-widgets/blob/main/usage.ipynb](https://github.com/x-atlas-consortia/hra-jupyter-widgets/blob/main/usage.ipynb).

## Load libraries

In [16]:
# Import standard-library packages
import time
import sys
import math
import os

In [17]:
# Install and import external packages
%pip install matplotlib
import matplotlib.pyplot as plt

%pip install pandas
import pandas as pd

%pip install seaborn
import seaborn as sns

%pip install numpy
import numpy as np

%pip install -U scikit-learn
from sklearn.neighbors import NearestNeighbors
from sklearn.cluster import MiniBatchKMeans
from sklearn.cluster import KMeans

%pip install ipywidgets
import ipywidgets as widgets

Note: you may need to restart the kernel to use updated packages.


In [18]:
# Import hra-jupyter-widgets. For documentation, please see https://github.com/x-atlas-consortia/hra-jupyter-widgets/blob/main/usage.ipynb
%pip install hra_jupyter_widgets
from hra_jupyter_widgets import (
    BodyUi,
    CdeVisualization,
    Eui,
    EuiOrganInformation,
    FtuExplorer,
    FtuExplorerSmall,
    MedicalIllustration,
    ModelViewer,
    NodeDistVis,
    Rui,
)

Note: you may need to restart the kernel to use updated packages.


## Download data from Dryad

In [19]:
# Downloading the CSV file from Dryad with curl returned a 403 (Forbidden) response,
# so the file was downloaded manually in the browser from
# https://datadryad.org/stash/downloads/file_stream/2572152 and placed in the data/ folder.
# Since it is 2.91 GB in size, it is listed in .gitignore.
!curl -L https://datadryad.org/stash/downloads/file_stream/2572152 -o 23_09_CODEX_HuBMAP_alldata_Dryad_merged.csv

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   118  100   118    0     0    507      0 --:--:-- --:--:-- --:--:--   508
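
The transfer log above reports only 118 bytes received, which fits the 403 response rather than the 2.91 GB file. A minimal sketch of a sanity check to run before loading the data is shown below; the helper name `looks_like_full_download` and the 1 MB default threshold are assumptions made for illustration.

```python
import os


def looks_like_full_download(path, min_bytes=1_000_000):
    """Return True if `path` is an existing file of at least `min_bytes` bytes.

    Hypothetical helper: the name and the default threshold are illustrative.
    """
    return os.path.isfile(path) and os.path.getsize(path) >= min_bytes
```

For example, `looks_like_full_download('data/23_09_CODEX_HuBMAP_alldata_Dryad_merged.csv')` should only return `True` once the manually downloaded file is in place.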


## Read data as DataFrame

In [20]:
# Read the CSV file and convert it to a df
df = pd.read_csv('data/23_09_CODEX_HuBMAP_alldata_Dryad_merged.csv', index_col=0)
df

Unnamed: 0,MUC2,SOX9,MUC1,CD31,Synapto,CD49f,CD15,CHGA,CDX2,ITLN1,...,Cell Type em,Cell subtype,Neighborhood,Neigh_sub,Neighborhood_Ind,NeighInd_sub,Community,Major Community,Tissue Segment,Tissue Unit
0,-0.303994,-0.163727,-0.587608,-0.212903,0.164173,-0.664863,0.049305,0.003616,-0.377532,-0.450794,...,NK,Immune,Mature Epithelial,Epithelial,Mature Epithelial,Epithelial,Plasma Cell Enriched,Immune,Mucosa,Mucosa
1,-0.301927,-0.491706,-0.500804,-0.243205,-0.142568,-0.664861,-0.182627,-0.117573,-0.182754,-0.236199,...,NK,Immune,Transit Amplifying Zone,Epithelial,Mature Epithelial,Epithelial,Mature Epithelial,Epithelial,Mucosa,Mucosa
2,-0.302206,-0.547234,-0.510705,-0.235309,-0.217185,-0.622758,-0.296486,-0.091504,-0.268055,-0.355383,...,NK,Immune,Innate Immune Enriched,Immune,Innate Immune Enriched,Immune,Innate Immune Enriched,Immune,Mucosa,Mucosa
3,-0.304219,-0.613068,-0.584499,-0.243757,-0.266696,-0.658449,-0.299027,-0.121460,-0.345381,-0.450792,...,NK,Immune,Stroma & Innate Immune,Stromal,Stroma & Innate Immune,Stromal,Stroma,Stroma,Subucosa,Submucosa
4,-0.294644,-0.615593,-0.570580,-0.247548,-0.042246,-0.642230,-0.299031,-0.121458,-0.377533,-0.450797,...,NK,Immune,Outer Follicle,Immune,Outer Follicle,Immune,Follicle,Immune,Mucosa,Mucosa
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2603212,0.351916,0.693827,-0.081489,-0.240643,0.008875,0.143445,0.373710,-0.097896,0.869830,0.579653,...,CD66+ Enterocyte,Epithelial,CD66+ Mature Epithelial,Epithelial,CD66+ Mature Epithelial,Epithelial,Secretory Epithelial,Epithelial,Mucosa,Mucosa
2603213,0.233642,0.171892,0.141842,-0.236145,-0.097772,-0.099283,0.626185,-0.105545,0.092076,0.682969,...,CD66+ Enterocyte,Epithelial,CD66+ Mature Epithelial,Epithelial,CD66+ Mature Epithelial,Epithelial,Secretory Epithelial,Epithelial,Mucosa,Mucosa
2603214,-0.212237,-0.280904,-0.197833,-0.245638,-0.152563,-0.125035,0.430416,-0.105787,-0.038327,-0.173319,...,CD66+ Enterocyte,Epithelial,CD8+ T Enriched IEL,Immune,CD8+ T Enriched IEL,Immune,Mature Epithelial,Epithelial,Mucosa,Mucosa
2603215,-0.328666,0.607609,-0.180362,-0.247351,-0.143742,-0.169576,1.095596,-0.113879,0.370160,-0.133272,...,CD66+ Enterocyte,Epithelial,Transit Amplifying Zone,Epithelial,Mature Epithelial,Epithelial,CD66+ Mature Epithelial,Epithelial,Mucosa,Mucosa
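
Because the file is 2.91 GB, loading every column at once can be memory-hungry. The sketch below shows one possible alternative: read only the columns used in the following cells (`donor`, `unique_region`, `x`, `y`, `Cell Type`) in chunks and keep a single donor and region. The function name `load_region` and the chunk size are assumptions, not part of the original workflow.

```python
import pandas as pd


def load_region(csv_source, donor, region, chunksize=100_000):
    """Read the CSV in chunks and keep x, y, Cell Type for one donor/region.

    Hypothetical helper: only the five columns needed downstream are parsed.
    """
    columns = ["donor", "unique_region", "x", "y", "Cell Type"]
    kept = []
    for chunk in pd.read_csv(csv_source, usecols=columns, chunksize=chunksize):
        mask = (chunk["donor"] == donor) & (chunk["unique_region"] == region)
        kept.append(chunk.loc[mask, ["x", "y", "Cell Type"]])
    if not kept:
        # No data rows at all: return an empty frame with the expected columns
        return pd.DataFrame(columns=["x", "y", "Cell Type"])
    return pd.concat(kept, ignore_index=True)
```

Unlike the `pd.read_csv(..., index_col=0)` call above, this sketch resets the row index, so the original row numbers are not preserved.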


In [21]:
# Only keep cells from one dataset by selecting 1 donor and 1 region
df_filtered = df[(df['donor'] == "B004") & (
    df['unique_region'] == "B004_Ascending")]

In [22]:
# Make new df with only x, y, and Cell Type columns (needed for node-dist-vis)
df_cells = df_filtered[['x', 'y', 'Cell Type']]
df_cells

Unnamed: 0,x,y,Cell Type
0,3984.0,3387.0,NK
1,5188.0,4116.0,NK
2,6070.0,3146.0,NK
3,7587.0,2361.0,NK
4,6792.0,3891.0,NK
...,...,...,...
75608,7666.0,6049.0,B
75609,6920.0,7160.0,B
75610,8311.0,6284.0,B
76271,3780.0,1700.0,Stroma


In [23]:
# Prepare df_cells for visualization with the node-dist-vis widget:
# convert each row into a dictionary with the keys 'x', 'y', and 'Cell Type'
node_list = df_cells.to_dict(orient='records')
node_list

[{'x': 3984.0, 'y': 3387.0, 'Cell Type': 'NK'},
 {'x': 5188.0, 'y': 4116.0, 'Cell Type': 'NK'},
 {'x': 6070.0, 'y': 3146.0, 'Cell Type': 'NK'},
 {'x': 7587.0, 'y': 2361.0, 'Cell Type': 'NK'},
 {'x': 6792.0, 'y': 3891.0, 'Cell Type': 'NK'},
 {'x': 7968.0, 'y': 6351.0, 'Cell Type': 'NK'},
 {'x': 6932.0, 'y': 4531.0, 'Cell Type': 'NK'},
 {'x': 7172.0, 'y': 4423.0, 'Cell Type': 'NK'},
 {'x': 7700.0, 'y': 6261.0, 'Cell Type': 'NK'},
 {'x': 163.0, 'y': 261.0, 'Cell Type': 'Enterocyte'},
 {'x': 188.0, 'y': 272.0, 'Cell Type': 'Enterocyte'},
 {'x': 202.0, 'y': 43.0, 'Cell Type': 'MUC1+ Enterocyte'},
 {'x': 202.0, 'y': 115.0, 'Cell Type': 'Enterocyte'},
 {'x': 191.0, 'y': 245.0, 'Cell Type': 'Enterocyte'},
 {'x': 171.0, 'y': 265.0, 'Cell Type': 'Enterocyte'},
 {'x': 221.0, 'y': 48.0, 'Cell Type': 'Enterocyte'},
 {'x': 219.0, 'y': 106.0, 'Cell Type': 'Enterocyte'},
 {'x': 224.0, 'y': 59.0, 'Cell Type': 'MUC1+ Enterocyte'},
 {'x': 213.0, 'y': 270.0, 'Cell Type': 'Enterocyte'},
 {'x': 248.0, 'y': 
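
Before passing the list to the widget, it can help to confirm that the intended target cell type actually occurs in it. The following is a minimal sketch using a small sample in the same shape as the list above; `count_by_key` and `sample_nodes` are hypothetical names introduced for illustration.

```python
from collections import Counter


def count_by_key(nodes, key="Cell Type"):
    """Count how many node dictionaries carry each value of `key`."""
    return Counter(node[key] for node in nodes)


# Small sample in the same shape as node_list
sample_nodes = [
    {"x": 3984.0, "y": 3387.0, "Cell Type": "NK"},
    {"x": 163.0, "y": 261.0, "Cell Type": "Enterocyte"},
    {"x": 202.0, "y": 43.0, "Cell Type": "MUC1+ Enterocyte"},
    {"x": 188.0, "y": 272.0, "Cell Type": "Enterocyte"},
]
counts = count_by_key(sample_nodes)
print(counts["Enterocyte"])   # 2
print(counts["Endothelial"])  # 0, so this sample contains no target cells
```

Running `count_by_key(node_list)` on the full list would show whether any `Endothelial` cells are present for the selected donor and region.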

In [24]:
# Finally, instantiate the NodeDistVis class. We pass in node_list and mark Endothelial cells as the targets for the edges.
# Because we do not supply an edge list, we must provide a max_edge_distance, which is set generously to 1000.

node_dist_vis = NodeDistVis(
    nodes=node_list,
    node_target_key="Cell Type",
    node_target_value="Endothelial",
    max_edge_distance=1000,
)

# Display our new widget
display(node_dist_vis)

NodeDistVis(color_map=None, color_map_data=None, color_map_key=None, edges=None, max_edge_distance=1000, node_…
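
To make the role of `max_edge_distance` more concrete, the sketch below links each non-target node to its nearest target node, provided that node lies within the distance limit. This is an assumption about what such an edge computation could look like, written as a brute-force illustration; it is not the widget's own code, and the function name `nearest_target_edges` is hypothetical.

```python
import math


def nearest_target_edges(nodes, target_key, target_value, max_edge_distance):
    """Link each non-target node to its nearest target node within range.

    Returns a list of (source_index, target_index, distance) tuples.
    Brute-force O(n * m) illustration, not the widget's implementation.
    """
    targets = [(i, n) for i, n in enumerate(nodes) if n[target_key] == target_value]
    edges = []
    for i, node in enumerate(nodes):
        if node[target_key] == target_value:
            continue  # target nodes are edge endpoints, not sources
        best = None
        for j, target in targets:
            dist = math.hypot(node["x"] - target["x"], node["y"] - target["y"])
            if dist <= max_edge_distance and (best is None or dist < best[1]):
                best = (j, dist)
        if best is not None:
            edges.append((i, best[0], best[1]))
    return edges
```

With tens of thousands of cells, a spatial index such as the `NearestNeighbors` class imported earlier would likely scale better than this double loop.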