## HRA Hierarchical Tissue Unit Annotation

In this notebook, we will build on [an existing one on hierarchical tissue unit annotation](https://github.com/HickeyLab/Hierarchical-Tissue-Unit-Annotation) by [Dr. John Hickey](https://bme.duke.edu/people/john-hickey/). Concretely, we will take a CSV file with cell positions, types, donor IDs, and extraction sites, and use it to create a NodeDistVis widget and a CdeVisualization widget. For more information and documentation on hra-jupyter-widgets, please see [https://github.com/x-atlas-consortia/hra-jupyter-widgets/blob/main/usage.ipynb](https://github.com/x-atlas-consortia/hra-jupyter-widgets/blob/main/usage.ipynb).

## Load libraries

In [1]:
# Import standard-library modules
import time
import sys
import math
import os
from pprint import pprint

In [2]:
# Install and import external packages
%pip install matplotlib
import matplotlib.pyplot as plt

%pip install pandas
import pandas as pd

%pip install seaborn
import seaborn as sns

%pip install numpy
import numpy as np

%pip install -U scikit-learn
from sklearn.neighbors import NearestNeighbors
from sklearn.cluster import MiniBatchKMeans
from sklearn.cluster import KMeans

%pip install ipywidgets
import ipywidgets as widgets

Note: you may need to restart the kernel to use updated packages.


In [3]:
# Import hra-jupyter-widgets. For documentation, please see https://github.com/x-atlas-consortia/hra-jupyter-widgets/blob/main/usage.ipynb
%pip install hra_jupyter_widgets
from hra_jupyter_widgets import (
    BodyUi,
    CdeVisualization, # in this example, we will use this one...
    Eui,
    EuiOrganInformation,
    FtuExplorer,
    FtuExplorerSmall,
    MedicalIllustration,
    ModelViewer,
    NodeDistVis, # ...and this one, but all of them are usable for different purposes!
    Rui,
)

Note: you may need to restart the kernel to use updated packages.


## Download data from Dryad

In [4]:
# I tried using curl to download the CSV file from Dryad, but got a 403 (Forbidden) response; the 118 bytes received below are that error response, not the data. So I downloaded the file manually via the browser from https://datadryad.org/stash/downloads/file_stream/2572152 and placed it in data/. Since it is 2.91 GB, I added it to .gitignore.
!curl -L https://datadryad.org/stash/downloads/file_stream/2572152 -o 23_09_CODEX_HuBMAP_alldata_Dryad_merged.csv

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   118  100   118    0     0    392      0 --:--:-- --:--:-- --:--:--   392
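Because a failed download (like the 403 above) still writes a small file to disk, it can help to check the file before loading it. The sketch below is a hypothetical helper, `looks_like_csv_download`, not part of any library; the 1 MB threshold is an arbitrary assumption, and the check only looks at the file size and first line.

```python
import os


def looks_like_csv_download(path: str, min_bytes: int = 1_000_000) -> bool:
    """Heuristically check that `path` holds CSV data rather than an error page.

    Hypothetical helper: the size threshold is an arbitrary assumption.
    """
    # A missing file is clearly not a successful download
    if not os.path.exists(path):
        return False
    # Error responses (like the 118-byte one above) are tiny compared to the 2.91 GB file
    if os.path.getsize(path) < min_bytes:
        return False
    # HTML error pages usually start with '<'; a CSV header should not, and should contain commas
    with open(path, 'r', encoding='utf-8', errors='replace') as f:
        first_line = f.readline()
    return not first_line.lstrip().startswith('<') and ',' in first_line
```

For example, `looks_like_csv_download('data/23_09_CODEX_HuBMAP_alldata_Dryad_merged.csv')` should return `True` for the manually downloaded file.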


## Read data as DataFrame

In [5]:
# Read the CSV file and convert it to a df
df = pd.read_csv('data/23_09_CODEX_HuBMAP_alldata_Dryad_merged.csv', index_col=0)
df

Unnamed: 0,MUC2,SOX9,MUC1,CD31,Synapto,CD49f,CD15,CHGA,CDX2,ITLN1,...,Cell Type em,Cell subtype,Neighborhood,Neigh_sub,Neighborhood_Ind,NeighInd_sub,Community,Major Community,Tissue Segment,Tissue Unit
0,-0.303994,-0.163727,-0.587608,-0.212903,0.164173,-0.664863,0.049305,0.003616,-0.377532,-0.450794,...,NK,Immune,Mature Epithelial,Epithelial,Mature Epithelial,Epithelial,Plasma Cell Enriched,Immune,Mucosa,Mucosa
1,-0.301927,-0.491706,-0.500804,-0.243205,-0.142568,-0.664861,-0.182627,-0.117573,-0.182754,-0.236199,...,NK,Immune,Transit Amplifying Zone,Epithelial,Mature Epithelial,Epithelial,Mature Epithelial,Epithelial,Mucosa,Mucosa
2,-0.302206,-0.547234,-0.510705,-0.235309,-0.217185,-0.622758,-0.296486,-0.091504,-0.268055,-0.355383,...,NK,Immune,Innate Immune Enriched,Immune,Innate Immune Enriched,Immune,Innate Immune Enriched,Immune,Mucosa,Mucosa
3,-0.304219,-0.613068,-0.584499,-0.243757,-0.266696,-0.658449,-0.299027,-0.121460,-0.345381,-0.450792,...,NK,Immune,Stroma & Innate Immune,Stromal,Stroma & Innate Immune,Stromal,Stroma,Stroma,Subucosa,Submucosa
4,-0.294644,-0.615593,-0.570580,-0.247548,-0.042246,-0.642230,-0.299031,-0.121458,-0.377533,-0.450797,...,NK,Immune,Outer Follicle,Immune,Outer Follicle,Immune,Follicle,Immune,Mucosa,Mucosa
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2603212,0.351916,0.693827,-0.081489,-0.240643,0.008875,0.143445,0.373710,-0.097896,0.869830,0.579653,...,CD66+ Enterocyte,Epithelial,CD66+ Mature Epithelial,Epithelial,CD66+ Mature Epithelial,Epithelial,Secretory Epithelial,Epithelial,Mucosa,Mucosa
2603213,0.233642,0.171892,0.141842,-0.236145,-0.097772,-0.099283,0.626185,-0.105545,0.092076,0.682969,...,CD66+ Enterocyte,Epithelial,CD66+ Mature Epithelial,Epithelial,CD66+ Mature Epithelial,Epithelial,Secretory Epithelial,Epithelial,Mucosa,Mucosa
2603214,-0.212237,-0.280904,-0.197833,-0.245638,-0.152563,-0.125035,0.430416,-0.105787,-0.038327,-0.173319,...,CD66+ Enterocyte,Epithelial,CD8+ T Enriched IEL,Immune,CD8+ T Enriched IEL,Immune,Mature Epithelial,Epithelial,Mucosa,Mucosa
2603215,-0.328666,0.607609,-0.180362,-0.247351,-0.143742,-0.169576,1.095596,-0.113879,0.370160,-0.133272,...,CD66+ Enterocyte,Epithelial,Transit Amplifying Zone,Epithelial,Mature Epithelial,Epithelial,CD66+ Mature Epithelial,Epithelial,Mucosa,Mucosa
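Since the full CSV is 2.91 GB, loading every marker column can use a lot of memory. A minimal sketch, assuming only the columns used later in this notebook (`donor`, `unique_region`, `x`, `y`, `Cell Type`) are needed: pass `usecols` to `pd.read_csv`. The demo reads a tiny in-memory CSV with made-up marker values instead of the real file. Note that this variant does not set `index_col`, so the result gets a default integer index.

```python
import io

import pandas as pd

# Columns this notebook actually uses downstream (assumption: nothing else is needed)
NEEDED_COLUMNS = ['donor', 'unique_region', 'x', 'y', 'Cell Type']


def read_needed_columns(source) -> pd.DataFrame:
    """Read only the columns needed for the widgets, which lowers memory use."""
    # usecols skips every other column (e.g. the marker intensities); column order follows the file
    return pd.read_csv(source, usecols=NEEDED_COLUMNS)


# Tiny stand-in for the Dryad CSV (MUC2 values are made up for illustration only)
sample_csv = io.StringIO(
    ',MUC2,x,y,Cell Type,donor,unique_region\n'
    '0,-0.30,3984.0,3387.0,NK,B004,B004_Ascending\n'
    '1,-0.30,5188.0,4116.0,NK,B004,B004_Ascending\n'
)
small_df = read_needed_columns(sample_csv)
print(small_df)
```

With the real data this would be called as `read_needed_columns('data/23_09_CODEX_HuBMAP_alldata_Dryad_merged.csv')`.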


In [6]:
# Only keep cells from one dataset by selecting 1 donor and 1 region
df_filtered = df[(df['donor'] == "B004") & (
    df['unique_region'] == "B004_Ascending")]

In [7]:
# Make new df with only x, y, and Cell Type columns (needed for node-dist-vis)
df_cells = df_filtered[['x', 'y', 'Cell Type']]
df_cells

Unnamed: 0,x,y,Cell Type
0,3984.0,3387.0,NK
1,5188.0,4116.0,NK
2,6070.0,3146.0,NK
3,7587.0,2361.0,NK
4,6792.0,3891.0,NK
...,...,...,...
75608,7666.0,6049.0,B
75609,6920.0,7160.0,B
75610,8311.0,6284.0,B
76271,3780.0,1700.0,Stroma
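Before handing `df_cells` to a widget, a quick sanity check can catch an empty filter (for example, a misspelled region name) or missing coordinates. The sketch below is a hypothetical helper, `check_cells`, and is not part of hra-jupyter-widgets; the assumption is that rows with missing values in these columns are unlikely to render correctly.

```python
import pandas as pd


def check_cells(df: pd.DataFrame, required=('x', 'y', 'Cell Type')) -> None:
    """Raise ValueError if df is empty, lacks required columns, or has missing values in them.

    Hypothetical helper for illustration; not part of hra-jupyter-widgets.
    """
    # An empty frame usually means the donor/region filter matched nothing
    if df.empty:
        raise ValueError('DataFrame is empty; check the donor/region filter')
    missing = [col for col in required if col not in df.columns]
    if missing:
        raise ValueError(f'Missing required columns: {missing}')
    # Count NaN values across all required columns
    n_null = int(df[list(required)].isna().sum().sum())
    if n_null:
        raise ValueError(f'{n_null} missing values in required columns')
```

Calling `check_cells(df_cells)` returns `None` silently when everything looks fine.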


In [8]:
# Next, let's define a function that turns a DataFrame into a node list that can then be passed into the CdeVisualization or NodeDistVis widget
def make_node_list(df: pd.DataFrame, is_3d: bool = False) -> list[dict]:
    """Turn a DataFrame into a list of dicts for passing them into a HRA widget

    Args:
        df (pd.DataFrame): A DataFrame with cells and 'x', 'y', and 'Cell Type' columns
            (plus 'z' if is_3d is True)
        is_3d (bool): Whether df already has a 'z' column. If False, all cells get z = 0.

    Returns:
        list[dict]: One dict per cell with the keys 'x', 'y', 'z', and 'Cell Type'
    """
    # Work on a copy so the caller's DataFrame is not modified (this also avoids
    # pandas' SettingWithCopyWarning when df is a slice of a larger DataFrame)
    df = df.copy()

    # If the df does not have a z-axis column, let's add one and set all cells to 0
    if not is_3d:
        df['z'] = 0

    # to_dict('records') builds the same list of dicts much faster than iterrows()
    return df[['x', 'y', 'z', 'Cell Type']].to_dict('records')


In [9]:
# Prepare df_cells for visualization with NodeDistVis widget
node_list = make_node_list(df_cells, False)

# Let's inspect the first 5 rows
pprint(node_list[:5])


[{'Cell Type': 'NK', 'x': 3984.0, 'y': 3387.0, 'z': 0},
 {'Cell Type': 'NK', 'x': 5188.0, 'y': 4116.0, 'z': 0},
 {'Cell Type': 'NK', 'x': 6070.0, 'y': 3146.0, 'z': 0},
 {'Cell Type': 'NK', 'x': 7587.0, 'y': 2361.0, 'z': 0},
 {'Cell Type': 'NK', 'x': 6792.0, 'y': 3891.0, 'z': 0}]


In [10]:
# Finally, let's instantiate the NodeDistVis class with some parameters. We pass in node_list and designate Endothelial cells as the targets for the edges.
# As we are not supplying an edge list, we need to provide max_edge_distance, which we set (generously) to 1000.

node_dist_vis = NodeDistVis(
    nodes = node_list,
    node_target_key="Cell Type",
    node_target_value="Endothelial",
    max_edge_distance = 1000
)

# Display our new widget
display(node_dist_vis)

NodeDistVis(color_map=None, color_map_data=None, color_map_key=None, edges=None, max_edge_distance=1000, node_…
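To judge whether a `max_edge_distance` of 1000 is a sensible cutoff for this region, one can compute each cell's distance to its nearest Endothelial cell with scikit-learn's `NearestNeighbors`, which is imported above but otherwise unused. This is only an exploratory sketch with a hypothetical helper, `nearest_target_distance`; it assumes the x/y coordinates are in the same units as `max_edge_distance`, and it is not how NodeDistVis builds its edges internally.

```python
import numpy as np
import pandas as pd
from sklearn.neighbors import NearestNeighbors


def nearest_target_distance(df: pd.DataFrame, target_value: str,
                            type_col: str = 'Cell Type') -> np.ndarray:
    """Return, for every cell in df, the Euclidean x/y distance to the nearest target cell."""
    coords = df[['x', 'y']].to_numpy(dtype=float)
    targets = coords[(df[type_col] == target_value).to_numpy()]
    if len(targets) == 0:
        raise ValueError(f'No cells of type {target_value!r}')
    # Index the target cells, then query the single nearest target for each cell
    nn = NearestNeighbors(n_neighbors=1).fit(targets)
    distances, _ = nn.kneighbors(coords)
    return distances[:, 0]


# Usage with the notebook's data (target cells themselves get distance 0):
# d = nearest_target_distance(df_cells, 'Endothelial')
# print((d <= 1000).mean())  # fraction of cells within 1000 units of an Endothelial cell
```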

## Next, let's combine three regions from donor B004 into a 3D tissue stack

In [11]:
# Only keep cells from one dataset by selecting 1 donor and 3 regions
# isin() keeps the donor condition applied to all three regions
df_filtered_3d = df[(df['donor'] == "B004") &
                    (df['unique_region'].isin(['B004_Descending', 'B004_Ascending', 'B004_Transverse']))]
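In Python, and therefore in pandas boolean masks, `&` binds more tightly than `|`, so a mask written as `A & B | C | D` is evaluated as `(A & B) | C | D`, not `A & (B | C | D)`. Because the region names here carry the donor prefix (e.g. `B004_Ascending`), both forms presumably select the same rows in this dataset, but the grouped form states the intent explicitly. The toy example below uses made-up rows whose region names are not donor-prefixed, so the difference becomes visible.

```python
import pandas as pd

# Made-up rows; region names are NOT prefixed with the donor, so the two masks differ
toy = pd.DataFrame({
    'donor': ['B004', 'B005', 'B005'],
    'region': ['Descending', 'Ascending', 'Descending'],
})

is_b004 = toy['donor'] == 'B004'
is_desc = toy['region'] == 'Descending'
is_asc = toy['region'] == 'Ascending'

# Evaluated as (is_b004 & is_desc) | is_asc, so B005's Ascending row slips through
ungrouped = toy[is_b004 & is_desc | is_asc]

# Grouping (here via isin) applies the donor condition to every listed region
grouped = toy[is_b004 & toy['region'].isin(['Descending', 'Ascending'])]

print(ungrouped.index.tolist(), grouped.index.tolist())
```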

In [12]:
# Set a z-offset
offset = 1000

# Set z axis (or any other axis) by region; assign() returns a new DataFrame,
# which avoids pandas' SettingWithCopyWarning on the filtered slice
z_by_region = {'B004_Descending': 0, 'B004_Ascending': offset, 'B004_Transverse': offset * 2}
df_filtered_3d = df_filtered_3d.assign(z=df_filtered_3d['unique_region'].map(z_by_region))

# Make new df with only x, y, z, and Cell Type columns
df_cells_3d = df_filtered_3d[['x', 'y', 'z', 'Cell Type']]
df_cells_3d


Unnamed: 0,x,y,z,Cell Type
0,3984.0,3387.0,1000,NK
1,5188.0,4116.0,1000,NK
2,6070.0,3146.0,1000,NK
3,7587.0,2361.0,1000,NK
4,6792.0,3891.0,1000,NK
...,...,...,...,...
2470869,8523.0,8994.0,2000,B
2470870,8577.0,8109.0,2000,Plasma
2470871,8678.0,8108.0,2000,Plasma
2470872,8887.0,8318.0,2000,B
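The same z-offset idea generalizes to any number of regions: give each region a layer index in a chosen order and multiply it by the offset. A minimal sketch with a hypothetical helper, `stack_regions`; the region order and the offset of 1000 are choices made in this notebook, not requirements of CdeVisualization.

```python
import pandas as pd


def stack_regions(df: pd.DataFrame, region_order: list, offset: float = 1000,
                  region_col: str = 'unique_region') -> pd.DataFrame:
    """Return a copy of df with a 'z' column: the i-th region in region_order gets z = i * offset."""
    z_by_region = {region: i * offset for i, region in enumerate(region_order)}
    # Rows whose region is not listed are dropped rather than getting z = NaN
    out = df[df[region_col].isin(region_order)].copy()
    out['z'] = out[region_col].map(z_by_region)
    return out
```

With this notebook's data, `stack_regions(df_filtered_3d, ['B004_Descending', 'B004_Ascending', 'B004_Transverse'])[['x', 'y', 'z', 'Cell Type']]` would reproduce the same layering as above.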


In [13]:
# Prepare df_cells_3d for visualization with CdeVisualization widget
node_list = make_node_list(df_cells_3d, True)

# Let's inspect the first 5 rows
pprint(node_list[:5])

[{'Cell Type': 'NK', 'x': 3984.0, 'y': 3387.0, 'z': 1000},
 {'Cell Type': 'NK', 'x': 5188.0, 'y': 4116.0, 'z': 1000},
 {'Cell Type': 'NK', 'x': 6070.0, 'y': 3146.0, 'z': 1000},
 {'Cell Type': 'NK', 'x': 7587.0, 'y': 2361.0, 'z': 1000},
 {'Cell Type': 'NK', 'x': 6792.0, 'y': 3891.0, 'z': 1000}]


In [14]:
# Finally, let's instantiate the CdeVisualization class with our node_list as a parameter.
cde = CdeVisualization(
    nodes=node_list
)

# Display our new widget
display(cde)

CdeVisualization(age=None, color_map=None, color_map_key=None, color_map_value_key=None, creation_timestamp=No…