TODO: 

2. Check the Taxonomy class: the constructor `__init__` takes inputs that are unused.
3. Verify that we can load and create another notebook for Viz. 
4. Create another classifier for "hybrid" taxonomy
5. Clean up commented functions in TissueGraph.py (they are mostly related to hybrid taxonomy)
6. Clean up dependencies and unused methods from Utils
7. Viz needs to support Geoms as a list
8. Update Viz code (mostly comments)
9. Finish configuring Sphinx
10. Push to GitHub




# Create and construct a TMG object from multiple slices

This example notebook performs basic TMG analysis operations. The inputs are the per-slice files generated by the Processing module.

In this example, the Taxonomy/Classifiers are not given; they are created on the fly using optimal Leiden clustering.

Overall, the steps in the analysis are: 
1. Load and create an empty TMG object
2. Create a cell layer. This loads the data, normalizes it, and builds the graphs, but does not classify
3. Unsupervised learning using optimal Leiden to create the classifier and taxonomy
4. Create the iso-zone layer
5. Classify cellular neighborhoods into regions using topic modeling
6. Save to file

In [10]:
import numpy as np

In [1]:
# import igraph
import logging
import matplotlib.pyplot as plt 
import pandas as pd

from dredFISH.Analysis import TissueGraph
from dredFISH.Analysis import Classification
from dredFISH.Utils import tmgu

import importlib
importlib.reload(TissueGraph)
importlib.reload(Classification)
importlib.reload(tmgu)

logging.basicConfig(
    format='%(asctime)s %(levelname)-8s %(message)s',
    level=logging.INFO,
    datefmt='%Y-%m-%d %H:%M:%S')

# Prepare files
- Two input files: `_matrix.csv` and `_metadata.csv`
- The metadata file must contain these columns:
    - `stage_x` `stage_y` `section_index`
- Do **NOT** use this column name (rename it if present):
    - `Type`
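
The column rules above can be checked before building the TMG. The following is a minimal sketch, not part of dredFISH: the helper name `check_metadata_columns` and the two constant sets are hypothetical, and the function only inspects column names (for example `df.columns` from the metadata dataframe).

```python
# Hypothetical helper (not a dredFISH function): check metadata column names
REQUIRED_COLUMNS = {"stage_x", "stage_y", "section_index"}
RESERVED_COLUMNS = {"Type"}  # column name that must not appear in the metadata

def check_metadata_columns(columns):
    """Return (missing, reserved): required names that are absent and reserved names that are present."""
    columns = set(columns)
    missing = sorted(REQUIRED_COLUMNS - columns)
    reserved = sorted(RESERVED_COLUMNS & columns)
    return missing, reserved

# Example with a header that still needs fixing
missing, reserved = check_metadata_columns(["stage_x", "stage_y", "stage_z", "Type"])
print(missing)   # ['section_index']
print(reserved)  # ['Type']
```

With a dataframe, the call would be `check_metadata_columns(df.columns)`; both returned lists are empty when the file is ready to use.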

In [2]:
f = '/bigstore/GeneralStorage/fangming/projects/test/for_haley/data/spatial_metadata_old.csv'
fout = f.replace('_old.csv', '.csv')
df = pd.read_csv(f)
df['section_index'] = 'sagittal1'
df = df.rename(columns={'Type': 'STARmap_type'})
df.to_csv(fout)
df

Unnamed: 0,section_index,stage_x,stage_y,stage_z,Tissue_Symbol,STARmap_type,Subtype_Symbol
0,sagittal1,14731.673077,1402.807692,8.759615,Meninges,Vascular and leptomeningeal cells,VLM_1
1,sagittal1,14614.902439,1411.317073,3.585366,Meninges,Unannotated,NA_2
2,sagittal1,14655.137255,1403.019608,3.352941,Meninges,Vascular and leptomeningeal cells,VLM_1
3,sagittal1,15798.808081,980.565657,6.818182,Meninges,Vascular and leptomeningeal cells,VLM_2
4,sagittal1,15377.847656,1200.558594,11.820312,Meninges,Vascular and leptomeningeal cells,VLM_2
...,...,...,...,...,...,...,...
91241,sagittal1,20473.408163,51331.224490,17.857143,CBXmo,Vascular and leptomeningeal cells,VLM_1
91242,sagittal1,20360.710145,51493.557971,4.253623,CBXmo,Vascular and leptomeningeal cells,VLM_1
91243,sagittal1,20511.285714,51288.204082,21.510204,CBXmo,Vascular and leptomeningeal cells,VLM_1
91244,sagittal1,20542.461538,51319.256410,9.358974,CBXmo,Vascular and leptomeningeal cells,VLM_2


### Import and create an empty TMG

In [4]:
basepth = '/bigstore/GeneralStorage/fangming/projects/test/for_haley/data'
!ls -alhtr $basepth
!head $basepth"/TMG.json"

total 49M
-rw-r--r-- 1 fangming wollmanlab   45 Nov  4 14:19 README.txt
-rw-r--r-- 1 fangming wollmanlab  30M Nov  4 14:20 pca_matrix.csv
-rw-r--r-- 1 fangming wollmanlab 9.5M Nov  4 14:20 spatial_metadata_old.csv
drwxr-xr-x 3 fangming wollmanlab   25 Nov  4 14:20 ..
drwxr-xr-x 2 fangming wollmanlab 4.0K Nov  4  2022 .
-rw-r--r-- 1 fangming wollmanlab 9.4M Nov  4  2022 spatial_metadata.csv
head: cannot open '/bigstore/GeneralStorage/fangming/projects/test/for_haley/data/TMG.json' for reading: No such file or directory


In [5]:
TMG = TissueGraph.TissueMultiGraph(basepath=basepth, 
                                   redo=True, # create an empty one
                                  ) 

### Create a `cell` layer
Creating a cell layer loads the data from file, normalizes it, and creates an unclassified tissue graph.

In [6]:
%%time
TMG.create_cell_layer(metric = 'cosine')
logging.info(f"TMG has {len(TMG.Layers)} Layers")

INFO:root:In TMG.create_cell_layer
INFO:root:Started reading matrices and metadata
INFO:root:done reading files
INFO:root:91246 cells, minimum counts = 2.136777206
  self.adata = anndata.AnnData(feature_mat, obs=obs) # the tissuegraph AnnData object
INFO:root:building spatial graphs
INFO:root:Building spatial graphs for 1 sections


/bigstore/GeneralStorage/fangming/projects/test/for_haley/data/cell.h5ad


INFO:root:updating anndata
INFO:root:done building spatial graph
INFO:root:building feature graphs
INFO:root:building feature graph using cosine
INFO:root:done with create_cell_layer
INFO:root:TMG has 1 Layers


CPU times: user 1min 50s, sys: 1min 11s, total: 3min 2s
Wall time: 37.7 s


### Create Geometries

In [7]:
%%time
# TMG.add_geoms()

CPU times: user 6 µs, sys: 0 ns, total: 6 µs
Wall time: 12.9 µs


### Create cell types using `OptLeiden classifier` applied on the `cell` layer

In [8]:
%%time
# Create the classifier
optleiden = Classification.OptimalLeidenKNNClassifier(TMG.Layers[0])
# train the classifier
optleiden.train(opt_res=11.5, opt_params={'iters':10, 'n_consensus':1})
# use the classifier to create types and add them to TMG using the Taxonomy created on the fly by the classifier
type_vec = optleiden.classify()
TMG.add_type_information(0, type_vec, optleiden.tax)

/bigstore/GeneralStorage/fangming/projects/test/for_haley/data/isozone.h5ad
/bigstore/GeneralStorage/fangming/projects/test/for_haley/data/isozone.h5ad
/bigstore/GeneralStorage/fangming/projects/test/for_haley/data/isozone.h5ad
/bigstore/GeneralStorage/fangming/projects/test/for_haley/data/isozone.h5ad
/bigstore/GeneralStorage/fangming/projects/test/for_haley/data/isozone.h5ad
/bigstore/GeneralStorage/fangming/projects/test/for_haley/data/isozone.h5ad
/bigstore/GeneralStorage/fangming/projects/test/for_haley/data/isozone.h5ad
/bigstore/GeneralStorage/fangming/projects/test/for_haley/data/isozone.h5ad
/bigstore/GeneralStorage/fangming/projects/test/for_haley/data/isozone.h5ad
/bigstore/GeneralStorage/fangming/projects/test/for_haley/data/isozone.h5ad
Number of types: 164 initial entropy: -8.537518719946611 number of evals: 0
CPU times: user 40.7 s, sys: 202 ms, total: 40.9 s
Wall time: 40.7 s


### Create `isozone` layer

In [9]:
TMG.create_isozone_layer()
logging.info(f"TMG has {len(TMG.Layers)} Layers")

INFO:root:TMG has 2 Layers


/bigstore/GeneralStorage/fangming/projects/test/for_haley/data/isozone.h5ad


### Create `region` layer
To create regions, we first create a new classifier that works on local cell environments (local type abundance). We then classify cells into regions and use these region types to create a new layer.
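
To illustrate what "local type abundance" means, here is a minimal sketch that counts, for each cell, how many of its neighbors belong to each type. It is an illustration only and not the dredFISH implementation: the function name `local_type_abundance`, the neighbor lists, and the toy type vector are all assumptions made for this example.

```python
import numpy as np

def local_type_abundance(neighbors, types, n_types):
    """Count the neighbors of each cell per type; returns an (n_cells, n_types) integer matrix."""
    env = np.zeros((len(neighbors), n_types), dtype=int)
    for cell, nbrs in enumerate(neighbors):
        nbrs = np.asarray(nbrs, dtype=int)  # indices of the spatial neighbors of this cell
        env[cell] = np.bincount(types[nbrs], minlength=n_types)
    return env

# Toy example: 4 cells, 3 types, hand-written neighbor lists
types = np.array([0, 0, 1, 2])
neighbors = [[1, 2], [0, 2], [0, 1, 3], [2]]
env = local_type_abundance(neighbors, types, n_types=3)
print(env)
```

Each row of `env` is a count vector over types, which is the kind of input a topic model can treat as a "document" of cell types.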

In [13]:
%%time

n_topics_list = [2,5,10]
n_procs = 1 

topic_cls = Classification.TopicClassifier(TMG.Layers[0])

CPU times: user 16.9 s, sys: 609 ms, total: 17.5 s
Wall time: 17.4 s


In [14]:
topic_cls.train(n_topics_list=n_topics_list, n_procs=n_procs)
topics = topic_cls.classify(topic_cls.Env)

INFO:root:Running LDA in serial


In [15]:
%%time
TMG.create_region_layer(topics, topic_cls.tax)
logging.info(f"TMG has {len(TMG.Layers)} Layers")

  self.adata = anndata.AnnData(feature_mat, obs=obs) # the tissuegraph AnnData object
INFO:root:TMG has 3 Layers


/bigstore/GeneralStorage/fangming/projects/test/for_haley/data/isozone.h5ad
/bigstore/GeneralStorage/fangming/projects/test/for_haley/data/region.h5ad
CPU times: user 514 ms, sys: 0 ns, total: 514 ms
Wall time: 507 ms


### Save to files
TMG is saved as a config JSON file, one AnnData file per layer, and one dataframe per taxonomy.
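
After a successful save, the presence of the expected files can be checked with a small helper. This is a sketch under assumptions: the file names `TMG.json`, `cell.h5ad`, `isozone.h5ad`, and `region.h5ad` are taken from the paths printed earlier in this notebook, the helper `missing_tmg_files` is hypothetical, and the taxonomy files are not checked because their names are not shown here.

```python
import tempfile
from pathlib import Path

def missing_tmg_files(basepath, layer_names=("cell", "isozone", "region")):
    """Return the expected TMG file names that do not exist under basepath."""
    basepath = Path(basepath)
    expected = ["TMG.json"] + [f"{name}.h5ad" for name in layer_names]
    return [name for name in expected if not (basepath / name).exists()]

# Demonstration on a temporary directory that contains only two of the files
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "TMG.json").touch()
    (Path(tmp) / "cell.h5ad").touch()
    still_missing = missing_tmg_files(tmp)

print(still_missing)  # ['isozone.h5ad', 'region.h5ad']
```

In this notebook the call would be `missing_tmg_files(basepth)`.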

In [24]:
# Inspect the XY coordinates stored in the cell layer
t1 = TMG.Layers[0].adata.obsm['XY']
t2 = t1[0,0]

In [29]:
t1[:,0] #+ t1[:,1]

array([14731.6730769231, 14614.9024390244, 14655.137254902, ...,
       20511.2857142857, 20542.4615384615, 20431.8055555556], dtype=object)

In [16]:
%%time
TMG.save()

... storing 'section_index' as categorical
... storing 'Tissue_Symbol' as categorical
... storing 'STARmap_type' as categorical
... storing 'Subtype_Symbol' as categorical
... storing 'Slice' as categorical


TypeError: Can't implicitly convert non-string objects to strings

Above error raised while writing key 'XY' of <class 'h5py._hl.group.Group'> to /
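
The `dtype=object` reported for `obsm['XY']` above is a likely cause of this error, because the HDF5 writer cannot store an array of generic Python objects as numbers. A minimal sketch of a possible workaround, assuming all coordinates are numeric, is to cast the array to `float64` before saving. The array below is a stand-in built from the first coordinates printed earlier; applying the same cast to `TMG.Layers[0].adata.obsm['XY']` is an assumption and was not run in this notebook.

```python
import numpy as np

# Stand-in for TMG.Layers[0].adata.obsm['XY']: numeric values stored with dtype=object
xy_obj = np.array([[14731.673077, 1402.807692],
                   [14614.902439, 1411.317073]], dtype=object)

# Cast to a plain floating-point array
xy = xy_obj.astype(np.float64)
print(xy.dtype)
```

If this is indeed the cause, assigning the cast array back to `obsm['XY']` before `TMG.save()` should let the layer be written; the object dtype may also point to a mixed-type column in the metadata file that is worth checking at load time.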