TODO: 

2. Check the Taxonomy class: the constructor `__init__` takes inputs that are unused.
3. Verify that we can load a saved TMG, and create another notebook for Viz.
4. Create another classifier for the "hybrid" taxonomy
5. Clean up commented-out functions in TissueGraph.py (they are mostly related to the hybrid taxonomy)
6. Clean up dependencies and unused methods in Utils
7. Viz needs to support Geoms as a list
8. Update Viz code (mostly comments)
9. Finish configuring Sphinx
10. Push to GitHub




# Create and construct a TMG object from multiple slices

This example notebook performs basic TMG analysis operations. The inputs are the per-slice files generated by the Processing module.

In this example, the Taxonomy and Classifiers are not given; they are created on the fly using optimal Leiden clustering.

Overall, the steps in the analysis are: 
1. Load and create an empty TMG object
2. Create a cell layer. This loads and normalizes the data and builds the graphs, but does not classify
3. Unsupervised learning using optimal Leiden to create a classifier and taxonomy
4. Create an iso-zone layer
5. Classify cellular neighborhoods into regions using topic modeling
6. Save to file

### Import and create an empty TMG

In [1]:
# import igraph
import logging
import matplotlib.pyplot as plt 

from dredFISH.Analysis import TissueGraph
from dredFISH.Analysis import Classification
from dredFISH.Utils import tmgu

import importlib
importlib.reload(TissueGraph)
importlib.reload(Classification)
importlib.reload(tmgu)

logging.basicConfig(
    format='%(asctime)s %(levelname)-8s %(message)s',
    level=logging.INFO,
    datefmt='%Y-%m-%d %H:%M:%S')

In [2]:
basepth = '/bigstore/GeneralStorage/Data/dredFISH/Dataset1-t3'
!ls -alhtr $basepth
!head $basepth"/TMG.json"

total 61M
drwxrwxrwx 7 zach     wollmanlab 4.0K Jul  8 12:44 ..
lrwxrwxrwx 1 fangming wollmanlab   69 Jul  8 12:45 DPNMF_PolyA_2021Nov19_Section_-1850X_270Y_metadata.csv -> ../Dataset1-t1/DPNMF_PolyA_2021Nov19_Section_-1850X_270Y_metadata.csv
lrwxrwxrwx 1 fangming wollmanlab   67 Jul  8 12:45 DPNMF_PolyA_2021Nov19_Section_-1850X_270Y_matrix.csv -> ../Dataset1-t1/DPNMF_PolyA_2021Nov19_Section_-1850X_270Y_matrix.csv
drwxr-xr-x 2 fangming wollmanlab 4.0K Jul  8 12:52 .
-rw-r--r-- 1 fangming wollmanlab  165 Jul  8 14:09 TMG.json
-rw-r--r-- 1 fangming wollmanlab  54M Jul  8 14:09 cell.h5ad
-rw-r--r-- 1 fangming wollmanlab 6.8M Jul  8 14:09 isozone.h5ad
-rw-r--r-- 1 fangming wollmanlab 681K Jul  8 14:09 region.h5ad
-rw-r--r-- 1 fangming wollmanlab    3 Jul  8 14:09 Taxonomy_clusters.csv
-rw-r--r-- 1 fangming wollmanlab    3 Jul  8 14:09 Taxonomy_topics.csv
{"layers_graph": [[0, 1], [0, 2]], "layer_taxonomy_mapping": {"0": 0, "1": 0, "2": 1}, "Taxonomies": ["clusters", "topics"], "Layers": ["

In [3]:
TMG = TissueGraph.TissueMultiGraph(basepath=basepth,
                                   redo=True,  # create an empty one
                                   )

### Create a `cell` layer
Creating the cell layer loads the data from file, normalizes it, and builds an unclassified tissue graph.

In [4]:
%%time
TMG.create_cell_layer(metric = 'cosine')
logging.info(f"TMG has {len(TMG.Layers)} Layers")

INFO:root:In TMG.create_cell_layer
INFO:root:Started reading matrices and metadata
INFO:root:done reading files
INFO:root:77846 cells, minimum counts = 1872.0
INFO:root:building spatial graphs
INFO:root:Building spatial graphs for 1 sections


/bigstore/GeneralStorage/Data/dredFISH/Dataset1-t3/cell.h5ad


INFO:root:updating anndata
INFO:root:done building spatial graph
INFO:root:building feature graphs
INFO:root:building feature graph using cosine
INFO:root:done with create_cell_layer
INFO:root:TMG has 1 Layers


CPU times: user 1min 35s, sys: 1min 12s, total: 2min 47s
Wall time: 39.4 s
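
The log above reports `building feature graph using cosine`. As a rough illustration of what a cosine-based nearest-neighbor graph looks like, here is a minimal pure-Python sketch. It is not the dredFISH implementation: the helper names `cosine_similarity` and `knn_edges`, the value of `k`, and the toy data are all assumptions made for this example.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def knn_edges(features, k):
    """Return directed (cell, neighbor) edges to each cell's k most similar cells."""
    edges = []
    for i, u in enumerate(features):
        # Rank every other cell by cosine similarity to cell i, highest first.
        ranked = sorted(
            (j for j in range(len(features)) if j != i),
            key=lambda j: cosine_similarity(u, features[j]),
            reverse=True,
        )
        edges.extend((i, j) for j in ranked[:k])
    return edges


# Toy data: four cells, two features each (hypothetical values).
features = [[1.0, 0.0], [2.0, 0.1], [0.0, 1.0], [0.1, 3.0]]
edges = knn_edges(features, k=1)
print(edges)
```

Cosine similarity depends only on the direction of a feature vector, not its length, so cells 0 and 1 are paired even though cell 1 has roughly twice the total signal.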


### Create Geometries

In [5]:
%%time
TMG.add_geoms()

CPU times: user 4 µs, sys: 3 µs, total: 7 µs
Wall time: 12.6 µs


### Create cell types using the `OptimalLeidenKNNClassifier` applied to the `cell` layer

In [6]:
%%time
# Create the classifier
optleiden = Classification.OptimalLeidenKNNClassifier(TMG.Layers[0])
# train the classifier
optleiden.train(opt_res=11.5, opt_params={'iters':10, 'n_consensus':1})
# use the classifier to create types and add them to TMG using the Taxonomy created on the fly by the classifier
type_vec = optleiden.classify()
TMG.add_type_information(0, type_vec, optleiden.tax)

/bigstore/GeneralStorage/Data/dredFISH/Dataset1-t3/isozone.h5ad
Number of types: 145 initial entropy: -6.776631426958785 number of evals: 0
CPU times: user 20.7 s, sys: 91.2 ms, total: 20.8 s
Wall time: 20.8 s


### Create `isozone` layer

In [7]:
TMG.create_isozone_layer()
logging.info(f"TMG has {len(TMG.Layers)} Layers")

INFO:root:TMG has 2 Layers


/bigstore/GeneralStorage/Data/dredFISH/Dataset1-t3/isozone.h5ad
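
The notebook does not define an iso-zone. Assuming it means a contiguous group of spatially neighboring cells that share the same type, the idea can be sketched as a connected-components search restricted to same-type edges. The function `isozones`, the neighbor-dictionary input format, and the toy data are hypothetical and are not taken from `TissueGraph.py`.

```python
def isozones(neighbors, cell_types):
    """Assign a zone id to each cell; a zone is a connected group of same-type cells.

    neighbors: dict mapping cell index -> list of neighboring cell indices
    cell_types: list of type labels, one per cell
    """
    zone = [None] * len(cell_types)
    n_zones = 0
    for start in range(len(cell_types)):
        if zone[start] is not None:
            continue  # already absorbed into an earlier zone
        zone[start] = n_zones
        stack = [start]
        while stack:
            cell = stack.pop()
            for nb in neighbors.get(cell, []):
                # Grow the zone only across edges joining cells of the same type.
                if zone[nb] is None and cell_types[nb] == cell_types[start]:
                    zone[nb] = n_zones
                    stack.append(nb)
        n_zones += 1
    return zone


# Toy spatial graph: five cells in a line, 0-1-2-3-4 (hypothetical data).
neighbors = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
cell_types = [0, 0, 1, 0, 0]
zones = isozones(neighbors, cell_types)
print(zones)
```

In this toy example cells 0, 1 and cells 3, 4 have the same type but end up in different zones, because the type-1 cell between them breaks the contiguity.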


### Create `region` layer
To create regions, we first create a new classifier that works on local cell environments (local type abundance). We then classify each cell by its environment and use the resulting types to create a new layer.

In [8]:
%%time

n_topics_list = [2,5,10]
n_procs = 3 

topic_cls = Classification.TopicClassifier(TMG.Layers[0])
topic_cls.train(n_topics_list=n_topics_list, n_procs=n_procs)
topics = topic_cls.classify(topic_cls.Env)

INFO:root:Running LDA in parallel with 3 cores


CPU times: user 23.5 s, sys: 1.92 s, total: 25.4 s
Wall time: 3min 2s


In [9]:
%%time
TMG.create_region_layer(topics, topic_cls.tax)
logging.info(f"TMG has {len(TMG.Layers)} Layers")

INFO:root:TMG has 3 Layers


/bigstore/GeneralStorage/Data/dredFISH/Dataset1-t3/isozone.h5ad
/bigstore/GeneralStorage/Data/dredFISH/Dataset1-t3/region.h5ad
CPU times: user 378 ms, sys: 0 ns, total: 378 ms
Wall time: 373 ms
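
The topic classifier is described above as working on local cell environments, that is, local type abundance. A minimal sketch of such an environment matrix follows, assuming the environment of a cell is the count of each cell type among its spatial neighbors. Whether dredFISH also counts the cell itself, or normalizes the counts, is not stated in this notebook; `local_type_abundance` and the toy graph are hypothetical.

```python
from collections import Counter


def local_type_abundance(neighbors, cell_types, n_types):
    """For each cell, count the types found among its spatial neighbors.

    neighbors: dict mapping cell index -> list of neighboring cell indices
    cell_types: list of integer type labels, one per cell
    Returns one count vector of length n_types per cell.
    """
    env = []
    for cell in range(len(cell_types)):
        counts = Counter(cell_types[j] for j in neighbors.get(cell, []))
        env.append([counts.get(t, 0) for t in range(n_types)])
    return env


# Toy spatial graph: five cells in a line, 0-1-2-3-4 (hypothetical data).
neighbors = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
cell_types = [0, 0, 1, 2, 2]
env = local_type_abundance(neighbors, cell_types, n_types=3)
for row in env:
    print(row)
```

In topic-modeling terms, each row plays the role of a document and each cell type the role of a word, which is the kind of count matrix LDA (named in the log above) is typically fitted to.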


### Save to files
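The TMG is saved as a JSON config file (`TMG.json`), one AnnData file per layer (`cell.h5ad`, `isozone.h5ad`, `region.h5ad`), and one CSV dataframe per taxonomy (`Taxonomy_clusters.csv`, `Taxonomy_topics.csv`).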

In [10]:
%%time
TMG.save()

INFO:root:saved


CPU times: user 158 ms, sys: 44.9 ms, total: 203 ms
Wall time: 340 ms
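
Once saved, the config file can be inspected with the standard `json` module. The sketch below uses only the keys visible in the `head` output of `TMG.json` near the top of this notebook; the truncated `"Layers"` entry is left out. Reading the values of `layer_taxonomy_mapping` as indices into `Taxonomies` is an assumption, though it matches the two taxonomy files listed in the directory.

```python
import json

# Keys copied from the visible part of TMG.json; the truncated "Layers" entry is omitted.
config_text = '''
{"layers_graph": [[0, 1], [0, 2]],
 "layer_taxonomy_mapping": {"0": 0, "1": 0, "2": 1},
 "Taxonomies": ["clusters", "topics"]}
'''
config = json.loads(config_text)

# Resolve each layer index to the name of the taxonomy it uses
# (assumes the mapping values index into the "Taxonomies" list).
layer_to_taxonomy = {
    int(layer): config["Taxonomies"][tax_idx]
    for layer, tax_idx in config["layer_taxonomy_mapping"].items()
}
print(layer_to_taxonomy)
```

Under that reading, layers 0 and 1 share the `clusters` taxonomy and layer 2 uses `topics`. To inspect a real run, replace `config_text` with the contents of the `TMG.json` file under `basepth`.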
