TODO: 

2. Check the Taxonomy class: the constructor `__init__` takes inputs that are unused.
3. Verify that we can load and create another notebook for Viz. 
4. Create another classifier for "hybrid" taxonomy
5. Clean up commented-out functions in TissueGraph.py (they are mostly related to hybrid taxonomy)
6. Clean up dependencies and unused methods from Utils
7. Viz needs to support Geoms as list
8. Update Viz code (mostly comments)
9. Finish configuring Sphinx
10. Push to github




# Create and construct a TMG object from multiple slices

This example notebook performs basic TMG analysis operations. The inputs are the per-slice files generated by the Processing module.

In this example, the Taxonomy and Classifiers are not given; they are created on the fly based on optimal Leiden clustering.

Overall, the steps in the analysis are: 
1. Load and create an empty TMG object
2. Create a cell layer. This loads and normalizes the data and builds the graphs, but does not classify
3. Unsupervised learning using optimal Leiden to create a classifier and taxonomy
4. Create an iso-zone layer
5. Classify cellular neighborhoods into regions using topic modeling
6. Save to file

### Import and create an empty TMG

In [1]:
import logging  # used below by logging.basicConfig and logging.info

import igraph
import matplotlib.pyplot as plt

from dredFISH.Analysis.TissueGraph import *
from dredFISH.Visualization.Viz import *
from dredFISH.Analysis.Classification import *

import importlib
from dredFISH.Analysis import TissueGraph
importlib.reload(TissueGraph)
from dredFISH.Analysis.TissueGraph import *

from dredFISH.Analysis import Classification
importlib.reload(Classification)
from dredFISH.Analysis.Classification import *


from dredFISH.Utils import tmgu
importlib.reload(tmgu)

logging.basicConfig(
    format='%(asctime)s %(levelname)-8s %(message)s',
    level=logging.INFO,
    datefmt='%Y-%m-%d %H:%M:%S')

In [2]:
basepth = '/bigstore/GeneralStorage/Data/dredFISH/Dataset1'
!ls -alhtr $basepth
!head $basepth"/TMG.json"

total 81M
-rw-r--r-- 1 zach     wollmanlab  16M Jun  9 14:55 DPNMF_PolyA_2021Nov19_Section_-1850X_270Y_matrix.csv
-rw-r--r-- 1 zach     wollmanlab  20M Jun  9 14:55 DPNMF_PolyA_2021Nov19_Section_-1850X_270Y_metadata.csv
drwxrwxrwx 4 zach     wollmanlab   48 Jun 10 08:52 ..
drwxrwxr-x 2 rwollman wollmanlab 4.0K Jul  6 15:01 .
-rw-r--r-- 1 fangming wollmanlab    3 Jul  6 15:01 Taxonomy_topics.csv
-rw-rw-r-- 1 rwollman wollmanlab 638K Jul  7 09:35 region.h5ad
-rw-rw-r-- 1 rwollman wollmanlab  131 Jul  7  2022 TMG.json
-rw-rw-r-- 1 rwollman wollmanlab  40M Jul  7  2022 cells.h5ad
-rw-rw-r-- 1 rwollman wollmanlab 6.8M Jul  7  2022 isozones.h5ad
-rw-r--r-- 1 fangming wollmanlab    3 Jul  7  2022 Taxonomy_clusters.csv
{"layers_graph": [[0, 1]], "layer_taxonomy_mapping": {"0": 0, "1": 0}, "Taxonomies": ["clusters"], "Layers": ["cells", "isozones"]}
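The config printed above can be read with the standard `json` module to see which taxonomy each layer points to. The sketch below parses the exact string shown in the output instead of the file on disk. The helper name `layer_to_taxonomy` is hypothetical and not part of dredFISH, and reading `layer_taxonomy_mapping` as "layer index to taxonomy index" is an assumption based on the field names.

```python
import json

# JSON text copied verbatim from the `head` output above.
CONFIG_TEXT = (
    '{"layers_graph": [[0, 1]], '
    '"layer_taxonomy_mapping": {"0": 0, "1": 0}, '
    '"Taxonomies": ["clusters"], '
    '"Layers": ["cells", "isozones"]}'
)


def layer_to_taxonomy(config):
    """Hypothetical helper: map each layer name to its taxonomy name.

    Assumes the keys of `layer_taxonomy_mapping` are layer indices
    (stored as strings) and the values are indices into `Taxonomies`.
    """
    layers = config["Layers"]
    taxonomies = config["Taxonomies"]
    return {
        layers[int(layer_ix)]: taxonomies[tax_ix]
        for layer_ix, tax_ix in config["layer_taxonomy_mapping"].items()
    }


config = json.loads(CONFIG_TEXT)
mapping = layer_to_taxonomy(config)
print(mapping)
```

Under that reading, both layers in the previously saved config share the single `clusters` taxonomy.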

In [3]:
TMG = TissueMultiGraph(basepath=basepth, 
                       redo=True, # create an empty one
                      ) 

### Create a cell layer
Creating a cell layer loads the data from file, normalizes it, and creates an unclassified tissue graph.

In [4]:
%%time
TMG.create_cell_layer(metric = 'cosine')
logging.info(f"TMG has {len(TMG.Layers)} Layers")

2022-07-07 11:32:57 INFO     In TMG.create_cell_layer
2022-07-07 11:32:57 INFO     Started reading matrices and metadata
2022-07-07 11:32:58 INFO     done reading files
2022-07-07 11:32:58 INFO     77846 cells, minimum counts = 1872.0
2022-07-07 11:32:58 INFO     building spatial graphs
2022-07-07 11:32:58 INFO     Building spatial graphs for 1 sections


/bigstore/GeneralStorage/Data/dredFISH/Dataset1/cells.h5ad


2022-07-07 11:32:59 INFO     updating anndata
2022-07-07 11:33:00 INFO     done building spatial graph
2022-07-07 11:33:00 INFO     building feature graphs
2022-07-07 11:33:00 INFO     building feature graph using cosine
2022-07-07 11:33:29 INFO     done with create_cell_layer
2022-07-07 11:33:29 INFO     TMG has 1 Layers


CPU times: user 1min 25s, sys: 1min 29s, total: 2min 54s
Wall time: 31.8 s
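The log above reports a feature graph built with the cosine metric. As a rough illustration of what that means, the sketch below connects each cell to its nearest neighbors in cosine distance on toy data. It is a brute-force, standard-library-only sketch; the helper names, the value of `k`, and the brute-force search are assumptions for illustration and do not reflect how `create_cell_layer` is implemented.

```python
import math


def cosine_distance(u, v):
    """Return 1 - cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def knn_edges(features, k):
    """Hypothetical helper: brute-force k-nearest-neighbor edge list.

    Returns (i, j) pairs where j is one of the k cells closest to i
    in cosine distance. O(n^2), so only suitable for toy inputs.
    """
    edges = []
    for i, fi in enumerate(features):
        dists = [
            (cosine_distance(fi, fj), j)
            for j, fj in enumerate(features)
            if j != i
        ]
        dists.sort()
        edges.extend((i, j) for _, j in dists[:k])
    return edges


# Toy feature vectors: cells 0 and 1 point in nearly the same direction,
# as do cells 2 and 3, regardless of vector length.
features = [
    [1.0, 0.0],
    [2.0, 0.1],
    [0.0, 1.0],
    [0.1, 3.0],
]
edges = knn_edges(features, k=1)
print(edges)
```

Because cosine distance ignores vector magnitude, cells with similar expression profiles are linked even when their total counts differ.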


### Create Geometries

In [5]:
%%time
# TMG.add_geoms()

CPU times: user 3 µs, sys: 2 µs, total: 5 µs
Wall time: 11.2 µs


### Create OptLeiden classifier and train it using cell layer

In [6]:
%%time
# Create the classifier
optleiden = OptimalLeidenKNNClassifier(TMG.Layers[0])
# train the classifier
optleiden.train(opt_res=11.5, opt_params={'iters':10, 'n_consensus':1})
# use the classifier to create types and add them to TMG using the Taxonomy created on the fly by the classifier
type_vec = optleiden.classify()
TMG.add_type_informations(0, type_vec, optleiden.tax)

/bigstore/GeneralStorage/Data/dredFISH/Dataset1/isozones.h5ad
/bigstore/GeneralStorage/Data/dredFISH/Dataset1/isozones.h5ad
/bigstore/GeneralStorage/Data/dredFISH/Dataset1/isozones.h5ad
/bigstore/GeneralStorage/Data/dredFISH/Dataset1/isozones.h5ad
/bigstore/GeneralStorage/Data/dredFISH/Dataset1/isozones.h5ad
/bigstore/GeneralStorage/Data/dredFISH/Dataset1/isozones.h5ad
/bigstore/GeneralStorage/Data/dredFISH/Dataset1/isozones.h5ad
/bigstore/GeneralStorage/Data/dredFISH/Dataset1/isozones.h5ad
/bigstore/GeneralStorage/Data/dredFISH/Dataset1/isozones.h5ad
/bigstore/GeneralStorage/Data/dredFISH/Dataset1/isozones.h5ad
Number of types: 148 initial entropy: -6.804796642842886 number of evals: 0
CPU times: user 19.9 s, sys: 80.3 ms, total: 20 s
Wall time: 19.9 s


### Create isozone layer

In [7]:
TMG.create_isozone_layer()
logging.info(f"TMG has {len(TMG.Layers)} Layers")

2022-07-07 11:33:49 INFO     TMG has 2 Layers


/bigstore/GeneralStorage/Data/dredFISH/Dataset1/isozones.h5ad


### Create regions layer
To create regions, we first create a new classifier that works on local cell environments (local type abundance). We then classify cells by the region type of their environment and use these types to create a new layer.

In [8]:
%%time
topic_cls = TopicClassifier(TMG.Layers[0])
topic_cls.train(max_num_of_topics = 3)
topics = topic_cls.classify(topic_cls.Env)

2022-07-07 11:34:03 INFO     Running LDA n serial


CPU times: user 2min 46s, sys: 283 ms, total: 2min 46s
Wall time: 2min 46s
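The "local cell environment" used by the topic classifier can be pictured as a per-cell vector of neighbor type counts. The sketch below computes such vectors on a toy tissue. The helper `local_type_abundance` is hypothetical, and details such as whether a cell counts itself or how neighbors are chosen are assumptions; the notebook does not show how `topic_cls.Env` is actually computed.

```python
from collections import Counter


def local_type_abundance(neighbors, types, n_types):
    """Hypothetical helper: per-cell counts of neighbor types.

    neighbors[i] lists the spatial neighbors of cell i and
    types[i] is the integer type of cell i.
    Returns one row per cell holding n_types counts.
    """
    env = []
    for nbrs in neighbors:
        counts = Counter(types[j] for j in nbrs)
        env.append([counts.get(t, 0) for t in range(n_types)])
    return env


# Toy tissue: five cells on a line, two cell types.
types = [0, 0, 1, 1, 1]
neighbors = [[1], [0, 2], [1, 3], [2, 4], [3]]
env = local_type_abundance(neighbors, types, n_types=2)
print(env)
```

A topic model such as LDA can then treat each row as a "document" of type counts, so cells with similar neighborhoods receive the same topic.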


In [9]:
%%time
TMG.create_region_layer(topics, topic_cls.tax)
logging.info(f"TMG has {len(TMG.Layers)} Layers")

2022-07-07 11:36:36 INFO     TMG has 3 Layers


/bigstore/GeneralStorage/Data/dredFISH/Dataset1/isozones.h5ad
/bigstore/GeneralStorage/Data/dredFISH/Dataset1/region.h5ad
CPU times: user 364 ms, sys: 0 ns, total: 364 ms
Wall time: 359 ms


### Save to files
TMG is saved as a JSON config file, one AnnData file per layer, and one dataframe per taxonomy.

In [10]:
%%time
TMG.save()

2022-07-07 11:36:36 INFO     saved


CPU times: user 185 ms, sys: 0 ns, total: 185 ms
Wall time: 294 ms
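After saving, a quick check that the expected files exist can catch a partial write. The sketch below derives file names from a config dictionary and reports any that are missing. The naming pattern (`<layer>.h5ad`, `Taxonomy_<name>.csv`, `TMG.json`) is inferred from the directory listing earlier in this notebook, and both helper functions are hypothetical rather than part of dredFISH.

```python
import tempfile
from pathlib import Path


def expected_files(config):
    """Hypothetical helper: file names implied by a TMG config dict.

    The naming pattern is an assumption inferred from the directory
    listing shown earlier, not taken from the dredFISH source.
    """
    names = ["TMG.json"]
    names += [f"{layer}.h5ad" for layer in config["Layers"]]
    names += [f"Taxonomy_{tax}.csv" for tax in config["Taxonomies"]]
    return names


def missing_files(basepath, config):
    """Return the expected file names that do not exist under basepath."""
    base = Path(basepath)
    return [name for name in expected_files(config) if not (base / name).exists()]


# Layer and taxonomy names as printed from TMG.json earlier in the notebook.
config = {"Layers": ["cells", "isozones"], "Taxonomies": ["clusters"]}

# Demonstrate on a temporary directory that holds only two of the files.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("TMG.json", "cells.h5ad"):
        (Path(tmp) / name).touch()
    missing = missing_files(tmp, config)

print(missing)
```

In the notebook itself, the same check could be pointed at `basepth` with the config loaded from the saved `TMG.json`.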
