# Visualize network

The purpose of this notebook is to help visualize the metabolic network via Escher and ensure that the map and model are synchronized with each other.

## Using Escher online:
Brief steps on how to load the RBC metabolic network map using Escher online. 
1. Go to https://escher.github.io/#/ and set both the map and model options to *None*. 
2. Click the *Load map* button to open a blank canvas.
3. Use *(Ctrl+M or Cmd+M)* to load a COBRA model from a `json` file. Alternatively, click on the *Model* tab, then the *Load COBRA model JSON* option to load a new model.
    * Load the model file **"RBC-GEM.json"** from the `/model` directory.
4. Use *(Ctrl+O or Cmd+O)* to load an Escher map from a `json` file. Alternatively, click on the *Map* tab, then the *Load map JSON* option to load a new map.
    * Load the map file **"RBC-GEM.full.map.json"** from the `/map` directory.

## Using Escher via the Python API (Not recommended currently):
Using the Python API is currently not recommended because the current Escher dependencies conflict with recent versions of Jupyter. Users who want the Python API for Escher must therefore manage the package dependencies themselves.

The best way to do this is to install Escher in a separate virtual environment to prevent dependency conflicts.
1. Install Python 3.9
2. Run the following lines to install packages (order matters!):

    ```
    cd /code # Navigate to code directory where the pyproject.toml file is located.
    pip install markupsafe==2.0.1
    pip install notebook==6.5.6
    pip install escher # Don't worry about any other dependency conflicts
    pip install "." # or ".[all]" for all optional dependencies
    ```
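After running the install commands, it can help to confirm that the pinned versions actually ended up in the environment. The snippet below is a minimal sketch using only the standard library; the `check_pins` helper and the `PINNED` dictionary are illustrative names, not part of Escher or `rbc_gem_utils`.

```python
from importlib import metadata

# Pins copied from the install commands above.
PINNED = {"markupsafe": "2.0.1", "notebook": "6.5.6"}


def check_pins(pins, get_version=metadata.version):
    """Return {package: (expected, found)} for every pin that is not satisfied."""
    mismatches = {}
    for package, expected in pins.items():
        try:
            found = get_version(package)
        except metadata.PackageNotFoundError:
            found = None  # package is not installed in this environment
        if found != expected:
            mismatches[package] = (expected, found)
    return mismatches


# Report any pin that differs from what is installed (prints nothing if all match).
for package, (expected, found) in check_pins(PINNED).items():
    print(f"{package}: expected {expected}, found {found}")
```

An empty result means both pins are in place; a `None` in the "found" position means the package is missing from the active environment.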

Once dependency conflicts are worked out, this will be updated :) 

## Additional information
See the [documentation for Escher](https://escher.readthedocs.io/en/latest/) for additional details on how to use Escher.


King ZA, Dräger A, Ebrahim A, Sonnenschein N, Lewis NE, Palsson BO. Escher: A Web Application for Building, Sharing, and Embedding Data-Rich Visualizations of Biological Pathways. PLoS Comput Biol. 2015 Aug 27;11(8):e1004321. doi: 10.1371/journal.pcbi.1004321. PMID: 26313928; PMCID: PMC4552468.
## Setup
### Import packages

In [1]:
from collections import defaultdict
import pandas as pd
import matplotlib as mpl
import datetime
import json

from rbc_gem_utils import (
    ROOT_PATH,
    EXTERNAL_PATH,
    ANNOTATION_PATH,
    RESULTS_PATH,
    CURATION_PATH,
    MAP_PATH,
    MODEL_PATH,
    GEM_NAME,
    MAP_NAMES,
    read_rbc_model,
    build_string,
    split_string,
    get_annotation_df,
    show_versions,
    explode_column,
)
from rbc_gem_utils.util import ensure_iterable

show_versions()



Package Information
-------------------
rbc-gem-utils 0.0.1

Dependency Information
----------------------
beautifulsoup4                       4.12.3
bio                                   1.6.2
cobra                                0.29.0
depinfo                               2.2.0
kaleido                               0.2.1
matplotlib                            3.8.2
memote                               0.17.0
networkx                              3.2.1
notebook                              6.5.6
openpyxl                              3.1.2
pandas                                2.2.0
pre-commit                            3.6.0
pyvis                                 0.3.2
rbc-gem-utils[database,network,vis] missing
requests                             2.31.0
scipy                                1.12.0
seaborn                              0.13.2

Build Tools Information
-----------------------
pip        23.3.1
setuptools 68.2.2
wheel      0.41.2

Platform Information
-------------------

## Load RBC-GEM model
* Load the XML model to utilize annotations in any data mapping and visualization (guaranteed to have annotations).
* Use the JSON model to check against the model file that gets loaded into Escher.

In [2]:
model = read_rbc_model(filetype="xml")
model

| | |
|---|---|
| Name | RBC_GEM |
| Memory address | 7ff308145430 |
| Number of metabolites | 1967 |
| Number of reactions | 2790 |
| Number of genes | 653 |
| Number of groups | 74 |
| Objective expression | `1.0*NaKt - 1.0*NaKt_reverse_db47e` |
| Compartments | cytosol, extracellular space |
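To check the JSON model against the XML model, as suggested above, one option is to compare identifier sets. The following is a minimal sketch that reads identifiers from a COBRA JSON model dictionary using only the standard library; `json_model_ids` is a hypothetical helper and the toy identifiers are made up for illustration.

```python
import json


def json_model_ids(model_dict):
    """Collect reaction, metabolite, and gene IDs from a COBRA JSON model dict."""
    return {
        key: {entry["id"] for entry in model_dict.get(key, [])}
        for key in ("reactions", "metabolites", "genes")
    }


# Toy stand-in for the contents of a COBRA JSON model file (hypothetical IDs).
toy_json = json.loads(
    '{"reactions": [{"id": "R1"}, {"id": "R2"}],'
    ' "metabolites": [{"id": "a_c"}], "genes": []}'
)
ids_from_json = json_model_ids(toy_json)

# IDs as they might come from the XML model,
# e.g. set(model.reactions.list_attr("id")) in this notebook.
ids_from_xml = {"R1", "R2", "R3"}

# Reactions present in the XML model but absent from the JSON file.
print(sorted(ids_from_xml - ids_from_json["reactions"]))
```

With the real files, `toy_json` would be replaced by the result of `json.load` on the model JSON, and any non-empty difference would indicate that the two model files are out of step.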


## Load map JSON

In [3]:
save_figures = True

print(MAP_NAMES)
map_name = "RBC-GEM.full.map"
convert_to = "Escher"
log_level = "OFF"

map_json_filepath = f"{ROOT_PATH}{MAP_PATH}/{map_name}.json"

{'RBC-GEM.full.map'}


### Format data for viewing on map

In [4]:
import escher

escher.rc["never_ask_before_quit"] = True

In [5]:
reaction_data = {}
metabolite_data = {}
gene_data = {}
reaction_scale = []
metabolite_scale = []

#### Example: categorized by subsystems
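Reactions are assigned a value based on the category of their subsystem, and each value is mapped to a color on the reaction scale.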

In [6]:
df_pathways = pd.read_csv(
    f"{ROOT_PATH}{CURATION_PATH}/subsystems.tsv", sep="\t", dtype=str
)
df_pathways["category"] = df_pathways["category"].replace(
    "Metabolism of other amino acids", "Amino acid metabolism"
)

categories_to_exclude = {"Pseudoreactions", "Model total"}

cmax = 0.8
colors = {
    "Amino acid metabolism": mpl.colors.to_hex(mpl.cm.spring(cmax)),
    "Carbohydrate metabolism": mpl.colors.to_hex(mpl.cm.Greens(cmax)),
    "Lipid metabolism": mpl.colors.to_hex(mpl.cm.Blues(cmax)),
    "Metabolism of cofactors and vitamins": mpl.colors.to_hex(mpl.cm.summer(cmax)),
    "Nucleotide metabolism": mpl.colors.to_hex(mpl.cm.winter(cmax)),
    "Reactive species": mpl.colors.to_hex(mpl.cm.Reds(cmax)),
    "Transport reactions": mpl.colors.to_hex(mpl.cm.Purples(cmax)),
    "Other": mpl.colors.to_hex(mpl.cm.gray_r(cmax)),
}

reaction_scales_mapping = {
    subsystem: {"type": "value", "value": f"{val}", "color": f"{color}"}
    for val, (subsystem, color) in enumerate(colors.items())
}


# Map each reaction ID to the scale value of its subsystem category.
reaction_data = {}
df_cat_subsystems = df_pathways.groupby("category")["name"].agg(list)
for category, subsystem_list in df_cat_subsystems.items():
    if category in categories_to_exclude:
        continue
    # Categories without an assigned color are grouped under "Other".
    if category not in reaction_scales_mapping:
        category = "Other"

    reaction_data.update(
        {
            reaction.id: reaction_scales_mapping[category]["value"]
            for group in model.groups.get_by_any(subsystem_list)
            for reaction in group.members
        }
    )

reaction_scale = list(reaction_scales_mapping.values())
builder = escher.Builder(
    map_json=map_json_filepath,
    # model=model,
    model_json=f"{ROOT_PATH}{MODEL_PATH}/{GEM_NAME}.json",
)

for attr, value in dict(
    reaction_data=reaction_data,
    metabolite_data=metabolite_data,
    gene_data=gene_data,
    reaction_scale=reaction_scale,
    metabolite_scale=metabolite_scale,
).items():
    if value:
        setattr(builder, attr, value)
if save_figures:
    builder.save_html(
        f"{ROOT_PATH}{RESULTS_PATH}/network/html/{GEM_NAME}_categorized_subsystems.html"
    )

### Export data for web browser

In [7]:
# with open(f"{ROOT_PATH}{INTERIM_PATH}/reaction_map_data.json", "w") as map_datafile:
#     json.dump(reaction_data, map_datafile)

# with open(f"{ROOT_PATH}{INTERIM_PATH}/metabolite_map_data.json", "w") as map_datafile:
#     json.dump(metabolite_data, map_datafile)
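The commented-out cell above writes the data dictionaries to JSON so they can be loaded in the browser. A minimal round-trip sketch follows, assuming a flat `{reaction ID: value}` dictionary. The temporary directory stands in for the real output location, which is not defined in this notebook, and the reaction values are made up.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical reaction data: reaction ID -> value.
reaction_values = {"HEX1": 1, "PGI": 2}

# A temporary directory stands in for the real output directory.
output_dir = Path(tempfile.mkdtemp())
output_path = output_dir / "reaction_map_data.json"

with open(output_path, "w") as datafile:
    json.dump(reaction_values, datafile)

# Read the file back to confirm that it holds what was written.
with open(output_path) as datafile:
    reloaded = json.load(datafile)

print(reloaded == reaction_values)
```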

## Ensure model and map are synchronized

In [8]:
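with open(map_json_filepath, "r") as mapfile:
    map_json = json.load(mapfile)
    print(f"Loaded map as JSON object: {map_name}\n")

# An Escher map file is a two-element list: [metadata, map body].
metadata_json = map_json[0]
print(pd.Series(metadata_json))
# Despite its name, `model_json` holds the map body (reactions, nodes), not a COBRA model.
model_json = map_json[1]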

Loaded map as JSON object: RBC-GEM.full.map

map_name                                            RBC-GEM.full.map
map_id                                                  ieNthsfByp9v
map_description                      \nLast Modified Mon Feb 12 2024
homepage                                    https://escher.github.io
schema             https://escher.github.io/escher/jsonschema/1-0-0#
dtype: object
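The cells that follow rely on the map file being a two-element list whose second element contains `reactions` and `nodes`. The sketch below checks that layout on a toy map before pulling out identifiers; `split_escher_map` is a hypothetical helper and all identifiers in the toy map are invented.

```python
def split_escher_map(map_json):
    """Validate the two-part layout of an Escher map and return (metadata, body)."""
    if not isinstance(map_json, list) or len(map_json) != 2:
        raise ValueError("Expected a two-element list: [metadata, map body]")
    metadata, body = map_json
    missing = {"reactions", "nodes"} - set(body)
    if missing:
        raise ValueError(f"Map body is missing keys: {sorted(missing)}")
    return metadata, body


# Toy map with hypothetical identifiers.
toy_map = [
    {"map_name": "toy.map", "homepage": "https://escher.github.io"},
    {
        "reactions": {"0": {"bigg_id": "R1", "metabolites": [{"bigg_id": "a_c"}]}},
        "nodes": {
            "1": {"node_type": "metabolite", "bigg_id": "a_c"},
            "2": {"node_type": "midmarker"},
        },
    },
]

metadata, body = split_escher_map(toy_map)
reaction_ids = {reaction["bigg_id"] for reaction in body["reactions"].values()}
# Only metabolite nodes carry a bigg_id; other node types are skipped.
metabolite_ids = {
    node["bigg_id"]
    for node in body["nodes"].values()
    if node["node_type"] == "metabolite"
}
print(reaction_ids, metabolite_ids)
```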


### Check reactions
Use this code to find which reactions are missing from the map. Helpful for identifying any ID conversions or new additions that need to be made.

In [9]:
include_pseudoreactions = False
include_transports = False
map_rxns = {
    reaction_dict["bigg_id"] for reaction_dict in model_json["reactions"].values()
}
model_reactions = model.reactions
if not include_pseudoreactions:
    model_reactions = model_reactions.query(lambda x: not x.boundary)

if not include_transports:
    model_reactions = model_reactions.query(lambda x: len(x.compartments) == 1)

# Count transports before converting to a set of IDs; a set has no `query` method.
n_transports = len(model_reactions.query(lambda x: len(x.compartments) != 1))
model_reactions = set(model_reactions.list_attr("id"))

found_in_map = model_reactions.intersection(map_rxns)
missing_rxns_from_map = model_reactions.difference(map_rxns)

print(f"Number of biochemical reactions in model: {len(model_reactions)}")
if include_transports:
    print(f"Number of transport reactions in model: {n_transports}")
print(f"Number of reactions not in map: {len(missing_rxns_from_map)}\n")
missing_rxns_from_map

Number of biochemical reactions in model: 1715
Number of reactions not in map: 2



{'POOL_FA', 'POOL_FACOA'}
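The check above looks in one direction only: model reactions that are absent from the map. Looking in the other direction can surface stale identifiers on the map that need replacing. The sketch below shows both directions on toy sets; `compare_ids` is a hypothetical helper and `OLD_ID` is an invented identifier.

```python
def compare_ids(model_ids, map_ids):
    """Return shared IDs and the IDs unique to each side."""
    model_ids, map_ids = set(model_ids), set(map_ids)
    return {
        "shared": model_ids & map_ids,
        "missing_from_map": model_ids - map_ids,
        "missing_from_model": map_ids - model_ids,
    }


# Toy identifier sets; "OLD_ID" plays the role of an outdated ID left on the map.
result = compare_ids(
    model_ids={"HEX1", "PGI", "POOL_FA"},
    map_ids={"HEX1", "PGI", "OLD_ID"},
)
for label, ids in result.items():
    print(f"{label}: {sorted(ids)}")
```

For the reverse direction to be meaningful, the model side should be the unfiltered set of identifiers; otherwise anything drawn on the map but filtered out of the model set (for example transports) would be flagged as well.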

### Check metabolites
Use this code to find which metabolites are missing from the map. Helpful for identifying any ID conversions or new additions that need to be made.

In [10]:
compartments = ["c"]

map_mets = set(
    [
        node_dict["bigg_id"]
        for node_dict in model_json["nodes"].values()
        if node_dict["node_type"] == "metabolite"
    ]
)

# missing_from_model = map_mets.difference(model_mets) # Not always accurate due to
if not compartments:
    compartments = model.compartments
compartments = ensure_iterable(compartments)

for comp in compartments:
    model_mets = set(
        model.metabolites.query(lambda x: x.compartment == comp).list_attr("id")
    )
    found_in_map = model_mets.intersection(map_mets)
    missing_mets_from_map = model_mets.difference(map_mets)

    print(
        f"Number of metabolites in model ({model.compartments[comp]}): {len(model_mets)}"
    )
    print(
        f"Number of metabolites in map ({model.compartments[comp]}): {len(found_in_map)}"
    )
    print(
        f"Number of metabolites not in map ({model.compartments[comp]}): {len(missing_mets_from_map)}\n"
    )
missing_mets_from_map

Number of metabolites in model (cytosol): 1560
Number of metabolites in map (cytosol): 1541
Number of metabolites not in map (cytosol): 19



{'23cgamp_c',
 '5mthf_c',
 'admarg__L_c',
 'ag1_c',
 'ca2_c',
 'cl_c',
 'cobalt2_c',
 'ergoth_c',
 'inds_c',
 'k_c',
 'li1_c',
 'mg2_c',
 'mma_c',
 'mmarg__L_c',
 'na1_c',
 'ppp9_c',
 'tetiodthy__L_c',
 'triiodthy__L_c',
 'zn2_c'}

### Update Map
#### Metadata

In [11]:
metadata_json["map_name"] = map_name
metadata_json["map_description"] = datetime.date.today().strftime(
    "\nLast Modified %a %b %d %Y"
)
print(pd.Series(metadata_json))

map_name                                            RBC-GEM.full.map
map_id                                                  ieNthsfByp9v
map_description                      \nLast Modified Mon Feb 12 2024
homepage                                    https://escher.github.io
schema             https://escher.github.io/escher/jsonschema/1-0-0#
dtype: object


#### Map identifier replacement

In [12]:
rxns_id_replacements = {
    # Old ID on Map: New ID for Map
}
mets_id_replacements = {
    # Old ID on Map: New ID for Map
}

# Update identifiers
for reaction_dict in model_json["reactions"].values():
    reaction_dict["bigg_id"] = rxns_id_replacements.get(
        reaction_dict["bigg_id"], reaction_dict["bigg_id"]
    )
    for node_dict in reaction_dict["metabolites"]:
        node_dict["bigg_id"] = mets_id_replacements.get(
            node_dict["bigg_id"], node_dict["bigg_id"]
        )

for node_dict in model_json["nodes"].values():
    if node_dict["node_type"] != "metabolite":
        continue
    node_dict["bigg_id"] = mets_id_replacements.get(
        node_dict["bigg_id"], node_dict["bigg_id"]
    )
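With empty replacement dictionaries the cell above leaves the map unchanged, so the effect is easier to see on a small example. The sketch below applies the same replacement logic to a toy map body; the function name and all identifiers are hypothetical.

```python
def replace_map_ids(map_body, reaction_replacements, metabolite_replacements):
    """Rename reaction and metabolite IDs in an Escher map body, in place."""
    for reaction in map_body["reactions"].values():
        reaction["bigg_id"] = reaction_replacements.get(
            reaction["bigg_id"], reaction["bigg_id"]
        )
        # Metabolite IDs also appear inside each reaction entry.
        for metabolite in reaction["metabolites"]:
            metabolite["bigg_id"] = metabolite_replacements.get(
                metabolite["bigg_id"], metabolite["bigg_id"]
            )
    for node in map_body["nodes"].values():
        if node["node_type"] != "metabolite":
            continue  # markers and other node types have no metabolite ID to rename
        node["bigg_id"] = metabolite_replacements.get(node["bigg_id"], node["bigg_id"])
    return map_body


# Toy map body with hypothetical identifiers.
toy_body = {
    "reactions": {
        "0": {"bigg_id": "OLD_RXN", "metabolites": [{"bigg_id": "old_c"}, {"bigg_id": "b_c"}]}
    },
    "nodes": {
        "1": {"node_type": "metabolite", "bigg_id": "old_c"},
        "2": {"node_type": "midmarker"},
    },
}

replace_map_ids(toy_body, {"OLD_RXN": "NEW_RXN"}, {"old_c": "new_c"})
print(toy_body["reactions"]["0"]["bigg_id"], toy_body["nodes"]["1"]["bigg_id"])
```

Metabolite identifiers have to be updated in both places, inside each reaction and on the metabolite nodes, or the map ends up internally inconsistent.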

In [13]:
with open(f"{map_json_filepath}", "w") as mapfile:
    json.dump(obj=[metadata_json, model_json], fp=mapfile)
    print(f"Saved map as JSON object: {map_name}\n")

Saved map as JSON object: RBC-GEM.full.map



### Utilize EscherConverter to convert map to a standard format
* GitHub Page: https://github.com/draeger-lab/EscherConverter
* Instructions for the Escher converter: https://escher.readthedocs.io/en/stable/escherconverter.html

In [14]:
converter_path = f"{ROOT_PATH}{EXTERNAL_PATH}/EscherConverter-1.2.1.jar"
converted_output_filepath = {
    "SBGN": f"{ROOT_PATH}{MAP_PATH}/{map_name}.converted.sbgn",
    "SBML": f"{ROOT_PATH}{MAP_PATH}/{map_name}.converted.sbml",
    "Escher": f"{ROOT_PATH}{MAP_PATH}/{map_name}.converted.json",
}[convert_to]
!java -jar -Xms8G -Xmx8G -Duser.language=en "$converter_path" --input="$map_json_filepath" --output="$converted_output_filepath" --gui=false --log-level="$log_level"

objc[8881]: Class SheetSupport is implemented in both /Users/zhaiman/opt/github/RBC-GEM/code/notebooks/libquaqua64.jnilib (0x13a9c9258) and /Users/zhaiman/opt/github/RBC-GEM/code/notebooks/libquaqua64.dylib (0x13c458258). One of the two will be used. Which one is undefined.
------------------------------------------------------------
EscherConverter version 1.2.1
Copyright © 2015-2019 University of Tübingen
    Systems Biology Research Group.
This program comes with ABSOLUTELY NO WARRANTY.
This is free software, and you are welcome
to redistribute it under certain conditions.
See http://www.opensource.org/licenses/mit-license.php.
------------------------------------------------------------
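Outside a notebook, where the `!java` shell syntax is unavailable, the same call can be assembled as an argument list for `subprocess`. The sketch below only builds the command, using the flags that appear in the cell above. The function names and file paths are hypothetical, and the JVM options are placed before `-jar`, which is the conventional ordering.

```python
import subprocess


def build_converter_command(jar_path, input_path, output_path, log_level="OFF", memory="8G"):
    """Assemble the EscherConverter call as an argument list (no shell quoting needed)."""
    return [
        "java",
        f"-Xms{memory}",
        f"-Xmx{memory}",
        "-Duser.language=en",
        "-jar",
        str(jar_path),
        f"--input={input_path}",
        f"--output={output_path}",
        "--gui=false",
        f"--log-level={log_level}",
    ]


def run_converter(command):
    """Run the converter; raises CalledProcessError on a non-zero exit code."""
    return subprocess.run(command, check=True)


# Hypothetical paths for illustration only.
command = build_converter_command(
    jar_path="external/EscherConverter-1.2.1.jar",
    input_path="map/example.map.json",
    output_path="map/example.map.converted.json",
)
print(" ".join(command))
# run_converter(command)  # uncomment when Java and the JAR file are available
```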
