# RBC-GEM 0.1.1 --> 0.2.0
## Update iAB-RBC-283 model 

This notebook updates the iAB-RBC-283 reconstruction for the following purposes:
1. To serve as a base/draft model for the initial expansion of iAB-RBC-283 to RBC-GEM.
2. To serve as a model for comparison to the new reconstruction.
3. To link to the HumanGEM reconstruction annotations where possible in the model.

No stoichiometric changes are made in this minor update; however, BiGG identifiers are altered to enable better comparison to RBC-GEM. Furthermore, associations to Recon3D and HumanGEM are made here, because subsequent changes past 0.2.0 will aim to modernize the model to current COBRA standards before iterative expansion.

RBC-GEM 0.2.0 will remain identified as iAB-RBC-283 in order to help future comparisons.

Bordbar, A., Jamshidi, N. & Palsson, B.O. iAB-RBC-283: A proteomically derived knowledge-base of erythrocyte metabolism that can be used to simulate its physiological and patho-physiological states. BMC Syst Biol 5, 110 (2011). https://doi.org/10.1186/1752-0509-5-110

## Setup
### Import packages

In [6]:
from collections import Counter, defaultdict

import pandas as pd
import cobra
from cobra.core import Group

from rbc_gem_utils import (
    COBRA_CONFIGURATION, 
    ROOT_PATH, 
    show_versions,
    read_rbc_model, write_rbc_model, 
    read_cobra_model,
)
from rbc_gem_utils.util import split_string, build_string
from rbc_gem_utils.qc import standardardize_metabolite_formulas

# Display the package versions from the last successful run of this notebook
show_versions()


Package Information
-------------------
rbc-gem-utils 0.0.1

Dependency Information
----------------------
cobra      0.29.0
depinfo     2.2.0
matplotlib  3.8.2
memote     0.16.1
notebook    7.0.6
requests   2.31.0
scipy      1.11.4
seaborn    0.13.0

Build Tools Information
-----------------------
pip        23.3.1
setuptools 68.2.2
wheel      0.41.2

Platform Information
--------------------
Darwin  22.6.0-x86_64
CPython        3.12.0


### Define configuration
#### COBRA Configuration

In [7]:
COBRA_CONFIGURATION

Attribute,Description,Value
solver,Mathematical optimization solver,gurobi
tolerance,"General solver tolerance (feasibility, integrality, etc.)",1e-07
lower_bound,Default reaction lower bound,-1000.0
upper_bound,Default reaction upper bound,1000.0
processes,Number of parallel processes,15
cache_directory,Path for the model cache,/Users/zhaiman/Library/Caches/cobrapy
max_cache_size,Maximum cache size in bytes,104857600
cache_expiration,Model cache expiration time in seconds (if any),


## Load RBC-GEM model
### Version: 0.1.1

In [8]:
model = read_rbc_model('xml')
model

0,1
Name,iAB_RBC_283
Memory address,14a29c890
Number of metabolites,342
Number of reactions,469
Number of genes,346
Number of groups,41
Objective expression,1.0*NaKt - 1.0*NaKt_reverse_db47e
Compartments,"cytosol, extracellular space"


## Extract identifiers from HumanGEM

Use HumanGEM (1.18.0) to get the MetabolicAtlas identifiers for reactions and metabolites.


In [9]:
HumanGEM = read_cobra_model(f"{ROOT_PATH}/data/raw/Human-GEM.xml")
HumanGEM

0,1
Name,HumanGEM
Memory address,14b339ee0
Number of metabolites,8456
Number of reactions,12995
Number of genes,2889
Number of groups,148
Objective expression,1.0*MAR13082 - 1.0*MAR13082_reverse_11d67
Compartments,"Cytosol, Extracellular, Lysosome, Endoplasmic reticulum, Mitochondria, Peroxisome, Golgi apparatus, Nucleus, Inner mitochondria"


### Map model BiGG ID metabolites to MetabolicAtlas via HumanGEM

In [10]:
model_metabolite_ids = {metabolite.id.replace(f"_{metabolite.compartment}", "") for metabolite in model.metabolites}
bigg_metabolites = set()
human_gem_mapping = {}
for metabolite in HumanGEM.metabolites:
    bigg_ids = metabolite.annotation.get("bigg.metabolite")
    if not bigg_ids:
        continue
    if isinstance(bigg_ids, str):
        bigg_ids = [bigg_ids]
    bigg_ids = set(bigg_ids)
    bigg_metabolites.update(bigg_ids)
    for bigg_id in bigg_ids:
        if bigg_id in human_gem_mapping:
            human_gem_mapping[bigg_id] += [metabolite.id[:-1]]
        else:
            human_gem_mapping[bigg_id] = [metabolite.id[:-1]]


print(f"Number of metabolites in RBC-GEM (excluding compartments): {len(model_metabolite_ids)}")

intersection = sorted(bigg_metabolites.intersection(model_metabolite_ids))
print(f"Number of metabolites that could be found: {len(intersection)}")
print()

id_mapping_dict = {}
for rbc_metabolite in intersection:
    metabolite_ids = set(human_gem_mapping.get(rbc_metabolite, []))
    if len(metabolite_ids) > 1:
        print(f'"{rbc_metabolite}": "{build_string(metabolite_ids)}",')
    id_mapping_dict[rbc_metabolite] = build_string(metabolite_ids)

manual_updates_and_corrections = {
    # Manual corrections for metabolite mappings (currently empty).
 
}

id_mapping_dict.update(manual_updates_and_corrections)
id_mapping_df = pd.DataFrame.from_dict(id_mapping_dict, orient="index")
id_mapping_df = id_mapping_df.reset_index(drop=False)
id_mapping_df.columns = ["bigg", "metatlas"]

id_mapping_df.to_csv(
    f"{ROOT_PATH}/data/interim/BiGGMetAtlasMetabolites.tsv",
    sep="\t",
)
id_mapping_df

Number of metabolites in RBC-GEM (excluding compartments): 267
Number of metabolites that could be found: 205



Unnamed: 0,bigg,metatlas
0,13dpg,MAM00247
1,23dpg,MAM00569
2,2kmb,MAM01016
3,2pg,MAM00674
4,35cgmp,MAM01433
...,...,...
200,xmp,MAM03150
201,xu5p__D,MAM01761
202,xylt,MAM03155
203,xylu__D,MAM01759
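
The loop above inverts the HumanGEM annotations: each BiGG identifier becomes a key that points to the compartment-free HumanGEM identifiers annotated with it. A minimal sketch of the same idea on plain Python data follows. The identifiers and the helper name `invert_bigg_annotations` are hypothetical and exist only for illustration; real annotations come from `metabolite.annotation` on a loaded model.

```python
from collections import defaultdict

# Hypothetical stand-ins for HumanGEM metabolites: (model ID, annotation dict).
# An annotation value may be a single string or a list of strings.
toy_metabolites = [
    ("MAM00001c", {"bigg.metabolite": "atp"}),
    ("MAM00001e", {"bigg.metabolite": ["atp"]}),
    ("MAM00002c", {"bigg.metabolite": ["h2o"]}),
    ("MAM00003c", {}),  # no BiGG annotation, so it is skipped
]


def invert_bigg_annotations(metabolites, key="bigg.metabolite"):
    """Map each BiGG ID to the sorted, compartment-free model IDs annotated with it."""
    mapping = defaultdict(set)
    for met_id, annotation in metabolites:
        bigg_ids = annotation.get(key)
        if not bigg_ids:
            continue
        if isinstance(bigg_ids, str):
            bigg_ids = [bigg_ids]
        for bigg_id in bigg_ids:
            # Drop the one-letter compartment suffix, as `metabolite.id[:-1]` does.
            mapping[bigg_id].add(met_id[:-1])
    return {bigg_id: sorted(ids) for bigg_id, ids in mapping.items()}


toy_mapping = invert_bigg_annotations(toy_metabolites)
print(toy_mapping)
```

Collecting into a `set` removes the duplicates that arise when the same species appears in several compartments, which the notebook achieves later by wrapping the list in `set(...)`.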


### Map model BiGG ID reactions to MetabolicAtlas via HumanGEM

In [11]:
model_reaction_ids = {reaction.id for reaction in model.reactions}

bigg_reactions = set()
human_gem_mapping = {}
for reaction in HumanGEM.reactions:
    bigg_ids = reaction.annotation.get("bigg.reaction")
    if not bigg_ids:
        continue
    if isinstance(bigg_ids, str):
        bigg_ids = [bigg_ids]
    bigg_ids = set(bigg_ids)
    bigg_reactions.update(bigg_ids)
    for bigg_id in bigg_ids:
        if bigg_id in human_gem_mapping:
            human_gem_mapping[bigg_id] += [reaction.id]
        else:
            human_gem_mapping[bigg_id] = [reaction.id]


print(f"Number of reactions in RBC-GEM (excluding boundaries): {len(model_reaction_ids)}")

intersection = sorted(bigg_reactions.intersection(model_reaction_ids))
print(f"Number of reactions that could be found: {len(intersection)}")
print()
id_mapping_dict = {}
for rbc_reaction in intersection:
    reaction_ids = human_gem_mapping.get(rbc_reaction, [])
    if len(reaction_ids) > 1:
        print(f"Check {rbc_reaction} for false mappings")
    id_mapping_dict[rbc_reaction] = build_string(reaction_ids)

manual_updates_and_corrections = {
    # Selected reactions below needed corrections.
    "CAATPS": "MAR07629",
    "CAT": "MAR03980",
    "CHLPCTD": "MAR00638",
    "CHOLK": "MAR00636",
    "CHOLt4": "MAR07734",
    "COt": "MAR07798",
    "CYStec": "MAR05084",
    "CYTK1": "MAR04024",
    "EX_ac_e": "MAR09086",
    "EX_adrnl_e": "MAR09095",
    "EX_ala__L_e": "MAR09061",
    "EX_arg__L_e": "MAR09066",
    "EX_chol_e": "MAR09083",
    "EX_cl_e": "MAR09150",
    "EX_dopa_e": "MAR09092",
    "EX_fe2_e": "MAR09076",
    "EX_gal_e": "MAR09140",
    "EX_gam_e": "MAR09168",
    "EX_glc__D_e": "MAR09034",
    "EX_gln__L_e": "MAR09063",
    "EX_h2o_e": "MAR09047",
    "EX_h_e": "MAR09079",
    "EX_hco3_e": "MAR09078",
    "EX_k_e": "MAR09081",
    "EX_lnlc_e": "MAR09035",
    "EX_met__L_e": "MAR09042",
    "EX_nac_e": "MAR09142",
    "EX_nh4_e": "MAR11420",
    "EX_nrpphr_e": "MAR09093",
    "EX_ocdcea_e": "MAR00650",
    "EX_phe__L_e": "MAR09043",
    "EX_pyr_e": "MAR09133",
    "EX_ribflv_e": "MAR09143",
    "EX_thmmp_e": "MAR09105",
    "FBA": "MAR04375",
    "FBP26": "MAR04706",
    "FUM": "MAR04408",
    "G6PDH2r": "MAR08971",
    "GALKr": "MAR04130",
    "GAMt1r": "MAR04996",
    "GAPD": "MAR04373",
    "GLNt4": "MAR05308",
    "MEPIVESSte": "MAR08922",
    "NADK": "MAR04269",
    "NH4t3r": "MAR01534",
    "PPPGO": "MAR11316",
    # Selected reactions needed manual updates
    "ACP1_FMN": "MAR06507",
    "ADRNLtu": "MAR09192",
    "ARD": "MAR05389",
    "ARGN": "MAR03816",
    "BILIRBU": "MAR11321",
    "C160CPT2rbc": "MAR02626",
    "C181CPT2rbc": "MAR11310",
    "CHLP": "MAR08424",
    "DGULND": "MAR08353",
    "DHAAt1r": "MAR08846",
    "DOPAMT": "MAR06763",
    "DPGM": "MAR04371",
    "DPGase": "MAR04372",
    "ENOPH": "MAR05387",
    "ETHAt": "MAR07896",
    "GALOR": "MAR08766",
    "GALT": "MAR08767",
    "GLCt1": "MAR05029",
    "GMPR": "MAR04419",
    "GPDDA1": "MAR00635",
    "GULND": "MAR06537",
    "HCO3_CLt": "MAR06525",
    "LEUKTRA4t": "MAR06254",
    "LEUKTRB4t": "MAR06255",
    "LNLCCPT2rbc": "MAR02742",
    "LTA4H": "MAR01080",
    "MDRPD": "MAR05386",
    "MI1345PP": "MAR06565",
    "MI145PK": "MAR06563",
    "MI145PP": "MAR06560",
    "MTRI": "MAR05385",
    "NADPN": "MAR07627",
    "NMNHYD": "MAR04264",
    "ORNDC": "MAR04212",
    "RNMK": "MAR04265",
    "SALMCOM": "MAR06750",
    "SALMCOM2": "MAR06746",
    "SBTD_D2": "MAR04315",
    "SBTR": "MAR04316",
    "SPMDtex2": "MAR04994",
    "TDP": "MAR04208",
    "THMTP": "MAR04207",
    "TMDPK": "MAR04204",
    "TMDPPK": "MAR04206",
    "UNK3": "MAR05391",
    "UPPDC1": "MAR04750",
    "XYLK": "MAR04595",
    "XYLTD_D": "MAR04593",
}

id_mapping_dict.update(manual_updates_and_corrections)
id_mapping_df = pd.DataFrame.from_dict(id_mapping_dict, orient="index")
id_mapping_df = id_mapping_df.reset_index(drop=False)
id_mapping_df.columns = ["bigg", "metatlas"]
# id_mapping_df = id_mapping_df.loc[:, id_mapping_df.columns[::-1]]

id_mapping_df.to_csv(
    f"{ROOT_PATH}/data/interim/BiGGMetAtlasReactions.tsv",
    sep="\t",
)
id_mapping_df

Number of reactions in RBC-GEM (excluding boundaries): 469
Number of reactions that could be found: 276

Check CAATPS for false mappings
Check CAT for false mappings
Check CHLPCTD for false mappings
Check CHOLK for false mappings
Check CHOLt4 for false mappings
Check COt for false mappings
Check CYStec for false mappings
Check CYTK1 for false mappings
Check EX_ac_e for false mappings
Check EX_adrnl_e for false mappings
Check EX_ala__L_e for false mappings
Check EX_arg__L_e for false mappings
Check EX_chol_e for false mappings
Check EX_cl_e for false mappings
Check EX_dopa_e for false mappings
Check EX_fe2_e for false mappings
Check EX_gal_e for false mappings
Check EX_gam_e for false mappings
Check EX_glc__D_e for false mappings
Check EX_gln__L_e for false mappings
Check EX_h2o_e for false mappings
Check EX_h_e for false mappings
Check EX_hco3_e for false mappings
Check EX_k_e for false mappings
Check EX_lnlc_e for false mappings
Check EX_met__L_e for false mappings
Check EX_nac_e for 

Unnamed: 0,bigg,metatlas
0,3MOXTYRESSte,MAR11306
1,4PYRDX,MAR08103
2,5AOPt2,MAR11307
3,ACALDt,MAR04948
4,ACGAM2E,MAR04527
...,...,...
317,TMDPPK,MAR04206
318,UNK3,MAR05391
319,UPPDC1,MAR04750
320,XYLK,MAR04595
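
The cell above lets curated entries override the automatic mapping through `dict.update`. One way to confirm that every ambiguous automatic mapping was covered by a manual entry is sketched below. The reaction identifiers, the `";"` separator, and the helper name `resolve_mappings` are assumptions made for illustration, not part of `rbc_gem_utils`.

```python
# Hypothetical automatic mapping: BiGG reaction ID -> candidate MetabolicAtlas IDs.
automatic = {
    "RXN_A": ["MAR00001"],
    "RXN_B": ["MAR00002", "MAR00003"],  # ambiguous
    "RXN_C": ["MAR00004", "MAR00005"],  # ambiguous
}
# Hypothetical curated choices that take precedence over the automatic mapping.
manual = {"RXN_B": "MAR00003"}


def resolve_mappings(automatic, manual, sep=";"):
    """Apply manual overrides and list ambiguous IDs that no override covers."""
    resolved = {bigg_id: sep.join(ids) for bigg_id, ids in automatic.items()}
    resolved.update(manual)
    unresolved = sorted(
        bigg_id
        for bigg_id, ids in automatic.items()
        if len(ids) > 1 and bigg_id not in manual
    )
    return resolved, unresolved


resolved, unresolved = resolve_mappings(automatic, manual)
print(resolved)
print(unresolved)
```

An empty `unresolved` list would indicate that each "Check ... for false mappings" message has a matching manual decision.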


## Update genes and gene reaction rules
1. Update gene IDs to match Recon3D with inclusion of NCBI Gene ID.
2. Manually update genes without NCBI numbers and genes lumped together.
3. Remove identical genes from `DAGK` gene reaction rule
4. Sort gene reaction components
5. Update gene reaction rules accordingly for new genes and identifiers

#### Generate gene ID mapping

In [12]:
id_mapping_dict = {}
manual_updates = {}
for gene in model.genes:
    ncbigene = gene.annotation.get("ncbigene")
    if ncbigene is None:
        # NCBI Gene ID is not annotated, will need manual updating
        continue
    gname, isonum = gene.id.split("_")
    id_mapping_dict[gene.id] = f"{ncbigene}_{isonum}"

# Annotations mapped in Recon3D but not iAB-RBC-283
id_mapping_dict["ThmtP_AT1"] = "79178_AT1"
id_mapping_dict["Cgi_14_AT1"] = "51005_AT1"
id_mapping_dict["Rhag_AT1"] = "6005_AT1"
id_mapping_dict["Rhbg_AT1"] = "57127_AT1"
id_mapping_dict["Abcc4_AT1"] = "10257_AT1"
id_mapping_dict["Slc2a1_AT1"] = "6513_AT1"
id_mapping_dict["Slc2a2_AT1"] = "6514_AT1"
id_mapping_dict["Slc2a3_AT1"] = "6515_AT1"
id_mapping_dict["Slc2a4_AT1"] = "6517_AT1"
id_mapping_dict["Slc2a5_AT1"] = "6518_AT1"
id_mapping_dict["Slc2a7_AT1"] = "155184_AT1"
id_mapping_dict["Slc2a8_AT1"] = "29988_AT1"
id_mapping_dict["Slc2a11_AT1"] = "66035_AT1"
id_mapping_dict["Slc4a1_AT1"] = "6521_AT1"
id_mapping_dict["Slc5a1_AT1"] = "6523_AT1"
id_mapping_dict["Slc5a3_AT1"] = "6526_AT1"
id_mapping_dict["Slc5a5_AT1"] = "6528_AT1"
id_mapping_dict["Slc12a7_AT1"] = "10723_AT1"
id_mapping_dict["Slc14a1_AT1"] = "6563_AT1"
id_mapping_dict["Slc29a1_AT1"] = "2030_AT1"
id_mapping_dict["Slc29a2_AT1"] = "3177_AT1"
id_mapping_dict["Flj22761_AT1"] = "80201_AT1"

# Pseudogene, does not seem to have annotations mapped 
id_mapping_dict["Gkp3_AT1"] = "2713_AT1"

# Rename genes using the ID map, then add the lumped genes to the ID map
cobra.manipulation.modify.rename_genes(model, id_mapping_dict)

# Two separate genes lumped into one
id_mapping_dict["Gucy1A2B2_AT1"] = "2977_AT1 and 2974_AT1"
id_mapping_dict["Gucy1A2B3_AT1"] = "2977_AT1 and 2983_AT1"
id_mapping_dict["Gucy1A3B3_AT1"] = "2982_AT1 and 2983_AT1"
id_mapping_dict["Gucy1A3B2_AT1"] = "2982_AT1 and 2974_AT1"
id_mapping_dict["NME12_AT1"] = "4830_AT1 and 4831_AT1"
id_mapping_dict["NME12_AT2"] = "4830_AT2 and 4831_AT1"
id_mapping_dict["Mat2ab1"] = "4144_AT1 and 27430_AT1"
id_mapping_dict["Mat2ab2"] = "4144_AT1 and 27430_AT2"
id_mapping_dict["Atp1a1b1"] = "476_AT1 and 481_AT1"
id_mapping_dict["Atp1a1b2"] = "476_AT1 and 482_AT1"
id_mapping_dict["Atp1a1b3"] = "476_AT1 and 483_AT1"
id_mapping_dict["Atp1a1b4"] = "476_AT1 and 23439_AT1"
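
The new gene identifiers keep the isoform suffix and replace the gene name with the NCBI Gene ID. The sketch below shows that construction with `str.rsplit`, which splits only on the last underscore. This is an illustrative variant and not the notebook's code: `gene.id.split("_")` unpacks into exactly two parts and would raise a `ValueError` if an identifier with an extra underscore, such as `Cgi_14_AT1`, reached that line. The helper name `to_ncbi_gene_id` is hypothetical.

```python
def to_ncbi_gene_id(gene_id, ncbigene):
    """Replace the name part of `<name>_<isoform>` with an NCBI Gene ID."""
    # Split on the last underscore only, so names containing "_" stay intact.
    _name, isoform = gene_id.rsplit("_", 1)
    return f"{ncbigene}_{isoform}"


print(to_ncbi_gene_id("Rpe_AT2", "6120"))
print(to_ncbi_gene_id("Cgi_14_AT1", "51005"))
```

Both results agree with the mappings recorded in this notebook (`6120_AT2` and `51005_AT1`).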

In [13]:
id_mapping_df = pd.DataFrame.from_dict(id_mapping_dict, orient="index")
id_mapping_df = id_mapping_df.reset_index(drop=False)
id_mapping_df.columns = ["geneRetired", "genes"]
id_mapping_df = id_mapping_df.loc[:, id_mapping_df.columns[::-1]]
id_mapping_df["genes"] = id_mapping_df["genes"].str.split(" and ")
id_mapping_df = id_mapping_df.explode("genes")


previous_id_mapping_df = pd.read_csv(
    f"{ROOT_PATH}/data/deprecatedIdentifiers/replacedGenes.tsv",
    sep="\t",
    index_col=0,
)

for idx, row in id_mapping_df.iterrows():
    new_id, retiring = row[["genes", "geneRetired"]]
    previously_retired = previous_id_mapping_df[previous_id_mapping_df["genes"] == retiring]
    retired_set_of_ids = set([retiring])
    if not previously_retired.empty:
        # Get all previously retired IDs
        retired_set_of_ids.update(previously_retired["geneRetired"].apply(split_string).item())
        # Pulling the ID out of retirement
        if new_id in retired_set_of_ids:
            retired_set_of_ids.remove(new_id)
        retired_set_of_ids.add(retiring)
    id_mapping_df.loc[idx, "geneRetired"] = build_string(retired_set_of_ids, sep=";")


id_mapping_df.to_csv(
    f"{ROOT_PATH}/data/processed/replacedGenes.tsv",
    sep="\t",
)
id_mapping_df

Unnamed: 0,genes,geneRetired
0,54981_AT1,Nrk1_AT1
1,6120_AT2,Rpe_AT2
2,6120_AT1,Rpe_AT1
3,22934_AT1,Rpia_AT1
4,118881_AT1,Comtd1_AT1
...,...,...
343,482_AT1,Atp1a1b2
344,476_AT1,Atp1a1b3
344,483_AT1,Atp1a1b3
345,476_AT1,Atp1a1b4
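
The retirement loop above merges the identifier being retired with any identifiers retired in earlier versions, and drops the new identifier from that set if it is being brought back into use. A minimal sketch of that bookkeeping on plain Python sets follows. The helper name `merge_retired_ids` and the identifier `RPE_1` are hypothetical.

```python
def merge_retired_ids(new_id, retiring, previously_retired=()):
    """Return the sorted retired IDs to record against `new_id`."""
    retired = set(previously_retired)
    # An ID pulled out of retirement must not be listed as retired.
    retired.discard(new_id)
    # The ID being replaced now is always recorded.
    retired.add(retiring)
    return sorted(retired)


first_pass = merge_retired_ids("6120_AT1", "Rpe_AT1")
with_history = merge_retired_ids("6120_AT1", "Rpe_AT1", ["RPE_1", "6120_AT1"])
print(first_pass)
print(with_history)
```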


#### Separate "Lumped" genes into individual genes
These are reactions containing GPRs that have both "AND" and "OR" in them. 
The "AND" genes are lumped together, likely due to software compatibility with early COBRA versions that had difficulty parsing rules.

In [14]:
# Reaction GUACYC: Recon3D top, iAB-RBC-283 bottom
# 4881_AT1 or 4882_AT1 or 4882_AT2 or   2984_AT1 or   3000_AT1 or   2986_AT1 or (2977_AT1 and 2974_AT1) or (2977_AT1 and 2983_AT1) or (2982_AT1 and 2974_AT1) or (2982_AT1 and 2983_AT1)
# Npr1_AT1 or Npr2_AT1 or Npr2_AT2 or Gucy2C_AT1 or Gucy2D_AT1 or Gucy2F_AT1 or (    Gucy1A2B2_AT1    ) or (    Gucy1A2B3_AT1    ) or (    Gucy1A3B2_AT1    ) or (    Gucy1A3B3_AT1    )
reaction = model.reactions.get_by_id("GUACYC")
reaction.gene_reaction_rule = "(2974_AT1 and 2977_AT1) or (2974_AT1 and 2982_AT1) or (2977_AT1 and 2983_AT1) or (2982_AT1 and 2983_AT1) or 2984_AT1 or 2986_AT1 or 3000_AT1 or 4881_AT1 or 4882_AT1 or 4882_AT2"

# (4830_AT1 and 4831_AT1) or (4830_AT2 and 4831_AT1) or 4832_AT1 or 10201_AT1 or 29922_AT1 or 29922_AT2
# (      NME12_AT1      ) or (      NME12_AT2      ) or NME3_AT1 or  NME6_AT1 or  NME7_AT1 or  NME7_AT2
for reaction in ["NDPK1", "NDPK2", "NDPK3"]:
    reaction = model.reactions.get_by_id(reaction)
    reaction.gene_reaction_rule = "(4830_AT1 and 4831_AT1) or (4830_AT2 and 4831_AT1) or 4832_AT1 or 10201_AT1 or 29922_AT1 or 29922_AT2"

#  4143_AT1 or (4144_AT1 and 27430_AT1) or (4144_AT1 and 27430_AT2)
# Mat1a_AT1 or (       Mat2ab1        ) or (       Mat2ab2        )
reaction = model.reactions.get_by_id("METAT")
reaction.gene_reaction_rule = "4143_AT1 or (4144_AT1 and 27430_AT1) or (4144_AT1 and 27430_AT2)"

# (476_AT1 and 481_AT1) or (476_AT1 and  482_AT1) or (476_AT1 and 483_AT1) or (476_AT1 and 23439_AT1)
# (     Atp1a1b1      ) or (     Atp1a1b2       ) or (     Atp1a1b3      ) or (     Atp1a1b4      )
reaction = model.reactions.get_by_id("NaKt")
reaction.gene_reaction_rule = "(476_AT1 and 481_AT1) or (476_AT1 and 482_AT1) or (476_AT1 and 483_AT1) or (476_AT1 and 23439_AT1)"

# Remove old lumped genes from model
cobra.manipulation.delete.remove_genes(
    model,
    gene_list=model.genes.query(lambda x: not x.reactions),
    remove_reactions=False
)
model.genes.query(lambda x: not x.reactions)

[]
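
The rules above were rewritten by hand. For a larger set of lumped genes, the substitution could be automated by replacing whole tokens in the rule string, as sketched here. The input rule format and the helper name `expand_lumped_genes` are assumptions; the two `Mat2ab` entries are taken from the ID map above.

```python
import re

# Lumped gene -> the individual genes it stands for (from the ID map above).
LUMPED = {
    "Mat2ab1": "4144_AT1 and 27430_AT1",
    "Mat2ab2": "4144_AT1 and 27430_AT2",
}


def expand_lumped_genes(rule, lumped):
    """Replace each lumped gene token in `rule` with a parenthesized AND clause."""

    def substitute(match):
        token = match.group(0)
        return f"({lumped[token]})" if token in lumped else token

    # Tokens are runs of letters, digits, and underscores; "or"/"and" pass through.
    return re.sub(r"\w+", substitute, rule)


expanded = expand_lumped_genes("4143_AT1 or Mat2ab1 or Mat2ab2", LUMPED)
print(expanded)
```

Matching whole tokens avoids accidental replacement inside a longer identifier, which a plain `str.replace` cannot guarantee.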

#### Fix gene reaction rules with duplicate genes
Each `DAGK` reaction lists one gene twice in its gene reaction rule.

In [15]:
# Auto fixed
for reaction in model.reactions.query(lambda x: x.gene_reaction_rule):
    or_split = reaction.gene_reaction_rule.split(" or ")
    if Counter(or_split) != Counter(set(or_split)):
        # Identify extra gene
        extra = Counter(or_split) - Counter(set(or_split))
        if extra:
            print(f"Reaction: {reaction.id}: {extra}")
        for gene, num_extra in extra.items():
            reaction.gene_reaction_rule = reaction.gene_reaction_rule.replace(f"{gene} or ", "", num_extra)

# # DAGK reactions have an extra gene
# # Manual fix
# model.reactions.get_by_id("DAGK_hs_16_0_16_0").gene_reaction_rule = "1606_AT1 or 1607_AT1 or 1607_AT2 or 1608_AT1 or 1609_AT1 or 8525_AT1 or 8525_AT2 or 8525_AT3 or 8526_AT1 or 8527_AT1 or 8527_AT2 or 9162_AT1 or 160851_AT1 or 160851_AT2"
# model.reactions.get_by_id("DAGK_hs_16_0_18_1").gene_reaction_rule = "1606_AT1 or 1607_AT1 or 1607_AT2 or 1608_AT1 or 1609_AT1 or 8525_AT1 or 8525_AT2 or 8525_AT3 or 8526_AT1 or 8527_AT1 or 8527_AT2 or 9162_AT1 or 160851_AT1 or 160851_AT2"
# model.reactions.get_by_id("DAGK_hs_16_0_18_2").gene_reaction_rule = "1606_AT1 or 1607_AT1 or 1607_AT2 or 1608_AT1 or 1609_AT1 or 8525_AT1 or 8525_AT2 or 8525_AT3 or 8526_AT1 or 8527_AT1 or 8527_AT2 or 9162_AT1 or 160851_AT1 or 160851_AT2"
# model.reactions.get_by_id("DAGK_hs_18_1_18_1").gene_reaction_rule = "1606_AT1 or 1607_AT1 or 1607_AT2 or 1608_AT1 or 1609_AT1 or 8525_AT1 or 8525_AT2 or 8525_AT3 or 8526_AT1 or 8527_AT1 or 8527_AT2 or 9162_AT1 or 160851_AT1 or 160851_AT2"
# model.reactions.get_by_id("DAGK_hs_18_1_18_2").gene_reaction_rule = "1606_AT1 or 1607_AT1 or 1607_AT2 or 1608_AT1 or 1609_AT1 or 8525_AT1 or 8525_AT2 or 8525_AT3 or 8526_AT1 or 8527_AT1 or 8527_AT2 or 9162_AT1 or 160851_AT1 or 160851_AT2"
# model.reactions.get_by_id("DAGK_hs_18_2_16_0").gene_reaction_rule = "1606_AT1 or 1607_AT1 or 1607_AT2 or 1608_AT1 or 1609_AT1 or 8525_AT1 or 8525_AT2 or 8525_AT3 or 8526_AT1 or 8527_AT1 or 8527_AT2 or 9162_AT1 or 160851_AT1 or 160851_AT2"
# model.reactions.get_by_id("DAGK_hs_18_2_18_1").gene_reaction_rule = "1606_AT1 or 1607_AT1 or 1607_AT2 or 1608_AT1 or 1609_AT1 or 8525_AT1 or 8525_AT2 or 8525_AT3 or 8526_AT1 or 8527_AT1 or 8527_AT2 or 9162_AT1 or 160851_AT1 or 160851_AT2"

Reaction: DAGK_hs_16_0_16_0: Counter({'1607_AT1': 1})
Reaction: DAGK_hs_16_0_18_1: Counter({'1607_AT1': 1})
Reaction: DAGK_hs_16_0_18_2: Counter({'1607_AT1': 1})
Reaction: DAGK_hs_18_1_18_1: Counter({'1607_AT1': 1})
Reaction: DAGK_hs_18_1_18_2: Counter({'1607_AT1': 1})
Reaction: DAGK_hs_18_2_16_0: Counter({'1607_AT1': 1})
Reaction: DAGK_hs_18_2_18_1: Counter({'1607_AT1': 1})
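
The automatic fix removes the duplicate with `str.replace`, which works here but matches substrings: the pattern `1607_AT1 or ` would also match inside a hypothetical `21607_AT1 or `. An alternative sketch for flat rules (only `or`, or only `and`) splits the rule, drops repeats while keeping first-seen order, and joins it again. The helper name `drop_duplicate_genes` is hypothetical.

```python
def drop_duplicate_genes(rule, sep=" or "):
    """Remove repeated genes from a flat rule, keeping the first occurrence of each."""
    # dict.fromkeys preserves insertion order and discards repeated keys.
    return sep.join(dict.fromkeys(rule.split(sep)))


deduplicated = drop_duplicate_genes("1606_AT1 or 1607_AT1 or 1607_AT1 or 1608_AT1")
print(deduplicated)
```

This sketch does not handle nested rules that mix `and` with `or`; those would need a proper parse of the expression.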


#### Sort GPRs in reactions

In [16]:
or_gpr_reactions = [
    r for r in model.reactions 
    if "or" in r.gene_reaction_rule and "and" not in r.gene_reaction_rule]

and_gpr_reactions = [
    r for r in model.reactions 
    if "and" in r.gene_reaction_rule and "or" not in r.gene_reaction_rule]

for join_str, reactions in zip([" or ", " and "], [or_gpr_reactions, and_gpr_reactions]):
    for r in reactions:
        gpr_dict = defaultdict(list)
        for i in r.gene_reaction_rule.split(join_str):
            gpr_dict[int(i.split("_")[0])] += [i.split("_")[1]]
        genes = r.genes
        r.gene_reaction_rule = join_str.join([f"{key}_{v}" for key in sorted(gpr_dict) for v in sorted(gpr_dict[key])])
        try:
            assert genes == r.genes
        except AssertionError:
            raise AssertionError(f"A change occurred in the genes for reaction {r.id} that should not have occurred.")
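
The sorting above orders genes by the numeric NCBI Gene ID first and by isoform label second. The same ordering can be sketched with a sort key on plain strings, as below. The helper name `sort_flat_gpr` is hypothetical, and the sketch assumes a flat rule whose genes all follow the `<number>_<isoform>` pattern.

```python
def sort_flat_gpr(rule, sep=" or "):
    """Sort the genes of a flat rule by numeric gene ID, then by isoform label."""

    def sort_key(gene_id):
        number, isoform = gene_id.split("_", 1)
        # Compare gene IDs as integers so that 4832 sorts before 10201.
        return int(number), isoform

    return sep.join(sorted(rule.split(sep), key=sort_key))


sorted_rule = sort_flat_gpr("10201_AT1 or 4832_AT1 or 29922_AT2 or 29922_AT1")
print(sorted_rule)
```

Isoform labels are compared as text, so `AT10` would sort before `AT2`, which is also how the cell above behaves.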

## Update metabolites
1. Update metabolites to have the corresponding MetabolicAtlas annotation
2. Update BiGG identifiers and retire old ones

### Update metabolite annotations

In [17]:
id_mapping_df = pd.read_csv(
    f"{ROOT_PATH}/data/interim/BiGGMetAtlasMetabolites.tsv",
    sep="\t",
    index_col=0,
)
for _, row in id_mapping_df.iterrows():
    bigg_id, metatlas = row[["bigg", "metatlas"]]
    for model_metabolite in model.metabolites.query(lambda x: x.id.replace(f"_{x.compartment}", "") == bigg_id):
        model_metabolite.annotation["metatlas"] = f"{metatlas}{model_metabolite.compartment}"
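
The query above strips the compartment with `str.replace`, which removes every occurrence of the `_<compartment>` pattern and not only the trailing one. A more defensive variant, sketched here under the assumption of Python 3.9 or later, uses `str.removesuffix`. The helper name `strip_compartment` and the identifier `a_cb_c` are hypothetical.

```python
def strip_compartment(metabolite_id, compartment):
    """Remove only the trailing `_<compartment>` from a metabolite ID."""
    return metabolite_id.removesuffix(f"_{compartment}")


# A typical ID: both approaches agree.
print(strip_compartment("glc__D_e", "e"))

# A hypothetical ID where "_c" also occurs mid-string: the results differ.
replaced = "a_cb_c".replace("_c", "")
stripped = strip_compartment("a_cb_c", "c")
print(replaced, stripped)
```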
    

### Update metabolite identifiers

In [18]:
id_mapping_dict = {
    # Add L/D stereochemistry to ID
    "adrnl": "adrnl__L",
    'ahcys': 'ahcys__L',
    'amet': 'amet__L',
    'dhdascb': 'dhdascb__L',
    "mepi": "mepi__L",
    'nrpphr': 'nrpphr__L',
    'orn': 'orn__L',

    # ID change to help distinguish L-dopa (3,4-dihydroxyphenylalanine) and L-dopamine, D-Dopa and D-dopamine
    'dopa': 'dpam__L',
    # ID change to help distinguish the 'gamma' in gamma-L-glutamyl-L-cysteine
    'glucys': 'gglucys__L',
    # ID change to make it clear that the species is (R)-S-lactoylglutathione, with (R) stereoconfiguration and S-bond
    'lgt__S': 'slgth__R',
    # ID change to make it clear that the species is UDP bound to a D-glucose
    'udpg': 'udpglc__D',
    # Increased name specificity for cyclic nucleotides
    'camp': '35camp',
    
    # Lipid ID changes to help distinguish stereochemistry on species and without use of common names.
    # Fatty Acids and Conjugates [FA01]	(FA)
    'hdca': 'FA_hs_16_0',
    'ocdcea': 'FA_hs_18_9Z',
    'lnlc': 'FA_hs_18_9Z12Z',

    # Fatty acyl CoAs [FA0705] (CoA)
    'pmtcoa': 'FAcoa_hs_16_0',
    'odecoa': 'FAcoa_hs_18_9Z',
    'lnlccoa': 'FAcoa_hs_18_9Z12Z',

    # Fatty acyl carnitines [FA0707] (CAR)
    'pmtcrn': 'FAcrn_hs_16_0',
    'odecrn': 'FAcrn_hs_18_9Z',
    'lnlccrn': 'FAcrn_hs_18_9Z12Z',

    # Diacylglycerols [GL0201] (DG)
    'dag_hs_16_0_18_1': 'dag_hs_16_0_18_9Z',
    'dag_hs_16_0_18_2': 'dag_hs_16_0_18_9Z12Z',
    'dag_hs_18_1_18_1': 'dag_hs_18_9Z_18_9Z',
    'dag_hs_18_1_18_2': 'dag_hs_18_9Z_18_9Z12Z',
    'dag_hs_18_2_16_0': 'dag_hs_18_9Z12Z_16_0',
    'dag_hs_18_2_18_1': 'dag_hs_18_9Z12Z_18_9Z',

    # Diacylglycerophosphates [GP1001] (PA)
    'pa_hs_16_0_18_1': 'pa_hs_16_0_18_9Z',
    'pa_hs_16_0_18_2': 'pa_hs_16_0_18_9Z12Z',
    'pa_hs_18_1_18_1': 'pa_hs_18_9Z_18_9Z',
    'pa_hs_18_1_18_2': 'pa_hs_18_9Z_18_9Z12Z',
    'pa_hs_18_2_16_0': 'pa_hs_18_9Z12Z_16_0',
    'pa_hs_18_2_18_1': 'pa_hs_18_9Z12Z_18_9Z',
    # Lysophospholipids	Prefix L
    'alpa_hs_16_0': 'lpa_hs_16_0',
    'alpa_hs_18_1': 'lpa_hs_18_9Z',
    'alpa_hs_18_2': 'lpa_hs_18_9Z12Z',

    # Diacylglycerophosphocholines [GP0101] PC
    'pchol_hs_16_0_16_0': 'pc_hs_16_0_16_0',
    'pchol_hs_16_0_18_1': 'pc_hs_16_0_18_9Z',
    'pchol_hs_16_0_18_2': 'pc_hs_16_0_18_9Z12Z',
    'pchol_hs_18_1_18_1': 'pc_hs_18_9Z_18_9Z',
    'pchol_hs_18_1_18_2': 'pc_hs_18_9Z_18_9Z12Z',
    'pchol_hs_18_2_16_0': 'pc_hs_18_9Z12Z_16_0',
    'pchol_hs_18_2_18_1': 'pc_hs_18_9Z12Z_18_9Z',
    # Lysophospholipids	Prefix L
    'lpchol_hs_16_0': 'lpc_hs_16_0',
    'lpchol_hs_18_1': 'lpc_hs_18_9Z',
    'lpchol_hs_18_2': 'lpc_hs_18_9Z12Z',

    # Diacylglycerophosphoethanolamines [GP0201] PE
    'pe_hs_16_0_18_1': 'pe_hs_16_0_18_9Z',
    'pe_hs_16_0_18_2': 'pe_hs_16_0_18_9Z12Z',
    'pe_hs_18_1_18_1': 'pe_hs_18_9Z_18_9Z',
    'pe_hs_18_1_18_2': 'pe_hs_18_9Z_18_9Z12Z',
    'pe_hs_18_2_16_0': 'pe_hs_18_9Z12Z_16_0',
    'pe_hs_18_2_18_1': 'pe_hs_18_9Z12Z_18_9Z',

    # Diacylglycerophosphoinositols [GP0601] PI
    'pail_hs_16_0_18_1': 'pail_hs_16_0_18_9Z',
    'pail_hs_16_0_18_2': 'pail_hs_16_0_18_9Z12Z',
    'pail_hs_18_1_18_1': 'pail_hs_18_9Z_18_9Z',
    'pail_hs_18_1_18_2': 'pail_hs_18_9Z_18_9Z12Z',
    'pail_hs_18_2_16_0': 'pail_hs_18_9Z12Z_16_0',
    'pail_hs_18_2_18_1': 'pail_hs_18_9Z12Z_18_9Z',
    # Diacylglycerophosphoinositol monophosphates [GP0701] PIP[4’]
    'pail4p_hs_16_0_18_1': 'pail4p_hs_16_0_18_9Z',
    'pail4p_hs_16_0_18_2': 'pail4p_hs_16_0_18_9Z12Z',
    'pail4p_hs_18_1_18_1': 'pail4p_hs_18_9Z_18_9Z',
    'pail4p_hs_18_1_18_2': 'pail4p_hs_18_9Z_18_9Z12Z',
    'pail4p_hs_18_2_16_0': 'pail4p_hs_18_9Z12Z_16_0',
    'pail4p_hs_18_2_18_1': 'pail4p_hs_18_9Z12Z_18_9Z',
    # Diacylglycerophosphoinositol bisphosphates [GP0801] PIP2[4’,5’]
    'pail45p_hs_16_0_18_1': 'pail45p_hs_16_0_18_9Z',
    'pail45p_hs_16_0_18_2': 'pail45p_hs_16_0_18_9Z12Z',
    'pail45p_hs_18_1_18_1': 'pail45p_hs_18_9Z_18_9Z',
    'pail45p_hs_18_1_18_2': 'pail45p_hs_18_9Z_18_9Z12Z',
    'pail45p_hs_18_2_16_0': 'pail45p_hs_18_9Z12Z_16_0',
    'pail45p_hs_18_2_18_1': 'pail45p_hs_18_9Z12Z_18_9Z',

    # CDP-diacylglycerols [GP1301] CDP-DG
    'cdpdag_hs_16_0_18_1': 'cdpdag_hs_16_0_18_9Z',
    'cdpdag_hs_16_0_18_2': 'cdpdag_hs_16_0_18_9Z12Z',
    'cdpdag_hs_18_1_18_1': 'cdpdag_hs_18_9Z_18_9Z',
    'cdpdag_hs_18_1_18_2': 'cdpdag_hs_18_9Z_18_9Z12Z',
    'cdpdag_hs_18_2_16_0': 'cdpdag_hs_18_9Z12Z_16_0',
    'cdpdag_hs_18_2_18_1': 'cdpdag_hs_18_9Z12Z_18_9Z',

}
for old_met, new_met in id_mapping_dict.items():
    for model_metabolite in model.metabolites.query(lambda x: x.id.replace(f"_{x.compartment}", "") == old_met):
        model_metabolite.id = f"{new_met}_{model_metabolite.compartment}"
model.repair()

In [19]:
id_mapping_df = pd.DataFrame.from_dict(id_mapping_dict, orient="index")
id_mapping_df = id_mapping_df.reset_index(drop=False)
id_mapping_df.columns = ["metRetired", "mets"]
id_mapping_df = id_mapping_df.loc[:, id_mapping_df.columns[::-1]]


for idx, row in id_mapping_df.iterrows():
    new_id, retiring = row[["mets", "metRetired"]]
    retired_set_of_ids = set([retiring])
    id_mapping_df.loc[idx, "metRetired"] = build_string(retired_set_of_ids, sep=";")

id_mapping_df.to_csv(
    f"{ROOT_PATH}/data/processed/replacedMetabolites.tsv",
    sep="\t",
)
id_mapping_df

Unnamed: 0,mets,metRetired
0,adrnl__L,adrnl
1,ahcys__L,ahcys
2,amet__L,amet
3,dhdascb__L,dhdascb
4,mepi__L,mepi
...,...,...
71,cdpdag_hs_16_0_18_9Z12Z,cdpdag_hs_16_0_18_2
72,cdpdag_hs_18_9Z_18_9Z,cdpdag_hs_18_1_18_1
73,cdpdag_hs_18_9Z_18_9Z12Z,cdpdag_hs_18_1_18_2
74,cdpdag_hs_18_9Z12Z_16_0,cdpdag_hs_18_2_16_0
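
Most lipid renames in the dictionary above follow one pattern: the acyl-chain labels `18_1` and `18_2` become `18_9Z` and `18_9Z12Z`, while `16_0` is unchanged. A sketch that derives such names programmatically is shown below. The helper name `rename_lipid` is hypothetical, and the sketch covers only the chain labels, not the prefix changes such as `pchol` to `pc` or `alpa` to `lpa`.

```python
import re

# Old chain label -> new label with double-bond positions (from the dictionary above).
CHAIN_NAMES = {"16_0": "16_0", "18_1": "18_9Z", "18_2": "18_9Z12Z"}


def rename_lipid(old_id):
    """Rewrite the acyl-chain labels that follow the `_hs_` marker in a lipid ID."""
    head, marker, chains = old_id.partition("_hs_")
    if not marker:
        # Not a lipid ID of the expected form; leave it untouched.
        return old_id
    new_chains = re.sub(
        r"\d+_\d+",
        lambda match: CHAIN_NAMES.get(match.group(0), match.group(0)),
        chains,
    )
    return f"{head}{marker}{new_chains}"


for old in ["dag_hs_16_0_18_1", "cdpdag_hs_18_2_16_0", "pe_hs_18_1_18_2"]:
    print(old, "->", rename_lipid(old))
```

Generating the entries this way could reduce copy-and-paste errors, although an explicit dictionary remains easier to audit line by line.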


## Update reactions
1. Update reactions to have the corresponding MetabolicAtlas annotation
2. Change the NADH demand reaction annotation to prevent false identification as a boundary reaction.
3. Update BiGG identifiers and retire old ones

### Update reaction annotations

In [20]:
id_mapping_df = pd.read_csv(
    f"{ROOT_PATH}/data/interim/BiGGMetAtlasReactions.tsv",
    sep="\t",
    index_col=0,
)
for _, row in id_mapping_df.iterrows():
    bigg_id, metatlas = row[["bigg", "metatlas"]]
    reaction = model.reactions.get_by_id(bigg_id)
    reaction.annotation["metatlas"] = metatlas

reaction = model.reactions.get_by_id("DM_nadh")
reaction.annotation["sbo"] = "SBO:0000176"

### Update reaction identifiers

In [21]:
id_mapping_dict = {

    # Change NADH demand to NADHload
    'DM_nadh': 'NADHload',
    # Updated to reflect stoichiometry of 2:1
    "CAATPS": "CAATPS2",
    # Addition of `x` or `y` to indicate NADH and NADPH cofactors
    'BILIRED': 'BILIREDy',
    'GTHOr': 'GTHOy',
    "XYLTD_D": "XYLTD_Dx",
    "XYLUR": "XYLUR_Ly",

    # Updated for increased specificity
    'ACP1_FMN': 'FMNPH',
    'DOPAMT': 'LDPAMMT',
    'GGLUCT': 'GGLUCTC',
    'MI1345PP': 'MI1345P5PP',
    'MI145PK': 'MI145P3K',
    'MI145PP': 'MI145P5PP',
    'NP1': 'PNP2',
    'NT5C': 'NICHYD',
    'PDE1': 'PDEA',
    'UMPK': 'UMPK1',
    'UPP3S': 'UPPG3S',
    'UPPDC1': 'UPPG3DC',
    
    '4PYRDX': '4PYRDXABCte',
    'CAMPt': 'CAMPABCte',
    'CGMPt': 'CGMPABCte',
    'GTHOXti2': 'GTHOXABCte',
    'RIBFLVt3o': 'RIBFLVABCte',
    'HCO3_CLt': 'HCO3_Cltex',
    
    'ADRNLtu': 'ADRNLt',
    'ACt2r': 'ACt2',
    'GLYt7_211_r': 'GLY_Cl_2Nat',
    'L_LACt2r': 'L_LACt2',
    'NCAMUP': 'NCAMt',
    'NRPPHRtu': 'NRPPHRt',
    'DOPAtu': 'LDPAMt',

    # Removal of incorrect/unclear suffixes
    'AHCi': 'AHC',
    'G6PDH2r': 'G6PDH2',
    'GALKr': 'GALK',
    'GALUi': 'GALU',
    'GTHPi': 'GTHP',
    'ICDHyr': 'ICDHy',
    'NNATr': 'NNAT',
    'PDX5POi': 'PDX5PO',

    'ARGt5r': 'ARGtec',
    'SPRMt2i': 'SPRMt2',
    'THMtrbc': 'THMt',

    
    # Changed for preparation of additional lipids
    'C160CPT1': 'CRNAT_16_0',
    'C160CPT2rbc': 'CRNAT_16_0rbc',
    'C181CPT1': 'CRNAT_18_9Z',
    'C181CPT2rbc': 'CRNAT_18_9Zrbc',
    'LNLCCPT1': 'CRNAT_18_9Z12Z',
    'LNLCCPT2rbc': 'CRNAT_18_9Z12Zrbc',
    
    'FACOAL160': 'FACOAL_16_0',
    'FACOAL181': 'FACOAL_18_9Z',
    'FACOAL1821': 'FACOAL_18_9Z12Z',
    
    'HDCAt': 'FAt_16_0',
    'OCDCEAt': 'FAt_18_9Z',
    'LNLCt': 'FAt_18_9Z12Z',


    # Lysophosphatidate (LPA) acyl transferase
    'AGPAT1_16_0_16_0': 'LPAAT_16_0_16_0',
    'AGPAT1_16_0_18_1': 'LPAAT_16_0_18_9Z',
    'AGPAT1_16_0_18_2': 'LPAAT_16_0_18_9Z12Z',
    'AGPAT1_18_1_18_1': 'LPAAT_18_9Z_18_9Z',
    'AGPAT1_18_1_18_2': 'LPAAT_18_9Z_18_9Z12Z',
    'AGPAT1_18_2_16_0': 'LPAAT_18_9Z12Z_16_0',
    'AGPAT1_18_2_18_1': 'LPAAT_18_9Z12Z_18_9Z',

    # Removal of "r"
    'CDIPTr_16_0_16_0': 'CDIPT_16_0_16_0',
    'CDIPTr_16_0_18_1': 'CDIPT_16_0_18_9Z',
    'CDIPTr_16_0_18_2': 'CDIPT_16_0_18_9Z12Z',
    'CDIPTr_18_1_18_1': 'CDIPT_18_9Z_18_9Z',
    'CDIPTr_18_1_18_2': 'CDIPT_18_9Z_18_9Z12Z',
    'CDIPTr_18_2_16_0': 'CDIPT_18_9Z12Z_16_0',
    'CDIPTr_18_2_18_1': 'CDIPT_18_9Z12Z_18_9Z',

    'CDS_16_0_18_1': 'CDS_16_0_18_9Z',
    'CDS_16_0_18_2': 'CDS_16_0_18_9Z12Z',
    'CDS_18_1_18_1': 'CDS_18_9Z_18_9Z',
    'CDS_18_1_18_2': 'CDS_18_9Z_18_9Z12Z',
    'CDS_18_2_16_0': 'CDS_18_9Z12Z_16_0',
    'CDS_18_2_18_1': 'CDS_18_9Z12Z_18_9Z',

    'CEPTC_16_0_18_1': 'CEPTC_16_0_18_9Z',
    'CEPTC_16_0_18_2': 'CEPTC_16_0_18_9Z12Z',
    'CEPTC_18_1_18_1': 'CEPTC_18_9Z_18_9Z',
    'CEPTC_18_1_18_2': 'CEPTC_18_9Z_18_9Z12Z',
    'CEPTC_18_2_16_0': 'CEPTC_18_9Z12Z_16_0',
    'CEPTC_18_2_18_1': 'CEPTC_18_9Z12Z_18_9Z',
    
    'CEPTE_16_0_18_1': 'CEPTE_16_0_18_9Z',
    'CEPTE_16_0_18_2': 'CEPTE_16_0_18_9Z12Z',
    'CEPTE_18_1_18_1': 'CEPTE_18_9Z_18_9Z',
    'CEPTE_18_1_18_2': 'CEPTE_18_9Z_18_9Z12Z',
    'CEPTE_18_2_16_0': 'CEPTE_18_9Z12Z_16_0',
    'CEPTE_18_2_18_1': 'CEPTE_18_9Z12Z_18_9Z',

    'DAGK_hs_16_0_16_0': 'DAGK_16_0_16_0',
    'DAGK_hs_16_0_18_1': 'DAGK_16_0_18_9Z',
    'DAGK_hs_16_0_18_2': 'DAGK_16_0_18_9Z12Z',
    'DAGK_hs_18_1_18_1': 'DAGK_18_9Z_18_9Z',
    'DAGK_hs_18_1_18_2': 'DAGK_18_9Z_18_9Z12Z',
    'DAGK_hs_18_2_16_0': 'DAGK_18_9Z12Z_16_0',
    'DAGK_hs_18_2_18_1': 'DAGK_18_9Z12Z_18_9Z',

    # Glycerol-3-phosphate acyltransferase (GPAT)
    'GPAM_hs_16_0': 'GPAT_16_0',
    'GPAM_hs_18_1': 'GPAT_18_9Z',
    'GPAM_hs_18_2': 'GPAT_18_9Z12Z',

    # Phospholipase B acting on LPC
    'LPASE_16_0': 'LPCLPLB_16_0',
    'LPASE_18_1': 'LPCLPLB_18_9Z',
    'LPASE_18_2': 'LPCLPLB_18_9Z12Z',

    'PI45P5P_16_0_18_1': 'PI45P5P_16_0_18_9Z',
    'PI45P5P_16_0_18_2': 'PI45P5P_16_0_18_9Z12Z',
    'PI45P5P_18_1_18_1': 'PI45P5P_18_9Z_18_9Z',
    'PI45P5P_18_1_18_2': 'PI45P5P_18_9Z_18_9Z12Z',
    'PI45P5P_18_2_16_0': 'PI45P5P_18_9Z12Z_16_0',
    'PI45P5P_18_2_18_1': 'PI45P5P_18_9Z12Z_18_9Z',
    
    'PI45PLC_16_0_18_1': 'PI45PLC_16_0_18_9Z',
    'PI45PLC_16_0_18_2': 'PI45PLC_16_0_18_9Z12Z',
    'PI45PLC_18_1_18_1': 'PI45PLC_18_9Z_18_9Z',
    'PI45PLC_18_1_18_2': 'PI45PLC_18_9Z_18_9Z12Z',
    'PI45PLC_18_2_16_0': 'PI45PLC_18_9Z12Z_16_0',
    'PI45PLC_18_2_18_1': 'PI45PLC_18_9Z12Z_18_9Z',
    
    'PI4P5K_16_0_18_1': 'PI4P5K_16_0_18_9Z',
    'PI4P5K_16_0_18_2': 'PI4P5K_16_0_18_9Z12Z',
    'PI4P5K_18_1_18_1': 'PI4P5K_18_9Z_18_9Z',
    'PI4P5K_18_1_18_2': 'PI4P5K_18_9Z_18_9Z12Z',
    'PI4P5K_18_2_16_0': 'PI4P5K_18_9Z12Z_16_0',
    'PI4P5K_18_2_18_1': 'PI4P5K_18_9Z12Z_18_9Z',
    
    'PI4PLC_16_0_18_1': 'PI4PLC_16_0_18_9Z',
    'PI4PLC_16_0_18_2': 'PI4PLC_16_0_18_9Z12Z',
    'PI4PLC_18_1_18_1': 'PI4PLC_18_9Z_18_9Z',
    'PI4PLC_18_1_18_2': 'PI4PLC_18_9Z_18_9Z12Z',
    'PI4PLC_18_2_16_0': 'PI4PLC_18_9Z12Z_16_0',
    'PI4PLC_18_2_18_1': 'PI4PLC_18_9Z12Z_18_9Z',
    
    'PI4PP_16_0_18_1': 'PI4PP_16_0_18_9Z',
    'PI4PP_16_0_18_2': 'PI4PP_16_0_18_9Z12Z',
    'PI4PP_18_1_18_1': 'PI4PP_18_9Z_18_9Z',
    'PI4PP_18_1_18_2': 'PI4PP_18_9Z_18_9Z12Z',
    'PI4PP_18_2_16_0': 'PI4PP_18_9Z12Z_16_0',
    'PI4PP_18_2_18_1': 'PI4PP_18_9Z12Z_18_9Z',

    # PI 4-kinase
    'PIK4_16_0_16_0': 'PI4K_16_0_16_0',
    'PIK4_16_0_18_1': 'PI4K_16_0_18_9Z',
    'PIK4_16_0_18_2': 'PI4K_16_0_18_9Z12Z',
    'PIK4_18_1_18_1': 'PI4K_18_9Z_18_9Z',
    'PIK4_18_1_18_2': 'PI4K_18_9Z_18_9Z12Z',
    'PIK4_18_2_16_0': 'PI4K_18_9Z12Z_16_0',
    'PIK4_18_2_18_1': 'PI4K_18_9Z12Z_18_9Z',
    
    'PIPLC_16_0_18_1': 'PIPLC_16_0_18_9Z',
    'PIPLC_16_0_18_2': 'PIPLC_16_0_18_9Z12Z',
    'PIPLC_18_1_18_1': 'PIPLC_18_9Z_18_9Z',
    'PIPLC_18_1_18_2': 'PIPLC_18_9Z_18_9Z12Z',
    'PIPLC_18_2_16_0': 'PIPLC_18_9Z12Z_16_0',
    'PIPLC_18_2_18_1': 'PIPLC_18_9Z12Z_18_9Z',
    # PC Phospholipase A2
    'PLA2_2_16_0_16_0': 'PCPLA2_16_0_16_0',
    'PLA2_2_16_0_18_1': 'PCPLA2_16_0_18_9Z',
    'PLA2_2_16_0_18_2': 'PCPLA2_16_0_18_9Z12Z',
    'PLA2_2_18_1_18_1': 'PCPLA2_18_9Z_18_9Z',
    'PLA2_2_18_1_18_2': 'PCPLA2_18_9Z_18_9Z12Z',
    'PLA2_2_18_2_16_0': 'PCPLA2_18_9Z12Z_16_0',
    'PLA2_2_18_2_18_1': 'PCPLA2_18_9Z12Z_18_9Z',
    # Phosphatidate (PA) phosphatase 
    'PPAP_16_0_16_0': 'PAPP_16_0_16_0',
    'PPAP_16_0_18_1': 'PAPP_16_0_18_9Z',
    'PPAP_16_0_18_2': 'PAPP_16_0_18_9Z12Z',
    'PPAP_18_1_18_1': 'PAPP_18_9Z_18_9Z',
    'PPAP_18_1_18_2': 'PAPP_18_9Z_18_9Z12Z',
    'PPAP_18_2_16_0': 'PAPP_18_9Z12Z_16_0',
    'PPAP_18_2_18_1': 'PAPP_18_9Z12Z_18_9Z',
    # To match metabolite changes
    'SK_pchol_hs_16_0_16_0_c': 'SK_pc_hs_16_0_16_0_c',
    'SK_pchol_hs_16_0_18_1_c': 'SK_pc_hs_16_0_18_9Z_c',
    'SK_pchol_hs_16_0_18_2_c': 'SK_pc_hs_16_0_18_9Z12Z_c',
    'SK_pchol_hs_18_1_18_1_c': 'SK_pc_hs_18_9Z_18_9Z_c',
    'SK_pchol_hs_18_1_18_2_c': 'SK_pc_hs_18_9Z_18_9Z12Z_c',
    'SK_pchol_hs_18_2_16_0_c': 'SK_pc_hs_18_9Z12Z_16_0_c',
    'SK_pchol_hs_18_2_18_1_c': 'SK_pc_hs_18_9Z12Z_18_9Z_c',
    'SK_pe_hs_16_0_16_0_c': 'SK_pe_hs_16_0_16_0_c',
    'SK_pe_hs_16_0_18_1_c': 'SK_pe_hs_16_0_18_9Z_c',
    'SK_pe_hs_16_0_18_2_c': 'SK_pe_hs_16_0_18_9Z12Z_c',
    'SK_pe_hs_18_1_18_1_c': 'SK_pe_hs_18_9Z_18_9Z_c',
    'SK_pe_hs_18_1_18_2_c': 'SK_pe_hs_18_9Z_18_9Z12Z_c',
    'SK_pe_hs_18_2_16_0_c': 'SK_pe_hs_18_9Z12Z_16_0_c',
    'SK_pe_hs_18_2_18_1_c': 'SK_pe_hs_18_9Z12Z_18_9Z_c',
    # Exchanges
    'EX_adrnl_e': 'EX_adrnl__L_e',
    'EX_camp_e': 'EX_35camp_e',
    'EX_dhdascb_e': 'EX_dhdascb__L_e',
    'EX_dopa_e': 'EX_dpam__L_e',
    'EX_nrpphr_e': 'EX_nrpphr__L_e',
    'EX_mepi_e': 'EX_mepi__L_e',
    'EX_hdca_e': 'EX_FA_hs_16_0_e',
    'EX_ocdcea_e': 'EX_FA_hs_18_9Z_e',
    'EX_lnlc_e': 'EX_FA_hs_18_9Z12Z_e',
    # To match naming conventions, intracellular demands are irreversible while sinks are reversible.
    'SK_adprbp_c': 'DM_adprbp_c',
    'SK_mi1345p_c': 'DM_mi1345p_c',
    'SK_mi134p_c': 'DM_mi134p_c',
    'SK_mi145p_c': 'DM_mi145p_c',
    'SK_mi14p_c': 'DM_mi14p_c',
}

# Rename every reaction found in the model; old IDs that are absent are skipped silently
for old_rxn, new_rxn in id_mapping_dict.items():
    for model_reaction in model.reactions.query(lambda x: x.id == old_rxn):
        model_reaction.id = new_rxn
model.repair()
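
Because the renaming loop skips any old identifier that is not found in the model, a typo in the mapping would go unnoticed. The following is a minimal sketch of a pre-flight check, not part of the original notebook. The helper `check_id_mapping` and the example identifiers lists are hypothetical and use only the standard library.

```python
from collections import Counter


def check_id_mapping(id_mapping, existing_ids):
    """Report potential problems in an old -> new identifier mapping."""
    existing = set(existing_ids)
    # Identity entries (old == new) rename nothing, so they are ignored below
    renames = {old: new for old, new in id_mapping.items() if old != new}
    target_counts = Counter(renames.values())
    return {
        # Old IDs that are not present and would be skipped silently
        "missing": sorted(old for old in renames if old not in existing),
        # New IDs requested by more than one old ID
        "duplicate_targets": sorted(t for t, n in target_counts.items() if n > 1),
        # New IDs already in use by an entry that is not itself being renamed
        "collisions": sorted(
            new for new in set(renames.values())
            if new in existing and new not in renames
        ),
    }


# Hypothetical stand-in for the reaction IDs of a model
example_ids = ["PPAP_16_0_18_1", "SK_mi14p_c", "PGK"]
example_mapping = {
    "PPAP_16_0_18_1": "PAPP_16_0_18_9Z",
    "SK_mi14p_c": "DM_mi14p_c",
    "EX_camp_e": "EX_35camp_e",
}
report = check_id_mapping(example_mapping, example_ids)
print(report)
```

In the notebook, `existing_ids` could be taken from `model.reactions.list_attr("id")` before the renaming loop runs. An empty list under each key would suggest that the mapping can be applied as intended.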

In [22]:
id_mapping_df = pd.DataFrame.from_dict(id_mapping_dict, orient="index")
id_mapping_df = id_mapping_df.reset_index(drop=False)
id_mapping_df.columns = ["rxnRetired", "rxns"]
id_mapping_df = id_mapping_df.loc[:, id_mapping_df.columns[::-1]]

previous_id_mapping_df = pd.read_csv(
    f"{ROOT_PATH}/data/deprecatedIdentifiers/replacedReactions.tsv",
    sep="\t",
    index_col=0,
)

for idx, row in id_mapping_df.iterrows():
    new_id, retiring = row[["rxns", "rxnRetired"]]
    previously_retired = previous_id_mapping_df[previous_id_mapping_df["rxns"] == retiring]
    retired_set_of_ids = set([retiring])
    if not previously_retired.empty:
        # Get all previously retired IDs
        retired_set_of_ids.update(previously_retired["rxnRetired"].apply(split_string).item())
        # Pulling the ID out of retirement
        if new_id in retired_set_of_ids:
            retired_set_of_ids.remove(new_id)
        retired_set_of_ids.add(retiring)
    id_mapping_df.loc[idx, "rxnRetired"] = build_string(retired_set_of_ids, sep=";")


id_mapping_df.to_csv(
    f"{ROOT_PATH}/data/processed/replacedReactions.tsv",
    sep="\t",
)
id_mapping_df

Unnamed: 0,rxns,rxnRetired
0,NADHload,DM_nadh
1,CAATPS2,CAATPS
2,BILIREDy,BILIRED
3,GTHOy,GTHOr
4,XYLTD_Dx,XYLTD_D
...,...,...
179,DM_adprbp_c,SK_adprbp_c
180,DM_mi1345p_c,SK_mi1345p_c
181,DM_mi134p_c,SK_mi134p_c
182,DM_mi145p_c,SK_mi145p_c
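
The bookkeeping of retired identifiers can be illustrated in isolation. This is a simplified sketch under stated assumptions rather than the notebook's implementation: `merge_retired_ids` is a hypothetical helper, the `RXN_*` identifiers are made up, and plain `str.split` and `str.join` on `;` stand in for `split_string` and `build_string`, whose exact behavior is not shown here.

```python
def merge_retired_ids(new_id, retiring, previous_retired, sep=";"):
    """Combine the ID retired now with IDs retired in earlier versions.

    `previous_retired` maps an identifier to a `sep`-joined string of the
    identifiers it replaced in earlier versions (hypothetical structure).
    """
    retired = {retiring}
    earlier = previous_retired.get(retiring, "")
    retired.update(part for part in earlier.split(sep) if part)
    # An ID that is being reinstated must no longer be listed as retired
    retired.discard(new_id)
    # The ID replaced in this update is always recorded
    retired.add(retiring)
    return sep.join(sorted(retired))


# Hypothetical history: RXN_B replaced RXN_A in an earlier version
previous = {"RXN_B": "RXN_A"}

chained = merge_retired_ids("RXN_C", "RXN_B", previous)
restored = merge_retired_ids("RXN_A", "RXN_B", previous)
fresh = merge_retired_ids("RXN_Y", "RXN_X", previous)
print(chained, restored, fresh)
```

The three calls cover a chained rename (both earlier IDs are kept), an ID pulled back out of retirement (it is dropped from the retired set), and a first-time rename with no history. Sorting the output is a choice made here for reproducibility.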


#### Update boundary identifiers

In [23]:
df_bounds = pd.read_csv(
    f"{ROOT_PATH}/data/parameterization/iAB-RBC-283_Bordbar2011.tsv",
    sep="\t",
    index_col=0,
).fillna("")

# Iterate over index labels rather than positions so that `.loc` targets the intended row
for idx, id_str in df_bounds["ID"].items():
    if id_str in id_mapping_dict:
        df_bounds.loc[idx, "ID"] = id_mapping_dict[id_str]

df_bounds.to_csv(
    f"{ROOT_PATH}/data/processed/iAB-RBC-283_Bordbar2011.tsv",
    sep="\t",
)
df_bounds

Unnamed: 0,ID,Name,Type,Lower Bound,Upper Bound,Unit,References,Notes
0,EX_5aop_e,5 Amino 4 oxopentanoate exchange,exchange,-0.02,0.0,mmol / (hr * L cells),PMID:900140,
1,EX_acald_e,Acetaldehyde exchange,exchange,-10.0,10.0,mmol / (hr * L cells),PMID:28460,
2,EX_acnam_e,N-Acetylneuraminate exchange,exchange,-3.7e-05,0.0,mmol / (hr * L cells),PMID:12765793,
3,EX_ade_e,Adenine exchange,exchange,-0.014,0.01,mmol / (hr * L cells),PMID:2141093,
4,EX_adn_e,Adenosine exchange,exchange,-0.01,0.014,mmol / (hr * L cells),PMID:2141093,
5,EX_adrnl__L_e,Adrenaline exchange,exchange,-0.0378,0.0,mmol / (hr * L cells),PMID:8737076,
6,EX_arg__L_e,L-Arginine exchange,exchange,-0.1152,0.0,mmol / (hr * L cells),"PMID:6817948, PMID:11950212",
7,EX_chol_e,Choline exchange,exchange,-0.012,0.0,mmol / (hr * L cells),PMID:5651769,
8,EX_cys__L_e,L-Cysteine exchange,exchange,-0.08,0.0,mmol / (hr * L cells),PMID:6109287,
9,EX_dhdascb__L_e,Dehydroascorbate exchange,exchange,-0.1111,0.0,mmol / (hr * L cells),"PMID:9586809, PMID:6708818",


#### Subsystem updates

All subsystems are updated to the corresponding group in HumanGEM.

Notable exceptions are as follows:
* "ROS Detoxification" is now "Reactive species formation and detoxification".
* "Urea cycle/amino group metabolism" is combined with "Arginine and proline metabolism".
* "Oxidative Phosphorylation" contains only the `PPA` reaction, which is also the only cytosolic reaction in the corresponding HumanGEM subsystem. Because the reaction could be placed into a variety of categories, it was placed in the "Nucleotide metabolism" subsystem, in which inorganic diphosphate is generated.
* Subsystems whose names contain "biosynthesis" were renamed with the more generic term "metabolism", which better reflects RBCs. This includes the following:
    * "Phenylalanine, tyrosine and tryptophan biosynthesis"
* "Tyrosine metabolism" was grouped into "Phenylalanine, tyrosine and tryptophan metabolism".
* "Eicosanoid Metabolism" contains only the `LTA4H` reaction, which becomes part of the "Leukotriene metabolism" subsystem in HumanGEM.
* "Intracellular source/sink" and "Extracellular exchange" are grouped under "Pseudoreactions" rather than `Exchange/demand reactions`.
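
Several of the exceptions above merge more than one original subsystem into a single HumanGEM group. The sketch below is an illustration only: `find_merged_subsystems` is a hypothetical helper, and the example entries are a small subset copied from the `subsystem_mapping` dictionary in this notebook. It inverts an old-to-new mapping to list which groups absorb several original subsystems.

```python
from collections import defaultdict


def find_merged_subsystems(mapping):
    """Return new subsystems that absorb more than one old subsystem."""
    merged = defaultdict(list)
    for old, new in mapping.items():
        merged[new].append(old)
    return {new: sorted(olds) for new, olds in merged.items() if len(olds) > 1}


# Subset of the old -> new subsystem names used in this notebook
example_mapping = {
    "Urea cycle/amino group metabolism": "Arginine and proline metabolism",
    "Arginine and Proline Metabolism": "Arginine and proline metabolism",
    "Phenylalanine metabolism": "Phenylalanine, tyrosine and tryptophan metabolism",
    "Tyrosine metabolism": "Phenylalanine, tyrosine and tryptophan metabolism",
    "Eicosanoid Metabolism": "Leukotriene metabolism",
}
merged = find_merged_subsystems(example_mapping)
for new_name, old_names in merged.items():
    print(f"{new_name} <- {old_names}")
```

Applied to the full `subsystem_mapping`, the same helper would give a quick overview of every many-to-one merge before the groups are rebuilt.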

In [24]:
subsystem_mapping = {
    # Special
    'Oxidative Phosphorylation': 'Nucleotide metabolism',
    'Diacylglycerol Synthesis': 'Glycerolipid metabolism',
    'ROS Detoxification': 'Reactive species formation and detoxification',
    'Eicosanoid Metabolism': 'Leukotriene metabolism',

    'Intracellular source/sink': 'Pseudoreactions',
    'Extracellular exchange': 'Pseudoreactions',
    # Straightforward
    ## Carbohydrate
    'Glycolysis/Gluconeogenesis': 'Glycolysis / Gluconeogenesis',
    'Galactose metabolism': 'Galactose metabolism',
    'Inositol Phosphate Metabolism': 'Inositol phosphate metabolism',
    'Ascorbate and Aldarate Metabolism': 'Ascorbate and aldarate metabolism',

    'Transport, Extracellular': 'Transport reactions',
    'Vitamin B6 Metabolism': 'Vitamin B6 metabolism',
    'Urea cycle/amino group metabolism': 'Arginine and proline metabolism',
    'Porphyrin metabolism': 'Porphyrin metabolism',
    'Methionine Salvage': 'Cysteine and methionine metabolism',
    'Thiamine Metabolism': 'Thiamine metabolism',
    'Miscellaneous': 'Miscellaneous',
    'Pentose Phosphate Pathway': 'Pentose phosphate pathway',
    'Riboflavin Metabolism': 'Riboflavin metabolism',
    'Arginine and Proline Metabolism': 'Arginine and proline metabolism',
    'Glutamate metabolism': 'Alanine, aspartate and glutamate metabolism',
    'Salvage Pathway': 'Purine metabolism',
    'Phenylalanine metabolism': 'Phenylalanine, tyrosine and tryptophan metabolism',
    'Tyrosine metabolism': 'Phenylalanine, tyrosine and tryptophan metabolism',
    'Nucleotides': 'Nucleotide metabolism',
    'Heme Degradation': 'Porphyrin metabolism',
    'Heme Biosynthesis': 'Porphyrin metabolism',
    'Fructose and Mannose Metabolism': 'Fructose and mannose metabolism',
    'Purine Catabolism': 'Purine metabolism',
    'Aminosugar Metabolism': 'Amino sugar and nucleotide sugar metabolism',
    'Glycerophospholipid Metabolism': 'Glycerophospholipid metabolism',
    'Pyrimidine Catabolism': 'Pyrimidine metabolism',
    'Citric Acid Cycle': 'Tricarboxylic acid cycle and glyoxylate/dicarboxylate metabolism',
    'Pyrimidine Biosynthesis': 'Pyrimidine metabolism',
    'Pyruvate Metabolism': 'Pyruvate metabolism',
    'NAD Metabolism': 'Nicotinate and nicotinamide metabolism',
    'Pentose and Glucuronate Interconversions': 'Pentose and glucuronate interconversions',
    'Carnitine shuttle': 'Carnitine shuttle (cytosolic)',
    'Glutathione Metabolism': 'Glutathione metabolism',
    'Methionine Metabolism': 'Cysteine and methionine metabolism',
    'Fatty acid activation': 'Fatty acid activation (cytosolic)',
    'Starch and Sucrose Metabolism': 'Starch and sucrose metabolism',
}

individual_subsystem_updates = {
    "Arginine and proline metabolism": ["MTAP"],
    "Purine metabolism": ["ADK1", "ADNCYC", "ADNK1", "AMPDA", "PDEA", "GUACYC",
                          "NDPK1", "IMPD", "GK1", "NTD7", "NTD9", "GMPR", "GMPS2"],
    "Pyrimidine metabolism": ["UMPK1", "NDPK2", "NDPK3"],
    "Pyruvate metabolism": ["ALDD2x"],
    "Nucleotide metabolism": ["AP4AH1"],
    "Miscellaneous": ["BANDMT"],
    
    
}
for subsystem, reaction_id_list in individual_subsystem_updates.items():
    reaction_list = model.reactions.get_by_any(reaction_id_list)
    for reaction in reaction_list:
        reaction.subsystem = subsystem

model.remove_groups(list(model.groups))
for old_subsystem, new_subsystem in subsystem_mapping.items():
    reaction_list = model.reactions.query(lambda x: x.subsystem == old_subsystem)
    for reaction in reaction_list:
        reaction.subsystem = new_subsystem
    if new_subsystem not in model.groups:
        group = Group(id=new_subsystem, name=new_subsystem, members=reaction_list)
        model.add_groups([group])
    else:
        model.groups.get_by_id(new_subsystem).add_members(reaction_list)

for subsystem, reaction_id_list in individual_subsystem_updates.items():
    reaction_list = model.reactions.get_by_any(reaction_id_list)
    if subsystem not in model.groups:
        group = Group(id=subsystem, name=subsystem, members=reaction_list)
        model.add_groups([group])
    else:
        model.groups.get_by_id(subsystem).add_members(reaction_list)

model.groups.sort()
model.groups

[<Group Alanine, aspartate and glutamate metabolism at 0x14b3714c0>,
 <Group Amino sugar and nucleotide sugar metabolism at 0x14b371610>,
 <Group Arginine and proline metabolism at 0x14b370c50>,
 <Group Ascorbate and aldarate metabolism at 0x14b370fe0>,
 <Group Carnitine shuttle (cytosolic) at 0x14b371e20>,
 <Group Cysteine and methionine metabolism at 0x14b370a40>,
 <Group Fatty acid activation (cytosolic) at 0x14b3711f0>,
 <Group Fructose and mannose metabolism at 0x14b371be0>,
 <Group Galactose metabolism at 0x14b371580>,
 <Group Glutathione metabolism at 0x14b3715e0>,
 <Group Glycerolipid metabolism at 0x150ac8d70>,
 <Group Glycerophospholipid metabolism at 0x14b371ca0>,
 <Group Glycolysis / Gluconeogenesis at 0x14b3713d0>,
 <Group Inositol phosphate metabolism at 0x14b371550>,
 <Group Leukotriene metabolism at 0x14b3717c0>,
 <Group Miscellaneous at 0x14b3718e0>,
 <Group Nicotinate and nicotinamide metabolism at 0x14b371d30>,
 <Group Nucleotide metabolism at 0x14b267fb0>,
 <Group P

### Standardize metabolite formulas

In [25]:
metabolite_formulas = dict(zip(
    model.metabolites.list_attr("id"), 
    model.metabolites.list_attr("formula")
))
standardized = standardardize_metabolite_formulas(metabolite_formulas)

for mid, updated_formula in standardized.items():
    if metabolite_formulas[mid] != updated_formula:
        print(f"Standardizing formula for `{mid}`")
        model_metabolite = model.metabolites.get_by_id(mid)
        model_metabolite.formula = updated_formula

Standardizing formula for `band_c`
Standardizing formula for `bandmt_c`


## Export updated model
### Version: 0.2.0

In [26]:
write_rbc_model(model, "all")
model

0,1
Name,iAB_RBC_283
Memory address,14a29c890
Number of metabolites,342
Number of reactions,469
Number of genes,349
Number of groups,33
Objective expression,1.0*NaKt - 1.0*NaKt_reverse_db47e
Compartments,"cytosol, extracellular space"
