# RBC-GEM 0.1.0 --> 0.1.1
## Repair iAB-RBC-283 model

The purpose of this notebook is to repair the iAB-RBC-283 reconstruction downloaded from BiGG and make it analogous to the original model published with the iAB-RBC-283 reconstruction.

**Note:** Repairing the model does not make it identical to the model in the iAB-RBC-283 publication. Only formatting errors that introduced inconsistencies with the original intent of the model were fixed. Updates that synchronize model IDs with new identifiers, add new annotations, etc. are addressed in subsequent model version updates and iterations.

Bordbar, A., Jamshidi, N. & Palsson, B.O. iAB-RBC-283: A proteomically derived knowledge-base of erythrocyte metabolism that can be used to simulate its physiological and patho-physiological states. BMC Syst Biol 5, 110 (2011). https://doi.org/10.1186/1752-0509-5-110

## Setup
### Import packages

In [1]:
import pandas as pd
import cobra
from cobra.core import Group

from rbc_gem_utils import (
    COBRA_CONFIGURATION, 
    REPO_PATH, 
    show_versions,
    read_rbc_model,
    write_rbc_model,
)
# Display versions of last time notebook ran and worked
show_versions()


Package Information
-------------------
rbc-gem-utils 0.0.1

Dependency Information
----------------------
cobra       0.29.0
depinfo      2.2.0
memote      0.16.1
notebook     7.0.6
simplejson missing

Build Tools Information
-----------------------
pip        23.3.1
setuptools 68.2.2
wheel      0.41.2

Platform Information
--------------------
Darwin  22.6.0-x86_64
CPython        3.12.0


### Define configuration
#### COBRA Configuration

In [2]:
COBRA_CONFIGURATION

| Attribute | Description | Value |
|---|---|---|
| solver | Mathematical optimization solver | gurobi |
| tolerance | General solver tolerance (feasibility, integrality, etc.) | 1e-07 |
| lower_bound | Default reaction lower bound | -1000.0 |
| upper_bound | Default reaction upper bound | 1000.0 |
| processes | Number of parallel processes | 15 |
| cache_directory | Path for the model cache | /Users/zhaiman/Library/Caches/cobrapy |
| max_cache_size | Maximum cache size in bytes | 104857600 |
| cache_expiration | Model cache expiration time in seconds (if any) | |


## Load RBC-GEM model
### Version: 0.1.0 (iAB-RBC-283)

In [3]:
# The downloaded SBML model contains annotations but no subsystems;
# the downloaded JSON model contains subsystems, so both are loaded.
json_model = read_rbc_model(filetype='json')
model = read_rbc_model(filetype='xml')
model

Set parameter Username
Academic license - for non-commercial use only - expires 2024-11-28


| Attribute | Value |
|---|---|
| Name | iAB_RBC_283 |
| Memory address | 13fe0aff0 |
| Number of metabolites | 342 |
| Number of reactions | 469 |
| Number of genes | 346 |
| Number of groups | 0 |
| Objective expression | `1.0*NaKt - 1.0*NaKt_reverse_db47e` |
| Compartments | cytosol, extracellular space |


## Metabolites
### Repair extracellular metabolites with missing formulas and charges
1. Some extracellular metabolites are missing formulas and charges, which leaves the reactions that involve them mass- and charge-imbalanced. Where possible, use the formula and charge of the corresponding intracellular metabolite; balance the rest assuming iAB-RBC-283 is already mass and charge balanced.
2. The reaction `DM_nadh` will not be charge balanced because it is a pseudoreaction.
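A mass-balance check sums, for every element and for charge, each metabolite's stoichiometric coefficient times its count; any nonzero total marks the reaction as unbalanced. The dependency-free sketch below illustrates that bookkeeping and shows why an extracellular metabolite without a formula makes a transport reaction look unbalanced. It is not COBRApy's implementation: `parse_formula`, `reaction_imbalance`, and the metabolite IDs `x_e`/`x_c` are hypothetical, and the parser assumes simple formulas without parentheses.

```python
import re
from collections import Counter


def parse_formula(formula):
    """Count atoms in a simple formula such as 'C5H9NO3' (no parentheses or hydrates)."""
    counts = Counter()
    for element, number in re.findall(r"([A-Z][a-z]*)(\d*)", formula or ""):
        counts[element] += int(number) if number else 1
    return counts


def reaction_imbalance(stoichiometry, formulas, charges):
    """Sum coefficient * count per element and for charge (products positive).

    Metabolites without a formula or charge contribute nothing, which is what
    makes a transport reaction with a formula-less side look unbalanced.
    """
    totals = Counter()
    for met_id, coeff in stoichiometry.items():
        for element, n in parse_formula(formulas.get(met_id)).items():
            totals[element] += coeff * n
        totals["charge"] += coeff * (charges.get(met_id) or 0)
    # Keep only nonzero entries; an empty dict means the reaction is balanced
    return {key: value for key, value in totals.items() if value != 0}


# Hypothetical transport reaction x_e --> x_c where only x_c has a formula
transport = {"x_e": -1, "x_c": 1}
print(reaction_imbalance(transport, {"x_c": "C5H9NO3"}, {"x_c": 0}))
# {'C': 5, 'H': 9, 'N': 1, 'O': 3}

# Copying the cytosolic formula and charge to x_e balances the reaction
print(reaction_imbalance(transport, {"x_e": "C5H9NO3", "x_c": "C5H9NO3"}, {"x_e": 0, "x_c": 0}))
# {}
```

The positive totals in the first result mirror the pattern in the output below (for example `5AOPt2`), where only the cytosolic side contributes atoms.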

In [4]:
unbalanced = cobra.manipulation.validate.check_mass_balance(model)
print(f"Number of unbalanced reactions: {len(unbalanced)}")
unbalanced

Number of unbalanced reactions: 130


{<Reaction 3MOXTYRESSte at 0x141a30950>: {'charge': -3.0,
  'C': -27.0,
  'H': -42.0,
  'N': -3.0,
  'O': -6.0},
 <Reaction 5AOPt2 at 0x141a30b30>: {'H': 9.0, 'C': 5.0, 'N': 1.0, 'O': 3.0},
 <Reaction ACNAMt2 at 0x141a320f0>: {'charge': -1.0,
  'H': 18.0,
  'C': 11.0,
  'N': 1.0,
  'O': 9.0},
 <Reaction AGPAT1_16_0_16_1 at 0x141a3f080>: {'C': -16.0,
  'H': -30.0,
  'O': -1.0},
 <Reaction AGPAT1_16_0_18_3 at 0x141a3f890>: {'C': -18.0,
  'H': -32.0,
  'O': -1.0},
 <Reaction AGPAT1_16_0_18_4 at 0x141a3fe30>: {'C': -18.0,
  'H': -30.0,
  'O': -1.0},
 <Reaction CDIPTr_16_0_16_0 at 0x141a3ffb0>: {'charge': -1.0,
  'C': 3.0,
  'H': 1.0,
  'O': 2.0,
  'N': 3.0,
  'P': 1.0},
 <Reaction AGPAT1_18_1_18_3 at 0x141a3ff20>: {'C': -18.0,
  'H': -32.0,
  'O': -1.0},
 <Reaction CDIPTr_16_0_18_1 at 0x141a3fe90>: {'charge': -1.0,
  'C': 3.0,
  'H': 1.0,
  'O': 2.0,
  'N': 3.0,
  'P': 1.0},
 <Reaction CDIPTr_16_0_18_2 at 0x141a4c5c0>: {'charge': -1.0,
  'C': 3.0,
  'H': 1.0,
  'O': 2.0,
  'N': 3.0,
  'P':

In [5]:
# Copy the missing formula and charge of each extracellular metabolite
# from its cytosolic counterpart
extracellular_missing = [
    '3moxtyr', '5aop', 'acnam', 'etha', 'fum', 'hcys__L', 'leuktrA4',
    'leuktrB4', 'mal__L', 'normete__L', 'orot', 'ptrc', 'spmd', 'sprm',
]
for met_base_id in extracellular_missing:
    met_e = model.metabolites.get_by_id(f'{met_base_id}_e')
    met_c = model.metabolites.get_by_id(f'{met_base_id}_c')
    met_e.formula = met_c.formula  # Missing formulas
    met_e.charge = met_c.charge  # Missing charges

### Repair intracellular metabolites with missing formulas and charges

In [6]:
# Missing formulas (use "R" rather than "X" for unspecified residues)
model.metabolites.get_by_id('band_c').formula = "RH"
model.metabolites.get_by_id('bandmt_c').formula = "RCH3"
model.metabolites.get_by_id('dhmtp_c').formula = "C6H10O3S"
model.metabolites.get_by_id('ppp9_c').formula = "C34H32N4O4"

model.metabolites.get_by_id('alpa_hs_16_0_c').formula = "C19H37O7P"
model.metabolites.get_by_id('alpa_hs_18_1_c').formula = "C21H39O7P"
model.metabolites.get_by_id('alpa_hs_18_2_c').formula = "C21H37O7P"
model.metabolites.get_by_id('cdpdag_hs_16_0_16_0_c').formula = "C44H79N3O15P2"
model.metabolites.get_by_id('cdpdag_hs_16_0_18_1_c').formula = "C46H81N3O15P2"
model.metabolites.get_by_id('cdpdag_hs_16_0_18_2_c').formula = "C46H79N3O15P2"
model.metabolites.get_by_id('cdpdag_hs_18_1_18_1_c').formula = "C48H83N3O15P2"
model.metabolites.get_by_id('cdpdag_hs_18_1_18_2_c').formula = "C48H81N3O15P2"
model.metabolites.get_by_id('cdpdag_hs_18_2_16_0_c').formula = "C46H79N3O15P2"
model.metabolites.get_by_id('cdpdag_hs_18_2_18_1_c').formula = "C48H81N3O15P2"
model.metabolites.get_by_id('dag_hs_16_0_16_0_c').formula = "C35H68O5"
model.metabolites.get_by_id('dag_hs_16_0_18_1_c').formula = "C37H70O5"
model.metabolites.get_by_id('dag_hs_16_0_18_2_c').formula = "C37H68O5"
model.metabolites.get_by_id('dag_hs_18_1_18_1_c').formula = "C39H72O5"
model.metabolites.get_by_id('dag_hs_18_1_18_2_c').formula = "C39H70O5"
model.metabolites.get_by_id('dag_hs_18_2_16_0_c').formula = "C37H68O5"
model.metabolites.get_by_id('dag_hs_18_2_18_1_c').formula = "C39H70O5"
model.metabolites.get_by_id('lpchol_hs_16_0_c').formula = "C24H50NO7P"
model.metabolites.get_by_id('lpchol_hs_18_1_c').formula = "C26H52NO7P"
model.metabolites.get_by_id('lpchol_hs_18_2_c').formula = "C26H50NO7P"
model.metabolites.get_by_id('pa_hs_16_0_16_0_c').formula = "C35H67O8P"
model.metabolites.get_by_id('pa_hs_16_0_18_1_c').formula = "C37H69O8P"
model.metabolites.get_by_id('pa_hs_16_0_18_2_c').formula = "C37H67O8P"
model.metabolites.get_by_id('pa_hs_18_1_18_1_c').formula = "C39H71O8P"
model.metabolites.get_by_id('pa_hs_18_1_18_2_c').formula = "C39H69O8P"
model.metabolites.get_by_id('pa_hs_18_2_16_0_c').formula = "C37H67O8P"
model.metabolites.get_by_id('pa_hs_18_2_18_1_c').formula = "C39H69O8P"
model.metabolites.get_by_id('pail45p_hs_16_0_16_0_c').formula = "C41H76O19P3"
model.metabolites.get_by_id('pail45p_hs_16_0_18_1_c').formula = "C43H78O19P3"
model.metabolites.get_by_id('pail45p_hs_16_0_18_2_c').formula = "C43H76O19P3"
model.metabolites.get_by_id('pail45p_hs_18_1_18_1_c').formula = "C45H80O19P3"
model.metabolites.get_by_id('pail45p_hs_18_1_18_2_c').formula = "C45H78O19P3"
model.metabolites.get_by_id('pail45p_hs_18_2_16_0_c').formula = "C43H76O19P3"
model.metabolites.get_by_id('pail45p_hs_18_2_18_1_c').formula = "C45H78O19P3"
model.metabolites.get_by_id('pail4p_hs_16_0_16_0_c').formula = "C41H77O16P2"
model.metabolites.get_by_id('pail4p_hs_16_0_18_1_c').formula = "C43H79O16P2"
model.metabolites.get_by_id('pail4p_hs_16_0_18_2_c').formula = "C43H77O16P2"
model.metabolites.get_by_id('pail4p_hs_18_1_18_1_c').formula = "C45H81O16P2"
model.metabolites.get_by_id('pail4p_hs_18_1_18_2_c').formula = "C45H79O16P2"
model.metabolites.get_by_id('pail4p_hs_18_2_16_0_c').formula = "C43H77O16P2"
model.metabolites.get_by_id('pail4p_hs_18_2_18_1_c').formula = "C45H79O16P2"
model.metabolites.get_by_id('pail_hs_16_0_16_0_c').formula = "C41H78O13P"
model.metabolites.get_by_id('pail_hs_16_0_18_1_c').formula = "C43H80O13P"
model.metabolites.get_by_id('pail_hs_16_0_18_2_c').formula = "C43H78O13P"
model.metabolites.get_by_id('pail_hs_18_1_18_1_c').formula = "C45H82O13P"
model.metabolites.get_by_id('pail_hs_18_1_18_2_c').formula = "C45H80O13P"
model.metabolites.get_by_id('pail_hs_18_2_16_0_c').formula = "C43H78O13P"
model.metabolites.get_by_id('pail_hs_18_2_18_1_c').formula = "C45H80O13P"
model.metabolites.get_by_id('pchol_hs_16_0_16_0_c').formula = "C40H80NO8P"
model.metabolites.get_by_id('pchol_hs_16_0_18_1_c').formula = "C42H82NO8P"
model.metabolites.get_by_id('pchol_hs_16_0_18_2_c').formula = "C42H80NO8P"
model.metabolites.get_by_id('pchol_hs_18_1_18_1_c').formula = "C44H84NO8P"
model.metabolites.get_by_id('pchol_hs_18_1_18_2_c').formula = "C44H82NO8P"
model.metabolites.get_by_id('pchol_hs_18_2_16_0_c').formula = "C42H80NO8P"
model.metabolites.get_by_id('pchol_hs_18_2_18_1_c').formula = "C44H82NO8P"
model.metabolites.get_by_id('pe_hs_16_0_16_0_c').formula = "C37H74NO8P"
model.metabolites.get_by_id('pe_hs_16_0_18_1_c').formula = "C39H76NO8P"
model.metabolites.get_by_id('pe_hs_16_0_18_2_c').formula = "C39H74NO8P"
model.metabolites.get_by_id('pe_hs_18_1_18_1_c').formula = "C41H78NO8P"
model.metabolites.get_by_id('pe_hs_18_1_18_2_c').formula = "C41H76NO8P"
model.metabolites.get_by_id('pe_hs_18_2_16_0_c').formula = "C39H74NO8P"
model.metabolites.get_by_id('pe_hs_18_2_18_1_c').formula = "C41H76NO8P"

# Missing charges
model.metabolites.get_by_id('band_c').charge = 0
model.metabolites.get_by_id('bandmt_c').charge = 0
model.metabolites.get_by_id('dhmtp_c').charge = 0
model.metabolites.get_by_id('ppp9_c').charge = -2

model.metabolites.get_by_id('alpa_hs_16_0_c').charge = -2
model.metabolites.get_by_id('alpa_hs_18_1_c').charge = -2
model.metabolites.get_by_id('alpa_hs_18_2_c').charge = -2
model.metabolites.get_by_id('cdpdag_hs_16_0_16_0_c').charge = -2
model.metabolites.get_by_id('cdpdag_hs_16_0_18_1_c').charge = -2
model.metabolites.get_by_id('cdpdag_hs_16_0_18_2_c').charge = -2
model.metabolites.get_by_id('cdpdag_hs_18_1_18_1_c').charge = -2
model.metabolites.get_by_id('cdpdag_hs_18_1_18_2_c').charge = -2
model.metabolites.get_by_id('cdpdag_hs_18_2_16_0_c').charge = -2
model.metabolites.get_by_id('cdpdag_hs_18_2_18_1_c').charge = -2
model.metabolites.get_by_id('dag_hs_16_0_16_0_c').charge = 0
model.metabolites.get_by_id('dag_hs_16_0_18_1_c').charge = 0
model.metabolites.get_by_id('dag_hs_16_0_18_2_c').charge = 0
model.metabolites.get_by_id('dag_hs_18_1_18_1_c').charge = 0
model.metabolites.get_by_id('dag_hs_18_1_18_2_c').charge = 0
model.metabolites.get_by_id('dag_hs_18_2_16_0_c').charge = 0
model.metabolites.get_by_id('dag_hs_18_2_18_1_c').charge = 0
model.metabolites.get_by_id('lpchol_hs_16_0_c').charge = 0
model.metabolites.get_by_id('lpchol_hs_18_1_c').charge = 0
model.metabolites.get_by_id('lpchol_hs_18_2_c').charge = 0
model.metabolites.get_by_id('pa_hs_16_0_16_0_c').charge = -2
model.metabolites.get_by_id('pa_hs_16_0_18_1_c').charge = -2
model.metabolites.get_by_id('pa_hs_16_0_18_2_c').charge = -2
model.metabolites.get_by_id('pa_hs_18_1_18_1_c').charge = -2
model.metabolites.get_by_id('pa_hs_18_1_18_2_c').charge = -2
model.metabolites.get_by_id('pa_hs_18_2_16_0_c').charge = -2
model.metabolites.get_by_id('pa_hs_18_2_18_1_c').charge = -2
model.metabolites.get_by_id('pail45p_hs_16_0_16_0_c').charge = -5
model.metabolites.get_by_id('pail45p_hs_16_0_18_1_c').charge = -5
model.metabolites.get_by_id('pail45p_hs_16_0_18_2_c').charge = -5
model.metabolites.get_by_id('pail45p_hs_18_1_18_1_c').charge = -5
model.metabolites.get_by_id('pail45p_hs_18_1_18_2_c').charge = -5
model.metabolites.get_by_id('pail45p_hs_18_2_16_0_c').charge = -5
model.metabolites.get_by_id('pail45p_hs_18_2_18_1_c').charge = -5
model.metabolites.get_by_id('pail4p_hs_16_0_16_0_c').charge = -3
model.metabolites.get_by_id('pail4p_hs_16_0_18_1_c').charge = -3
model.metabolites.get_by_id('pail4p_hs_16_0_18_2_c').charge = -3
model.metabolites.get_by_id('pail4p_hs_18_1_18_1_c').charge = -3
model.metabolites.get_by_id('pail4p_hs_18_1_18_2_c').charge = -3
model.metabolites.get_by_id('pail4p_hs_18_2_16_0_c').charge = -3
model.metabolites.get_by_id('pail4p_hs_18_2_18_1_c').charge = -3
model.metabolites.get_by_id('pail_hs_16_0_16_0_c').charge = -1
model.metabolites.get_by_id('pail_hs_16_0_18_1_c').charge = -1
model.metabolites.get_by_id('pail_hs_16_0_18_2_c').charge = -1
model.metabolites.get_by_id('pail_hs_18_1_18_1_c').charge = -1
model.metabolites.get_by_id('pail_hs_18_1_18_2_c').charge = -1
model.metabolites.get_by_id('pail_hs_18_2_16_0_c').charge = -1
model.metabolites.get_by_id('pail_hs_18_2_18_1_c').charge = -1
model.metabolites.get_by_id('pchol_hs_16_0_16_0_c').charge = 0
model.metabolites.get_by_id('pchol_hs_16_0_18_1_c').charge = 0
model.metabolites.get_by_id('pchol_hs_16_0_18_2_c').charge = 0
model.metabolites.get_by_id('pchol_hs_18_1_18_1_c').charge = 0
model.metabolites.get_by_id('pchol_hs_18_1_18_2_c').charge = 0
model.metabolites.get_by_id('pchol_hs_18_2_16_0_c').charge = 0
model.metabolites.get_by_id('pchol_hs_18_2_18_1_c').charge = 0
model.metabolites.get_by_id('pe_hs_16_0_16_0_c').charge = 0
model.metabolites.get_by_id('pe_hs_16_0_18_1_c').charge = 0
model.metabolites.get_by_id('pe_hs_16_0_18_2_c').charge = 0
model.metabolites.get_by_id('pe_hs_18_1_18_1_c').charge = 0
model.metabolites.get_by_id('pe_hs_18_1_18_2_c').charge = 0
model.metabolites.get_by_id('pe_hs_18_2_16_0_c').charge = 0
model.metabolites.get_by_id('pe_hs_18_2_18_1_c').charge = 0
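The hard-coded lipid formulas in the cell above follow a regular pattern: relative to the all-16:0 species of each lipid class, each 18:1 chain adds C2H2 (oleate, C18H34O2, versus palmitate, C16H32O2) and each 18:2 chain adds C2 with no extra hydrogens (linoleate, C18H32O2). A check like the sketch below could catch typos in such a table before the mass-balance check is run. `CHAIN_DELTA`, `parse_formula`, and `expected_counts` are hypothetical helpers, not part of `rbc_gem_utils`, and the chain suffixes are read from the metabolite IDs by assumption.

```python
import re
from collections import Counter

# Change in formula per acyl chain, relative to palmitate (16:0, C16H32O2):
# oleate (18:1, C18H34O2) adds C2H2; linoleate (18:2, C18H32O2) adds C2 only.
CHAIN_DELTA = {
    "16_0": Counter(),
    "18_1": Counter({"C": 2, "H": 2}),
    "18_2": Counter({"C": 2}),
}


def parse_formula(formula):
    """Count atoms in a simple formula such as 'C35H68O5' (no parentheses)."""
    counts = Counter()
    for element, number in re.findall(r"([A-Z][a-z]*)(\d*)", formula):
        counts[element] += int(number) if number else 1
    return counts


def expected_counts(base_formula, met_id):
    """Expected atom counts for a lipid ID, given the formula of its all-16:0 species."""
    counts = parse_formula(base_formula)
    for chain in re.findall(r"\d+_\d+", met_id):  # e.g. ['16_0', '18_1']
        counts.update(CHAIN_DELTA[chain])
    return counts


# Compare a few hard-coded formulas with the expectation
base = "C35H68O5"  # dag_hs_16_0_16_0_c
for met_id, formula in {
    "dag_hs_16_0_18_1_c": "C37H70O5",
    "dag_hs_18_1_18_2_c": "C39H70O5",
    "dag_hs_18_2_16_0_c": "C37H68O5",
}.items():
    print(met_id, expected_counts(base, met_id) == parse_formula(formula))
```

Each line prints `True` for these three entries; the regular expression skips digits such as the `45` in `pail45p` because they are not followed by an underscore and a second number.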

In [7]:
unbalanced = cobra.manipulation.validate.check_mass_balance(model)
print(f"Number of unbalanced reactions: {len(unbalanced)}")
unbalanced

Number of unbalanced reactions: 0


{}

## Genes
### Repair gene names
1. Some gene IDs lost their gene name and were replaced with the NCBI Gene ID. They need to be reverted to the HGNC symbol listed in their identifiers. The errors were likely caused by the loss of the ".1" suffix in the original model, which BiGG used to translate the ID to "_AT1". The "_AT1" suffix is added here to ensure that subsequent gene ID corrections work properly.
2. Some transport genes ended up with an extracellular compartment tag ("_e"). These tags have been removed and the genes updated to the "_AT" format.
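The transporter renames in the next cell follow a single pattern: the `_<n>_e` suffix becomes `_AT<n>`. A hypothetical helper like the sketch below could generate those mapping entries instead of typing each one; the NCBI Gene ID replacements cannot be derived this way and still have to be looked up by hand.

```python
def transporter_gene_id(old_id):
    """Convert e.g. 'Slc2a1_1_e' to 'Slc2a1_AT1' (hypothetical helper)."""
    # Split off the last two underscore-separated parts: transcript number and compartment
    symbol, transcript, compartment = old_id.rsplit("_", 2)
    if compartment != "e" or not transcript.isdigit():
        raise ValueError(f"Unexpected transporter gene ID: {old_id}")
    return f"{symbol}_AT{transcript}"


old_ids = ["Abcc4_1_e", "Rhag_1_e", "Slc2a1_1_e", "Slc12a7_1_e"]
print({old_id: transporter_gene_id(old_id) for old_id in old_ids})
# {'Abcc4_1_e': 'Abcc4_AT1', 'Rhag_1_e': 'Rhag_AT1', 'Slc2a1_1_e': 'Slc2a1_AT1', 'Slc12a7_1_e': 'Slc12a7_AT1'}
```

Raising on anything that does not end in `_<digits>_e` keeps the helper from silently rewriting IDs that are already in the `_AT` format.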

In [8]:
id_mapping_dict = {}
# Fix number IDs
id_mapping_dict.update({
    # Old ID: New ID
    "9429": "Abcg2_AT1",
    "55256": "Adi1_AT1",
    "58478": "Enoph1_AT1",
    "55500": "Etnk1_AT1",
    "29124": "Lgals13_AT1",
    "10846": "Pde10A_AT1",
    "55276": "Pgm2_AT1",
    "51084": "Cryl1_AT1",
    "10390_AT1": "Cept1_AT1",
    "55224_AT1": "Etnk2_AT1",
})
# Fix transporter IDs
id_mapping_dict.update({
    # Old ID: New ID
    "Abcc4_1_e": "Abcc4_AT1",
    "Rhag_1_e": "Rhag_AT1",
    "Rhbg_1_e": "Rhbg_AT1",
    "Slc12a7_1_e": "Slc12a7_AT1",
    "Slc14a1_1_e": "Slc14a1_AT1",
    "Slc29a1_1_e": "Slc29a1_AT1",
    "Slc29a2_1_e": "Slc29a2_AT1",
    "Slc2a11_1_e": "Slc2a11_AT1",
    "Slc2a1_1_e": "Slc2a1_AT1",
    "Slc2a2_1_e": "Slc2a2_AT1",
    "Slc2a3_1_e": "Slc2a3_AT1",
    "Slc2a4_1_e": "Slc2a4_AT1",
    "Slc2a5_1_e": "Slc2a5_AT1",
    "Slc2a7_1_e": "Slc2a7_AT1",
    "Slc2a8_1_e": "Slc2a8_AT1",
    "Slc4a1_1_e": "Slc4a1_AT1",
    "Slc5a1_1_e": "Slc5a1_AT1",
    "Slc5a3_1_e": "Slc5a3_AT1",
    "Slc5a5_1_e": "Slc5a5_AT1",
})

cobra.manipulation.modify.rename_genes(model, id_mapping_dict)
model.repair()

id_mapping_df = pd.DataFrame.from_dict(id_mapping_dict, orient="index")
id_mapping_df = id_mapping_df.reset_index(drop=False)
id_mapping_df.columns = ["geneRetired", "genes"]
id_mapping_df = id_mapping_df.loc[:, id_mapping_df.columns[::-1]]
id_mapping_df.to_csv(
    f"{REPO_PATH}/data/interim/replacedGenes.tsv",
    sep="\t",
)
id_mapping_df

| | genes | geneRetired |
|---|---|---|
| 0 | Abcg2_AT1 | 9429 |
| 1 | Adi1_AT1 | 55256 |
| 2 | Enoph1_AT1 | 58478 |
| 3 | Etnk1_AT1 | 55500 |
| 4 | Lgals13_AT1 | 29124 |
| 5 | Pde10A_AT1 | 10846 |
| 6 | Pgm2_AT1 | 55276 |
| 7 | Cryl1_AT1 | 51084 |
| 8 | Cept1_AT1 | 10390_AT1 |
| 9 | Etnk2_AT1 | 55224_AT1 |
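Renaming a gene also has to update every gene-reaction rule (GPR) that mentions it, which `rename_genes` takes care of for the model. The sketch below only illustrates the token-level idea on plain GPR strings and is not COBRApy's implementation; `rename_in_gpr` is hypothetical. Matching whole tokens keeps an old ID such as `9429` from also rewriting a longer ID such as `19429` or `9429_AT2`.

```python
import re


def rename_in_gpr(rule, mapping):
    """Replace whole gene IDs in a GPR string according to mapping (old -> new)."""
    if not mapping:
        return rule
    # Longest IDs first, anchored on word boundaries so only complete tokens match
    alternatives = "|".join(map(re.escape, sorted(mapping, key=len, reverse=True)))
    pattern = re.compile(rf"\b({alternatives})\b")
    return pattern.sub(lambda match: mapping[match.group(1)], rule)


mapping = {"9429": "Abcg2_AT1", "Slc2a1_1_e": "Slc2a1_AT1"}
print(rename_in_gpr("9429 or (Slc2a1_1_e and 9429_AT2)", mapping))
# Abcg2_AT1 or (Slc2a1_AT1 and 9429_AT2)
```

Because `_` counts as a word character for `\b`, `9429` is not matched inside `9429_AT2`.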



## Reactions

### Subsystems

1. The handling of the `subsystem` attribute has changed: subsystems are now exported and imported as `cobra.core.Group` objects in SBML-formatted models. However, the downloaded SBML model contains no groups, so the imported model has no way to populate the `subsystem` attribute. The groups are therefore created from the `subsystem` attribute defined in the JSON model.
2. The reaction `DM_nadh` has its subsystem set to `subsystem='&apos;'`. This is updated to `subsystem='Miscellaneous'`.
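The next cell queries the reactions once per subsystem. The same grouping, including the `'&apos;'` placeholder fix, can be sketched as a single pass over plain data. `group_by_subsystem` and the subsystem labels in the example are hypothetical, and building the actual `cobra.core.Group` objects from the result is left to the notebook code.

```python
from collections import defaultdict


def group_by_subsystem(subsystems, placeholder="&apos;", fallback="Miscellaneous"):
    """Map each subsystem name to its reaction IDs, replacing the placeholder name."""
    groups = defaultdict(list)
    for rxn_id, subsystem in subsystems.items():
        if subsystem == placeholder:
            subsystem = fallback
        groups[subsystem].append(rxn_id)
    return dict(groups)


# Hypothetical subsystem labels for a few reactions
print(group_by_subsystem({
    "DM_nadh": "&apos;",
    "NaKt": "Transport",
    "5AOPt2": "Transport",
}))
# {'Miscellaneous': ['DM_nadh'], 'Transport': ['NaKt', '5AOPt2']}
```

A single pass touches each reaction once, whereas one query per subsystem scans all reactions for every subsystem; at 469 reactions and 41 groups either approach is fast.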

In [9]:
for json_rxn in json_model.reactions:
    reaction = model.reactions.get_by_id(json_rxn.id)
    if json_rxn.subsystem == '&apos;':
        reaction.subsystem = 'Miscellaneous'
    else:
        reaction.subsystem = json_rxn.subsystem 

unique_subsystems = set(model.reactions.list_attr("subsystem"))
for subsystem in unique_subsystems:
    reaction_list = model.reactions.query(lambda x: x.subsystem == subsystem)
    model.add_groups([
        Group(
            id=subsystem, 
            name=subsystem, 
            members=reaction_list
        )
    ])
model

| Attribute | Value |
|---|---|
| Name | iAB_RBC_283 |
| Memory address | 13fe0aff0 |
| Number of metabolites | 342 |
| Number of reactions | 469 |
| Number of genes | 346 |
| Number of groups | 41 |
| Objective expression | `1.0*NaKt - 1.0*NaKt_reverse_db47e` |
| Compartments | cytosol, extracellular space |



### Repair reaction identifiers
Because of how reactions were stored, the suffixes that encode the fatty acid composition of some reaction IDs were altered (e.g., `AGPAT1_16_0_16_1` instead of `AGPAT1_16_0_16_0`).
Therefore, these identifiers are corrected, and the retired identifiers are written to a file to track identifier changes.
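In the mapping below, each retired ID differs from its corrected ID only in the final chain suffix (`16_1` to `16_0`, `18_3` to `18_1`, `18_4` to `18_2`), and only within the `AGPAT1`, `GPAM_hs`, and `PI4P5K` reaction families. That observed pattern could be expressed as a rule, as in the hypothetical sketch below; it is inferred from this mapping alone and should not be applied to other reactions.

```python
# Suffix corrections observed in the retired -> corrected reaction ID mapping
SUFFIX_FIX = {"16_1": "16_0", "18_3": "18_1", "18_4": "18_2"}
AFFECTED_FAMILIES = ("AGPAT1_", "GPAM_hs_", "PI4P5K_")


def repair_reaction_id(rxn_id):
    """Return the corrected ID for an affected reaction, else the ID unchanged."""
    if not rxn_id.startswith(AFFECTED_FAMILIES):
        return rxn_id
    # Split off the final "<carbons>_<double bonds>" suffix
    prefix, carbons, double_bonds = rxn_id.rsplit("_", 2)
    fixed = SUFFIX_FIX.get(f"{carbons}_{double_bonds}")
    return f"{prefix}_{fixed}" if fixed else rxn_id


for rxn_id in ["AGPAT1_16_0_16_1", "GPAM_hs_18_3", "PI4P5K_18_1_18_4", "AGPAT1_16_0_18_1"]:
    print(rxn_id, "->", repair_reaction_id(rxn_id))
```

None of the corrected suffixes is itself a key in `SUFFIX_FIX`, so applying the rule to an already-corrected ID leaves it unchanged, much like the `try`/`except` guard in the cell below tolerates IDs that were already replaced.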

In [10]:
# ID fixes
id_mapping_dict = {
    # Old ID: New ID
    'AGPAT1_16_0_16_1': 'AGPAT1_16_0_16_0',
    'AGPAT1_16_0_18_3': 'AGPAT1_16_0_18_1',
    'AGPAT1_16_0_18_4': 'AGPAT1_16_0_18_2',
    'AGPAT1_18_1_18_3': 'AGPAT1_18_1_18_1',
    'AGPAT1_18_1_18_4': 'AGPAT1_18_1_18_2',

    'GPAM_hs_16_1': 'GPAM_hs_16_0',
    'GPAM_hs_18_3': 'GPAM_hs_18_1',
    'GPAM_hs_18_4': 'GPAM_hs_18_2',

    'PI4P5K_16_0_16_1': 'PI4P5K_16_0_16_0',
    'PI4P5K_16_0_18_3': 'PI4P5K_16_0_18_1',
    'PI4P5K_16_0_18_4': 'PI4P5K_16_0_18_2',
    'PI4P5K_18_1_18_3': 'PI4P5K_18_1_18_1',
    'PI4P5K_18_1_18_4': 'PI4P5K_18_1_18_2',
}

model.repair()


for to_replace, replacement_id in id_mapping_dict.items():
    try:
        reaction = model.reactions.get_by_id(to_replace)
    except KeyError as e:
        if replacement_id not in model.reactions:
            raise KeyError(f"Could not find {e} in model")
        print(f"Could not find {e}, already replaced.")
    else:
        # Update ID and annotation
        reaction.id = replacement_id
        reaction.annotation["bigg.reaction"] = replacement_id

id_mapping_df = pd.DataFrame.from_dict(id_mapping_dict, orient="index")
id_mapping_df = id_mapping_df.reset_index(drop=False)
id_mapping_df.columns = ["rxnRetired", "rxns"]
id_mapping_df = id_mapping_df.loc[:, id_mapping_df.columns[::-1]]
id_mapping_df.to_csv(
    f"{REPO_PATH}/data/interim/replacedReactions.tsv",
    sep="\t",
)
id_mapping_df

| | rxns | rxnRetired |
|---|---|---|
| 0 | AGPAT1_16_0_16_0 | AGPAT1_16_0_16_1 |
| 1 | AGPAT1_16_0_18_1 | AGPAT1_16_0_18_3 |
| 2 | AGPAT1_16_0_18_2 | AGPAT1_16_0_18_4 |
| 3 | AGPAT1_18_1_18_1 | AGPAT1_18_1_18_3 |
| 4 | AGPAT1_18_1_18_2 | AGPAT1_18_1_18_4 |
| 5 | GPAM_hs_16_0 | GPAM_hs_16_1 |
| 6 | GPAM_hs_18_1 | GPAM_hs_18_3 |
| 7 | GPAM_hs_18_2 | GPAM_hs_18_4 |
| 8 | PI4P5K_16_0_16_0 | PI4P5K_16_0_16_1 |
| 9 | PI4P5K_16_0_18_1 | PI4P5K_16_0_18_3 |
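The gene and reaction mapping tables are built with the same pandas steps (`from_dict`, `reset_index`, column renaming, column reversal). A small standard-library helper could write the same two columns, new ID first and retired ID second. Unlike the pandas `to_csv` calls, which also write the DataFrame index as a leading column, this hypothetical `write_id_mapping` writes only the two ID columns, and the temporary output path is an assumption for illustration.

```python
import csv
import os
import tempfile


def write_id_mapping(mapping, path, new_column, retired_column):
    """Write an old -> new ID mapping as a TSV with the new ID in the first column."""
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
        writer.writerow([new_column, retired_column])
        for retired_id, new_id in mapping.items():
            writer.writerow([new_id, retired_id])


# Usage with a hypothetical temporary path instead of the repository data folder
path = os.path.join(tempfile.mkdtemp(), "replacedReactions.tsv")
write_id_mapping({"GPAM_hs_16_1": "GPAM_hs_16_0"}, path, "rxns", "rxnRetired")
with open(path) as handle:
    print(handle.read())
```

One helper serves both tables: the gene mapping would use `"genes"` and `"geneRetired"` as column names.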


## Export repaired model
### Version: 0.1.1

In [11]:
write_rbc_model(model, filetype="all")
model

| Attribute | Value |
|---|---|
| Name | iAB_RBC_283 |
| Memory address | 13fe0aff0 |
| Number of metabolites | 342 |
| Number of reactions | 469 |
| Number of genes | 346 |
| Number of groups | 41 |
| Objective expression | `1.0*NaKt - 1.0*NaKt_reverse_db47e` |
| Compartments | cytosol, extracellular space |
