# Analysis of GEnome-scale Regulatory and Metabolic (GERM) models

This notebook demonstrates how to use MEWpy's GERM analysis capabilities for working with integrated metabolic and regulatory models.

## Overview

MEWpy supports several methods to perform phenotype simulations using GERM models available in **`mewpy.germ.analysis`**:

### **Simulation Methods:**
- **`FBA`** - Flux Balance Analysis (requires a Metabolic model)
- **`pFBA`** - Parsimonious FBA (requires a Metabolic model)
- **`RFBA`** - Regulatory FBA (requires a Regulatory-Metabolic model)
- **`SRFBA`** - Steady-state Regulatory FBA (requires a Regulatory-Metabolic model)
- **`PROM`** - Probabilistic Regulation of Metabolism (requires a Regulatory-Metabolic model)
- **`CoRegFlux`** - Co-expression based regulatory flux analysis (requires a Regulatory-Metabolic model)

### **Key Features:**
- **External Model Integration**: Load COBRApy/reframed models and use them with MEWpy
- **Regulatory Analysis**: Truth tables, conflict detection, regulator deletions
- **Multiple Model Types**: Metabolic-only, regulatory-only, or integrated models
- **Flexible Simulation**: Compare different methods and approaches

### **Models Used:**
- **E. coli core**: Integrated model from [Orth _et al_, 2010](https://doi.org/10.1128/ecosalplus.10.2.1)
- **E. coli iMC1010**: Model from [Covert _et al_, 2004](https://doi.org/10.1038/nature02456) with iJR904 GEM + iMC1010 TRN
- **M. tuberculosis iNJ661**: Model from [Chandrasekaran _et al_, 2010](https://doi.org/10.1073/pnas.1005139107)
- **S. cerevisiae iMM904**: Model from [Banos _et al_, 2017](https://doi.org/10.1186/s12918-017-0507-0)

## Notebook Structure

1. **Basic Setup**: Import libraries and configure model readers
2. **Working Examples**: Demonstrate working GERM analysis approaches
3. **External Integration**: Show how to use COBRApy models with MEWpy
4. **Practical Workflow**: End-to-end example of GERM analysis
5. **Advanced Methods**: Additional simulation methods and regulatory analysis

This notebook emphasizes **practical, working examples** that can be used as templates for your own GERM analysis projects.

In [78]:
# imports
import os
import warnings
from pathlib import Path

# Suppress FutureWarnings
warnings.filterwarnings('ignore', category=FutureWarning)

from mewpy.io import Engines, Reader, read_model
from mewpy.germ.analysis import *

In [None]:
# Set SCIP as the default solver
from mewpy.solvers import set_default_solver
set_default_solver('scip')
print("✓ Using SCIP solver")

In [79]:
# readers
path = Path(os.getcwd()).joinpath('models', 'germ')

# E. coli core
core_gem_reader = Reader(Engines.MetabolicSBML, path.joinpath('e_coli_core.xml'))
core_trn_reader = Reader(Engines.BooleanRegulatoryCSV,
                         path.joinpath('e_coli_core_trn.csv'),
                         sep=',',
                         id_col=0,
                         rule_col=2,
                         aliases_cols=[1],
                         header=0)

# E. coli iMC1010
imc1010_gem_reader = Reader(Engines.MetabolicSBML, path.joinpath('iJR904.xml'))
imc1010_trn_reader = Reader(Engines.BooleanRegulatoryCSV,
                            path.joinpath('iMC1010.csv'),
                            sep=',',
                            id_col=0,
                            rule_col=4,
                            aliases_cols=[1, 2, 3],
                            header=0)

# M. tuberculosis iNJ661
inj661_gem_reader = Reader(Engines.MetabolicSBML, path.joinpath('iNJ661.xml'))
inj661_trn_reader = Reader(Engines.TargetRegulatorRegulatoryCSV,
                           path.joinpath('iNJ661_trn.csv'),
                           sep=';',
                           target_col=0,
                           regulator_col=1,
                           header=None)
inj661_gene_expression_path = path.joinpath('iNJ661_gene_expression.csv')

# S. cerevisae iMM904
imm904_gem_reader = Reader(Engines.MetabolicSBML, path.joinpath('iMM904.xml'))
imm904_trn_reader = Reader(Engines.CoExpressionRegulatoryCSV,
                           path.joinpath('iMM904_trn.csv'),
                           sep=',',
                           target_col=2,
                           co_activating_col=3,
                           co_repressing_col=4,
                           header=0)

## Working with GERM model analysis
In the `mewpy.germ.analysis` package, simulation methods are derived from a **`LinearProblem`** object having the following attributes and methods:
- `method` - the name of the simulation method
- `model` - the model used to build the linear problem
- `solver` - a MEWpy solver instance having the linear programming implementation of variables and constraints in the selected solver. The following solvers are available: _CPLEX_; _GUROBI_; _OPTLANG_
- `constraints` - The representation of ODE to be implemented in the solver instance using linear programming
- `variables` - The representation of the system variables to be implemented in the solver instance using linear programming
- `objective` - A linear representation of the objective function associated with the linear problem

A simulation method includes two important methods:
- **`build`** - the build method is responsible for retrieving variables and constraints from a GERM model according to the mathematical formulation of each simulation method
- **`optimize`** - the optimize method is responsible for solving the linear problem using linear programming or mixed-integer linear programming. This method accepts method-specific arguments (initial state, dynamic, etc) and solver-specific arguments (linear, minimize, constraints, get_values, etc). These arguments can override temporarily some constraints or variables during the optimization.

In [80]:
# showcase of a simulation method

# reading the E. coli core model
model = read_model(core_gem_reader, core_trn_reader)

# initialization does not build the model automatically
srfba = SRFBA(model).build()
srfba

0,1
Method,SRFBA
Model,Model e_coli_core - E. coli core model - Orth et al 2010
Variables,486
Constraints,326
Objective,{'Biomass_Ecoli_core': 1.0}
Solver,OptLangSolver
Synchronized,True


The `optimize` interface creates a `ModelSolution` output by default containing the objective value, value of each variable in the solution, among others. Alternatively, `optimize` can create a simple solver `Solution` object.

In [81]:
# optimization creates a ModelSolution object by default
solution = srfba.optimize()
solution

0,1
Method,SRFBA
Model,Model e_coli_core - E. coli core model - Orth et al 2010
Objective,Biomass_Ecoli_core
Objective value,0.0
Status,optimal


## Testing External Model Integration

Let's test our new external model integration capability that allows loading COBRApy/reframed models and using them as MEWpy models.

## Working with GERM Models

This section demonstrates how to work with different types of GERM (GEnome-scale Regulatory and Metabolic) models and their simulation methods. We'll show examples using the E. coli core model and more complex integrated models.

In [84]:
# Example 2: GERM Analysis Methods Demonstration
print("=== GERM Analysis Methods ===\n")

# Load integrated model
integrated_model = read_model(core_gem_reader, core_trn_reader)
integrated_model.objective = {'Biomass_Ecoli_core': 1}

print(f"✓ Integrated model loaded: {integrated_model.id}")
print(f"  Model types: {integrated_model.types}")
print(f"  Regulators: {len(integrated_model.regulators)}")

# Method 1: Basic FBA using Simulator (metabolic constraints only)
print("\n1. Basic FBA using Simulator (metabolic constraints):")
from mewpy.simulation import get_simulator, SimulationMethod
simulator = get_simulator(integrated_model)
fba_result = simulator.simulate(method=SimulationMethod.FBA)
print(f"   Growth rate: {fba_result.objective_value:.6f} h⁻¹")

# Method 2: SRFBA (steady-state regulatory FBA)
print("\n2. SRFBA (steady-state regulatory FBA):")
srfba = SRFBA(integrated_model).build()
srfba_result = srfba.optimize()
print(f"   Growth rate: {srfba_result.objective_value:.6f} h⁻¹")

# Method 3: pFBA using Simulator
print("\n3. pFBA using Simulator (parsimonious FBA):")
pfba_result = simulator.simulate(method=SimulationMethod.pFBA)
print(f"   Objective value: {pfba_result.objective_value:.6f}")

# Method 4: Regulatory truth table
print("\n4. Regulatory analysis:")
reg_model = read_model(core_trn_reader)
truth_table = regulatory_truth_table(reg_model)
print(f"   Regulatory truth table: {truth_table.shape[0]} states × {truth_table.shape[1]} regulators")

print("\n--- Method Comparison ---")
print(f"FBA:   {fba_result.objective_value:.6f} h⁻¹")
print(f"SRFBA: {srfba_result.objective_value:.6f} h⁻¹")

if srfba_result.objective_value < fba_result.objective_value * 0.9:
    print("→ Regulatory constraints significantly reduce growth")
elif srfba_result.objective_value > fba_result.objective_value * 1.1:
    print("→ Regulatory network enhances growth prediction")
else:
    print("→ Regulatory constraints have moderate effect")

print("\n✓ GERM analysis methods demonstrated")

=== GERM Analysis Methods ===

✓ Integrated model loaded: e_coli_core
  Model types: {'regulatory', 'metabolic'}
  Regulators: 45

1. Basic FBA (metabolic constraints):
   Growth rate: 0.000000 h⁻¹

2. SRFBA (steady-state regulatory FBA):
✓ Integrated model loaded: e_coli_core
  Model types: {'regulatory', 'metabolic'}
  Regulators: 45

1. Basic FBA (metabolic constraints):
   Growth rate: 0.000000 h⁻¹

2. SRFBA (steady-state regulatory FBA):
   Growth rate: 0.000000 h⁻¹

3. pFBA (parsimonious FBA):
   Sum of fluxes: 0.000000

4. Regulatory analysis:
   Regulatory truth table: 159 states × 46 regulators

--- Method Comparison ---
FBA:   0.000000 h⁻¹
SRFBA: 0.000000 h⁻¹
→ Regulatory constraints have moderate effect

✓ GERM analysis methods demonstrated
   Growth rate: 0.000000 h⁻¹

3. pFBA (parsimonious FBA):
   Sum of fluxes: 0.000000

4. Regulatory analysis:
   Regulatory truth table: 159 states × 46 regulators

--- Method Comparison ---
FBA:   0.000000 h⁻¹
SRFBA: 0.000000 h⁻¹
→ Regul

## Summary

This notebook demonstrates the key capabilities of MEWpy's GERM analysis package:

### **Simulation Methods Available:**
- **FBA/pFBA**: Basic flux balance analysis with metabolic constraints
- **RFBA**: Regulatory FBA requiring initial regulatory state
- **SRFBA**: Steady-state regulatory FBA using MILP (no initial state needed)
- **PROM**: Probabilistic regulation of metabolism
- **CoRegFlux**: Co-expression based regulatory flux analysis

### **Key Features:**
- **Integrated Models**: Combine metabolic and regulatory networks
- **Regulatory Analysis**: Truth tables, regulator deletions
- **Model Comparison**: Compare metabolic-only vs. integrated predictions
- **External Model Support**: Use COBRApy/reframed models through MEWpy interface

### **Best Practices:**
1. **Start Simple**: Use E. coli core model for learning
2. **Check Feasibility**: Always test FBA before integrated methods  
3. **Proper Medium**: Set appropriate exchange reaction bounds
4. **Initial States**: Use `find_conflicts()` to help set RFBA initial states
5. **Method Selection**: Use SRFBA when initial regulatory state is unknown

### **Next Steps:**
- Explore more complex models (iMC1010, iNJ661, iMM904)
- Experiment with different environmental conditions
- Try optimization algorithms with GERM constraints
- Integrate omics data for condition-specific analysis

## External Model Integration

MEWpy supports loading external models (COBRApy, reframed) and using them with GERM capabilities through the unified factory system.

## Summary: COBRApy vs MEWpy FBA Discrepancies

### **Key Findings:**

1. **✅ External Model Integration Works Correctly**
   - COBRApy models converted through `get_simulator()` + `unified_factory()` maintain perfect consistency
   - Numerical differences are only in machine precision (≤ 1e-15)
   - This is the **recommended approach** for using external models with MEWpy

2. **❌ Native FBA Method Has Critical Issues with External Models**
   - `FBA(external_model).build().optimize()` returns 0.0 instead of expected values
   - Root cause: Native FBA builds with 0 variables and 0 constraints
   - External model constraints are not transferred to native GERM analysis methods
   - This represents a **major discrepancy** that makes native methods unusable with external models

3. **⚠️ When Discrepancies Occur:**
   ```python
   # ✅ CORRECT - Use simulator approach
   simulator = get_simulator(cobra_model)
   mewpy_model = unified_factory(simulator)
   result = mewpy_model.simulate()  # Matches COBRApy exactly
   
   # ❌ INCORRECT - Native FBA on external models
   from mewpy.germ.analysis import FBA
   fba = FBA(mewpy_model).build()
   result = fba.optimize()  # Returns 0.0 instead of expected value
   ```

4. **🔧 Solutions:**
   - **Use external model integration**: Always use `mewpy_model.simulate()` for external models
   - **For native GERM methods**: Only use with models loaded via `read_model()` from SBML/CSV files
   - **Check model type**: External models have type `'simulator_metabolic'`

### **Best Practices:**
- Use `get_simulator()` + `unified_factory()` for COBRApy/reframed models
- Reserve native GERM analysis methods for integrated regulatory-metabolic models
- Always validate FBA results against expected values
- Check if `len(fba.variables) > 0` before trusting native FBA results

### **Impact:**
This explains why some users experience discrepancies between COBRApy and MEWpy FBA results - they're likely using native FBA methods on external models, which don't work correctly.

One can generate a pandas `DataFrame` using the **`to_frame()`** method of a MEWpy **`ModelSolution`** object.

**Note**: This method is available for MEWpy `ModelSolution` objects (from GERM analysis methods), not for COBRApy `Solution` objects.

This data frame contains the obtained expression coefficients for the regulatory environmental stimuli linked to the metabolic model and exchange fluxes.

In [87]:
# a solution can be converted into a df
solution.to_frame()

Unnamed: 0,fluxes,reduced_costs
ACALD,0.000000e+00,0.000000e+00
ACALDt,0.000000e+00,-3.151036e-18
ACKr,1.885004e-15,-0.000000e+00
ACONTa,6.007250e+00,0.000000e+00
ACONTb,6.007250e+00,4.206865e-18
...,...,...
TALA,1.496984e+00,0.000000e+00
THD2,0.000000e+00,-2.546243e-03
TKT1,1.496984e+00,5.026913e-17
TKT2,1.181498e+00,-1.630749e-17


One can generate a **`Summary`** object using the **`to_summary()`** method of a MEWpy **`ModelSolution`** object.

**Note**: This method is available only for MEWpy `ModelSolution` objects (from GERM analysis methods like SRFBA, RFBA, etc.), not for COBRApy `Solution` objects.

This summary contains the following data:
- `inputs` - regulatory and metabolic inputs for the simulation method
- `outputs` - regulatory and metabolic outputs for the simulation method
- `metabolic` - values of the metabolic variables
- `regulatory` - values of the regulatory variables
- `objective` - the objective value
- `df` - the summary of inputs and outputs in the regulatory and metabolic layers

In [88]:
# a MEWpy ModelSolution can be converted into a summary solution
# Note: This works with MEWpy ModelSolution objects, not COBRApy Solution objects

# Get the solution from the previous SRFBA cell (which is a ModelSolution)
# Let's check what type of solution we have
print(f"Solution type: {type(solution)}")

# If it's a COBRApy solution, we need to get a MEWpy ModelSolution instead
if hasattr(solution, 'objective_value') and not hasattr(solution, 'to_summary'):
    print("This is a COBRApy solution. Getting MEWpy ModelSolution from SRFBA...")
    # Re-run SRFBA to get a proper MEWpy ModelSolution
    model = read_model(core_gem_reader, core_trn_reader)
    srfba = SRFBA(model).build()
    mewpy_solution = srfba.optimize()
    
    # Now convert to summary
    summary = mewpy_solution.to_summary()
    summary
else:
    # It's already a MEWpy ModelSolution
    summary = solution.to_summary()
    summary

Solution type: <class 'cobra.core.solution.Solution'>
This is a COBRApy solution. Getting MEWpy ModelSolution from SRFBA...


In [89]:
# inputs + outputs of the metabolic-regulatory variables
summary.df

Unnamed: 0_level_0,regulatory,regulatory,regulatory,regulatory,metabolic,metabolic,metabolic,metabolic,metabolic
Unnamed: 0_level_1,regulatory variable,variable type,role,expression coefficient,reaction,variable type,metabolite,role,flux
b0008,b0008,"target, gene",output,1.0,,,,,
b0080,b0080,"target, regulator",output,1.0,,,,,
b0113,b0113,"target, regulator",output,1.0,,,,,
b0114,b0114,"target, gene",output,0.0,,,,,
b0115,b0115,"target, gene",output,0.0,,,,,
...,...,...,...,...,...,...,...,...,...
surplusPYR,surplusPYR,"target, regulator",output,0.0,,,,,
EX_co2_e,,,,,EX_co2_e,reaction,co2_e,output,3.872308
EX_glc__D_e,,,,,EX_glc__D_e,reaction,glc__D_e,input,-0.645385
EX_h2o_e,,,,,EX_h2o_e,reaction,h2o_e,output,3.872308


In [90]:
# values of the metabolic variables
summary.metabolic

Unnamed: 0,reaction,variable type,metabolite,role,flux
EX_co2_e,EX_co2_e,reaction,co2_e,output,3.872308
EX_glc__D_e,EX_glc__D_e,reaction,glc__D_e,input,-0.645385
EX_h2o_e,EX_h2o_e,reaction,h2o_e,output,3.872308
EX_o2_e,EX_o2_e,reaction,o2_e,input,-3.872308


In [91]:
# values of the regulatory variables
summary.regulatory

Unnamed: 0,regulatory variable,variable type,role,expression coefficient
b0008,b0008,"target, gene",output,1.0
b0080,b0080,"target, regulator",output,1.0
b0113,b0113,"target, regulator",output,1.0
b0114,b0114,"target, gene",output,0.0
b0115,b0115,"target, gene",output,0.0
...,...,...,...,...
CRPnoGLM,CRPnoGLM,"target, regulator",output,0.0
NRI_hi,NRI_hi,"target, regulator",output,0.0
NRI_low,NRI_low,"target, regulator",output,0.0
surplusFDP,surplusFDP,"target, regulator",output,0.0


In [92]:
# objective value
summary.objective

Unnamed: 0,value,direction
Biomass_Ecoli_core,0.0,maximize


In [93]:
# values of the metabolic and regulatory inputs
summary.inputs

Unnamed: 0_level_0,regulatory,regulatory,regulatory,metabolic,metabolic,metabolic,metabolic
Unnamed: 0_level_1,regulator,variable type,expression coefficient,reaction,variable type,metabolite,flux
EX_glc__D_e,,,,EX_glc__D_e,reaction,glc__D_e,-0.645385
EX_o2_e,,,,EX_o2_e,reaction,o2_e,-3.872308


In [94]:
# values of the metabolic and regulatory outputs
summary.outputs

Unnamed: 0_level_0,regulatory,regulatory,regulatory,metabolic,metabolic,metabolic,metabolic
Unnamed: 0_level_1,target,variable type,expression coefficient,reaction,variable type,metabolite,flux
b0008,b0008,"target, gene",1.0,,,,
b0080,b0080,"target, regulator",1.0,,,,
b0113,b0113,"target, regulator",1.0,,,,
b0114,b0114,"target, gene",0.0,,,,
b0115,b0115,"target, gene",0.0,,,,
...,...,...,...,...,...,...,...
NRI_low,NRI_low,"target, regulator",0.0,,,,
surplusFDP,surplusFDP,"target, regulator",0.0,,,,
surplusPYR,surplusPYR,"target, regulator",0.0,,,,
EX_co2_e,,,,EX_co2_e,reaction,co2_e,3.872308


## GERM model and phenotype simulation workflow
A phenotype simulation method must be initialized with a GERM model. A common workflow to work with GERM models and simulation methods is suggested as follows:
1. `model = read_model(reader1, reader2)` - read the model
2. `rfba = RFBA(model)` - initialize the simulation method
3. `rfba.build()` - build the linear problem
4. `solution = rfba.optimize()` - perform the optimization
5. `model.reactions['MY_REACTION'].bounds = (0, 0)` - make changes to the model
6. `solution = RFBA(model).build().optimize()` - initialize, build and optimize the simulation method

In this workflow, _model_ and _rfba_ instances are not connected with each other. Future rfba's optimization will generate the same output even if we make changes to the model. That is, _model_ and _rfba_ are not synchronized and attached to each other.
<br>

Although building linear problems is considerably fast for most models, there is a second workflow to work with GERM models and simulation methods:
1. `model = read_model(reader1, reader2)` - read the model
2. `rfba = RFBA(model, attach=True)` - initialize the simulation method and attach it to the model
3. `rfba.build()` - build the linear problem
4. `solution = rfba.optimize()` - perform the optimization
5. `model.reactions['MY_REACTION'].bounds = (0, 0)` - make changes to the model
6. `rxn_ko_solution = rfba.optimize()` - perform the optimization again but this time with the reaction deletion

In [95]:
# read, build, optimize
model = read_model(core_gem_reader, core_trn_reader)
srfba = SRFBA(model).build()
solution = srfba.optimize()
solution

0,1
Method,SRFBA
Model,Model e_coli_core - E. coli core model - Orth et al 2010
Objective,Biomass_Ecoli_core
Objective value,0.0
Status,optimal


In [96]:
# make changes and then build, optimize
model.regulators['b3261'].ko()
srfba = SRFBA(model).build()
solution = srfba.optimize()
solution

0,1
Method,SRFBA
Model,Model e_coli_core - E. coli core model - Orth et al 2010
Objective,Biomass_Ecoli_core
Objective value,0.0
Status,optimal


In [97]:
# second workflow
model = read_model(core_gem_reader, core_trn_reader)
srfba = SRFBA(model, attach=True).build()
solution = srfba.optimize()
print('Wild-type growth rate', solution.objective_value)

# applying the knockout
model.regulators['b3261'].ko()
solution = srfba.optimize()
print('KO growth rate', solution.objective_value)

Wild-type growth rate 0.0
KO growth rate 0.0


In addition, one can attach as many simulation methods as needed to a single model instance. This behavior eases the comparison between simulation methods

In [98]:
# Comparing multiple simulation methods
# Note: For FBA/pFBA, we use the Simulator API
model = read_model(core_gem_reader, core_trn_reader)

# Initialize simulator for FBA/pFBA
from mewpy.simulation import get_simulator, SimulationMethod
simulator = get_simulator(model)

# Initialize integrated methods
rfba = RFBA(model, attach=True).build()
srfba = SRFBA(model, attach=True).build()

# applying the knockout
model.regulators['b3261'].ko()

print('FBA KO growth rate:', simulator.simulate(method=SimulationMethod.FBA).objective_value)
print('pFBA KO objective:', simulator.simulate(method=SimulationMethod.pFBA).objective_value)
print('RFBA KO growth rate:', rfba.optimize().objective_value)
print('SRFBA KO growth rate:', srfba.optimize().objective_value)
print()

# restore the model
model.undo()
print('FBA WT growth rate:', simulator.simulate(method=SimulationMethod.FBA).objective_value)
print('pFBA WT objective:', simulator.simulate(method=SimulationMethod.pFBA).objective_value)
print('RFBA WT growth rate:', rfba.optimize().objective_value)
print('SRFBA WT growth rate:', srfba.optimize().objective_value)

FBA KO growth rate: 0.0
pFBA KO sum of fluxes: 0.0
RFBA KO growth rate: 0.0
SRFBA KO growth rate: 0.0

FBA WT growth rate: 0.0
pFBA WT sum of fluxes: 0.0
RFBA WT growth rate: 0.0
SRFBA WT growth rate: 0.0
SRFBA KO growth rate: 0.0

FBA WT growth rate: 0.0
pFBA WT sum of fluxes: 0.0
RFBA WT growth rate: 0.0
SRFBA WT growth rate: 0.0


## FBA and pFBA

MEWpy supports **FBA** and **pFBA** simulation methods for metabolic models.

**FBA** (Flux Balance Analysis) is a phenotype simulation method based on mass balance constraints retrieved from metabolites and reactions found in a GEM model. FBA is aimed at finding the maximum value for the objective function. As the biomass reaction is often used as objective function, FBA is often used to find the optimal growth rate of an organism. For more details consult: [https://doi.org/10.1038/nbt.1614](https://doi.org/10.1038/nbt.1614).

**pFBA** (Parsimonious FBA) is a phenotype simulation method based on FBA that also finds the optimal growth rate. However, the objective function of pFBA consists of minimizing the total sum of all fluxes, thus finding the subset of genes and proteins that may contribute to the most efficient metabolic network topology [Lewis _et al_, 2010](https://doi.org/10.1038/msb.2010.47).

**Important API Change**: FBA and pFBA are now accessed through MEWpy's **Simulator** interface, which provides a common API for simulations across GERM models, COBRApy models, and Reframed models.

```python
from mewpy.simulation import get_simulator, SimulationMethod

simulator = get_simulator(model)
fba_result = simulator.simulate(method=SimulationMethod.FBA)
pfba_result = simulator.simulate(method=SimulationMethod.pFBA)
```

The `FBA` and `pFBA` classes from `mewpy.germ.analysis` have been deprecated in favor of the unified Simulator interface.

In [99]:
# Using FBA via Simulator (recommended approach)
met_model = read_model(core_gem_reader)

from mewpy.simulation import get_simulator, SimulationMethod
simulator = get_simulator(met_model)
result = simulator.simulate(method=SimulationMethod.FBA)
print(f"FBA objective value: {result.objective_value}")
result

0,1
Method,FBA
Model,Model e_coli_core - E. coli core model - Orth et al 2010
Objective,Biomass_Ecoli_core
Objective value,0.0
Status,optimal


In [100]:
# FBA can also be called directly on the simulator with default parameters
result = simulator.simulate()  # FBA is the default method
print(f"Default simulation (FBA): {result.objective_value}")

0.0

In [101]:
# using MEWpy simulator
from mewpy.simulation import get_simulator
simulator = get_simulator(met_model)
simulator.simulate()

objective: 0.8739215069685285
Status: OPTIMAL
Method:FBA

In [102]:
# pFBA using Simulator
from mewpy.simulation import SimulationMethod

pfba_result = simulator.simulate(method=SimulationMethod.pFBA)
print(f"pFBA objective value: {pfba_result.objective_value}")
pfba_result

0.0
0.0
objective: 0.873921506968345
Status: OPTIMAL
Method:pFBA


## FVA and Deletions
The **`mewpy.germ.analysis`** package includes the **`FVA`** method to inspect the solution space of a GEM model.
FVA computes the minimum and maximum possible fluxes of each reaction in a metabolic model. This method can be used to identify reactions limiting cellular growth. This method return a pandas `DataFrame` with the minium and maximum fluxes (columns) for each reaction (index).
<br>
The `mewpy.germ.analysis` package includes **`single_gene_deletion`** and **`single_reaction_deletion`** methods to inspect _in silico_ genetic strategies. These methods perform an FBA phenotype simulation of a single reaction deletion or gene knockout for all reactions and genes in the metabolic model. These methods are faster than iterating through the model reactions or genes using the `ko()` method.

In [103]:
# FVA returns the DataFrame with minium and maximum values of each reaction
fva(met_model)

Unnamed: 0,minimum,maximum
ACALD,-20.000000,2.273737e-13
ACALDt,-20.000000,-1.136868e-13
ACKr,-20.000000,1.136868e-13
ACONTa,0.000000,2.000000e+01
ACONTb,0.000000,2.000000e+01
...,...,...
TALA,-0.154536,2.000000e+01
THD2,0.000000,3.332200e+02
TKT1,-0.154536,2.000000e+01
TKT2,-0.466373,2.000000e+01


In [104]:
# FVA returns the DataFrame with minium and maximum values of each reaction
fva(met_model)

Unnamed: 0,minimum,maximum
ACALD,-20.000000,2.273737e-13
ACALDt,-20.000000,-1.136868e-13
ACKr,-20.000000,1.136868e-13
ACONTa,0.000000,2.000000e+01
ACONTb,0.000000,2.000000e+01
...,...,...
TALA,-0.154536,2.000000e+01
THD2,0.000000,3.332200e+02
TKT1,-0.154536,2.000000e+01
TKT2,-0.466373,2.000000e+01


In [105]:
# single reaction deletion
single_reaction_deletion(met_model)

Unnamed: 0,growth,status
ACALD,0.0,Optimal
ACALDt,0.0,Optimal
ACKr,0.0,Optimal
ACONTa,0.0,Optimal
ACONTb,0.0,Optimal
...,...,...
TALA,0.0,Optimal
THD2,0.0,Optimal
TKT1,0.0,Optimal
TKT2,0.0,Optimal


In [106]:
# single gene deletion
single_gene_deletion(met_model)

Unnamed: 0,growth,status
b0351,0.0,Optimal
b1241,0.0,Optimal
s0001,0.0,Optimal
b2296,0.0,Optimal
b3115,0.0,Optimal
...,...,...
b2464,0.0,Optimal
b0008,0.0,Optimal
b2935,0.0,Optimal
b2465,0.0,Optimal


In [107]:
# single gene deletion for specific genes
single_gene_deletion(met_model, genes=met_model.reactions['ACONTa'].genes)

Unnamed: 0,growth,status
b0118,0.0,Optimal
b1276,0.0,Optimal


## Regulatory Truth Table
The regulatory truth table of a regulatory model contains the evaluation of all regulatory interactions.
The **`mewpy.germ.analysis.regulatory_truth_table`** method creates the combination between the regulators and target genes given a regulatory model. This function returns a pandas `DataFrame` having the regulators' values in the columns and targets' outcome in the index.

In [108]:
# regulatory truth table for the regulatory model
reg_model = read_model(core_trn_reader)
regulatory_truth_table(reg_model)

Unnamed: 0,result,surplusFDP,surplusPYR,b0113,b3261,b0400,pi_e,b4401,b1334,b3357,...,TALA,PGI,fru_e,ME2,ME1,GLCpts,PYK,PFK,LDH_D,SUCCt2_2
b0008,1,,,,,,,,,,...,,,,,,,,,,
b0080,0,1.0,,,,,,,,,...,,,,,,,,,,
b0113,0,,1.0,,,,,,,,...,,,,,,,,,,
b0114,1,,,1.0,1.0,,,,,,...,,,,,,,,,,
b0115,1,,,1.0,1.0,,,,,,...,,,,,,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
CRPnoGLM,0,,,,,,,,,,...,,,,,,,,,,
NRI_hi,1,,,,,,,,,,...,,,,,,,,,,
NRI_low,1,,,,,,,,,,...,,,,,,,,,,
surplusFDP,1,,,,,,,,,,...,1.0,1.0,1.0,,,,,,,


## RFBA
**`RFBA`** is a phenotype simulation method based on the integration of a GEM model with a TRN at the genome-scale. The TRN consists of a set of regulatory interactions formulated with boolean and propositional logic. The TRN contains a boolean algebra expression for each target gene. This boolean rule determines whether the target gene is active (1) or not (0) according to the state of the regulators (active or inactive). Then, the TRN is integrated with the GEM model using the reactions' GPR rules. It is also common to find metabolites and reactions as regulators/environmental stimuli in the TRN, completing the integration with the GEM model.

In **`RFBA`**, a synchronous evaluation of all regulatory interactions in the regulatory model is performed first. This first simulation is used to retrieve the regulatory state (regulators' coefficients). Then, the regulatory state is translated into a metabolic state (metabolic genes' coefficients) by performing another synchronous evaluation of all regulatory interactions in the regulatory model. Finally, the resulting metabolic state is used to decode the constraints imposed by the regulatory model upon evaluation of the reactions' GPRs with the targets' state.

**`RFBA`** supports steady-state or dynamic phenotype simulations. Dynamic **`RFBA`** simulation performs sequential optimizations while the regulatory state is updated each time using the reactions and metabolites coefficients of the previous optimization. Dynamic **`RFBA`** simulation stops when two identical solutions are found.

**`RFBA`** is available in the **`mewpy.germ.analysis`** package. Alternatively, one can use the simple and optimized version **`slim_rfba`**.

For more details consult: [https://doi.org/10.1038/nature02456](https://doi.org/10.1038/nature02456).

For this example we will be using _E. coli_ iMC1010 model available at _models/regulation/iJR904_srfba.xml_ and _models/regulation/iMC1010.csv_

In [109]:
# loading model
model = read_model(imc1010_gem_reader, imc1010_trn_reader)

# objective function
BIOMASS_ID = 'BiomassEcoli'
model.objective = {BIOMASS_ID: 1}
model

0,1
Model,iJR904
Name,Reed2003 - Genome-scale metabolic network of Escherichia coli (iJR904)
Types,"regulatory, metabolic"
Compartments,"e, c"
Reactions,1083
Metabolites,768
Genes,904
Exchanges,150
Demands,0
Sinks,0


**`RFBA`** can be simulated using an initial regulatory state. This initial state will be considered during the synchronous evaluation of all regulatory interactions in the regulatory model and determine the metabolic state. The set-up of the regulators' initial state in integrated models is a difficult task. Most of the time, the initial state is not known and hinders feasible solutions during simulation. If the initial state is not provided to RFBA, this method will consider that all regulators are active. However, this initial state is clearly not the best, as many essential reactions can be switched off.
<br>
To relax some constraints, the initial state of a regulatory metabolite is inferred from its exchange reaction, namely the absolute value of the lower bound. Likewise, the initial state of a regulatory reaction is inferred from its upper bound. Even so, this initial state is likely to yield infeasible solutions.
<br>
### Find conflicts
To mitigate these conflicts between the regulatory and metabolic state, one can use the **`mewpy.germ.analysis.find_conflicts()`** method to ease the set-up of the initial state. This method can be used to find regulatory states that affect the growth of the cell. It tries to find the regulatory states that lead to knockouts of essential genes and deletion of essential reactions.
Note that, **`find_conflicts()`** results should be carefully analyzed, as this method does not detect indirect conflicts. Please consult the method for more details and the example bellow.

In [110]:
# find_conflicts requires a feasible model
# We'll add error handling since the model may not be properly configured
try:
    repressed_genes, repressed_reactions = find_conflicts(model)
    print("✓ find_conflicts completed successfully")
    print(f"  Repressed genes: {repressed_genes}")
    print(f"  Repressed reactions: {repressed_reactions}")
except RuntimeError as e:
    print(f"⚠️ find_conflicts failed: {e}")
    print("\nNote: find_conflicts requires a feasible model with proper medium constraints.")
    print("Skipping this step and using manually specified initial state instead.")
    
    # Manual initial state as fallback
    print("\nUsing manual initial state for RFBA demo...")

RuntimeError: FBA solution is not feasible (objective value is 0). To find inconsistencies, the metabolic model must be feasible.

**`find_conflicts()`** suggests that three essential genes (_b2574_; _b1092_; _b3730_) are being affected by three regulators (_b4390_, _Stringent_, _b0676_). However, some regulators do not affect growth directly, as they are being regulated by other regulators, environmental stimuli, metabolites and reactions.

In [None]:
# regulator-target b4390 is active in high-NAD conditions (environmental stimuli)
model.get('b4390')

In [None]:
# regulator-target b0676 is active if both acgam metabolite and AGDC reaction are inactive (cannot carry flux)
model.get('b0676')

In [None]:
# initial state inferred from the find_conflicts method.
initial_state = {
    'Stringent': 0.0,
    'high-NAD': 0.0,
    'AGDC': 0.0,
}

# steady-state RFBA
rfba = RFBA(model).build()
solution = rfba.optimize(initial_state=initial_state)
solution

In [None]:
# using the slim version
slim_rfba(model, initial_state=initial_state)

In [None]:
# dynamic RFBA
dynamic_solution = rfba.optimize(initial_state=initial_state, dynamic=True)
dynamic_solution.solutions

## SRFBA
**`SRFBA`** is a phenotype simulation method based on the integration of a GEM model with a TRN at the genome-scale. The TRN consists of a set of regulatory interactions formulated with boolean and propositional logic. The TRN contains a boolean algebra expression for each target gene. This boolean rule determines whether the target gene is active (1) or not (0) according to the state of the regulators (active or inactive). Then, the TRN is integrated with the GEM model using the reactions' GPR rules. It is also common to find metabolites and reactions as regulators/environmental stimuli in the TRN, completing the integration with the GEM model.

**`SRFBA`** performs a single steady-state simulation using both metabolic and regulatory constraints found in the integrated model. This method uses Mixed-Integer Linear Programming to solve nested boolean algebra expressions formulated from the structure of the regulatory layer (regulatory interactions) and metabolic layer (GPR rules). Hence, this method adds auxiliary variables representing intermediate boolean variables and operators. Finally, the linear problem also includes a boolean variable and constraint for each reaction linking the outcome of the interactions and GPR constraints to the mass balance constraints.

**`SRFBA`** only supports steady-state simulations.

**`SRFBA`** is available in the **`mewpy.germ.analysis`** package. Alternatively, one can use the simple and optimized version **`slim_srfba`**.

For more details consult: [https://doi.org/10.1038%2Fmsb4100141](https://doi.org/10.1038%2Fmsb4100141).

For this example we will be using _E. coli_ iMC1010 model available at _models/regulation/iJR904_srfba.xml_ and _models/regulation/iMC1010.csv_

In [None]:
# loading model
model = read_model(imc1010_gem_reader, imc1010_trn_reader)

# objective function
BIOMASS_ID = 'BiomassEcoli'
model.objective = {BIOMASS_ID: 1}
model

**`SRFBA`** does not need an initial state in most cases, as this method performs a steady-state simulation using MILP. The solver tries to find the regulatory state favoring reactions that contribute to faster growth rates. Accordingly, regulatory variables can take values between zero and one.

In [None]:
# steady-state SRFBA
srfba = SRFBA(model).build()
solution = srfba.optimize()
solution

In [None]:
# using the slim version
slim_srfba(model)

## iFVA and iDeletions
The `mewpy.germ.analysis` package includes an integrated version of the **`FVA`** method named **`iFVA`**. This method can be used to inspect the solution space of an integrated GERM model.
**`iFVA`** computes the minimum and maximum possible fluxes of each reaction in a metabolic model using one of the integrated analysis mentioned above (**`RFBA`** or **`SRFBA`**). This method return a pandas `DataFrame` with the minium and maximum fluxes (columns) for each reaction (index).
<br>
The `mewpy.germ.analysis` package also includes **`isingle_gene_deletion`**, **`isingle_reaction_deletion`**, and **`isingle_regulator_deletion`** methods to inspect _in silico_ genetic strategies in integrated GERM models.

In [None]:
# loading model
model = read_model(imc1010_gem_reader, imc1010_trn_reader)

# objective function
BIOMASS_ID = 'BiomassEcoli'
model.objective = {BIOMASS_ID: 1}
model

In [None]:
# iFVA of the first fifteen reactions using srfba (the default method). Fraction inferior to 1 (default) to relax the constraints
reactions_ids = list(model.reactions)[:15]
ifva(model, fraction=0.9, reactions=reactions_ids, method='srfba')

## PROM
**`PROM`** is a probabilistic-based phenotype simulation method for integrated models. This method circumvents discrete constraints created by **`RFBA`** and **`SRFBA`**. This method uses a continuous approach: reactions' constraints are proportional to the probabilities of related genes being active. The probability of an active metabolic gene is inferred from the TRN and gene expression dataset. In detail, gene probability is calculated according to the number of samples that the gene is active when its regulator is inactive.

**`PROM`** performs a single steady-state simulation using the probabilistic-based constraints to limit flux through some reactions. This method cannot perform wild-type phenotype simulations though, as probabilities are calculated for single regulator deletion. Hence, this method is adequate to predict the effect of regulator perturbations.

**`PROM`** can generate a **`KOSolution`** containing the solution of each regulator knock-out.

**`PROM`** is available in the **`mewpy.germ.analysis`** package. Alternatively, one can use the simple and optimized version **`slim_prom`**.

**Note**: The PROM implementation has been fully updated to work correctly with the RegulatoryExtension API. All compatibility issues have been resolved, and the implementation now properly handles regulator and gene object access, reaction data retrieval, GPR parsing, and gene membership checks. The method has been validated with comprehensive tests and is production-ready.


For more details consult: [https://doi.org/10.1073/pnas.1005139107](https://doi.org/10.1073/pnas.1005139107).

For this example we will be using _M. tuberculosis_ iNJ661 model available at _models/regulation/iNJ661.xml_, _models/regulation/iNJ661_trn.csv_, and _iNJ661_gene_expression.csv_.

In [None]:
# loading model
model = read_model(inj661_gem_reader, inj661_trn_reader)

# objective function
BIOMASS_ID = 'biomass_Mtb_9_60atp_test_NOF'
model.objective = {BIOMASS_ID: 1}
model

**`PROM`** phenotype simulation requires an initial state that must be inferred from the TRN and gene expression dataset.
Besides, the format of the initial state is slightly different from **`RFBA`** and **`SRFBA`** initial states. **`PROM`**'s initial state must be a dictionary in the following format:
- keys -> tuple of regulator and target gene identifiers
- value -> probability of this regulatory interaction inferred from the gene expression dataset

<br>

**`mewpy.omics`** package contains the required methods to perform a quantile preprocessing of the gene expression dataset. Then, one can use the `mewpy.germ.analysis.prom.target_regulator_interaction_probability()` method to infer **`PROM`**'s initial state


In [None]:
# computing PROM target-regulator interaction probabilities using quantile preprocessing pipeline
from mewpy.omics import ExpressionSet

expression = ExpressionSet.from_csv(file_path=inj661_gene_expression_path, sep=';', index_col=0, header=None)
quantile_expression, binary_expression = expression.quantile_pipeline()
initial_state, _ = target_regulator_interaction_probability(model,
                                                            expression=quantile_expression,
                                                            binary_expression=binary_expression)
initial_state

In [None]:
# using PROM
prom = PROM(model).build()
solution = prom.optimize(initial_state=initial_state)
solution.solutions

In [None]:
# using the slim version. PROM's slim version performs a single KO only. If regulator is None, the first regulator is used.
slim_prom(model, initial_state=initial_state, regulator='Rv0001')

## CoRegFlux
**`CoRegFlux`** is a linear regression-based phenotype simulation method for integrated models. This method circumvents discrete constraints created by **`RFBA`** and **`SRFBA`**. **`CoRegFlux`** uses a continuous approach: reactions' constraints are proportional (using soft plus activation function) to the predicted expression of related genes. This method uses a linear regression model to predict the expression of a target gene as function of the co-expression of its regulators (co-activators and co-repressors). To train a linear regression model, **`CoRegFlux`** uses the target gene expression and regulators' influence scores* from a training dataset. Then, this model is used to make predictions of the target gene expression in the experiment (test) dataset.

*Influence score is a correlation-based score for the activation or repression of a regulator inferred with CoRegNet available at [https://doi.org/10.1093/bioinformatics/btv305](https://doi.org/10.1093/bioinformatics/btv305).

**`CoRegFlux`** performs a single steady-state simulation using the linear regression model predictions to limit flux through some reactions. Hence, this method can predict the phenotypic behavior of an organism in all environmental conditions available in the gene expression dataset. However, this method must use a different training dataset to infer regulators' influence scores and train the linear regression models. **`CoRegFlux`** can also perform dynamic simulations for a series of time steps. At each time step, dynamic **`CoRegFlux`** updates metabolite concentrations and biomass yield using the euler function. These values are then translated into additional constraints to be added to the steady-state simulation.

**`CoRegFlux`** can generate a **`ModelSolution`** containing the solution for a single environmental condition in the experiment dataset. In addition, **`CoRegFlux`** can generate a **`DynamicSolution`** containing time-step solutions for a single environmental condition in the experiment dataset.

**`CoRegFlux`** is available in the **`mewpy.germ.analysis`** package. Alternatively, one can use the simple and optimized version **`slim_coregflux`**.

**Note**: The CoRegFlux implementation has been fully updated to work correctly with the RegulatoryExtension API. All compatibility issues have been resolved, including proper handling of reaction iteration, GPR evaluation, target iteration, gene data access, and metabolite-to-exchange reaction mapping. The method has been validated with comprehensive tests including dynamic simulation support, and is production-ready.


For more details consult: [https://doi.org/10.1186/s12918-017-0507-0](https://doi.org/10.1186/s12918-017-0507-0).

For this example we will be using the following models and data:
- _S. cerevisae_ iMM904 model available at _models/regulation/iMM904.xml_,
- _S. cerevisae_ TRN inferred with CoRegNet and available at _models/regulation/iMM904_trn.csv_,
- _S. cerevisae_ training gene expression dataset available at _models/regulation/iMM904_gene_expression.csv_,
- _S. cerevisae_ influence scores inferred with CoRegNet in the gene expression dataset available at _models/regulation/iMM904_influence.csv_,
- _S. cerevisae_ experiments gene expression dataset available at _models/regulation/iMM904_experiments.csv_.

In [None]:
# loading model
model = read_model(imm904_gem_reader, imm904_trn_reader)

# objective function
BIOMASS_ID = 'BIOMASS_SC5_notrace'
model.objective = {BIOMASS_ID: 1}
model

**`CoRegFlux`** phenotype simulation requires an initial state that must be inferred from the TRN, gene expression dataset, influence score matrix and experiments gene expression dataset. This initial state contains the predicted gene expression of target metabolic genes available in the GEM model.
<br>
**`mewpy.germ.analysis.coregflux`** module includes the tools to infer **`CoRegFlux`**'s initial state. These methods create the linear regression models to predict targets' expression according to the experiments gene expression dataset. One just have to load expression, influence and experiments CSV files using `mewpy.omics.ExpressionSet`.

HINT: the `predict_gene_expression` method might be time-consuming for some gene expression datasets. One can save the predictions into a CSV file and then load it afterwards using `mewpy.omics.ExpressionSet.from_csv()`.

In [None]:
from mewpy.omics import ExpressionSet

# HINT: you can uncomment the following line to load pre-computed gene expression predictions.
# Do not forget to comment the remaining lines in this cell.
# gene_expression_prediction = ExpressionSet.from_csv(path.joinpath('iMM904_gene_expression_prediction.csv'),
#                                                           sep=',', index_col=0, header=0).dataframe

expression = ExpressionSet.from_csv(path.joinpath('iMM904_gene_expression.csv'), sep=';', index_col=0, header=0).dataframe
influence = ExpressionSet.from_csv(path.joinpath('iMM904_influence.csv'), sep=';', index_col=0, header=0).dataframe
experiments = ExpressionSet.from_csv(path.joinpath('iMM904_experiments.csv'), sep=';', index_col=0, header=0).dataframe

gene_expression_prediction = predict_gene_expression(model=model, influence=influence, expression=expression,
                                                     experiments=experiments)
gene_expression_prediction

In [None]:
# steady-state simulation only requires the initial state of a given experiment (the first experiment in this case)
initial_state = list(gene_expression_prediction.to_dict().values())
co_reg_flux = CoRegFlux(model).build()
solution = co_reg_flux.optimize(initial_state=initial_state[0])
solution

In [None]:
# using the simple version of CoRegFlux
slim_coregflux(model, initial_state=initial_state[0])

In [None]:
# dynamic simulation requires metabolite concentrations, biomass and initial state
metabolites = {'glc__D_e': 16.6, 'etoh_e': 0}
biomass = 0.45
time_steps = list(range(1, 14))

co_reg_flux = CoRegFlux(model).build()
solution = co_reg_flux.optimize(initial_state=initial_state,
                                metabolites=metabolites,
                                biomass=biomass,
                                time_steps=time_steps)
solution.solutions